Yesterday I promised at least one sequel to my redux article in which I would actually discuss the “state of the art” of AFT, as I perceived it to be from discussions and presentations at the AA-FTT workshop. Well, here goes.
The state of the art of functional testing
Claim:
The state of the art of functional testing is to produce a domain-specific language (DSL) that can describe the high-level functions of your application in terms that are understood by all stakeholders in the project. This language will be used to describe and to execute your functional tests.
As an example, if you were writing an application to discover and monitor the performance of servers in your datacenter, a possible program in such a DSL might look like:
(Use the application to...)
Discover machines in domain "mydomain"
Assert size of domain "mydomain" is 5
Monitor CPU usage of machines in domain "mydomain"
Wait five minutes
For all machines 'machine' in domain "mydomain"
    Assert CPUUsage of machine is > 0%
I wouldn’t want to be the one to have to implement that language, but it’s good enough to present the basic concepts. Specifically, the DSL is designed in terms of (a) what the application can do and (b) concepts in the business domain of the application.
Now, let us assume that it is expensive to design and implement a DSL. The question one would probably ask at this point is why we would undertake such an expense.
Recall that the primary stakeholders under consideration in this discussion are the developers, the testers (if they are considered to be a separate group), and the product owners.
What is the value to the product owners? Let us characterize the product owners as nonprogrammers who understand the business domain well. In this case, the DSL acts as a “pidgin language” through which the product owner (PO) can concisely communicate the requirements and related business concepts to the programmers and testers with minimal ambiguity. It’s not obvious to me at this juncture that this is a terrific argument: I imagine that there are cheaper ways of doing this, which would obviate the need to design a common language between the groups that is also executable. I’d be curious to hear people’s opinions on this. One interesting claim made at the workshop was that every functional testing project should be approached as an education for the developers.
Moving on, what is the value to the developers? If we assume that this language is executable, then the functional/acceptance tests can be executed with some regularity as a regression test suite. That one is pretty obvious. If the developers are also involved in implementing and maintaining the test suite, the ability to write tests at this level of abstraction makes it simpler to write tests, and also makes it much easier to maintain the tests.
Finally, what is the value to the testers, if a separate test team exists? If the testers are “test developers” (i.e. competent programmers), then they reap the same benefits as the developers would. If they are not so great at programming, then this pidgin language still presumably enables them to work with product owners to write test cases: the specialized language is simple enough that they can throw the gist of the test cases together and then hand them off to the developers to polish up a bit. In the end, being able to automatically run regression tests written in this language frees up the testers to spend more time doing “exploratory testing” (i.e. trying their damnedest to break the product), which exposes bugs that might otherwise be found by customers after release.
So, really, the main benefit of the DSL is that it is simultaneously:
- A medium of communication and arbitration between stakeholders,
- Executable,
- Enabling the necessary abstraction to permit sane maintenance of the executable tests.
Now, one thing that I immediately inferred from this analysis is that if your product owners are or were also competent programmers, your “DSL” might simply be a well-abstracted OO representation of your application in a vanilla programming language, such as the one that you are also using to write your application. That is, if the first property in the list above (a medium of communication between stakeholders) is not so important, and all you need is the other two (executability and maintainable abstraction), you don’t need a custom simplified syntax. If the PO is comfortable reading or writing tests in code, or if they are good at specifying automatable acceptance tests that could be described in such a way, then I don’t personally see the need to pull out the parser guns. You’ll still have to design your “language” by carefully considering what should go into the API of this application “robot”, but at least you won’t have to worry so much about making it read like English.
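To make that concrete, here is a rough Python sketch of the earlier DSL snippet written against a plain object API instead of a custom syntax. Everything in it is an assumption for illustration: FakeDatacenterApp, its method names, and the canned CPU reading are hypothetical, and a real “robot” would drive the actual application rather than an in-memory fake.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    cpu_usage: float = 0.0  # percent; stays 0 until a monitoring sample arrives


@dataclass
class FakeDatacenterApp:
    """Hypothetical in-memory stand-in for the application "robot"."""
    inventory: dict  # domain name -> list of host names that exist "out there"
    discovered: dict = field(default_factory=dict)
    monitored: set = field(default_factory=set)

    def discover_machines(self, domain):
        self.discovered[domain] = [Machine(n) for n in self.inventory.get(domain, [])]

    def domain_size(self, domain):
        return len(self.discovered.get(domain, []))

    def monitor_cpu_usage(self, domain):
        self.monitored.add(domain)

    def wait(self, minutes):
        # The fake does not sleep; it pretends time passed and a sample arrived.
        for domain in self.monitored:
            for machine in self.discovered.get(domain, []):
                machine.cpu_usage = 12.5  # canned reading, purely illustrative

    def machines(self, domain):
        return list(self.discovered.get(domain, []))


def discover_and_monitor_scenario(app):
    """The DSL snippet from earlier, expressed as ordinary method calls."""
    app.discover_machines("mydomain")
    assert app.domain_size("mydomain") == 5
    app.monitor_cpu_usage("mydomain")
    app.wait(minutes=5)
    for machine in app.machines("mydomain"):
        assert machine.cpu_usage > 0
```

The scenario function reads almost line for line like the DSL version, which is the point: the design effort goes into the API of the robot, not into a parser.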
Also, note that this representation doesn’t describe at all what “seams” of the application are being used by the DSL implementation to drive its execution. That DSL snippet up there could be manipulating the application UI to produce the desired effects, or it could instead be talking to web services at the presentation layer that the UI also talks to in order to elicit the desired state changes in the app. This was sort of a point of contention at the workshop: I was really astonished to see how many people actually advocated the idea of testing through the UI. I think that only two participants put up fierce objections to this. What the opposition lacked in numbers, however, it made up for in volume and eloquence.
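One way to picture the seam question in code: the DSL-level step stays the same, and only the driver behind it changes. The sketch below is hypothetical throughout. The Driver interface, both driver classes, and the canned host names are my own placeholders, and the drivers only log what a real implementation might do instead of touching a UI or a web service.

```python
from abc import ABC, abstractmethod


class Driver(ABC):
    """Hypothetical seam: how a DSL step actually reaches the application."""

    def __init__(self):
        self.log = []  # record of what the driver "did", for illustration

    @abstractmethod
    def discover(self, domain):
        """Trigger discovery for a domain and return the host names found."""


class UiDriver(Driver):
    def discover(self, domain):
        # A real implementation would manipulate the application's UI here.
        self.log.append(f"ui: enter {domain!r} in the domain field and click Discover")
        return ["host1", "host2"]  # canned result


class WebServiceDriver(Driver):
    def discover(self, domain):
        # A real implementation would call the service the UI itself talks to.
        self.log.append(f"ws: call discovery service with domain={domain!r}")
        return ["host1", "host2"]  # canned result


def discover_machines(driver, domain):
    """The DSL-level step: identical no matter which seam is underneath."""
    return driver.discover(domain)
```

Swapping UiDriver for WebServiceDriver changes nothing in the test script itself, which is why the DSL snippet alone cannot tell you which side of the argument its author was on.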
So what do the tools do?
It’s important to remember that the name of the group sponsoring this workshop strongly implies that it is focused on producing tools to enable AFT. So, if we claim that the current “holy grail” of AFT is to make every application under test “scriptable” through some DSL that we have to design ourselves, what the heck are the drivers and frameworks going to do for us?
The basic answer to that question, in my opinion, is “the easy and boring stuff” — which is fine by me. If it’s easy and boring, I’d rather that someone else do it for me.
So what are these easy and boring things?
- Providing different execution “modes” for tests in your DSL, much like a debugger would. Examples of such modes include “normal speed” (you’re watching the test execute in front of you), “hyperspeed” (the test is running on a regression build), and “interruptible” (runs at hyperspeed until a user-specified “breakpoint” is hit so that you can quickly get to a point where you want to do some exploratory testing).
- Implementing general mechanisms for test fixture setup/teardown (e.g. installing the app, seeding the database, tearing down the database between each fixture run).
- Associating certain tests with “database fixtures” — cooked datasets that you want installed right before the test run. (Managing and describing these data fixtures is a hard problem, and the topic has come up on the AA-FTT mailing list recently.)
- Tagging/grouping/labeling/organizing tests into different, possibly intersecting groups so that you can execute subsets of the tests as desired.
- Producing reports about the test runs.
- Given that you have implemented the atoms of your DSL, allowing you to write higher-level constructs in the DSL strictly in terms of these atoms without having to write any actual glue code.
That last one there is the most important, because it identifies what the framework won’t do for you. Your job, as the “test infrastructure” developer, is to decide upon the atoms in your DSL and to actually write glue code that will be invoked when the atoms in your DSL are evaluated.
For instance, let’s assume our datacenter monitoring application has exposed some of its core functionality as web services over SOAP, and that we’ve decided that one of the atoms in our DSL is going to be “discover a machine by IP address”. Further assume that there’s a web service in our application that does exactly this. We’d have to write some sort of code that we’d associate with the atom in question that would query this web service when the atom was evaluated, extract the salient results from the response, and pass those results back to the executing script. I tend to think of these atoms as the “terminals” in your DSL, although this is probably a bad vocab decision because now we’re starting to tie the idea of the DSL to a context-free grammar (CFG). But there it is.
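Here is a minimal Python sketch of what the glue code for that one atom could look like. It is not taken from any particular framework: the atom decorator, run_line, the phrase pattern, and FakeDiscoveryService (which stands in for the SOAP service and returns a response shape I made up) are all assumptions for illustration.

```python
import re

ATOMS = []  # list of (compiled pattern, glue function) pairs


def atom(pattern):
    """Register a glue function for one atom of the DSL (hypothetical mini-framework)."""
    def register(func):
        ATOMS.append((re.compile(pattern), func))
        return func
    return register


class FakeDiscoveryService:
    """Stand-in for the application's web service; a real client would go over the wire."""

    def discover_by_ip(self, ip):
        # Canned reply; the response shape is an assumption, not a real API.
        hostname = "host-" + ip.split(".")[-1]
        return {"status": "OK", "machine": {"ip": ip, "hostname": hostname}}


@atom(r'^Discover machine at "(?P<ip>[\d.]+)"$')
def discover_machine(context, ip):
    # Query the service, check the outcome, and keep only the salient result.
    response = context["service"].discover_by_ip(ip)
    if response["status"] != "OK":
        raise AssertionError("discovery of %s failed: %s" % (ip, response["status"]))
    context["machines"].append(response["machine"]["hostname"])


def run_line(context, line):
    """Evaluate one line of the DSL by dispatching to the first matching atom."""
    for pattern, func in ATOMS:
        match = pattern.match(line.strip())
        if match:
            return func(context, **match.groupdict())
    raise ValueError("no atom matches: %r" % line)


context = {"service": FakeDiscoveryService(), "machines": []}
run_line(context, 'Discover machine at "10.0.0.7"')
```

After the last line runs, the hostname extracted from the fake response sits in context["machines"], ready for later steps in the script to assert against.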
Once you’ve got a few of these atoms in place, though, you can start describing higher-level functions (the “nonterminals”) in terms of them without writing any glue code at all. For instance, given the existence of a “supply credentials” atom and a “discover machine” atom, we can write a function in terms of these atoms called “discover secure machine with credentials” that will compose these two operations. No glue code required!
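A small Python sketch of that composition idea, again with made-up names: the atom registry, define, and run are hypothetical helpers, and the two atoms only record what they were asked to do. The thing to notice is that the higher-level step is declared as data (a list of existing step names), with no new glue function behind it.

```python
ATOMS = {}       # step name -> glue function
COMPOSITES = {}  # step name -> list of step names it is built from


def atom(name):
    """Register glue code for an atom (hypothetical helper)."""
    def register(func):
        ATOMS[name] = func
        return func
    return register


@atom("supply credentials")
def supply_credentials(ctx, user, password, **_ignored):
    ctx["credentials"] = (user, password)


@atom("discover machine")
def discover_machine(ctx, ip, **_ignored):
    # Record the discovery and whether credentials were in place at the time.
    ctx.setdefault("discovered", []).append(
        {"ip": ip, "authenticated": "credentials" in ctx}
    )


def define(name, steps):
    """Declare a higher-level step purely as a sequence of existing steps."""
    COMPOSITES[name] = list(steps)


def run(ctx, name, **args):
    """Run an atom directly, or expand a composite into its parts."""
    if name in ATOMS:
        return ATOMS[name](ctx, **args)
    for step in COMPOSITES[name]:
        run(ctx, step, **args)


# The "nonterminal": composed from atoms, with no glue code of its own.
define(
    "discover secure machine with credentials",
    ["supply credentials", "discover machine"],
)

ctx = {}
run(ctx, "discover secure machine with credentials",
    user="admin", password="s3cret", ip="10.0.0.7")
```

Each atom accepts and ignores the arguments it does not need, which is a shortcut to keep the sketch short; a real framework would presumably map arguments to steps more carefully.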
Lots of frameworks like this already exist, and most of them seem to be centered on through-the-browser web testing. I expect that this is because UI manipulation and data extraction are almost always possible with HTML-based interfaces, even if they are unpleasant. However, if you’re trying to do the same thing with a Windows Forms app that uses two or three third-party UI “enhancement” libraries that are completely opaque to any kind of inspection or driver tool, this isn’t nearly as easy. There are other frameworks, like Robot Framework, that don’t really care what driver you’re using, as long as you can talk to it through Python or Java. Some folks from the workshop (myself included) are getting together to compile an annotated compendium of these sorts of tools, so if you’re interested in this sort of thing I’ll be sharing a link soon enough.
Bits and pieces
My brain suddenly feels much less tense, and I am experiencing a sense of mild euphoria. This suggests to me that I’ve managed to expunge most of the information overload which I accrued over the past week or so. So, I think I’ve said most of what I needed to say here. There are still some bits and pieces of things that I’d like to mention, as well as one medium-sized idea that I’d like to work out in writing, so there will probably be a third part to this series that is loosely related to what I’ve just discussed.
As I mentioned in the first article, I’m sure I’ve misunderstood and misquoted all sorts of people and ideas, so if you have comments or criticism, I’d love to hear them. I suspect if these two redux posts were cleaned up (a lot) with the help of all those people out there who are way smarter than me, they might just turn into the sort of “introductory” resource that I was hunting for when I first started my “functional testing quest” for information a little less than two weeks ago.