« Archives in August, 2008

Bisogno = omg espresso

When I moved back to Toronto, it occurred to me that I’d never really _lived_ in the city before. I’ve thus resolved to be a tourist in my own city, because I hardly know it at all. So, I’ll be scribbling down brief blurbs about the places that really stick out in my mind as I stumble across them.

Bisogno was one of the first “holyshits” I had in Toronto. It’s a little coffee shop just south of Adelaide on Sherbourne, where I have taken to doing my weekly catch-up reading each Tuesday at noon. Von (?), the owner, takes it upon himself to learn all of his customers’ names, and to introduce them to one another. Very community-oriented guy, it seems. The biggest seating section at Bisogno is communal as well: it’s a big, antique-looking dining table.

But the main attraction for me at Bisogno is the espresso. Three different kinds (of which the third is my favorite, although I can’t remember the name), all brewed such that there is a crema on top that looks like sweet beige magma. I’m certainly not an espresso connoisseur, but I’ve ordered hundreds of them now and this beats the snot out of anything that I ever had in Florence during my month there.

(Edit: I think I remembered the name of my favourite espresso variety — 49th Parallel?)

Check it out (61 Sherbourne):

On BlogTO
On StopFinder

AA-FTT 2008 workshop redux, part 2

Yesterday I promised at least one sequel to my redux article in which I would actually discuss the “state of the art” of AFT, as I perceived it to be from discussions and presentations at the AA-FTT workshop. Well, here goes.

The state of the art of functional testing

Claim:

The state of the art of functional testing is to produce a domain-specific language (DSL) that can describe the high-level functions of your application in terms that are understood by all stakeholders in the project. This language will be used to describe and to execute your functional tests.

As an example, if you were writing an application to discover and monitor the performance of servers in your datacenter, a possible program in such a DSL might look like:

  (Use the application to...)
  Discover machines in domain "mydomain"
  Assert size of domain "mydomain" is 5
  Monitor CPU usage of machines in domain "mydomain"
  Wait five minutes
  For all machines 'machine' in domain "mydomain"
    Assert CPUUsage of machine is > 0%

I wouldn’t want to be the one to have to implement that language, but it’s good enough to present the basic concepts. Specifically, the DSL is designed in terms of (a) what the application can do and (b) concepts in the business domain of the application.

Now, let us assume that it is expensive to design and implement a DSL. The question one would probably ask at this point is why we would undertake such an expense.

Recall that the primary stakeholders under consideration in this discussion are the developers, the testers (if they are considered to be a separate group), and the product owners.

What is the value to the product owners? Let us characterize the product owners as nonprogrammers who understand the business domain well. In this case, the DSL acts as a “pidgin language” via which the PO is able to concisely communicate the requirements and related business concepts to the programmers and testers with minimal ambiguity. It’s not obvious to me at this juncture that this is a terrific argument: I imagine that there are cheaper ways of doing this, which would obviate the need to design a common language between the groups that is also executable. I’d be curious to hear people’s opinions on this. One interesting claim made at the workshop was that every functional testing project should be approached as an education for the developers.

Moving on, what is the value to the developers? If we assume that this language is executable, then the functional/acceptance tests can be executed with some regularity as a regression test suite. That one is pretty obvious. If the developers are also involved in implementing and maintaining the test suite, the ability to write tests at this level of abstraction makes it simpler to write tests, and also makes it much easier to maintain the tests.

Finally, what is the value to the testers, if a separate test team exists? If the testers are “test developers”, i.e. if they are competent programmers, then they reap the same benefits as the developers would. If they are not so great at programming, then this pidgin language still presumably enables them to work with product owners to write test cases, because the specialized language is simple enough to grasp that they can throw the gist of the test cases together and then hand them off to the developers to polish up a bit. In the end, being able to automatically run regression tests written in this language frees up the testers to spend more time doing “exploratory testing” (i.e. trying their damnedest to break the product), which exposes bugs that might otherwise be found by customers after release.

So, really, the main benefit that the DSL provides is that it is simultaneously:

  1. A medium of communication and arbitration between stakeholders,
  2. Executable,
  3. Enabling the necessary abstraction to permit sane maintenance of the executable tests.

Now, one thing that I immediately inferred from this analysis is that if your product owners are or were also competent programmers, your “DSL” might simply be a well-abstracted OO representation of your application in a vanilla programming language — such as the one that you are also using to write your application. That is, if requirement 1 is not so important, and all you need is 2 and 3, you don’t need a custom simplified syntax. If the PO is comfortable reading or writing tests in code, or if they are good at specifying automatable acceptance tests that could be described in such a way, then I don’t personally see the need to pull out the parser guns. You’ll still have to design your “language” by carefully considering what should go into the API of this application “robot”, but at least you won’t have to worry so much about making it read like English.

Also, note that this representation doesn’t describe at all what “seams” of the application are being used by the DSL implementation to drive its execution. That DSL snippet up there could be manipulating the application UI to produce the desired effects, or it could instead be talking to web services at the presentation layer that the UI also talks to in order to elicit the desired state changes in the app. This was sort of a point of contention at the workshop: I was really astonished to see how many people actually advocated the idea of testing through the UI. I think that only two participants put up fierce objections to this. What the opposition lacked in numbers, however, it made up for in volume and eloquence.
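One way to picture the “seam” question is as two interchangeable drivers behind the same test step. The sketch below is only an illustration under made-up assumptions: `FakeApp`, `ServiceDriver` and `UiDriver` are hypothetical, the “UI” is reduced to a list of recorded events, and no real UI automation or web service library is involved.

```python
from abc import ABC, abstractmethod


class FakeApp:
    """Hypothetical stand-in for the application, reachable through two seams."""

    def __init__(self, domains):
        self._domains = domains
        self._pending = ""
        self.ui_events = []

    # Seam 1: a service call, as the presentation layer might expose it.
    def discover_service(self, domain):
        return list(self._domains.get(domain, []))

    # Seam 2: UI widgets, reduced here to recorded events.
    def type_into(self, field_name, text):
        self.ui_events.append(("type", field_name, text))
        self._pending = text

    def click(self, button):
        self.ui_events.append(("click", button))
        return self.discover_service(self._pending)


class Driver(ABC):
    @abstractmethod
    def discover_machines(self, domain):
        """Return the names of the machines found in `domain`."""


class ServiceDriver(Driver):
    def __init__(self, app):
        self.app = app

    def discover_machines(self, domain):
        return self.app.discover_service(domain)


class UiDriver(Driver):
    def __init__(self, app):
        self.app = app

    def discover_machines(self, domain):
        self.app.type_into("domain", domain)
        return self.app.click("Discover")


def check_domain_size(driver, domain, expected):
    # The test step does not know, or care, which seam the driver uses.
    return len(driver.discover_machines(domain)) == expected


app = FakeApp({"mydomain": ["a", "b", "c", "d", "e"]})
results = [check_domain_size(d, "mydomain", 5)
           for d in (ServiceDriver(app), UiDriver(app))]
```

Both drivers satisfy the same step, so the choice of seam can be argued about (loudly, if need be) without touching the tests themselves.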

So what do the tools do?

It’s important to remember that the name of the group sponsoring this workshop strongly implies that it is focused on producing tools to enable AFT. So, if we claim that the current “holy grail” of AFT is to make every application under test “scriptable” through some DSL that we have to design ourselves, what the heck are the drivers and frameworks going to do for us?

The basic answer to that question, in my opinion, is “the easy and boring stuff” — which is fine by me. If it’s easy and boring, I’d rather that someone else do it for me.

So what are these easy and boring things?

  • Providing different execution “modes” for tests in your DSL, much like a debugger would. Examples of such modes include “normal speed” (you’re watching the test execute in front of you), “hyperspeed” (the test is running on a regression build), and “interruptible” (runs at hyperspeed until a user-specified “breakpoint” is hit so that you can quickly get to a point where you want to do some exploratory testing).
  • Implementing general mechanisms for test fixture setup/teardown (e.g. installing the app, seeding the database, tearing down the database between each fixture run).
  • Associating certain tests with “database fixtures” — cooked datasets that you want installed right before the test run. (Managing and describing these data fixtures is a hard problem, and the topic has come up on the AA-FTT mailing list recently.)
  • Tagging/grouping/labeling/organizing tests into different, possibly intersecting groups so that you can execute subsets of the tests as desired.
  • Producing reports about the test runs.
  • Given that you have implemented the atoms of your DSL, allowing you to write higher-level constructs in the DSL strictly in terms of these atoms without having to write any actual glue code.

That last one there is the most important, because it identifies what the framework won’t do for you. Your job, as the “test infrastructure” developer, is to decide upon the atoms in your DSL and to actually write glue code that will be invoked when the atoms in your DSL are evaluated.

For instance, let’s assume our datacenter monitoring application has exposed some of its core functionality as web services over SOAP, and that we’ve decided that one of the atoms in our DSL is going to be “discover a machine by IP address”. Further assume that there’s a web service in our application that does exactly this. We’d have to write some sort of code that we’d associate with the atom in question that would query this web service when the atom was evaluated, extract the salient results from the response, and pass those results back to the executing script. I tend to think of these atoms as the “terminals” in your DSL, although this is probably a bad vocab decision because now we’re starting to tie the idea of the DSL to a CFG. But there it is.
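As a rough sketch of what such glue code might look like, assuming a simple phrase-to-function registry: the `atom` decorator, `evaluate`, and `FakeDiscoveryService` below are all hypothetical, and the fake service returns a canned dictionary rather than making a real SOAP call.

```python
ATOMS = {}


def atom(phrase):
    """Register a glue function under the DSL phrase that triggers it."""
    def register(func):
        ATOMS[phrase] = func
        return func
    return register


class FakeDiscoveryService:
    """Hypothetical stand-in for the web service; no network involved."""

    def discover(self, ip_address):
        return {
            "status": "ok",
            "machine": {
                "ip": ip_address,
                "hostname": "host-" + ip_address.split(".")[-1],
            },
            "trace_id": "abc123",  # noise the test script should never see
        }


SERVICE = FakeDiscoveryService()


@atom("discover a machine by IP address")
def discover_machine_by_ip(ip_address):
    # Glue code: query the service, check the response, extract the salient bit.
    response = SERVICE.discover(ip_address)
    if response["status"] != "ok":
        raise AssertionError(f"discovery failed for {ip_address}")
    return response["machine"]["hostname"]


def evaluate(phrase, *args):
    """What the framework does when it meets an atom in a script."""
    return ATOMS[phrase](*args)


hostname = evaluate("discover a machine by IP address", "10.0.0.7")
```

In a real setup the framework would own the registry and the lookup; only the body of `discover_machine_by_ip` would be yours to write.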

Once you’ve got a few of these atoms in place, though, you can start describing higher-level functions (the “nonterminals”) in terms of them without writing any glue code at all. For instance, given the existence of a “supply credentials” atom and a “discover machine” atom, we can write a function in terms of these atoms called “discover secure machine with credentials” that will compose these two operations. No glue code required!
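Here is a minimal sketch of that composition, assuming atoms share a context dictionary and that higher-level functions are declared as plain data. The names `ATOMS`, `COMPOSITES` and `run` are hypothetical, and the two atoms are fakes that only record what they were given.

```python
def supply_credentials(ctx, user, password):
    ctx["credentials"] = (user, password)


def discover_machine(ctx, ip_address):
    # The fake treats every machine as secure: no credentials, no discovery.
    if ctx.get("credentials") is None:
        raise PermissionError("credentials required")
    ctx.setdefault("discovered", []).append(ip_address)


# Atoms: each phrase is backed by hand-written glue code.
ATOMS = {
    "supply credentials": supply_credentials,
    "discover machine": discover_machine,
}

# Higher-level functions: just data naming the steps and the arguments each takes.
COMPOSITES = {
    "discover secure machine with credentials": [
        ("supply credentials", ["user", "password"]),
        ("discover machine", ["ip_address"]),
    ],
}


def run(phrase, ctx, **args):
    if phrase in ATOMS:
        return ATOMS[phrase](ctx, **args)
    for step, arg_names in COMPOSITES[phrase]:
        run(step, ctx, **{name: args[name] for name in arg_names})


ctx = {}
run("discover secure machine with credentials", ctx,
    user="admin", password="s3cret", ip_address="10.0.0.7")
```

Adding another higher-level function means adding another entry to `COMPOSITES`; no new Python function is needed.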

Lots of frameworks like this already exist, and most of them seem to be centered on through-the-browser web testing. I expect that this is because UI manipulation and data extraction is almost always possible with HTML-based interfaces, even if it is unpleasant. However, if you’re trying to do the same thing with a Windows Forms app that uses two or three third-party UI “enhancement” libraries that are completely opaque to any kind of inspection or driver tool, this isn’t nearly as easy. There are other frameworks, like Robot Framework, that don’t really care what driver you’re using, as long as you can talk to it through Python or Java. Some folks from the workshop (myself included) are getting together to compile an annotated compendium of these sorts of tools, so if you’re interested in this sort of thing I’ll be sharing a link soon enough.

Bits and pieces

My brain suddenly feels much less tense, and I am experiencing a sense of mild euphoria. This suggests to me that I’ve managed to expunge most of the information overload which I accrued over the past week or so. So, I think I’ve said most of what I needed to say here. There are still some bits and pieces of things that I’d like to mention, as well as one medium-sized idea that I’d like to work out in writing, so there will probably be a third part to this series that is loosely related to what I’ve just discussed.

As I mentioned in the first article, I’m sure I’ve misunderstood and misquoted all sorts of people and ideas, so if you have comments or criticism, I’d love to hear them. I suspect if these two redux posts were cleaned up (a lot) with the help of all those people out there who are way smarter than me, they might just turn into the sort of “introductory” resource that I was hunting for when I first started my “functional testing quest” for information a little less than two weeks ago.

AA-FTT 2008 workshop redux, part 1

It has been quite a while since I started practicing test-driven development as a rule rather than as an exception. It was certainly the agile practice that was easiest and most pleasant for me to adopt, and it aligns nicely with the musings on quality in the grander scheme of things that I’ve been having for a long time.

However, only recently have I become very interested in testing with coarser granularity. As such, I’ve been chewing over the concepts of integration testing/acceptance testing/functional testing, or whatever else you’d like to call it. The resources in this area are much spottier than those for plain old TDD, and so I’ve been mucking about on mailing lists and blogs and such trying to grok the common opinions on best practices in these fields.

Luckily, the Agile Alliance Functional Testing Tools group held a free workshop as part of Agile2008 that is happening right here in Toronto this week, and I found out about it and was able to register about one hour before the advertised deadline. The workshop ran from 8:30am to about 6pm today, and I met so many incredibly creative and well-spoken people and had my brain rammed so full of ideas that I came home and burned through about five pages of braindumpage in addition to the eight pages of notes that I took at the workshop itself.

My goal in attending this workshop was to get a rough idea of what the “state of the art” of functional testing is, and where it is headed. This article, along with its planned sequels, is an attempt to summarize what I was able to learn about this today. I’ve almost certainly made all sorts of errors in understanding or recording these ideas, so hopefully people will read this and lambaste me for it so that I can learn for free, which is my favorite pastime.

The modern values of testing

In one of the lightning talks this morning, the presenter laid out his idea of the modern tenets or ‘values’ of developing functional tests. While this probably felt like preaching to the choir for many of the attendees, it certainly serves as an excellent starting point for this article, because these tenets guide the methodologies that we are trying to enable by building testing tools. These tenets, in no particular order, are:

  • Use functional tests to specify/encode your requirements.
  • Continuously test the software product as it is built.
  • The entire team takes ownership of testing the product.

At first blush, this sounds a lot like what a TDD zealot would mutter over and over to herself in her sleep, minus the “functional” part. However, it is important to note that…

Functional testing is not the same as TDD

Simply put, even if you are writing your functional or acceptance tests in code before you write the code to make those tests pass, FTDD (or any sort of functional testing in an agile context — let’s call it AFT, shall we?) is a different beast than TDD in many different ways. I think that many of these differences were stated as reasons for why AFT is considered by many to have failed, especially in comparison to TDD.

The first obvious difference between TDD and AFT is that it’s much easier to learn about TDD. The literature is more plentiful and there is less conflict and contradiction between sources in the field.

Another difference stated today is that AFT does not produce the pleasant “cadence” that TDD gives you, which is a large part of what makes TDD so addictive. TDD is “test a little, code a little, see the green bar, feel warm and fuzzy, refactor a bit, and do it again.” AFT requires much more up-front effort to write the test, and you won’t see the result of the test execution for quite a while after you’ve written it. If we presuppose that humans aren’t especially good with delayed gratification, then this could definitely be a source of friction.

The last difference that I’d like to enumerate here is that the value of a functional test is much more “dispersed” than that of a developer or unit test. In TDD, it is generally true that the developer who writes a unit test will directly receive his ROI in a very short time, right about at the moment when he sees that green bar. With a functional test, the person who spends the effort to write the test may not reap most of its value, whether “psychological” or real. For example, if I am a dedicated tester and I spend an afternoon writing a functional test that is then used by a developer as a measure of acceptance, I get little of the feeling of satisfaction of being finished. If this same test later catches a bug during a nightly regression testing run that might otherwise have taken a couple of days to detect and unearth, it is the developer’s time that has been saved, not mine. Sure, it’s all about the net value of the test to the team, but that can be a hard sell to our hormones sometimes.

This last point is the most important one, I think, because it makes it very clear that functional tests are a shared artifact across multiple stakeholder groups (product owner, developer team, dedicated testing team if you have one) much more so than developer test suites are. I think that this explains why the current state of the art of functional testing has taken the form that it has.

However, this article is already getting longish, so I will defer an actual description of what I understood the state of the art to be to another article, which I intend to write tomorrow.