Referrals
FreeAgent Small Business Online Accounting
I can honestly recommend Freeagent for painless small business bookkeeping. I can also honestly say that I get a referral bonus every time someone signs up using this link

Entries in c++ (6)

Friday
May272011

Unit Testing in C++ and Objective-C just got ridiculously easier still

Spider web in morning sun

'Spider Web in Morning Sun' by Rob van Hilten

In my previous post I introduced Catch - my unit testing framework for C++ and Objective-C.

The response was overwhelming. Thanks to all who commented, offered support - and even contributed to the code with fixes and features.

It certainly gave me the motivation to continue active development and a lot has changed since that post. I'm going to cover some highlights, but first I want to focus on what has been one of the most distinguishing features of Catch that has attracted so much attention - and how I have not rested but made that even better!

How easy is easy enough?

Back in April I gave a five minute lightning talk on Catch at the ACCU conference in Oxford (I highly recommend the conference). With just five minutes to talk about what makes Catch special what was I going to cover? The natural operator-based comparison syntax? The use of Sections instead of class-based fixtures? Data generators?

Well I did touch on the first point. But I decided to use the short amount of time to drive home just how quickly and easily you can get up and running with Catch. So after a 30 second intro I went to the GitHub page for Catch (now aliased as catch-test.net), downloaded the zip of the source (over a 3G connection), unzipped and copied to a central location, fired up XCode, started a fresh C++ project, added the path to Catch's headers, #include'd "catch_with_main.hpp", wrote an anonymous test case, compiled and ran it, demonstrated how it caught a bug, fixed the bug and finally recompiled and re-ran to see the bug go away.

Phew! Not bad for five minutes, I thought. And from the feedback I got afterwards it really did drive the point home.

Compare that with my first experience of using Google Test. It took me over an hour to get it downloaded and building in XCode (the XCode projects don't seem to have been maintained recently - so perhaps that is a little unfair). There are other frameworks that I've tried where I have just run out of patience and never got them going.

Of course I'm biased. But I have had several people tell me that they tried Catch and found it to be the easiest C++ Unit Test framework they have used.

But still I wasn't completely satisfied with the initial experience and ease of incorporating Catch into your own projects.

In particular, if you maintain your own open source project and want to bundle it with a set of unit tests (and why wouldn't you?) then it starts to get fiddly. Do you list Catch as an external dependency that the user must install on their own? (no matter how easy they are to install external dependencies are one or my least favourite things). Do you include all the source to Catch directly in your project tree? That can get awkward to maintain and makes it look like your project is much bigger than it is. If you host your project on GitHub too (or some other Git based repository) you could include Catch as a submodule. That's still not ideal, has some of the problems of the first two options, and is not possible for everyone.

There can be only one

Since Catch, as a library, is fully header-only I decided provided a single header version that is ideal for direction inclusion in third-party projects.

How did I do this?

Go on guess.

Did you guess that I wrote a simple Python script to partially preprocess the headers so that the #includes within the library are expanded out (just once, of course), leaving the rest untouched?

If you did you're not far off. Fortunately some of the conventions I have used within the source meant I could drastically simplify the script. It doesn't need to be a full C preprocessor. It only needs to understand #include and #ifndef/#endif for include guards. Even those are simplified. The whole script is just 42 lines of code. 42 always seems to be the answer.

The result is https://github.com/philsquared/Catch/blob/master/single_include/catch.hpp

I see no reason why this should not be the default way to use Catch - unless you are developing Catch itself. So I'm now providing this file as a separate download from within GitHub. Think of it as the "compiled" header. The lib file of the header-only world.

Licence To Catch

But Open Source is a quagmire of licensing issues, isn't it?

Well it certainly can be. Those familiar with GPL and similar open source licences may be very wary of embedding one open source library (Catch) within another (their own).

IANAL but my understanding is that, contrary to what might seem intuitive, source code with no license at all can be more dangerous, legally speaking, than if it does have one (and if you thought that sentence was difficult to parse you should try reading a software license).

So Catch is licensed. I've used the Boost license. For a number of reasons:

  • It is very permissive. In particular it is not viral. It explicitly allows the case of including the source of Catch along with the distribution of your own source code with no requirements on your own code
  • It's been around for a while now - long enough, I think, that most people are comfortable with it. I work with banks, who can be very nervous about software licensing issues - especially open source. But every one I have worked at has already got Boost through it's compliance process. I'm hoping that will ease any barriers to adoption.
  • I'm familiar with Boost, know many of it's contributors personally, and generally trust the spirit of the licence. Boost itself is a very well known and highly respected set of libraries - with very widespread adoption. A large part of Boost is in header-only libraries and people are already comfortable including them in their own projects.

So what's the Catch? The catch is that I retain the right to keep using that joke - well beyond it's humorous lifetime.

The important bit:

In short: any open source author who wants to use Catch to write unit tests for their own projects should feel very free to do so and to include the single-header (or full) version of the library in their own repository and along with their distribution.

That fully applies to commercial projects too, of course.

What else?

Here's a quick run down of some of the other changes and features that have gone in:
  • Single evaluation of test expressions. The original implementation evaluated the expression being tested twice - once to get the result, and then again to get the component values. There were some obstacles to getting this to work whilst only evaluating the expression once. But we got there in the end. This is critical if you want to write test expressions that have side-effects.
  • Anonymous test cases. A little thing, but I find them really handy when starting a new project or component and I'm just exploring the space. The idea is that you don't need to think of a name and description for your test - you can just dive straight in and write code. If you end up with something more like a test case it's trivial to go back and name it.
  • Generators. These are in but not fully tested yet. Consider them experimental - but they are very cool and very powerful.
  • Custom exception handlers. (C++) Supply handlers for your own exception types - even those that don't derive from std::exception, so you can report as much detail as you like when an exception is caught within Catch. I'm especially pleased this went in - given the name of the library!
  • Low build time overhead. I've been aggressive at keeping the compile-time footprint to a minimum. This is one of the concerns when using header only libraries - especially those with a lot of C++ templates. Catch uses a fair bit of templates, but nothing too deeply recursive. I've also organised the code so that as much as the implementation as possible is included in only one translation unit (the one with main() or the test runner). I think you'll be pushed to notice any build-time overhead due to Catch.
  • Many fixes, refactorings and minor improvements. What project doesn't have them? This is where a lot of the effort - possibly the majority - has gone, though. I've wanted to keep the code clean, well factored, and the overhead low. I've also wanted it to be possible to compile at high warning levels without any noise from Catch. This has been challenging at times - especially after the Single Evaluation work. If you see any Catch-related warnings please let me know.

Are we there yet?

As well as my own projects I've been using Catch on a large scale project for a bank. I believe it is already more than just a viable alternative to other frameworks.

Of course it will continue to be refined. There are still bugs being found and fixed.

But there are also more features to be added! I need to finish the work on generators. I'd like to add the tagging system I've mentioned before. I need to look at Matchers. Whether Catch provides its own, or whether I just provide the hooks for a third-party library to be integrated, I think Matchers are an important aspect to unit testing.

I also have a stub project for an iPhone test runner - for testing code on an iOS device. Several people have expressed an interest in this so that is also on my list.

And, yes, I will fill out the documentation!

Tuesday
Dec282010

Unit Testing in C++ and Objective-C just got easier

Day 133-365 : Catching the bokeh.jpg

Back in May I hinted that I was working on a unit testing framework for C++. Since then I've incorporated the technique that Kevlin Henney proposed and a whole lot more. I think it's about time I introduced it to the world:

Introducing CATCH

CATCH is a brand new unit testing framework for C, C++ and Objective-C. It stands for 'C++ Adaptive Test Cases in Headers', although that shouldn't downplay the Objective-C bindings. In fact my initial motivation for starting it was dissatisfaction with OCUnit.

Why do we need another Unit Testing framework for C++ or Objective-C?

There are plenty of unit test frameworks for C++. Not so many for Objective-C - which primarily has OCUnit (although you could also coerce a C or C++ framework to do the job).

They all have their strengths and weaknesses. But most suffer from one or more of the following problems:

  • Most take their cues from JUnit, which is unfortunate as JUnit is very much a product of Java. The idiom-mismatch in C++ is, I believe, one of the reasons for the slow uptake of unit testing and TDD in C++.
  • Most require you to build libraries. This can be a turn off to anyone who wants to get up and running quickly - especially if you just want to try something out. This is especially true of exploratory TDD coding.
  • There is typically a certain amount of ceremony or boilerplate involved. Ironically the frameworks that try to be faithful to C++ idioms are often the worst culprits. Eschewing macros for the sake of purity is a great and noble goal - in application development. For a DSL for testing application code, especially since preprocessor information (e.g. file and line number) are required anyway) the extra verbosity seems too high a price to pay to me.
  • Some pull in external dependencies
  • Some involve a code generation step

The list goes on, but these are the criteria that really had me disappointed in what was out there, and I'm not the only one. But can these be overcome? Can we do even better if we start again without being shackled to the ghost of JUnit?

What's the CATCH?

You may well ask!

Well, to start, here's my three step process for getting up and running with CATCH:

  1. Download the headers from github into subfolder of your project
  2. #include "catch.hpp"
  3. There is no step 3!

Ok, you might need to actually write some tests as well. Let's have a look at how you might do that:

[Update: Since my original post I have made some small, interface breaking, changes - for example the name of the header included below. I have updated this post to reflect these changes - in case you were wondering]

#include "catch_with_main.hpp"

TEST_CASE( "stupid/1=2", "Prove that one equals 2" )
{
    int one = 2;
    REQUIRE( one == 2 );
}

Short and to the point, but this snippet already shows a lot of what's different about CATCH:

  • The assertion macro is REQUIRE( expression ), rather than the, now traditional, REQUIRE_EQUALS( lhs, rhs ), or similar. Don't worry - lhs and rhs are captured anyway - more on this later.
  • The test case is in the form of a free function. We could have made it a method, but we don't need to
  • We didn't name the function. We named the test case. This frees us from couching our names in legal C++ identifiers. We also provide a longer form description that serves as an active comment
  • Note, too, that the name is hierarchical (as would be more obvious with more test cases). The convention is, as you might expect, "root/branch1/branch2/.../leaf". This allows us to easily group test cases without having to explicitly create suites (although this can be done too).
  • There is no test context being passed in here (although it could have been hidden by the macro - it's not). This means that you can freely call helper functions that, themselves, contain REQUIRE() assertions, with no additional overhead. Even better - you can call into application code that calls back into test code. This is perfect for mocks and fakes.
  • We have not had to explicity register our test function anywhere. And by default, if no tests are specified on the command line, all (automatically registered) test cases are executed.
  • We even have a main() defined for us by virtue of #including "catch_with_main.hpp". If we just #include that in one dedicated cpp file we would #include "catch.hpp' in our test case files instead. We could also write our own main that drives things differently.

That's a lot of interesting stuff packed into just a few lines of test code. It's also got more wordy than I wanted. Let's take a bit more of a tour by example.

Information is power

Here's another contrived example:

TEST_CASE( "example/less than 7", "The number is less than 7" )
{
    int notThisOne = 7;

    for( int i=0; i < 7; ++i )
    {
        REQUIRE( notThisOne > i+1  );
    }
}

In this case the bug is in the test code - but that's just to make it self contained. Clearly the requirement will be broken for the last iteration of i. What information do we get when this test fails?

    notThisOne > i+1 failed for: 7 > 7

(We also get the file and line number, but they have been elided here for brevity). Note we get the original expression and the values of the lhs and rhs as they were at the point of failure. That's not bad, considering we wrote it as a complete expression. This is achieved through the magic of expression templates, which we won't go into the details of here (but feel free to look at the source - it's probably simpler than you think).

Most of the time this level of information is exactly what you need. However, to keep the use of expression templates to a minimum we only decompose the lhs and rhs. We don't decompose the value of i in this expression, for example. There may also be other relevant values that are not captured as part of the test expression.

In these cases it can be useful to log additional information. But then you only want to see that information in the event of a test failure. For this purpose we have the INFO() macro. Let's see how that would improve things:

TEST_CASE( "example/less than 7", "The number is less than 7" )
{
    int notThisOne = 7;

    for( int i=0; i < 7; ++i )
    {
        INFO( "i=" << i );
        REQUIRE( notThisOne > i+1  );
    }
}

This gives us:

    info: 'i=6'
    notThisOne > i+1 failed for: 7 > 7

But if we fix the test, say by making the for loop go to i < 6, we now see no output for this test case (although we can, optionally, see the output of successful tests too).

A SECTION on specifications

There are different approaches to unit testing that influence the way the tests are written. Each approach requires a subtle shift in features, terminology and emphasis. One approach is often associated with Behaviour Driven Development (BDD). This aims to present test code in a language neutral form - encouraging a style that reads more like a specification for the code under test.

While CATCH is not a dedicated BDD framework it offers a several features that make it attractive from a BDD perspective:

  • The hiding of function and method names, writing test names and descriptions in natural language
  • The automatic test registration and default main implementation eliminate boilerplate code that would otherwise be noise
  • Test data generators can be written in a language neutral way (not fully implemented at time of writing)
  • Test cases can be divided and subdivided into SECTIONs, which also take natural language names and descriptions.

We'll look at the test data generators another time. For now we'll look at the SECTION macro.

Here's an example (from the unit tests for CATCH itself):

TEST_CASE( "succeeding/Misc/Sections/nested", "nested SECTION tests" )
{
    int a = 1;
    int b = 2;
    
    SECTION( "s1", "doesn't equal" )
    {
        REQUIRE( a != b );
        REQUIRE( b != a );

        SECTION( "s2", "not equal" )
        {
            REQUIRE_FALSE( a == b);
        }
    }
}

Again, this is not a great example and it doesn't really show the BDD aspects. The important point here is that you can divide your test case up in a way that mirrors how you might divide a specification document up into sections with different headings. From a BDD point of view your SECTION descriptions would probably be your "should" statements.

There is more planned in this area. For example I'm considering offering a GIVEN() macro for defining instances of test data, which can then be logged.

In Kevlin Henney's LHR framework, mentioned in the opening link, he used SPECIFICATION where I have used TEST_CASE, and PROPOSITION for my top level SECTIONs. His equivalent of my nested SECTIONs are (or were) called DIVIDERs. All of the CATCH macro names are actually aliases for internal names and are defined in one file (catch.hpp). If it aids utility for BDD or other purposes, the names can be aliased differently simply by creating a new mapping file and using that.

CATCH up

There is much more to cover but I wanted to keep this short. I'll follow up with more. For now here's a (yet another) list of some of the key features I haven't already covered:

  • Entirely in headers
  • No external dependencies
  • Even test fixture classes and methods are self registering
  • Full Objective-C bindings
  • Failures (optionally) break into the interactive debugger, if available
  • Floating point tolerances supported in an easy to use way
  • Several reporter classes included - including a JUnit compatible xml reporter. More can be supplied

Are there any features that you feel are missing from other frameworks that you'd like to see in CATCH? Let me know - it's not too late. There are some limiting design goals - but within those there are lots of possibilities!

Friday
May212010

The Ultimate C++ Unit Test Framework

Last night I saw Kevlin Henney's ACCU London presentation on Rethinking Unit Testing In C++. I had been looking forward to this talk for a while as I had started working on my own C++ unit test framework. I had not been satisfied with any of the other frameworks I found, so decided to write my own. I had a few guiding principles that I felt I had come through on:

  1. I wanted to capture more information than usual. I felt I could capture the expression under test, as written, as well as the important values (that is the values on the LHS and RHS of binary expressions, or just the value for unary expressions).
  2. I wanted the test expressions to be natural C++ syntax. That is I wanted comparisons to use operators such as ==, instead of a macro like ASSERT_EQUALS.
  3. I wanted automatic test registration and descriptive test names. Tests should be implementable as functions or methods.

I called my framework YACUTS (Yet Another C++ Unit Test System) and a typical test looks something like:

YACUTS_FUNCTION( testThatSomethingDoesSomethingElse )
{
	MyClass myObj;
	myObj.setup1();

	ASSERT_THAT( myObj.someValue() ) == CAP( 7 );
	ASSERT_THAT( myObj.someOtherValue() ) == CAP( myObj.someValue() + 3 );
}

If someValue() returned 7 and someOtherValue() returned 11 I'd get a result like:

testThatSomethingDoesSomethingElse failed in expression myObj.someOtherValue() == ( myObj.someValue() + 3 ).
myObj.someOtherValue() = 11, but ( myObj.someValue() + 3 ) = 10

Which I thought was pretty good. I didn't really like the way the expression had to be broken up between the two macros, but thought it a reasonable price to pay for such an unprecedented level of expressiveness. I did think about whether expression templates could help - but didn't see a way around it.

So I sat up straight when Kevlin showed how he'd achieved the same goals with something like the following:

SPECIFICATION( something_that_does_something )
{
	MyClass myObj;
	myObj.setup1();

	PROPOSITION( "values are 7 and 7+3" )
	{
		IS_TRUE( myObj.someValue() == 7 );
		IS_TRUE( myObj.someOtherValue() == myObj.someValue() + 3 );
	}
}

What I'm focusing on here is how he pulled off his IS_TRUE macro. You pass it a complete expression and it decomposes it such that you get the values of LHS and RHS, the original expression as a string, and the evaluated result - but without any additional syntax!

The details of how he achieved this are too much to go into here - and I don't remember them sufficiently to do a good job anyway. But the core trick he used to be able to "grab the first value" as he put it (and from there it's just a case of overloading operators to get those and the RHS) was to create a capturing mechanism involving the ->* operator. The reason this is significant is twofold: (1) ->* happens to have the highest precedence of the (overloadable) operators and (2) nobody else uses it (well, almost). As a result it can introduce an expression that captures everything up to the next operator (at the same level).

There is more going on here too, which is interesting. Kevlin spent most of his talk building up to the idea of "test cases" being "propositions" in a "specification". The end result is something that grammatically encourages a more declarative, specification driven, flow of assertions. His mechanism also allows him to use strings as test names (or proposition names) and to declare specification scoped variables without the need to setup a class (a specification is really a function). As well as the slight shift in emphasis it also drops a small amount of ceremony, and so is a welcome technique.

Interestingly, although I hadn't fully implemented it, I had experimented with using strings as test names too, even though my unit of test case is still the function. My mechanism was to completely generate and hide the function name, but use the string to pass to the auto registration function. However to reuse state between tests I still had to declare a class (although tests could be functions or methods), so Kevlin's approach is still an improvement here.

What interested me most was perhaps not what Kevlin did that I couldn't (although that is very interesting). But rather how remarkably similar the rest of the code looked to mine! I know that Sam Saariste has also worked on similar ideas - and Jon Jagger was having some thoughts in the same direction (although not as far I think). It seems we were all converging on not just the same goals but, to a large extent, the same implementation! Given that we were already off the beaten track that reassures me that there is a naturalness to this progression that transcends our own efforts.

Having said all that I think I prefer my name, "Yacuts" over Kevlin's LHR (for London Heathrow, the environs of which most of his work was conceived) :-)

UPDATE:
I have since fleshed out my framework - now called CATCH - and posted an entry on it.

Wednesday
Nov112009

Code formatting in C++ Part Three

In this article I am going to present my recommendation for a C++ code formatting style (although it applies to most free-formatted languages, especially those that are C/C++ like).

I have covered the background to most of my choices in some detail (some would say too much - but I invoke Blaise Pascal here) in the first two articles of this series, the rather consistently named:

Code formatting in C++ Part One
Code formatting in C++ Part Two
Since the style I am about to present is a little unusual in places, and arbitrary in others, I encourage you to take a look a the previous articles if you have not already done so. Also, where arbitrary looking numbers are used, follow the spirit of the rule rather than the letter (or, in this case, number).

Page width

The proposals below refer often to page width, and by this I mean the number of characters that you would normally expect to be visible while reading and writing code in an editor. For example, it used to be common to keep within 80 characters (or less) due to text mode screen sizes. These days windows can easily be sized to much greater character widths, but I would still recommend adopting a page width of between 80-100 characters. It is not a hard limit, although it is more important in some areas than others. Personally I still try to stick to 80.

Proposal 1: Formatting variable declaration blocks

char*          txt = "hello";
int            i = 7;
std::string    txt2 = "world";
std::vector<std::string>            v;
std::map<std::string, std::string>  m;

Variable declarations should be grouped together where possible (without violating the principle of locality - ie, keeping them close to first use) in "islands" of no more than 16 lines at a time. If there are more than 16 variable declarations in a group then use single lines of whitespace to break them up. Try to keep variables of similar length in the same block.

Within each block, align the variable names as much as possible. Where there is a large variation in type name length, sub-group longer names and shorter names together and align variable names in sub-group blocks instead (note how the vector and map, above, are separated out this way)

This proposal applies both in function body code and within class declarations (and at global scope, if you must).

Proposal 2: Formatting function signatures

Function signatures come in two forms, and we make a distinction here. The first form is the prototype, usually found in a header file, if at all. The second form is part of the definition and is followed by the function body (if applying this to a language without the separate prototype stage, the first form does not exist, of course). We shall start with that:

Function definition signatures

////////////////////////////////////////////////////////////////////////////////
void ClassName::MethodName
(
	char*          txt = "hello",
	int            i = 7,
	std::string    txt2 = "world",
	std::vector<std::string>            v,
	std::map<std::string, std::string>  m
)
const
{
	// ... method body
}

The example here is for a method of a class, but the formatting would be the same for a free function

The return value and function or method name (along with any modifier prefixes - e.g. static, or namespace prefixes) appear on their own line.
Next are the parentheses - both of which appear on their own line - indented to the same level as the preceding line.
Within the parentheses, each on their own lines, are the arguments - formatted according to Proposal 1. Any post-fix modifiers (just const, in this case) appear on their own line, followed by the function or method body.

This is almost certainly the most controversial proposal and I will take up my additional rationale in the next article

If a comment block does not already precede the signature, use a line of forward slashes for about a page width (e.g. I run them up to the 80th column).

An additional point worth mentioning here is that this style lends itself well to being used with the Doxygen "inline comment" method of documenting function and method arguments.

Function prototype signatures

void MethodName
	(	char*          txt,
		int            i,
		std::string    txt2,
		std::vector<std::string>            v,
		std::map<std::string, std::string>  m ) const;

If a separate prototype is required there are some differences to the formatting. This might seem a little odd but I'll provide the rationale in the following article.

First, the parentheses appear in-line with the arguments block, rather than on their own lines. Furthermore the whole block itself is indented with respect to the function name. Finally, any post-fix modifiers (which may include the pure virtual marker here) appear on the same line as the closing parenthesis.

Note that no line of comment characters precedes the signature. Ideally functions and methods would be fully documented at the implementation site and the documentation extracted from comments using a tool such as Doxygen. There are reasons to consider keeping the prototypes clear of too many comments, but obviously you can put them here if you are sure that is best for you

Proposal 3: Function calls

If a function call fits within a normal page width then write it on one line. Long lines should be split across lines according to one of the following two examples:
LongMethodCall1( "some text",
                  aString,
                  anInt,
                  anotherArgument );
string returnVal = ReallyLongMethodNameCall
		( "some text",
		  aString,
		  anInt,
		  anotherArgument );

In both cases the arguments are aligned, one line each, with parentheses on the first and last lines. The first line should share the line with the function or method name itself, unless that would push the argument list across such that any of the arguments end up beyond the page width

General Principles

The proposals above are deliberately narrow in focus, concentrating on those areas that are often left out of standards, or not sufficiently described, and where using an ad-hoc approach is often less than satisfactory. However there are some simple emergent themes that can be carried through to other areas of code:

  • Types and identifier names are separated into columns through alignment
  • Code is kept within a page width where possible. This is especially significant for code that you need to refer to at a glance, such as function prototypes - and is often where it is most overlooked!
  • For "structural" code, such as function signatures, consistency is especially important, even in places where it seems unnecessary (e.g. splitting short function signatures across multiple lines - even empty constructors). For implementation code the choice of when to split can be guided by the page width.

I have made some recommendations in these proposals that are not specifically backed by the discussion in the previous articles. I will attempt to cover these in the next, and final, article in this series.

Monday
Oct052009

Code formatting in C++ Part Two

In the mid nineties I worked at Dr. Solomon's on their Anti-Virus toolkit. I spent some time in the virus labs working with live viruses (which I am told is the correct pluralisation). In those days viruses were mostly DOS based and attached themselves either to an exe image or a disk boot sector. Dr. Solomon's had their own scripting language for describing how a particular virus was identified, and then how it should be removed. This was significant as viruses already had plenty of sophistication with encryption and poly-morphism (each infection looked different). So the guys in the lab would write code in this scripting language on a regular basis.

One thing I noticed as I did my stint in the labs was that the guys who worked there all the time didn't use any indentation. While the script was not really procedural, it did have sections and block scopes, and yet these were never being highlighted in the textual layout of the code. There was nothing in the language that prevented this, so when I wrote my own scripts I indented as I thought best and happily showed my code to the head of the lab. His appraisal?:

We don't use indentation here

I was shocked! Why would you deliberately hide the structure of the code when there was virtually no overhead in bringing it out?
Of course I knew that different people have different ideas about code formatting, but I hadn't come across such an extreme case before.

As my career progressed I learned more and more that the subject of code formatting was very delicate. Developers may grudgingly adopt a "house style" for the sake of consistency (or, increasingly commonly, just adopt the style of the source file they are editing at the time). But ask them to change what they think is best and you'll be lucky to walk away with all your teeth!

Despite this I did pay attention to my own formatting style. Rather than stick to what I'd always done, if I saw a new style I questioned myself on whether there was anything about it that gave it an advantage. If so I adopted the style. For example, when I started out I used the common style of placing spaces on the outside of parentheses, like so:

if (condition==expected)
{
    doSomething (argument);
}

Note the space before the opening (.

Then I saw someone who put the spaces on the inside:

if( condition==expected )
{
    doSomething( argument );
}

This looked really strange and I wondered why he was so keen to depart from the norm. But after a while I realised that, for me at least, I find the second version easier to read. The difference was only slight, of course (or so I thought), but I found that if I was looking at a screenful of code and needed to home in on the interesting bits, having the spaces on the inside of the parentheses helped those parts of the code to come out of the screen at me. Logically, the parentheses belong to the function or keyword preceding it, whereas the arguments or expressions passed in where external and varied independently - so the use of whitespace captured that relationship.
At least that's how I see it.

After a few more years I began to think about whether any sort of objective metrics could be extracted on what aspects of code formatting style enhanced readability - independently of an individual's "preferred" (ie, existing, often ground in) style.

If I could find any such metrics or recommendations they would, I surmised, need to satisfy the following requirements:

They would derive from objective sources that are ideally not connected with software development
They wouldn't necessarily follow my own existing style (ok this isn't a requirement - but if it diverges from my own style it's a good hint)
Other people, picked at random and asked to give the style a try, would come to appreciate it - even if they objected at first

The first requirement is investigated in more detail in the first part of this series - the Speed Reading perspective.

In this article we'll look at the other two.

The best laid code of keyboards and men

Armed with the ideas I'd derived from Speed Reading I decided to tackle the issue of objectively good code formatting styles. This is not to say that it is perfect or that it as truly objective in an absolute sense. But I do believe it has some value. Not least for solving the problem of how to format function signatures consistently.

Before we look at the specifics, I'll address the second and third requirements from the previous section.

They wouldn't necessarily follow my own existing style

This is the case. Although I continue to prefer my spaces-inside-the-parentheses style, and this is compatible, and I'd already had a preference for alignment and columns, the realisation of my ideas took some of that further, as well as into unexpected directions that took some getting used to.

Other people, picked at random and asked to give the style a try, would come to appreciate it - even if they objected at first

As stated early on, the numbers may not be statistically significant, but I have asked a number of developers with difference backgrounds to give the style a fair try. An immediate problem here is that they may have given this style a fairer try than other contenders. A proper study would have introduced control styles too. Nonetheless I found the results illuminating.

Pretty much without exception (at time of writing) everyone who tried it followed the same pattern:

  1. Immediate reaction: "Ugh! That's horrible! Ok, I'll try it, but then I'm going straight back to my old style"
  2. Day 1: Much the same reaction, some regressions, but generally following the style fairly easily, despite personal feelings.
  3. Day 2: "Actually I'm starting to like it!"
  4. Day 2-3: "This is awesome, I'm going to use this style in all my code now"
  5. Day 3+: "I can't stop myself reformatting all my old code to this new style!"
  6. ...
  7. Year 3+: "meh"

We'll come back to the Year 3 effect later. Other than that the general progression is promising, to say the least. However it's by no means conclusive. In addition to the weaknesses already outlined it doesn't really tell us how effective it is (i.e. whether it has a net positive impact on productivity, beyond the initial "feel good" phase). For this I don't have any hard numbers. What I do have is my own feeling, and that of those that tried it, that code readability and navigability improved greatly.

By this point you're probably wondering if I'll ever get to describe the style itself at all. In that case you shall be pleased to know that the next thing I'll cover is just that.

In the next article. See you then!

Technorati Tags: , ,