Monday, October 5, 2009

Code formatting in C++ Part Two

In the mid-nineties I worked at Dr. Solomon's on their Anti-Virus toolkit. I spent some time in the virus labs working with live viruses (which I am told is the correct pluralisation). In those days viruses were mostly DOS-based and attached themselves either to an exe image or a disk boot sector. Dr. Solomon's had their own scripting language for describing how a particular virus was identified, and then how it should be removed. This was significant as viruses were already quite sophisticated, using encryption and polymorphism (each infection looked different). So the guys in the lab would write code in this scripting language on a regular basis.

One thing I noticed as I did my stint in the labs was that the guys who worked there all the time didn't use any indentation. While the script was not really procedural, it did have sections and block scopes, and yet these were never highlighted in the textual layout of the code. There was nothing in the language that prevented this, so when I wrote my own scripts I indented as I thought best and happily showed my code to the head of the lab. His appraisal?

"We don't use indentation here"

I was shocked! Why would you deliberately hide the structure of the code when there was virtually no overhead in bringing it out?
Of course I knew that different people have different ideas about code formatting, but I hadn't come across such an extreme case before.

As my career progressed I learned more and more that the subject of code formatting was very delicate. Developers may grudgingly adopt a "house style" for the sake of consistency (or, increasingly commonly, just adopt the style of the source file they are editing at the time). But ask them to change what they think is best and you'll be lucky to walk away with all your teeth!

Despite this I did pay attention to my own formatting style. Rather than stick to what I'd always done, if I saw a new style I questioned myself on whether there was anything about it that gave it an advantage. If so I adopted the style. For example, when I started out I used the common style of placing spaces on the outside of parentheses, like so:

if (condition==expected)
{
    doSomething (argument);
}

Note the space before the opening (.

Then I saw someone who put the spaces on the inside:

if( condition==expected )
{
    doSomething( argument );
}

This looked really strange and I wondered why he was so keen to depart from the norm. But after a while I realised that, for me at least, the second version was easier to read. The difference was only slight, of course (or so I thought), but I found that if I was looking at a screenful of code and needed to home in on the interesting bits, having the spaces on the inside of the parentheses helped those parts of the code to come out of the screen at me. Logically, the parentheses belong to the function or keyword preceding them, whereas the arguments or expressions passed in were external and varied independently - so the use of whitespace captured that relationship.
At least that's how I see it.
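To put the two styles side by side in something that compiles, here is a minimal sketch. The function names and their logic are invented purely for illustration, and it assumes a C++14 (or later) compiler so that the functions can be checked at compile time. The two functions differ only in whitespace.

```cpp
// Hypothetical example: names and logic are made up for illustration.
// Requires C++14 or later (constexpr functions containing if statements).

// Spaces on the outside of the parentheses.
constexpr int scoreOutside (int condition, int expected)
{
    if (condition==expected)
    {
        return 1;
    }
    return 0;
}

// Spaces on the inside of the parentheses.
constexpr int scoreInside( int condition, int expected )
{
    if( condition==expected )
    {
        return 1;
    }
    return 0;
}

// The compiler sees no difference: both styles give the same results.
static_assert( scoreOutside (4, 4) == scoreInside( 4, 4 ), "matching case should agree" );
static_assert( scoreOutside (4, 5) == scoreInside( 4, 5 ), "non-matching case should agree" );
```

The behaviour is identical, of course, so the choice between the two comes down purely to which one a human reader finds easier to scan.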

After a few more years I began to think about whether any sort of objective metrics could be extracted on what aspects of code formatting style enhanced readability - independently of an individual's "preferred" (i.e. existing, often ingrained) style.

If I could find any such metrics or recommendations they would, I surmised, need to satisfy the following requirements:

  1. They would derive from objective sources that are ideally not connected with software development
  2. They wouldn't necessarily follow my own existing style (ok this isn't a requirement - but if it diverges from my own style it's a good hint)
  3. Other people, picked at random and asked to give the style a try, would come to appreciate it - even if they objected at first

The first requirement is investigated in more detail in the first part of this series - the Speed Reading perspective.

In this article we'll look at the other two.

The best laid code of keyboards and men

Armed with the ideas I'd derived from Speed Reading I decided to tackle the issue of objectively good code formatting styles. This is not to say that the result is perfect or that it is truly objective in an absolute sense. But I do believe it has some value, not least for solving the problem of how to format function signatures consistently.

Before we look at the specifics, I'll address the second and third requirements from the previous section.

They wouldn't necessarily follow my own existing style

This is the case. I continue to prefer my spaces-inside-the-parentheses style, which is compatible, and I already had a preference for alignment and columns. But the realisation of my ideas took some of that further, and in unexpected directions that took some getting used to.

Other people, picked at random and asked to give the style a try, would come to appreciate it - even if they objected at first

As stated early on, the numbers may not be statistically significant, but I have asked a number of developers with different backgrounds to give the style a fair try. An immediate problem here is that they may have given this style a fairer try than other contenders. A proper study would have introduced control styles too. Nonetheless I found the results illuminating.

Pretty much without exception (at time of writing) everyone who tried it followed the same pattern:

  1. Immediate reaction: "Ugh! That's horrible! Ok, I'll try it, but then I'm going straight back to my old style"
  2. Day 1: Much the same reaction, some regressions, but generally following the style fairly easily, despite personal feelings.
  3. Day 2: "Actually I'm starting to like it!"
  4. Day 2-3: "This is awesome, I'm going to use this style in all my code now"
  5. Day 3+: "I can't stop myself reformatting all my old code to this new style!"
  6. ...
  7. Year 3+: "meh"

We'll come back to the Year 3 effect later. Other than that the general progression is promising, to say the least. However, it's by no means conclusive. In addition to the weaknesses already outlined, it doesn't really tell us how effective the style is (i.e. whether it has a net positive impact on productivity, beyond the initial "feel good" phase). For this I don't have any hard numbers. What I do have is my own feeling, and that of those who tried it, that code readability and navigability improved greatly.

By this point you're probably wondering if I'll ever get to describe the style itself at all. In that case you'll be pleased to know that the next thing I'll cover is just that.

In the next article. See you then!


Reader Comments (7)

I use spaces inside the brackets too.
I started doing it because it made it less likely I would miss a 'not' when looking at an if statement
if(!somevar)
if( !somevar ) //I find the ! stands out more on this line

October 6, 2009 | Unregistered Commenter Hamish

I use this style as well when doing my own programming exactly for the same readability reasons.

October 6, 2009 | Unregistered Commenter Giovanni Asproni

Thanks for the back-up, guys. The spaces-inside-the-parentheses comments were more of an aside, but I thought it worth bringing out in this context.

October 8, 2009 | Registered Commenter Phil Nash

Hi Phil,

Nice articles! I also use the source code's layout to emphasize regularities ( or the lack thereof ).

I learned the spaced style in the early nineties from Allen Holub's Compiler Design in C (2nd ed. 1990) and use it. It also reminds me of gestalt psychological effects.

March 21, 2010 | Unregistered Commenter Martin Moene

At first I thought you were going to make a separate point with this code at the start of the post:

if (condition=expected)

Even though it's legal C, you probably meant

if (condition==expected)

right?

I thought you were going to use that as a lead-in to suggesting the form:

if (5 == a), which the compiler will bark at if only a single = is used (i.e., if you type if (5 = a)), since you can't assign to the rvalue 5. I'm not a big fan of this syntax, however... I don't think I've made that mistake in 22 years, a static analysis tool will catch it, and in fact many compilers will complain about it (when set to a decent warning level, which we all do, right folks?) So I'm glad that wasn't the point!

Anyway, just might want to touch that up. I found your site because of your recent post on the C++ unit test framework, looking forward to giving it a try.

(By the way, I completely agree on the importance of formatting. One thing my engineers are probably sick of hearing me say: "You know, there's no monopoly on whitespace.")

December 28, 2010 | Unregistered Commenter Dan

@Dan thanks for that - good catch. I don't know how that slipped in there!
I was going to say that I don't think I've made that mistake for a long time either - but the evidence is there to the contrary :-s

Nonetheless I totally agree that it's a rare mistake - and the compiler will warn now (I need a blog editor that compiles code!) - so I hate it when code is contorted to try and catch it.

Anyway, I'll fix the post now - so anyone reading this later will have to trust Dan's comment that the bug ever existed ;-)

December 29, 2010 | Registered Commenter Phil Nash

Quite interesting reflections on your speed reading experience! I'm thinking of giving it a try.
Back to source code formatting issues. It does not hurt you until you _have to read_ that n-megabyte source listing written some 20+ years ago... But after you have, you magically know how to format your code for better readability. You also know that "where to put all these braces and parens" is the least of your concerns.

January 24, 2011 | Unregistered Commenter Yury
