The "Large-Scale C++ Software Design" rules

In practice

One of the most interesting books that I have read on the C++ programming language is John Lakos' Large-Scale C++ Software Design. It was published in 1996 and, unfortunately, as far as I know it is still the only C++ book that covers physical design and how to scale it to large systems.

How have the guidelines stood the test of time after more than 10 years? Here are my brief comments on some of the major guidelines, after experimenting with them on real-world projects:

A few definitions

  1. Physical design is concerned with the physical entities of a software system (files, directories, libraries).
  2. A declaration introduces a name into a program; a definition provides a unique description of an entity (e.g. type, instance, function) within a program.
  3. A name has internal linkage if it is local to its translation unit and cannot collide with an identical name defined in another translation unit at link-time.
  4. A name has external linkage if, in a multi-file program, that name can interact with other translation units at link time.
  5. A component is the smallest unit of physical design. This is typically a header + source file pair.
  6. A package is a group of components.
  7. A subsystem is a group of packages.

The guidelines

  • Keep class data private

This is one of the basic guidelines of both object-oriented design and physical design. It's a good idea because it hides some of a component's complexity.

Merely declaring member variables as private won't have any physical design effect, because the header still exposes them to every client. Going a step further and using a compiler firewall (PIMPL/Cheshire cat) is known to reduce compile-time dependencies and decrease compile times.
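
Here is a minimal single-file sketch of the compiler firewall idea. The Counter class and the counter.h/counter.cpp split are hypothetical, and I'm assuming C++11's std::unique_ptr for brevity (a raw pointer deleted in the destructor works the same way):

```cpp
#include <cassert>
#include <memory>

// ---- what would go in counter.h (hypothetical component) ----
class Counter {
public:
    Counter();
    ~Counter();                   // defined where Impl is a complete type
    void increment();
    int value() const;
private:
    struct Impl;                  // only forward-declared in the header
    std::unique_ptr<Impl> impl_;
};

// ---- what would go in counter.cpp ----
struct Counter::Impl {
    int count;                    // clients never see this member
    Impl() : count(0) {}
};

Counter::Counter() : impl_(new Impl) {}
Counter::~Counter() {}
void Counter::increment() { ++impl_->count; }
int Counter::value() const { return impl_->count; }

// ---- usage, as a client of counter.h would write it ----
void counterDemo() {
    Counter c;
    c.increment();
    assert(c.value() == 1);
}
```

Adding or removing members of Impl only touches the .cpp part, so clients of the header don't need to be recompiled.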

Verdict: thumbs up

  • Avoid data with external linkage at file scope

Very easy to do (just add "static") and helps avoid linker errors and linker bugs. For instance, I once encountered an issue with two external linkage functions that shared the same name and had parameters that were convertible to each other. The wrong function was being called at runtime with no warnings at compile time whatsoever.

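The only caveat here is that most C++ compilers don't really make symbols internal when you put them in an anonymous namespace (the names are merely made unique per translation unit and are still emitted as external symbols), even though this is the method recommended by the standard and using static for this purpose is officially deprecated.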
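
A small sketch of both spellings side by side. All the names here (s_lastId, nextId, allocateId) are made up for illustration; only allocateId is meant to be declared in the component's header:

```cpp
#include <cassert>

// Internal linkage via 'static': the name cannot collide at link time.
static int s_lastId = 0;

// The anonymous-namespace spelling recommended by the standard.
namespace {
    int nextId() { return ++s_lastId; }
}

// The only name this hypothetical component exports.
int allocateId() { return nextId(); }

// ---- usage ----
void idDemo() {
    int first = allocateId();
    assert(allocateId() == first + 1);
}
```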

Verdict: thumbs up

  • Avoid free functions (except operator functions) at file scope in .h files; avoid free functions with external linkage (including operator functions) in .cpp files.

The basic idea is that of avoiding name clashes and strange interactions between translation units. I always do this one.

Verdict: thumbs up

  • Avoid enumerations, typedefs, and constants at file scope in .h files.

Same idea as before. Enumerations are especially tricky, because the enumeration name is not a namespace and each enumeration value is published in the global namespace.
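
One way to follow this guideline is to wrap the enumeration, typedef and constant in a struct that acts as a scope. This is only a sketch and the Color type is a hypothetical example:

```cpp
#include <cassert>
#include <cstring>

// Only the name 'Color' ends up in the global namespace.
struct Color {
    enum Value { Red, Green, Blue };
    typedef unsigned char Channel;
    static const int ChannelCount = 3;

    // Helper kept inside the struct instead of being a free function.
    static const char* name(Value v) {
        switch (v) {
            case Red:   return "red";
            case Green: return "green";
            default:    return "blue";
        }
    }
};

// ---- usage: enumerators must be qualified (Color::Green, not Green) ----
void colorDemo() {
    Color::Value v = Color::Green;
    assert(std::strcmp(Color::name(v), "green") == 0);
}
```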

Verdict: thumbs up

  • Avoid using preprocessor macros in header files except as include guards.

Macros in header files can lead to very hard-to-track bugs. The classic example is the badly named macro that unintentionally changes the code that includes it, but it can be more subtle than that. A former colleague of mine once had a memory corruption issue where the program crashed in the debugger when deleting a certain object. To make things more interesting, it only crashed on certain deletes and not others. It seemed that all deletes from package A crashed, and all deletes from package B worked!

To make a not so long story even shorter, he had #ifdef-ed some of the member variables of the object. Package A crashed on delete because it got size X objects from package B and tried to delete them with size X - sizeof(#ifdefed member variables).
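
The effect can be sketched in a single file. This is a hypothetical illustration, not the original code: each namespace stands in for what one package sees after preprocessing the same header with different macro settings.

```cpp
#include <cassert>

// What package A's translation units see (macro not defined).
namespace package_a {
    struct Record { int id; };
}

// What package B's translation units see (macro defined, extra members).
namespace package_b {
    struct Record { int id; int debugInfo[4]; };
}

// ---- the two packages disagree about the size of the "same" class ----
void layoutDemo() {
    assert(sizeof(package_b::Record) > sizeof(package_a::Record));
}
```

In the real program both structs have the same name, so the linker happily mixes code built against either layout.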

Verdict: thumbs up

  • Only classes, structures, unions, and free operator functions should be declared at file scope in a .h file; only classes, structures, unions, and inline (member or free operator) functions should be defined at file scope in a .h file.

This one is a consequence of the previous rules. The idea is that classes, structures and unions create a kind of namespace when they are declared and this helps minimize name clashes. Operator functions don't necessarily have to be declared and defined at file scope, but some operators can't be made member functions, so there is no choice.
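
A short sketch of what such a header might contain. The Money class is a hypothetical example; operator<< is the typical operator that cannot be a member, because its left operand is the stream:

```cpp
#include <cassert>
#include <ostream>
#include <sstream>

// ---- hypothetical header contents: a class plus inline free operators ----
class Money {
public:
    explicit Money(int cents) : cents_(cents) {}
    int cents() const { return cents_; }
private:
    int cents_;
};

// Must be a free function: the left operand is std::ostream, not Money.
inline std::ostream& operator<<(std::ostream& os, const Money& m) {
    return os << m.cents() << "c";
}

// Could be a member; as a free function it treats both operands alike.
inline bool operator==(const Money& a, const Money& b) {
    return a.cents() == b.cents();
}

// ---- usage, as a client .cpp file would write it ----
void moneyDemo() {
    std::ostringstream out;
    out << Money(250);
    assert(out.str() == "250c");
}
```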

Verdict: thumbs up

  • Place a unique and predictable (internal) include guard around the contents of each header file.

This is actually obligatory for most projects (even small ones) because you'll get redefinition errors if a header file gets included more than once in the same translation unit. It's worth stressing that the include guard should have a single predictable name; I've worked on projects where multiple naming conventions were used and it was confusing.

I like to use something based on the file name, such as INC_FILENAME_H, and I've created a little macro in my IDE/code editor that can generate an include guard for the selected text.
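
For a hypothetical point.h, the guard would look like the sketch below. The second guarded block simulates what the preprocessor sees when the header is included twice:

```cpp
#include <cassert>  // only needed for the check at the bottom

// ---- point.h (hypothetical file), first inclusion ----
#ifndef INC_POINT_H
#define INC_POINT_H

struct Point {
    int x;
    int y;
};

#endif // INC_POINT_H

// ---- point.h, second inclusion: the guard is already defined, so the
// ---- body is skipped and Point is not redefined
#ifndef INC_POINT_H
#define INC_POINT_H

struct Point {
    int x;
    int y;
};

#endif // INC_POINT_H

// ---- usage ----
void guardDemo() {
    Point p = {1, 2};
    assert(p.x + p.y == 3);
}
```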

Verdict: thumbs up

  • Logical entities declared within a component should not be defined outside that component.

I have never encountered a situation where this rule was broken, but it must have been at some point, because otherwise why would it even be mentioned? C++ is probably one of the few programming languages where you can actually get away with this. There is no reason whatsoever that you would want to do it though...

Verdict: should be pretty obvious

  • The .c file of every component should include its own .h file as the first substantive line of code.

The purpose of this rule is to keep incomplete header files (i.e. headers with missing includes) from compiling successfully by accident. If such a header is included first, it will certainly fail to compile; if it is included after other header files, those files might pull in whatever our header is missing and the compilation would succeed.

This guideline is subverted by precompiled headers, which must usually be the first file included (e.g. MSVC's stdafx.h). My approach is to include the precompiled header first, then the component header, then the project headers, then the external library headers (e.g. boost, wxWidgets, etc.) and finally the STL/CRT headers. I also explicitly include the files that are part of the precompiled header, since compilers are smart enough to skip them and this way I can easily compile without precompiled headers should I need to.

Verdict: thumbs up

  • Avoid definitions with external linkage in the .c file of a component that are not declared explicitly in the corresponding .h file.
  • Avoid accessing a definition with external linkage in another component via a local declaration; instead, include the .h file for that component.

These guidelines are related. By importing names properly through the header, you have only one point of change, and when those names change in a breaking way you will get a compile error. Covert imports (local declarations that duplicate what the header says) can and do break silently and cause hard-to-track bugs.

Verdict: thumbs up

  • Prepend every global identifier with its package prefix.
  • Prepend every source file name with its package prefix.

A bit of an overkill, no? You have to remember that the book IS about large-scale software though... When your project contains thousands of files, you probably can't keep all the file names in your head and anything that helps to tell you "what goes where" is welcome.

I've found this practice to be especially helpful when:

  • doing code reviews on printed paper (or a really bad IDE/editor)
  • filtering packages when trying to navigate to a file
  • filtering packages when trying to navigate to an identifier

Verdict: thumbs up

  • Avoid cyclic dependencies among packages.

Cyclic dependencies are bad mmkay? Dependency management is very important in large-scale projects, because if you're not careful you'll end up with a monolith.

The problems that you're likely to encounter with cyclic dependencies in packages are:

  • the inability to test packages separately, which in turn hinders automated unit testing
  • longer compile times
  • changes that propagate through the entire program source code

Verdict: a big thumbs up

  • Provide a mechanism for freeing any dynamic memory allocated to static constructs within a component.

One reason why you'd want to free such memory is that it allows you to track memory leaks more easily. Most memory verification tools work by taking snapshots of the program's memory at different points in time. Unfreed memory allocated by static constructs will show up as leaks.

Furthermore, if the memory is used by an object, the destructor won't be called unless the memory is freed explicitly by the programmer, even if the OS can free the memory on program exit.

The easiest way to do this is probably by using a smart pointer like auto_ptr or shared_ptr.
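
A minimal sketch of the idea, assuming C++11's std::shared_ptr (boost::shared_ptr would look the same). The Cache type and the cacheInstance/cacheShutdown functions are hypothetical names:

```cpp
#include <cassert>
#include <memory>

// Hypothetical type that counts live instances so leaks are easy to spot.
struct Cache {
    static int liveCount;
    Cache()  { ++liveCount; }
    ~Cache() { --liveCount; }
};
int Cache::liveCount = 0;

namespace {
    // Static construct that owns dynamic memory, internal to the component.
    std::shared_ptr<Cache> s_cache;
}

// Lazily allocates the cache on first use.
Cache& cacheInstance() {
    if (!s_cache) s_cache.reset(new Cache);
    return *s_cache;
}

// Explicit release hook: runs ~Cache now instead of at program exit.
void cacheShutdown() { s_cache.reset(); }

// ---- usage ----
void cacheDemo() {
    cacheInstance();
    cacheShutdown();
    assert(Cache::liveCount == 0);
}
```

Even if cacheShutdown is never called, the smart pointer's own destructor frees the object at program exit, so the destructor still runs.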

Verdict: thumbs up

Conclusions: As you can see, most of the guidelines are still useful after all this time. Lakos' book has had a big impact on the way that I approach large-scale C++ programming and I've successfully used the guidelines in several projects that I've worked on. In a later post I'll write about the minor guidelines and some of the general guidelines. While the major guidelines should be respected most of the time, the minor and general ones are quite open to discussion, as you'll see.

Posted on Mar 28 2009
Written by raz