Wednesday, 2 October 2013

C++: Some Consequences of a Design Decision

Top Dog

C++ has a very solid position as the programming language which makes the fewest performance compromises while providing good abstraction mechanisms that allow code to be written at a high level. In many ways, it's an easier and safer language to learn than C, if you stick to the same imperative style but use std::string and the provided containers.

Both of these statements are of course open to debate. The first statement holds up if we look at how widely the language is used in performance-critical applications. The second is often challenged. To quote Andrei Alexandrescu in a Reddit comment:

In my opinion C++ is a language for experts and experts only [...] It has enough pitfalls in virtually all of its core constructs to make its use by benevolent amateurs virtually impossible except for the most trivial applications.

Ouch! That's fair enough; we were comparing it to C anyway (which is definitely not for sissies). It is not really a programming language for civilians, and not a good first language for anyone other than a would-be professional. (In fact I'd say you would get a better all-round education in C, even if later you turn in relief to non-military languages; learning C++ is mostly good for ... programming in C++.)

Error, Error on the -Wall

The big hurdle is the first one, and that's making sense of the error messages:

 #include <iostream>
 #include <string>
 #include <list>
 using namespace std; // so sue me
 int main()
 {
     list<string> ls;
     string s = "hello";
     ls.append(" world");
     cout << ls << endl;
     return 0;
 }

This isn't a bad attempt at a C++ program at all, leaving aside the pedantic belief that using namespace std is bad. (I take the pragmatic view that anybody is free to inject whatever namespaces they like into their own source files, as long as they don't take that freedom away from others by injecting namespaces in header files. C++ is very good at resolving ambiguous name references, and everyone should know the contents of std anyway.)

We get nearly four hundred lines of error messages, full of implementation details. In this case, the abstractions are leaking all over the user!

Verity Stob once suggested that the thing to do was to write a Perl script to parse the error output. This was very funny, and true, but bringing in Perl would only add another problem. My practical way of realising Verity's joke is to use Lake with a suitable plugin:

 C:\Users\steve\dev\dev>lake -L filter.cpp error.cpp
 g++ -c -O2 -Wall -MMD  error.cpp -o error.o
 error.cpp: In function 'int main()':
 error.cpp:10:8: error: 'list<string >' has no member named 'append'
      ls.append(" world");
 error.cpp:11:10: error: no match for 'operator<<' (operand types are 'ostream {aka ostream}' and 'list<string >')
      cout << ls << endl;
 lake: failed with code 1

Now our noob has a fighting chance, and can go to the reference and actually find the appropriate method.

Templates Considered Harmful

The real issue is that the C++ standard libraries over-use generics. std::string and the stream classes could be plain classes, as they once were. At this point, there will be someone suggesting that I am a plain ASCII bigot and forgetting the need for wstring and so forth. Fine, let them be plain classes as well. An incredible amount of ingenuity went into making templated string types work, and the library designers could have made their life easier by using a low-tech solution. Generally, we should not pander to library designers and their desires, since they chose the hard road: their job is to use the right level of abstraction and not complicate things unnecessarily.

C++'s standard generic containers are fantastically useful, but their design is overcomplicated by also being parameterized by an allocator. This is a useful feature for those that need it, but there could be two versions of (say) std::list, distinguished by the number of template parameters, which C++11 variadic templates make possible. This makes life a bit harder for library implementers, but they are precisely the people who can manage complexity better than users.

The Standard is the Standard; there's no point in moaning. But let's do an experiment to see what the consequences of a simplified standard library would be. I emphasize that tinycpp is an experiment, not a proposal (modest or otherwise). It was originally written for the UnderC project, since the interpreter could not cope with the full STL headers, and I've since filled in a few gaps. Here its purpose is to allow some numbers to be generated, since qualitative opinion is all too common.

These simplified 'fake' classes directly give us better error messages, especially if the compile bombs out on the first error. (Often after the first error the compiler is merely sharing its confusion.)

 $ g++ -Wfatal-errors -Itiny error.cpp tiny/iostream.o tiny/string.o
 error.cpp: In function 'int main()':
 error.cpp:10:8: error: 'class std::list<string>' has no member named 'append'
      ls.append(" world");
 compilation terminated due to -Wfatal-errors.

It's easy to forget the initial difficulty of learning to ride a bicycle, and to scorn training wheels, which are a useful means to that end.

Templates Slow you Down

People say 'C++ compiles slowly' but this is not really true. A little C++ program involves processing about 20Kloc of headers, much of it template code. Using the tinycpp library brings that down to 1.4Kloc.

The three compilers tested here are mingw 4.8 on Windows 7 64-bit, MSVC 2010 on the same machine, and gcc 4.6 in a Linux Mint 32-bit VM.

Here is a comparison of build times for standard vs tinycpp:

  • mingw 0.63 -> 0.33
  • gcc 0.60 -> 0.20
  • msvc 0.82 -> 0.17

As always, gcc works better on Linux, even in a VM, and it's no longer slower than MSVC. In all cases the tinycpp version compiles significantly faster: roughly 1.9, 3 and 4.8 times faster for mingw, gcc and MSVC respectively.

C++ programmers can get a bit defensive about compile times, and often end up suggesting throwing hardware at the problem. There seems to be a "You're in the Marines now boy!" macho attitude that wanting to build faster is a sign of civilian weakness and poor attention span. This attitude is off-putting and gets in the way of better engineering solutions. Most people just suck it up and play with light sabres.

Templates Make you Fat

With small programs, these compilers produce small executables when they are allowed to link dynamically to the C++ library. This is not considered a problem on Linux, because obviously everyone has upgraded to the latest distro. But if you want to chase cool new C++11 features you may find that most of your users don't have the cool new libstdc++ needed to run your program.

It is (curiously enough) easier to get a shiny new GCC for Windows precisely because it's not considered part of the system. Executables rule in Windows, so it's alarming to find that a small program linked statically against libstdc++ is rather large, nearly 600Kb on Windows. And since libstdc++ is not part of Windows you (again) have to suck it up. (And this is definitely what Alexandrescu would consider a 'trivial application'.)

You can get down to 174Kb using the fake tinycpp libraries, which suggests that an up-to-date and properly engineered version of std-tiny would be useful for delivering executables, not just for speed and noob-friendliness.

MSVC does static linking much more efficiently; the numbers are 170Kb (std) and 95Kb (tiny), and the resulting executables have no C runtime dependencies whatsoever. This suggests that MSVC is (at least) a good choice for building releases for distribution. Using a cross-platform, compiler-aware tool like CMake or Lake can make that less painful. Not an ideologically comfortable recommendation to accept, true, but whatever works best. (The command-line version of MSVC 2010 is freely available.)

This preoccupation with executable sizes seems last-century by now (after all, Go users are fine with megabyte executables, since they see that as the price of having no other runtime dependencies). And large executables are not slower, provided the code actually executing at any point is compact enough to be cache-friendly. So perhaps I'm just showing my age at this point, although please note that resource-limited devices are much more common than desktop computers.

No Free Lunches

C++ programmers like the phrase 'abstraction overhead' because C++ is very good at reducing this to zero in terms of run-time. Often this is at the cost of executable size, compile time and confusing errors. This may be an acceptable price, but it is not free.

C++ is what it is; it is unlikely to change that much, except perhaps to get even slower to compile as the Boost libraries move into the Standard library. But I think that there are some lessons to be learned for new languages:

  • keep the standard library as simple as possible: library developers should not have too much fun. (They should instead write cool applications that use their libraries, to get excess cleverness out of their systems.)
  • error messages should not burden the user with implementation details; when they do, the abstraction is leaking badly.
  • compile time still matters. Perhaps the people who use C++ more regularly are more likely to be those who like to think upfront (like embedded programmers) but this is not the only cognitive style that flourishes in programming. It is a mistake to think that long build times are a necessary evil, since with C++ they largely come from an outdated compilation model. New languages can do better than that.