Andrew Birkett's nobugs.org
At one point I was working on a fluffy, friendly tour through the GCC C frontend. The article covered the lexer, parser, symbol table and tree constructs with lots of examples. Unfortunately, I left Edinburgh to go travelling for six months and didn’t continue the project when I returned. The article was based on the GCC 2.7.x sources, so it’s already somewhat out of date.
It would be nice if someone picked up this project, since I think the aims are still relevant. GCC is written in C, a language with only meagre support for large-scale programming. As a consequence, it is rather more difficult to learn how the compiler works (certainly compared to the OCaml compiler). The frontend is reasonably stable, so hopefully a document wouldn’t get out of date too quickly. Having said that, the preprocessor was integrated with the lexer in the last major release, and there’s a brand new recursive descent C++ parser in the pipeline too.
You can download the last version here.