What is the difference between hand-work and machine work, and what has caused it? How can we improve the situation? This essay explains problems in music notation (software), and our approach to solving them.
LilyPond is not unique in making music notation: there are a lot of programs that print music, and nowadays most newly printed music is made with computers. Unfortunately, that also shows. Just ask any musician who plays classical music: new scores do not look as nice as old ones (from before, say, 1970). The new ones have a bland, mechanical look, and they are not at all pleasurable to play from.
To illustrate this, take a look at the following examples. Both are editions of the 1st Cello Suite by J. S. Bach. The one on the left is a very beautifully hand-engraved edition from 1950; the one on the right is a typical contemporary computer product. Take a few seconds to let the looks of both pages sink in. Which one do you like better, and why?
Left: Bärenreiter (BA 350, (c) 1950). Right: Henle (nr. 666, (c) 2000).
The left picture looks nice: it has flowing lines and movement. It's music, and it's alive. Now, the picture on the right shows the same music, and it was written by Bach. His music surely has liveliness and flowing lines.... Except, the score doesn't show it: it looks rigid and mechanical. To understand better why that is, let's blow up a fragment of both pieces:
Hand-made
Computer-made
The location of the bar lines is a giveaway. In the new edition, both bar lines are at exactly the same horizontal position, and so are the note heads. When you look back at the whole page, you can easily verify that almost all bar lines are in the same location, as are most of the note heads. The entire thing is spaced as if it were laid out on a big grid, which is what causes the mechanical impression.
This is not the only error in this example, and more importantly, this piece is not the only one with typographical errors. Sadly, almost all music printed nowadays is full of basic typographical mistakes.
Musicians are usually more absorbed with performing the music than with studying its looks, so this nitpicking about typographical details may seem academic. That impression is not justified. This piece here has a monotonous rhythm. If all lines look the same, they become like a labyrinth. If the musician looks away once or has a lapse in his concentration, he will be lost on the page.
In general, this is a common characteristic of typography. Layout should be pretty, not only for its own sake, but especially because it helps the reader in his task. For performance material like sheet music, this is doubly important: musicians have a limited amount of attention. The less attention they need for reading, the more they can focus on playing itself. In other words, better typography translates to better performances.
Next: What's wrong with software, or how Finale is not the end-all of music software.
Computers have made music printing accessible to the masses, but they tend to deliver mediocre typography. Apparently, programmers have been doing a shoddy job on notation programs. To illustrate that, we had an amateur user set a piece of music in one of the most popular ‘professional’ notation programs sold today, Finale 2003. It was made with all of the default settings. The music is from the Sarabande of the 2nd Cello Suite by J. S. Bach.
(Finale is a registered trademark of MakeMusic! Inc.)
This example far surpasses the previous one when it comes to formatting errors: there are serious errors in literally every measure. The errors come in all sizes: a big one is the oddly s p a c e d   o u t last line. A smaller one is the flat in measure 13, which is covered by the note preceding it. Here is a magnification of that measure:
The errors go down to the teensy details: below is a blowup of the beam in that measure. Of course, in proper typography the beam should not stick out to the right of the stem, and the ripples provide a telling glimpse into the aptitude (or lack thereof) of Coda Music Technology's programmers with the underlying PostScript technology.
Now, one could object that Finale has a graphical interface that lets you easily move elements about to correct errors, or use plug-ins to do so. This is certainly true: in fact, good professional engravers who use Finale typically spend the majority of their time correcting the errors that Finale routinely makes. But do you want to spend your time correcting all these glaring errors? For the spaced-out line, it is doable, but imagine that you have to correct each and every beam that sticks out of the stems... by hand?
There is a less obvious reason why correcting things by hand is a bad idea. Consider again measure 13 reproduced above. The misplaced flat is pretty obvious, but did you notice that repeat bar? Its lines are too far apart. Did you notice that the eighth rest is too far down? Did it occur to you that the stem of the last eighth note is too long?
Unless you are an expert, typographical errors will irk you without being obvious. Many of them will go uncorrected and will still be in the final print.
This example may seem contrived, but in fact, it's not. All major producers of notation software claim to follow engraving standards, but we have not seen any that gets the basics right; all of them make systematic mistakes. If you want to assess the output of your favorite program, then buy a decent hand-made score from a respectable publisher, and try to reproduce one page of it. Then compare them:
Next: How not to design software, or: modeling music notation.
It would be nice if notation software didn't need any babysitting to produce acceptable output. Our goal with LilyPond was to write such a system: a program that produces beautifully printed music ("engraving") automatically.
At first sight, music notation follows a straightforward hierarchical pattern. Consider the example below, with two staves containing two measures.
Isn't writing software all about finding hierarchies and modeling the real world in terms of trees? In the view of a naive programmer, the above fragment of notation is easily abstracted to a nested set of boxes:
    <score>
      <staff>
        <measure id="1">
          <chord length="1/2"> <pitch name="c"> </chord>
          <chord> ....
        </measure>
      </staff>
    </score>
In short, this model is obvious, simple and neat. It's the format used by a lot of software. Unfortunately, it's also wrong. The hierarchical representation works for a lot of simpler music, but it falls apart for advanced use. Consider the following example:
In this example, several assumptions of the previous model are violated: staves start and stop at will, voices jump around between staves, and sometimes span two staves.
Music notation is really different from music itself. Notation is an intricate symbolic diagramming language for visualizing an often much simpler musical concept. Hence, software should reflect that separation.
Next: Divide and conquer, a blueprint for automated notation
| 1. form | 2. translation | 3. content |
|---|---|---|
| (engraved notation) | ← | { c'4 d'8 } |
Next: Impressive, but does it also work in theory? A practical approach to capturing notation.
Common music notation encompasses some 500 years of music. Its applications range from monophonic melodies to monstrous counterpoint for large orchestras. How can we get a grip on such a many-headed beast? Our solution is to make a strict distinction between notation (what symbols to use) and engraving (where to put them). For tackling notation, we have broken up the problem into digestible (and programmable) chunks: every type of symbol is handled by a separate plug-in. All plug-ins cooperate through the LilyPond architecture. They are completely modular and independent, so each can be developed and improved separately.
This plug-in creates graphical objects from musical events. People that put graphics to musical ideas are called copyists or engravers, so by analogy, this plug-in is called Note_head_engraver.
The Stem_engraver is notified of any note head coming along. Every time one note head (or more, for a chord) is seen, a stem object is created and attached to the note head.
The Accidental_engraver is the most complex plug-in: it has to look at the key signature, note pitches, ties, and bar lines to decide when to print accidentals.
In this situation, the accidentals and staff are shared, but the stems, slurs, beams, etc. are private to each voice. Hence, engravers should be grouped. The engravers for note head, stems, slurs, etc. go into a group called "Voice context," while the engravers for key, accidental, bar, etc. go into a group called "Staff context." In the case of polyphony, a single Staff context contains more than one Voice context. Similarly, more Staff contexts can be put into a single Score context:
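The grouping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not LilyPond's actual C++ code: the class names `Grob`, `Context`, `NoteHeadEngraver`, `StemEngraver` and `KeyEngraver`, the string event names, and the `process` method are all hypothetical. The sketch shows a Staff context holding a shared key engraver and two Voice contexts that each own their private note head and stem engravers.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Grob:
    """A graphical object; the name and fields are illustrative only."""
    kind: str
    owner: str                       # name of the context that created it
    parent: Optional["Grob"] = None  # e.g. the note head a stem is attached to


class NoteHeadEngraver:
    """Creates one note head per note event."""
    def process(self, events, grobs, owner):
        for event in events:
            if event == "note":
                grobs.append(Grob("note_head", owner))


class StemEngraver:
    """Creates a single stem when one or more note heads were seen."""
    def process(self, events, grobs, owner):
        heads = [g for g in grobs if g.kind == "note_head"]
        if heads:
            grobs.append(Grob("stem", owner, parent=heads[0]))


class KeyEngraver:
    """Creates a key signature, shared by every voice on the staff."""
    def process(self, events, grobs, owner):
        if "key_change" in events:
            grobs.append(Grob("key_signature", owner))


@dataclass
class Context:
    """A named group of engravers that may contain child contexts."""
    name: str
    engravers: list
    children: list = field(default_factory=list)

    def process(self, events):
        """Run one time step; `events` maps context names to event lists."""
        grobs = []
        for engraver in self.engravers:
            engraver.process(events.get(self.name, []), grobs, self.name)
        for child in self.children:
            grobs.extend(child.process(events))
        return grobs


def make_voice(name):
    # The stem engraver runs after the note head engraver so it can see the heads.
    return Context(name, [NoteHeadEngraver(), StemEngraver()])


staff = Context("staff", [KeyEngraver()], [make_voice("voice1"), make_voice("voice2")])
score = Context("score", [], [staff])

# One time step: a key change on the staff, a two-note chord in voice1,
# and a single note in voice2.
events = {"staff": ["key_change"], "voice1": ["note", "note"], "voice2": ["note"]}
grobs = score.process(events)
for grob in grobs:
    print(grob.owner, grob.kind)
```

Each voice produces its own stem, while the key signature is created once at the staff level, which mirrors the shared versus private split described in the text.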
Next: The art of stamping: how did they make hand-made music?
Next: Stamping computer screens? Computer hackers take over the engraving business.
How do we go about implementing typography? Answering the "music notation" problem left us with a bunch of graphic objects representing note heads, the staff, stems, etc.
If craftsmen need over ten years to become true masters, how could we simple hackers ever write a program to take over their jobs?
The answer is: we cannot! Since typography relies on human judgement of appearance, people cannot be replaced. However, much of their dull work can be automated: if LilyPond solves most of the common situations correctly, then this will be a huge improvement over existing software. The remaining cases can be tuned by hand. Over the course of years, the software can be refined to do more and more automatically, so manual overrides are necessary less and less.
How do we go about building such a system? When we started, we wrote the program in C++. Essentially, this means that the program functionality is set in stone by us developers. That proved to be unsatisfactory:
Next: Program architecture, your flexible friend: tuning, tweaking and developing typography rules.
Remember the music notation problem? Its solution left us with a bunch of objects. The formatting architecture is built on these objects. Each object carries variables:
The process of formatting a score consists of reading and writing object variables.
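As a rough illustration of formatting as "reading and writing object variables", here is a minimal Python sketch. Everything in it is an assumption made for the example: the `LayoutObject` class, the variable names `staff_position` and `direction`, and the simplified rule that notes on or above the middle line get a down stem. It also shows how a value set by hand can be left untouched by the automatic rule.

```python
class LayoutObject:
    """A graphical object that carries named variables (illustrative)."""

    def __init__(self, kind, **variables):
        self.kind = kind
        self.variables = dict(variables)

    def get(self, name, default=None):
        return self.variables.get(name, default)

    def set(self, name, value):
        self.variables[name] = value


def choose_stem_direction(stem):
    """Write 'direction' unless it was already set, e.g. by a manual override."""
    if stem.get("direction") is not None:
        return
    head = stem.get("note_head")
    # Assumption: staff_position 0 is the middle line, positive is higher.
    position = head.get("staff_position")
    stem.set("direction", "down" if position >= 0 else "up")


high = LayoutObject("stem", note_head=LayoutObject("note_head", staff_position=2))
low = LayoutObject("stem", note_head=LayoutObject("note_head", staff_position=-3))
forced = LayoutObject(
    "stem",
    note_head=LayoutObject("note_head", staff_position=2),
    direction="up",  # manual tweak that the automatic rule must respect
)

for stem in (high, low, forced):
    choose_stem_direction(stem)

print(high.get("direction"), low.get("direction"), forced.get("direction"))
```

Because the rule only fills in a variable that is still unset, automatic defaults and hand tuning can coexist on the same objects.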
Next: Beautiful numbers: how LilyPond participates in the Miss World contests.
There are a few books on the art of music engraving available. Unfortunately, they contain only simple rules of thumb and some examples. Such rules can be instructive, but they are a far cry from an algorithm that we could readily implement in a computer. Following the instructions from the literature leads to algorithms with lots of hand-coded exceptions. Doing all this case analysis is a lot of work, and often not all cases are covered completely.
Formatting rules defined by example. Image from Ted Ross' The Art of Music Engraving
We have developed a much easier and more robust method of determining the best formatting solution: score-based formatting. The principle is the same as a beauty contest: for each possible configuration, we compute an ugliness score. Then we choose the least ugly configuration.
For example, in the above configuration, the slur nicely connects the starting and ending note of the figure, a desirable trait. However, it also grazes one note head closely, while staying away from the others. Therefore, for this configuration, we deduct a 'variance' score of 15.39.
In this configuration, the slur keeps a uniform distance from the heads, but we have to deduct some points because the slur doesn't start and end on the note heads. For the left edge, we deduct 1.71, and for the right edge (which is further from the head) we deduct 9.37 points. Furthermore, the slur goes up, while the melody goes down. This incurs a penalty of 2.00 points.
Finally, in this configuration, only the end of the slur is far away from the ending note head, at a score of 10.04 ugliness points.
Adding up all scores, we notice that the third option is the least ugly, or most beautiful version. Hence we select that one.
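The selection step can be written down in a few lines. The Python sketch below uses the penalty values quoted above; the labels `left_edge`, `right_edge` and `direction`, and the function names, are made up for this illustration and are not LilyPond's own identifiers.

```python
# Penalties per slur configuration, taken from the three examples above.
candidates = {
    "first": {"variance": 15.39},
    "second": {"left_edge": 1.71, "right_edge": 9.37, "direction": 2.00},
    "third": {"right_edge": 10.04},
}


def ugliness(penalties):
    """Total ugliness of one configuration: the sum of its penalties."""
    return sum(penalties.values())


def least_ugly(candidates):
    """Return the name of the configuration with the lowest total."""
    return min(candidates, key=lambda name: ugliness(candidates[name]))


totals = {name: round(ugliness(p), 2) for name, p in candidates.items()}
best = least_ugly(candidates)
print(totals)
print(best)
```

The totals come out as 15.39, 13.08 and 10.04, so the third configuration wins. A real implementation would generate the candidate configurations itself rather than receive them in a table.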
This is a general technique, and it is used in a lot of situations, for example
This technique evaluates a lot of possibilities, which takes some time to compute. However, that is a worthwhile expense, because the end result is much better, and because it makes our lives easy.
Next: Man is the measure of things: is a flexible architecture enough?
Here you see parts of a benchmark piece. At the top is the reference edition (Bärenreiter BA 350); at the bottom is the output from LilyPond 1.4:
Bärenreiter
LilyPond 1.4
The LilyPond output is certainly readable, and for many people it would be acceptable. However, close comparison with a hand-engraved score showed a lot of errors in the formatting details:
By addressing the relevant algorithms, settings, and font designs, we were able to improve the output. The output for LilyPond 1.8 is shown below. Although it is not a clone of the reference edition, this output is very close to publication quality.
LilyPond 1.8
Bärenreiter
Another example of benchmarking is our project for the 2.1 series, a Schubert song.
Next: Cool features, typographical hoops that we made LilyPond jump through.
Left to right: Henle (2000), Bärenreiter (1950), LilyPond (2003).
Another typical aspect of hand-engraved scores is the general look of the symbols. They almost never have sharp corners. This is because sharp corners of the punching dies are fragile and quickly wear out when stamping in metal. The general rounded shape of music symbols is also present in all glyphs of our "Feta" font.
One of the problems that the Bach piece above inspired us to attack is the spacing engine. One of its features is optical spacing. It is demonstrated in the fragment below.
This fragment only uses quarter notes: notes that are played in a constant rhythm. The spacing should reflect that. Unfortunately, the eye deceives us a little: not only does it notice the distance between note heads, it also takes into account the distance between consecutive stems. As a result, the notes of an up-stem/down-stem combination should be put farther apart, and the notes of a down-stem/up-stem combination should be put closer together, all depending on the combined vertical positions of the notes. The top fragment is printed with this correction, the bottom one without. In the latter case, the down-stem/up-stem combinations form clumps of notes.
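A toy version of this correction might look as follows in Python. The numbers are invented for the example (a base distance of 2.0 and a correction of 0.25, in arbitrary units), the function names are hypothetical, and the sketch ignores the dependence on the vertical positions of the notes that the text mentions.

```python
def optical_correction(prev_stem, next_stem, amount=0.25):
    """Extra space between two notes, based only on their stem directions."""
    if prev_stem == "up" and next_stem == "down":
        return amount    # put farther apart
    if prev_stem == "down" and next_stem == "up":
        return -amount   # put closer together
    return 0.0


def positions(stems, base=2.0, amount=0.25):
    """Horizontal positions for equal-length notes with the given stems."""
    xs = [0.0]
    for prev_stem, next_stem in zip(stems, stems[1:]):
        xs.append(xs[-1] + base + optical_correction(prev_stem, next_stem, amount))
    return xs


corrected = positions(["up", "down", "down", "up"])
uncorrected = positions(["up", "down", "down", "up"], amount=0.0)
print(corrected)
print(uncorrected)
```

With the correction switched off, the notes fall on a strictly regular grid; with it switched on, the distances vary slightly so that the result looks even to the eye.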
Left to right: Henle (2000), Bärenreiter (1950), LilyPond (2004).
Next: Use the Source, Luke, or: what goes into LilyPond.
As discussed earlier, the ideal input format for a music engraving system is the content: the music itself. This poses a formidable problem: how can we define what music really is? Our way out of this problem is to reverse it. Instead of defining what music is, our program serves as a definition: we write a program capable of producing sheet music, and adjust the format to be as lean as possible. When the format can no longer be trimmed down, by definition we are left with content itself.
The syntax is also the user interface for LilyPond, hence it is easy to type. For example,
c'4 d'8
are a quarter note C1 and an eighth note D1, as in this example:
On a microscopic scale, such syntax is easy to use. On a larger scale, syntax also needs structure. How else can you enter complex pieces like symphonies and operas? The structure is formed by the concept of music expressions: by combining small fragments of music into larger ones, more complex music can be expressed. For example,
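c4
Combine this simultaneously with two other notes by enclosing in << and >>.
<<c4 d4 e4>>
Put this expression in sequence with another note by enclosing in { and }.
{ <<c4 d4 e4>> f4 }
The result is again an expression, so it can be combined simultaneously with another expression, here a half note.
<< { <<c4 d4 e4>> f4 } g2 >>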
Such recursive structures can be specified neatly and formally in a context-free grammar. The parsing code is also generated from this grammar. In other words, the syntax of LilyPond is clearly and unambiguously defined.
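The recursive nature of music expressions can be modelled with a small tree data type. The following Python sketch is an assumption-laden illustration, not LilyPond's parser or internal representation: the classes `Note`, `Sequential` and `Simultaneous` and the functions `length` and `render` are hypothetical names. It builds the last example by hand, computes its duration, and prints it back in a LilyPond-like form.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class Note:
    pitch: str
    duration: Fraction   # fraction of a whole note, so c4 has 1/4


@dataclass
class Sequential:        # written as { a b ... }
    elements: list


@dataclass
class Simultaneous:      # written as << a b ... >>
    elements: list


def length(expr):
    """Duration of an expression: sum in sequence, maximum in parallel."""
    if isinstance(expr, Note):
        return expr.duration
    if isinstance(expr, Sequential):
        return sum((length(e) for e in expr.elements), Fraction(0))
    if isinstance(expr, Simultaneous):
        return max((length(e) for e in expr.elements), default=Fraction(0))
    raise TypeError(f"not a music expression: {expr!r}")


def render(expr):
    """Text form of an expression; assumes durations of the form 1/n."""
    if isinstance(expr, Note):
        return f"{expr.pitch}{expr.duration.denominator}"
    inner = " ".join(render(e) for e in expr.elements)
    if isinstance(expr, Sequential):
        return "{ " + inner + " }"
    if isinstance(expr, Simultaneous):
        return "<< " + inner + " >>"
    raise TypeError(f"not a music expression: {expr!r}")


quarter, half = Fraction(1, 4), Fraction(1, 2)
chord = Simultaneous([Note("c", quarter), Note("d", quarter), Note("e", quarter)])
sequence = Sequential([chord, Note("f", quarter)])
whole = Simultaneous([sequence, Note("g", half)])

print(render(whole))
print(length(whole))
```

Because every combination of expressions is itself an expression, the same two functions handle a single note and a deeply nested piece alike.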
User interfaces and syntax are what people see and deal with most. They are partly a matter of taste, and also the subject of much discussion. Although discussions on taste do have their merit, they are not very productive. In the larger picture of LilyPond, the importance of input syntax is small: inventing neat syntax is easy, writing decent formatting code is much harder. This is also illustrated by the line counts for the respective components: parsing and representation take up less than 10% of the code.
Parsing + representation | total |
---|---|
6000 lines C++ | 61500 lines C++ |
Next: wrapping it up, the conclusion.