In this article, I focus on music in digital formats. Moreover, because I am a Linux kind-of-guy, I’m going to take a Linux kind-of-perspective on this topic.
Most people have heard of the MP3 format. It’s an example of two things: First, it is not an open format, as a number of organizations claim patents on it. And second, it is a “lossy” format. Lossy formats compress the original signal by throwing out some of the signal components. The original rationale for this compression was to make music files smaller and more easily distributed. In contrast, there are also “lossless” formats, which can be compressed (without throwing away the original signal) or not. Digital music presented on the Compact Disc (CD) is an example of a lossless format (assuming it’s an audio CD, not a data CD with MP3s saved on it).
It is also worth mentioning that there are two main ways to encode digital music: pulse code modulation (PCM) and delta-sigma modulation (DSM). Until recently, most digital music was encoded using PCM, but Sony and Philips established a DSM-based standard called DSD and implemented it on Super Audio CD (SACD) discs. A small but growing amount of music is available for download in this standard. We’ll leave it to Wikipedia to explain the difference in more detail.
Those of us who are concerned about software freedom should prefer completely free formats like Ogg Vorbis (lossy) and FLAC (lossless, compressed). We should particularly avoid file formats that include options for digital rights management (DRM). In theory, one might think that DRM is just a mechanism to prevent the unauthorized use (theft?) of someone’s intellectual property. However, certain vendors use DRM to force their customers to use their software, and sometimes hardware. Once again, Wikipedia has a nice detailed article about this whole format business.
But deciding on a format is not, or at least should not be, a primary concern. Rather, each of us has a different set of objectives with respect to the use of music. I’m going to explain my objectives, and then further explain how those objectives influence my decision on file formats.
First, and for emphasis, I am a big supporter of software freedom. This means I prefer the Ogg or FLAC formats for digital music. Any format with limited access due to a patent or trade secret is of little or no interest to me.
Second, my music collection stretches back to the 1960s. I still have most of the LPs I bought years ago (sometimes to my embarrassment), and one of the things that gives me great pleasure is how good some of those old LPs still sound on modern analog playback equipment. I like to think that good-sounding LPs, like Dave Brubeck’s Time Out, originally recorded in 1959, still sound incredibly fresh and clear in part because the people who recorded them did an excellent and careful job with their equipment. And so when I buy music now, whether on LP or as a music download, I try to get the very best quality of recording I can.
Therefore, I buy digital lossless in strong preference to lossy. In fact, if something is only available in a lossy format, I usually don’t bother buying it. And not only do I buy lossless, but I buy it at a higher resolution than “CD standard” when available. And for sure my preferred lossless format is FLAC!
Let’s talk about resolution for a minute. Music on a CD is presented at a sampling rate of 44.1kHz and with a word length of 16 bits. In theory this means the loudest sound recorded on a CD is 2^16, or 65,536, times as loud as the softest sound. This means that if you have a recording that shows this full dynamic range and turn up your volume to the point where you can just hear the quietest parts, then the loudest parts will be so loud as to cross the auditory pain threshold.
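To put a number on that range, here is a minimal sketch of the arithmetic. It assumes the usual convention of expressing an amplitude ratio in decibels as 20 times its base-10 logarithm; the function names are my own and purely illustrative.

```python
import math


def amplitude_ratio(bits: int) -> int:
    """Number of distinct levels a word of the given length can represent."""
    return 2 ** bits


def dynamic_range_db(bits: int) -> float:
    """Theoretical ratio of largest to smallest level, expressed in decibels."""
    return 20 * math.log10(amplitude_ratio(bits))


if __name__ == "__main__":
    for bits in (16, 24):
        ratio = amplitude_ratio(bits)
        print(f"{bits}-bit: ratio {ratio:,}, about {dynamic_range_db(bits):.1f} dB")
```

By this reckoning a 16-bit word covers roughly 96dB and a 24-bit word roughly 144dB. Real recordings and real playback equipment will fall short of these theoretical figures.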
Moreover, the Nyquist-Shannon sampling theorem tells us that the 44.1kHz sampling rate is more than ample to preserve sound frequencies up to 20kHz (the “kHz” is an abbreviation for “kilohertz”, or thousands of cycles per second), which is said to be the upper limit of audibility for humans with excellent hearing.
So why do I think I need higher resolution than the CD standard?
Simple. A recording presented at a sampling rate of 96kHz and with a word length of 24 bits provides a great deal more “room” to fit in the original analog signal—not just the loudest sound and the softest sound—than does the 44.1/16 version. This means a recording need not be at a level so close to the maximum that it occasionally exceeds it. (A signal that exceeds the maximum is said to be “clipped,” and clipping introduces all sorts of nasty sounds not present in the original recording.) Moreover, quiet sounds in the music have more bits to represent them.
For example, Marconi Union’s Breathing Retake is regularly 25dB below maximum. A dB, or decibel, is a logarithmic measure of the ratio between the actual level (-25dB in this case) and a reference level, 0dB. Each bit of word length is worth about 6dB, so a signal that is 25dB below reference has roughly the four most significant bits set to zero. So music in a 16-bit word length at -25dB only has about 12 bits worth of signal, whereas in a 24-bit word length it has about 20 bits worth of signal. Eric Whitacre’s Sainte-Chapelle as performed by the Tallis Scholars chugs along at -35dB to -40dB, which gives 10 or fewer bits for the signal in the case of a 16-bit word length. The 24-bit word length gives the recording engineer much more freedom to record the music as played, without having to compress the music to fit it into the 16-bit dynamic range.
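The bit counts above can be checked with a rough sketch like the following. It assumes each bit is worth 20·log10(2), or about 6.02dB, and treats the result as a fractional number of bits. The helper names are hypothetical, and the levels are simply the ones quoted above.

```python
import math

# Assumption: one bit of word length corresponds to a factor of two in amplitude.
DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB


def bits_unused(level_db: float) -> float:
    """Most significant bits left idle by a signal peaking at level_db below full scale."""
    return abs(level_db) / DB_PER_BIT


def bits_remaining(word_length: int, level_db: float) -> float:
    """Bits actually available to describe a signal at level_db."""
    return word_length - bits_unused(level_db)


if __name__ == "__main__":
    for level_db in (-25, -35, -40):
        for word_length in (16, 24):
            remaining = bits_remaining(word_length, level_db)
            print(f"{level_db} dB in {word_length}-bit: about {remaining:.1f} bits of signal")
```

The results are fractional, so the whole-number bit counts quoted in the text are best read as approximations.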
As for the sampling rate, the 96kHz sample rate can be used for audio frequencies up to 45kHz or so, and 192kHz for frequencies up to 90kHz or so, well beyond what is thought to be the top end of human capability. However, having that extra bandwidth available means that the filtering that must be applied to the analog signal before it is digitized can be much more gentle than in the case of the 44.1kHz sampling rate. Gentle filters are generally preferred to more abrupt filters for their audio characteristics. The Well Tempered Computer has several nice articles on this topic.
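To see why the filter can be gentler, here is a simplified sketch of how much room lies between the top of the audible band and half the sampling rate (the Nyquist frequency). It assumes the filter must pass everything up to 20kHz and finish its work by the Nyquist frequency; real converter designs are more involved than this, and the function names are my own.

```python
import math

# Assumption: 20 kHz is taken as the top of the band the filter must leave untouched.
AUDIBLE_LIMIT_HZ = 20_000


def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sampling rate can represent."""
    return sample_rate_hz / 2


def transition_band_hz(sample_rate_hz: float) -> float:
    """Width of the region in which the filter can roll off."""
    return nyquist_hz(sample_rate_hz) - AUDIBLE_LIMIT_HZ


def transition_octaves(sample_rate_hz: float) -> float:
    """The same region expressed in octaves above the audible limit."""
    return math.log2(nyquist_hz(sample_rate_hz) / AUDIBLE_LIMIT_HZ)


if __name__ == "__main__":
    for rate in (44_100, 96_000, 192_000):
        width = transition_band_hz(rate)
        octaves = transition_octaves(rate)
        print(f"{rate} Hz: {width:.0f} Hz of roll-off room ({octaves:.2f} octaves)")
```

At 44.1kHz the filter has only about 2kHz, a small fraction of an octave, in which to roll off. At 96kHz it has more than a full octave, which is what allows a much gentler slope.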
And one more reason to buy the high-res stuff: my experience shows me that when music is released in high-res format, it is often well cared for in the production chain and preserves the original dynamic range (loud is LOUD and quiet is …) and life of the music, without introducing a bunch of artifacts—noise!—into the music.
In conclusion: When I buy digital music downloads, I buy them in FLAC format and try to get 24-bit files and 88.2kHz or 96kHz sample rates. My music files cost money. Why would I be willing to accept poor quality lossy files? And why would I be willing to let the vendor lock me into a particular software and hardware platform?