Wednesday, November 28, 2007

The Nature of .mp3s and Why It Is Best to Avoid Them

The nature of MP3 compression is not a secret, but I'm sure many people do not know exactly how it works. It is quite amazing how a 50 MB raw .wav file can be compressed to a tenth of its size or less and still sound the same to most ears. Raw .wav data is essentially just that: raw sound waves recorded by a digital device. The digital representation is a near-exact replica of its real-world counterpart (technically it isn't entirely exact, but the difference is indiscernible to the human ear). These files include ALL sounds recorded, even those which are hard or impossible for a human ear to hear.
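To put rough numbers on that "tenth of its size" figure, here is a small sketch of the arithmetic. It assumes CD-style audio (44,100 samples per second, 16 bits, stereo) and a 128 kbps MP3; neither is stated above, so treat both as illustrative assumptions.

```python
# Assumed CD-style parameters (not stated in the post).
SAMPLE_RATE_HZ = 44_100
BYTES_PER_SAMPLE = 2   # 16-bit samples
CHANNELS = 2           # stereo
PCM_BYTES_PER_SECOND = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS


def pcm_bytes(seconds):
    """Size of the raw sample data for a recording of the given length."""
    return round(seconds * PCM_BYTES_PER_SECOND)


def pcm_seconds(size_bytes):
    """Playing time of raw sample data of the given size."""
    return size_bytes / PCM_BYTES_PER_SECOND


wav_bytes = 50 * 1_000_000             # the 50 MB example
seconds = pcm_seconds(wav_bytes)       # about 283 seconds
mp3_bytes = seconds * 128_000 / 8      # assumed 128 kbps MP3
ratio = wav_bytes / mp3_bytes

print(f"Playing time: {seconds:.0f} s")
print(f"MP3 size:     {mp3_bytes / 1_000_000:.1f} MB")
print(f"Ratio:        {ratio:.1f} to 1")
```

Under those assumptions the 50 MB file holds a little under five minutes of music, and the 128 kbps version comes out at roughly one eleventh of the size.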

To reduce the enormous file size of this raw sound data, an MP3 encoder employs a science called psycho-acoustics (essentially, how the mind interprets sound) to remove or change the frequencies which are hard to hear. Basically, an MP3 encoder will begin by removing sounds which the average human ear cannot hear on their own: generally sounds above 20,000 Hz and below 20 Hz, if I am not mistaken. Next come the true psycho-acoustic processes. The encoder removes sound data from the file based upon the models it is programmed with. Scientists have studied how the mind interprets sound, and have programmed these encoders to edit sound data based on what the human ear can hear well, or not so well.
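The first step, throwing away frequencies outside the audible range, can be sketched in a few lines. This is only a toy: it uses a plain discrete Fourier transform on a 10 ms test signal, which is not how a real MP3 encoder splits up the sound, and the 20 Hz and 20,000 Hz limits are simply the figures quoted above. The function names are made up for illustration.

```python
import cmath
import math

SAMPLE_RATE_HZ = 44_100
N = 441  # 10 ms of audio, so each frequency bin is 100 Hz wide


def tone(freq_hz, amplitude=1.0):
    """A pure sine tone, N samples long."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE_HZ)
            for n in range(N)]


def dft(samples):
    """Naive discrete Fourier transform (slow, but fine for 441 samples)."""
    count = len(samples)
    return [sum(x * cmath.exp(-2j * math.pi * k * n / count)
                for n, x in enumerate(samples))
            for k in range(count)]


def bin_frequency(k, count):
    """Frequency in Hz that bin k stands for (upper bins mirror the lower ones)."""
    signed = k if k <= count // 2 else k - count
    return abs(signed) * SAMPLE_RATE_HZ / count


def band_limit(spectrum, low_hz=20.0, high_hz=20_000.0):
    """Zero every bin whose frequency falls outside the assumed audible range."""
    count = len(spectrum)
    return [c if low_hz <= bin_frequency(k, count) <= high_hz else 0j
            for k, c in enumerate(spectrum)]


def strong_frequencies(spectrum, threshold=1.0):
    """Frequencies (Hz) whose bins carry noticeable energy."""
    count = len(spectrum)
    return sorted({round(bin_frequency(k, count))
                   for k, c in enumerate(spectrum) if abs(c) > threshold})


# One audible tone (1,000 Hz) mixed with one above the range (21,000 Hz).
signal = [a + b for a, b in zip(tone(1_000), tone(21_000))]
spectrum = dft(signal)

before = strong_frequencies(spectrum)
after = strong_frequencies(band_limit(spectrum))
print("Before:", before)
print("After: ", after)
```

Only the 1,000 Hz tone survives the filter; the 21,000 Hz tone is simply gone from the data.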

I will not delve into the depths of these processes, as I do not know enough myself, but one example would be that of a loud sound playing over a quiet sound. If the loud sound is loud enough, then the human mind is said to focus only on the louder sound and assume that the quieter sound is still playing underneath, until the loud sound ceases, at which time the ear hears the quiet sound alone. An MP3 encoder would, in this situation, remove the sound data for the quieter sound while the louder sound was playing. The ear is said to interpret the quieter sound as continuing underneath, even if the data is not there. I think a diagram would help this explanation.


A .wav file (raw sound data) could be represented like this:

Louder Sound data:  --------       --------      -------
Quieter sound data: ====================================



The .mp3 (edited sound data) rendering of the exact same section of data would look like this in comparison:

Louder Sound data:  --------       --------      -------
Quieter sound data:         =======        ======


In this example, the quieter sound wave is represented by the === line, and the louder sound wave is represented by the --- line. In the .wav file representation above, the quieter sound continues even while the louder sound is being played, as it would in real life. The human ear can still hear the quieter sound, but its focus is on the louder sound while it is playing. The .mp3 file representation takes advantage of this psycho-acoustic property. It removes the data for the quieter sound while the louder sound is playing, and leaves it alone elsewhere. The human ear is essentially "tricked" into thinking that it still hears the quieter sound. A real encoder applies this same idea at a far more complex level than two lines in a diagram.
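The same diagram can be produced by a few lines of code, which may make the rule easier to see. This is a deliberately crude sketch: each character is one slice of time, a sound is either fully present or fully absent, and the function names are my own. A real encoder weighs loudness and frequency rather than flipping sounds on and off.

```python
def parse(track):
    """Turn a diagram row into a list of True/False time slices."""
    return [ch != " " for ch in track]


def render(frames, symbol):
    """Turn a list of True/False time slices back into a diagram row."""
    return "".join(symbol if on else " " for on in frames)


def mask_quiet(loud, quiet):
    """Keep the quiet sound only in slices where the loud sound is silent."""
    return [q and not l for l, q in zip(loud, quiet)]


# The louder sound from the diagram: three bursts with two gaps.
loud = parse("-" * 8 + " " * 7 + "-" * 8 + " " * 6 + "-" * 7)
# The quieter sound plays the whole time in the raw recording.
quiet = parse("=" * 36)

masked = mask_quiet(loud, quiet)

print("Louder:        ", render(loud, "-"))
print("Quieter (.wav):", render(quiet, "="))
print("Quieter (.mp3):", render(masked, "="))
```

The last row keeps the quieter sound only in the two gaps, which is exactly the .mp3 row of the diagram.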

I assume the next question from a reader might be, "Why can I hear the second guitar in my .mp3 file? The first guitar is much, much louder." The answer is that the sound an instrument produces is composed of many sound waves that all come together to form its unique sound, whereas the example above speaks of two separate, pure frequencies. So, yes, you can still hear the second guitar under the first, but, for both instruments, you are not hearing all of the frequencies that would be there, because some are absent entirely from the .mp3 file.

Now, obviously, real-world applications of .mp3 files are not as black and white as this example. They sound very good, so the creators of the file format certainly knew what they were doing. That being said, it is unnerving for people like me to know that I am not hearing every single bit of an instrument's tone quality and timbre when I am listening to an .mp3. In my experience, the difference between an .mp3 and a .wav file is actually very audible.

My first experience with this was a comparison of a 128 kbps CBR .mp3 with a 192 kbps CBR .mp3. I knew the difference between 64 kbps and 128 kbps, but I thought that beyond that it was nearly impossible to tell two files apart. I was ripping one of my own CDs and decided it would be best to go for "overkill" on quality, so I chose the 192 kbps format. In comparison with a similar 128 kbps file, I could discern an audible difference. I bet that you, the reader, can as well; it is no great feat of the ear. Listen to the cymbals especially. They will sound more crystalline in lower quality files, and clearer in higher quality files. I invite you to test this. Take one song from a CD, and rip it in both 128 kbps and 192 kbps formats. If you cannot quite tell the difference, then start by comparing a 64 kbps and a 128 kbps file of the same song. I could not hear the difference between 128 and 192 at first either. Even less discernible, but still very apparent to alert ears, is the difference between a 192 kbps .mp3 and a 256 kbps .mp3. I began to rip everything at anywhere from 192 kbps to 320 kbps (I thought 320 was equal to CD quality at that time), but most commonly in the 192 kbps CBR .wma file format.
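For a sense of scale, here is a quick sketch comparing those bit rates with the data rate of an audio CD. It assumes standard CD audio (44,100 Hz, 16-bit, stereo) and an arbitrary four-minute song. Bit rate alone does not settle how a file sounds, so this only shows how much data each setting keeps.

```python
# Assumed CD audio parameters: 44,100 samples/s, 16 bits, 2 channels.
CD_KBPS = 44_100 * 16 * 2 / 1000   # 1411.2 kbps
SONG_SECONDS = 4 * 60              # an arbitrary four-minute song


def size_mb(kbps, seconds):
    """File size in megabytes for a constant bit rate over a given duration."""
    return kbps * 1000 / 8 * seconds / 1_000_000


def share_of_cd(kbps):
    """Fraction of the CD data rate that a given bit rate retains."""
    return kbps / CD_KBPS


for kbps in (64, 128, 192, 256, 320):
    print(f"{kbps:>4} kbps: {size_mb(kbps, SONG_SECONDS):5.2f} MB, "
          f"{share_of_cd(kbps):.0%} of the CD data rate")

print(f"  CD audio: {size_mb(CD_KBPS, SONG_SECONDS):5.2f} MB")
```

Even at 320 kbps the file carries less than a quarter of the data on the CD, which is why 320 kbps is not the same thing as CD quality.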

Now, after several years, what I was doing has hit me. While 192 kbps is very high quality, it is still missing some of the sound data that would make a recording sound as close to the real setting as possible. There is a very audible difference between a raw sound file and a compressed .mp3, even at a bit rate as high as 192 kbps. So, now comes my plug for lossless audio file formats.
