Thursday, December 22, 2005

Perceptual Compression and Audio Forensics

Most digital-age or Internet-savvy people have probably been exposed to some form of perceptual audio compression by now, perhaps without knowing it. These audio compression schemes, of which MP3 is one currently popular example, shrink audio files by impressive amounts. MP3 and similar offerings from Apple (AAC), Microsoft (WMA), Sony (ATRAC), and others are based on the subtleties of psychoacoustics (i.e., human audio perception). In simple terms, these perceptual compression schemes throw away anything that the average human couldn’t be expected to hear clearly. Unfortunately, at every place the compression algorithm ‘cuts out’ some sound it wants to discard, a bit of noise (from the resulting discontinuity) is left behind. Fortunately, these bits of noise can then be hidden using other psychoacoustic sleight-of-hand tricks. What is left at the end is an audio file that is much smaller than the original yet sounds remarkably similar to it.

Technical aspects aside, perceptual compression is a boon for users of portable media players: the file sizes are small (letting one carry lots of music on a small player), and the loss of acoustic detail and warmth isn't as obvious in the places where portable players are typically used (e.g., on the train or while jogging). Of course, those who like their music (and still have their hearing) have found that run-of-the-mill MP3 files don't sound so good when played over a hi-fi system in the quieter environs of home. But I digress... [For more on better MP3 codecs for audiophiles, see this Wired article.]

To get back on track: what is perfectly fine for compressing a professionally recorded, mixed, and mastered music album is not necessarily fine for storing evidentiary audio that may have to be forensically filtered. Why not? The short answer is that a recording needs filtering precisely because the speech is not loud and clear enough, there is masking noise, or both. Perceptual compression works by throwing away the sounds that the average human doesn’t hear clearly anyway, so what gets removed? You guessed it: the speech. So when the audio examiner removes the noise, there isn't much left to be "revealed." At that point, MP3 and similar schemes turn from boon into bane.
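A toy sketch makes the problem concrete. The Python below is a hypothetical illustration, not how MP3 or any real codec works: it stands in for a psychoacoustic model with a single flat threshold (an assumed `threshold_db=-30.0`) that zeroes every frequency bin more than 30 dB below the loudest one. The "recording" is a loud noise tone plus a "speech" tone 40 dB quieter; the bin numbers and levels are made up for the demo. Filtering the noise out of the uncompressed recording leaves the speech intact, while filtering the compressed version leaves nothing behind.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform (fine for a short demo frame)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse of dft(); keeps the real part, since the demo signals are real."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

def toy_perceptual_compress(x, threshold_db=-30.0):
    """Zero every bin more than |threshold_db| dB below the loudest bin.

    Hypothetical stand-in for a psychoacoustic model: real codecs use
    frequency-dependent masking curves, not one flat threshold per frame.
    """
    spectrum = dft(x)
    peak = max(abs(c) for c in spectrum)
    floor = peak * 10 ** (threshold_db / 20)
    return idft([c if abs(c) >= floor else 0 for c in spectrum])

def notch_out(x, bins):
    """Toy 'forensic filter': remove the given noise bins and their mirror images."""
    n = len(x)
    spectrum = dft(x)
    for k in bins:
        spectrum[k] = 0
        spectrum[(n - k) % n] = 0
    return idft(spectrum)

def level(x, k):
    """Magnitude of frequency bin k of signal x."""
    return abs(dft(x)[k])

N = 64
NOISE_BIN, SPEECH_BIN = 5, 12  # hypothetical frequencies chosen for the demo
recording = [1.0 * math.cos(2 * math.pi * NOISE_BIN * t / N)      # loud masking noise
             + 0.01 * math.cos(2 * math.pi * SPEECH_BIN * t / N)  # quiet "speech" (-40 dB)
             for t in range(N)]

cleaned_original = notch_out(recording, [NOISE_BIN])
cleaned_compressed = notch_out(toy_perceptual_compress(recording), [NOISE_BIN])

print(f"speech left after filtering the original:   {level(cleaned_original, SPEECH_BIN):.3f}")
print(f"speech left after filtering the compressed: {level(cleaned_compressed, SPEECH_BIN):.3f}")
```

Run as-is, the first line reports a speech level of about 0.320 and the second 0.000. A real psychoacoustic model is far more selective than this flat cutoff, but the consequence is the same: whatever the encoder judges inaudible is discarded for good, and no later noise reduction can bring it back.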

[Note: Expect additional posts on perceptual compression in the near future due to their prevalence.]

1 comment:

Keith said...

Well, I've learned a few hard lessons today about HTML, Microsoft Word, Blogger, and blogging... I ended up having to take down this post and re-post it after editing the HTML by hand. Any experienced blogger reading this is probably thinking "sounds like a newbie error" or something of the sort, and they are right!