Sunday, January 29, 2006

IBM Speech Technology Roundup

PC Magazine has an article on speech recognition technology research at IBM. The article starts off reporting on the release of Embedded ViaVoice 4.4 and then goes on to talk about the various bits of speech research going on across IBM - and there is quite a lot.

After the hype of a few years ago quieted down, speech recognition technology has been progressing slowly but steadily, with implementations being fielded, tinkered with, and re-tried until useful applications emerged in the Darwinian world of day-to-day business. Just try making an airline reservation over the phone or calling a major credit/charge card company and you will see what I mean.

Some pretty impressive achievements are claimed in the article, not only in speech recognition but also in language translation. English-Chinese translation is being addressed by a project called MASTOR (Multilingual Automatic Speech-to-Speech Translator), while on-the-fly translation of Arabic television into English subtitles is tackled by a project called Tales. Tales is said to achieve an accuracy rate of between 60 and 70 percent with a four-minute lag (for processing) and 80 percent with a longer one. In an article like this, one usually doesn't get enough detail to satisfy one's technical curiosity, such as how 'accuracy' is defined and under exactly what conditions those results were achieved. But even if these numbers are a little too optimistic, they are still more than good enough to 'gist' Arabic almost in real time.
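The article doesn't say how 'accuracy' was scored. For the speech-recognition stage alone, one common yardstick is word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the system's output into a reference transcript, divided by the number of reference words. The sketch below computes it with a standard edit-distance table. The example sentences are made up, and translated subtitles are usually scored with other measures, so treat this only as an illustration of why the definition matters.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein (edit-distance) table."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edits needed to turn the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution (or match)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Hypothetical example: one word dropped ("will") and one word wrong ("delegations")
ref = "the minister will meet the delegation on tuesday"
hyp = "the minister meet the delegations on tuesday"
wer = word_error_rate(ref, hyp)
print(f"WER = {wer:.2f}, word accuracy = {1 - wer:.2f}")  # WER = 0.25, word accuracy = 0.75
```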

Saturday, January 28, 2006

Video Compression and Enhancement

I recently had two posts, one about compression of CCTV archives and the other about image stabilization, and they actually have something in common. If, while reading the one about CCTV, you thought "If the image is not clear, then they can just filter it", you should understand that it won't work very well.

The reason is that filtering compressed images generally causes the compression artifacts (little 'errors' left behind after removing the image detail) to get amplified, thereby making the filtered image worse than the original compressed one. My attempts (on real-life video, not simulated images) have shown that image stabilization, brightness and contrast are about the only filters that can reliably work on (lossy) compressed video. Edge contrast enhancement and other such techniques that work very well on uncompressed images do not work on compressed ones (at least none of the ones I've seen so far).
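A toy one-dimensional illustration of this effect, under two simplifying assumptions: heavy lossy compression is modeled crudely as keeping only the average of each 8-sample block (roughly what coarse quantization in a block-based codec leaves behind), and edge enhancement is modeled as a simple unsharp mask. Sharpening the smooth original changes nothing, while sharpening the block-coded version makes the jumps at the block boundaries bigger. The function names are illustrative only.

```python
def block_average(signal, block=8):
    """Crude stand-in for heavy lossy coding: keep only each block's mean."""
    out = []
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        mean = sum(chunk) / len(chunk)
        out.extend([mean] * len(chunk))
    return out

def unsharp_mask(signal, amount=1.0):
    """Edge enhancement: add back `amount` times the difference from a 3-tap blur.
    The two end samples are left unchanged to avoid edge-handling effects."""
    out = list(signal)
    for i in range(1, len(signal) - 1):
        blur = (signal[i - 1] + signal[i] + signal[i + 1]) / 3
        out[i] = signal[i] + amount * (signal[i] - blur)
    return out

def max_step(signal):
    """Largest jump between neighboring samples."""
    return max(abs(b - a) for a, b in zip(signal, signal[1:]))

ramp = [float(i) for i in range(32)]  # smooth 'scene detail': a gentle gradient
coded = block_average(ramp)           # what survives heavy compression

print(f"original: {max_step(ramp):.2f} -> {max_step(unsharp_mask(ramp)):.2f}")    # 1.00 -> 1.00
print(f"coded:    {max_step(coded):.2f} -> {max_step(unsharp_mask(coded)):.2f}")  # 8.00 -> 13.33
```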

Tuesday, January 24, 2006

Reconstructed Native American Language

MSNBC has an interesting article about the reconstruction of a Native American language for the upcoming movie about Pocahontas and Captain John Smith called "The New World". A linguist at UNC-Charlotte, named Blair Rudes, resurrected the language from a couple of surviving writings and knowledge of other related, but better documented, Algonquian languages.

Image: New Line Cinema

(Hat-tip: SciTech Daily)

Thursday, January 19, 2006

Motion Blindness

Now here is a medical case that illustrates just how 'strange' the human visual system is:

The patient had great difficulty pouring coffee into a cup. She could clearly see the cup's shape, color, and position on the table, she told her doctor. She was able to pour the coffee from the pot.

But the column of fluid flowing from the spout appeared frozen, like a waterfall turned to ice. She could not see its motion. So the coffee would rise in the cup and spill over the sides.

More dangerous problems arose when she went outdoors. She could not cross a street, for instance, because the motion of cars was invisible to her: a car was up the street and then upon her, without ever seeming to occupy the intervening space.

Even people milling through a room made her feel very uneasy, she complained to Josef Zihl, a neuropsychologist who saw her at the Max Planck Institute for Psychiatry in Munich, Germany, in 1980, because "the people were suddenly here or there but I did not see them moving."

The woman's rare motion blindness resulted from a stroke that damaged selected areas of her brain.

I came across this on the Howard Hughes Medical Institute site. The article goes on to talk about how the visual systems of other species are different. For instance, humans detect motion with their brains while simple vertebrates, such as frogs, do it with their retinas instead (I didn't know that).

Pretty cool stuff. I recommend reading the whole article. By the way, the site also has several more interesting pages on sight and hearing.

New RCFL opened to serve Rocky Mountain region in USA

A new Regional Computer Forensics Laboratory (RCFL) has been opened in Denver, Colorado (USA). The new lab, unsurprisingly christened the Rocky Mountain RCFL, will provide federal, state, and local law enforcement organizations throughout the states of Colorado and Wyoming with digital evidence examination services. The RCFLs are regional collaborations of law enforcement agencies that provide the personnel and local management for the labs. The Operational Technology Division (OTD) of the US Federal Bureau of Investigation (FBI) provides equipment, training, technical, procedural, and financial support to the laboratories. By the end of the year, a total of 14 RCFL facilities are expected to be up and operating.

The RCFL concept is a bit of a departure from the old model of separate forensic labs at the federal, state, and local levels, with the tougher cases being sent up the chain when lower level skill and/or resources were exceeded. The RCFL concept places additional equipment, training, and expertise out in the regions to handle things more locally. Of course, the central federal labs are still there to provide national level skills and resources when needed.

I've had the opportunity to meet and train RCFL personnel and have been positively impressed with the ones I've met thus far. Although it will take some years for the new labs to grow into themselves, so to speak, the collaborative regional concept holds the potential of significant benefits from local leadership, as well as the distribution of workload to the appropriate levels of the nation's laboratory 'network'.

Image: Photo from the Rocky Mountain RCFL Opening (L-R RMRCFL Director, C.Buechner, FBI OTD Assistant Director K.Haynes)

(Hat tip: Nieuwsbank)

Tuesday, January 17, 2006

Stabilization of Bigfoot "encounter" video

Fun link for the day: An animated GIF of a classic Bigfoot video that has been stabilized.

Over the years, I've had a hand in developing automatic video stabilization filters. Simply put, what these filters do is automatically recognize the motion of the subject in the video, then correct (i.e. re-register) each field by moving the field image up/down/sideways to make the subject line up in the same position as in the previous field, then go to the next field and repeat.
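A minimal sketch of that re-registration loop, assuming the motion between fields is a pure integer-pixel translation and that the whole field (rather than a tracked subject) is matched. Each field's shift is found by trying every offset in a small window and keeping the one with the smallest mean absolute difference. `np.roll` is used for brevity, so content wraps around the edges; a real filter would pad or crop instead. The names `estimate_shift` and `stabilize` are hypothetical, not taken from any particular product.

```python
import numpy as np

def estimate_shift(prev, curr, max_shift=8):
    """Return the integer (dy, dx) that best moves `curr` back onto `prev`,
    found by exhaustive search minimizing the mean absolute difference."""
    prev = np.asarray(prev, dtype=float)
    curr = np.asarray(curr, dtype=float)
    m = max_shift  # border excluded from the comparison, since np.roll wraps pixels around
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(curr, (dy, dx), axis=(0, 1))
            err = np.mean(np.abs(shifted[m:-m, m:-m] - prev[m:-m, m:-m]))
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

def stabilize(fields, max_shift=8):
    """Re-register each field to the previous (already stabilized) field, then move on."""
    out = [np.asarray(fields[0], dtype=float)]
    for field in fields[1:]:
        dy, dx = estimate_shift(out[-1], field, max_shift)
        out.append(np.roll(np.asarray(field, dtype=float), (dy, dx), axis=(0, 1)))
    return out

# Usage: a random test 'scene' jittered by known amounts
rng = np.random.default_rng(0)
scene = rng.random((64, 64))
jittered = [scene,
            np.roll(scene, (3, -2), axis=(0, 1)),
            np.roll(scene, (-1, 4), axis=(0, 1))]
steady = stabilize(jittered)
print([estimate_shift(scene, f) for f in jittered])  # [(0, 0), (-3, 2), (1, -4)]
```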

What has been done in this Bigfoot GIF is to combine stabilization with video mosaicking (i.e. making a bigger picture out of a collection of video frames). This is a powerful technique to use on aerial video of the ground or, for that matter, Bigfoot sightings... So, do you think it is just a guy in a suit or not?
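A minimal sketch of the mosaicking step, assuming each frame's position relative to the first has already been estimated (for instance by a registration step like the one described above) and that only translation is involved; real aerial mosaics usually also have to correct for rotation, scale, and perspective. Overlapping pixels are simply averaged, and `build_mosaic` is a hypothetical name.

```python
import numpy as np

def build_mosaic(frames, offsets):
    """Paste equal-size 2-D frames onto one larger canvas.

    offsets -- (row, col) of each frame's top-left corner relative to the
               first frame (assumed already estimated; may be negative)
    Overlapping pixels are averaged; pixels no frame covers are left at 0.
    """
    h, w = frames[0].shape
    rows = [r for r, _ in offsets]
    cols = [c for _, c in offsets]
    top, left = min(rows), min(cols)
    total = np.zeros((max(rows) - top + h, max(cols) - left + w))
    count = np.zeros_like(total)
    for frame, (r, c) in zip(frames, offsets):
        r0, c0 = r - top, c - left
        total[r0:r0 + h, c0:c0 + w] += frame
        count[r0:r0 + h, c0:c0 + w] += 1
    return np.divide(total, count, out=np.zeros_like(total), where=count > 0)

# Usage: three overlapping 'video frames' cut from a larger synthetic scene
rng = np.random.default_rng(1)
scene = rng.random((40, 60))
frames = [scene[0:20, 0:30], scene[5:25, 10:40], scene[20:40, 30:60]]
offsets = [(0, 0), (5, 10), (20, 30)]
mosaic = build_mosaic(frames, offsets)
print(mosaic.shape)  # (40, 60)
```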

(Hat tip: Digg)

Second live e-Passport trial begun at SFO

The US Department of Homeland Security has announced the beginning of the second live trial of a prototype electronic passport system (e-Passport) based on personal biometric data carried in a contactless chipset that is embedded in the passport. Australia, Singapore, New Zealand, and the USA are participating in this second trial of the technology, which is a key part of the US-VISIT program for immigration control. The first trial highlighted several issues that have since been addressed and are now being tested in this second trial, such as the theoretical ability of eavesdroppers to pick up the personal data as it is being transferred over the contactless link.

Hat tip: Contactless News

Real-life forensic video - Bosnian terrorist case

International Relations and Security Network (ISN) Security Watch reports on an investigation in Bosnia that turned up video evidence in the form of a video recorder and a taped 'last will and testament'. The evidence was sent to the US FBI for analysis - presumably by FAVIAU, the Forensic Audio Video and Image Analysis Unit. The evidence was collected during an investigation of suspected Islamic terrorists, which is now being looked at for linkages to the 7 July 2005 bombings in London, UK.

"Some 30 kilograms of explosives, dozens of guns, a suicide bombers vest, and a videotaped last will and testament were confiscated in raids on three apartments being rented out by the two suspects arrested in October.

The video tape shows the two men asking God for forgiveness for the sacrifice they were about to make. The two suspects are also shown making bombs, including one planted in a lemon and another planted in a tennis ball.

Police also found face masks worn by two of the suspects in the videotape and hair samples from those face masks believed to belong to one of the suspects. However, Bosnian forensics teams do not have the technical capability to analyze these samples.

The Bosnian authorities sent the videotape, a video camera, and other evidence to the US Federal Bureau of Investigation (FBI), which sources said were analyzing the samples.

The FBI’s forensic tests had shown that the video camera was the same one used to record the confiscated video tape containing the bombing-making evidence and the last will and testament. The hair analysis is expected later this month. The voice samples from the videotape could not be verified, the Bosnian police source said.

The source also said that comparing the number of weapons found in the apartment rented by Bektasevic and Abdulkadir with the number of weapons seen on the videotape, it was clear that weapons and explosives were still unaccounted for."

I emphasized the audio and video related items with a bold font in the quotation.

Having both the original tape and the recorder allowed the examiners to match them up. This can be done by different means, depending on the type of recorder, including using ferrofluid and a microscope to look at the magnetic tape (i.e. 'rusty plastic') to see the patterns left by the head while erasing/writing.

The examiners reportedly weren't able to successfully perform speaker ID (identification). Typically, this can be due to several factors, including: muffling of the speech (in this case by the masks they were wearing), interfering noise, or insufficient speech quality (e.g. level, dynamic range, frequency response, bandwidth).

This case also highlights how forensic units around the world can be of assistance to one another, as well as how proper equipment and techniques can yield valuable clues and evidence.

Nikon reshapes product line as it adjusts to today's digital reality

Nikon has announced that it is 'reshaping' its product line to emphasize digital and de-emphasize film-based cameras and accessories. In its press release, Nikon says it will be
"discontinuing production of all large format Nikkor lenses and enlarging lenses, as well as several of our film camera bodies, manual focus Nikkor interchangeable lenses and related accessories. Sales of these products will cease as supplies are depleted."
Tempus fugit (sometimes translated as "Time marches on"). I wonder if there is a Latin translation for "technology marches on"?

(Hat tip: Metafilter)

Saturday, January 14, 2006

Photography and US Privacy Rights

Andrew Kantor has a well-researched column in USA Today on what you can (and can not) photograph in the US. He also links to multiple sites, including one with a handy brochure written for photographers that summarizes the rules and another written with Missouri in mind, but which appears to generally apply elsewhere.

In case you aren't aware of it, US privacy laws place restrictions on taking pictures or making videos based on a person's reasonable expectations of privacy. There are also restrictions about taking pictures of military installations and the like - and after 9/11, we can all appreciate that.

(Note: I am not a lawyer and can not give legal advice. Like Kantor says, I am researching and reporting, not giving a legal opinion, so take my comments as holding no legal weight whatsoever.)

Hat tip Geek Press.

Friday, January 13, 2006

Nation-wide License Tag Recognition System

The Register (a UK computer and tech news site) reports that an automatic number plate recognition (ANPR) system will go live in April of this year (2006). The system will be used to log vehicle movements across the whole of the UK in order to apprehend suspected criminals as well as identify and track stolen, unregistered, and untaxed vehicles. The system will have cameras every 400 yards (about 366 meters) along motorways ('interstates', in US English) and will capture images 24 hours per day, 7 days per week. Given the traffic on UK motorways, that means processing about 50 million number plates (a.k.a. license tags) a day. A mobile ANPR unit is shown in the photo here.
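A quick check of the figures quoted above. The only outside fact used is the definition of the international yard (exactly 0.9144 m): 400 yards is about 366 m, and 50 million reads a day works out to an average of roughly 580 plates every second, around the clock, before allowing for rush-hour peaks.

```python
YARD_M = 0.9144                # international yard, exact by definition
spacing_m = 400 * YARD_M       # camera spacing along the motorway

plates_per_day = 50_000_000
seconds_per_day = 24 * 60 * 60
rate = plates_per_day / seconds_per_day  # average, ignoring daily traffic peaks

print(f"camera spacing: {spacing_m:.1f} m")         # 365.8 m
print(f"average load:   {rate:.0f} plates/second")  # 579 plates/second
```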

Hat tip Technovelgy.

Viisage and Identix to Merge into Single Biometric Giant

TechWeb News reports that Viisage and Identix, two biometrics firms, are merging.

The complex transaction will result in the creation of a dominant biometric ID platform and a company with a new name--to be released later--and a new headquarters in Stamford, Conn.

Viisage will contribute stock worth about $770 million in the deal and the firm's current chairman Robert V. LaPenta will become chairman and chief executive officer of the combined company. Dr. Joseph J. Atick, now chief executive officer of Identix, will become chief strategic officer and vice chairman of the board.

In a statement, LaPenta said: "The combination of Identix' advanced multi-biometric search technology with Viisage's expertise in secure credentials, document authentication, and verification, will create a global leader in biometric security, providing end-to-end identity solutions for state, local, national, and foreign government use, as well as a wide application across the commercial sector."

The firm said it expects the merged enterprise will record revenue of $220 million in 2006. The companies said the merger will enable them to compete for 80 percent of the market they address.

Hat tip InformationWeek

CCTV and Compression

Some weeks ago I posted some thoughts on audio compression and made a mental note to follow up with a similar post regarding video. A series of attacks, including one that ended in a death, recently made the news (via Drudge Report) and brought this back to mind. The CCTV security camera images shown obviously are not HDTV (High Definition TeleVision) quality, to say the least. Surely CCTV systems are intended for such situations and one would expect to have sufficient resolution to ID (identify) the individuals in the images, right? So why are the images so poor?

Before I go on, let me say that the comments I am about to make are general ones using this particular case as a taking-off point instead of an example. I say this because I have not seen the originals and therefore can not say for certain that my comments will apply to this particular case. What I do know from personal experience is that in the majority of similar cases, the live video has very good resolution. It is the archived (i.e. stored) video that is obtained by detectives and examiners that is the problem. How can this be?

Well, imagine that you are a proverbial fly-on-the-wall during the procurement of a CCTV system for a company or municipality. System specifications, performance, and cost are obviously key factors in selecting the vendor and particular system configuration. Vendors 'A' and 'B' both separately present similar systems - the same numbers of cameras, warranty, help desk support, and so on. During the live demonstration, you (the fly) see crystal clear, ID-quality color pictures of people on the street and the operator can pan-tilt-zoom the cameras to follow specific individuals. "Great, I'll have one of those!", you think. But when the time comes to tally up the bill, one of the vendors inevitably bids a bit lower or, if not, the purchasing agent negotiates them down. In either case, where do you think is an easy place to find a quick cost saving? By cutting the number of cameras? Certainly not - that could mean not covering all of the area. Going with black-and-white? No again. Squeezing more out of the video storage? Now you're talking. "How can they do that?", you might ask if you could, but you can't, because you're a fly, remember? Don't they need to archive so many days' worth of images to meet the specification? Yes, but by turning the 'Wonder Knob' (i.e. the compression ratio selector), all of a sudden they can fit the same number of days of video into half the space, and what is seen on the live monitor doesn't change a bit. Money saved - job done. Just don't go and look at the archived images. Compressing the images almost always means throwing significant information away, in the form of entire frames, picture detail, or both. How else can you fit the same number of days of storage into a smaller space?
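Some back-of-the-envelope arithmetic shows why the 'Wonder Knob' is so tempting. The numbers below are hypothetical (16 cameras, 30 days of retention, 25 recorded frames per second, and per-camera bit rates of 2 and 1 Mbit/s), not figures from any particular system. Halving the bit rate halves the disk space, but it also halves the average number of bits the encoder has to describe each frame, and that is where the detail goes.

```python
def storage_terabytes(cameras, megabits_per_second, days):
    """Disk space needed to archive `days` of continuous video from every camera."""
    seconds = days * 24 * 60 * 60
    total_bits = cameras * megabits_per_second * 1e6 * seconds
    return total_bits / 8 / 1e12  # bits -> bytes -> terabytes (decimal)

def bits_per_frame(megabits_per_second, frames_per_second):
    """Average budget the encoder has to describe each recorded frame."""
    return megabits_per_second * 1e6 / frames_per_second

# Hypothetical system: 16 cameras, 30-day retention, 25 frames/s recorded
for rate in (2.0, 1.0):  # 'Wonder Knob' turned from 2 Mbit/s down to 1 Mbit/s per camera
    print(f"{rate} Mbit/s: {storage_terabytes(16, rate, 30):.1f} TB, "
          f"{bits_per_frame(rate, 25) / 1000:.0f} kbit per frame")
# 2.0 Mbit/s: 10.4 TB, 80 kbit per frame
# 1.0 Mbit/s: 5.2 TB, 40 kbit per frame
```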

Time passes. The system is installed and everything seems to be going along just fine until a crime occurs and the police show up to recover the video. Then, lo and behold, the archived video looks nowhere near as good as it did when it was live. Now the police have a distant, slightly fuzzy image of a probable male suspect in poor lighting. Then when they zoom in for a close-up, the face becomes pixelated and nearly unusable. That sinking feeling they experience can't be very pleasant, even when it was mostly expected...

Seems a bit shameful after all that money spent on the CCTV system, doesn't it?

If any CCTV system managers or operators are reading this, please go turn the 'Wonder Knob' back down. The police will thank you and so will the public. On top of that, you will probably save yourself an unpleasant question-and-answer session with your supervisor when an event does eventually occur.

(image source: Ft. Lauderdale Police Department, Florida, USA)

US Defense Lab Accredited

The American Society of Crime Laboratory Directors' Lab Accreditation Board (ASCLD/LAB) has accredited the Defense Computer Forensic Laboratory (DCFL). DCFL is the US DoD (Department of Defense) organization that performs electronic and computer media forensic restoration, enhancement, and analysis for all of DoD.

This reportedly makes DCFL the largest accredited lab of its type in the world. DCFL was spun up some years ago by merging a unit of the US Air Force's Office of Special Investigations (AF/OSI) with resources from the Army and Navy, if memory serves.

In case you are wondering what 'computer forensics' has to do with sound and light, DCFL's mission also includes cell phone, PDA, audio, and video forensics. As noted in previous posts, this is the case in many organizations.

External, independent accreditation is now being sought, or at least considered, by most US audio/video forensic labs. This wasn't done in the past, but electronic/computer media labs have started becoming more like their 'wet lab' brethren and instituted drives for certification/accreditation. This is not to say that the labs did not approach their responsibilities professionally before. All US and European labs, in my experience, developed and maintained their skills, processes, and procedures through a combination of their own in-house training (including examination and mock-jury trials), vendor training (provided by equipment manufacturers and former professional examiners), law enforcement association training seminars (e.g. NATIA training seminars), and red-blue team comparisons. (Disclaimer: I have been and continue to be involved in forensic R&D, training, and red-blue teaming in my day job). Accreditation will now document and subject the processes and procedures to external review and certification.

As the saying goes, nothing is constant except change...

Hat tip to Government Computer News.

Wednesday, January 11, 2006


While surfing the Internet, I came across a neat site run by the Department of Physics and Astronomy at Georgia State University called HyperPhysics. It has well-executed sections on my favorite topics (i.e. sound & hearing AND light & vision), as well as many others. Basic and advanced principles are illustrated with clear, informative graphics and videos. I particularly liked the movie of the Kundt Tube Experiment for showing standing waves using cork shavings in a clear tube (see picture). Enjoy.

PS. GSU also runs a sister site called HyperMath that is similarly well done. You can navigate to it from the above link.

Monday, January 09, 2006

Mysterious Charon gradually being revealed

The 5 January issue of Nature reports findings by astronomers from MIT and the Paris Observatory that add to the little that we know about Charon, Pluto's 'moon', as it is commonly referred to (although its status as a moon is disputed by some).

Using an occultation, in which Charon passed in front of (and blocked) a distant star, the astronomers were able to determine several things about Charon. When Charon passed in front of the star, the starlight was cut off abruptly, which implies the absence of a significant atmosphere (otherwise it would have faded out more gradually). By observing the occultation from two locations, they were able to determine its diameter. Then, by combining knowledge of its size with prior estimates of its mass, they were able to estimate its density and, from that, its composition (60% rock and 40% ice). Now that is an impressive amount of logical deduction from what amounts to using a stop watch to time a blinking (star) light.
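The chain of reasoning can be sketched numerically. This is a simplified model, not the astronomers' actual analysis: Charon is treated as a perfect sphere, each site's chord is taken as shadow speed times blackout duration, the two chords are assumed parallel with a known separation d, and the composition comes from a two-component rock/ice mix with assumed densities of 3.0 and 1.0 g/cm^3. With half-chords a and b, the radius R follows from a^2 + y^2 = R^2 and b^2 + (y + d)^2 = R^2, where y is the first chord's offset from the center. All input numbers are illustrative values of roughly the right size for Charon, not the published measurements, and the function names are hypothetical.

```python
import math

def chord_length_km(shadow_speed_km_s, blackout_s):
    """Length of the star's track behind the body as seen from one site."""
    return shadow_speed_km_s * blackout_s

def radius_from_two_chords(chord1_km, chord2_km, separation_km):
    """Radius of a circle cut by two parallel chords a known distance apart.
    Solves a^2 + y^2 = R^2 and b^2 + (y + d)^2 = R^2 for y, then for R."""
    a, b, d = chord1_km / 2, chord2_km / 2, separation_km
    y = (a * a - b * b - d * d) / (2 * d)  # signed offset of chord 1 from the center
    return math.sqrt(a * a + y * y)

def bulk_density_g_cm3(mass_kg, radius_km):
    """Mean density of a sphere of the given mass and radius, in g/cm^3."""
    radius_cm = radius_km * 1e5
    volume_cm3 = 4.0 / 3.0 * math.pi * radius_cm ** 3
    return mass_kg * 1e3 / volume_cm3

def rock_mass_fraction(density, rho_rock=3.0, rho_ice=1.0):
    """Two-component mix by mass: 1/density = f/rho_rock + (1 - f)/rho_ice, solved for f."""
    return (1 / rho_ice - 1 / density) / (1 / rho_ice - 1 / rho_rock)

# Illustrative inputs only: assumed shadow speed, blackout times, chord separation, and mass
speed = 24.0  # km/s
r = radius_from_two_chords(chord_length_km(speed, 49.3),
                           chord_length_km(speed, 47.14),
                           separation_km=300.0)
rho = bulk_density_g_cm3(1.5e21, r)
print(f"radius ~ {r:.0f} km, density ~ {rho:.2f} g/cm^3, "
      f"rock ~ {rock_mass_fraction(rho):.0%} by mass")
```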

Both teams of astronomers posted video (Paris and MIT). Be careful on the MIT clip as it crashed both Firefox (1.5) and IE for me.

Hat tip ScienceNOW Daily News.

Sunday, January 08, 2006

Adaptive Audio Forensic Filters

A point of confusion shared by many people is exactly what is meant by the term adaptive filter. A further point of confusion is why adaptive filters are needed at all. Adaptive filters, used in this context, are filters that automatically reduce noise(s) by changing their own 'settings'. This is in contrast to fixed filters that are set once and stay that way until the user changes them.

For an adaptive filter to change its own settings, it must be given some sort of rule to follow so that it will know which sounds to throw away and which to keep. The rule may be as simple as 'mute when the sound level goes very low to remove any leftover hiss' or 'turn the gain down if the signal gets too loud and may clip'. At the other extreme, the rule can be as complex as 'figure out when no one is speaking, assume what is left is all noise, and then change your own settings to remove the noise even when the speech comes back'.
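The two simple rules above can be sketched as a block-by-block gate and limiter. This is a deliberately crude illustration with hypothetical thresholds (mute blocks quieter than -50 dB relative to full scale, and scale down blocks whose peak exceeds half of full scale); real gates and limiters smooth their gain changes with attack and release times so that the switching itself doesn't add clicks.

```python
import math

def adaptive_gate_and_limit(samples, block=256, gate_db=-50.0, limit=0.5):
    """Apply two simple adaptive rules, block by block (full scale = 1.0).

    Rule 1 (noise gate): if a block's RMS level is below `gate_db`, mute it.
    Rule 2 (limiter): if a block's peak exceeds `limit`, turn that block's
    gain down so its peak just reaches `limit`.
    """
    out = []
    for start in range(0, len(samples), block):
        chunk = samples[start:start + block]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        level_db = 20 * math.log10(rms) if rms > 0 else -math.inf
        peak = max(abs(x) for x in chunk)
        if level_db < gate_db:
            gain = 0.0           # rule 1: very quiet, treat as leftover hiss
        elif peak > limit:
            gain = limit / peak  # rule 2: too loud, may clip
        else:
            gain = 1.0           # leave the block alone
        out.extend(gain * x for x in chunk)
    return out

# Usage: a hiss-only block, a near-clipping block, and a normal block
quiet = [0.001] * 256        # about -60 dB
loud = [0.9, -0.9] * 128     # peaks near full scale
normal = [0.1] * 256         # about -20 dB, left untouched
out = adaptive_gate_and_limit(quiet + loud + normal)
print(max(abs(x) for x in out[:256]), max(abs(x) for x in out[256:512]))
```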

At this point, you can probably see some of the potential pitfalls in using adaptive filters - if they make wrong judgements, they can accidentally remove desired speech along with the undesired noise. Or they can make the speech sound very sterile, like it was recorded in an anechoic chamber. Or they can go haywire and create static and distortion. For these reasons, forensic examiners need to follow the principle commonly associated with the physicians' Hippocratic Oath when using these powerful filters - first, do no harm.

If they can be so dangerous, then why use adaptive filters at all? The reason is that many common noises can not be removed completely, or even at all, using fixed filters. Take echo, for instance. Real-world echoes change constantly. Every time the speaker turns his head or a door opens, the echo paths in a room shift around. What it all boils down to is if the character of the noise changes, then a filter that changes itself is often required to remove it. If you want to be an expert audio forensic technician or examiner, mastery of adaptive filters is a must.
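Echo is the classic case for a filter that changes itself. Below is a minimal sketch of one textbook approach, the normalized least-mean-squares (NLMS) adaptive filter, under a big assumption: a clean reference copy of the sound that produces the echo is available (as with a telephone line echo canceller, or a two-channel setup with a separate reference microphone near the noise source). The filter continually re-estimates the echo path from that reference and subtracts its prediction, so it can follow the path as it shifts. Single-channel room recordings without a reference call for other techniques. The filter length, step size, and synthetic echo path are assumed values, and this is not presented as how any particular forensic product works.

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, taps=32, mu=0.5):
    """Normalized LMS adaptive filter.

    far_end -- reference signal that produces the echo (e.g. the loudspeaker feed)
    mic     -- what the microphone picked up: local sound + echo of far_end
    Returns (mic with the estimated echo removed, final echo-path estimate).
    """
    w = np.zeros(taps)      # current estimate of the echo path
    x_buf = np.zeros(taps)  # most recent reference samples, newest first
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        e = mic[n] - w @ x_buf  # what is left after subtracting the predicted echo
        out[n] = e
        # Nudge the weights to shrink the error, normalized by the reference power
        w += mu * e * x_buf / (x_buf @ x_buf + 1e-8)
    return out, w

# Usage: echo only (no local talker), through a hypothetical 4-tap echo path
rng = np.random.default_rng(0)
far_end = rng.standard_normal(5000)
echo_path = np.array([0.6, 0.0, 0.3, -0.2])
mic = np.convolve(far_end, echo_path)[:len(far_end)]
cleaned, weights = nlms_echo_canceller(far_end, mic)
print(np.round(weights[:4], 3))
```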

Saturday, January 07, 2006

Can you take away the noise and pull up the voice?

In my day job, I regularly get phone calls and emails from people who want to know basically the same thing - might it be possible to take away the noise and pull up the voice on a particular audio recording? Of course, without hearing the recording first there is only so much I can say. That being said, what is really needed at that point is an indication of whether it is a lost cause or not before going to the time and effort of submitting the recording for a formal evaluation.

So, how can one legitimately respond to such a question? First, I ask a few questions, such as:
  • Was it legally recorded? If yes, was it one-party or two-party consent? If not, they may need to contact legal counsel rather than me.
  • Is it a criminal, civil, or professional matter? If criminal, have they already contacted the appropriate authorities?
  • What kind of recorder is it (e.g. pocket recorder, cassette deck, or answering machine)?
  • What is the noise/interference like (e.g. hum, hiss, another talker, music, machinery, pops/clicks, or mobile/cell phone)?
  • How loud is the desired voice relative to the noise?
  • Is the original recording available? If so, what type of media is it on?
  • Is there a digital copy that could be emailed for an informal evaluation?
Leaving aside the questions regarding legal matters, what I try to find out is how good the recording is (i.e. bandwidth, dynamic range, wow-and-flutter, etc.), how much noise will have to be removed to hear most or all of the words, and, after the noise is taken away, will there be enough speech left to understand.
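The question "How loud is the desired voice relative to the noise?" is essentially asking for the signal-to-noise ratio (SNR). Given a digital copy, a rough estimate can be made by measuring the power of a stretch with speech and of a stretch with noise only. The sketch below assumes the noise is steady and uncorrelated with the voice; real speech levels vary a lot, so this is only a first indication, not a formal evaluation. The synthetic 'voice' and hum signals are made up for illustration.

```python
import math

def mean_power(samples):
    return sum(x * x for x in samples) / len(samples)

def estimate_snr_db(speech_plus_noise, noise_only):
    """Rough SNR in dB: (power with speech - power of noise alone) / noise power.
    Assumes steady noise that is uncorrelated with the speech."""
    p_total = mean_power(speech_plus_noise)
    p_noise = mean_power(noise_only)
    p_speech = max(p_total - p_noise, 1e-12)  # guard against a negative estimate
    return 10 * math.log10(p_speech / p_noise)

# Usage: one second at 8 kHz of a 300 Hz stand-in 'voice' plus 60 Hz hum
rate = 8000
noise = [0.05 * math.sin(2 * math.pi * 60 * n / rate) for n in range(rate)]
voice = [0.2 * math.sin(2 * math.pi * 300 * n / rate) for n in range(rate)]
mixed = [v + h for v, h in zip(voice, noise)]
print(f"estimated SNR: {estimate_snr_db(mixed, noise):.1f} dB")  # estimated SNR: 12.0 dB
```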

There is no way to avoid the uncertainty due to differences in terminology, technical savvy, and the like, but going through this question and answer process usually does let me give them some useful feedback and some confidence in whether it is likely worthwhile to proceed or not.

365 Days of Astronomy - Free Download

Universe Today has posted a downloadable 400+ page astronomy guide called What's Up 2006 (pdf - 13.5 MB). Each daily entry has suggested skywatching targets and techniques, plus a photo and facts. Both binocular and telescope users will find plenty of interest.

Bloggers have noticed that some of the dates don't match up to the correct days of the week, but, that aside, it looks great. I'm certainly enjoying it.

Hat tip SlashDot.

Through wall motion detector

The DARPA (US Defense Advanced Research Projects Agency) Special Projects Office has announced Radar Scope, a nifty gadget that acts like a 'stud finder' for motion on the other side of a wall. The DARPA prototype apparently can 'see' through 12 inches (30.5 cm) of concrete and then an additional fifty feet (15.24 meters) into a room. The article also says that it is sensitive enough to detect even the motion of breathing. The latter claim is particularly impressive. I'm curious to know whether it is detecting some change in the person's chest cavity or getting a reflection off of dog tags or some other body-worn or -carried article.

Even though DARPA is best known for its cutting-edge research, this technology is believed to be mature enough that their plan is to field it in Iraq at the squad level this coming spring. From the picture, the device housing looks field-ready and the typical gotcha for many advanced electronic technologies in the field (namely, the power source) looks to be a non-problem since it runs off of AA batteries.

This capability seems like it could be a nice complement to the existing ways of checking a room remotely - polecams (camera on a pole held up to the window), climbing robots, and contact mics (or accelerometers).

The article goes on to say that proposals for a follow-on technology program called Visi Building are being taken. The aim for this program is to go from motion detection to actual through-wall imaging - a much harder task.

Hat tip to Engadget.

Wednesday, January 04, 2006

Do some monkeys have southern accents?

National Geographic reports on the results of a study that found that Japanese macaques (Macaca fuscata) have regional accents. The researchers chose two groups of monkeys in Japan that were descended from the same original population but now reside hundreds of miles apart. They found that one group had an average 'tone' (I assume they mean 'fundamental frequency') that was 110 hertz higher than the other.
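Assuming 'tone' does mean fundamental frequency, the comparison boils down to measuring the f0 of many calls from each group and comparing the averages. Below is a minimal sketch of one classic way to measure it, autocorrelation pitch estimation; the 200 to 2000 Hz search range and the synthetic test 'call' are assumptions for illustration, not the researchers' method or data. Because the answer is quantized to whole-sample lags, the resolution near 600 Hz at a 48 kHz sample rate is only about 7 to 8 hertz, which is still fine for spotting a 110 hertz difference.

```python
import numpy as np

def estimate_f0(signal, sample_rate, f_min=200.0, f_max=2000.0):
    """Estimate the fundamental frequency (Hz) of a voiced sound by picking the
    autocorrelation peak among lags corresponding to f_max...f_min."""
    x = np.asarray(signal, dtype=float)
    x = x - np.mean(x)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0, 1, 2, ...
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sample_rate / best_lag

# Usage: a synthetic 0.1-second 'call' with a 600 Hz fundamental and one harmonic
sr = 48_000
t = np.arange(4800) / sr
call = np.sin(2 * np.pi * 600 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
print(estimate_f0(call, sr))
```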

The researchers offer an intriguing acoustic explanation for the difference. One of the groups of monkeys has been living for decades in a forest while the other has been living in a rocky area with little vegetation - two very different acoustic propagation environments. High frequencies would tend to travel better than low frequencies in the forest and that correctly corresponds to the monkeys living in the forest using the higher frequency.

The researchers controlled for several factors (sex, time, type of vocalization, and activity) and analyzed a large sample of data to come up with their results. The full paper will be published in this month's Ethology (German scientific journal).