Sunday, January 29, 2006
After the hype of a few years ago quieted down, speech recognition technology has been progressing slowly but steadily, with implementations being fielded, tinkered with, and re-tried until useful applications emerged in the Darwinian world of day-to-day business. Just try making an airline reservation over the phone or calling a major credit/charge card company and you will see what I mean.
Some pretty impressive achievements are claimed in the article, not only in speech recognition but also in language translation. English-Chinese translation is being addressed by a project called MASTOR (Multilingual Automatic Speech-to-Speech Translator), while on-the-fly translation of Arabic television to create English subtitles is tackled by project Tales. Tales is said to achieve an accuracy rate of between 60 and 70 percent with a four-minute lag time (for processing) and 80 percent with a longer one. In an article like this, one usually doesn't get enough details to satisfy one's technical curiosity - such as how 'accuracy' is defined and under exactly what conditions those results were achieved - but even if these numbers are a little too optimistic, the system is still more than good enough to 'gist' Arabic in almost real time.
Saturday, January 28, 2006
The reason is that filtering compressed images generally causes the compression artifacts (little 'errors' left behind after removing the image detail) to get amplified, thereby making the filtered image worse than the original compressed one. My attempts (on real-life video, not simulated images) have shown that image stabilization, brightness and contrast are about the only filters that can reliably work on (lossy) compressed video. Edge contrast enhancement and other such techniques that work very well on uncompressed images do not work on compressed ones (at least none of the ones I've seen so far).
Tuesday, January 24, 2006
You may have to scroll down to find the article...
Image: New Line Cinema
(Hat-tip: SciTech Daily)
Thursday, January 19, 2006
I came across this on the Howard Hughes Medical Institute site. The article goes on to talk about how the visual systems of other species are different. For instance, humans detect motion with their brains while simple vertebrates, such as frogs, do it with their retinas instead (I didn't know that).
The patient had great difficulty pouring coffee into a cup. She could clearly see the cup's shape, color, and position on the table, she told her doctor. She was able to pour the coffee from the pot.
But the column of fluid flowing from the spout appeared frozen, like a waterfall turned to ice. She could not see its motion. So the coffee would rise in the cup and spill over the sides.
More dangerous problems arose when she went outdoors. She could not cross a street, for instance, because the motion of cars was invisible to her: a car was up the street and then upon her, without ever seeming to occupy the intervening space.
Even people milling through a room made her feel very uneasy, she complained to Josef Zihl, a neuropsychologist who saw her at the Max Planck Institute for Psychiatry in Munich, Germany, in 1980, because "the people were suddenly here or there but I did not see them moving."
The woman's rare motion blindness resulted from a stroke that damaged selected areas of her brain.
Pretty cool stuff. I recommend reading the whole article. By the way, the site also has several more interesting pages on sight and hearing.
The RCFL concept is a bit of a departure from the old model of separate forensic labs at the federal, state, and local levels, with the tougher cases being sent up the chain when lower level skill and/or resources were exceeded. The RCFL concept places additional equipment, training, and expertise out in the regions to handle things more locally. Of course, the central federal labs are still there to provide national level skills and resources when needed.
I've had the opportunity to meet and train RCFL personnel and have been positively impressed with the ones I've met thus far. Although it will take some years for the new labs to grow into themselves, so to speak, the collaborative regional concept holds the potential of significant benefits from local leadership, as well as the distribution of workload to the appropriate levels of the nation's laboratory 'network'.
Image: Photo from the Rocky Mountain RCFL Opening (L-R RMRCFL Director, C.Buechner, FBI OTD Assistant Director K.Haynes)
(Hat tip: Nieuwsbank)
Tuesday, January 17, 2006
Over the years, I've had a hand in developing automatic video stabilization filters. Simply put, what these filters do is automatically recognize the motion of the subject in the video, then correct (i.e. re-register) each field by moving the field image up/down/sideways to make the subject line up in the same position as in the previous field, then go to the next field and repeat.
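To make the re-registration idea concrete, here is a minimal sketch in plain Python. It is not the code from any filter I have worked on: it assumes small grayscale frames stored as lists of rows, it handles whole-pixel up/down/sideways shifts only, and it uses a brute-force search for the shift with the lowest average pixel difference. The function names (shift_frame, estimate_shift, stabilize) are made up for this illustration.

```python
def shift_frame(frame, dy, dx, fill=0):
    """Return a copy of frame moved down by dy rows and right by dx columns."""
    h, w = len(frame), len(frame[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = frame[sy][sx]
    return out


def estimate_shift(reference, current, max_shift=3):
    """Find the (dy, dx) that, applied to current, best lines it up with reference."""
    h, w = len(reference), len(reference[0])
    best, best_err = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0, 0
            # Compare only the region where the shifted frame overlaps the reference.
            for y in range(max(0, dy), min(h, h + dy)):
                for x in range(max(0, dx), min(w, w + dx)):
                    total += abs(reference[y][x] - current[y - dy][x - dx])
                    count += 1
            err = total / count
            if err < best_err:
                best, best_err = (dy, dx), err
    return best


def stabilize(frames, max_shift=3):
    """Re-register every frame so the subject stays where it was in the first frame."""
    out = [frames[0]]
    total_dy = total_dx = 0
    for prev, cur in zip(frames, frames[1:]):
        # Motion is measured against the previous frame, then accumulated.
        dy, dx = estimate_shift(prev, cur, max_shift)
        total_dy += dy
        total_dx += dx
        out.append(shift_frame(cur, total_dy, total_dx))
    return out
```

Real filters typically also have to deal with sub-pixel motion, rotation, zoom, and interlaced fields, all of which this sketch ignores.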
What has been done in this Bigfoot GIF is to combine stabilization with video mosaicking (i.e. making a bigger picture out of a collection of video frames). This is a powerful technique to use on aerial video of the ground or, for that matter, Bigfoot sightings... So, do you think it is just a guy in a suit or not?
(Hat tip: Digg)
Hat tip: Contactless News
I emphasized the audio and video related items with a bold font in the quotation.
"Some 30 kilograms of explosives, dozens of guns, a suicide bombers vest, and a videotaped last will and testament were confiscated in raids on three apartments being rented out by the two suspects arrested in October.
The video tape shows the two men asking God for forgiveness for the sacrifice they were about to make. The two suspects are also shown making bombs, including one planted in a lemon and another planted in a tennis ball.
Police also found face masks worn by two of the suspects in the videotape and hair samples from those face masks believed to belong to one of the suspects. However, Bosnian forensics teams do not have the technical capability to analyze these samples.
The Bosnian authorities sent the videotape, a video camera, and other evidence to the US Federal Bureau of Investigation (FBI), which sources said were analyzing the samples.
The FBI’s forensic tests had shown that the video camera was the same one used to record the confiscated video tape containing the bombing-making evidence and the last will and testament. The hair analysis is expected later this month. The voice samples from the videotape could not be verified, the Bosnian police source said.
The source also said that comparing the number of weapons found in the apartment rented by Bektasevic and Abdulkadir with the number of weapons seen on the videotape, it was clear that weapons and explosives were still unaccounted for."
Having both the original tape and the recorder allowed the examiners to match them up. This can be done by different means, depending on the type of recorder, including using ferrofluid and a microscope to look at the magnetic tape (i.e. 'rusty plastic') to see the patterns left by the head while erasing/writing.
The examiners reportedly weren't able to successfully perform speaker ID (identification). Typically, this can be due to several factors, including: muffling of the speech (in this case by the masks they were wearing), interfering noise, or insufficient speech quality (e.g. level, dynamic range, frequency response, bandwidth).
This case also highlights how forensic units around the world can be of assistance to one another, as well as how proper equipment and techniques can yield valuable clues and evidence.
"discontinuing production of all large format Nikkor lenses and enlarging lenses, as well as several of our film camera bodies, manual focus Nikkor interchangeable lenses and related accessories. Sales of these products will cease as supplies are depleted."
Tempus fugit (sometimes translated as "Time marches on"). I wonder if there is a Latin translation for "technology marches on"?
(Hat tip: Metafilter)
Saturday, January 14, 2006
In case you aren't aware of it, US privacy laws place restrictions on taking pictures or making videos based on a person's reasonable expectations of privacy. There are also restrictions about taking pictures of military installations and the like - and after 9/11, we can all appreciate that.
(Note: I am not a lawyer and cannot give legal advice. As Kantor says, I am researching and reporting, not giving a legal opinion, so take my comments as holding no legal weight whatsoever.)
Hat tip Geek Press.
Friday, January 13, 2006
Hat tip Technovelgy.
The complex transaction will result in the creation of a dominant biometric ID platform and a company with a new name--to be released later--and a new headquarters in Stamford, Conn.
Viisage will contribute stock worth about $770 million in the deal and the firm's current chairman Robert V. LaPenta will become chairman and chief executive officer of the combined company. Dr. Joseph J. Atick, now chief executive officer of Identix, will become chief strategic officer and vice chairman of the board.
In a statement, LaPenta said: "The combination of Identix' advanced multi-biometric search technology with Viisage's expertise in secure credentials, document authentication, and verification, will create a global leader in biometric security, providing end-to-end identity solutions for state, local, national, and foreign government use, as well as a wide application across the commercial sector."
The firm said it expects the merged enterprise will record revenue of $220 million in 2006. The companies said the merger will enable them to compete for 80 percent of the market they address.
Hat tip InformationWeek
Before I go on, let me say that the comments I am about to make are general ones, using this particular case as a jumping-off point rather than as an example. I say this because I have not seen the originals and therefore cannot say for certain that my comments apply to this particular case. What I do know from personal experience is that in the majority of similar cases, the live video has very good resolution. It is the archived (i.e. stored) video that is obtained by detectives and examiners that is the problem. How can this be?
Well, imagine that you are a proverbial fly-on-the-wall during the procurement of a CCTV system for a company or municipality. System specifications, performance, and cost are obviously key factors in selecting the vendor and particular system configuration. Vendors 'A' and 'B' both separately present similar systems - the same numbers of cameras, warranty, help desk support, and so on. During the live demonstration, you (the fly) see crystal clear, ID-quality color pictures of people on the street and the operator can pan-tilt-zoom the cameras to follow specific individuals. "Great, I'll have one of those!", you think. But when the time comes to tally up the bill, one of the vendors inevitably bids a bit lower or, if not, the purchasing agent negotiates them down. In either case, where do you think is an easy place to make a quick cost savings? By cutting the number of cameras? Certainly not - that could mean not covering all of the area. Going with black-and-white? No again. Squeezing more out of the video storage? Now you're talking. "How can they do that?", you might ask if you could, but you can't, because you're a fly, remember? Don't they need to archive so many days' worth of images to meet the specification? Yes, but by turning the 'Wonder Knob' (i.e. the compression ratio selector) all of a sudden they can fit twice as much video into half the space and what is seen on the live monitor doesn't change a bit. Money saved - job done. Just don't go and look at the archived images. Compressing the images almost always means throwing significant information away, in the form of entire frames, picture detail, or both. How else can you fit the same number of days of storage into a smaller space?
Time passes. The system is installed and everything seems to be going along just fine until a crime occurs and the police show up to recover the video. Then, lo and behold, the archived video looks nowhere near as good as it did when it was live. Now the police have a distant, slightly fuzzy image of a probable male suspect in poor lighting. Then when they zoom in for a close-up, the face becomes pixelated and nearly unusable. That sinking feeling they experience can't be very pleasant, even when it was mostly expected...
Seems a bit shameful after all that money spent on the CCTV system, doesn't it?
If any CCTV system managers or operators are reading this, please go turn the 'Wonder Knob' back down. The police will thank you and so will the public. On top of that, you will probably save yourself an unpleasant question-and-answer session with your supervisor when an event does eventually occur.
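For anyone who wants to see the 'Wonder Knob' arithmetic, here is a rough back-of-the-envelope sketch in Python. The camera count, frame rate, and kilobytes-per-frame figures in the usage note are hypothetical numbers picked for illustration, not values from any real system, and the function names are my own.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def archive_size_gb(cameras, days, frames_per_second, kilobytes_per_frame):
    """Storage needed (decimal GB) to keep every frame from every camera."""
    frames = cameras * days * SECONDS_PER_DAY * frames_per_second
    return frames * kilobytes_per_frame / 1_000_000


def days_that_fit(capacity_gb, cameras, frames_per_second, kilobytes_per_frame):
    """How many days of video fit in a fixed amount of storage."""
    per_day = archive_size_gb(cameras, 1, frames_per_second, kilobytes_per_frame)
    return capacity_gb / per_day
```

With these assumed figures, archive_size_gb(16, 30, 15, 20) asks for about 12,442 GB to hold 30 days from 16 cameras. Cut the stored frame size to 10 KB and the same disks hold about 60 days, or the 30-day requirement is met with half the disks. Dropping frames (a lower frames_per_second) has the same effect. Either way, the savings come straight out of the recorded picture, not the live one.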
(image source: Ft. Lauderdale Police Department, Florida, USA)
This reportedly makes DCFL the largest accredited lab of its type in the world. DCFL was spun up some years ago by merging a unit of the US Air Force's Office of Special Investigations (AF/OSI) with resources from the Army and Navy, if memory serves.
In case you are wondering what 'computer forensics' has to do with sound and light, DCFL's mission also includes cell phone, PDA, audio, and video forensics. As noted in previous posts, this is the case in many organizations.
External, independent accreditation is now being sought, or at least considered, by most US audio/video forensic labs. This wasn't done in the past, but electronic/computer media labs have started becoming more like their 'wet lab' brethren and instituted drives for certification/accreditation. This is not to say that the labs did not approach their responsibilities professionally before. All US and European labs, in my experience, developed and maintained their skills, processes, and procedures through a combination of their own in-house training (including examination and mock-jury trials), vendor training (provided by equipment manufacturers and former professional examiners), law enforcement association training seminars (e.g. NATIA training seminars), and red-blue team comparisons. (Disclaimer: I have been and continue to be involved in forensic R&D, training, and red-blue teaming in my day job). Accreditation will now document and subject the processes and procedures to external review and certification.
As the saying goes, nothing is constant except change...
Hat tip to Government Computer News.
Wednesday, January 11, 2006
While surfing the Internet, I came across a neat site run by the Department of Physics and Astronomy at Georgia State University called HyperPhysics. It has well executed sections on my favorite topics (i.e. sound & hearing AND light & vision), as well as many others. Basic and advanced principles are illustrated with clear, informative graphics and videos. I particularly liked the movie of the Kundt Tube Experiment for showing standing waves using cork shavings in a clear tube (see picture). Enjoy.
PS. GSU also runs a sister site called HyperMath that is similarly well done. You can navigate to it from the above link.
Monday, January 09, 2006
The 5 January issue of Nature reports findings by astronomers from MIT and the Paris Observatory that add to the little that we know about Charon, Pluto's 'moon', as it is commonly referred to (although its status as a moon is disputed by some).
Using the event of Charon occulting (blocking) a distant star, the astronomers were able to determine several things about Charon. When Charon passed in front of the star, the light was immediately blocked, which implies the absence of a significant atmosphere (else it would have faded out more gradually). By observing the occultation from two locations, they were able to tell its diameter. Then, by combining knowledge of its size with prior estimates of its mass, they were able to estimate its density and further, its composition (60% rock and 40% ice). Now that is an impressive amount of logical deduction from what amounts to using a stopwatch to time a blinking (star) light.
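The chain of deduction is simple enough to sketch in a few lines of Python. This is only an illustration of the arithmetic, not the astronomers' actual analysis: the function names are mine, and any numbers plugged in below are round, illustrative values rather than figures from the Nature papers.

```python
import math


def chord_length_km(shadow_speed_km_s, blocked_seconds):
    """Length of the path the star traced behind the body, from the timed blink."""
    return shadow_speed_km_s * blocked_seconds


def sphere_density(mass_kg, radius_m):
    """Bulk density (kg per cubic meter) of a body treated as a perfect sphere."""
    volume = (4.0 / 3.0) * math.pi * radius_m ** 3
    return mass_kg / volume
```

As an example with assumed inputs, a radius of 600 km and a mass of 1.5e21 kg give sphere_density(1.5e21, 600e3) of roughly 1,660 kg per cubic meter. That sits between water ice (about 1,000) and typical rock (roughly 3,000), which is the kind of reasoning that leads to a rock-and-ice mixture.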
Both teams of astronomers posted video (Paris and MIT). Be careful on the MIT clip as it crashed both Firefox (1.5) and IE for me.
Hat tip ScienceNOW Daily News.
Sunday, January 08, 2006
For an adaptive filter to change its own settings, it must be given some sort of rule to follow so that it will know which sounds to throw away and which to keep. The rule may be as simple as 'mute when the sound level goes very low to remove any left over hiss' or 'turn the gain down if the signal gets too loud and may clip'. At the other extreme, the rule can be as complex as 'figure out when no one is speaking, assume what is left is all noise, and then change your own settings to remove the noise even when the speech comes back'.
At this point, you can probably see some of the potential pitfalls in using adaptive filters - if they make wrong judgements then they can accidentally remove desired speech along with the undesired noise. Or they can make the speech sound very sterile, like it was recorded in an anechoic chamber. Or they can go haywire and create static and distortion. For these reasons, forensic examiners need to follow the central principle of the physicians' Hippocratic Oath when using these powerful filters - first, do no harm.
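The simplest rule mentioned above, 'mute when the sound level goes very low', can be sketched in a few lines of Python. This is a bare-bones illustration, not a production tool: the threshold and block size are arbitrary assumptions, and the samples are assumed to be floating-point values between -1 and 1.

```python
import math


def noise_gate(samples, threshold=0.02, block=64):
    """Mute any block of samples whose RMS level falls below the threshold."""
    out = []
    for start in range(0, len(samples), block):
        chunk = samples[start:start + block]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        # The 'rule': keep the block if it is loud enough, otherwise silence it.
        out.extend(chunk if rms >= threshold else [0.0] * len(chunk))
    return out
```

Note that the gate happily mutes quiet speech along with the hiss if the threshold is set too high, which is exactly the sort of wrong judgement these filters can make.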
If they can be so dangerous, then why use adaptive filters at all? The reason is that many common noises can not be removed completely, or even at all, using fixed filters. Take echo, for instance. Real-world echoes change constantly. Every time the speaker turns his head or a door opens, the echo paths in a room shift around. What it all boils down to is if the character of the noise changes, then a filter that changes itself is often required to remove it. If you want to be an expert audio forensic technician or examiner, mastery of adaptive filters is a must.
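As one concrete example of a filter that changes itself, here is a minimal sketch of a normalized LMS (least-mean-squares) noise canceller in Python. A few loud caveats: this is one common textbook technique, not necessarily what any given forensic tool uses; it assumes you have a second 'reference' signal that contains the noise but not the speech, which is often not available in real casework; and the tap count and step size are arbitrary illustration values.

```python
def nlms_cancel(primary, reference, taps=2, mu=0.1, eps=1e-6):
    """Subtract an adaptively filtered copy of reference from primary.

    primary   - the recording: wanted speech plus noise
    reference - a signal correlated with the noise only (an assumption here)
    Returns the cleaned signal, one output sample per input sample.
    """
    weights = [0.0] * taps
    history = [0.0] * taps
    cleaned = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]          # newest reference sample first
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate                  # what is left after removing the noise estimate
        norm = eps + sum(h * h for h in history)
        # The filter re-tunes itself on every sample, in proportion to the error.
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned
```

Because the weights keep updating, the filter can follow a noise path that drifts over time, which a fixed filter cannot do. The same property is what lets it misbehave if the reference also contains speech.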
Saturday, January 07, 2006
So, how can one legitimately respond to such a question? First, I ask a few questions, such as:
- Was it legally recorded? If yes, was it a one-party or two-party consent? If no, they might need to be contacting legal counsel and not me.
- Is it a criminal, civil, or professional matter? If criminal, have they already contacted the appropriate authorities?
- What kind of recorder is it (e.g. pocket recorder, cassette deck, or answering machine)?
- What is the noise/interference like (e.g. hum, hiss, another talker, music, machinery, pops/clicks, or mobile/cell phone)?
- How loud is the desired voice relative to the noise?
- Is the original recording available? If so, what type of media is it on?
- Is there a digital copy that could be emailed for an informal evaluation?
There is no way to avoid the uncertainty due to differences in terminology, technical savvy, and the like, but going through this question and answer process usually does let me give them some useful feedback and some confidence in whether it is likely worthwhile to proceed or not.
Universe Today has posted a downloadable 400+ page astronomy guide called What's Up 2006 (pdf - 13.5 MB). Each daily entry has suggested skywatching targets and techniques, plus a photo and facts. Both binocular and telescope users will find plenty of interest.
Bloggers have noticed that some of the dates don't match up to the correct days of the week, but, that aside, it looks great. I'm certainly enjoying it.
Hat tip SlashDot.
The DARPA (US Defense Advanced Research Projects Agency) Special Projects Office has announced Radar Scope, a nifty gadget that acts like a 'stud finder' for motion on the other side of a wall. The DARPA prototype apparently can 'see' through 12 inches (30.5 cm) of concrete and then an additional fifty feet (15.24 meters) into a room. The article also says that it is sensitive enough to detect even the motion of breathing. The latter claim is particularly impressive. I'm curious to know whether it is detecting some change in the person's chest cavity or getting a reflection off of dog tags or some other body-worn or -carried article.
Even though DARPA is best known for its cutting-edge research, this technology is believed to be mature enough that their plan is to field it in Iraq at the squad level this coming spring. From the picture, the device housing looks field-ready and the typical gotcha for many advanced electronic technologies in the field (namely, the power source) looks to be a non-problem since it runs off of AA batteries.
This capability seems like it could be a nice complement to the existing ways of checking a room remotely - polecams (camera on a pole held up to the window), climbing robots, and contact mics (or accelerometers).
The article goes on to say that proposals for a follow-on technology program called Visi Building are being taken. The aim for this program is to go from motion detection to actual through-wall imaging - a much harder task.
Hat tip to Engadget.
Wednesday, January 04, 2006
The researchers offer an intriguing acoustic explanation for the difference. One of the groups of monkeys has been living for decades in a forest while the other has been living in a rocky area with little vegetation - two very different acoustic propagation environments. High frequencies would tend to travel better than low frequencies in the forest and that correctly corresponds to the monkeys living in the forest using the higher frequency.
The researchers controlled for several factors (sex, time, type of vocalization, and activity) and analyzed a large sample of data to come up with their results. The full paper will be published in this month's Ethology (German scientific journal).