A demonstration of our research is available on YouTube:
We propose an algorithm for recognizing speech from a single speech waveform in which multiple speakers talk simultaneously. We use synthesized speech representing a category as the query, so that the recognition system works speaker-independently.
Among its cognitive functions, the human brain has an ability known as the “cocktail party effect”: understanding the meaning of one attended utterance within a mixture of speech spoken simultaneously by multiple speakers. The typical setting for this phenomenon is a cocktail party.
The usual engineering approach to the cocktail party effect is to apply Independent Component Analysis (ICA), which has strong potential for separating mixed speech into a set of independent speech signals. ICA only separates the speech; recognizing it is outside the scope of ICA. Applying ICA basically requires a set of microphones whose number is equal to or greater than the number of speakers.
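To make the microphone requirement concrete, the following is a minimal sketch of two-channel ICA in plain Python. It is an illustration under stated assumptions, not the method proposed here: the two sources are uniform random signals standing in for speech, the mixing coefficients are hypothetical, and the function names (`whiten`, `ica_two_channels`) are made up for this example. It whitens the two microphone signals and then searches for the rotation that makes the outputs least Gaussian, measured by excess kurtosis.

```python
import math
import random

def center(v):
    """Subtract the mean so the signal is zero-mean."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def excess_kurtosis(v):
    """Excess kurtosis of a zero-mean signal (0 for a Gaussian)."""
    n = len(v)
    m2 = sum(a * a for a in v) / n
    m4 = sum(a ** 4 for a in v) / n
    return m4 / (m2 * m2) - 3.0

def correlation(u, v):
    """Pearson correlation coefficient between two signals."""
    u, v = center(u), center(v)
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return num / den

def whiten(x1, x2):
    """Decorrelate two zero-mean channels and scale them to unit variance."""
    n = len(x1)
    a = sum(p * p for p in x1) / n               # var(x1)
    d = sum(q * q for q in x2) / n               # var(x2)
    b = sum(p * q for p, q in zip(x1, x2)) / n   # cov(x1, x2)
    # Closed-form eigen-decomposition of the 2x2 covariance [[a, b], [b, d]].
    phi = 0.5 * math.atan2(2.0 * b, a - d)
    c, s = math.cos(phi), math.sin(phi)
    lam1 = a * c * c + 2.0 * b * c * s + d * s * s
    lam2 = a * s * s - 2.0 * b * c * s + d * c * c
    z1 = [(c * p + s * q) / math.sqrt(lam1) for p, q in zip(x1, x2)]
    z2 = [(-s * p + c * q) / math.sqrt(lam2) for p, q in zip(x1, x2)]
    return z1, z2

def ica_two_channels(x1, x2, steps=180):
    """Separate two mixed channels by a grid search over rotation angles."""
    z1, z2 = whiten(center(x1), center(x2))
    best = None
    for k in range(steps):
        # Angles in [0, pi/2) suffice: a further quarter turn only swaps
        # the outputs and flips a sign.
        theta = (math.pi / 2) * k / steps
        c, s = math.cos(theta), math.sin(theta)
        y1 = [c * p + s * q for p, q in zip(z1, z2)]
        y2 = [-s * p + c * q for p, q in zip(z1, z2)]
        score = abs(excess_kurtosis(y1)) + abs(excess_kurtosis(y2))
        if best is None or score > best[0]:
            best = (score, y1, y2)
    return best[1], best[2]

# Two independent sources (stand-ins for two speakers; an assumption).
random.seed(0)
n = 2000
s1 = [random.uniform(-1, 1) for _ in range(n)]
s2 = [random.uniform(-1, 1) for _ in range(n)]

# Two microphones, each hearing a different mixture (hypothetical weights).
mic1 = [1.0 * a + 0.6 * b for a, b in zip(s1, s2)]
mic2 = [0.4 * a + 1.0 * b for a, b in zip(s1, s2)]

y1, y2 = ica_two_channels(mic1, mic2)
for name, src in (("s1", s1), ("s2", s2)):
    match = max(abs(correlation(src, y1)), abs(correlation(src, y2)))
    print(name, "best |correlation| with a recovered channel:", round(match, 3))
```

The sketch needs both `mic1` and `mic2`: with a single mixture there is no second channel to whiten or rotate against, which is the microphone-count limitation described above. The recovered channels come back in arbitrary order and sign, and they are only waveforms; no recognition has taken place.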
The human brain actually achieves the cocktail party effect using two ears. Having two ears is not equivalent to having two microphones, however; the two ears are used to identify the location of a sound source in the 3D space around the listener.
Therefore, we could say that the human brain realizes this function with what amounts to a single microphone. This suggests that the same function can be engineered using a single microphone. ICA, which uses many microphones only to separate speech, may not be the essential solution to the human cocktail party effect.
Our method performs speech recognition on a single recording of mixed speakers, without separating the speech, using a query of synthesized speech that corresponds to a category.
The attached figure shows the experimental result of segmentation-free keyword and key-phrase recognition from a single speech recording spoken by English, Japanese, Chinese, and German speakers. The query keyword and key phrase are synthesized speech, which means that our method works speaker-independently.
A patent on the method is pending.