/ Masahide Sugiyama / Professor
/ Susantha Herath / Associate Professor
/ Michael Cohen / Assistant Professor
/ Minoru Ueda / Assistant Professor
Using our communication channels (sense organs: ears, mouth, eyes, nose, skin, etc.), we can communicate with each other: human with human, human with machine, and human with any information source. When these channels are impaired, in either a software or a hardware sense, communication can become difficult. The research area of the Human Interface Laboratory covers the enhancement and generation of various human interface channels.
In order to advance the above research on human interfaces, we adopt the following research principle:
We have the following two main research topics in 1995:
We organized the second International Workshop on Human Interface Technology (IWHIT95) on October 12th and 13th, sponsored by the International Affairs Committee of the University of Aizu. The workshop had six sessions (1. Speech and Multi-modal Interface (1), 2. Speech and Multi-modal Interface (2), 3. Speech and Multi-modal Interface (3), 4. Non-Verbal Interface, 5. Exotic Interface, 6. Visual Cognition), with 4 keynote lectures and 16 lectures.
We promoted 6 SCCPs for students (Social Hyper Networking, Visual Language for Office Processing Software, Speech Dialogue System, Computer Music, Neural Network Modeling, Non-Verbal Communication), 3 Joint Projects (Study of Machine Processing of Signs Generated by Hand Movements; Study on Speech Recognition under Noisy Environment; Audio Windows: Spatialization of Synthesized Speech, Spatialization of Music, and Hierarchical Organization of Spatial Sound Sources), and 1 Courseware project (Speech Processing and Speech Recognition). One of us received commissioned research funds from ATR Interpreting Telecommunication Research Labs. on ``Study on Speech Recognition System Based on Information Theory", from NTT Human Interface Labs. on ``Audio Window", and from NTT DATA on ``Study on Speech Processing Technology and Human Interface", and also received a research grant from the Ministry of Education on ``Robust Speech Recognition Using Microphone Array and Signal Source Modeling Technique".
We exhibited our research activities at the open campus during the University Festival. We also promoted a Lab Open House for freshmen on April 12th, 13th, and 14th, in which 5 labs participated.
We presented 9 refereed papers in international conferences and academic journals, and ran a series of HI Lab seminars.
One of our members organized a working group on ``Blind and Computer", which about 30 people attended (June 18th and October 1st, 1995). The topics were ``Computer Environment for Blind People", ``Walk Training System for Blind People", ``Computer with Speech Synthesizer", ``On Tenzi (Braille) Printers", and ``Computer-Based Tenzi (Braille) Translation".
We began providing Human Interface Lab information on the WWW to open our research and education activities to the world: http://www.u-aizu.ac.jp/labs/sw-hi/.
Refereed Journal Papers
This paper presents the design and implementation techniques employed in a Japanese-to-Sinhalese machine translation system. The main result of this work is the successful application of bunsetsu in generating meaningful translations for a flexible-grammar language. The system was developed based on the similarities between Japanese bunsetsu and Sinhalese units. These efforts focus on determining the minimum grammatical knowledge reasonably necessary for machine translation. The principal characteristics of the system, the translation process, problems encountered during development, the present status, and future plans are discussed.
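As a rough illustration of the bunsetsu-wise transfer idea, consider a minimal sketch in which each Japanese bunsetsu (a content word plus its function words) maps to one Sinhalese unit, with unit order preserved since both languages allow flexible, verb-final word order. The lexicon entries and romanizations below are illustrative placeholders, not the system's actual data or method:

```python
# Hypothetical lexicon: each Japanese bunsetsu, given as a
# (content word, function word) pair, maps to one Sinhalese unit.
# All entries and romanizations are illustrative placeholders.
LEXICON = {
    ("watashi", "wa"): "mama",         # "I" + topic marker
    ("hon", "o"): "pota",              # "book" + object marker
    ("yomu", ""): "kiyavanavaa",       # "read" (non-past)
}

def translate(bunsetsu):
    """Unit-wise transfer: translate each bunsetsu independently and
    keep the unit order unchanged."""
    units = []
    for content, function in bunsetsu:
        # Fall back to the untranslated content word on lookup failure.
        units.append(LEXICON.get((content, function), content))
    return " ".join(units)

print(translate([("watashi", "wa"), ("hon", "o"), ("yomu", "")]))
# -> mama pota kiyavanavaa
```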
Audio windowing is a frontend, or user interface, to an audio system with a realtime spatial sound backend. Besides the directionalization performed by a digital signal processor (DSP), gain adjustment is used to control the volume of the various mixels ([sound] mixing elements). Virtual gain can be synthesized from components derived from collective iconic size, mutual distance, orientation, and directivity, and selectively enabled according to room-wise partitioning of sources across sinks. This paper describes a mathematical derivation of virtual gain and outlines the deployment of these calculations in an audio windowing system.
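A minimal sketch of such a virtual-gain synthesis follows, assuming (as placeholders, not the paper's actual derivation) an inverse-distance rolloff, a multiplicative size term, cardioid-like directivity, and muting across room partitions:

```python
import math

def virtual_gain(src, sink):
    """Synthesize a virtual gain from the components the abstract
    names: room-wise partitioning, mutual distance, iconic size,
    orientation, and directivity.  Each component model here is an
    assumed placeholder."""
    if src["room"] != sink["room"]:
        return 0.0                        # partitioned: sink cannot hear source

    dx, dy = src["x"] - sink["x"], src["y"] - sink["y"]
    distance = math.hypot(dx, dy)
    g_distance = 1.0 / max(distance, 1.0)            # inverse-distance rolloff
    g_size = src["radius"] * sink["radius"]          # bigger icons, more gain

    # Directivity: cardioid-like attenuation of off-axis emission
    # (relative to the source's facing) and reception (the sink's).
    to_sink = math.atan2(-dy, -dx)                   # direction source -> sink
    to_src = math.atan2(dy, dx)                      # direction sink -> source
    g_emit = 0.5 * (1.0 + math.cos(to_sink - src["azimuth"]))
    g_receive = 0.5 * (1.0 + math.cos(to_src - sink["azimuth"]))

    return g_distance * g_size * g_emit * g_receive
```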
Refereed Proceeding Papers
The similarities between Sinhalese units (SU) and Japanese bunsetsu (JB) are discussed and analyzed. The methodology used in the MT process, the problems encountered, and their solutions are discussed. Complicated classical Sinhalese is excluded.
Flexible word-order languages need a mechanism to identify the subject of a sentence in language processing. Some languages make no distinction between the subject and the object in some cases. In this paper, an algorithm is proposed to find the subject of a sentence in Sinhalese, a flexible word-order language.
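One plausible shape for such an algorithm is a case-marking heuristic like the sketch below; the marker inventory and the animacy fallback are hypothetical stand-ins, not the proposed algorithm itself:

```python
# Hypothetical rule: in a flexible word-order language, rely on case
# marking rather than position to find the subject.
NOMINATIVE_MARKERS = {""}    # assume subjects are unmarked (direct case)

def find_subject(units):
    """units: list of (stem, case_marker, is_animate) triples in
    surface order.  Returns the index of the most plausible subject,
    or None if no candidate is found."""
    candidates = [i for i, (_, case, _) in enumerate(units)
                  if case in NOMINATIVE_MARKERS]
    if len(candidates) == 1:
        return candidates[0]             # unambiguous: one unmarked unit
    # The hard case the abstract mentions: subject and object are not
    # distinguished by marking.  Fall back to animacy, then position.
    for i in candidates:
        if units[i][2]:
            return i
    return candidates[0] if candidates else None

# e.g. find_subject([("ballaa", "", True), ("pota", "", False)]) -> 0
```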
Spatial sound is the presentation of audio channels with positional attributes. DSP-synthesized spatial sound, driven by even a simple positional database, can provide directional cues useful to a sight-impaired user. Augmented reality describes hybrid presentations that overlay computer-generated imagery on real scenes. Augmented audio reality extends this notion to include sonic effects, overlaying artificially spatialized sounds on a natural environment. Maw (acronymic for multidimensional audio windows) is a NextStep-based audio windowing system deployed as a binaural directional mixing console, capable of presenting such augmented-audio-reality spatial sound cues. By associating spatialized sound with natural directions, sight-impaired users can leverage intuitive mental spatial models to identify sound sources and segregate audio streams. Applications of audio windows to asynchronous communication (like voicemail) or synchronous applications (like distributed realtime groupware) generalize traditional telephone answering machines and teleconferencing. Rotating and non-omnidirectional sources and sinks allow selective attention, and motivate deployment with extensions like a chair tracker or a hemispherical speaker array, which allow soundscape stabilization.
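The soundscape stabilization idea can be sketched as follows, assuming a 2-D room frame and a tracked head yaw such as a chair tracker provides; the coordinate conventions are assumptions:

```python
import math

def stabilized_azimuth(source_xy, sink_xy, head_yaw):
    """Subtract the listener's tracked head yaw from the world-frame
    bearing, so a source keeps its real-world direction as the
    listener turns (soundscape stabilization)."""
    dx = source_xy[0] - sink_xy[0]
    dy = source_xy[1] - sink_xy[1]
    world_bearing = math.atan2(dy, dx)    # direction in the room frame
    relative = world_bearing - head_yaw   # direction in the head frame
    # Wrap into (-pi, pi] for the spatializer.
    return math.atan2(math.sin(relative), math.cos(relative))
```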
Alternative non-immersive perspectives enable new paradigms of perception, especially in the context of frames-of-reference for musical audition and groupware. Maw (acronymic for multidimensional audio windows) is an application for manipulating sound sources and sinks in virtual rooms, featuring an exocentric graphical interface driving an egocentric audio backend. Listening to sound presented in such a spatial fashion is as different from conventional stereo mixes as sculpture is from painting. Schizophrenic virtual existence suggests sonic (analytic) cubism, presenting multiple acoustic perspectives simultaneously. Clusters can be used to hierarchically organize mixels, [sound] mixing elements. New interaction modalities are enabled by this sort of perceptual aggression and liquid perspective. In particular, virtual concerts may be ``broken down'' by individuals and groups.
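A minimal sketch of hierarchically organized mixels, assuming a composite structure in which a cluster's gain scales all of its members, so a virtual concert can be attenuated or broken down section by section:

```python
class Mixel:
    """A leaf mixing element with its own gain."""
    def __init__(self, name, gain=1.0):
        self.name, self.gain = name, gain

    def flatten(self, scale=1.0):
        yield self.name, self.gain * scale

class Cluster(Mixel):
    """A composite node: its gain multiplies every descendant's."""
    def __init__(self, name, children, gain=1.0):
        super().__init__(name, gain)
        self.children = children

    def flatten(self, scale=1.0):
        for child in self.children:
            yield from child.flatten(scale * self.gain)

# Attenuating the cluster attenuates every member at once:
strings = Cluster("strings", [Mixel("violin"), Mixel("cello")], gain=0.5)
for name, gain in strings.flatten():
    print(name, gain)    # violin 0.5, cello 0.5
```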
Maw (acronymic for multidimensional audio windows) is an interface for manipulating iconic sound sources and sinks in virtual rooms. Implemented as a NextStep-based application which can drive a heterogeneous combination of internal and external sound spatializers, Maw is suitable for synchronous applications like teleconferences or concerts, as well as asynchronous applications like voicemail and hypermedia. Maw's main view is a top-down dynamic map of iconic sources and sinks in a virtual room. The sources, which might correspond to voices in a teleconference, are sound emission channels. The sink is a sound receptor, a delegate of the human listener in the virtual room. For this demonstration, the sources are musical, synthesized by a sound module driven by a MIDI sequencer. The source-to-sink spatialization is performed by DSP modules, which convolve the digitized input stream with HRTFs (head-related transfer functions) that capture directional effects. Gain, which controls volume, is adjusted according to distance, direction, directivity, and size of the source and sink. Maw's sources can move around in response to mouse actions, keyboard arrows, menu commands, or data entered into numeric panels. The sink may also move, motivated by the same suite of manipulation techniques, or via user position updates, as strobed by a chair tracker (not shown) that uses a Polhemus sensor to gauge orientation. Keywords: binaural directional mixing console, CSCW (computer-supported collaborative work), groupware, mixel ([sound] mixing element), spatial sound.
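The spatialization step can be sketched as below, with plain NumPy convolution standing in for the realtime DSP, and with the HRIR pair (the time-domain form of an HRTF) assumed to be selected per direction elsewhere:

```python
import numpy as np

def spatialize(mono, hrir_left, hrir_right, gain=1.0):
    """Convolve a mono stream with a direction-specific HRIR pair and
    apply the virtual gain, yielding a binaural (2, N) signal.  A real
    system selects or interpolates HRIRs per source direction and runs
    the convolution on a DSP in real time."""
    left = gain * np.convolve(mono, hrir_left)
    right = gain * np.convolve(mono, hrir_right)
    return np.stack([left, right])
```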
Maw enables multiple auditory presence, overlaying soundscapes via the superposition of multiple sinks. This allows audio windowing to present multiple acoustic perspectives simultaneously. The user, iconified by multiple sinks, can leave a `pair of ears' in one strategic location, while placing another virtual pair somewhere else... This feature can be used to sharpen the granularity of control of spatialization. In a groupware environment there may be inhibitions on relocating sources shared by others. But multiple sinks allow one to monitor a main conference while attending a separate sub-caucus. In this video, the user wants to pay close attention to multiple musical channels. Anticipating level difference localization, each source is spatialized only with respect to the loudest sink, so that a listener's perception of a source depends on which of the (possibly multiple) sinks can best hear it. The experience of being in multiple places simultaneously, like all virtual situations, may define its own rules. A psychophysical interpretation is important as an interface strategy, making the system behavior consistent with users' intuitions, artificial but accessible. The overlaid existence suggests the name given to this effect: sonic (analytic) cubism, presenting multiple simultaneous acoustic perspectives. Being anywhere is better than being everywhere, since it is selective; Maw's schizophrenic mode is distilled ubiquity: (groupware-enabled) accommodation of multiple objects of regard. Keywords: binaural directional mixing console, groupware, mixel ([sound] mixing element), sonic (analytical) cubism, spatial sound.
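The loudest-sink rule described above can be sketched as follows, reusing the kind of virtual_gain() function sketched earlier; all names here are illustrative:

```python
def best_sink(source, sinks, virtual_gain):
    """Pick the sink that hears this source loudest."""
    return max(sinks, key=lambda sink: virtual_gain(source, sink))

def render(sources, sinks, virtual_gain, spatialize):
    """Spatialize each source only with respect to its loudest sink,
    anticipating level-difference localization by the listener."""
    return [spatialize(src, best_sink(src, sinks, virtual_gain))
            for src in sources]
```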
This paper describes speech database retrieval (search) using a speech key. Here, a speech key is a retrieval key generated from assigned speech data, corresponding to the specific speaker, text, gender, language, etc. contained in the speech. Recently, various large-scale speech databases have become available that have neither tools nor information for data retrieval. As a first step toward speech retrieval, database retrieval using speaker individuality is studied. The results show that the speaker retrieval error rate is about 4% for input speech 5 words long, using VQ codebooks with 16-32 codes produced from 10 words of speech.
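A hedged sketch of such VQ-based speaker retrieval follows, assuming frame-wise feature vectors (e.g., cepstra) are extracted elsewhere and using SciPy's vector-quantization routines as a stand-in for the paper's codebook training:

```python
from scipy.cluster.vq import kmeans, vq

def train_codebook(features, n_codes=32):
    """Build one speaker's codebook (16-32 codes in the experiment)
    from enrollment features, here roughly 10 words of speech.
    features: (n_frames, dim) NumPy float array."""
    codebook, _ = kmeans(features.astype(float), n_codes)
    return codebook

def retrieve_speaker(query, codebooks):
    """Return the speaker whose codebook quantizes the query (about a
    5-word utterance) with minimum average distortion.
    codebooks: dict mapping speaker id -> codebook."""
    def avg_distortion(cb):
        _, dists = vq(query, cb)
        return dists.mean()
    return min(codebooks, key=lambda spk: avg_distortion(codebooks[spk]))
```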