/ Masahide Sugiyama / Professor
/ Susantha Herath / Associate Professor
/ Michael Cohen / Assistant Professor
/ Minoru Ueda / Assistant Professor
Using our communication channels (sense organs: ears, mouth, eyes, nose, skin, etc.), we can communicate with each other: human with human, human with machine, and human with any information source. When these channels are impaired, in either a software or a hardware sense, communication can become difficult. The research area of the Human Interface Laboratory covers the enhancement and generation of various human interface channels.
In order to advance the above research on human interfaces, we adopt the following research principle:
We have the following two main research topics in 1995:
We organized the second International Workshop on Human Interface Technology (IWHIT95) on October 12th and 13th, sponsored by the International Affairs Committee of the University of Aizu. The workshop had six sessions (1. Speech and Multi-modal Interface (1), 2. Speech and Multi-modal Interface (2), 3. Speech and Multi-modal Interface (3), 4. Non-Verbal Interface, 5. Exotic Interface, 6. Visual Cognition), with 4 keynote lectures and 16 lectures.
We promoted 6 SCCPs for students (Social Hyper Networking, Visual Language for Office Processing Software, Speech Dialogue System, Computer Music, Neural Network Modeling, Non-Verbal Communication), 3 Joint Projects (Study of Machine Processing of Signs Generated by Hand Movements; Study on Speech Recognition under Noisy Environment; Audio Windows: Spatialization of Synthesized Speech, Spatialization of Music, and Hierarchical Organization of Spatial Sound Sources), and 1 Courseware project (Speech Processing and Speech Recognition). One of us received commissioned research funds from ATR Interpreting Telecommunication Research Labs. on ``Study on Speech Recognition System Based on Information Theory", from NTT Human Interface Labs. on ``Audio Window", and from NTT DATA on ``Study on Speech Processing Technology and Human Interface", and also received a research grant from the Ministry of Education on ``Robust Speech Recognition Using Microphone Array and Signal Source Modeling Technique".
We exhibited our research activities at the open campus during the University Festival. We also promoted a Lab Open House for freshmen on April 12th, 13th, and 14th, in which 5 labs participated.
We presented 9 refereed papers in international conferences and academic journals, and ran a series of HI Lab seminars.
One of our members organized a working group on ``Blind and Computer", which about 30 people attended (June 18th and October 1st, 1995). The topics were ``Computer Environment for Blind People", ``Walk Training System for Blind People", ``Computer with Speech Synthesizer", ``On Tenzi (Braille) Printers", and ``Computer-Based Tenzi (Braille) Translation".
We began providing Human Interface Lab information on the WWW to open our research and education activities to the world: http://www.u-aizu.ac.jp/labs/sw-hi/.
Refereed Journal Papers
This paper presents the design and implementation techniques employed in a Japanese-to-Sinhalese machine translation system. The main result of this work is the successful application of bunsetsu in generating meaningful translations for a flexible-grammar language. The system was developed based on the similarities between Japanese bunsetsu and Sinhalese units. These efforts focus on determining the minimum grammatical knowledge reasonably necessary for machine translation. The principal characteristics of the system, the translation process, problems encountered during development, the present status, and future plans are discussed.
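As a rough illustration of the bunsetsu-wise transfer idea, consider a minimal sketch in which each Japanese bunsetsu (a content word plus its function words) maps to one Sinhalese unit, with unit order preserved since both languages allow flexible, verb-final word order. The lexicon entries and romanizations below are illustrative placeholders, not the system's actual data or method:

```python
# Hypothetical lexicon: each Japanese bunsetsu, given as a
# (content word, function word) pair, maps to one Sinhalese unit.
# All entries and romanizations are illustrative placeholders.
LEXICON = {
    ("watashi", "wa"): "mama",         # "I" + topic marker
    ("hon", "o"): "pota",              # "book" + object marker
    ("yomu", ""): "kiyavanavaa",       # "read" (non-past)
}

def translate(bunsetsu):
    """Unit-wise transfer: translate each bunsetsu independently and
    keep the unit order unchanged."""
    units = []
    for content, function in bunsetsu:
        # Fall back to the untranslated content word on lookup failure.
        units.append(LEXICON.get((content, function), content))
    return " ".join(units)

print(translate([("watashi", "wa"), ("hon", "o"), ("yomu", "")]))
# -> mama pota kiyavanavaa
```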
Audio windowing is a frontend, or user interface, to an audio system with a realtime spatial sound backend. Besides the directionalization performed by a digital signal processor (DSP), gain adjustment is used to control the volume of the various mixels ([sound] mixing elements). Virtual gain can be synthesized from components derived from collective iconic size, mutual distance, orientation, and directivity, and selectively enabled according to room-wise partitioning of sources across sinks. This paper describes a mathematical derivation of virtual gain and outlines the deployment of these calculations in an audio windowing system.
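A minimal sketch of such a virtual-gain synthesis follows, assuming (as placeholders, not the paper's actual derivation) an inverse-distance rolloff, a multiplicative size term, cardioid-like directivity, and muting across room partitions:

```python
import math

def virtual_gain(src, sink):
    """Synthesize a virtual gain from the components the abstract
    names: room-wise partitioning, mutual distance, iconic size,
    orientation, and directivity.  Each component model here is an
    assumed placeholder."""
    if src["room"] != sink["room"]:
        return 0.0                        # partitioned: sink cannot hear source

    dx, dy = src["x"] - sink["x"], src["y"] - sink["y"]
    distance = math.hypot(dx, dy)
    g_distance = 1.0 / max(distance, 1.0)            # inverse-distance rolloff
    g_size = src["radius"] * sink["radius"]          # bigger icons, more gain

    # Directivity: cardioid-like attenuation of off-axis emission
    # (relative to the source's facing) and reception (the sink's).
    to_sink = math.atan2(-dy, -dx)                   # direction source -> sink
    to_src = math.atan2(dy, dx)                      # direction sink -> source
    g_emit = 0.5 * (1.0 + math.cos(to_sink - src["azimuth"]))
    g_receive = 0.5 * (1.0 + math.cos(to_src - sink["azimuth"]))

    return g_distance * g_size * g_emit * g_receive
```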
Refereed Proceeding Papers
The similarities between Sinhalese units (SU) and Japanese bunsetsu (JB) are discussed and analyzed. The methodology used in the MT process, the problems encountered, and their solutions are discussed. Complicated classical Sinhalese is excluded.
Flexible word-order languages need a mechanism to identify the subject of a sentence in language processing. Some languages make no distinction between the subject and the object in some cases. In this paper, an algorithm is proposed to find the subject of a sentence in Sinhalese, a flexible word-order language.
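One plausible shape for such an algorithm is a case-marking heuristic like the sketch below; the marker inventory and the animacy fallback are hypothetical stand-ins, not the proposed algorithm itself:

```python
# Hypothetical rule: in a flexible word-order language, rely on case
# marking rather than position to find the subject.
NOMINATIVE_MARKERS = {""}    # assume subjects are unmarked (direct case)

def find_subject(units):
    """units: list of (stem, case_marker, is_animate) triples in
    surface order.  Returns the index of the most plausible subject,
    or None if no candidate is found."""
    candidates = [i for i, (_, case, _) in enumerate(units)
                  if case in NOMINATIVE_MARKERS]
    if len(candidates) == 1:
        return candidates[0]             # unambiguous: one unmarked unit
    # The hard case the abstract mentions: subject and object are not
    # distinguished by marking.  Fall back to animacy, then position.
    for i in candidates:
        if units[i][2]:
            return i
    return candidates[0] if candidates else None

# e.g. find_subject([("ballaa", "", True), ("pota", "", False)]) -> 0
```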
Spatial sound is the presentation of audio channels with positional attributes. DSP-synthesized spatial sound, driven by even a simple positional database, can provide directional cues useful to a sight-impaired user. Augmented reality describes hybrid presentations that overlay computer-generated imagery on real scenes. Augmented audio reality extends this notion to include sonic effects, overlaying artificially spatialized sounds on a natural environment. Maw (acronymic for multidimensional audio windows) is a NextStep-based audio windowing system deployed as a binaural directional mixing console, capable of presenting such augmented-audio-reality spatial sound cues. By associating spatialized sound with natural directions, sight-impaired users can leverage intuitive mental spatial models to identify sound sources and segregate audio streams. Applications of audio windows to asynchronous communication (like voicemail) or synchronous applications (like distributed realtime groupware) generalize traditional telephone answering machines and teleconferencing. Rotating and non-omnidirectional sources and sinks allow selective attention, and motivate deployment with extensions like a chair tracker or a hemispherical speaker array, which allow soundscape stabilization.
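The soundscape stabilization idea can be sketched as follows, assuming a 2-D room frame and a tracked head yaw such as a chair tracker provides; the coordinate conventions are assumptions:

```python
import math

def stabilized_azimuth(source_xy, sink_xy, head_yaw):
    """Subtract the listener's tracked head yaw from the world-frame
    bearing, so a source keeps its real-world direction as the
    listener turns (soundscape stabilization)."""
    dx = source_xy[0] - sink_xy[0]
    dy = source_xy[1] - sink_xy[1]
    world_bearing = math.atan2(dy, dx)    # direction in the room frame
    relative = world_bearing - head_yaw   # direction in the head frame
    # Wrap into (-pi, pi] for the spatializer.
    return math.atan2(math.sin(relative), math.cos(relative))
```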
Alternative non-immersive perspectives enable new paradigms of perception, especially in the context of frames-of-reference for musical audition and groupware. Maw (acronymic for multidimensional audio windows) is an application for manipulating sound sources and sinks in virtual rooms, featuring an exocentric graphical interface driving an egocentric audio backend. Listening to sound presented in such a spatial fashion is as different from conventional stereo mixes as sculpture is from painting. Schizophrenic virtual existence suggests sonic (analytic) cubism, presenting multiple acoustic perspectives simultaneously. Clusters can be used to hierarchically organize mixels, [sound] mixing elements. New interaction modalities are enabled by this sort of perceptual aggression and liquid perspective. In particular, virtual concerts may be ``broken down'' by individuals and groups.
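A minimal sketch of hierarchically organized mixels, assuming a composite structure in which a cluster's gain scales all of its members, so a virtual concert can be attenuated or broken down section by section:

```python
class Mixel:
    """A leaf mixing element with its own gain."""
    def __init__(self, name, gain=1.0):
        self.name, self.gain = name, gain

    def flatten(self, scale=1.0):
        yield self.name, self.gain * scale

class Cluster(Mixel):
    """A composite node: its gain multiplies every descendant's."""
    def __init__(self, name, children, gain=1.0):
        super().__init__(name, gain)
        self.children = children

    def flatten(self, scale=1.0):
        for child in self.children:
            yield from child.flatten(scale * self.gain)

# Attenuating the cluster attenuates every member at once:
strings = Cluster("strings", [Mixel("violin"), Mixel("cello")], gain=0.5)
for name, gain in strings.flatten():
    print(name, gain)    # violin 0.5, cello 0.5
```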
Maw (acronymic for multidimensional audio windows) is an interface for manipulating iconic sound sources and sinks in virtual rooms. Implemented as a NextStep-based application which can drive a heterogeneous combination of internal and external sound spatializers, Maw is suitable for synchronous applications like teleconferences or concerts, as well as asynchronous applications like voicemail and hypermedia. Maw's main view is a top-down dynamic map of iconic sources and sinks in a virtual room. The sources, which might correspond to voices in a teleconference, are sound emission channels. The sink is a sound receptor, a delegate of the human listener in the virtual room. For this demonstration, the sources are musical, synthesized by a sound module driven by a MIDI sequencer. The source-to-sink spatialization is performed by DSP modules, which convolve the digitized input stream with HRTFs (head-related transfer functions) that capture directional effects. Gain, which controls volume, is adjusted according to distance, direction, directivity, and size of the source and sink. Maw's sources can move around in response to mouse actions, keyboard arrows, menu commands, or data entered into numeric panels. The sink may also move, motivated by the same suite of manipulation techniques, or via user position updates, as strobed by a chair tracker (not shown) that uses a Polhemus sensor to gauge orientation. Keywords: binaural directional mixing console, CSCW (computer-supported collaborative work), groupware, mixel ([sound] mixing element), spatial sound.
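The spatialization step can be sketched as below, with plain NumPy convolution standing in for the realtime DSP, and with the HRIR pair (the time-domain form of an HRTF) assumed to be selected per direction elsewhere:

```python
import numpy as np

def spatialize(mono, hrir_left, hrir_right, gain=1.0):
    """Convolve a mono stream with a direction-specific HRIR pair and
    apply the virtual gain, yielding a binaural (2, N) signal.  A real
    system selects or interpolates HRIRs per source direction and runs
    the convolution on a DSP in real time."""
    left = gain * np.convolve(mono, hrir_left)
    right = gain * np.convolve(mono, hrir_right)
    return np.stack([left, right])
```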
Maw enables multiple auditory presence, overlaying soundscapes via the superposition of multiple sinks. This allows audio windowing to present multiple acoustic perspectives simultaneously. The user, iconified by multiple sinks, can leave a `pair of ears' in one strategic location, while placing another virtual pair somewhere else... This feature can be used to sharpen the granularity of control of spatialization. In a groupware environment there may be inhibitions on relocating sources shared by others. But multiple sinks allow one to monitor a main conference while attending a separate sub-caucus. In this video, the user wants to pay close attention to multiple musical channels. Anticipating level difference localization, each source is spatialized only with respect to the loudest sink, so that a listener's perception of a source depends on which of the (possibly multiple) sinks can best hear it. The experience of being in multiple places simultaneously, like all virtual situations, may define its own rules. A psychophysical interpretation is important as an interface strategy, making the system behavior consistent with users' intuitions, artificial but accessible. The overlaid existence suggests the name given to this effect: sonic (analytic) cubism, presenting multiple simultaneous acoustic perspectives. Being anywhere is better than being everywhere, since it is selective; Maw's schizophrenic mode is distilled ubiquity: (groupware-enabled) accommodation of multiple objects of regard. Keywords: binaural directional mixing console, groupware, mixel ([sound] mixing element), sonic (analytical) cubism, spatial sound.
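The loudest-sink rule described above can be sketched as follows, reusing the kind of virtual_gain() function sketched earlier; all names here are illustrative:

```python
def best_sink(source, sinks, virtual_gain):
    """Pick the sink that hears this source loudest."""
    return max(sinks, key=lambda sink: virtual_gain(source, sink))

def render(sources, sinks, virtual_gain, spatialize):
    """Spatialize each source only with respect to its loudest sink,
    anticipating level-difference localization by the listener."""
    return [spatialize(src, best_sink(src, sinks, virtual_gain))
            for src in sources]
```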
This paper describes speech database retrieval (search) using a speech key. Here, a speech key is a retrieval key generated from assigned speech data, corresponding to the specific speaker, text, gender, language, etc. contained in the speech. Recently, various large-scale speech databases have become available that have neither tools nor information for data retrieval. As a first step toward speech retrieval, database retrieval using speaker individuality is studied. The results show that the speaker retrieval error rate is about 4% for input speech 5 words long, using VQ codebooks with 16-32 codes produced from 10 words of speech.
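A hedged sketch of such VQ-based speaker retrieval follows, assuming frame-wise feature vectors (e.g., cepstra) are extracted elsewhere and using SciPy's vector-quantization routines as a stand-in for the paper's codebook training:

```python
from scipy.cluster.vq import kmeans, vq

def train_codebook(features, n_codes=32):
    """Build one speaker's codebook (16-32 codes in the experiment)
    from enrollment features, here roughly 10 words of speech.
    features: (n_frames, dim) NumPy float array."""
    codebook, _ = kmeans(features.astype(float), n_codes)
    return codebook

def retrieve_speaker(query, codebooks):
    """Return the speaker whose codebook quantizes the query (about a
    5-word utterance) with minimum average distortion.
    codebooks: dict mapping speaker id -> codebook."""
    def avg_distortion(cb):
        _, dists = vq(query, cb)
        return dists.mean()
    return min(codebooks, key=lambda spk: avg_distortion(codebooks[spk]))
```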