/ Masahide Sugiyama / Professor
/ Susantha Herath / Associate Professor
/ Michael Cohen / Assistant Professor
/ Minoru Ueda / Assistant Professor
Using our communication channels (sense organs: ears, mouth, eyes, nose, skin, etc.) we can communicate with each other: human to human, human to machine, and human to every kind of information source. When these channels are impaired, in either a software or a hardware sense, communication can become difficult. The research area of the Human Interface Laboratory covers the enhancement and generation of various human-interface channels.
In order to advance the above research on human interface, we adopt the following research principle:
We promoted 5 SCCPs for students (``Social Hyper Networking", ``Visual Language for Office Processing Software", ``Speech Dialogue System", ``Computer Music", ``Non-Verbal Communication"), 4 Joint Projects (``Study of Machine Processing of Signs Generated by Hand Movements", ``Study on Speech Recognition under Noisy Environment", ``Audio Windows: Spatialization of Synthesized Speech", ``Spatialization of Music and Hierarchical Organization of Spatial Sound Sources"), and 1 Courseware project (``Speech Processing and Speech Recognition"). One of us received commissioned research funds from NTT Human Interface Labs. on ``Audio Window" and from NTT DATA on ``Study on Speech Processing Technology and Human Interface".
We exhibited our research activities at the open campus during the University Festival (Oct. 18th and 19th), and held a Lab Open House for freshmen on April 9th, 10th, and 11th.
Through our research activities we presented 5 refereed papers at international conferences and in academic journals.
One of our members organized a working group on ``Blind and Computer"; about 30 people attended its meetings (Apr. 14th, July 14th, Oct. 13th, Nov. 10th, and Feb. 2nd, 1997). The topics were ``TeN-yaku Hiroba and Information Network", ``The Visually Handicapped and Computer Network Access", ``Computer Training for Beginners", ``Joining `Amedia Fair96'", and ``Hearing/Visual Aids in New NTT Technology".
We maintain the homepage of the Human Interface Lab to open our research and education activities to the world: http://www.u-aizu.ac.jp/labs/sw-hi/.
Refereed Journal Papers
In this paper, we consider signals originating from a sequence of sources. More specifically, the problems of segmenting such signals and relating the segments to their sources are addressed. This issue has wide applications in many fields. This report describes a resolution method that is based on an Ergodic Hidden Markov Model (HMM), in which each HMM state corresponds to a signal source. The signal source sequence can be determined by applying a decoding procedure (the Viterbi algorithm or the Forward algorithm) to the observed sequence. Baum-Welch training is used to estimate the HMM parameters from the training material. As an example of the multiple signal source classification problem, an experiment is performed on unknown speaker classification. The results show a classification rate of 79% for 4 male speakers. The results also indicate that the model is sensitive to the initial values of the Ergodic HMM and that employing the long-distance LPC cepstrum is effective for signal preprocessing.
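The decoding step above can be illustrated in miniature. The following sketch implements Viterbi decoding over a fully connected (ergodic) two-state HMM in which each state stands for one signal source; all probabilities and the two-symbol observation alphabet are illustrative assumptions, not the paper's trained parameters.

```python
# Sketch of source-sequence decoding with an ergodic HMM, where each
# state corresponds to a signal source. The parameters are toy values.
import math

def viterbi(obs, states, log_pi, log_A, log_B):
    """Return the most likely state (source) sequence for obs."""
    # delta[s] = best log-probability of any path ending in state s
    delta = {s: log_pi[s] + log_B[s][obs[0]] for s in states}
    back = []
    for o in obs[1:]:
        prev, delta, ptr = delta, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + log_A[r][s])
            delta[s] = prev[best] + log_A[best][s] + log_B[s][o]
            ptr[s] = best
        back.append(ptr)
    # Backtrack from the best final state.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Toy two-source example: each state tends to persist, mimicking
# speaker turns; 'x' is typical of source A, 'y' of source B.
states = ("A", "B")
log = math.log
log_pi = {"A": log(0.5), "B": log(0.5)}
log_A = {"A": {"A": log(0.9), "B": log(0.1)},
         "B": {"A": log(0.1), "B": log(0.9)}}
log_B = {"A": {"x": log(0.8), "y": log(0.2)},
         "B": {"x": log(0.2), "y": log(0.8)}}
print(viterbi(list("xxxyyy"), states, log_pi, log_A, log_B))
# → ['A', 'A', 'A', 'B', 'B', 'B']
```

The recovered state sequence segments the observations and labels each segment with its source, which is exactly the classification task described above; a real system would first estimate `log_A` and `log_B` with Baum-Welch training.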
This paper presents the design and implementation techniques employed in a Japanese-to-Sinhalese machine translation (MT) system. The main result of this work is the successful application of Bunsetsu in generating meaningful translations for a flexible-grammar language. The system has been developed considering the similarities between Japanese Bunsetsu and Sinhalese units. These efforts focus on determining the minimum reasonable grammatical knowledge necessary for machine translation. The principal characteristics of the system, the translation process, problems encountered during the development stages, the present status, and future plans are discussed.
This study was performed to investigate the accuracy of performing a localization task as a function of the use of three display formats: an auditory display, a perspective display, and a perspective-auditory display. The experimental task for the perspective and perspective-auditory displays was to judge the relative azimuth and elevation which separated a computer-generated target object from a reference object. The experimental task for the auditory display was to determine the azimuth and elevation of a sound source with respect to the listener. For azimuth estimates, there was a significant effect for type of display, with worse performance resulting from the purely auditory format. Further, azimuth judgements were better for target objects which were aligned close to the major meridian orthogonal to the viewing vector. For elevation errors, there was a main effect for the type of display, with worst performance for the purely auditory condition; elevation judgements were worse for larger elevation separations independent of display condition. Finally, elevation performance was superior when target images were aligned close to the major meridian orthogonal to the viewing vector. Implications of the results for the design of spatial instruments are discussed.
Refereed Proceeding Papers
As the Japanese-to-modern-Sinhalese language pair is virtually unexplored from a machine translation perspective, these efforts are focused on determining the reasonable minimum of grammatical knowledge of Japanese necessary for obtaining intelligible modern Sinhalese output. This paper discusses the problem of countability in machine translation (MT) from Japanese to modern Sinhalese, and a method that extracts information relevant to countability from the Japanese text and combines it with knowledge about countability in Sinhalese.
Inspired by the cyclical nature of octaves and the helical structure of a scale, we prepared a model of a piano-style keyboard (prototyped in Mathematica), which was then geometrically warped into a left-handed helical configuration, one octave/revolution, with pitch mapped to height. The natural orientation of higher-frequency keys higher on the helix suggests a parsimonious left-handed chirality, so that ascending notes cross in front of a typical listener left$\rightarrow$right. Our model is being imported (via the dxf file format) into (Open Inventor/) vrml, where it can be driven by midi events, realtime or sequenced. This midi stream is both synthesized (by a Roland Sound Module) and spatialized by a heterogeneous spatial sound backend (including the CRE Acoustetron II and the Pioneer Sound Field Control speaker-array System), so that the sound of the respective notes is directionalized with respect to sinks, avatars of the human user, by default in the tube of the helix.
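The helix geometry described above can be sketched as a simple coordinate mapping: one octave per revolution, pitch mapped to height, with a negated angle giving the left-handed chirality. The radius, rise, and reference note below are illustrative assumptions, not the dimensions of the actual model.

```python
# Sketch of the helical keyboard layout: one octave (12 semitones)
# per revolution, pitch mapped to height, left-handed winding.
# RADIUS, RISE_PER_OCTAVE, and the middle-C reference are assumptions.
import math

OCTAVE = 12            # semitones per revolution
RADIUS = 1.0           # helix radius (arbitrary units)
RISE_PER_OCTAVE = 1.0  # vertical rise per revolution

def key_position(midi_note, ref_note=60):
    """Map a MIDI note number to an (x, y, z) point on the helix."""
    t = (midi_note - ref_note) / OCTAVE   # revolutions above middle C
    angle = -2 * math.pi * t              # negative sign: left-handed
    return (RADIUS * math.cos(angle),
            RADIUS * math.sin(angle),
            RISE_PER_OCTAVE * t)          # higher pitch sits higher

# Notes an octave apart land directly above one another on the tube:
x0, y0, z0 = key_position(60)   # middle C
x1, y1, z1 = key_position(72)   # C one octave up
print(abs(x1 - x0) < 1e-9, abs(y1 - y0) < 1e-9, z1 - z0)
```

Such a mapping is also what a spatialization backend needs: each sounding note's position on the tube can be handed to the spatial sound renderer so the note is directionalized relative to the listener's avatar.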
The psfc, or Pioneer Sound Field Control System, is a dsp-driven hemispherical 14-loudspeaker array installed at the University of Aizu Multimedia Center. Collocated with a large-screen rear-projection stereographic display, the psfc features realtime control of the virtual room characteristics and direction of two separate sound channels, smoothly steering them around a configurable soundscape. The psfc controls an entire sound field, including sound direction, virtual distance, and simulated environment (reverb level, room size, and liveness) for each source. It can also configure a dry (dsp-less) switching matrix for direct directionalization. The psfc speaker dome is about 14 m in diameter, allowing about twenty users at once to comfortably stand or sit near its sweet spot.