/ Masahide Sugiyama / Professor
/ Susantha Herath / Associate Professor
/ Michael Cohen / Assistant Professor
/ Minoru Ueda / Assistant Professor
Using our communication channels (sense organs: ears, mouth, eyes, nose, skin, etc.) we can communicate with one another: between human and human, human and machine, and human and any information source.
The research area of the Human Interface Laboratory covers the enhancement and generation of various human interface channels.
We have the following two main research topics:
In order to achieve these topics, we continued to build up our experimental environment. We introduced 3 HP workstations (2 HP9000/712, 715), on which the previously implemented server/client speech recognition system was installed for parallel processing of speech recognition. A speech synthesizer was connected to a workstation (Sun/S10) to develop a man-machine dialogue system; this is also very useful for communication between blind people and computers. To develop a visual programming language for 4GL, a workstation (Sun/SPARC) has been introduced. For Audio Window (virtual acoustics) research, a NeXT workstation and a convolution engine have been introduced.
In order to encourage the sign language research community, we held a workshop on Aug. 9th at the University of Aizu, sponsored by the Sign Language Technology Committee of IEICE. One of our members organized a meeting of the Speech Research Committee of IEICE on Oct. 13th at the University. We organized IWHIT94 (International Workshop on Human Interface Technology 1994) on Sep. 29th and 30th, sponsored by the International Affairs Committee of the University of Aizu. The workshop had 5 sessions (1. Speech Recognition and Translation, 2. Robustness in Speech Recognition, 3. Speaker Recognition and Segregation, 4. Virtual Acoustics, 5. Non-Verbal Communication), with 8 keynote lectures and 11 lectures.
We proposed a joint project on ``Multi-modal Human Interface for Handicapped People" to the Multimedia Information Center of the University of Aizu; the project was approved and received a budget of about 30 million yen.
We ran 6 SCCPs for students (Social Hyper Networking; Visual Language for Office Processing Software; Speech Dialogue System; Computer Music; Neural Network Modeling; Non-Verbal Communication), 3 Joint Projects (Study of Machine Processing of Signs Generated by Hand Movements; Study on Speech Recognition under Noisy Environments; Audio Windows: Spatialization of Synthesized Speech, Spatialization of Music, and Hierarchical Organization of Spatial Sound Sources), and 1 courseware project (Speech Processing and Speech Recognition). Our members received commissioned research funds from the ATR Interpreting Telecommunication Research Lab on ``Study on Speech Recognition System Based on Information Theory" and from the NTT Human Interface Lab on ``Audio Windows", and a research grant from the Ministry of Education on ``Robust Speech Recognition Using Microphone Array and Signal Source Modeling Techniques".
We exhibited our research activities (the FPM-LR speech recognition system, Audio Windows, and sign language) at the open campus during the University Festival. We held a Lab Open House for freshmen, at which 3 labs exhibited their research activities.
In our research activities, we presented 6 papers at refereed international conferences and 4 full papers in refereed academic journals. We ran a series of HI Lab seminars comprising 8 lectures. We published textbooks on human interface technology in the Human Interface Technology Series (Vol. 1: ``Speech Processing and Speech Recognition" and Vol. 2: ``Speech Recognition --- Advanced Technology ---"), and using a special educational fund we printed 50 copies of each.
One of our members organized a working group on ``Blind and Computer", and about 30 people attended its first meeting (March 12th, 1995). The topics were ``Personal computer environment for blind people", ``Multimedia Information Center in the University of Aizu", and ``Recent NTT speech research and products".
We started to provide Human Interface Lab information via WWW to open our research and education activities to the world: http://www/labs/sw-hi/HI.html
Refereed Journal Papers
As in the Bauhaus movement of the '30s, artists and engineers are working together on commercial industrial (hardware) and post-industrial (software) design. Japan, a world leader in R&D areas like display technology and robotics, is a fertile environment in which VR (known here sometimes as AR [for artificial reality]) can flourish, both in labs and studios, and as consumer products and services: a confluence of theme parks, amusement centers, retail outlets, and home computer and media centers. Emphasizing the capture, transmission, and reproduction of experience, literally sensational VR is upon us, to simulate and stimulate. If it's hyped, or hyper, it's happening around Tokyo. Here's a selective guide to meta-holo-attractions open to the public in `The Big Orange.'
Zebrackets is a system of meta-METAFONTs that generates semi-custom striated parenthetical delimiters on demand. Contextualized by a pseudo-environment in LaTeX and invoked by an aliased pre-compiler, Zebrackets are nearly seamlessly invokable in a variety of modes: matching pairs of background, foreground, or hybrid delimiters are marked manually or automatically, according to a unique index or to depth in the expression stack, in `demux,' unary, or binary encodings of nested associativity. Implemented as an active filter that re-presents textual information graphically, this adaptive character generation can reflect an arbitrarily wide context, increasing the information density of textual presentation by reconsidering text as pictures and expanding the range of written spatial expression.
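As a toy analogue of the depth-indexed striping described above (not Zebrackets itself, which operates at the METAFONT glyph level), the sketch below tags each delimiter with a binary "stripe" code derived from its nesting depth, so that matching open/close pairs share a code; the function name and encoding width are illustrative assumptions:

```python
def stripe_codes(expr, bits=3):
    """Tag each parenthesis with a binary 'stripe' code for its nesting
    depth (modulo 2**bits). Matching pairs receive the same code."""
    out, depth = [], 0
    for ch in expr:
        if ch == '(':
            out.append((ch, format(depth % (1 << bits), f'0{bits}b')))
            depth += 1
        elif ch == ')':
            depth -= 1
            out.append((ch, format(depth % (1 << bits), f'0{bits}b')))
    return out

print(stripe_codes("(a (b (c)) d)"))
# → [('(', '000'), ('(', '001'), ('(', '010'),
#    (')', '010'), (')', '001'), (')', '000')]
```

A renderer in the spirit of Zebrackets would then map each code to a distinct striation pattern on the delimiter glyph rather than printing the bits.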
In this paper, we consider signals that originate from a sequence of sources and address the problem of segmenting the signal and assigning each segment to its source, a problem with wide applications in many fields. This report describes a resolution method using Ergodic Hidden Markov Models (HMMs), in which each HMM state corresponds to a signal source. The signal source sequence can therefore be determined by Viterbi decoding over the observation sequence, and Baum-Welch training can be used to estimate the HMM parameters from training material. As an application of the multiple signal source identification problem, an experiment was performed on unknown speaker identification, and a classification rate of 79% for 4 male speakers was obtained. The results further indicated that the model is sensitive to the initial values of the Ergodic HMM and that the long distance LPC cepstrum is an effective way of preprocessing the signal.
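The decoding step described above (one ergodic HMM state per source, Viterbi giving the source sequence) can be sketched as follows; this is a minimal illustration with hypothetical toy parameters for a two-source model and discrete observation symbols, not the paper's implementation:

```python
import numpy as np

def viterbi(obs, log_A, log_B, log_pi):
    """Most likely state (source) sequence for a discrete observation
    sequence. log_A: (S,S) transition log-probs, log_B: (S,V) emission
    log-probs, log_pi: (S,) initial log-probs. Ergodic: A fully connected."""
    S, T = log_A.shape[0], len(obs)
    delta = np.zeros((T, S))            # best path score ending in each state
    psi = np.zeros((T, S), dtype=int)   # backpointers
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # scores[i, j]: prev i -> cur j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):                # backtrack
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

# Toy 2-source example: each source prefers one of two symbols.
A = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))   # sticky, fully connected
B = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
pi = np.log(np.array([0.5, 0.5]))
obs = [0, 0, 0, 1, 1, 1, 0, 0]
print(viterbi(obs, A, B, pi))
# → [0, 0, 0, 1, 1, 1, 0, 0]  (segmentation follows the source changes)
```

In the paper's setting the discrete emission table would be replaced by continuous densities over LPC cepstral features, with parameters estimated by Baum-Welch rather than fixed by hand.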
Refereed Proceeding Papers
During the last few decades, the requirements of the international market imposed by economic forces have led to the necessity of developing effective and efficient electronic natural language processing tools. Many Machine Translation (MT) systems are being developed worldwide, especially in Japan and Europe, to address these challenges in the 21st century. The research and development of modern Sinhalese began recently. This paper discusses the similarities of Japanese and Sinhalese, the methodology used in the MT process, the problems encountered, the present status, and future plans.
MAW's audio window reinterpretation of standard idioms for WIMP systems (including draggably rotating icons, and directionalized and non-atomic spatial sound objects) complements features that are especially well suited to asynchronous operations, including compatibility with hypermail (allowing spatial sound to be put into electronic mail). By embedding MAW documents, which might include dynamic effects, alongside voicemail, we tag each utterance as a spatial channel.
Augmented reality describes hybrid presentations that overlay computer-generated imagery on top of real scenes.
Poster, ISBN-0-201-62603-9
This paper describes speech segmentation and clustering algorithms based on acoustic gender features, where the speakers and speech context are unknown. In the simpler case, when the speech segmentation is known, the Output Probability Vector Clustering algorithm is applied; in the case of unknown segmentation, an ergodic HMM-based technique is applicable. In this paper only the simpler case is focused on, evaluated using simulated multi-speaker dialogue speech data.
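As a rough illustration of the known-segmentation case, the sketch below groups per-segment feature vectors with a plain k-means; the 2-D vectors are hypothetical stand-ins for the paper's output probability vectors, and this is an illustrative analogue, not the Output Probability Vector Clustering algorithm itself:

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Plain k-means over per-segment feature vectors.
    Deterministic initialization: centers spread over the data indices."""
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].astype(float)
    for _ in range(iters):
        # assign each segment to its nearest center
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):                      # recompute centers
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Simulated two-speaker dialogue: 5 known segments per speaker,
# summarized as well-separated 2-D feature vectors.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (5, 2)),
               rng.normal(3.0, 0.3, (5, 2))])
labels = kmeans(X, 2)
print(labels)   # segments from the same speaker share a cluster label
```

In the unknown-segmentation case an ergodic HMM, as in the source identification work above, would jointly segment and cluster instead of operating on fixed segments.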