/ Masahide Sugiyama / Professor
/ Michael Cohen / Associate Professor
/ Susantha Herath / Associate Professor
/ William L. Martens / Visiting Researcher
/ Minoru Ueda / Assistant Professor
Using our communication channels (sense organs: ears, mouth, eyes, nose, skin, etc.) we can communicate with each other, including between human and human, human and machine, and human and other information sources. When these channels are impaired, in either a software or hardware sense, communication can become difficult. The research area of the Human Interface Laboratory covers the enhancement and generation of various human interface channels.
In order to advance this research on human interfaces, we adopt the following research principle:
We organized the second workshop, IWHIT98 (International Workshop on Human Interface Technology 1998), on Nov. 11th-13th, sponsored by the International Affairs Committee of the University of Aizu. The workshop had 5 sessions (1. Object Location and Tracking in Video Data, 2. Subjective Factors in Handling Images, 3. Visual Interfaces, 4. Visual and Body Perception, 5. Tools for Language Generation) with 15 lectures.
We promoted 5 SCCPs for students (``Speech Processing and Multimedia'', ``Sign Language Processing System'', ``GAIA -- Planet Management'', ``Computer Music'', ``Aizu Virtual City on InterNet'') and 2 Research Projects (``Object Location and Tracking in Video Data'', ``Spatial Media: Sound Spatialization''). We received 4 commissioned research funds: IPA on ``Development of Japanese Dictation Software'', HITOCC on ``Study on Computer Security using Speaker Recognition'', the Fukushima Prefectural Foundation for the Advancement of Science and Education on ``Environment Computer Activity Project'', and the Telecommunication Advancement Organization of Japan Fund on ``Sign Language Communication Between Different Languages''.
We exhibited our research activities at the open campus during the University Festival (Oct. 31st and Nov. 1st) and at the Fukushima Sangyo Fair (Nov. 29th and 30th). We held a Lab Open House for freshmen on April 3rd.
In our research activity, we published 6 papers in academic journals and 10 refereed papers at international conferences.
One of our members organized a working group on ``Blind and Computer'', which about 30 people attended and which received support from the NHK Wakaba Fund.
We maintain the Human Interface Lab homepage to open our research and education activities to the world:
http://www.u-aizu.ac.jp/labs/sw-hi/.
Refereed Journal Papers
A speaker can be recognized using the individual features contained in his or her voice waveform. This is called speaker recognition, and it can be applied as a means of individual verification. This paper develops a software system named ``xvlock'' which can manage computer access using the speaker recognition technique, and also describes the outline of xvlock and its performance evaluation. The implementation and the experiments were carried out on only one standard platform, but xvlock can be applied to other platforms because of its low platform dependency. For low-quality input voice (8-bit $\mu$-law, sampling rate: 8 kHz), the implemented xvlock achieved a 93.9\% verification rate.
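The enroll-then-verify decision underlying such a system can be illustrated with a minimal sketch. The abstract does not specify xvlock's actual features or classifier, so the crude per-frame features (log energy and zero-crossing rate), the template averaging, and the distance threshold below are all illustrative assumptions, not xvlock's method:

```python
import math

def frame_features(samples, frame_len=256):
    """Crude per-frame features: log energy and zero-crossing rate."""
    feats = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        log_energy = math.log(sum(s * s for s in frame) / frame_len + 1e-10)
        zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / frame_len
        feats.append((log_energy, zcr))
    return feats

def template(samples):
    """Average the frame features into a fixed-size speaker template."""
    feats = frame_features(samples)
    return tuple(sum(f[k] for f in feats) / len(feats) for k in range(2))

def verify(enrolled, claimed, threshold=1.0):
    """Accept the claimed identity if its template lies close enough
    to the enrolled template (requires Python 3.8+ for math.dist)."""
    return math.dist(enrolled, claimed) < threshold
```

A real system would use richer spectral features (e.g., cepstra) and a statistically chosen threshold; only the overall structure of enrollment, claim, and thresholded comparison is meant to carry over.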
Audio windowing is a frontend, or user interface, to an audio system with a realtime spatial sound backend. Complementing directionalization by a digital signal processor (DSP), gain adjustment is used to control the volume of the various mixels ([sound] mixing elements). Virtual gain can be synthesized from components derived from collective iconic size, mutual distance, orientation and directivity, and selectively enabled according to room-wise partitioning of sources across sinks. This paper describes a derivation of virtual gain, and outlines the deployment of these expressions in an audio windowing system.
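The composition of virtual gain from these components can be sketched as follows. The particular factor forms below (a product of an iconic-size term, inverse-distance attenuation, a cardioid-like directivity lobe, and room-wise enabling) are illustrative assumptions; the paper's actual derivation may combine the components differently:

```python
import math

def virtual_gain(source, sink):
    """Illustrative composite gain for one source->sink pair."""
    # Room-wise partitioning: sources outside the sink's room are disabled.
    if source["room"] != sink["room"]:
        return 0.0
    # Iconic size: larger icons act as louder sources / more sensitive sinks.
    size_factor = source["size"] * sink["size"]
    # Mutual distance: simple inverse-distance attenuation (clamped at 1 m).
    dx = sink["x"] - source["x"]
    dy = sink["y"] - source["y"]
    distance_factor = 1.0 / max(math.hypot(dx, dy), 1.0)
    # Orientation and directivity: cardioid-like lobe around the
    # source's facing direction (azimuth in radians).
    off_axis = math.atan2(dy, dx) - source["azimuth"]
    directivity_factor = 0.5 * (1.0 + math.cos(off_axis))
    return size_factor * distance_factor * directivity_factor
```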
The PSFC (for Pioneer Sound Field Controller) is a DSP-driven hemispherical loudspeaker array, installed at the University of Aizu Multimedia Center. It features realtime manipulation of the primary components of sound spatialization for each of two audio sources located in a virtual environment, including both the content (source location: apparent direction and distance) and context (room characteristics: room size and liveness). In an alternate mode, it can also direct the destination of the two separate input signals across 14 loudspeakers, manipulating the apparent direction of the virtual sound sources with no control over apparent distance other than that afforded by source loudness (i.e., no simulated environmental reflections or reverberation). The PSFC speaker dome is about 10m in diameter, accommodating about fifty simultaneous users, including about twenty users comfortably standing or sitting near its ``sweet spot,'' the area in which the illusions of sound spatialization are most vivid. Collocated with a large screen rear-projection stereographic display, the PSFC is intended for advanced multimedia and virtual reality applications.
Refereed Proceeding Papers
Speaker individuality can be recognized using the voice features contained in a speaker's voice waveform. This is called speaker recognition, and it can be applied to individual verification. This paper proposes a new computer security software system named ``{\bf xvlock}'' which can control computer access using the speaker recognition technique, and also describes the outline of {\bf xvlock} and its performance evaluation. The implementation and the experiments were carried out on only one standard platform, but {\bf xvlock} can be applied to other platforms because of its low platform dependency. For low-quality input voice (8-bit $\mu$-law, sampling rate: 8 kHz), the implemented {\bf xvlock} achieved 93.9\% verification performance.
Multimedia database management and retrieval are in worldwide demand. In particular, Object Location and Tracking (OLT) technology in time-space is a core component of a search engine for huge multimedia databases and has wide applications.
The final target of our research is to establish technologies which enable us to locate and track specified objects in video data from a combination of audio and visual cues. As human beings are among the most typical objects, as the first step of our research this paper focuses on the location and tracking of a specified person in the sound domain.
This paper describes the speaker-based segment detection and junction algorithms and evaluation experiments using simulated dialogue data.
Multimedia database management and retrieval are in worldwide demand. In particular, Object Location and Tracking (OLT) technology in time-space is a core component of a search engine for huge multimedia databases and has wide applications. The final target of our research is to establish technologies which enable us to locate and track specified objects in video data from a combination of audio and visual cues. As human beings are among the most typical objects, as the first step of our research this paper focuses on the location and tracking of a specified person in the audio domain. This paper describes the OLT project, the speaker-based segment detection and junction algorithms and evaluation experiments using simulated dialogue data, and the segment fuzzy search algorithm and its application to the detection of variable-length segments.
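Speaker-based segment detection can be illustrated with a minimal sketch. The abstracts do not give the actual detection algorithm, so the adjacent-window comparison below is a generic illustrative stand-in: a candidate speaker-change point is flagged wherever the feature statistics of the left and right windows diverge sharply.

```python
def detect_speaker_changes(features, window=20, threshold=2.0):
    """Flag frames where adjacent-window feature statistics diverge.

    `features` is a list of scalar per-frame features (e.g. log energy);
    a large jump between the means of the windows immediately before
    and after frame t is taken as a candidate speaker-change point.
    """
    changes = []
    for t in range(window, len(features) - window):
        mean_left = sum(features[t - window:t]) / window
        mean_right = sum(features[t:t + window]) / window
        if abs(mean_left - mean_right) > threshold:
            changes.append(t)
    return changes
```

A practical system would use multidimensional spectral features and a model-based distance, and would then join the detected segments per speaker; the windowed change-detection structure is the part sketched here.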
A pivot (swivel, rotating) chair is considered as an I/O device, an information appliance. As implemented, the main input modality is orientation tracking, which dynamically selects transfer functions used to spatialize audio in a rotation-invariant soundscape. In groupware situations, like teleconferencing or chat spaces, such orientation tracking can also be used to twist iconic representations of a seated user, avatars in a virtual world, enabling social situation awareness via coupled visual displays, fixed virtual source locations, and projection of non-omnidirectional sources.
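The rotation-invariant soundscape can be sketched minimally: subtracting the tracked chair yaw from each source's room-frame bearing yields the head-relative azimuth used to select a spatialization transfer function, so the sources stay fixed in the room as the listener swivels. The function below is an illustrative assumption, not the implemented system:

```python
def head_relative_azimuth(source_azimuth, chair_yaw):
    """Compensate tracked chair rotation so the soundscape stays fixed
    in the room: the head-relative bearing is the room-frame bearing
    minus the chair's yaw (both in degrees), wrapped to (-180, 180]."""
    rel = (source_azimuth - chair_yaw) % 360.0
    if rel > 180.0:
        rel -= 360.0
    return rel
```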
Traditional mixing idioms for enabling and disabling various sources employ mute and solo functions, which, along with cue, selectively disable or focus on respective channels. Exocentric interfaces which explicitly model not only sources, but also location, orientation, directivity, and multiplicity of sinks, motivate the generalization of mute/solo and cue to exclude and include, manifested for sinks as deafen/confide and harken, a narrowing of stimuli by explicitly blocking out and/or concentrating on selected entities. As sinks are analogs of sources, the semantics are identical. Such functions can be applied not only to other users' sinks for privacy, but also to one's own sinks for selective presence. Multiple sinks are useful in both groupware, where a common environment implies social inhibitions to rearranging shared sources like musical voices or conferees, and individual sessions in which spatial arrangement of sources, like the configuration of a concert orchestra, has mnemonic value. Exclude/include source and sink attributes can be visually represented by iconic attributes associated with a figurative avatar and can distinguish between operations reflexive, invoked by the user associated with a respective icon, and transitive, invoked by another user in the shared environment. Distributed users might typically share spatial aspects of a groupware environment, but attributes like muteness or deafness are determined and displayed on a per-user basis. For example, a source representing a human teleconferee might symbolize muteness with an iconic hand clapped over its mouth, positioned differently (thumb up or thumb down) depending on whether the source was muted by itself or another user's sink. (In the former case, all the users in the space could observe the muted source, but in the latter, only the user disabling the remote source would see and perceive the mute.)
An audio muffler can be wrapped around an iconic head to denote its deafness, but to distinguish between self-imposed deafness, invoked by an associated user whose attention is focused elsewhere, and distally imposed deafness, invoked by a user desiring privacy, iconic hands clasped over the ears can be positioned differently depending on the agent of deafness.
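The exclude/include semantics above suggest a simple audibility rule, sketched below. The function names and set-based state are illustrative assumptions; only the rule itself comes from the text: an explicitly excluded entity is inactive (mute/deafen), any explicit include deactivates the non-included peers (solo/confide), and the rule applies symmetrically to sources and sinks.

```python
def active(name, included, excluded):
    """Exclude/include semantics for one entity: inactive if explicitly
    excluded, or if any peer is explicitly included while it is not."""
    if name in excluded:
        return False
    if included and name not in included:
        return False
    return True

def audible(source, sink, src_inc, src_exc, sink_inc, sink_exc):
    """A source is heard by a sink only when both endpoints are active;
    mute/solo act on sources, and deafen/confide mirror them on sinks,
    since sinks are analogs of sources."""
    return active(source, src_inc, src_exc) and active(sink, sink_inc, sink_exc)
```

For example, soloing (including) one source silences all non-included sources for every sink, while muting (excluding) a source silences only that source.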
Extracting necessary information is hard and time-consuming in the information-oriented society. An abstract generation system for newspapers, which present a large volume of information, is vital. This paper presents an experimental system developed for abstracting newspaper articles on traffic accidents without applying complicated natural language processing techniques. The level of abstraction can be selected by the user. The user saves significant time by excluding unwanted information in the article.
Alternative non-immersive perspectives enable new paradigms of perception, especially in the context of frames-of-reference for musical audition and groupware. Maw, acronymic for multidimensional audio windows, is an application for manipulating sound sources and sinks in virtual rooms, featuring an exocentric graphical interface driving an egocentric audio backend. Listening to sound presented in such a spatial fashion is as different from conventional stereo mixes as sculpture is from painting. Schizophrenic virtual existence suggests sonic (analytic) cubism, presenting multiple acoustic perspectives simultaneously. Clusters can be used to hierarchically organize mixels, [sound] mixing elements. New interaction modalities are enabled by this sort of perceptual aggression and liquid perspective. In particular, virtual concerts may be ``broken down'' by individuals and groups. Keywords and Phrases: binaural directional mixing console, CSCW (computer-supported collaborative work), frames of reference, groupware, mixel ([sound] mixing element), points of view, sonic (analytical) cubism, sound localization, spatial sound.
Shared virtual environments, especially those supporting spatial sound, require generalized control of user-dependent media streams. Traditional mixing idioms for enabling and disabling various sources employ mute and solo functions, which, along with cue, selectively disable or focus on respective channels. Exocentric interfaces which explicitly model not only sources, but also location, orientation, directivity, and multiplicity of sinks, motivate the generalization of mute/solo and cue to exclude and include, manifested for sinks as deafen/confide and harken, a narrowing of stimuli by explicitly blocking out and/or concentrating on selected entities. This paper introduces figurative representations of these functions, virtual hands to be clasped over avatars' ears and mouths. Applications include groupware for collaboration and teaching, teleconferencing and chat spaces, and authoring and manipulation of distributed virtual environments. Keywords: CSCW (computer-supported collaborative work), groupware, narrowcasting functions, articulated mixing console.