Audio Windows: Virtual Concerts

MAW, acronymic for multidimensional audio windows, is an interface for manipulating iconic sound sources and sinks in virtual rooms. Implemented as a NeXT-based frontend, MAW is suitable for synchronous applications like teleconferences or concerts, as well as asynchronous applications like voicemail and hypermedia.

A source is a sound emitter; a sink is a sound receptor, a delegate of the human listener in the virtual room. Source-to-sink spatialization is performed by a DSP (digital signal processing) module that convolves the digitized input streams with HRTFs (head-related transfer functions), which capture directional effects. This spatialization enables auditory localization, the identification of the location of a source, which in turn supports "the cocktail party effect." Such effects might be used in a concert to 'hear out' an instrument, virtually and perceptually pulling it out from the mix, or for sub-caucusing in a teleconference. Listening to sound presented in this spatial fashion is as different from conventional stereo mixes as sculpture is from painting.
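
The convolution step described above can be illustrated with a minimal sketch. It is not MAW's DSP code: the function names `convolve` and `spatialize` are hypothetical, and the filter values are toy numbers rather than measured HRTFs. The sketch assumes time-domain filtering, so it uses one impulse response per ear (the time-domain counterpart of an HRTF).

```python
def convolve(signal, kernel):
    """Direct-form linear convolution.

    Output length is len(signal) + len(kernel) - 1.
    """
    if not signal or not kernel:
        return []
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def spatialize(mono, hrir_left, hrir_right):
    """Return (left, right) channels for one source heard by one sink.

    hrir_left and hrir_right are per-ear impulse responses for the
    source's direction relative to the sink (assumed to be given).
    """
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses (hypothetical, not measured data): the source sits
# to the listener's right, so the right ear hears it earlier and louder.
hrir_left = [0.0, 0.0, 0.5]   # delayed by two samples, attenuated
hrir_right = [1.0, 0.0, 0.0]  # immediate, full level

mono = [1.0, 2.0, 3.0]
left, right = spatialize(mono, hrir_left, hrir_right)
```

The interaural delay and level difference encoded in the two toy filters are the kind of directional cues that make localization possible; a practical system would presumably use measured responses and a faster (for example, FFT-based) convolution.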

The sinks and sources may wander around, like minglers at a cocktail party, or upon the stage during a concert, hovering over the shoulder of a favorite musician. Using the cut/paste idiom as a transporter, one may leave a room and beam down into others. Such a control mechanism can be used to focus selectively on various sources. If several rooms are interesting, it gets tiresome to bounce back and forth. Instead, users can simply fork themselves (with copy), leaving one clone hither while installing another yon, overlaying soundscapes via the superposition of multiple sinks' presence. This schizophrenic existence suggests the name given to this effect: sonic (analytic) cubism, presenting multiple acoustic perspectives simultaneously.

This feature can be used to sharpen the granularity of control, as the separate sinks can monitor individual sources (sometimes called "mixels," acronymic for '[sound] mixing elements', in analogy to pixels [picture elements], texels [texture elements], taxels [tactile elements], voxels [volume elements, a.k.a. boxels]). In fact, the multiple sinks need not listen in separate rooms. If the sources and sinks are brought back together, each source is 'auto-focused' by only a single sink (the closest, for example).
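
The 'auto-focus' rule mentioned above (each source heard by only the closest sink) might be sketched as follows. This is an illustration under stated assumptions, not MAW's implementation: positions are taken to be 2D coordinates in a shared room, distance is Euclidean, and the names `assign_sources_to_sinks`, `clone_a`, and `clone_b` are hypothetical.

```python
import math


def assign_sources_to_sinks(sources, sinks):
    """Map each source name to the name of its nearest sink.

    sources and sinks are dicts of name -> (x, y) position.
    Raises ValueError if sinks is empty.
    """
    assignment = {}
    for src_name, src_pos in sources.items():
        assignment[src_name] = min(
            sinks, key=lambda name: math.dist(src_pos, sinks[name])
        )
    return assignment


# Hypothetical stage layout: three instruments and two clones of the listener.
sources = {"piano": (0.0, 0.0), "violin": (4.0, 1.0), "drums": (9.0, 0.0)}
sinks = {"clone_a": (1.0, 0.0), "clone_b": (8.0, 0.0)}

focus = assign_sources_to_sinks(sources, sinks)
```

With this layout the piano and violin are assigned to `clone_a` and the drums to `clone_b`, so each sink monitors its own subset of sources even though all of them share one room.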

"Let's enjoy lunch time with us!"

presented in conjunction with VR Sound and Computer Music SCCPs

Kimitaka Ishikawa, Yuzi Nonoyama
Osamu Kaneda, Kenichi Muramatsu, Masato Suzuki
Sayuri Hoshi, Daigo Imanishi, Miyoshi Kitami, Masatoshi Miura, Yosuke Miyanaga, Ken Sasaki, Mikako Takita, Yukinori Urashima

Michael Cohen