Basic Information

Affiliation
University of Aizu
Title
Chairperson of the Board of Executives / President
E-Mail
oka@u-aizu.ac.jp
Web site
http://web-ext.u-aizu.ac.jp/~oka/Index.html

Education

Courses - Undergraduate
Invitation to Computer Science and Engineering (one lecture)
Courses - Graduate
None

Research

Specialization
Pattern Recognition, Artificial Intelligence, Robotics
Educational Background, Biography
1. The University of Tokyo (Graduate School, master's course) 2. Electrotechnical Laboratory of MITI (Researcher) 3. National Research Council of Canada (Visiting Scientist) 4. Real World Computing (national project; chief of division and laboratories)
Current Research Theme
Wide 3D scene image reconstruction from video, Video processing for automatic vehicle driving, Motion understanding from video, Recognition of speech captured at a cocktail party, Automatic evaluation of sport performance from video, Detection of motions and flow lines of multiple moving objects from video, Moving robots, Drone networks (Dronet)
Key Topic
Computer vision, Motion recognition, Speech recognition, Continuous DP, Cocktail party effect, Matching, Moving robot, Dronet, Gorone
Affiliated Academic Society
IEEE, IEICE, Acoustical Society of Japan, The Japanese Society for Artificial Intelligence

Others

Hobbies
Listening to music, Reading books, Visiting art museums in foreign countries.
School days' Dream
Becoming a researcher of scientific engineering
Current Dream
Making our university more attractive. Finding new algorithms in engineering and verifying their practical value in applications.
Motto
"Any creative work is done through a boy's pure and honest mind" by Ryotaro Shiba
Favorite Books
Books by Nanami Shiono, Goro Shimura, Shuichi Kato, Mitsuo Taketani, Tatsuru Uchida, Ryotaro Shiba, Emmanuel Todd, and Yuval Noah Harari.
Messages for Students
When you face a choice between two directions, choose the more active one.
Publications outside one's area of specialization
President's Message
Relay Essay #35
Relay Essay #68

Main research

Recognition of human motions from the video of a moving robot against a moving background

It is normal for a robot to move around places where people live their daily lives. There are then many moving objects around the robot, such as walking people, moving cars, dogs, and cats. Moreover, a moving robot sees a moving scene through its own eyes.

In this situation, the robot watches the motions of the people facing it. The robot must recognize those motions and respond to the people with motions of its own and with suitable utterances in synthesized speech.

 If a robot cannot create suitable actions based on its perception of the motions of the people facing it, the robot will not seem cooperative, and so will not be acceptable to human society.

We have already developed a new algorithm called "Time-space Continuous Dynamic Programming (TSCDP)" that enables a robot to realize the functions mentioned above, the functions required of a robot eye if the robot is to cooperate well with our society.

Namely, TSCDP has been implemented for a robot. TSCDP works on the time-varying image captured by the eye of a moving robot, so the robot can recognize human motions against a moving background.

 Occlusion also occurs often in daily life. Occlusion means that there are blocking objects between the robot and the focused person who is making the motions.

TSCDP also tolerates partial occlusion.

The attached picture shows the recognition of the motion "S" drawn by a focused person, captured by the moving camera of a robot against a moving background in which other people cross the scene.

The functions realized by our proposed algorithm seem quite difficult to achieve with so-called Deep Learning, because Deep Learning is weak at recognizing motions captured by a moving camera in a moving scene.

This research uses the algorithm proposed in the following paper:

[1] Yuki Niitsuma, Syunpei Torii, Yuichi Yaguchi and Ryuichi Oka: "Time-segmentation and position-free recognition of air-drawn gestures and characters in videos", Multimedia Tools and Applications, Vol. 75, No. 19, pp. 11615-11639.


View this research

Automatic scoring of sport performance and decision of win or loss

  We proposed a matching algorithm called Time-space Continuous Dynamic Programming (TSCDP) [1] for segmentation-free recognition of complex and multiple motions in a video stream. The segmentation-free property holds in both time and spatial position: the starting and ending times of each motion need not be determined, and each motion may appear at any spatial position.

 Moreover, multiple and complex motions in one scene are recognized as well, and a moving background and occlusion are also allowed. Real-time, segmentation-free recognition makes it possible to spot sport motions such as figure-skating performances and sumo bouts, and is therefore useful for automatic scoring and for deciding a win or loss. Video captured by a moving camera is also allowed.
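The time-axis side of this segmentation-free matching can be illustrated with a toy one-dimensional recurrence. The sketch below is our own simplified continuous-DP matcher, not the actual TSCDP formulation: the first template frame may align with any input time, so the best match is found without fixing a start or end point in advance.

```python
import numpy as np

def cdp_spot(stream, template):
    """Toy continuous-DP spotting: find where `template` ends inside
    `stream` without pre-segmenting the stream (simplified sketch)."""
    T, J = len(stream), len(template)
    D = np.full((T, J), np.inf)
    for t in range(T):
        D[t, 0] = abs(stream[t] - template[0])   # a match may start at any t
        for j in range(1, J):
            best = D[t, j - 1]                   # stay at time t
            if t > 0:
                best = min(best, D[t - 1, j], D[t - 1, j - 1])
            D[t, j] = abs(stream[t] - template[j]) + best
    scores = D[:, J - 1] / J                     # length-normalized cost
    t_end = int(np.argmin(scores))               # best ending time
    return t_end, float(scores[t_end])
```

Spotting then reduces to thresholding `scores` at every frame; TSCDP extends the same idea with a spatial search, so the pattern may also appear anywhere in the image plane.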

 These functions have not been realized so far by conventional methods, including HMMs.

 There are many other sensors for capturing human motions, such as infrared devices (Kinect, etc.), laser devices, and accelerometers. However, realizing these functions is out of reach even with such devices.

The figures show several applications of motion recognition, including complex and connected Chinese characters, as well as new functions such as the detection of moving cars, all realized by TSCDP. These functions provide practical solutions that have long been required for real-world applications of motion-recognition technology. The patent on this method has been registered.

[1] Yuki Niitsuma, Syunpei Torii, Yuichi Yaguchi and Ryuichi Oka: "Time-segmentation and position-free recognition of air-drawn gestures and characters in videos", Multimedia Tools and Applications, Vol. 75, No. 19, pp. 11615-11639.

View this research

Reconstructing 3D images of city and indoor scenes from videos --- Making walk-through data for city and indoor scenes from videos

 Using a standard video camera, it is easy to capture wide scenes of cities, towns, mountains, and the countryside, as well as indoor scenes. We proposed a new algorithm for making a dense 3D image, covering a wide range of distances, of the wide area recorded in a video. This kind of research target is a frontier of vision research.

 The obtained 3D data is suitable for supporting the work of robots, as in so-called Visual SLAM. Moreover, it becomes easy to make content for VR systems, such as 3D world data for walk-throughs. Automatic car driving becomes another application when a 360-degree scene is captured by a camera on a moving car.

 There are many conventional methods for reconstructing a 3D image, using devices such as ultrasonic, laser, and infrared sensors, or vision-based techniques such as stereo vision, voxel-filling methods based on silhouettes, and object-based methods. However, for making a 3D image of a wide scene these methods still have weaknesses, such as limited pixel size, a small range of distances, and inapplicability to objects with non-standard reflection characteristics. Moreover, conventional methods need to be combined with other techniques such as feature extraction (SIFT, etc.), factorization, RANSAC, and Kalman filtering. Therefore a new algorithm is required to overcome these weaknesses.

 Our method solves most of the difficulties mentioned above.

 There are two kinds of 3D information in a wide scene. One is global 3D information for distinguishing larger objects such as buildings, roads, rivers, and woods. The other is for distinguishing sub-objects belonging to each larger object. Our method can extract both kinds; here we show only the former.
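The geometric relation underlying motion parallax is the standard one from stereo: for a camera translating sideways by a known baseline, the depth of a point is inversely proportional to the image shift it accumulates. The sketch below shows only this textbook relation, with illustrative parameter names, not the region-wise accumulation of our paper:

```python
def depth_from_parallax(track_px, baseline_m, focal_px):
    """Depth from accumulated motion parallax: Z = f * B / d, where
    d is the total image shift of a tracked point (pixels), B the
    camera translation (metres) and f the focal length (pixels)."""
    disparity = abs(track_px[-1] - track_px[0])  # accumulated shift
    return focal_px * baseline_m / disparity
```

For example, a point that drifts 80 px while the camera moves 1 m with an 800 px focal length lies about 10 m away; accumulating the shift over many frames makes the estimate more robust for distant, slowly shifting points.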

 The following images are: 1) one frame of a video capturing a city scene; 2) the RGB + distance image of 1) seen from one viewing angle; 3) an RGB + distance image constructed from a video of our university garden.

Part of our method was published in: Ryuichi Oka and Ranaweera Rasika, "Region-wise 3D Image Reconstruction from Video Based on Accumulated Motion Parallax", MIRU2017, PS1-5, August 2017.

 The patent on this method is pending.

View this research

Speech Recognition from a Single Wave of Mixed Speakers -- Speech recognition without separating a single speech wave spoken simultaneously by multiple speakers

   We propose an algorithm for recognizing speech from a single speech wave spoken simultaneously by multiple speakers. We use synthesized speech as the query for each category, so that the recognition system works speaker-independently.
 
   As one of its cognitive functions, the human brain resolves the so-called "cocktail party effect": it can understand the meaning of a focused utterance among mixed speech spoken simultaneously by multiple speakers. The typical setting of this phenomenon is a cocktail party.
 
   One engineering attempt at the cocktail party effect applies the algorithm called Independent Component Analysis (ICA), which has strong potential for separating mixed speech into a set of independent speech signals. However, the function of ICA is only separation; recognition of the speech is outside its scope. Moreover, applying ICA basically requires at least as many microphones as there are speakers.
 
   The human brain actually realizes the cocktail-party function using two ears. Two ears are not equivalent to two microphones; they are used to identify the location of a sound source in the 3D space around the listener.
 
   Therefore, we could say that the human brain realizes this function with what amounts to a single microphone. This indicates that the same function could be engineered using a single microphone. ICA, which uses many microphones only to separate the speech, may not be the intrinsic solution to the human cocktail party effect.
 
   Our method carries out speech recognition on a single wave of mixed speakers, without separating the speech, using a query of synthesized speech that corresponds to a category. The attached figure shows an experimental result of segmentation-free recognition of keywords and key phrases from a single speech signal spoken by English, Japanese, Chinese, and German speakers. The query keywords and key phrases are synthesized speech, which means that our method works speaker-independently.
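A rough feel for query-based spotting in an unseparated wave can be given with normalized cross-correlation: slide the known query over the mixed signal and keep the best-matching offset. This is only an illustration under the strong assumption that the query occurs at its original time scale; our actual method uses segmentation-free DP matching instead:

```python
import numpy as np

def spot_query(mixed, query):
    """Slide `query` over `mixed` and return the offset with the
    highest normalized correlation (illustrative stand-in only)."""
    q = (query - query.mean()) / (query.std() + 1e-12)
    best_t, best_r = 0, -np.inf
    for t in range(len(mixed) - len(query) + 1):
        w = mixed[t:t + len(query)]
        w = (w - w.mean()) / (w.std() + 1e-12)
        r = float(np.dot(w, q)) / len(query)     # Pearson correlation
        if r > best_r:
            best_t, best_r = t, r
    return best_t, best_r
```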

The patent on the method is pending.

View this research

Motion detection of moving persons and cars from a video camera in the sky

 Grasping the congestion and flow of persons and cars at the time of a disaster is important for reducing human damage. There has been much research and development on obtaining this information by analyzing images and videos captured by a camera on an airplane or a drone. However, conventional methods (optical flow, particle filters, Kalman filters, statistical processing of time-space voxel codes, etc.) are not yet sufficient to grasp the situation by detecting the motion of each person and each car. The segmentation problem for moving persons and cars in images and videos remains unsolved.

 A new algorithm is needed that can detect the motion of each person and each car over a wide area from a video, and that works easily, quickly, automatically, in real time, on long videos, and, above all, with segmentation-free handling of the persons and cars in the images. This kind of task cannot be handled by laser or infrared sensors, because the scene is captured by a video camera in the sky.
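As a point of contrast, the simplest conventional baseline, temporal frame differencing, takes only a few lines; it marks pixels that change but cannot separate one mover from another or classify the motions, which is exactly the segmentation gap described above. A minimal sketch:

```python
import numpy as np

def moving_mask(frames, thresh=10):
    """Mark pixels that change between consecutive frames.
    `frames`: array of shape (num_frames, height, width)."""
    diff = np.abs(frames[1:].astype(int) - frames[:-1].astype(int))
    return diff.max(axis=0) > thresh             # True where change occurred
```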
 
 Our algorithm, Time-Space Continuous Dynamic Programming (TSCDP) [1], can detect motion in the way described above, including solving the segmentation-free problem.

 We show two experimental results for detecting the motion of many persons and cars (see pictures). The first is the detection of the motion of each football player on the field during a game, from a video. The second is the detection of the motion of each walking person and each moving car on a road; a person walking along the sidewalk is also detected. Different colors in the scene images indicate different motions of the moving objects.

 The experimental results clearly show the potential of TSCDP for grasping congestion and the flow of persons and cars at the time of a disaster. The patent on this method has been registered.

 Recently, drones have become popular in many application domains. We now need to develop new algorithms that can be applied to the data obtained by drones to extract genuinely useful information. This work belongs mainly to software rather than hardware.

[1] Yuki Niitsuma, Syunpei Torii, Yuichi Yaguchi and Ryuichi Oka: "Time-segmentation and position-free recognition of air-drawn gestures and characters in videos", Multimedia Tools and Applications, Vol. 75, No. 19, pp. 11615-11639.


View this research

Dronet -- A proposal for a network of drones connected by cables to realize new functions --

 We proposed a new type of drone system called a "Dronet." A dronet is composed of many drones, each connected to its neighboring drones by cables. The drones perform distributed control to stabilize the dronet and to reach a target place. Flight toward the target is realized by stabilizing the dronet against an additionally introduced virtual external force. A dronet can realize new functions that a group of conventional drones cannot.

 Two types of dronet are proposed: one with a power-supply cable from the ground, and one without power supply from the ground. The latter includes drones used only for carrying batteries. Both types are robust against external forces such as wind while capturing video of a scene with their cameras, and they can carry a heavy object by summing the payloads of many drones.
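The flavor of this distributed control can be caricatured with a mass-spring toy model: each drone feels spring forces from its cable neighbours plus a weak virtual force toward the shared target. The dynamics, gains, and function names below are our own illustrative assumptions, not the controller of the paper:

```python
import numpy as np

def simulate_dronet(pos, edges, target, k=5.0, damp=2.0, dt=0.01, steps=2000):
    """Toy cable-network control: springs keep the formation,
    a virtual force steers the whole network toward `target`."""
    pos = pos.astype(float).copy()
    vel = np.zeros_like(pos)
    rest = {e: np.linalg.norm(pos[e[0]] - pos[e[1]]) for e in edges}
    for _ in range(steps):
        force = 0.5 * (target - pos)             # virtual external force
        for i, j in edges:
            d = pos[j] - pos[i]
            L = np.linalg.norm(d)
            f = k * (L - rest[(i, j)]) * d / (L + 1e-9)   # cable as spring
            force[i] += f
            force[j] -= f
        vel += dt * (force - damp * vel)         # damped point-mass dynamics
        pos += dt * vel
    return pos
```

In this toy model the cable forces cancel internally, so the centroid is driven to the target while the springs preserve the formation spacing.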

 A dronet with a cable supplying power from the ground can stay in the air for a long time. Therefore, a line-shaped dronet can enter the internal space of buildings or bridges, where its cameras or laser sensors can work for a long time to obtain the necessary data.

 The sub-images show simulations of dronet motion: 1) a dronet carrying an object, 2) a dronet with a broken drone, 3) a line-shaped dronet. Hardware under construction is also shown.

Reference: Ryuichi Oka, Keisuke Hata: "Dronet -- Drone Network Connected by Cables", Journal of The Society of Instrument and Control Engineers, Vol. 56, January, pp. 40-43, 2017. (in Japanese)

 The patent on this method is pending.

View this research

Dissertation and Published Works

1) A new cellular automaton structure for macroscopic linear-curved features extraction: Ryuichi Oka, p.654, Proc. 4th International Joint Conference on Pattern Recognition (1978).
Comment: Proposal of the Cellular Feature, including the orientation pattern, which became a standard feature in the field of character recognition.

2) "Continuous Words Recognition by Use of Continuous Dynamic Programming for Pattern Matching": Ryuichi Oka, Technical Report of Speech Committee, Acoustic Society of Japan, Vol.S78-20, pp.145-152, June (1978) (in Japanese).
Comment: This is the first paper on Continuous Dynamic Programming, written in Japanese. The spotting recognition (segmentation-free recognition) realized by Continuous Dynamic Programming was later extended to time sequences, 2D images, and time-varying images.

3) "Spotting Method for Classification of Real World Data": Ryuichi Oka, The Computer Journal, Vol.41, No.8, pp.559-565 (1998).
Comment: This paper is cited internationally in many papers concerning Continuous Dynamic Programming.

4) Hierarchical labeling for integrating images and words: Ryuichi Oka, Artificial Intelligence Review, Vol. 8, pp. 123-145 (1994).
Comment: This paper proposed an algorithm for middle vision, which seems to be the most difficult stage in the understanding of vision. There are three stages in computer vision: early, middle, and high level.

5) On Spotting Recognition of Gesture Motion from Time-varying Image: Ryuichi OKA, Takuichi Nishimura, Hiroaki Yabe, Transactions of Information Processing Society of Japan, Vol.43, No.SIG 4 (CVIM 4), pp.54-68 (2002).
Comment: This paper proposed an architecture called "frame-wise complete cycle" for real-time human-computer integration of multimedia.

6) Image-to-word transformation based on dividing and vector quantizing images with words: Y.Mori, H.Takahashi and R.Oka, First International Workshop on Multimedia Intelligent Storage and Retrieval Management (MISRM'99), December (1999)
Comment: This paper is cited in many papers related to language-vision integration. It is one of the pioneering papers on this topic.

7) Time-segmentation and position-free recognition of air-drawn gestures and characters in videos: Yuki Niitsuma, Syunpei Torii, Yuichi Yaguchi and Ryuichi Oka, Multimedia Tools and Applications, Vol. 75, No. 19, pp. 11615-11639.
Comment: This is an English paper describing the method called "Time-space Continuous Dynamic Programming" in detail. There is a family of algorithms based on the concept of Continuous Dynamic Programming.