Teamwork with Katsushi Ikeuchi, Zengqiang Yan, Yoshihiro Sato, Minako Nakamura, Shunsuke Kudoh, et al.

My contribution:

  • Proposed a scale-space-filtering-based method that extracts keyframes from human gestures to build a representation for reproducing human motion. Using a Labanotator's annotation as the baseline, this method achieves higher accuracy than the previous energy-function-based method.
  • Algorithm implementation.
  • Robot control programming under ROS.
  • Robot hardware design and prototyping.
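
The scale-space idea behind the keyframe extraction can be sketched as follows. This is a minimal illustration under my own assumptions, not the published algorithm: smooth a single joint trajectory with Gaussians of increasing width and keep only the extrema that survive at every scale; the function names, scale values, and tolerance are all invented for the sketch.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def local_extrema(signal):
    """Frame indices where the first derivative changes sign."""
    d = np.diff(signal)
    return np.where(np.sign(d[:-1]) != np.sign(d[1:]))[0] + 1

def extract_keyframes(trajectory, sigmas=(1.0, 2.0, 4.0), tol=3):
    """Keep extrema of the finest scale that persist (within `tol` frames)
    at every coarser Gaussian scale -- a basic scale-space filter."""
    candidates = local_extrema(gaussian_filter1d(trajectory, sigmas[0]))
    for sigma in sigmas[1:]:
        coarse = local_extrema(gaussian_filter1d(trajectory, sigma))
        candidates = [c for c in candidates
                      if coarse.size and np.min(np.abs(coarse - c)) <= tol]
    return candidates
```

On a smooth periodic trajectory, the surviving extrema correspond to the turning points of the motion, which is roughly what a keyframe should capture.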

Related work is published in the International Journal of Computer Vision, 2018. You can find more information here.
Related work was demoed to Bill Gates, Paul Allen, and Satya Nadella at Microsoft TechFest 2016.


Related post on Harry Shum's Facebook


Hand-made robot prototype

We present a robot interaction system that introduces Labanotation into the working process. In real-world scenarios there are several ways to program a robot, but because of the kinematic and dynamic differences between humans and robots, and even between different robots, simply mimicking the exact same joint angles will not work. We therefore use Labanotation, a notation originally developed to record dance, as an intermediate symbolic representation for imitating a human's upper-body motion. In our system, a person first performs a series of actions in front of a Kinect. The captured human skeleton information is processed into a so-called "energy function", and peaks in this function are detected and treated as the time points of key poses. A Labanotation sequence is then generated from the skeleton information (i.e., the key poses) at the corresponding time points. Finally, the Labanotation is sent to different robots; each robot interprets it according to its own kinematic structure to reproduce the human action.
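
As a rough illustration of the key-pose detection step: the exact energy function is defined in the paper, so here I use total squared joint speed as a stand-in, with SciPy's `find_peaks` doing the peak detection. The array shapes and frame rate are assumptions.

```python
import numpy as np
from scipy.signal import find_peaks

def motion_energy(skeleton, dt=1.0 / 30.0):
    """skeleton: (T, J, 3) array of J joint positions over T Kinect frames.
    Illustrative energy: total squared joint speed per frame."""
    v = np.diff(skeleton, axis=0) / dt   # (T-1, J, 3) finite-difference velocities
    return (v ** 2).sum(axis=(1, 2))     # (T-1,) energy curve

def key_pose_frames(skeleton, prominence=0.1):
    """Frames where the energy curve peaks -- candidate key-pose time points."""
    energy = motion_energy(skeleton)
    peaks, _ = find_peaks(energy, prominence=prominence)
    return peaks
```

For example, a skeleton that is still except for one smooth arm transition yields a single energy peak at the middle of the transition.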



Procedures for using Labanotation as an intermediate language for robot control.

For tasks that do not require precise manipulation, the biggest benefit of this approach is the time saved when adapting motions to different robots. Engineers only need to program a robot once to define the rules that map Labanotation symbols to its joints; after that, the robot can interpret any Labanotation sequence.
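
To make the "program once per robot" idea concrete, here is a hypothetical sketch of such a mapping. The symbol set, joint names, and angle values are invented for illustration and are not the published mapping:

```python
# Per-robot lookup table, written once by an engineer: each Labanotation
# (direction, level) symbol maps onto this robot's joint angles in degrees.
ARM_SYMBOLS = {
    ("forward", "middle"): {"shoulder_pitch": -90.0, "shoulder_roll": 0.0, "elbow": 0.0},
    ("forward", "high"):   {"shoulder_pitch": -135.0, "shoulder_roll": 0.0, "elbow": 0.0},
    ("side", "middle"):    {"shoulder_pitch": 0.0, "shoulder_roll": -90.0, "elbow": 0.0},
    ("place", "low"):      {"shoulder_pitch": 0.0, "shoulder_roll": 0.0, "elbow": 0.0},
}

def interpret(score):
    """Turn a Labanotation score -- a list of (duration_sec, symbol) cells for
    one arm -- into timed joint-angle targets for this robot's controller."""
    return [(duration, ARM_SYMBOLS[symbol]) for duration, symbol in score]
```

Once the table exists, any Labanotation sequence becomes a list of timed joint targets; a different robot only needs its own table, not a new motion-editing pass.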