Summer Semester 2019
Focus on an interactive game
Blog
We decided on our main goal for this semester: we want to develop an interactive game for Pepper. The game will be based on the famous children's game "I spy with my little eye", and Pepper should be able to take both roles, the questioning and the answering player, against a human opponent. To achieve this goal we divided the whole group into smaller subgroups with different focuses.
Week 1 to 4
Microphone Group
We mostly did research in the field of voice/speech recognition. We favoured ICA (independent component analysis) over an artificial neural network (ANN), because ICA seems to produce better and more stable results. Our goal is to recognize speech in a noisy environment and to localize the speaker, so that during the game the player(s) can talk to Pepper. To achieve this online, and hopefully with multiple speakers, we will also look into Kalman filters.
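To illustrate the ICA idea, here is a minimal sketch using scikit-learn's FastICA for blind source separation. The input file name and channel layout are placeholders; the actual microphone setup is still to be decided.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Hypothetical input: aligned recordings from Pepper's microphones,
# one column per channel, shape (n_samples, n_channels).
mixed = np.load("mic_recordings.npy")  # placeholder file name

# FastICA tries to unmix the channels into statistically independent
# sources, e.g. the speaker's voice vs. background noise.
ica = FastICA(n_components=mixed.shape[1], random_state=0)
sources = ica.fit_transform(mixed)  # shape: (n_samples, n_components)

# Each column of `sources` is one estimated source signal; which column
# contains the speech still has to be identified, e.g. by energy or a
# voice-activity detector.
```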
Object Recognition
We started our research into which algorithms/packages/etc. are available, to get a broad overview of the field. While looking into the different approaches and algorithms, we noticed that each algorithm would probably work best with a different set of objects.
Conversation and Natural Language Processing
We set a goal for conversation and language processing. We looked at several existing chatbots and at how they could be applied to our game, and we tried to outline the upcoming work. We will use Python to build the conversation structure and will see whether API.AI and Watson Assistant are needed. The user starts the game with the question "I spy with my little eye something [color name]". In addition to color, we plan to collect further attributes that can be used to describe the object.
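A minimal sketch of how the opening phrase could be parsed in Python; the color list and the function name are our own placeholders, not a finished design.

```python
import re

# Colors we assume the game will recognize; to be extended later.
KNOWN_COLORS = {"red", "green", "blue", "yellow", "black", "white"}

def parse_opening(utterance):
    """Extract the color clue from 'I spy with my little eye something <color>'.

    Returns the color if the utterance matches the opening phrase,
    otherwise None.
    """
    match = re.search(
        r"i spy with my little eye something\s+(\w+)",
        utterance.lower(),
    )
    if match and match.group(1) in KNOWN_COLORS:
        return match.group(1)
    return None

print(parse_opening("I spy with my little eye something blue"))  # -> 'blue'
```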
Week 4 to 6
Microphone Group
We want to use HARK for sound localization and separation. HARK is an open-source robot audition software developed at Kyoto University. HARK is module-based, which means that we can adapt it to our task, to our version of Pepper, and to the microphone we are going to use. HARK also has a ROS implementation, so we can connect it to the other groups' projects.
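As a rough sketch of that ROS connection, another group's node could subscribe to the localization output like this. The topic name and message type below are simplified placeholders; the real HARK message definitions depend on the chosen module configuration.

```python
import rospy
from std_msgs.msg import Float32  # placeholder: HARK publishes richer messages

def on_direction(msg):
    # Placeholder callback: msg.data is assumed to be the estimated
    # azimuth of the speaker in degrees.
    rospy.loginfo("Speaker localized at %.1f degrees", msg.data)

rospy.init_node("sound_direction_listener")
# "/sound_direction" is a hypothetical topic name for this sketch.
rospy.Subscriber("/sound_direction", Float32, on_direction)
rospy.spin()
```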
Conversation and Natural Language Processing
We designed a basic conversation flow and added some exceptional situations. We started writing code, beginning with the situation in which Pepper is guessing and the user is giving the clues. To begin with, we decided on a set of features to use as clues: color, shape, size, and the first letter. (As discussed with the other subgroups, the first clue should always be a color.)
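A small sketch of how Pepper's guessing side could narrow down candidates using these clue features; the object list and field names are invented for illustration only.

```python
# Hypothetical knowledge base of objects Pepper can see, annotated with
# the clue features we agreed on: color, shape, size (first letter comes
# from the name itself).
OBJECTS = [
    {"name": "ball",   "color": "red",   "shape": "round",  "size": "small"},
    {"name": "box",    "color": "red",   "shape": "square", "size": "large"},
    {"name": "bottle", "color": "green", "shape": "round",  "size": "small"},
]

def filter_candidates(candidates, feature, value):
    """Keep only the objects that match the latest clue."""
    if feature == "first_letter":
        return [o for o in candidates if o["name"].startswith(value)]
    return [o for o in candidates if o.get(feature) == value]

# The first clue is always a color, as agreed with the other subgroups.
candidates = filter_candidates(OBJECTS, "color", "red")
candidates = filter_candidates(candidates, "shape", "round")
print([o["name"] for o in candidates])  # -> ['ball']
```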
Object Recognition
At first, we will try out classical computer vision solutions, since this might be the easiest way. If this does not work, we will do further research on FALKON (https://github.com/rbgirshick/py-faster-rcnn/tree/master/models/pascal_voc/ZF), a program for object detection on robots (https://www.youtube.com/watch?v=eT-2v6-xoSs): one simply holds the objects in front of the robot (depth camera included) and moves them around. During this phase the robot learns the object and is able to recognize it later (e.g. in a forced-choice situation). Of course this has to be elaborated further, e.g. by using the small cropped image of the recognized object as input to analyze the shape, size and RGB values of that object. If all the objects are at the same distance from Pepper, their sizes can probably be estimated via Pepper's depth camera. Another option would be YOLO (You Only Look Once), a state-of-the-art, real-time object detection system (https://github.com/llSourcell/YOLO_Object_Detection).
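A sketch of the color-analysis step described above: given the cropped image of a detected object, estimate its dominant color from the pixel values. The use of OpenCV, the file name and the tiny color table are all our own assumptions.

```python
import cv2
import numpy as np

def dominant_color(crop_bgr):
    """Return the mean BGR color of a detected object's cropped image."""
    return crop_bgr.reshape(-1, 3).mean(axis=0)

def nearest_color_name(bgr):
    # A tiny reference palette; a real implementation would need a finer
    # palette and a perceptual color space such as HSV or Lab.
    palette = {
        "red":   np.array([0, 0, 255]),
        "green": np.array([0, 255, 0]),
        "blue":  np.array([255, 0, 0]),
    }
    return min(palette, key=lambda name: np.linalg.norm(bgr - palette[name]))

crop = cv2.imread("detected_object.png")  # placeholder: crop from the detector
print(nearest_color_name(dominant_color(crop)))
```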