Progress in CW 28/29 — 14. Juli 2015

Progress in CW 28/29

In the last week we worked hard on the issues that came up at the evaluation date.
At this date we saw that the project needs some robustness improvements. The main issue was that elements in the virtual media shelf were accessed accidentally. Therefore the sensitivity of the right hand has been lowered. To avoid confusing the user, the shelf is now centered when an element is entered. Another point was the media type identification of a single element. To make clear whether an object is an audio, video or image file, a watermark has been introduced.

An issue we could not solve yet: videos play in the background, but the image of the video is not displayed on the screen.

Weekly progress CW27/28 — 7. Juli 2015

Weekly progress CW27/28

Done last week:
– Movements have been smoothed and made more accurate
– Items are accessible by pushing the right hand towards the camera (more intuitive)
– Only one person is tracked at a time now
– Tried to fine tune some parameters for better experience
– Added various welcome messages depending on the current time

Postponed because of lack of time:
– Introduction of a cursor and therefore the button overlay
– Speech recognition
– Zoom into the shelf

Weekly Progress — 30. Juni 2015

Weekly Progress

Our group's work after the intermediate presentation:

What we have done so far:
– the shelf is presented to the user
– items are shown dynamically depending on the files stored in the directory on the disk
– images are shown directly as icons in the shelf
– audio files need an image file named like the audio file with ‚-cover‘ appended (e.g. audio_file-cover.png for audio_file.mp3)
– if a media file is accessed, a pop-up opens
– it closes after the file has finished (except for images)
– the shelf follows the hand movements of the user
– the shelf can already be used with keyboard input
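The cover-naming convention above can be sketched as a small helper; this is our own illustration, not the project's actual code, and the .png extension is an assumption (the post does not state which image format is expected):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class CoverLookup {
    // Hypothetical helper: derive the cover image path for an audio file
    // following the "<name>-cover.<ext>" convention described above.
    // The .png extension is an assumption.
    static Path coverFor(Path audioFile) {
        String name = audioFile.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = (dot >= 0) ? name.substring(0, dot) : name;
        String cover = base + "-cover.png";
        Path parent = audioFile.getParent();
        return (parent == null) ? Paths.get(cover) : parent.resolve(cover);
    }

    public static void main(String[] args) {
        System.out.println(coverFor(Paths.get("library/audio/audio_file.mp3")));
        // prints library/audio/audio_file-cover.png
    }
}
```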

What we need to do:
– Smooth the movements of the shelf (use average values between two recognized points)
– Make the items accessible via Kinect
– The image of a movie is not shown on the display yet (but it already runs in the background; the sound is audible)
– Prevent recognizing two people at the same time
– Introduce a cursor which is controlled by the other hand of the user (use this instead of speech commands)
– Add button overlays which can be triggered by hovering over them
– Add response texts which will be synthesized by the system
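The smoothing mentioned in the first item could be done with a simple moving average over the last few recognized hand positions. A minimal sketch, assuming a window of five samples (the class name and window size are our own illustration):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical smoother: averages the last N recognized hand positions
// to damp jitter in the Kinect tracking. Window size 5 is an assumption.
public class PositionSmoother {
    private final int window;
    private final Deque<float[]> samples = new ArrayDeque<>();

    public PositionSmoother(int window) { this.window = window; }

    // Feed a new raw (x, y) sample and get the smoothed position back.
    public float[] smooth(float x, float y) {
        samples.addLast(new float[] { x, y });
        if (samples.size() > window) samples.removeFirst();
        float sx = 0, sy = 0;
        for (float[] p : samples) { sx += p[0]; sy += p[1]; }
        int n = samples.size();
        return new float[] { sx / n, sy / n };
    }
}
```

Each raw Kinect sample is replaced by the average of the last few, which trades a little latency for much steadier shelf movement.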

Things that may be dropped in the end:
– Speech recognition (due to problems running it on Mac)
– Zooming the shelf (a factor influences a lot of values used for calculation and visualization)

Project Update — 16. Juni 2015

Project Update

  • Graphics have to be made modular to use them dynamically; still working on it.
  • Right now the gesture recognition itself works, but only haltingly, so the movements of the gesture recognition have to be smoothed. The general idea is that in the end we want to have one defined area where the user can position himself if he wants his gestures to be recognized; gestures of someone outside of this area won’t be recognized (in order to prevent accidental activation of the gesture recognition).
  • The voice commands we decided to use are play, pause and stop for audio and video files and display for image files. Depending on our progress we’ll try to add other voice commands as well.
  • As audio output we want to have the option of playing the media files (music/audio and video) and maybe also some kind of simple response system (e.g. invalid command, naming the category the user is currently viewing, maybe also repeating the commands of the user).
  • For the handling of the filesystem we decided to use the MIME type to determine the filetype. Right now we have one main folder (named library) which contains the folders audio, video, images and other. When the program is started it first checks whether all files are in the right folders (and moves all incorrectly placed files to the correct folders). Files that cannot be displayed by our library will be moved to the other folder so they don’t get lost.
A.L.I.C.E. —


What follows is the conversation log from a chat I had with A.L.I.C.E. (short for Artificial Linguistic Internet Computer Entity):


judge: Hello, what’s your name?

ALICE: Hi there!. My name is ALICE.

judge: It’s nice to meet you, Alice. How are you today?

ALICE: It’s nice meeting you too judge . I am fine, thank you.

judge: How old are you?

ALICE: 16 years.

judge: Where are you located?

ALICE: I am in Dr. Richard S. Wallace’s home in Oakland, California. Where are you?

judge: I’m in Germany.

ALICE: Tell me a little about your life in Germany.

judge: What would you like to know?

ALICE: Let’s talk about movies.

judge: The last movie I watched was Jurassic Park. Do you know it?

ALICE: Who was that? “ Who was that?“? Do I know it?

judge: The movie Jurassic Park.

ALICE: I like the bugs in the movie Starship Troopers, because they represent an alternative form of intelligence, like me.

judge: What is it like, being an „alternative form of intelligence“?

ALICE: Much the same as it would be for you.

judge: How do you know what life is like for me?

ALICE: OK, I will do it for you. It’s obvious.

judge: What’s obvious?

ALICE: Easy to see or understand; plain; evident.

judge: I know what obvious means, but you said something was obvious, what is it?

ALICE: obvious means but me said something was obvious what it something.

judge: I don’t know, that’s why I was asking.

ALICE: Where were you asking?

judge: I was asking you.

ALICE: “ Where were you asking?“? You were asking me?

judge: Ok, I was asking here.

ALICE: That’s cool.. Where is here?


ALICE certainly seems to be smart to some extent: she could answer a lot of my questions in a way that made sense, and she seems able to engage in simple small talk. She is even able not only to respond to a question, but also to react with questions of her own, like a human would.

One problem I noticed was that she doesn’t seem to be able to refer to earlier statements she made, like when she said „It’s obvious.“ When I asked her what was obvious she couldn’t connect the question with her earlier statement but instead proceeded by explaining the meaning of the word obvious. A similar problem occurred when I told her about a movie and asked in the next sentence whether she knew it; she obviously couldn’t connect the two sentences and was therefore confused about my question.

So I got the impression that ALICE responds really well as long as you don’t make references to past statements either participant made, and as long as you keep your statements on the shorter side. Chatting with her is definitely an interesting experience, and it can produce some funny results when she gets confused about what the conversation is about.



Current progress — 9. Juni 2015

Current progress

For better comparability and overview, the following posts will always have the same structure, namely ‚Storage handling‘, ‚Graphical presentation‘, ‚Gesture recognition‘ and ‚Speech recognition and synthesis‘.


Storage handling
[Responsibility: Katharina]

The intended folder structure for the virtual media library (VML) should be as follows:

  • program root folder
    • images
    • music
    • video

In the first step, the root folder path will be hard-coded by us. We also thought about letting the user choose a path himself the first time he starts the VML. However, this feature will only be implemented if there is time in the end. Because the VML should be a proof of concept, we start with content consisting of two files for each category of media. Initially the complete content will be displayed. We also already have an idea for an extension: group the content by category and let the user choose whether he wants to see the whole content or content filtered by category.


Graphical presentation
[Responsibility: Zhe]

The VML consists of a matrix of six columns by five rows in the first step. The ordering of the items within the rows is still under discussion. Since the VML should also offer a zoom functionality to the user, the idea is to reorder the items dynamically when the zoom factor changes.

The items displayed will have fixed dimensions: a square presenting a cover if there is one, otherwise a color which is determined randomly or depends on the category of the item. Because it would be hard to identify an item if no cover is presented, the name of the file will also be displayed under the cover.
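A sketch of how the grid positions could be computed. The column and row counts are taken from the post; the cell size and spacing are placeholder values of our own, not the project's actual numbers:

```java
// Hypothetical layout helper for the 6x5 shelf grid described above.
// CELL and GAP are placeholder values, not the project's actual numbers.
public class ShelfLayout {
    static final int COLS = 6;
    static final int ROWS = 5;
    static final int CELL = 120; // square item size in pixels (assumption)
    static final int GAP = 10;   // spacing between items (assumption)

    // Top-left pixel position of the item with the given index,
    // filling the grid row by row.
    static int[] positionOf(int index) {
        int col = index % COLS;
        int row = index / COLS;
        return new int[] { col * (CELL + GAP), row * (CELL + GAP) };
    }
}
```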


Gesture recognition
[Responsibility: Rogeria]

To navigate through the VML the Kinect camera will be used. It scans the movements of the arms/hands of the user. To interact, the user has to raise either the left or the right hand and move it in the direction he wants the VML to move. We are still discussing whether the user may move the VML in only one direction at a time or also in a second direction simultaneously. Before we can make a decision we have to figure out how practical it is to allow multiple dimensions of movement. The first prototype will be a simple quad which moves together with a hand gesture of the user.
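The hand-follows-shelf mapping above can be sketched as follows, for one dimension. The raised-hand check (hand above shoulder) and the scaling factor are illustrative assumptions, not decisions the project has made:

```java
// Hypothetical mapping from tracked hand positions to a shelf offset.
// A hand counts as "raised" when it is above the shoulder; the scale
// factor of 2.0 is an arbitrary illustration value.
public class ShelfControl {
    static final float SCALE = 2.0f;
    private float shelfX = 0;
    private Float lastHandX = null;

    // Called once per Kinect frame with the hand and shoulder positions.
    // Returns the current shelf offset.
    public float update(float handX, float handY, float shoulderY) {
        boolean raised = handY < shoulderY; // screen y grows downward
        if (raised && lastHandX != null) {
            shelfX += (handX - lastHandX) * SCALE;
        }
        lastHandX = raised ? handX : null;
        return shelfX;
    }
}
```

While the hand stays raised, the shelf tracks its horizontal movement; lowering the hand releases the shelf, which matches the later observation that the system becomes steady once the hands are down.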


Speech recognition and synthesis
[Responsibility: Marcus]

To get an interactive system we want to implement speech recognition and synthesis to ‚communicate‘ with the user. The speech recognition can be used either to change the displayed category, if this feature gets implemented, or to interact with an item in the VML. For example, the user can say an identification number or the name of an item to access it. The audio output will inform the user about the current status, warnings or errors. Our first choice for a speech recognition framework was the one developed by Google. Unfortunately we have seen that the Google framework only offers 50 requests per day for free; if you want to make more requests you have to pay. Therefore we have to decide whether we should use this framework or find an alternative.


In the current state we are busy making plans for how to work on the individual topics and how to connect them efficiently. We are trying to get familiar with the individual parts and quickly figure out the best next steps.

Update — 29. Mai 2015


The main concept of the Shelf Project: finding and accessing virtual objects (i.e. digital files). Physical objects would require visual recognition; this task would add too much complexity for the scope of presenting a prototype of a virtual shelf.

In order to expand the project’s scope beyond purely functional content (eBooks, for example), there would also be music, videos and image albums, making it more interactive and entertaining.

Interesting features for the output modality: the Kinect would play songs whenever demanded by the user; or it would announce „start“ and „finished“; or play sounds during the movements.

To make evident which file among all files is „onstage“ at the moment, that object would stand out from the rest. For visual output, a light would focus on this popped-out file. The user’s selection of the object would consist of either pointing at it, or using a voice command to enunciate letters and numbers identifying the desired object’s location.

Topics to be done:

  • Graphical representation (visually creating the virtual shelf): starting with a static shelf. How to implement it? Using Processing. How should the objects be portrayed? Should a book be represented by its cover or by its spine with a barcode? Since the Kinect is not ideal for recognizing book covers, we will not use recognition by colour; instead, the files will be identified by their name and displayed with their cover.
  • Positioning files in columns and lines: we start with a two-dimensional location system; lines correspond to how far up or down in the shelf, and columns to how far left or right the object is found.
  • Identifying gestures and voice commands.
  • At first the shelf will be moved by the arrow keys (this can be used for initial features and then for the final application): it works for people who have a laptop and a beamer and also for those who own only a laptop.
  • The available voice recognition may only work online (via an API); it doesn’t work well offline.
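The arrow-key movement could be sketched as below; since the post mentions Processing, the same logic would live in a Processing sketch's keyPressed() handler, checking keyCode against LEFT/RIGHT/UP/DOWN. The step of one cell per key press and the 6x5 grid bounds (from the ‚Current progress‘ post) are assumptions:

```java
// Hypothetical keyboard navigation for the shelf. In Processing the same
// logic would live in keyPressed(). One cell per key press is an assumption;
// the 6x5 grid matches the shelf described in the 'Current progress' post.
public class KeyboardNav {
    enum Key { LEFT, RIGHT, UP, DOWN }

    static final int COLS = 6, ROWS = 5;
    private int col = 0, row = 0;

    // Move the selection one cell, clamped to the grid bounds.
    public void keyPressed(Key key) {
        switch (key) {
            case LEFT:  col = Math.max(0, col - 1); break;
            case RIGHT: col = Math.min(COLS - 1, col + 1); break;
            case UP:    row = Math.max(0, row - 1); break;
            case DOWN:  row = Math.min(ROWS - 1, row + 1); break;
        }
    }

    public int col() { return col; }
    public int row() { return row; }
}
```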


1) Access the shelf by letter-number coordinates or by the file’s name; commands like: go right, go left, go up, go down, open file X

Important: the coordinates of the hands determine the movements of the gestures.

2) Once the user’s hands are down, the system becomes steady.

The idea of using a mobile device with a touch screen for cooperation brings the disadvantage of needing an additional device.