For better comparability and overview, the following posts will always have the same structure: ‘Storage handling’, ‘Graphical presentation’, ‘Gesture recognition’, and ‘Speech recognition and synthesis’.


Storage handling
[Responsibility: Katharina]

The intended folder structure for the virtual media library (VML) should be as follows:

  • program root folder
    • images
    • music
    • video

In the first step, the root folder path will be hard-coded by us. We also considered letting users choose the path themselves the first time they start the VML, but this feature will only be implemented if time remains at the end. Because the VML is meant as a proof of concept, we are starting with content consisting of two files per media category. Initially, the complete content will be displayed. We already have an extension in mind for this: group the content by category and let the user choose between seeing the whole content and seeing it filtered by category.
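The folder structure and the optional category filter could be sketched as follows. This is only a minimal illustration, not our implementation: the function name `scan_library` and the constant `CATEGORIES` are made up for this example, and the language is chosen purely for readability.

```python
from pathlib import Path

# Assumption: the three sub-folder names from the structure above.
CATEGORIES = ("images", "music", "video")


def scan_library(root, category=None):
    """Return {category: [file paths]} for the library below `root`.

    If `category` is given, only that category is returned (the filter idea).
    """
    root = Path(root)
    selected = CATEGORIES if category is None else (category,)
    library = {}
    for name in selected:
        if name not in CATEGORIES:
            raise ValueError(f"unknown category: {name}")
        folder = root / name
        # A missing sub-folder simply yields an empty list.
        if folder.is_dir():
            library[name] = sorted(p for p in folder.iterdir() if p.is_file())
        else:
            library[name] = []
    return library
```

Calling `scan_library(root)` would give the complete content, while `scan_library(root, "music")` would give only one category.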


Graphical presentation
[Responsibility: Zhe]

In the first step, the VML consists of a grid of six columns by five rows. How the items are ordered across the rows is still under discussion. Since the VML should also offer a zoom function, the idea is to reorder the items dynamically whenever the zoom factor changes.
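One possible way to reorder the items when the zoom factor changes is sketched below. The rule that the number of columns shrinks in proportion to the zoom factor is purely an assumption for illustration, since the ordering is still undecided; the names `columns_for_zoom` and `layout` are hypothetical.

```python
BASE_COLUMNS = 6  # grid width at zoom factor 1.0, as planned for the first step


def columns_for_zoom(zoom):
    """Assumed rule: zooming in by a factor shows proportionally fewer columns."""
    if zoom <= 0:
        raise ValueError("zoom must be positive")
    return max(1, int(BASE_COLUMNS / zoom))


def layout(item_count, zoom=1.0):
    """Return a (row, column) cell for each item, filled row by row."""
    columns = columns_for_zoom(zoom)
    return [divmod(index, columns) for index in range(item_count)]
```

With 30 items and a zoom factor of 1.0 this fills exactly six columns by five rows; at a zoom factor of 2.0 the same items are redistributed over three columns.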

The displayed items will have fixed dimensions: each one is a square showing the cover if there is one, and otherwise a color that is chosen either randomly or according to the item's category. Because an item is hard to identify when no cover is shown, the file name will also be displayed below the cover.
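The choice of a fallback color for items without a cover could look roughly like this. The concrete RGB values and the name `placeholder_color` are invented for the example; whether we end up with random or category-based colors is still open.

```python
import random

# Hypothetical colors per category (RGB, 0-255); the real palette is undecided.
CATEGORY_COLORS = {
    "images": (70, 130, 180),
    "music": (60, 179, 113),
    "video": (205, 92, 92),
}


def placeholder_color(category=None, rng=random):
    """Return the category color if one is known, otherwise a random RGB color."""
    if category in CATEGORY_COLORS:
        return CATEGORY_COLORS[category]
    return tuple(rng.randrange(256) for _ in range(3))
```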


Gesture recognition
[Responsibility: Rogeria]

The Kinect camera will be used to navigate through the VML. It tracks the movements of the user's arms and hands. To interact, the user raises either the left or the right hand and moves it in the direction in which the VML should move. We are still discussing whether the user should be able to move the VML in only one direction at a time or in the second direction as well. Before we can decide, we have to find out how practical it is to allow movement in more than one dimension. The first test case will be a simple quad that moves together with the user's hand gesture.
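The two options under discussion, movement along one axis at a time versus free movement in both axes, can be compared with a small sketch. It deliberately uses no Kinect API: the hand displacement `(dx, dy)` is assumed to be delivered by whatever tracking layer we end up with, and the function name and the `dead_zone` value are hypothetical.

```python
def scroll_vector(dx, dy, dead_zone=0.05, single_axis=True):
    """Turn a hand displacement into a scroll vector for the VML.

    dx, dy      -- hand movement since the gesture started (assumed units)
    dead_zone   -- movements smaller than this are ignored as jitter
    single_axis -- if True, keep only the dominant axis
    """
    if abs(dx) < dead_zone:
        dx = 0.0
    if abs(dy) < dead_zone:
        dy = 0.0
    if single_axis:
        # Keep the axis with the larger movement and drop the other one.
        if abs(dx) >= abs(dy):
            dy = 0.0
        else:
            dx = 0.0
    return (dx, dy)
```

Switching `single_axis` on and off with the same input makes it easy to try out both behaviours on the test quad.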


Speech recognition and synthesis
[Responsibility: Marcus]

To make the system interactive, we want to implement speech recognition and synthesis to ‘communicate’ with the user. Speech recognition can be used either to change the displayed category, if that feature is implemented, or to interact with an item in the VML. For example, the user can say the identification number or the name of an item to access it. The audio output will inform the user about the current status, warnings, or errors. Our first choice for a speech recognition framework was the one developed by Google. Unfortunately, we found that the Google framework only offers 50 free requests per day; any further requests have to be paid for. We therefore have to decide whether to use this framework at all and, if not, what the alternative is.
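Independent of the framework we choose, two small pieces can already be sketched: matching a recognized phrase against the items, and keeping track of the daily request limit mentioned above. Neither part calls a real speech API; the names `RequestBudget` and `resolve_item` and the `{id: name}` item table are assumptions for this example.

```python
import datetime

FREE_REQUESTS_PER_DAY = 50  # the free limit mentioned above


class RequestBudget:
    """Counts recognition requests per calendar day."""

    def __init__(self, limit=FREE_REQUESTS_PER_DAY):
        self.limit = limit
        self.day = None
        self.used = 0

    def try_consume(self, today=None):
        """Return True and count the request if today's budget allows it."""
        today = today or datetime.date.today()
        if today != self.day:
            # A new day has started: reset the counter.
            self.day, self.used = today, 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


def resolve_item(transcript, items):
    """Map recognized text to an item id.

    items -- dict {identification number: file name}
    Returns the matching id, or None if nothing matches.
    """
    text = transcript.strip().lower()
    if text.isdecimal():
        item_id = int(text)
        return item_id if item_id in items else None
    for item_id, name in items.items():
        if name.lower() == text:
            return item_id
    return None
```

A caller would first check `try_consume()` and only then send audio to the recognizer; the returned text would be passed to `resolve_item` to find the item to open.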


At the moment we are busy planning how to approach the individual topics and how to connect them efficiently. We are familiarizing ourselves with the individual parts and trying to work out quickly what the best next steps are.