Monday, October 17, 2011

Kinect speech recognition in linux

Audio support is now part of libfreenect. Additionally, it is now possible to load the microsoft SDK version of the audio firmware from linux, courtesy of a utility called kinect_upload_fw written by Antonio Ospite. This version of the firmware makes the kinect appear to your computer as a standard USB microphone.
This means you can now record audio using your kinect, but that's not all that interesting in and of itself. Linux support for speech recognition at this point is not all that great. It is possible to run dragon naturallyspeaking via wine or to use the sphinx project (after much training), but neither of those approaches really appealed to me for simple voice commands (as opposed to dictation). The google android project happens to include a speech recognizer from Nuance which by default is meant to be built for an ARM target, like your phone. After extensive hacking around the build system I was able to build it for an x86 target instead, like your desktop. Now you can combine these two things, the kinect array microphone and android voice recognition, to do some more interesting things, e.g. toggle hand tracking on and off via voice.



How to get started:

1) Check whether you have the "unbuffer" utility, which ships with the expect scripting tool:

which unbuffer

If the above command comes up empty you should download a copy of unbuffer from the link here:
http://dl.dropbox.com/u/11217419/unbuffer

copy unbuffer to a directory that is in your path, like /usr/local/bin or ~/bin

2)Download my precompiled version of the srec subproject from here:
http://dl.dropbox.com/u/11217419/srec_kinect.tgz

3)save the tarball from step 2 in a convenient directory then unpack it with this command:
tar xfz srec_kinect.tgz

4)switch into the subdirectory where I've placed some convenience scripts:
cd srec/config/en.us

5) Open a second terminal and in that second terminal also switch into srec/config/en.us

6) In the first terminal execute
./run_SRecTestAudio.sh
and in the other terminal execute
cat speech_fifo

7) try speaking into your microphone and wait for recognition results to appear in both terminals. Note that the vocabulary as configured at this point is very small: words like up, down, left, right and the numbers from 1 to 9 should be recognized properly.

Integrating the kinect:
1)Acquire Antonio Ospite's firmware tools like so:
git clone http://git.ao2.it/kinect-audio-setup.git/

2)move into the kinect-audio-setup subdirectory:
cd kinect-audio-setup

3)build and install kinect_upload_fw as root:
make install

4)Fetch and extract the microsoft kinect SDK audio firmware (depending on your directory permissions, this may also need to be run as root):
./kinect_fetch_fw /lib/firmware/kinect

This will extract the firmware to this location by default:
/lib/firmware/kinect/UACFirmware.C9C6E852_35A3_41DC_A57D_BDDEB43DFD04

5)Upload the newly extracted firmware to the kinect:
kinect_upload_fw /lib/firmware/kinect/UACFirmware.C9C6E852_35A3_41DC_A57D_BDDEB43DFD04

6)Check for a new USB audio device in your dmesg output

7)Configure the kinect USB audio device to be your primary microphone input and try out run_SRecTestAudio.sh again as described earlier.
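
As an alternative to scanning dmesg in step 6, you can look at the list of ALSA sound cards. The sketch below is not part of kinect-audio-setup. It assumes the kernel's USB audio driver has claimed the device, in which case the card should show up in /proc/asound/cards with the driver name USB-Audio. The function name list_usb_audio_cards is made up for illustration.

```shell
# List ALSA sound cards handled by the USB audio driver.
# $1: cards file to inspect (defaults to /proc/asound/cards).
# ASSUMPTION: with the uploaded firmware the kinect is claimed by the
# standard USB audio driver, so its entry contains "USB-Audio".
list_usb_audio_cards() {
    grep 'USB-Audio' "${1:-/proc/asound/cards}"
}
```

Calling list_usb_audio_cards with no argument reads /proc/asound/cards. Any line it prints is a candidate for the kinect microphone; if it prints nothing, the firmware upload probably did not take effect. Running arecord -l gives a similar list of capture devices.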


Additional Notes:

I unfortunately no longer remember all the changes I had to make to get the srec project within android to build for x86. Perhaps someone with better knowledge of the android build system can chime in in the comments below. In the interim, use the precompiled copy that I have linked above. Just be aware that it is old: I think it dates back to the froyo branch of android or earlier (I compiled it a long time ago). If you want to take a shot at building the latest srec yourself, check out the android source code then look under external/srec/

The run_SRecTestAudio.sh script sets up the speech recognizer to run on live audio and pipes the recognition results to a fifo in the same directory called speech_fifo. Running cat in the second terminal lets you read out the recognition results as they arrive. Instead of cat, you could have whatever program needs the recognition results read from the fifo and act accordingly. Unbuffer is used to make sure you see recognition results right away rather than waiting for an output buffer to fill up before anything reaches speech_fifo.
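
As a rough sketch of replacing cat with something that acts on the results, the shell functions below read lines and map a few words from the default vocabulary to actions. The names handle_result and dispatch_stream are made up for illustration, and the sketch assumes each recognition result arrives as one line of lowercase text containing the recognized word. The real output of the recognizer may be formatted differently, so adjust the patterns after looking at what cat speech_fifo prints.

```shell
# Map one line of recognizer output to an action.
# ASSUMPTION: the recognized word appears as a separate lowercase word
# somewhere on the line; the echo commands are placeholders for real actions.
handle_result() {
    # Pad with spaces so that only whole words match (e.g. not "upper").
    case " $1 " in
        *" up "*)    echo "action: up" ;;
        *" down "*)  echo "action: down" ;;
        *" left "*)  echo "action: left" ;;
        *" right "*) echo "action: right" ;;
        *)           echo "ignored: $1" ;;
    esac
}

# Read results line by line from standard input and dispatch each one.
dispatch_stream() {
    while IFS= read -r line; do
        handle_result "$line"
    done
}
```

In the second terminal you would then run dispatch_stream < speech_fifo in place of cat speech_fifo, after defining both functions in that shell.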

The srec recognizer does not require any training but has certain limitations. The most significant is the size of the vocabulary it can recognize: the larger the vocabulary you specify, the less accurate the recognition results will likely be. As a result, this recognizer is best used for a small set of frequently used voice commands. Under srec/config/en.us/grammars/ there are a number of .grxml files which define what words the recognizer can understand. You can define your own simple grammar (.grxml) here which, for example, only recognizes the digits on a phone keypad. To do this, follow the syntax of any of the other .grxml files in the directory and then execute run_compile_grammars.sh, which will produce a .g2g file from the .grxml file. There is also a voicetag/texttag file with extension .tcp which needs to point to the .g2g file of your choice. You can find the .tcp files under the srec/config/en.us/tcp directory. run_SRecTestAudio.sh in turn points to a .tcp file, which you can change to the one you want.

46 comments:

  1. outstanding article! Can't wait to tell my kinect to control my xbmc gentoo :)
    If you don't mind I'd like to ask you if you could share your srec changes?
    In case the license allows this I'm sure making this a standalone shared lib would help a lot in many projects.
    friendly,
    marcel "frostwork"unbehaun

  2. I intend to post up the binaries and more detailed instructions when I get home tonight. After I started writing this post I realized I had a lot of hard-coded absolute paths in the build scripts so I need to go through and make them relative paths before I post up the files.

  3. sounds good! I'll try to wrap a cmake build system around it then to create a shared.so

  4. thank you for sharing your changes!
    Looks like it will need some time to create proper cmake scripts around the sourcecode :)

  5. Did I miss where you posted the changes?

  6. kodom, look above under step 2 of "How to get started" and you'll see this link: http://dl.dropbox.com/u/11217419/srec_kinect.tgz
    That is my build of the recognizer. Were you looking for a written description of the changes needed to make that build?

  7. Hi, i'm trying to integrate the kinect audio as how you gave in the instruction "integrating the kinect" but it always fails at the 4th step saying:

    ls: cannot access UACFirmware.*: No such file or directory
    install: cannot stat `': No such file or directory

    It seems the extraction does not produce any file. Do you know why this is happening and how to fix it exactly?

    and by the way, does this work on 64 bit machines? i'm on ubuntu 10.04 64 bit..

    Thanks man, this will be really useful..

  8. 30hoursflight, it's possible that the download location of the firmware file has changed and thus no file is getting downloaded, or perhaps you need to run step 4 as root so that kinect_fetch_fw is actually able to write to /lib/firmware/kinect (or I suppose you could change the owner and permissions of that directory). Let me know if none of those ideas explain your problem. Not sure about the 64-bit concerns; my linux box is an old 32-bit machine.

  9. The Download location is definitely correct as i've downloaded the driver myself. I, of course, also run the 4th step command as root user. What seems to be the problem is that when we extract that archive we just downloaded, it does not contain the "UACFirmware.C9C6..."
    DO you think you have any idea on this?

    About the 32 / 64 bit issue, we can simply change the download link in the script you made to download the 64 bit version driver, i guess. But either way, the problem i just mentioned above still persists.

    Thanks for helping man..

  10. Under 64-bit ubuntu, you also need to install ia32-libs-multiarch ( sudo apt-get install ia32-libs-multiarch ) or you will get an error about "file not found" for libESR_Portable.so

  11. hi trtg,

    can you send the makefile to build srec for Linux? (sleuthhound@gmail.com)

    thank you so much!
