Thursday, February 24, 2011

Kinect audio reverse engineering

I did some work a while back on getting the Kinect audio hardware to work as part of OpenKinect/libfreenect. Here are some quick notes on what I've figured out and how:
Once the audio firmware has been loaded, the Kinect sends a 524-byte packet to the Xbox every 1 ms. Every tenth packet is short (60 bytes) and is sometimes preceded by an empty packet. The short packets appear to be non-audio data (maybe signaling of some sort), because if you exclude them the remaining data doesn't appear to have any gaps.
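
As a minimal sketch of that filtering step, the Python below sorts captured IN packets by length. It assumes the packets have already been extracted from the sniffer dump as a list of byte strings; the function and constant names are my own, not anything from libfreenect.

```python
AUDIO_LEN = 524  # full-size packets, assumed to carry audio
SHORT_LEN = 60   # every tenth packet, assumed to be non-audio


def split_packets(packets):
    """Sort captured IN packets into audio, short, empty and other groups by length."""
    audio, short, empty, other = [], [], [], []
    for pkt in packets:
        if len(pkt) == AUDIO_LEN:
            audio.append(pkt)
        elif len(pkt) == SHORT_LEN:
            short.append(pkt)
        elif len(pkt) == 0:
            empty.append(pkt)
        else:
            other.append(pkt)  # unexpected sizes are kept for inspection
    return audio, short, empty, other
```

Keeping the unexpected sizes in their own list makes it easy to spot a capture that does not follow the pattern described above.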

The audio samples appear to be 32-bit signed integers at 16 kHz (if you assume that sample rate, the FFT of the recorded data shows the correct frequency).
The 4 channels seem to be transmitted in order from left to right, from the perspective of someone looking at the front of the Kinect, so the leftmost channel is transmitted first. 256 samples of each channel are transmitted before switching to the next channel.
If you stitch together the separate 256-sample blocks to reconstruct a given channel, the data appears to be continuous. The plot below shows the captured 4-channel audio stream with the channels labeled from left to right as 1, 2, 3, 4. You can see that the leftmost channel has the greatest amplitude, which matches the fact that the speaker was placed closest to the leftmost microphone. I repeated the test with a speaker near the rightmost microphone and, as expected, channel 4 became the strongest.
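
The stitching can be sketched roughly as follows in Python. This assumes the audio bytes have already been concatenated into one buffer that starts on a channel-1 block, and that the samples are little-endian; `decode_s32le` and `deinterleave` are illustrative names only.

```python
import struct

BLOCK = 256    # samples sent per channel before switching, per the notes above
CHANNELS = 4


def decode_s32le(raw):
    """Decode a byte buffer as signed 32-bit little-endian samples (byte order assumed)."""
    n = len(raw) // 4
    return list(struct.unpack("<%di" % n, raw[:n * 4]))


def deinterleave(samples, block=BLOCK, channels=CHANNELS):
    """Split a block-interleaved sample list into one list per channel."""
    out = [[] for _ in range(channels)]
    frame = block * channels
    usable = len(samples) - len(samples) % frame  # drop any trailing partial frame
    for start in range(0, usable, block):
        ch = (start // block) % channels
        out[ch].extend(samples[start:start + block])
    return out
```

If the buffer does not start on a channel-1 block, the channels come out rotated, which would show up as the loudest channel being in the wrong position.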

I was able to determine the information in the last paragraph by synthesizing a 700 Hz sine wave in MATLAB and then playing it back at the Kinect with a speaker nearest the leftmost microphone (as seen from the front of the Kinect). While the sine wave played, I captured the data stream coming back from the Kinect using a Beagle USB sniffer. I extracted the 524-byte blocks I suspected to be audio from the Beagle dumps and post-processed them with a series of shell scripts before reading them into MATLAB and plotting the FFT of this audio, as seen below:

The frequency shown by the FFT is approximately 700 Hz, as expected. This suggests that my interpretation of the audio format is correct.
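
The same sanity check can be sketched without MATLAB. The Python below is not the original analysis: it uses a plain discrete Fourier transform on a synthetic tone, and the function name and the 800-sample test length are my own choices.

```python
import cmath
import math

RATE = 16000  # assumed sample rate in Hz


def peak_frequency(samples, rate=RATE):
    """Return the frequency (Hz) of the strongest DFT bin, ignoring DC."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * rate / n


# Synthetic 700 Hz tone standing in for one reconstructed channel.
tone = [math.sin(2 * math.pi * 700 * i / RATE) for i in range(800)]
print(peak_frequency(tone))
```

If the sample width or rate were guessed wrong, the peak of real captured data would land somewhere other than 700 Hz, which is what makes this a useful check.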

Firmware loading process

So far I've managed to duplicate what I think is most of the init sequence: as far as I can tell, I send all the same control transfers and bulk transfers as the Xbox. My Beagle 480 confirms that I mirror the Xbox behavior for the most part. After completing a series of 512-byte bulk OUT transfers, which I assume is some sort of bootstrapping firmware upload, the audio device re-enumerates. I wait for that to happen and open the new audio device, then send some more control and bulk transfers. So far, so good: this all follows what I see in the Xbox logs.

At this point the Xbox appears to send 12 cycles of (1 xfer: 0-byte iso IN, 8 xfers: 4 bytes OUT), which I also duplicate perfectly. The final step is a very long stream of (1 xfer: 0-byte iso IN, 8 xfers: 76 bytes OUT) before those 0-byte IN transfers eventually become 524-byte transfers. Unfortunately, it seems the content of those 76-byte OUT transfers must matter, because after trying all zeros I never get any data back in my IN transfers (even after >5000 IN transfers). I have some scripts I'll use to try to generate code for all those OUT transfers directly from the .tdc files.
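
For comparing a replay against the Xbox logs, the iso pattern described above can be written down as data. This is only a sketch of the transfer sizes, not of their contents; `init_schedule` and its `stage2_cycles` parameter are hypothetical, since the notes only say the second stage is "very long".

```python
def init_schedule(stage2_cycles):
    """Expected iso transfer pattern as (direction, length) tuples."""
    def cycle(out_len):
        # One cycle: a single 0-byte IN followed by eight OUT transfers.
        return [("IN", 0)] + [("OUT", out_len)] * 8

    schedule = []
    for _ in range(12):             # first stage: 12 cycles of 4-byte OUTs
        schedule += cycle(4)
    for _ in range(stage2_cycles):  # second stage: 76-byte OUTs
        schedule += cycle(76)
    return schedule
```

A list like this can be diffed against the (direction, length) pairs pulled from a sniffer log to find where a replay first diverges.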


  1. nice post! I've been looking around and this is the best article about kinect audio I've seen so far!

  2. Excellent work! You inspired me to spend my weekend working out some more details which I'm (slowly) writing up at

    The set of bulk transfers before the device reenumerates are indeed the firmware upload, and the contents of that upload byte-for-byte match a firmware image extracted from an Xbox360 update package. This is required.

    After reenumeration, the control transfers performed are the crypto validation that the Xbox360 uses to ensure it's talking to an authentic Kinect. The entire sequence of control transfers is optional (and since it takes a while, I skipped it).

    The bulk transfers after the reenumeration are uploading CEMD data (Complex Empirical Mode Decomposition). It turns out that this is also optional, but if we send the right data, it should improve the noise-cancelled signal - it's basically the calibration data.

    Finally, we have the isochronous streams. Each of the 524-byte IN transfers we receive is actually a 12-byte header followed by 512 bytes of data as S32_LE at 16 kHz (as you described). After 4 bytes of magic in the header (0x80000080), there is a one-byte tag that tells us which channel the data belongs to, with values between 0x01 and 0x0a. Values of (2,3), (4,5), (6,7), and (8,9) correspond to channels 1, 2, 3, and 4 respectively. The 512 bytes in the packet tagged 0x01 are 16-bit signed samples at 16 kHz: the noise-cancelled data. I still don't know what the data in the short packet (always tagged 0x0a) means.

    The iso OUT transfers are 4 bytes header, 72 bytes data. The first two bytes seem to be a 16-bit little-endian timestamp that shows up in the header of the IN transfers a few msec later; the next byte is a sequence number that overflows at 0x80, and the last byte involves some rather obtuse logic to produce what appears to be a timestamp split across multiple transfers. I'll see if I can write it up reasonably some time.

    I have a mostly-working driver that I can use to dump the raw PCM data to files. I'm going to clean it up a little bit, then post it for all.

    I did all this with the adafruit USB dumps and have no beagle480; if you have any other traces that you'd be willing to share, they might help me test some hypotheses. :)

  3. Drew, that's very cool that you got it all working. It's also good that you took much better notes than I did ;P There were a number of things that I never got around to documenting, like the channel numbering and header structure. It's very interesting that a lot of the control transfers are actually optional. I wonder if my inability to get the inbound iso stream started was then due to inappropriately initializing some other part of the Kinect.

    I should be able to provide you with more dumps later this week- I'm going to borrow an xbox on thursday. That's actually a big part of why I hadn't worked on this in the past month or so- I don't own an xbox. Let me know what exactly you want to test out and I'll try to post the data for you on dropbox or something like that.

    How did you verify that the 16-bit samples are the noise-cancelled result? I'd been meaning to analyze it at least subjectively, but never got around to it. Might try to tonight.

  4. It seems that the iso OUT transfers need to have an (at least partially) valid header, or the iso IN transfers don't happen. Probably has to do with the timestamp in the header that the OUT transfers echo.

    I'm excited to have more dumps to work with! The best thing you could do is dump the entire Kinect audio calibration routine (if you can) - I'd like to get a better handle on the CEMD data. If you can videotape the sequence as well, that would also be excellent, since it would tell me what's happening when, and maybe what real sounds are being played. After looking at the CEMD blob a bit more, it looks like a bunch of signed 32-bit floats between -.5 and .5. Converting that to PCM data, you get a series of what look kinda like impulse response filters. Not sure exactly how to interpret them, but I think I'm on the right track.

    I haven't actually been able to stream the noise-cancelled stream - I think I may have some parameters in my iso OUT transfers wrong. However, I'm confident in my interpretation because the data in that space from the adafruit dump sounds like this if interpreted as 16-bit little endian PCM samples at 16kHz. That's definitely Limor's voice, and it sounds like there's some automatic volume adjustment going on there too.

  5. Having made a small recording with the official SDK (raw streams), it looks like the actual dynamic range is only 16 bits, i.e. only the upper 16 bits of the 32-bit signed integers are actually used...
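
Pulling the packet details from the comments together, a rough Python sketch of decoding one IN packet might look like this. It assumes the tag byte sits directly after the magic, that the payload follows the 12-byte header, and that shifting right by 16 is an acceptable way to keep only the upper 16 bits; the function names are illustrative only.

```python
import struct

MAGIC = bytes([0x80, 0x00, 0x00, 0x80])  # 0x80000080 reads the same in either byte order
HEADER_LEN = 12
PAYLOAD_LEN = 512

# Tag -> channel, as reported in the comments: (2,3)->1, (4,5)->2, (6,7)->3, (8,9)->4.
TAG_TO_CHANNEL = {2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3, 8: 4, 9: 4}


def parse_in_packet(pkt):
    """Return (channel, samples) for a 524-byte packet, or None if it is not recognised."""
    if len(pkt) != HEADER_LEN + PAYLOAD_LEN or pkt[:4] != MAGIC:
        return None
    tag = pkt[4]  # assumed position: the byte right after the magic
    payload = pkt[HEADER_LEN:]
    if tag == 0x01:
        # Reported as 16-bit signed noise-cancelled samples.
        return ("noise_cancelled", list(struct.unpack("<256h", payload)))
    if tag in TAG_TO_CHANNEL:
        return (TAG_TO_CHANNEL[tag], list(struct.unpack("<128i", payload)))
    return None


def to_s16(samples_s32):
    """Keep only the upper 16 bits of each 32-bit sample (arithmetic shift)."""
    return [s >> 16 for s in samples_s32]
```

The short packets tagged 0x0a are simply rejected here by the length check, since their meaning is still unknown.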
