First off, kudos and congrats on the launch, seems like a fun idea! I am curious, as you mentioned reverse engineering. How difficult was it to retrieve the raw gyroscope data from the AirPods - AFAIK there is no API to access this information, right?
Put a sine wave emitter (or multiple) on the scene. Enable head tracking. Analyze the stereo sound at the output. Mute the output. There you go: you can now track the user's head without direct access to gyroscope data.
Apple does not secretly analyze sine waves to infer head motion. Instead, AirPods Pro/Max/3rd-gen include actual IMUs (inertial measurement units), and iOS exposes their readings through Core Motion.
What you mentioned is a known research technique called acoustic motion tracking (some labs use inaudible chirps to locate phones or headsets), but it's not how AirPods head tracking works.
I think they're more talking about measuring the attenuation that Apple applies for the "spatial audio" effect (after Apple does all of the fancy IMU tracking for you). With a known amplitude of signal in, and the ability to programmatically monitor the signal out after the effect, you can reverse-engineer a crude estimated angle from the delta between the two.
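To make that concrete, here's a toy sketch of the idea, assuming the renderer applies a simple constant-power pan as a function of azimuth. The model and all names here are invented for illustration; Apple's actual spatial-audio DSP is far more complex (HRTFs, delays, reverb), so this only shows why a known input plus a measurable output lets you invert back to a crude angle.

```python
import math

# Hypothetical panning model: assume the renderer outputs
# L = A*cos(t), R = A*sin(t) with t = (azimuth + pi/2) / 2,
# where azimuth is in radians and 0 means straight ahead.
# This is NOT Apple's DSP -- just the simplest invertible stand-in.

def rendered_amplitudes(input_amp, azimuth):
    """Forward model: what the renderer would output for a known input."""
    t = (azimuth + math.pi / 2) / 2
    return input_amp * math.cos(t), input_amp * math.sin(t)

def estimate_azimuth(input_amp, left_amp, right_amp):
    """Invert the model: recover a crude azimuth from measured L/R output.
    The known input amplitude cancels in the L/R ratio."""
    t = math.atan2(right_amp, left_amp)
    return 2 * t - math.pi / 2

l, r = rendered_amplitudes(1.0, math.radians(30))
print(round(math.degrees(estimate_azimuth(1.0, l, r))))  # → 30
```

With a real renderer you'd replace the forward model with measurements (sweep the emitter, record L/R levels) and invert that lookup table instead.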
I don't think that's how this app works, though: after installing it, I got a permission prompt for motion tracking.
Since the author of the app mentioned reverse engineering, analyzing audio is a way that immediately came to mind. It should be quite precise, too, only at the expense of extra CPU cycles.
I did not imply that there is no API to get head tracking data (even though Google search overview straight up says that). It’s mostly a thought experiment. Kudos for digging up CMHeadphoneMotionManager.
> Apple does not secretly analyze sine waves to infer head motion.
Duh. The mechanism I described hinges on Apple being able to track head movements in the first place in order to convert that virtual 3D scene to stereo sound.
Fun experiment. Main limitation I see is the delay between actions and commentary because of the whole script generation & TTS overhead. It seems like the commentary can quickly fall behind, especially in fast-paced sports.
Naw, there are tricks you can use to pipeline these things so that apparent latency is under 500ms even with significant game-state history awareness, and also to interrupt ongoing but freshly out-of-date commentary.
I couldn't get it under 250ms though (for Rocket League), but the tech should be better now than it was in 2024.
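The interruption half of that trick can be sketched very roughly. Everything here is invented for illustration (the function names, the 1.5 s freshness window): each generated line is stamped with the game time it refers to, and before playback anything that has gone stale is dropped. A real pipeline would additionally overlap script generation with TTS to hide the synthesis latency.

```python
FRESHNESS_WINDOW = 1.5  # seconds a line stays worth saying (arbitrary choice)

def playable_lines(queued, now):
    """Filter queued (game_time, line) pairs, dropping lines whose event
    happened more than FRESHNESS_WINDOW seconds before `now`, and return
    the surviving lines in event order."""
    fresh = [(t, line) for t, line in queued if now - t <= FRESHNESS_WINDOW]
    return [line for t, line in sorted(fresh)]

queue = [(10.0, "What a save!"),
         (10.8, "He's lining up the shot..."),
         (12.9, "GOAL!")]
print(playable_lines(queue, now=13.0))  # → ['GOAL!']
```

The same check can run mid-playback: if a higher-priority fresh line arrives, cut the current clip and play the new one.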
Author here. TTS and script generation can be a bit of an overhead for now, which is why I've worked with metric aggregates - 30+ bounces rather than exactly 33, for example. For this game, one might ideally want this overhead to be less than the time it takes for the ball to bounce from one paddle to another, which can be around 1–2 seconds. However, there may be another strategy to (maybe?) overcome this: start synthesizing numbers (ignoring the fractional part) using TTS and cache them for both commentators, then patch those audio clips together after the core part is synthesized. It should be doable, I think - I just haven't gotten to it yet. Note that matching the excitement and tempo of the core commentary with those numbers is key - otherwise, it will feel janky.
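A minimal sketch of the aggregate-and-cache idea described above. `synth` is a hypothetical stand-in for a real TTS call (here it just returns a labeled placeholder "clip"), and the bucket size of 10 is an assumption; the point is only the structure: round counts to coarse labels so their clips can be pre-synthesized once, then splice them around the freshly synthesized core sentence.

```python
def aggregate(n, bucket=10):
    """Round a count down to a coarse, cacheable bucket label,
    e.g. 33 -> '30+'."""
    return f"{(n // bucket) * bucket}+"

def synth(text):
    """Placeholder for a real TTS engine; returns a fake 'audio clip'."""
    return f"<clip:{text}>"

# Pre-synthesize clips for every bucket label once, per commentator voice.
number_cache = {aggregate(n): synth(aggregate(n)) for n in range(0, 100, 10)}

def render_line(prefix, count, suffix):
    """Synthesize only the surrounding words; splice in the cached number."""
    label = aggregate(count)
    return [synth(prefix), number_cache[label], synth(suffix)]

print(render_line("Over", 33, "bounces in this rally!"))
# → ['<clip:Over>', '<clip:30+>', '<clip:bounces in this rally!>']
```

In a real implementation the splice points are where the tempo/excitement mismatch the author mentions would show up, so the cached number clips would need to be recorded at a matching energy level.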
Kudos for building something new, fresh & exploring, experimenting.
I don't see a scenario where this would be useful. It reminds me of exploded-view drawing but I don't see this being useful for textual content. Do you have an explicit use case? The example page, to me, looks very cluttered, overwhelming and IMO aesthetically unpleasing when reading on a mobile device.
Eh, I think it's a neat idea. No one's forced to use or buy this - as is the case with any offered service. Also, the 'qualified human being' in the end is still the tattoo artist who's actually doing the tattoo in this use case.
I assume most people who would use this won't just get a 1:1 copy tattoo of an AI-generated result; the artist can still iterate and use the designs as a draft or inspiration.
I understand the frustration shared in this post but I wholeheartedly disagree with the overall sentiment that comes with it.
The web isn't dead, (Gen)AI, SEO, spam and pollution didn't kill anything.
The world is chaotic and net entropy (degree of disorder) of any isolated or closed system will always increase. Same goes for the web. We just have to embrace it and overcome the challenges that come with it.
I'm not so optimistic. The most basic requirements are:
1. Prove the human-ness of an author...
2. ...without grossly encroaching on their privacy.
3. Ensure that the author isn't passing off AI-generated material as their own.
We'll leave out the "don't let AI models train on my data" part for now.
Whatever solution we come up with, if any, will necessarily be mired in the politics of privacy, anonymity, and/or DRM. In any case, it's hard to conceive of a world where the human web returns as we once knew it.
The good news—such as it is—is that the Web never really became what we assumed it surely would in its early days.
If it was never really the case that, for serious or self-improving reading, you'd be better off with only the Web than with only access to a decent library, then we haven't lost something so precious.
I mean, the most valuable site on the Web is probably a book & research paper piracy website. That’s its crowning achievement. Faster interlibrary loan, basically, but illegal.
Here is an expert saying there is a problem and how it killed their research effort, and yet you say that things are the same as ever and nothing was killed.
1. I am not discrediting the expert in any way; if anything, I think their decision to quit is understandable - a challenge arose during their research that is not in their interest to pursue (information pollution is not research in corpus linguistics / NLP).
2. I never said that things are the same as ever - quite the opposite, actually. I am saying the world evolves constantly. It's naive to say company X/Y/Z killed something or made something unusable when there is constant, inevitable change. We should focus on how to move forward given this constraint, and not dwell on times when the web was so much 'cleaner' and 'nicer', more manageable, etc.