The modern AI phone support systems I’ve encountered can’t actually do anything or go off script, so they sound better, but it’s still a lousy experience.
I've fed thousands of dollars to Anthropic/OAI/etc for their coding models over the past year despite never having paid for dev tools before in my life. Seems commercially viable to me.
> I've fed thousands of dollars to Anthropic/OAI/etc for their coding models over the past year despite never having paid for dev tools before in my life. Seems commercially viable to me.
For OpenAI to produce a 10% return, every iPhone user on earth needs to pay $30/month to OpenAI.
They don't sell their models only to individuals but also to companies, most likely with different business and pricing models, so that's an overly simplistic view of their business. Their spending increases YoY, and we can safely assume one of the reasons is a growing user base.
The time will probably come when we won't be allowed to consume frontier models without paying anything, as we can today, and when that $30 will most likely double or triple.
The truth, though, is that R&D around AI models, and especially their hosting (inference), is expensive and won't get any cheaper without significant algorithmic improvements. If history is any guide, my opinion is that we may very well be ~10 years from that moment.
Not sure where that math is coming from. Assuming it's true, you're ignoring that some users (me) already pay 10X that. Btw, according to Meta's SEC filings (https://s21.q4cdn.com/399680738/files/doc_financials/2023/q4...), they made around $22/month per American user (not even a heavy user or affluent iPhone owner) in Q3 2023. I assume Google would be higher due to larger market share.
Latency may be better, but throughput (the thing companies care about) may be the same or worse, since at every step the entire diffusion window has to be passed through the model. With AR models only the most recent token goes through (everything earlier is served from the KV cache), which is much more compute efficient and lets you be memory bound. The trade-off with these models is that you get more than one token per forward pass, but idk at what point that becomes worth it (probably depends on model and diffusion window size).
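Back-of-envelope version of that trade-off (toy numbers and the usual ~2*P FLOPs/token approximation; my own sketch, not anyone's benchmark):

    # AR with a KV cache: each forward pass touches 1 new token.
    def flops_per_token_ar(params):
        return 2 * params

    # Diffusion: every step re-processes the whole window of `window`
    # tokens but only commits `tokens_per_pass` of them.
    def flops_per_token_diffusion(params, window, tokens_per_pass):
        return 2 * params * window / tokens_per_pass

    P = 8e9  # hypothetical 8B-parameter model
    print(f"{flops_per_token_ar(P):.2e}")                 # 1.60e+10
    print(f"{flops_per_token_diffusion(P, 256, 8):.2e}")  # 5.12e+11

So with a 256-token window committing 8 tokens per pass, the diffusion model burns ~32x the compute per generated token. It only wins on throughput if the AR model is so memory bound that the extra FLOPs are effectively free, or if tokens-per-pass gets close to the window size.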
I feel like it made the experience worse. Just like removing the headphone jack made things worse. Want to connect a midi keyboard to your iOS device and play some instruments on GarageBand? Do you also want to use external speakers without Bluetooth latency? Well too bad!
I don’t know that world (midi and music), but this seems solvable with a dongle/external device. Some quick googling shows at least a few such devices that seem to do the trick. Obviously those cost more money, but even with a headphone jack you’d need an adapter for the midi input to the iOS device, right?
Then again, you have to realize that your use case is almost a rounding error. There just can’t be that many people, as a percentage, who have that need, and it makes sense (to me) to optimize for the largest pie slice and let dongles/accessories cover the gaps for everyone else.
Right, so I need to buy new hardware for something I used to do for free? You don’t need any adapters, just plug the midi keyboard into the USB port. With the old devices a lightning-to-USB converter was needed, but that let you do lots of other things as well. Let’s be honest, Apple removed the jack to get people to buy their shitty AirPods. My solution in the end is to stop buying iPhones and Apple products altogether.
>Right so I need to buy new hardware for something I used to do for free?
>With the old devices a lightning to usb converter was needed
So it wasn't for free. You still had to get dongles.
Personally, I just need an audio interface to plug a guitar, headphones, and speakers into my iPad. I need to buy something, but nothing I wouldn't have had to buy for a PC either.
It's not like what you are describing is impossible today. With the switch to USB-C, iOS devices are compatible with a vast number of affordable adapters, some of which add features and ports that realistically couldn't physically fit on a phone, like HDMI or RJ45.
What percentage of iOS users use a midi keyboard with their devices? 0.01%?
My desktop audio interface plugs right in an iPhone (USB-C to C), no hub or dongle needed, and provides audio in/out, 5-pins midi in/out, microphone preamp, etc.
If it comes to the flexibility of improvising a jam session with inexpensive gear, we are in a much better place today than 10 years ago when phones had headphone jacks. And I say that as someone who uses wired headphones extensively and carries a 3.5mm dongle everywhere.
The switch to USB-C was the whole reason I went to iOS personally. Lack of a proper headphone jack does suck (the Apple USB-C to 3.5mm is quite good however). Too bad we can't have both
At those speeds, it's probably impossible. It would require enormous amounts of memory (which the chip simply doesn't have, there's no room for it) or rather a lot of bandwidth off-chip to storage, and again they wouldn't want to waste surface area on the wiring. Bit of a drawback of increasing density.
When text diffusion models started popping up I thought the same thing as this guy (“wait, this is just MLM”) though I was thinking more MaskGIT. The only thing I could think of that would make it “diffusion” is if the model had to learn to replace incorrect tokens with correct ones (since continuous diffusion’s big thing is noise resistance). I don’t think anyone has done this because it’s hard to come up with good incorrect tokens.
I've played around with MLM at the UTF-8 byte level to train unorthodox models on full-sequence translation tasks, mostly using curriculum learning and progressive random corruption. If you just want to add noise, setting random indices to random byte values might be all you need. For example (minimal sketch; the function name is mine):
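    import random

    def corrupt(target: bytes, n: int) -> bytes:
        # Pick n distinct indices and overwrite each with a random byte.
        buf = bytearray(target)
        for i in random.sample(range(len(buf)), k=min(n, len(buf))):
            buf[i] = random.randrange(256)
        return bytes(buf)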
I expect it to output the full corrected target bytes. The overall training process follows this curriculum:
Curriculum Level 0: Corrupt nothing and wait until the population/model masters simple repetition.
Curriculum Level 1: Corrupt 1 random byte per target and wait until the population/model stabilizes.
Curriculum Level N: Corrupt N random bytes per target.
Rinse & repeat until all target sequences are fully saturated with noise.
An important aspect is to always score the entire target sequence each time so that we build on prior success. If we only evaluated the masked tokens, the step between each difficulty level would be highly discontinuous in the learning domain.
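Put together, the schedule might look something like this (sketch; train_step and mastered stand in for whatever trainer/eval you use, and corrupt is the function above):

    def run_curriculum(pairs, train_step, mastered, max_level):
        # pairs: (source, target) byte-string tuples
        # train_step(corrupted, target): one training step, scored on the
        #   FULL target sequence, not just the corrupted positions
        # mastered(level): True once the current difficulty is saturated
        for level in range(max_level + 1):
            while not mastered(level):
                _, tgt = random.choice(pairs)
                train_step(corrupt(tgt, n=level), tgt)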
I've stopped caring about a lot of the jargon and definitions. I find that trying to stick things into buckets like "is this diffusion" gets in the way of thinking and trying new ideas. I'm more concerned with whether or not it works than with what it is called.
The problem with that is we want the model to learn to deal with its own mistakes. With continuous diffusion, mistakes mostly look like noise, but with what you’re proposing, mistakes are just incorrect words that are semantically pretty similar to the real text, so the model wouldn’t learn to treat them as “noise”. The noising function would have to generate semantically similar text (e.g., out-of-order correct tokens, maybe? Tokens from a paraphrased version?)
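The out-of-order idea is at least cheap to try; something like this (illustrative sketch, not from any paper I know of):

    import random

    def shuffle_noise(tokens, frac=0.15):
        # "Semantic" corruption: permute a fraction of positions so the
        # noise is real-but-misplaced tokens rather than random ones.
        out = list(tokens)
        k = min(max(2, int(len(out) * frac)), len(out))
        idx = random.sample(range(len(out)), k=k)
        vals = [out[i] for i in idx]
        random.shuffle(vals)
        for i, v in zip(idx, vals):
            out[i] = v
        return out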
> There are about 936 tokens with very low L2 norm, centered at about 2. This likely means that they did not occur in the training process of GPT-oss and were thus depressed by some form of weight decay.
Afaik embedding and norm params are excluded from weight decay as standard practice. Is this no longer true?
Could it instead be the case that these tokens were initialized at some mean value across the dataset (plus a little noise), and then never changed because they were never seen in training? Not sure if that is state of the art anymore but e.g. in Karpathy's videos he uses a trick like this to avoid the "sharp hockey stick" drop in loss in the early gradient descent steps, which can result in undesirably big weight updates.
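Fwiw, the cluster itself should be easy to reproduce with a quick scan over the embedding matrix. A minimal sketch, assuming an HF-style checkpoint (the model id and the 5.0 cutoff are my guesses, not from the article):

    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("openai/gpt-oss-20b")
    emb = model.get_input_embeddings().weight        # [vocab_size, d_model]
    norms = emb.norm(dim=-1)                         # L2 norm per token row
    low = (norms < 5.0).nonzero().squeeze(-1)
    print(len(low), "tokens with very low L2 norm")  # article reports ~936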
Unfortunately the article glosses over some of the practices for uncovering such patterns in the training data. It jumps straight to the conclusions without walking through the method, which didn't land well for me.