
Because at some point, someone decided that 8 kbps makes for an acceptable audio stream per subscriber. At first, being able to call anyone anywhere was novel enough that people accepted it, even with this awful quality. And most people kept accepting it until the carriers decided they could allocate a little more with VoLTE, if it works on your phone in your area.


> Because at some point, someone decided that 8 kbps makes for an acceptable audio stream per subscriber.

Has it not been like this for a very long time? I was under the impression that "voice frequency" being defined as up to 4 kHz was a very old standard - after all, (long-distance) phone calls have always been multiplexed through coaxial or microwave links. And it follows that 8kbps is all you need to losslessly digitally sample that.

I assumed it was jitter and such that led to the lower quality of VoIP/cellular, but that's a total guess. Along with maybe compression algorithms that try to squeeze the stream even tighter than 8kbps? But I wouldn't have figured it was the 8kHz sample rate at fault, right?


Sure, if you stop after "nobody's vocal cords make noises above 4 kHz in normal conversation", but the rumbling of the vocal cords isn't the entire audio signal that is present in person. Clicks of the tongue and smacking of the lips produce much higher frequencies, and higher sample rates capture the timbre/shape of the sound wave instead of rounding it down to a smooth sine wave. Discord defaults to 64 kbps, but you can push it up to 96 kbps or 128 kbps with a Nitro membership, and it's not hard to hear an improvement at the higher bitrates. And if you've ever used Bluetooth audio, you know the difference in quality between the bidirectional call profile and the unidirectional music profile, and have wished for the bandwidth of the music profile with the low latency of the call profile.


> Sure, if you stop after "nobody's vocal cords make noises above 4 kHz in normal conversation"

Huh? What? That's not even remotely true.

If you read your comment out loud, the very first sound you'd make would have almost all of its energy concentrated between 4 and 10 kHz.

Human vocal cords constantly hit up to around 10 kHz, though auditory distinctiveness is more concentrated below 4 kHz. It is unevenly distributed though, with sounds like <s> and <sh> being (infamously) severely degraded by a 4 kHz cut-off.


AMR (adaptive multi-rate audio codec) can get down to 4.75 kbit/s when there's low bandwidth available, which is typically what people complain about as being terrible quality.

The speech codecs are complex and fascinating, very different from just doing a frequency filter and compressing.

The base is linear predictive coding (LPC), which encodes the voice based on a simple model of the human mouth and throat. That gives huge compression, but on its own it sounds terrible. So you then take the error between the original signal and the LPC-encoded signal; this error waveform is compressed heavily but more conventionally and transmitted along with the LPC signal.
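To make the "predict, then send the error" idea concrete, here is a minimal sketch in plain Python. It is not how AMR or any real codec is implemented: the toy input is a noise-driven two-pole resonator standing in for a voiced sound, the predictor order of 2 is chosen only to match that toy, and all function names are made up for illustration. It fits the predictor with the textbook autocorrelation method (Levinson-Durbin) and shows that the error left over carries much less energy than the original signal.

```python
import random

def autocorrelation(frame, max_lag):
    """r[lag] = sum over n of frame[n] * frame[n - lag]."""
    n = len(frame)
    return [sum(frame[i] * frame[i - lag] for i in range(lag, n))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] with x[n] ~ sum(a[k] * x[n-k])."""
    a = [0.0] * (order + 1)   # a[0] is unused
    err = r[0]                # prediction error energy so far
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]

def residual(frame, coeffs):
    """Error between each sample and its prediction from earlier samples."""
    out = []
    for n, sample in enumerate(frame):
        predicted = sum(c * frame[n - k]
                        for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(sample - predicted)
    return out

def energy(xs):
    return sum(x * x for x in xs)

# Toy "voice": white noise shaped by a two-pole resonator (an assumption,
# chosen so that a 2-tap predictor can model it well).
rng = random.Random(0)
signal = []
prev1 = prev2 = 0.0
for _ in range(2000):
    sample = 1.3 * prev1 - 0.6 * prev2 + rng.gauss(0.0, 1.0)
    signal.append(sample)
    prev2, prev1 = prev1, sample

coeffs = levinson_durbin(autocorrelation(signal, 2), 2)
error = residual(signal, coeffs)
print("estimated coefficients:", coeffs)
print("residual energy / signal energy:", energy(error) / energy(signal))
```

The estimated coefficients should land near the 1.3 and -0.6 used to generate the toy signal. The encoder would send the handful of coefficients plus a heavily compressed version of `error`, rather than the samples themselves.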

Phones also layer on voice activity detection: when you aren't talking, the system just transmits noise parameters and the other end hears some tailored white noise. As phone calls typically have one person speaking at a time and there are frequent pauses in speech, this is a huge win. But it also makes mistakes, especially in noisy environments (like call centers: voice calls are the business, so why are they so bad?). When this happens the call becomes unintelligible because the system isn't even trying to encode the voice.
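A rough sketch of the idea, assuming the crudest possible detector (a fixed loudness threshold per 20 ms frame). Real detectors are far more elaborate; the frame size, threshold, and names here are all made up for illustration. Frames judged to be speech are passed on for encoding, while everything else is reduced to a single noise level.

```python
import math
import random

FRAME = 160  # samples per frame: 20 ms at 8 kHz (assumed for this sketch)

def rms(frame):
    """Root-mean-square level of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def encode_with_vad(samples, threshold):
    """Return one packet per frame: the samples if the frame is loud enough
    to count as speech, otherwise just a noise level for the far end to
    regenerate as background noise."""
    packets = []
    for start in range(0, len(samples) - FRAME + 1, FRAME):
        frame = samples[start:start + FRAME]
        level = rms(frame)
        if level >= threshold:
            packets.append(("speech", frame))
        else:
            packets.append(("noise", level))
    return packets

# Two frames of faint background hiss, then two frames of a 300 Hz tone
# standing in for someone talking.
rng = random.Random(1)
quiet = [rng.uniform(-0.01, 0.01) for _ in range(2 * FRAME)]
talk = [0.5 * math.sin(2 * math.pi * 300 * n / 8000) for n in range(2 * FRAME)]

packets = encode_with_vad(quiet + talk, threshold=0.05)
kinds = [kind for kind, _ in packets]
print(kinds)
```

Anything the detector labels "noise" is simply never encoded, which is why a wrong decision is so destructive: whatever was said in that frame is replaced by hiss on the other end.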


The 8 kHz samples were encoded with relatively low-complexity PCM (G.711). That gets to a 64 kbps data channel rate. This was the standard for "toll quality" audio. Not 8 kbps.
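For anyone who wants the arithmetic spelled out: G.711 stores 8 bits per sample, so the rate falls straight out of the sample rate. A tiny sketch (the function name is made up):

```python
def pcm_bitrate_bps(sample_rate_hz, bits_per_sample, channels=1):
    """Bitrate of a fixed-size-per-sample PCM stream, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

sample_rate_hz = 8000                 # 8 kHz sampling
nyquist_hz = sample_rate_hz // 2      # highest frequency it can represent
g711_bps = pcm_bitrate_bps(sample_rate_hz, bits_per_sample=8)

print(nyquist_hz, "Hz audio bandwidth")
print(g711_bps // 1000, "kbps")
```

So the 4 kHz voice band gives you the 8 kHz sample rate, and the 8 bits per sample turn that into 64 kbps.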

The 8 kbps rates on cellular come from the more complicated (relative to G.711) AMR-NB encoding. AMR supports voice rates from about 5 to 12 kbps, with a typical rate of 8 kbps. There's a lot more pre- and post-processing of the input signal and more involved encoding, and a bit more voice information is dropped by the encoder.
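To put those rates in perspective, here is a small sketch of how many bits each codec gets to describe one 20 ms slice of speech. The eight AMR-NB mode rates and the 20 ms frame length are from my recollection of the AMR spec rather than from anything above, so treat them as assumptions; the helper name is made up.

```python
FRAME_MS = 20  # AMR-NB frame length (assumed from the spec)

# AMR-NB mode rates in bit/s (assumed from the spec).
AMR_NB_MODES_BPS = [4750, 5150, 5900, 6700, 7400, 7950, 10200, 12200]
G711_BPS = 64000

def bits_per_frame(bitrate_bps, frame_ms=FRAME_MS):
    """Bits available to describe one frame of audio at a given bitrate."""
    return bitrate_bps * frame_ms // 1000

for rate in AMR_NB_MODES_BPS:
    print(f"AMR-NB {rate / 1000:>5.2f} kbps: {bits_per_frame(rate)} bits per frame")
print(f"G.711  {G711_BPS / 1000:>5.2f} kbps: {bits_per_frame(G711_BPS)} bits per frame")
```

The same 160 samples that G.711 spends 1280 bits on have to fit in a couple of hundred bits or fewer, which is why the encoder has to model the voice instead of storing the waveform.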

Part of the quality problem even today with VoLTE is that different carriers support different profiles, and calls between carriers will often drop down to the lowest common codec, which is usually AMR-NB. There are higher-bitrate and better codecs available in the standard, but they're implemented differently by different carriers for shitty cellular carrier reasons.


> The 8 kHz samples were encoded with relatively low-complexity PCM (G.711). That gets to a 64 kbps data channel rate. This was the standard for "toll quality" audio. Not 8 kbps.

I'm a moron, thanks. I think I got the sample rate mixed up with the bitrate. Appreciate you clearing that up - and the other info!



