A lot of styling is still done in the old Advanced SubStation Alpha (ASS) subtitle format, which is nice in a whole number of ways but has no standards working group behind it, so it gets a bit ignored in software and operations.
People use either some flavor of W3C's Timed Text or WebVTT instead (and it was already a pain to drag them onto those and get them to drop the old analog broadcast formats). Now, here's the thing. WebVTT isn't radically different from (A)SSA in format and features, and it has plenty of styling options... but, once again, a lot of platforms and software are dragging their feet on supporting those options.
So the industry has been sloooowly doing the right thing by moving to the W3C standards (not a huge fan of Timed Text myself, but it exists for a reason), but only with the most basic and safe features. Those are also about as many features as you get out of plain speech-to-text output, so it's even easier to make that decision.
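To make the "not radically different" point a bit more concrete, here is a rough Python sketch that turns one ASS `Dialogue:` line into a WebVTT cue. The function names are made up for illustration, it assumes the standard ten-field ASS event layout, and it only maps italic/bold overrides and `\N` line breaks; every other override tag is simply dropped, which is pretty much the "basic and safe" subset most platforms stick to anyway.

```python
import re


def ass_time_to_vtt(ts: str) -> str:
    """Convert an ASS timestamp (H:MM:SS.cc) to WebVTT form (HH:MM:SS.mmm)."""
    hours, minutes, seconds = ts.split(":")
    secs, centis = seconds.split(".")
    # ASS uses centiseconds; WebVTT wants milliseconds.
    return f"{int(hours):02d}:{int(minutes):02d}:{int(secs):02d}.{int(centis) * 10:03d}"


def ass_dialogue_to_vtt_cue(line: str) -> str:
    """Convert one 'Dialogue: ...' line into a WebVTT cue (timing line + text).

    Assumes the usual field order:
    Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
    """
    # The text field may itself contain commas, so cap the split at 9.
    fields = line.split(":", 1)[1].strip().split(",", 9)
    start, end, text = fields[1], fields[2], fields[9]

    # Map the simplest inline overrides onto WebVTT cue tags.
    text = text.replace(r"{\i1}", "<i>").replace(r"{\i0}", "</i>")
    text = text.replace(r"{\b1}", "<b>").replace(r"{\b0}", "</b>")
    # Drop any remaining override blocks (positioning, colors, karaoke, ...).
    text = re.sub(r"\{[^}]*\}", "", text)
    # ASS hard line break -> real newline inside the cue.
    text = text.replace(r"\N", "\n")

    return f"{ass_time_to_vtt(start)} --> {ass_time_to_vtt(end)}\n{text}"


if __name__ == "__main__":
    sample = r"Dialogue: 0,0:00:01.00,0:00:03.50,Default,,0,0,0,,{\i1}Hello{\i0}, world"
    print("WEBVTT\n")
    print(ass_dialogue_to_vtt_cue(sample))
```

Everything the sketch throws away (positioning, colors, per-style fonts) has at least a rough WebVTT counterpart in cue settings and `::cue` styling, but that's exactly the part player support is patchy on.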
The number of requests some pages make to display the simplest stuff is mind-boggling. 160 requests for a page that just shows an HTML5 video and a title, 360 requests for a Reddit page, it's nuts. We don't need to be like this.