The whole robotic, monotone, helpful assistant thing was something these companies had to actively hammer in during the post-training stage. It's not really how LLMs will sound by default after pre-training.
I guess they're caring less and less about that effort especially since it hurts the model in some ways like creative writing.
Maybe, but I'm not sure how much the style is deliberate vs. a consequence of the post-training tasks like summarization and problem solving. Without seeing the post-training tasks and rating systems it's hard to judge if it's a deliberate style or an emergent consequence of other things.
But it's definitely the case that base models sound more human than instruction-tuned variants. And the shift isn't just vocabulary, it's also in grammar and rhetorical style. There's a shift toward longer words, but also participial phrases, phrasal coordination (with "and" and "or"), and nominalizations (turning adjectives/adverbs into nouns, like "development" or "naturalness"). https://arxiv.org/abs/2410.16107
How is "development" an adverb or adjective turned into a noun??
It comes from a French word (développement) and that in turns was just a natural derivation of the verb "développer"... no adverbs or adjectives (English or otherwise) seem to come into play here
Sorry, I should have said adjectives or verbs, as it's "develop" turned into a noun. Just like "discernment" or "punishment". The etymology isn't relevant for classifying it as a nominalization, only the grammatical function.
Or maybe they're just getting better at it, or developing better taste. After switching to Claude, I can't go back to ChatGPT's overly verbose bullet-point laden book reports every time I ask a question. I don't think that's pretraining—it's in the way OpenAI approaches tuning and prompting vs Anthropic.
If it's just a different choice during RLHF, I'll be curious to see what are the trade-offs in performance.
The "buddy in a chat group" style answers do not make me feel like asking it for a story will make the story long/detailed/poignant enough to warrant the difference.
I guess they're caring less and less about that effort especially since it hurts the model in some ways like creative writing.