I'd be happy to pay for such a service if the stories were written by a human, but once I learned the stories are AI generated, I bounced. Even if they're edited by a native speaker, something about that feels off. I want to learn from other humans, not from a language model.
Yeah, this is an example of where AI enables a new product by reducing standards of quality and marks of effort.
Rather than doing the work of curating stories with specific educational goals, chosen or crafted by native speakers and perhaps even noted authors, which might be an ongoing process with months of preparation and numerous humans working together towards a common vision, one hobbyist can spend $0.035 at a time to accumulate a cache of "good enough" stories in countless languages.
For some, it's exciting to see the barrier to entry lowered so that one hobbyist can create a tool with so much content behind it, and dreams of social uplift ask us to cheer when stuff gets cheaper because it becomes more broadly available. So there are real upsides here, but of course those upsides come as part of a tradeoff against quality when we look at current and near-term LLM technology. Those quality tradeoffs aren't going to suit everyone, especially those who have the luxury of paying more for better things.
I feel similarly. Any thoughts on what it would take for you to learn from a language model? For me, I think it's some sort of review from a native speaker. If they gave a stamp of approval, I wouldn't think twice about learning from a model versus a human.
I don't think there's any claim you could make about the language model, factual or otherwise, that would resolve my primary hangup. As a non-native learner of a new language, I do not have a trained bullshit detector for that language. I cannot, by virtue of still being a novice, determine if the sentence structure sounds "weird," and I certainly can't determine if that weirdness is a limitation of the language model, some hallucination, whatever. So, by learning from that model, I would pick up any mistakes it makes and fold those into my own speech patterns.
If I'm going to pick up speech patterns at all, I would really rather pick them up from a native speaker of the language, since at the very least I'll make the sorts of mistakes that a human might make. I want to sound like a human, not like a language model. Language models sound like the average of several humans at best, and like a strange program trying to imitate human speech at worst.
Once I'm fluent in a language, enough to recognize when the language model itself is probably making a mistake, then I might become comfortable using it. But not as my first introduction to the nuances of the language, when I'm still building my own internal representation up from scratch. After all, my goal is to converse with other human speakers. Shouldn't that be my personal training corpus? I'm a neural network too, and I don't want to feed myself bad data.
Really great food for thought here; I'm going to have to linger on it some more. I don't know enough about LLMs to know how to counteract hallucinations, but I wonder if you could train an LLM on a vetted corpus (much like Anthropic does with Claude) so that it solves the problem of generating robotic speech patterns. I do agree with you a lot on the "sound" problem, because I don't think I'd advise a non-native English speaker to blindly trust ChatGPT as a learning mechanism, based on my experience of using it.
So, in conclusion, I'm curious to learn whether this is a solvable problem or if LLMs are inherently not the right tool to use.
Once you have a vetted corpus large enough to train an LLM on, I don't think there's any need to generate even more text with the LLM, since you can use the corpus directly.
>Language models sound like the average of several humans at best, and a strange program trying to imitate human speech at worst.
This isn't really true. OpenAI uses heavy RLHF to make its LLMs sound like that by default, but they can sound like whatever. If a native speaker says it's fine, then it's fine lol. You can still choose not to use it, as you can choose not to do anything, but then it's an irrational fear more than a real concern.
This is also essentially a substitute for graded readers for language learners
Totally agreed. I'm biased because I learn languages to mostly read texts in original languages. Narratives with well-crafted prose can have hundreds of years of linguistic and stylistic histories, as well as contemporary vernacular, which can tell more about a language than just understanding the plot of the story. I don't think that can ever be fully replaced by an LLM.
There are thousands of short stories at every level of language understanding for nearly every language in existence. I would be more interested in using AI for the languages that don't have these. (say, endangered/extinct languages, oral languages, etc)
>Narratives with well-crafted prose can have hundreds of years of linguistic and stylistic histories, as well as contemporary vernacular, which can tell more about a language than just understanding the plot of the story.
This seems like a weird hangup for what is essentially a substitute for graded readers for language learners. You're not getting any of the things you mentioned going the "human" route either.
No one is saying to read these stories over full-blown novels. There's no complexity difference between full-blown novels and most native short stories either, just length, so that's not really an option.
If you could read at that level, you wouldn't be using this or the non-LLM alternative anyway.
Just because a children's story, for example, is "simple" doesn't mean it isn't inflected by human complexities.
When you're learning a language, your brain is going through a unique process of both attention to small detail and rote memorization. If you see a pattern often enough at an early stage of language learning, you'll most likely carry that with you at later stages. Even if you don't notice it at first.
Would you trust an AI to present you with accurate language patterns--speech, vernacular, etc?
>Just because a children's story, for example, is "simple" doesn't mean it isn't inflected by human complexities.
Sure. And again, this is why lots of people say recommending a children's book or show is really a bad idea for beginner learners. Graded readers are an entirely different thing from children's fiction.
>Would you trust an AI to present you with accurate language patterns--speech, vernacular, etc?
Language patterns in text? Yes. There's nothing special about it lol. For all of GPT-4's shortcomings, "wrong" language patterns in English isn't one of them. How it writes is usually just the default hammered in by RLHF, and it can easily diverge when instructed. So if a native speaker of some other language gives the A-OK on a piece of text, then that's it.
I feel like ChatGPT plus voice input (speech to text) plus voice output at different playback speeds (text to speech) would be a fine way to learn some English.
Maybe it shouldn't be the only method one uses, but I think it is a useful method.
So why not for other languages?
I'd love to use a Japanese LLM with voice capability. It could even write stories for me that I could read and listen to.