I'd be happy to pay for such a service if the stories were written by a human, but once I learned the stories are AI generated, I bounced. Even if they're edited by a native speaker, something about that feels off. I want to learn from other humans, not from a language model.
Yeah, this is an example of where AI enables a new product by reducing standards of quality and marks of effort.
Rather than doing the work of curating stories with specific educational goals, chosen or crafted by native speakers and perhaps even noted authors, which might be an ongoing process with months of preparation and numerous humans working together towards a common vision, one hobbyist can spend $0.035 at a time to accumulate a cache of "good enough" stories in countless languages.
For some, it's exciting to see the barrier to entry lowered so that one hobbyist can create a tool with so much content behind it, and dreams of social uplift ask us to cheer when stuff gets cheaper because it becomes more broadly available. So there are real upsides here, but of course those upsides come as part of a tradeoff against quality when we look at current and near-term LLM technology. Those quality tradeoffs aren't going to suit everyone, especially those who have the luxury of paying more for better things.
I feel similarly. Any thoughts on what it would take for you to learn from a language model? For me, I think it's some sort of review from a native speaker. If they gave a stamp of approval, I wouldn't think twice about learning from a model versus a human.
I don't think there's any claim you could make about the language model, factual or otherwise, that would resolve my primary hangup. As a non-native learner of a new language, I do not have a trained bullshit detector for that language. I cannot, by virtue of still being a novice, determine if the sentence structure sounds "weird," and I certainly can't determine if that weirdness is a limitation of the language model, some hallucination, whatever. So, by learning from that model, I would pick up any mistakes it makes and fold those into my own speech patterns.
If I'm going to pick up speech patterns at all, I would really rather pick them up from a native speaker of the language, since at the very least I'll make the sorts of mistakes that a human might make. I want to sound like a human, not like a language model. Language models sound like the average of several humans at best, and like a strange program trying to imitate human speech at worst.
Once I'm fluent in a language, enough to recognize when the language model itself is probably making a mistake, then I might become comfortable using it. But not as my first introduction to the nuances of the language, when I'm still building my own internal representation up from scratch. After all, my goal is to converse with other human speakers. Shouldn't that be my personal training corpus? I'm a neural network too, and I don't want to feed myself bad data.
Really great food for thought here; I'm going to have to linger on it some more. I don't know enough about LLMs to know how to counteract hallucinations, but I wonder if you could train an LLM on a vetted corpus (much like Anthropic does with Claude) so that it solves the problem of generating robotic speech patterns. I do agree with you a lot on the "sound" problem, because I don't think I'd advise a non-native English speaker to blindly trust ChatGPT as a learning mechanism, based on my experience of using it.
So, in conclusion, I'm curious to learn whether this is a solvable problem or if LLMs are inherently not the right tool to use.
Once you have a vetted corpus large enough to train an LLM on, I don't think there's any need to generate even more text with the LLM, since you can use the corpus directly.
>Language models sound like the average of several humans at best, and a strange program trying to imitate human speech at worst.
This isn't really true. OpenAI uses heavy RLHF to make its LLMs sound like that by default, but they can sound like whatever. If a native speaker says it's fine, then it's fine lol. You can still choose not to use it, as you can choose not to do anything, but then it's an irrational fear more than a real concern.
This is also essentially a substitute for graded readers for language learners
Totally agreed. I'm biased because I learn languages to mostly read texts in original languages. Narratives with well-crafted prose can have hundreds of years of linguistic and stylistic histories, as well as contemporary vernacular, which can tell more about a language than just understanding the plot of the story. I don't think that can ever be fully replaced by an LLM.
There are thousands of short stories at every level of language understanding for nearly every language in existence. I would be more interested in using AI for the languages that don't have these. (say, endangered/extinct languages, oral languages, etc)
>Narratives with well-crafted prose can have hundreds of years of linguistic and stylistic histories, as well as contemporary vernacular, which can tell more about a language than just understanding the plot of the story.
This seems like a weird hangup for what is essentially a substitute for graded readers for language learners. You're not getting any of the things you mentioned going the "human" route either.
No one is saying to read these stories over full-blown novels. There's no complexity difference between full-blown novels and most native short stories either, just length, so that's not really an option.
If you could read at that level, you wouldn't be using this or the non-LLM alternative anyway.
Just because a children's story, for example, is "simple" doesn't mean it isn't inflected by human complexities.
When you're learning a language, your brain is going through a unique process of both attention to small detail and rote memorization. If you see a pattern often enough at an early stage of language learning, you'll most likely carry that with you at later stages. Even if you don't notice it at first.
Would you trust an AI to present you with accurate language patterns--speech, vernacular, etc?
>Just because a children's story, for example, is "simple" doesn't mean it isn't inflected by human complexities.
Sure. And again, this is why lots of people say recommending a children's book or show is really a bad idea for beginner learners. Graded readers are an entirely different thing from children's fiction.
>Would you trust an AI to present you with accurate language patterns--speech, vernacular, etc?
Language patterns in text? Yes. There's nothing special about it lol. For all of GPT-4's shortcomings, "wrong" language patterns in English isn't one of them. How it writes is usually just the default hammered in by RLHF, and it can easily diverge when instructed. So if a native speaker of some other language gives the A-OK on a piece of text, then that's it.
I feel like ChatGPT plus voice input (speech to text) plus voice output at different playback speeds (text to speech) would be a fine way to learn some English.
Maybe it shouldn't be the only method one uses, but I think it is a useful method.
So why not for other languages?
I'd love to use a Japanese LLM with voice capability. It could even write stories for me that I could read and listen to.