I picked a random story in Spanish [1], which is my native language.
First, and maybe up for discussion, "gnocchi" in Spanish is written "ñoqui". See [2] for a commercial example or check the story's title.
Second, the sentence "Después de aprender a hacer gnocchis con su abuela, Hendrik nunca le gusta los gnocchis de nadie más" is wrong: the part after the comma should read "a Hendrik nunca le gustaron los ñoquis de nadie más". I'm also unhappy with "Siempre dice que faltaba algo" as it makes a funny mix of present and past tenses.
Third, I think the last paragraph is incoherent as Hendrik learns the nutmeg trick twice (learns from grandma about nutmeg -> finds other gnocchis lacking -> learns about nutmeg).
The well-known LLMs are surprisingly bad in languages that are not English. I'm not sure I would trust them just yet.
I had a look at two German stories. They felt a little off. But I know from having looked at graded readers for German in the past that they feel a little weird to native speakers, because their grammar is simple even though they are obviously not aimed at children.
But I did find a spelling error, inconsistent use of formal and informal voice, and an expression that was not quite right in context. I wonder about the quality assurance part of the setup.
I think the "nutmeg trick" was ok -- it didn't say Hendrik learned the trick from his grandmother the first time, only that there was a trick. Hendrik only learns it later when trying to recreate them.
I agree, though, that a human writer would have probably made this clearer. It would have been made explicit that Hendrik missed the trick the first time around.
Hey! thanks for having a look and for dropping some feedback :) Yea, "gnocchi" is questionable ha but some people do write gnocchi to respect their origin, for example in restaurants. But yeah, fair, probably better to use the real Spanish version.
>> "Hendrik nunca le gusta los gnocchis de nadie más"
Will add the "a" at the beginning! Thanks for spotting that.
>> "Siempre dice que faltaba algo"
This is a common way of saying things, at least in my region. That makes Spanish harder, there are so many "versions".
>> "The well-known LLMs are surprisingly bad in languages that are not English"
So true.
Anecdotally, I've noticed new generations struggle a lot more with language and writing. Imagine the future when they all learn from badly generated AI grammar. Even generated writing with good grammar is still generic AI writing.
This is really great. Congrats on shipping it! You might find https://www.lingq.com helpful as a source of inspiration. I think it's a fairly similar concept.
LingQ's killer feature for me is that as you click on words (or phrases - which I find really helpful btw) to translate them, they are added to your vocabulary list. It will automatically create flashcards for you from this list for SRS. Plus when you're reading a new story, words that are in your vocab list are highlighted yellow and new words are highlighted blue.
thank you! I will have a look at LingQ, it does look interesting. Some other people also asked me to add the vocabulary list for Webbu too, so that should be coming soon :).
This sounds great. I could imagine it takes inspiration from (or could take inspiration from) programs like Dreaming Spanish, a series of videos that are ranked at different levels of Spanish. The philosophy behind these programs is that what matters most is the active consumption of another language, for example by listening and trying to understand. In fact, our intuition that we need to work on producing the foreign language, by speaking or writing it, hasn't necessarily been borne out by research. Instead, actively consuming another language at or very slightly above your current level is perhaps the best way to become fluent in that language.
In terms of the content that you're creating, I always thought it would be very interesting to have leveled plays written in the target language. I always thought that reading how people actually speak might be more useful than reading prose.
Quick note: your article links to webbu.app but the link text is "webu.app"; that was awfully confusing.
Might be worth a pass through to proofread; I saw another couple of typos, but the page is now throwing a WordPress database error, so that might be more urgent to look at.
Thank you! I just fixed the typo, sorry for the confusion. Thanks for the eagle eyes. The DB issue was just WordPress not being able to handle the traffic. It's back now!
I'd be happy to pay for such a service if the stories were written by a human, but once I learned the stories are AI generated, I bounced. Even if they're edited by a native speaker, something about that feels off. I want to learn from other humans, not from a language model.
Yeah, this is an example of where AI enables a new product by reducing standards of quality and marks of effort.
Rather than doing the work of curating stories with specific educational goals, chosen or crafted by native speakers and perhaps even noted authors, which might be an ongoing process with months of preparation and numerous humans working together towards a common vision, one hobbyist can spend $.035 at a time to accumulate a cache of "good enough" stories in countless languages.
For some, it's exciting to see the barrier to entry lowered so that one hobbyist can create a tool with so much content behind it and dreams of social uplift ask us to cheer when stuff gets cheaper because it becomes more broadly available. So there are real upsides here, but of course those upsides are coming as part of a tradeoff against quality when we look at current/near LLM technology. Those quality tradeoffs aren't going to be suited to everyone, especially those who have the luxury to pay more for better things.
I feel similarly. Do you have any thoughts on what it would take for you to learn from a language model? I think for me, it's some sort of review from a native speaker. If they gave a stamp of approval, I wouldn't think twice about learning from a human versus a model.
I don't think there's any claim you could make about the language model, factual or otherwise, that would resolve my primary hangup. As a non-native learner of a new language, I do not have a trained bullshit detector for that language. I cannot, by virtue of still being a novice, determine if the sentence structure sounds "weird," and I certainly can't determine if that weirdness is a limitation of the language model, some hallucination, whatever. So, by learning from that model, I would pick up any mistakes it makes and fold those into my own speech patterns.
If I'm going to pick up speech patterns at all, I would really rather pick them up from a native speaker of the language, since at the very least I'll make the sorts of mistakes that a human might make. I want to sound like a human, not like a language model. Language models sound like the average of several humans at best, and a strange program trying to imitate human speech at worst.
Once I'm fluent in a language, enough to recognize when the language model itself is probably making a mistake, then I might become comfortable using it. But not as my first introduction to the nuances of the language, when I'm still building my own internal representation up from scratch. After all, my goal is to converse with other human speakers. Shouldn't that be my personal training corpus? I'm a neural network too, and I don't want to feed myself bad data.
Really great food for thought here, I'm going to have to linger on it some more. I don't know enough about LLMs to know how to counteract hallucinations, but I wonder if you could build an LLM on a vetted corpus (much like Anthropic does with Claude) so that it solves the problem of generating robotic speech patterns. I do agree with you a lot on the "sound" problem, because based on my experience of using ChatGPT I don't think I'd advise a non-native English speaker to blindly trust it as a learning mechanism.
So, in conclusion, I'm curious to learn whether this is a solvable problem or if LLMs are inherently not the right tool to use.
Once you have a vetted corpus large enough to train an LLM on, I don't think there's any need to generate even more text with the LLM, since you can use the corpus directly.
>Language models sound like the average of several humans at best, and a strange program trying to imitate human speech at worst.
This isn't really true. OpenAI uses heavy RLHF to make LLMs sound like that by default, but they can sound like whatever. If a native speaker says it's fine then it's fine lol. You can still choose not to use it, as you can choose not to do anything, but then it's an irrational fear more than a real concern.
This is also essentially a substitute for graded readers for language learners
Totally agreed. I'm biased because I learn languages to mostly read texts in original languages. Narratives with well-crafted prose can have hundreds of years of linguistic and stylistic histories, as well as contemporary vernacular, which can tell more about a language than just understanding the plot of the story. I don't think that can ever be fully replaced by an LLM.
There are thousands of short stories at every level of language understanding for nearly every language in existence. I would be more interested in using AI for the languages that don't have these. (say, endangered/extinct languages, oral languages, etc)
>Narratives with well-crafted prose can have hundreds of years of linguistic and stylistic histories, as well as contemporary vernacular, which can tell more about a language than just understanding the plot of the story.
This seems like a weird hangup for what is essentially a substitute for graded readers for language learners. You're not getting any of the things you mentioned going the "human" route.
No one is saying go read these stories over full-blown novels. There's no complexity difference between full-blown novels and most native short stories either, just length, so that's not really an option.
If you could read at that level, you wouldn't be using this or the non-LLM alternative anyway.
Just because a children's story, for example, is "simple" doesn't mean it isn't inflected by human complexities.
When you're learning a language, your brain is going through a unique process of both attention to small detail and rote memorization. If you see a pattern often enough at an early stage of language learning, you'll most likely carry that with you at later stages. Even if you don't notice it at first.
Would you trust an AI to present you with accurate language patterns--speech, vernacular, etc?
>Just because a children's story, for example, is "simple" doesn't mean it isn't inflected by human complexities.
Sure. And again this is why lots of people say recommending a Children's book or show is really a bad idea for beginner learners. Graded readers are an entirely different thing from children's fiction.
>Would you trust an AI to present you with accurate language patterns--speech, vernacular, etc?
Language patterns in text? Yes. There's nothing special about it lol. For all of GPT-4's shortcomings, "wrong" language patterns for English isn't one of them. How it writes is usually just the default hammered in by RLHF and can easily diverge when instructed. So if a native speaker of some other language gives the A-OK on a piece of text then that's it.
I feel like ChatGPT plus voice input (speech to text) plus voice output at different playback speeds (text to speech) would be a fine way to learn some English.
Maybe it shouldn't be the only method one uses, but I think it is a useful method.
So why not for other languages?
I'd love to use a Japanese LLM with voice capability. It could even write stories for me that I could read and listen to.
I think this is a great idea! I am also working on an app and we've just added a similar feature.
It's an illustration-heavy picture book with multiple difficulty levels and recordings by native speakers for each line of text. The goal is for students to be able to learn languages while they read, ideally without having to translate words, though we do provide a lookup tool. Each line in the story is read by a native speaker, and the reader has the option to record themselves and play it back to check their pronunciation.
Our first story "Ari & Chali go to the Market" teaches Thai classifiers and is perfect for someone who has completed Reading Thai Made easy or is just learning how to read.
We hope to develop more content like this for Thai, as well as other languages.
As a French language learner, one thing I’d love to see is more interesting ways of practicing a specific area/skill (eg in French that might be the passé composé or l’imparfait). There are tons of lessons online, but they’re boring. They don’t motivate you like a short story or short film where you get both practice and pleasure. It seems very doable to use LLM technologies to transform a given short story to focus on specific language skills (eg rewrite it to use mostly a specific tense). I know it’s possible because one of my first play experiments with ChatGPT was to have it write a letter to Santa using only specific tenses. In the end you get to reinforce your comprehension and recognition of specific facets of a language. I wish this was more widely available in language learning offerings. Hint, hint, Duolingo: please consider allowing me to choose the skill I want to learn at a given time, at least in some way. This might apply more to intermediate and advanced learners.
To elaborate on my Duolingo critique… I find it too linear and strict. I can’t really get it to focus on the skills that I’m struggling with or most want to focus upon at a given time. This is especially true when you’re learning using a combined, multifaceted approach (eg YouTube videos + kindle stories + Duolingo + classroom instruction).
The hard part of this is finding stories you want to read. Looks cool, but the one story I read was maybe 5 minutes. To make progress you need an hour or more per day. Well, you can make progress on 5 minutes per day, but 500-1000 hours is what it takes to be useful, and so 5 minutes a day just doesn't add up fast enough.
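To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The function name `days_to_reach` is made up for illustration, and the 500-1000 hour range is only the figure quoted above, not a measured result.

```python
def days_to_reach(target_hours: float, minutes_per_day: float) -> float:
    """Days of daily reading needed to accumulate target_hours."""
    return target_hours * 60 / minutes_per_day


# Compare 5 minutes a day against an hour a day for both ends of the range.
for minutes in (5, 60):
    for hours in (500, 1000):
        days = days_to_reach(hours, minutes)
        print(f"{minutes} min/day, {hours} h: {days:.0f} days (~{days / 365:.1f} years)")
```

At 5 minutes a day, 500 hours works out to 6,000 days, which is more than 16 years. At an hour a day it is 500 days.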
Thanks for the feedback! Hopefully at some point we can add more stories, quickly. I'm also looking into other methods of learning, we should have a few more soon.
That is true. At first I thought the stories were created on demand.
Some of us are compulsive readers once we get used to it, and I'm looking forward to being able to do that in my target language.
Nice start! I love the idea of an AI assisted learning experience structured around stories.
I make a tool in the same space (readlang.com). I started it before the current LLM wave but I've recently added LLM-generated explanations and several users have been uploading LLM generated texts to read (including me!). I've been considering adding LLM based practice too, similar to what you've done with the comprehension questions, but haven't got around to that yet.
It looks like great stuff but I've taken Show HN out of the title because (1) blog posts can't be Show HNs - please see the rules: https://news.ycombinator.com/showhn.html; and
I'm learning Japanese and something like this would be beneficial.
> You should be able to tap on a word and get a translation.
This functionality is available already on Kindle though :) you have the option of uploading your own dictionaries to it too.
That, combined with using https://www.clippings.io/ to manage highlighted text, makes the kindle an all round great tool for learning languages from books, or any text really. (you can use calibre to convert into and between most ebook formats)
I look forward to seeing where this goes! I imagine the real difficulty will be applying it to more than just German, especially when you branch outside of Indo-European languages.
I built something like this for iOS/macOS, Manabi Reader, specifically for Japanese: https://reader.manabi.io It has collections of short stories and other RSS-based reading materials for different levels. The reading materials are all written by native speaker humans, not AI-generated. It's also a general purpose web browser and RSS reader.
Beyond Kindle capabilities, it also tracks the words, kanji, sentences you read to show you analytics based on that, chart your progress against JLPT levels simply by reading, and uses that data (all in your own iCloud/device storage) to coordinate flashcard review. I recently added early Anki integration. Still working hard on it, bringing more languages soon and cross-plat via SwiftWASM a bit later.
I find the feature on Kindle to be awful to use. It's sluggish and I've got to do additional work to import the highlights into clippings where I'll then have to do more work to put it somewhere I can consistently review? ugh. The experience OP and software like LingQ have created makes learning words a joy and it's sustainable.
NHK Web News Easy is pretty good. I built a simple browser extension that would hide the furigana and then show it when you clicked on it. If you're interested, I could dig that up.
Ultimately, I got so sick of splicing together different home-grown tools and web sites that I started to build my own to provide a more integrated learning experience that adapts to how students want to learn.
I am working on a Japanese course with a well-known Japanese teacher, and we're hoping to start rolling that out soon. Feel free to DM me if you're interested in that or want to chat about learning Japanese.
That sounds like a cool project! I started off with NHK Easy but recently moved on to normal NHK news. NHK Easy is such a fantastic resource. And having something that shows furigana only when you click on the text sounds great! Right now I'm using Migaku for all my text highlighting / tooltip needs, which is working out great!
I'm currently working on a German vocabulary learning app, https://vokabeln.io/, which uses LLMs in a similar way. The app allows you to paste text and extracts the vocabulary to learn. The vocabulary is then repeated with spaced repetition, audio is generated, and users can generate infinitely more examples.
I might have made something that works only for me, as getting users seems to be extremely difficult, but I enjoy it much more than anything else I've tried.
It's quite difficult to get attention just off flashcards. Anecdotally, my flashcard app gets 1/10 the traffic/interest of my "learn by reading native texts" app, which integrates with the flashcard app but also lets people use their preference. A lot of people like to avoid using more than one flashcard app and might already have settled on one, but may be open to other learning techniques besides flashcards, or ones that can extend their existing flashcard system/data.
It's also hard to charge money just for flashcards when there are so many free options and the paid options can be pretty good already.
I think a big issue on my part is marketing and UX improvements. It can do much more than "just flashcards" apps.
For example, I use it to learn from YouTube videos by copying the transcript into the app, which then processes the words and creates flashcards for the ones I don't already know.
I'm thinking of integrating YouTube transcripts, song lyrics, and open books such as the Grimms' stories directly into it. Should not be that much work. But I needed to release instead of infinitely adding features to my dev build.
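For anyone curious what the "flashcards only for words I don't know yet" step could look like, here is a minimal Python sketch. It is not how the app actually works: the function name `new_words` and the sample sentence are invented for illustration, and it assumes a plain set of already-known words with no lemmatization. Lowercasing also throws away German noun capitalization, which a real tool would want to keep.

```python
import re


def new_words(transcript: str, known: set[str]) -> list[str]:
    """Return words not in `known`, lowercased, in order of first appearance."""
    seen: set[str] = set()
    result: list[str] = []
    # [^\W\d_]+ matches runs of letters, including umlauts and ß.
    for word in re.findall(r"[^\W\d_]+", transcript.lower()):
        if word not in known and word not in seen:
            seen.add(word)
            result.append(word)
    return result


# Hypothetical example: "der" and "hund" are already in the learner's list.
cards = new_words("Der Hund läuft schnell. Der Hund bellt.", {"der", "hund"})
print(cards)
```

Each entry in `cards` would then become one new flashcard. Inflected forms such as "läuft" would still need to be mapped back to their dictionary form, which this sketch does not attempt.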
I see those features but they look like they’re positioned as flashcard app features. Think about uses of those materials beyond them ending up as flashcards. That’s what I’m trying to do with Manabi Reader anyhow, still early
I can't wait to get access to ChatGPT with voice so I can practice my Spanish. I suspect that the utility of vanilla ChatGPT for language learning will soon obviate the need for any custom apps.
I've been thinking about a layer on top of these chat apps that augments the experience for learning in particular. I've found that once you dig into the problem, a key element to solve for is reinforcement. The "provide information" part is easy; now how do you make it stick, and how do you keep it going over an extended period of time?
Not a fan of using an AI voice for language learning. The melody and rhythm are off. If you're learning a language, hearing native speakers is important.
[1] https://webbu.app/l/spanish/story/los-%C3%B1oquis-de-la-abue...
[2] https://www.pastasgallo.es/productos/noquis-de-patata-seca/