The problem is that in life, we are accustomed to things becoming easier as we get better at them. So you start drawing faces and it starts out feeling very difficult, but then as you practice more and more, it feels easier and easier. Of course, by the time it's feeling easy, it means that you're no longer actually getting effective practice. But nevertheless, it's the feeling that we are accustomed to. It's how we know we're getting better.
Because spaced repetition is so good at always giving you things that you will find difficult, it doesn't actually feel like you're getting better overall even though you are. The things that you are good at are hidden from you and the things that you are bad at are shown to you. The result is a constant level of difficulty, rather than the traditional decreasing level of difficulty.
I've encountered this problem myself. I built a language learning app for fun, and some of my users feel like they're not learning very much compared to alternatives that don't use spaced repetition. In fact, it's the exact opposite. They learn much more quickly with mine, but they don't have that satisfying feeling of the lessons becoming easy. (Because if I gave them easy challenges, it wouldn't be as productive!)
I'm not sure what the best way to solve this problem is. I would much appreciate any advice.
People using the language learning app could try scheduling monthly video chats with native speakers (swapping roles halfway through so it's mutually beneficial) and noticing their proficiency improve.
It's hard to get around without marrying the SRS to something like a hierarchical skill tree whose traversal you can be made aware of, or some other visible progress metric (e.g., climbing the Elo of encountered puzzles in a chess training engine).
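To sketch what such a metric could look like: even without a skill tree, you can surface deck maturity from data the SRS already has. A minimal sketch in Python, where the Card shape is hypothetical and the 21-day "mature" threshold is borrowed from Anki's convention:

```python
from dataclasses import dataclass

@dataclass
class Card:
    interval_days: float  # the card's current scheduling interval

def progress_score(cards: list[Card], mature_at: float = 21.0) -> dict:
    # A card whose interval has grown past the threshold counts as
    # "mature" -- knowledge the scheduler now trusts you with long-term.
    mature = sum(1 for c in cards if c.interval_days >= mature_at)
    return {
        "total": len(cards),
        "mature": mature,
        "mature_pct": round(100.0 * mature / len(cards), 1) if cards else 0.0,
    }

deck = [Card(1.0), Card(35.0), Card(90.0), Card(4.0)]
print(progress_score(deck))  # {'total': 4, 'mature': 2, 'mature_pct': 50.0}
```

Plotting the mature count over time gives users the "number goes up" feeling that the reviews themselves deny them.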
Still: users have to get comfortable with being uncomfortable if they want to profit from these sorts of systems.
A different issue with SRS's laser accuracy is the Pareto tradeoff between efficiency and robustness.
I’ve always had difficulty remembering vocabulary. I remember cramming German in school 30 years back. We had 20 words to learn per week, and I could sit a whole night repeating and repeating them just because they wouldn’t stick. And then in the morning they were all gone anyway. So I gather I am a bad language learner.
In your algorithm, do you assume everyone’s recall is the same, or do you optimize for a recall rate which makes everyone fail a certain percentage of the words? If so, knowing that I am supposed to not remember 70% would be a good reminder in the app to not feel bad.
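For what it's worth, most modern schedulers do the latter: they target a fixed retention rate (often around 90%) for everyone, so a steady trickle of failures is by design, not a sign of being a bad learner. A minimal sketch of the idea, assuming a textbook exponential forgetting curve R(t) = exp(-t/S) rather than any particular app's model:

```python
import math

def next_interval(stability_days: float, target_retention: float = 0.9) -> float:
    # Schedule the next review for the moment predicted recall decays
    # to the target. Solving exp(-t/S) = r for t gives t = -S * ln(r).
    return -stability_days * math.log(target_retention)

# Same memory strength, different targets: lower retention = longer gaps.
print(round(next_interval(10.0, 0.9), 2))  # 1.05 days
print(round(next_interval(10.0, 0.7), 2))  # 3.57 days
```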
You want your users to be like weight lifters. No lifter comes out of the gym saying, “Man, that was the best workout, felt so easy.” To the contrary, lifters use progressive overload to induce difficulty, because that difficulty connects to the results they want.
For your users, you need some way to measure the outcome, so that you can show them, “hey, look, that mild discomfort led to more progress on what you care about,” and then you need to consistently message that some difficulty is good.
Mindset change takes consistency and time; it won’t happen overnight. You’ll know you succeeded when students become aware that “hey, I’m not learning as well if it doesn’t feel difficult,” and then react by increasing the challenge.
Serious answer: All the dark patterns.
Loot boxes, even if they only give users a digital hat; a small animated bird (like the green one, but not) doing a silly dance when users get enough correct answers; some weird phrases sprinkled amongst the lessons that make the users laugh.
Just, please let them have an off switch for people like us :)
Failing to do that, one might consider focusing not on how hard each card feels, but rather on the size of the corpus that they have "under their belt". This works if you're constantly adding new cards: if you are, and your cards/day is stable, then you have an ever-increasing mound of memorized knowledge.
If you aren't adding new cards, then the cards/day will inevitably go down, barring some actual cognitive issue.
It's a matter of what you focus on as your measure of success.
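Back-of-the-envelope, that mound grows fast. A toy calculation, where the flat long-run retention figure is an assumption for illustration:

```python
def corpus_size(days: int, new_per_day: int, long_run_retention: float = 0.9) -> int:
    # Steady habit, steady trickle of new cards; assume a fixed share of
    # everything studied stays retrievable (ignores lapses and relearning).
    return int(days * new_per_day * long_run_retention)

print(corpus_size(365, 10))  # 3285 items retained after a year at 10 new/day
```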
I want to find a way to show people how much they're learning. And maybe make it more fun. But what I really want to do is change their mindset from "wow, that was easy, I'm learning so much" to "wow, that was hard, I'm learning so much".
You fail miserably at the test at the start of each chapter, and crush it at the end.
The difficult part is deciding how the tests are spread out.
And for what it’s worth, I’ve been able to communicate enough basics in Mandarin with my wife’s family that they’re thrilled with me. So the learning in P is working somewhat.
IMO the way around users feeling like spaced repetition isn't progression is to redefine progression away from memorizing vocabulary and toward becoming proficient in conversation, both listening and speaking. If spaced-repetition vocab is just one feature of a holistic experience, users will judge their progression holistically.
I'm really waiting for that one app that finally connects ChatGPT Advanced Voice Mode or Gemini Live to a context-aware and well trained language tutor. I can already have impromptu practice sessions with both in Mandarin and English but they quickly lose the plot regarding their role as a tutor. I'd love to have them available as part of a learning journey. I can study vocab and flash cards all day but the second that voice starts speaking sentences and I need to understand in real time, I freeze up. The real progress is conversing!
Here’s what I typically do:
- Create a custom GPT (mine is called Polly the Glot) with a system prompt instructing it to act as a language partner that responds only in Chinese or your target language of choice. Further specify that the user will paste a story or topic before beginning practice, and that this should guide the discussion.
- Start a new chat.
- Paste in an article from AP/Reuters.
- Turn on Voice Chat.
At that point, I’ll head out to walk my dog and can usually get about 30 minutes to an hour of solid language practice in.
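For the curious, the system prompt doesn't need to be elaborate. Something along these lines works (illustrative wording, not the exact prompt behind Polly the Glot):

```
You are a patient conversation partner for a language learner.
Respond ONLY in Mandarin Chinese, never in English, unless the user
explicitly asks you to clarify a word. Before practice begins, the
user will paste a story or article; treat it as the topic that guides
the discussion. Ask one question at a time, keep replies short, and
briefly restate what the user said with corrected grammar before
moving on.
```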
Fair warning: you'll likely need to be at least an intermediate student by this point; otherwise it'll probably be over your head.
Caveat: you could include a markdown file of your known vocabulary as a knowledge attachment in the custom GPT, but I've no idea how well that would work in practice.
What helped me a lot was doing a lot of listening exercises. Start by concentrating on what you can recognize, not on what you can't. Then listen again and again and again, trying to recognize more and more.
But there's something about the "conversation" between a real human or an AI voice mode where you're not on the rails. It's real time and you have to lock in and understand. That's where the magic happens!
It seems to use a two-decade-old modification of a now four-decade-old algorithm, which will be worse and waste more of the user's time than using Anki with FSRS or SuperMemo with SM-18.
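For reference, the four-decade-old algorithm being alluded to is SM-2 (Wozniak, 1987), which most of its modern descendants still resemble. A minimal rendering of the classic version (variants differ in details, e.g. whether ease also drops on a failed card):

```python
def sm2_review(quality: int, reps: int, ease: float, interval: float):
    # quality: self-graded recall, 0 (total blackout) .. 5 (perfect)
    if quality < 3:
        return 0, ease, 1.0          # failed: start relearning tomorrow
    if reps == 0:
        interval = 1.0               # first success: see it again in 1 day
    elif reps == 1:
        interval = 6.0               # second success: 6 days
    else:
        interval = interval * ease   # after that, intervals grow geometrically
    # The ease factor drifts with answer quality, floored at 1.3.
    ease = max(1.3, ease + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return reps + 1, ease, interval
```

FSRS replaces this fixed ease arithmetic with a memory model fitted to your actual review history, which is where the efficiency gains come from.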
My use case is drilling English --> target-language sentences, as well as law-related knowledge, miscellaneous facts, etc. Still mulling over what to do about other skills-based practices, à la Andy Matuschak's concept of "spaced everything".
- Hashcards: https://github.com/eudoxia0/hashcards
- HN discussion about Hashcards: https://hn.algolia.com/?q=hashcards
- Matuschak's "spaced everything": https://notes.andymatuschak.org/Spaced_everything
Sentence practice is really the best way to do things imo. Studying vocabulary in isolation is so limited by comparison. So nice moves there.
[1]: Actually, my thing does target-language --> English drills, not the other way around.
Very similar to Anki but with a sane UI.
We keep forgetting stuff, but we can remember it better by actively recalling it. And there is evidence that recalling at intervals that grow is close to optimal. That’s it, really. Everything else is tooling on top of that simple fact.
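The whole idea fits in a few lines. A sketch where the doubling factor is purely illustrative (real schedulers tune the growth per card):

```python
interval, day = 1.0, 0.0
for review in range(1, 7):
    day += interval
    print(f"review {review} on day {day:g}")
    interval *= 2  # each successful recall roughly doubles the gap
# -> days 1, 3, 7, 15, 31, 63: six reviews cover two months of retention
```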
You fail the whole day.
Don't have the feeling anything sticks.
Then, the next day it works right from the start.
No new insights, nothing, it just works.
Not sure if it's factually correct, but it seems about right: sleep seems to be the magic sauce, the time when all memories are written from RAM to disk.
It seems to be a thing with practicing motion sequences.
Sure, you can sort of use SRS here, but it's suboptimal: it will probably leave too many cards in the top-priority "learning" pile, causing too much load, or you'll train incorrectly.
Still, I agree that this is MUCH better than NOT doing SRS if you don't have an alternate tool with a better algorithm.
1. It doesn't train real task performance. There is a spectrum of problems that people solve. On one end is the recall of randomized facts in a flashcard prompt->answer way. On the other end is task performance, which can be thought of more formally as finding a path through a state space to reach some goal. The prompt->answer end is what SR systems relentlessly drill you on.
2. SR is pretty costly, and prompt->answer problems are also low value. If you think about real-world scenarios, it's unlikely that you will come across a specific prompt->answer question, and if you do, the cost of looking it up is usually low.
3. The structure of the knowledge stored is very different (and worse). If you think about high performance on a real-world task like programming or theorem proving, you don't solve it by recalling lists of facts. It involves a lot of state-space exploration, utilising principles of the game, leveraging known theorems, and so on.
This is a more descriptive version of the "rote memorization" argument. There are two common counters to it:
1. Learning is memorization. This is strictly true, but the prompt->answer way of learning is a specific kind of memorization. There's a correlation-causation fallacy here: high performers trained in other ways can answer prompts really well, but it doesn't follow that answering prompts really well will make you a high performer.
2. Memorization is a part of high performance, and SR is the optimal way to do it. This is generally true, but the memorization component is often quite small.
These ideas more accurately predict how SR is only significantly better in specific cases where the value of prompt->answer recall is really high. This is a function of both the cost of failing to remember and the structure of the knowledge. So in medical exams, where you can't look things up and much of the testing takes the prompt->answer form, SR finds a lot of use.
My own guess for what the next generation of learning systems, the ones an order of magnitude more powerful, will look like is this:
1. Domain specific. You won't have a general system you chuck everything into. Instead you will have systems built differently for each task, but on similar principles (explained below).
2. Computation instead of recall. The fundamental unit of "work" will shift from recalling the answer to a prompt to making a move in some state space: taking a step in a proof, making a move in chess, writing a function, etc. (a toy sketch follows this list).
3. Optimise for first-principles understanding of the state space. A state space is a massive, often exponential tree. Human minds could not realistically solve anything in it if not for our ability to find principles and generalise them to huge swaths of the state space. This is closely related to meta-cognition: you want to be thinking about how you solve as much as solving specific instances of a task.
4. Engineered for state-space exploration. A huge and underdeveloped ability of machines is to help humans track massive state-space explorations, and to evaluate results and feed them back to the user. The most commonly used form of this today is git + testing suites. A future learning system could have a git-like system to keep track of multiple branches that are possible solutions, and the UX features to evaluate the results of each branch.
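To make point 2 concrete, here's a toy sketch of what a move-based drill might look like; every name in it is hypothetical, and the grading set would in practice come from an engine, proof checker, or test suite:

```python
from dataclasses import dataclass

@dataclass
class MoveDrill:
    # The unit of work is a move in a state space, not a recalled answer.
    state: str            # e.g. a chess position, a proof goal, a failing test
    good_moves: set[str]  # moves an engine or checker accepts from this state

    def grade(self, attempted_move: str) -> bool:
        return attempted_move in self.good_moves

drill = MoveDrill(state="tactics puzzle #1042", good_moves={"Nxe5", "Bxf7+"})
print(drill.grade("Nxe5"))  # True -> schedule this position further out
```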
Whether memorization is useful on its own largely depends on the task.