I am a faithful user of flashcards for studying Chinese words, with Anki as my software of choice for handling the spaced repetition scheduling. Even though I try to clear my queue every day, there are still days when I feel like I’m swimming against the tide. If I look at my forecast of upcoming cards, the number of cards due each day quickly drops to a low baseline after a week or so. Yet I never seem to reach the level that Anki’s forecast graph promises me. Then there are other days when I get weary of the constant drilling and skip a few days. When I come back to study, I have a large queue of overdue cards waiting for me (as expected). However, once those cards are cleared, Anki’s forecast of future cards is surprisingly good, maybe even better than if I hadn’t skipped those days. Am I being punished for my diligence? Is this just my perception of the flashcard experience, or am I encountering something tangible related to SRS scheduling?

To test various theories, I created a simulation of Anki’s SRS scheduling. Spaced repetition is a system for scheduling reviews of individual facts so that each fact is reviewed near the point where it is about to be forgotten. Facts that are recalled correctly at that point are reinforced in memory, and they are then rescheduled at increasingly long intervals.

Anki uses a modified version of the SM-2 algorithm historically used in the SuperMemo SRS software. For each card presented, the user rates their recall on a scale from 1 to 4, where 1 means forgotten and 2-4 indicate increasing confidence (“hard”, “good”, or “easy”) in recalling the card. New cards answered 2 or 3 (4 is unavailable for new cards) are rescheduled on a preconfigured initial schedule, defaulting to 1 or 4 days respectively. Reviewed cards answered 2-4 are rescheduled exponentially farther into the future. For example, a new card answered 3 (easy, for a new card) and then answered 4 (easy, for a reviewed card) on each of the next four reviews will be repeated 4 days after the initial encounter, then 10 days later, then 26, then 67, and finally 174 days later. A reviewed card answered 1 (a “lapsed” card) has its review interval reset almost as if it were a new card. For example, if the card above were forgotten on the sixth review, the interval would be reset to 4 days at most, even though the same card had once been remembered across an interval of 174 days.
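The interval growth in the example above can be mimicked with a toy model. This is a minimal sketch, not Anki’s scheduler: it assumes a hypothetical constant multiplier of 2.6 with truncation to whole days (which happens to reproduce the 4, 10, 26, 67, 174 sequence) and a hypothetical fixed reset to 4 days on a lapse. Anki’s real calculation also depends on each card’s ease factor, an easy bonus, and how late the review was, and this toy treats every passing answer the same.

```python
# Toy model of SM-2-style interval growth, NOT Anki's exact scheduler.
# Assumptions (hypothetical): every passing answer multiplies the interval
# by a constant EASY_MULTIPLIER, and a lapse resets to INITIAL_INTERVAL.
INITIAL_INTERVAL = 4    # days after a new card is answered 3
EASY_MULTIPLIER = 2.6   # assumed constant growth factor for passing answers


def next_interval(interval: int, answer: int) -> int:
    """Return the next interval in days for a reviewed card."""
    if answer == 1:
        # Lapse: start over near the beginning of the schedule.
        return INITIAL_INTERVAL
    # Any passing answer (2-4) grows the interval, truncated to whole days.
    return int(interval * EASY_MULTIPLIER)


def schedule(answers: list[int]) -> list[int]:
    """Intervals produced by a new card followed by a sequence of review answers."""
    intervals = [INITIAL_INTERVAL]
    for answer in answers:
        intervals.append(next_interval(intervals[-1], answer))
    return intervals


print(schedule([4, 4, 4, 4]))     # [4, 10, 26, 67, 174]
print(schedule([4, 4, 4, 4, 1]))  # [4, 10, 26, 67, 174, 4]
```

Keeping the multiplier as a named constant makes it easy to see how sensitive the long intervals are to it, since any change compounds over every successful review.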

There are some nuances missing from the simplified description above of Anki’s SRS scheduling formula. The great thing about Anki is that its source code is readily available, which makes it easy to see the method itself in code and to write a quick Python program that replicates it. My implementation uses object-oriented classes representing a card and a deck containing cards. The Card class keeps track of every review of the card and handles rescheduling based on previous answers and the current ease. The Deck class loops through all the Cards due in the simulated session and recalculates each card’s next interval from that card’s own parameters.
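The class structure described above might look roughly like the following. This is a hypothetical sketch, not the author’s published code: the names `Card`, `Deck`, `review`, and `study_day`, the starting ease of 2.5, and the simplified rescheduling rule are all assumptions standing in for Anki’s full formula.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Card:
    """A flashcard that remembers its own review history."""
    interval: int = 0     # current interval in days; 0 means a new card
    due: int = 0          # simulated day on which the card is next due
    ease: float = 2.5     # hypothetical ease factor used as the growth rate
    history: list[tuple[int, int]] = field(default_factory=list)

    def review(self, level: int, today: int) -> None:
        """Log an answer (1-4) and reschedule with a simplified rule."""
        self.history.append((today, level))
        if level == 1:
            self.interval = 1                        # lapse: relearn tomorrow
        elif self.interval == 0:
            self.interval = 4 if level >= 3 else 1   # new card graduates
        else:
            # Grow by the ease factor, always by at least one day.
            self.interval = max(self.interval + 1, int(self.interval * self.ease))
        self.due = today + self.interval


@dataclass
class Deck:
    """A collection of cards studied one simulated day at a time."""
    cards: list[Card]

    def study_day(self, today: int, answer_for: Callable[[Card], int]) -> int:
        """Review every card due today; return how many were reviewed."""
        due_cards = [card for card in self.cards if card.due <= today]
        for card in due_cards:
            card.review(answer_for(card), today)
        return len(due_cards)


# Usage: three new cards, always answered 3, simulated over ten days.
deck = Deck([Card() for _ in range(3)])
workload = [deck.study_day(day, lambda card: 3) for day in range(10)]
print(workload)  # [3, 0, 0, 0, 3, 0, 0, 0, 0, 0]
```

Passing the answer function into `study_day` keeps the deck independent of how answers are generated, so a random answer model can be swapped in without touching the scheduling code.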

The initial data to seed the cards in the simulation came straight from my Anki deck. An Anki collection file is simply a SQLite database, so importing the current review interval and last recall level for each card took only a straightforward SQL query. I simulated the random answering of cards by building a probability distribution over recall levels 1-4 as a function of the card’s previous recall level. To do this, I ran SQL queries on the review history of my Anki deck to count how often each answer followed each previous level. The results are shown in the table below.
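The history query might look something like this. It is a sketch that assumes Anki’s `revlog` table, where `id` is the review timestamp in milliseconds, `cid` is the card id, and `ease` is the button pressed; the real table has more columns, including `type`, which distinguishes learning, review, and relearning entries and would be needed to separate new-card answers from lapses more faithfully. To stay runnable, the example builds a small in-memory table with made-up toy rows instead of opening a real collection file, and the function name `transition_probabilities` is hypothetical.

```python
import sqlite3
from collections import Counter, defaultdict


def transition_probabilities(conn: sqlite3.Connection) -> dict:
    """Estimate P(next answer | previous answer) from a revlog-style table.

    Assumes a table revlog(id, cid, ease). The first review of each card
    is treated as coming from the "new" state.
    """
    counts: dict = defaultdict(Counter)
    previous: dict = {}
    # Ordering by card, then timestamp, puts each card's reviews in sequence.
    rows = conn.execute("SELECT cid, ease FROM revlog ORDER BY cid, id")
    for card_id, ease in rows:
        state = previous.get(card_id, "new")
        counts[state][ease] += 1
        previous[card_id] = ease
    # Normalize each row of counts into probabilities.
    return {
        state: {ease: n / sum(c.values()) for ease, n in sorted(c.items())}
        for state, c in counts.items()
    }


# Toy data standing in for a real collection (simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revlog (id INTEGER PRIMARY KEY, cid INTEGER, ease INTEGER)")
conn.executemany("INSERT INTO revlog VALUES (?, ?, ?)", [
    (1, 100, 1), (2, 100, 2), (3, 100, 4), (4, 100, 4),  # card 100
    (5, 200, 3), (6, 200, 4), (7, 200, 1), (8, 200, 2),  # card 200
])
print(transition_probabilities(conn))
```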

Probability of transition to new level (rows: previous level; columns: next answer)

Old level     1 (review)   2 (hard)   3 (good)   4 (easy)
1 (new)          0.628       0.133      0.239       N/A
1 (review)       0.148       0.852       N/A        N/A
2 (hard)         0.169       0.155      0.189      0.486
3 (good)         0.199       0.185      0.212      0.404
4 (easy)         0.142       0.111      0.170      0.577
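The table can drive the simulated answers directly. A minimal sketch, assuming the previous level is encoded as `"new"`, `"lapsed"` (the “1 (review)” row), or the integers 2-4; this encoding and the name `sample_answer` are hypothetical, not necessarily what the published scripts use. Python’s `random.choices` performs the weighted draw and normalizes the weights, so the 0.999 total of the Hard row (a rounding artifact) does not matter.

```python
import random

# Transition table from above: previous state -> probability of each answer.
TRANSITIONS = {
    "new":    {1: 0.628, 2: 0.133, 3: 0.239},
    "lapsed": {1: 0.148, 2: 0.852},
    2:        {1: 0.169, 2: 0.155, 3: 0.189, 4: 0.486},
    3:        {1: 0.199, 2: 0.185, 3: 0.212, 4: 0.404},
    4:        {1: 0.142, 2: 0.111, 3: 0.170, 4: 0.577},
}


def sample_answer(state, rng: random.Random) -> int:
    """Draw a simulated answer given the card's previous state."""
    row = TRANSITIONS[state]
    return rng.choices(list(row), weights=list(row.values()), k=1)[0]


rng = random.Random(42)  # fixed seed so runs are reproducible
answers = [sample_answer(3, rng) for _ in range(10_000)]
print(answers.count(1) / len(answers))  # should be close to 0.199
```

Passing an explicit `random.Random` instance, rather than using the module-level functions, lets each experiment be rerun with the same seed.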

There is a distinction in Anki between new cards and reviewed cards. New cards have only 3 levels, and level 1 simply puts the card back on the stack to be shown again a short time later. Levels 2 and 3 set the initial schedule once the new card has been remembered successfully. However, level 3 is only available the first time the card is shown (although this seems to have changed in newer versions of Anki). Reviewed cards move between levels 1-4, with level 1 indicating a lapse in recall. This state is transient, however, because the failed card continues to be reviewed in the same day’s session until it is successfully recalled. The only answers available for lapsed cards are 1 or 2, so every lapsed card ends the session at level 2.
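Because a lapsed card keeps returning within the same session, the number of times it is shown follows a geometric distribution. A minimal sketch, assuming each relearning attempt is independent and passes with the 0.852 probability from the “1 (review)” row of the table; under that assumption the expected number of showings per lapse is 1/0.852, or about 1.17. The name `relearn` is hypothetical.

```python
import random

P_RELEARN_PASS = 0.852  # probability a lapsed card is answered 2 (table above)


def relearn(rng: random.Random) -> int:
    """Show a lapsed card until it is answered 2; return the number of showings."""
    showings = 1
    while rng.random() >= P_RELEARN_PASS:  # answered 1 again: keep reviewing
        showings += 1
    return showings


rng = random.Random(7)
trials = [relearn(rng) for _ in range(100_000)]
print(sum(trials) / len(trials))  # should be close to 1 / 0.852
```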

What you may notice from this probability distribution is that I knew only 37% of new cards the first time I saw them, while the other 63% needed to be learned. That’s a very low success rate, but this was a special vocabulary deck I used for cramming, so the high percentage of unfamiliar cards didn’t surprise me. Also note that the probability of a learned card being forgotten (i.e., lapsing) in this deck is between 14% and 20%, depending on the rating given at the previous review. These numbers are unimpressive but not unreasonable. While a 15% failure rate seems high, consider that a failure rate close to 0% would require either a perfect memory or constant drilling of a large number of facts to ensure nothing is forgotten. SuperMemo allows its target lapse rate to be configured between 3% and 20% (which affects its rescheduling factors), and it suggests a reasonable default of 10%. Note, too, that my chance of forgetting is lower for cards I previously rated Hard (level 2) than for cards I rated Good (level 3), and my chance of rating a card Easy (4) is also higher after Hard than after Good. This suggests that I should have been more critical of my ratings: rating difficult cards as Good scheduled them farther in the future than they should have been, which made me more likely to forget them.
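The percentages quoted above come straight from the table, and a few lines of arithmetic confirm them. The rows are retyped by hand here, and the names `REVIEW_ROWS` and `NEW_ROW` exist only for this illustration.

```python
# Rows of the transition table for reviewed cards (answers 1-4, in order).
REVIEW_ROWS = {
    "hard": [0.169, 0.155, 0.189, 0.486],
    "good": [0.199, 0.185, 0.212, 0.404],
    "easy": [0.142, 0.111, 0.170, 0.577],
}
NEW_ROW = [0.628, 0.133, 0.239]  # answers 1-3 on a card's first showing

known_on_first_sight = NEW_ROW[1] + NEW_ROW[2]          # answered 2 or 3
lapse_rates = {level: row[0] for level, row in REVIEW_ROWS.items()}

print(f"known on first sight: {known_on_first_sight:.1%}")  # 37.2%
print(f"lapse rate range: {min(lapse_rates.values()):.1%} "
      f"to {max(lapse_rates.values()):.1%}")                # 14.2% to 19.9%
print(lapse_rates["hard"] < lapse_rates["good"])            # True
print(REVIEW_ROWS["hard"][3] > REVIEW_ROWS["good"][3])      # True
```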

I have made the Python scripts available for others to try out. If you want to run your own experiments, the included sample programs show some ways the simulation can be used. I continue to find ways to improve the simulation, so it is definitely a work in progress. My next posts will use the simulator to test aspects of Anki that apply to the way I tend to use it. I have already found some interesting results, which I will report on in the coming weeks.