Copyright (c) 2022 [MaiMemo](https://www.maimemo.com/), Inc. MIT License.

Stochastic-Shortest-Path-Minimize-Memorization-Cost (SSP-MMC) is a spaced repetition scheduling algorithm used to help learners remember more words in MaiMemo, a language learning application in China.

This repository contains a public release of the data and code used for several experiments in the following paper (which introduces SSP-MMC):
> Citation pending: waiting for the result of SIGKDD 2022.
# Software
The file `data_preprocessing.py` preprocesses the data for the DHP model.

The file `cal_model_param.py` contains the DHP model and the HLR model.

The file `model/utils.py` stores the parameters of the DHP model for training and simulation.

The file `algo/main.cpp` contains a C++ implementation of SSP-MMC, which aims to find the optimal policy.

The file `simulator.py` provides an environment for comparing different scheduling algorithms.
## Workflow
1. Run `data_preprocessing.py` -> `halflife_for_fit.tsv`
2. Run `cal_model_param.py` -> `intercept_` and `coef_` for the DHP model
3. Copy the fitted parameters into the functions `cal_recall_halflife` and `cal_forget_halflife` in `model/utils.py` and the function `cal_next_recall_halflife` in `algo/main.cpp`
4. Compile and run `algo/main.cpp` -> optimal policy in `algo/result/`
5. Run `simulator.py` to compare SSP-MMC with several baselines.
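
The steps above can be sketched as a small driver script. This is only an illustration, not part of the repository: the function names `build_pipeline` and `run_pipeline`, the `g++ -O2` compiler invocation, the output binary path `algo/main`, and the assumption that every command is run from the repository root are all hypothetical. Step 3 (copying the fitted parameters into the source files) remains a manual step and is not automated here.

```python
import subprocess
import sys


def build_pipeline(python=sys.executable, cxx="g++"):
    """Return the workflow as (description, argv) pairs.

    The compiler flags and the binary path are assumptions made for this
    sketch; the repository may build algo/main.cpp differently.
    """
    return [
        ("step 1: preprocess logs", [python, "data_preprocessing.py"]),
        ("step 2: fit model parameters", [python, "cal_model_param.py"]),
        # Step 3 is manual: copy the fitted parameters into model/utils.py
        # and algo/main.cpp before continuing.
        ("step 4a: compile the solver", [cxx, "-O2", "-o", "algo/main", "algo/main.cpp"]),
        ("step 4b: compute the policy", ["./algo/main"]),
        ("step 5: compare schedulers", [python, "simulator.py"]),
    ]


def run_pipeline(steps, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for description, argv in steps:
        print(f"{description}: {' '.join(argv)}")
        if not dry_run:
            # check=True stops the pipeline at the first failing command.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_pipeline(build_pipeline())
```

By default the script only prints the commands; calling `run_pipeline(build_pipeline(), dry_run=False)` would execute them in order.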
## Data Set and Format
The dataset is available on [Dataverse](https://doi.org/10.7910/DVN/VAGUL0) (1.6 GB). It is a 7-Zip-compressed TSV file containing the 220 million MaiMemo student memory behavior logs used in our experiments.

The columns are as follows:

- `u` - student user ID who reviewed the word (anonymized)
- `w` - spelling of the word
- `i` - total number of times the user has reviewed the word
- `d` - difficulty of the word
- `t_history` - interval sequence of the historic reviews
- `r_history` - recall sequence of the historic reviews
- `delta_t` - time elapsed since the last review
- `r` - result of the review
- `p_recall` - probability of recall
- `total_cnt` - number of users with the same memory behavior
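
As a rough illustration of the column layout above, the following sketch parses one row with Python's standard `csv` module. Several details are assumptions rather than facts from the dataset: that the file has a header row, that `t_history` and `r_history` are comma-separated strings, and the numeric types chosen for each column. The sample row is made up, and `ReviewLog`, `parse_row`, and `read_logs` are hypothetical names.

```python
import csv
import io
from dataclasses import dataclass

COLUMNS = ["u", "w", "i", "d", "t_history", "r_history",
           "delta_t", "r", "p_recall", "total_cnt"]


@dataclass
class ReviewLog:
    u: str            # anonymized user ID
    w: str            # spelling of the word
    i: int            # number of reviews so far
    d: int            # word difficulty (integer type is an assumption)
    t_history: list   # past intervals
    r_history: list   # past recall results
    delta_t: int      # time since the last review (unit not stated here)
    r: int            # result of this review
    p_recall: float   # probability of recall
    total_cnt: int    # users sharing this memory behavior


def parse_row(row):
    """Convert one csv.DictReader row into a ReviewLog.

    Assumes the history columns are comma-separated integers.
    """
    return ReviewLog(
        u=row["u"],
        w=row["w"],
        i=int(row["i"]),
        d=int(row["d"]),
        t_history=[int(x) for x in row["t_history"].split(",")],
        r_history=[int(x) for x in row["r_history"].split(",")],
        delta_t=int(row["delta_t"]),
        r=int(row["r"]),
        p_recall=float(row["p_recall"]),
        total_cnt=int(row["total_cnt"]),
    )


def read_logs(stream):
    """Yield ReviewLog records from a tab-separated text stream."""
    for row in csv.DictReader(stream, delimiter="\t"):
        yield parse_row(row)


# Made-up sample data, only to show the expected shape of a row.
SAMPLE = ("\t".join(COLUMNS) + "\n"
          + "\t".join(["u001", "apple", "3", "5", "0,1,3", "0,1,1",
                       "7", "1", "0.8", "10"]) + "\n")

if __name__ == "__main__":
    for log in read_logs(io.StringIO(SAMPLE)):
        print(log)
```

For the real file, the same `read_logs` function could be given an open file handle after the archive is extracted, since it reads rows lazily instead of loading all 220 million records into memory.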