Commit 9acb776

committed
recent edits added to workbench version
1 parent 6508c13 commit 9acb776

File tree

1 file changed: +3 −2 lines changed

episodes/05-tf-idf-documentEmbeddings.md

Lines changed: 3 additions & 2 deletions

````diff
@@ -63,7 +63,7 @@ TF-IDF stands for term frequency-inverse document frequency and can be calculate
 
 **Term frequency(*t*,*d*)** is a measure for how frequently a term, *t*, occurs in a document, *d*. The simplest way to calculate term frequency is by simply adding up the number of times a term occurs in a document, and dividing by the total word count in the document.
 
-**Inverse document frequency** measures a term's importance. Document frequency is the number of documents, *N*, a term occurs in, so inverse document frequency gives higher scores to words that occur in fewer documents.
+**Inverse document frequency** measures a term's importance. Document frequency is the number of documents a term occurs in, so inverse document frequency gives higher scores to words that occur in fewer documents.
 This is represented by the equation:
 
 IDF(*t*) = ln[(*N*\+1) / (DF(*t*)+1)]
````
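The definitions in the hunk above can be sketched directly in Python. This is a minimal illustration of the lesson's formulas, not its actual code; the helper names and the toy corpus are made up for the example. Note the smoothed IDF matches the equation in the diff, IDF(*t*) = ln[(*N*+1) / (DF(*t*)+1)]:

```python
import math

def term_frequency(term, document):
    # Occurrences of the term divided by the total word count in the document.
    words = document.lower().split()
    return words.count(term.lower()) / len(words)

def inverse_document_frequency(term, corpus):
    # Smoothed IDF as in the lesson: IDF(t) = ln[(N+1) / (DF(t)+1)],
    # where N is the number of documents and DF(t) is how many contain the term.
    n_docs = len(corpus)
    df = sum(1 for doc in corpus if term.lower() in doc.lower().split())
    return math.log((n_docs + 1) / (df + 1))

# Toy corpus for illustration only.
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
]

tf = term_frequency("cat", corpus[0])            # 1 occurrence / 6 words
idf = inverse_document_frequency("cat", corpus)  # "cat" appears in 2 of 3 docs
print(tf * idf)  # TF-IDF score for "cat" in the first document
```

A word appearing in every document gets an IDF near ln(1) = 0, which is why common words score low even when their term frequency is high.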
````diff
@@ -83,7 +83,8 @@ Now that we've seen how TF-IDF works, let's put it into practice.
 
 Earlier, we preprocessed our data to lemmatize each file in our corpus, then saved our results for later.
 
-Let's load our data back in to continue where we left off:
+Let's load our data back in to continue where we left off. First, we'll mount our google drive to get access to our data folder again.
+
 
 ```python
 from pandas import read_csv
````
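The hunk ends just as the lesson reloads its saved results with pandas. A self-contained sketch of that step is below; the Colab-specific drive mount the commit mentions is omitted here, and the file name and columns are stand-ins, not the lesson's actual data:

```python
import csv
import os
import tempfile

from pandas import read_csv

# Write a tiny stand-in for the lemmatized corpus saved earlier
# (the real lesson would read from its mounted data folder instead).
path = os.path.join(tempfile.gettempdir(), "lemmas.csv")
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["filename", "lemmas"])
    writer.writerow(["doc1.txt", "cat sit mat"])

# Load the data back in to continue where we left off.
df = read_csv(path)
print(df.shape)  # one row, two columns
```

In a Colab notebook the preceding step would be `from google.colab import drive; drive.mount('/content/drive')`, after which `read_csv` can point at a path under `/content/drive`.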
