
Commit 16b67bd

replace x with t
1 parent d8db24c commit 16b67bd


1 file changed: +3 -3 lines changed

_episodes/05-tf-idf-documentEmbeddings.md

Lines changed: 3 additions & 3 deletions
@@ -55,14 +55,14 @@ However, rare is a relative term. In a corpus of documents about blue whales, th
 
 One method for constructing more advanced word embeddings is a model called TF-IDF.
 
-TF-IDF stands for term frequency-inverse document frequency. The model consists of two parts: term frequency and inverse document frequency. We multiply the two terms to get the TF-IDF value.
+TF-IDF stands for term frequency-inverse document frequency and can be calculated for each document, *d*, and term, *t*, in a corpus. The calculation consists of two parts: term frequency and inverse document frequency. We multiply the two terms to get the TF-IDF value.
 
-**Term frequency(t,d)** is a measure how frequently a term, *t*, occurs in a document, *d*. The simplest way to calculate term frequency is by simply adding up the number of times a term occurs in a document, and dividing by the total word count in the document.
+**Term frequency(t,d)** is a measure for how frequently a term, *t*, occurs in a document, *d*. The simplest way to calculate term frequency is by simply adding up the number of times a term occurs in a document, and dividing by the total word count in the document.
 
 **Inverse document frequency** measures a term's importance. Document frequency is the number of documents, *N*, a term occurs in, so inverse document frequency gives higher scores to words that occur in fewer documents.
 This is represented by the equation:
 
-IDF(x) = ln[(N+1) / (DF(T)+1)]
+IDF(t) = ln[(N+1) / (DF(T)+1)]
 
 where...
 * N represents the total number of documents in the corpus
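The definitions in the updated lesson text can be checked with a short Python sketch. It follows the text: term frequency is the term's count divided by the document's word count, and IDF(t) = ln[(N+1) / (DF(t)+1)], with N the total number of documents and DF(t) read as the number of documents that contain t (the diff still writes DF(T)). The whitespace tokenization, the toy corpus, and the helper names `term_frequency`, `inverse_document_frequency` and `tf_idf` are hypothetical illustrations, not part of the lesson.

```python
import math
from collections import Counter


def term_frequency(term, doc_tokens):
    # TF(t, d): occurrences of the term divided by the document's total word count
    if not doc_tokens:
        return 0.0
    return Counter(doc_tokens)[term] / len(doc_tokens)


def inverse_document_frequency(term, corpus_tokens):
    # IDF(t) = ln[(N + 1) / (DF(t) + 1)], where N is the number of documents
    # and DF(t) is the number of documents containing the term at least once
    n_docs = len(corpus_tokens)
    doc_freq = sum(1 for doc in corpus_tokens if term in doc)
    return math.log((n_docs + 1) / (doc_freq + 1))


def tf_idf(term, doc_tokens, corpus_tokens):
    # TF-IDF(t, d) is the product of the two parts
    return term_frequency(term, doc_tokens) * inverse_document_frequency(term, corpus_tokens)


# Hypothetical toy corpus, tokenized by splitting on whitespace
corpus = [
    "the blue whale is the largest animal".split(),
    "the whale sings".split(),
    "krill feed the blue whale".split(),
]

for term in ["the", "whale", "krill"]:
    print(term, round(tf_idf(term, corpus[2], corpus), 4))
```

With this exact formula, a term found in every document (here "the" and "whale") gets IDF = ln(4/4) = 0, so its TF-IDF is 0, while "krill" appears in one of three documents and scores 0.2 × ln(2) ≈ 0.139. Library implementations may use a different variant; for example, scikit-learn's `TfidfVectorizer` with its default `smooth_idf=True` adds 1 to this logarithm, so its numbers will not match the sketch.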