format equation a bit more

qualiaMachine · web-flow · commit bf9f78cb946e · 2024-04-15T19:21:07.000-05:00
diff --git a/_episodes/05-tf-idf-documentEmbeddings.md b/_episodes/05-tf-idf-documentEmbeddings.md
@@ -57,18 +57,18 @@ One method for constructing more advanced word embeddings is a model called TF-I
 
 TF-IDF stands for term frequency-inverse document frequency and can be calculated for each document, *d*, and term, *t*, in a corpus. The calculation consists of two parts: term frequency and inverse document frequency. We multiply the two terms to get the TF-IDF value.
 
-**Term frequency(t,d)** is a measure for how frequently a term, *t*, occurs in a document, *d*. The simplest way to calculate term frequency is by simply adding up the number of times a term occurs in a document, and dividing by the total word count in the document.
+**Term frequency(*t*,*d*)** is a measure for how frequently a term, *t*, occurs in a document, *d*. The simplest way to calculate term frequency is by simply adding up the number of times a term occurs in a document, and dividing by the total word count in the document.
 
 **Inverse document frequency** measures a term's importance. Document frequency is the number of documents a term occurs in, so inverse document frequency gives higher scores to words that occur in fewer documents.
 This is represented by the equation:
 
-IDF(t) = ln[(N+1) / (DF(T)+1)]
+IDF(*t*) = ln[(*N*+1) / (DF(*t*)+1)]
 
 where...
-* N represents the total number of documents in the corpus
-* DF(t) represents document frequency for a particular term/word, t. This is the number of documents a term occurs in.
+* *N* represents the total number of documents in the corpus
+* DF(*t*) represents document frequency for a particular term/word, *t*. This is the number of documents a term occurs in.
 
-The key thing to understand is that words that occur in many documents produce smaller IDF values since the denominator grows with DF(x).
+The key thing to understand is that words that occur in many documents produce smaller IDF values since the denominator grows with DF(*t*).
 
 We can also embed documents in vector space using TF-IDF scores rather than simple word counts. This also weakens the impact of stop-words: because they occur in nearly every document, their scores are very low.
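
As a rough illustration of the definitions in this hunk, the following is a minimal sketch in plain Python, assuming documents are already tokenized into lists of words. The function names (`tf`, `idf`, `tf_idf`) and the toy corpus are hypothetical and are not part of the lesson; the IDF formula follows the one written above, and library implementations may use slightly different variants.

```python
import math
from collections import Counter


def tf(term, doc_tokens):
    """Term frequency: count of `term` in the document / total word count."""
    if not doc_tokens:
        return 0.0
    return Counter(doc_tokens)[term] / len(doc_tokens)


def idf(term, corpus_tokens):
    """Inverse document frequency: ln[(N + 1) / (DF(t) + 1)]."""
    n_docs = len(corpus_tokens)  # N: total number of documents
    df = sum(1 for doc in corpus_tokens if term in doc)  # DF(t)
    return math.log((n_docs + 1) / (df + 1))


def tf_idf(term, doc_tokens, corpus_tokens):
    """TF-IDF is the product of the two parts."""
    return tf(term, doc_tokens) * idf(term, corpus_tokens)


# Hypothetical toy corpus, used only for illustration.
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]

for term in ["the", "cat", "bird"]:
    print(term, round(idf(term, corpus), 4))
```

In this toy corpus, "the" appears in every document, so its IDF is ln(4/4) = 0 and its TF-IDF score is 0 in every document, which is the stop-word effect described above. "bird" appears in only one document and receives the largest IDF of the three terms.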