Update Gutenberg.qmd

qualiaMachine · web-flow · commit dcf23913e25e · 2025-03-25T20:41:31.000-05:00
diff --git a/Toolbox/Data/Gutenberg.qmd b/Toolbox/Data/Gutenberg.qmd
@@ -29,7 +29,7 @@ The [Project Gutenberg](https://www.gutenberg.org/) dataset contains text from t
 - **Long-form text**: The dataset includes full-length novels, short stories, and essays, making it ideal for tasks that require understanding context over longer sequences of text.
 
 #### Key applications 
-- **Language modeling**: With its vast variety of literary styles and genres, Gutenberg serves as a valuable resource for training and evaluating language models like [GPT](https://openai.com/research/gpt-3) and [BERT](https://arxiv.org/abs/1810.04805). Pre-training on Gutenberg’s diverse text corpus allows models to capture nuanced linguistic patterns, which can later be fine-tuned for more specific NLP tasks.
+- **Language modeling**: With its vast variety of literary styles and genres, Gutenberg serves as a valuable resource for training and evaluating language models like [GPT](https://openai.com/research/) and [BERT](https://arxiv.org/abs/1810.04805). Pre-training on Gutenberg’s diverse text corpus allows models to capture nuanced linguistic patterns, which can later be fine-tuned for more specific NLP tasks.
 - **Text classification**: The dataset can be applied to classification tasks such as genre classification or sentiment analysis. Researchers often use Gutenberg to train classifiers that distinguish between literary styles or detect emotional tone in texts.
 - **Summarization and translation**: Due to the diversity in content, Gutenberg is commonly used to test summarization models (e.g., creating concise book summaries) and translation algorithms across different literary forms.
 - **Topic modeling**: The diverse collection of texts allows for the exploration of underlying themes or topics through techniques like Latent Dirichlet Allocation (LDA) or Non-negative Matrix Factorization (NMF), enabling researchers to uncover hidden patterns in the literature.