This project offers a deep dive into the NLP and LLM tools and techniques that are most commonly used.
The Fake News Classifier is a machine learning project that identifies and categorizes news articles as either Real or Fake. By leveraging Natural Language Processing (NLP) techniques and supervised learning algorithms, the project preprocesses textual data and predicts its authenticity, achieving 90% accuracy on the test set.
- Pre-processing: NLTK, Regular Expressions, Stemming, Lemmatization, TF-IDF, Bag of Words (BoW), Count Vectorizer
- Models Used:
  - Feature extraction: Porter Stemmer, Count Vectorizer
  - Classification: Multinomial Naïve Bayes
- Dataset: custom or publicly available datasets, with feature extraction capped at 5,000 features
- Performance: achieves 90% accuracy on the test set
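As a concrete illustration of the 5,000-feature cap, here is a minimal sketch of TF-IDF feature extraction with scikit-learn; the two documents are made-up placeholders, not project data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-in documents; the real project feeds in cleaned news articles.
docs = [
    "fake news spreads fast online",
    "official report confirms the budget figures",
]

# max_features=5000 keeps only the 5,000 highest-frequency terms,
# mirroring the feature cap described above.
tfidf = TfidfVectorizer(max_features=5000)
X = tfidf.fit_transform(docs)

print(X.shape)  # (number of documents, vocabulary size)
```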
- Python
- Pandas, NumPy, Matplotlib
- Scikit-learn
- NLTK
Data Preprocessing:
- Cleaned and tokenized the text using Regular Expressions.
- Removed stop words and punctuation.
- Applied Stemming and Lemmatization for word normalization.
- Extracted features using TF-IDF, BoW, and Count Vectorizer.
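The preprocessing steps above can be sketched roughly as follows. The sentences are placeholders, and scikit-learn's built-in English stop-word list stands in for NLTK's (so the snippet runs without downloading NLTK corpora):

```python
import re

from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS

stemmer = PorterStemmer()

def clean_text(text):
    # Keep letters only, lowercase, drop stop words, then stem each word.
    words = re.sub(r"[^a-zA-Z]", " ", text).lower().split()
    return " ".join(stemmer.stem(w) for w in words if w not in ENGLISH_STOP_WORDS)

corpus = [clean_text(t) for t in [
    "Breaking: Scientists discover water on Mars!",
    "You won't believe this one weird trick...",
]]

# Bag-of-Words features, capped at 5,000 terms as in the project setup.
vectorizer = CountVectorizer(max_features=5000)
X = vectorizer.fit_transform(corpus)
```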
Model Training:
- Utilized Porter Stemmer for feature mining.
- Applied Multinomial Naïve Bayes for classification.
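A minimal training sketch with Multinomial Naïve Bayes; the tiny repeated corpus and labels below are hypothetical stand-ins for the real preprocessed dataset:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Toy corpus standing in for the preprocessed articles; 1 = fake, 0 = real.
texts = [
    "aliens built the pyramids overnight",
    "senate passes annual budget bill",
    "miracle cure hidden by doctors",
    "central bank raises interest rates",
] * 25
labels = [1, 0, 1, 0] * 25

X = CountVectorizer(max_features=5000).fit_transform(texts)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=0
)

clf = MultinomialNB(alpha=1.0)  # alpha=1.0 applies Laplace smoothing
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```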
Evaluation:
- Tested on a separate dataset.
- Visualized results with graphs for accuracy, precision, and recall.
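The evaluation metrics themselves can be computed with scikit-learn before plotting; the labels and predictions below are hypothetical examples:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Hypothetical labels on a held-out test set (1 = fake, 0 = real).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)  # rows = actual, columns = predicted
print(cm)
print(accuracy_score(y_true, y_pred))   # 0.75
print(precision_score(y_true, y_pred))  # 0.75
print(recall_score(y_true, y_pred))     # 0.75
```

Each score could then be passed to Matplotlib for the bar charts mentioned above.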
Here are some visual insights into the project's performance.

A classification matrix, also known as a confusion matrix, is a table used to evaluate the performance of a classification model. It compares the labels predicted by the model with the actual labels (true values) from the data, providing a summary of prediction results for binary and multi-class classification problems.
For binary classification, the confusion matrix looks like this:
|                     | Predicted Positive (1) | Predicted Negative (0) |
|---------------------|------------------------|------------------------|
| Actual Positive (1) | True Positive (TP)     | False Negative (FN)    |
| Actual Negative (0) | False Positive (FP)    | True Negative (TN)     |

Where:
- True Positive (TP): the number of instances where the model correctly predicted the positive class.
- False Positive (FP): the number of instances where the model incorrectly predicted the positive class (Type I error).
- True Negative (TN): the number of instances where the model correctly predicted the negative class.
- False Negative (FN): the number of instances where the model incorrectly predicted the negative class (Type II error).

From the confusion matrix, several important performance metrics can be derived, such as accuracy, precision, recall, and F1-score.
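Using the four cells of the matrix, these metrics can be derived by hand; the counts here are arbitrary example values:

```python
# Example counts read off a hypothetical confusion matrix.
TP, FP, TN, FN = 3, 1, 3, 1

accuracy = (TP + TN) / (TP + TN + FP + FN)  # share of all predictions that were correct
precision = TP / (TP + FP)                  # share of predicted positives that were right
recall = TP / (TP + FN)                     # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

print(accuracy, precision, recall, f1)  # 0.75 0.75 0.75 0.75
```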
- Confusion matrix with classification stats for the Passive model.
- Confusion matrix with classification stats for the Multinomial Naive Bayes model.
Distribution of top features extracted from the dataset.
Fake-News-Classifier/
├── data/ # Dataset files
├── notebooks/ # Jupyter notebooks for EDA and development
├── src/ # Python source files
│ ├── preprocessing.py # Data preprocessing code
│ ├── train_model.py # Model training script
│ └── predict.py # Prediction script
├── requirements.txt # Required libraries
├── README.md # Project documentation
└── results/ # Evaluation results and graphs