Commit d10cbde

committed
lung cancer model added
1 parent 19a88ff commit d10cbde

11 files changed

+1383
-0
lines changed
+2
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,2 @@
1+
https://www.kaggle.com/datasets/mysarahmadbhat/lung-cancer/data
2+
LUNG CANCER DATA
+5
@@ -0,0 +1,5 @@
1+
We used EDA to examine the following:
2+
1. Distribution of lung cancer density by age
3+
2. Count plots of "yes" and "no" outcomes across various parameters
4+
3. Bar charts to check whether SMOTE is needed
5+
4. Correlation heatmaps to show the dependence between columns (a value approaching 1 indicates a stronger positive correlation)
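
As a rough illustration of what each heatmap cell represents, the sketch below computes a Pearson correlation coefficient by hand. The toy columns are hypothetical and are not taken from the dataset; the notebook itself presumably relies on a library routine for this.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy columns, not taken from the dataset.
a = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]   # moves with a
falling = [10, 8, 6, 4, 2]  # moves against a
mixed = [1, 2, 1, 2, 1]     # no linear relationship with a

r_up = pearson(a, rising)     # close to +1
r_down = pearson(a, falling)  # close to -1
r_mixed = pearson(a, mixed)   # close to 0
```

Values near +1 mean two columns rise together, values near -1 mean one falls as the other rises, and values near 0 mean no linear relationship.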
+55
@@ -0,0 +1,55 @@
1+
# Lung Cancer Detection Using Ten Models
2+
## Goal
3+
The goal is to compare the performance of nine standard machine learning models against a Keras Sequential model, ten models in total.
4+
## Dataset
5+
The dataset is available at https://www.kaggle.com/datasets/mysarahmadbhat/lung-cancer/data
6+
## Description
7+
An effective cancer prediction system helps people learn their cancer risk at low cost and make appropriate decisions based on that risk. The data was collected from an online lung cancer prediction system website.
8+
9+
## What I have done
10+
1. Data cleaning and removal of duplicates.
11+
2. EDA to examine the dependence between parameters and the data distribution
12+
3. SMOTE to balance classes for imbalanced data
13+
4. Training 9 standard ML models and comparing their classification performance
14+
5. Designing a Keras Sequential Model for Lung Cancer Detection
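
To make the SMOTE step more concrete, here is a minimal sketch of its core idea: a synthetic minority sample is placed on the line segment between an existing minority sample and one of its nearest minority neighbours. This is a simplified illustration under stated assumptions, not the code from the notebook. The function name `smote_sketch`, the toy data, and the parameter values are all hypothetical, and a real project would more likely use an existing library implementation.

```python
import math
import random

def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic

# Hypothetical two-feature minority class with 4 rows; assume the
# majority class has 10 rows, so 6 synthetic rows are needed.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
majority_size = 10
new_rows = smote_sketch(minority, n_new=majority_size - len(minority))
balanced_minority = minority + new_rows
```

Because every synthetic row is an interpolation between two real minority rows, it stays inside the region the minority class already occupies rather than duplicating existing rows exactly.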
15+
16+
## Libraries used
17+
1. numpy
18+
2. pandas
19+
3. matplotlib
20+
4. seaborn
21+
5. tensorflow
22+
6. keras
23+
7. sklearn
24+
25+
## Visualization
26+
![Alt text](<../Images/Screenshot (379).png>)
27+
![Alt text](<../Images/Screenshot (381).png>)
28+
![Alt text](<../Images/Screenshot (384).png>)
29+
30+
## Models Used
31+
1. Logistic Regression
32+
2. KNN
33+
3. SVC
34+
4. Decision Tree Classifier
35+
5. Random Forest Classifier
36+
6. CatBoost Classifier
37+
7. XGBoost Classifier
38+
8. LGBM Classifier
39+
9. Gradient Boosting Classifier
10. Keras Sequential Model
40+
## Accuracy
41+
1. Logistic Regression - 0.95
42+
2. KNN - 0.94
43+
3. SVC - 0.95
44+
4. Decision Tree Classifier - 0.94
45+
5. Random Forest Classifier - 0.95
46+
6. CatBoost Classifier - 0.96
47+
7. XGBoost Classifier - 0.95
48+
8. LGBM Classifier - 0.95
49+
9. Gradient Boosting Classifier - 0.95
50+
10. Keras Sequential Model - 0.98
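
For a quick side-by-side view, the accuracies listed above can be ranked with a few lines of Python. The values are copied from the list; the variable names are illustrative only.

```python
# Accuracy values copied from the list above.
accuracy = {
    "Logistic Regression": 0.95,
    "KNN": 0.94,
    "SVC": 0.95,
    "Decision Tree Classifier": 0.94,
    "Random Forest Classifier": 0.95,
    "CatBoost Classifier": 0.96,
    "XGBoost Classifier": 0.95,
    "LGBM Classifier": 0.95,
    "Gradient Boosting Classifier": 0.95,
    "Keras Sequential Model": 0.98,
}

# Sort from highest to lowest accuracy; ties keep their listed order.
ranked = sorted(accuracy.items(), key=lambda item: item[1], reverse=True)

for name, score in ranked:
    print(f"{name:30s} {score:.2f}")

best_name, best_score = ranked[0]
```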
51+
## Conclusion
52+
We successfully developed a machine learning model that can predict lung cancer.
53+
The Keras Sequential model performed best with an accuracy of 0.98, with CatBoost a close second at 0.96. SMOTE was needed to balance the classes. A Sequential model stacks layers one after another, which lets the network learn hierarchical representations and capture intricate patterns in medical data. CatBoost, a gradient boosting algorithm, handles categorical features well, which matters in medical datasets, and helps mitigate overfitting. Of the ten models compared here, these two are the strongest candidates for lung cancer detection.
54+
55+
## Aindree Chatterjee

Lung Cancer Detection/Models/ml-all.ipynb

+1
Large diffs are not rendered by default.
