# Fraudulent Transactions Detection

**AIM**

Develop a model that predicts fraudulent transactions for a financial company, and use insights from the model to develop an actionable plan.

**DATASET**

https://www.kaggle.com/datasets/miznaaroob/fraudulent-transactions-data

**CONTENT**

The data for this case is provided as a CSV file with 6,362,620 rows and 10 columns.

**WHAT I DID**

First, I imported the required libraries and the dataset. Next, I carried out exploratory data analysis (EDA) to find which type of transaction results in the most fraudulent transactions, and then treated inconsistencies in the data. After that, I built two models and compared their results to select the more appropriate one for this project: first a Logistic Regression model to classify transactions as fraudulent or non-fraudulent, and then a Random Forest Classifier, which improved on the accuracy of the Logistic Regression model. The Random Forest Classifier reached an accuracy of 99.97% on the testing data.
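
The EDA step of finding which transaction type produces the most fraud can be sketched as a pandas group-by. This is an illustration, not the notebook's code: it assumes PaySim-style column names (`type` for the transaction type and `isFraud` as the 0/1 label), the helper `fraud_by_type` is hypothetical, and the rows are made-up toy data rather than values from the dataset.

```python
import pandas as pd


def fraud_by_type(df, type_col="type", label_col="isFraud"):
    """Count transactions and frauds per transaction type, sorted by fraud count."""
    summary = (
        df.groupby(type_col)[label_col]
        .agg(transactions="count", frauds="sum")  # named aggregation
    )
    # Share of transactions of each type that are fraudulent.
    summary["fraud_rate"] = summary["frauds"] / summary["transactions"]
    return summary.sort_values("frauds", ascending=False)


# Toy data for illustration only (column names assumed, values invented).
toy = pd.DataFrame({
    "type": ["TRANSFER", "TRANSFER", "CASH_OUT", "CASH_OUT", "CASH_OUT", "PAYMENT"],
    "isFraud": [1, 0, 1, 1, 0, 0],
})
print(fraud_by_type(toy))
```

On the full data you would pass the loaded DataFrame instead of `toy`. Sorting by `fraud_rate` instead of `frauds` answers a slightly different question: which type is riskiest per transaction rather than which type produces the most fraud overall.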

**MODELS USED**

The models are:

1. Logistic Regression
2. Random Forest Classifier
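
A minimal sketch of comparing the two models, assuming scikit-learn is the library behind both (it is not in the library list below). A synthetic, heavily imbalanced dataset from `make_classification` stands in for the real transactions, so the scores it prints say nothing about this project's results. The stratified split keeps the rare positive class present in both the training and testing sets.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the transactions: about 1% positives ("fraud").
X, y = make_classification(
    n_samples=5_000, n_features=8, weights=[0.99], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1_000),
    "Random Forest Classifier": RandomForestClassifier(
        n_estimators=100, random_state=42
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    scores[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        # Recall on the positive class: the share of actual frauds that were caught.
        "recall": recall_score(y_test, y_pred, zero_division=0),
    }
    print(f"{name}: accuracy={scores[name]['accuracy']:.4f}, "
          f"recall={scores[name]['recall']:.4f}")
```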

**HOW TO RUN**

Upload your Kaggle API key file (`kaggle.json`) and the `fraud_transaction_detection.ipynb` notebook to Google Colab, then run all the cells.
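
The dataset is downloaded with opendatasets, which reads the Kaggle username and key from `kaggle.json` in the working directory and prompts for them if the file is missing. A minimal sketch of that step; the helpers `has_kaggle_credentials` and `download_dataset` are hypothetical names, not taken from the notebook.

```python
import json
from pathlib import Path

DATASET_URL = "https://www.kaggle.com/datasets/miznaaroob/fraudulent-transactions-data"


def has_kaggle_credentials(path="kaggle.json"):
    """Return True if `path` is a Kaggle API key file with a username and key."""
    key_file = Path(path)
    if not key_file.is_file():
        return False
    try:
        creds = json.loads(key_file.read_text())
    except json.JSONDecodeError:
        return False
    return isinstance(creds, dict) and {"username", "key"}.issubset(creds)


def download_dataset(data_dir="."):
    """Download the Kaggle dataset into `data_dir` using opendatasets."""
    import opendatasets as od  # imported lazily; install with: pip install opendatasets

    if not has_kaggle_credentials():
        print("kaggle.json not found; opendatasets will prompt for credentials.")
    od.download(DATASET_URL, data_dir=data_dir)
```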

**LIBRARIES NEEDED**

* opendatasets - for downloading the dataset
* pandas - for data analysis
* NumPy - for data analysis
* Matplotlib - for data visualization
* seaborn - for data visualization
* itertools - for data analysis (part of the Python standard library)

**VISUALIZATION**

### Dataset head snapshot


### Confusion matrix for Logistic Regression model


### Confusion matrix for Random Forest Classifier model


Comparing the two confusion matrices shows that the Random Forest Classifier (RFC) model is the better of the two models for this project.
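
To make a confusion-matrix comparison concrete, accuracy, precision, and recall on the fraud class can be read directly off the four cells. A minimal standard-library sketch; the `ConfusionMatrix` class is illustrative, and the counts in the example are hypothetical, not the values from the plots above.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Binary confusion matrix with fraud as the positive class."""
    tn: int  # legitimate transactions correctly passed
    fp: int  # legitimate transactions wrongly flagged as fraud
    fn: int  # frauds that were missed
    tp: int  # frauds correctly caught

    def accuracy(self):
        total = self.tn + self.fp + self.fn + self.tp
        return (self.tp + self.tn) / total

    def precision(self):
        # Of the transactions flagged as fraud, the share that really were fraud.
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    def recall(self):
        # Of the actual frauds, the share that were caught.
        actual = self.tp + self.fn
        return self.tp / actual if actual else 0.0

    def f1(self):
        # Harmonic mean of precision and recall.
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if p + r else 0.0


# Hypothetical counts: 10,000 transactions, 20 of them fraudulent.
cm = ConfusionMatrix(tn=9_975, fp=5, fn=10, tp=10)
print(f"accuracy={cm.accuracy():.4f} precision={cm.precision():.3f} recall={cm.recall():.3f}")
# prints: accuracy=0.9985 precision=0.667 recall=0.500
```

In this made-up case accuracy is 99.85% even though half of the frauds are missed, so the off-diagonal cells (missed frauds and false alarms) say more than the accuracy figure when comparing two models.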

### Accuracy score for Random Forest Classifier in comparison to Logistic Regression


**ACCURACIES**

| Model | Architecture | Accuracy in % (on testing data) |
| ------- |:------------------------------:|:-------------------------------:|
| Model 1 | Logistic Regression | 99.83 |
| Model 2 | Random Forest Classifier | 99.97 |
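
Because the two accuracies are so close, comparing error rates can be more telling: 100 - 99.83 = 0.17% for Logistic Regression versus 100 - 99.97 = 0.03% for the Random Forest Classifier, an error rate about 5.7 times lower. A small sketch of that arithmetic, plus the accuracy a trivial model that labels every transaction as non-fraudulent would reach; the helper names are illustrative, and the fraud share used in the example is a hypothetical figure, not one measured on this dataset.

```python
def error_rate(accuracy_pct):
    """Convert an accuracy in percent to an error rate in percent."""
    return 100.0 - accuracy_pct


def majority_baseline_accuracy(fraud_share_pct):
    """Accuracy (%) of a model that predicts 'not fraud' for every transaction."""
    return 100.0 - fraud_share_pct


# Test-set accuracies from the table above.
lr_acc, rf_acc = 99.83, 99.97
ratio = error_rate(lr_acc) / error_rate(rf_acc)
print(f"LR error {error_rate(lr_acc):.2f}%, RF error {error_rate(rf_acc):.2f}%, "
      f"ratio {ratio:.1f}x")

# Hypothetical: if 0.2% of test transactions were fraud, predicting
# 'not fraud' everywhere would already score 99.8% accuracy.
print(f"baseline {majority_baseline_accuracy(0.2):.1f}%")
```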


**CONCLUSION**

The Random Forest Classifier was the more accurate of the two models I tested for detecting fraudulent transactions.

# Tanish Khandelwal

Connect with me on LinkedIn: https://www.linkedin.com/in/tanish-khandelwal-da/

Check out my GitHub profile: https://github.com/tknishh