Principal Component Analysis and Clustering

In this project we look at the real-world problem of clustering 168 countries based on macroeconomic indicators, such as child mortality and exports and imports expressed as a percentage of GDP. The dataset contains nine such features for each of the 168 countries. Our objective is to explore whether the countries fall into distinct clusters based on these indicators. This could help us identify the countries most in need of financial aid.

We start by removing outliers, which here are countries with very high GDP per capita. We then apply Principal Component Analysis (PCA) for dimensionality reduction. We find that six principal components are enough to retain 98% of the variance in the data.
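
The component-selection step can be sketched as follows. This is a minimal illustration rather than the notebook's actual code: it assumes scikit-learn, uses a synthetic 168 x 9 matrix in place of Country-data.csv, and the names `X`, `target` and `n_components` are hypothetical. The number of components it picks depends on the synthetic data and need not be six.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the nine indicators (assumption: the real matrix
# would be the numeric columns of Country-data.csv after outlier removal).
rng = np.random.default_rng(0)
latent = rng.normal(size=(168, 4))
X = latent @ rng.normal(size=(4, 9)) + 0.1 * rng.normal(size=(168, 9))

# Standardize so that no indicator dominates because of its units.
X_scaled = StandardScaler().fit_transform(X)

# Fit PCA with all components and accumulate the explained variance ratios.
pca = PCA().fit(X_scaled)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative variance reaches the target.
target = 0.98
n_components = int(np.searchsorted(cumulative, target) + 1)

# Project the standardized data onto the selected components.
X_reduced = PCA(n_components=n_components).fit_transform(X_scaled)
print(n_components, X_reduced.shape)
```

Standardizing first matters here because the indicators are on very different scales (percentages, counts per 1000, currency), and PCA would otherwise be driven by the largest-valued columns.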

K-Means Clustering

With the number of features reduced to six, we run K-Means clustering on the dataset. The initial choice of K is made with the Elbow Method and the Silhouette Score; both indicate that K=4 is optimal. Because six-dimensional data cannot be visualized directly, we plot pairs of the original features instead: Life Expectancy vs Child Mortality, Health Spending vs Child Mortality, and Income vs Child Mortality.
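
A rough sketch of how K can be scanned with both criteria is shown below. It assumes scikit-learn and uses synthetic six-dimensional blobs in place of the PCA-reduced country data, so the candidate range of K and the variable names (`inertias`, `silhouettes`, `best_k`) are illustrative only.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the six PCA components (assumption, not the real data).
X, _ = make_blobs(n_samples=168, centers=4, n_features=6, random_state=0)

inertias = {}      # within-cluster sum of squares, used for the elbow plot
silhouettes = {}   # mean silhouette coefficient, higher is better

for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_
    silhouettes[k] = silhouette_score(X, model.labels_)

# The silhouette criterion picks the K with the highest mean score;
# the elbow is read off a plot of inertia against K.
best_k = max(silhouettes, key=silhouettes.get)
print(best_k)
```

The elbow is judged visually from the inertia curve, while the silhouette score gives a single number per K, which is why the two are useful as a cross-check on each other.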

The visualizations confirm our intuitions: countries doing very poorly in terms of child mortality also do poorly on these three metrics.

Hierarchical Clustering

In this section we try three linkage methods: Ward, Single and Complete. The Ward method gives us the cleanest clusters.
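
The comparison of linkage methods can be sketched like this. It is an illustration under assumptions: it uses SciPy's `linkage` and `fcluster` on synthetic data, cuts each tree into at most four clusters to match the K-Means result, and `cluster_sizes` is a hypothetical name used only to compare how balanced the resulting clusters are.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import make_blobs

# Synthetic stand-in for the reduced country data (assumption).
X, _ = make_blobs(n_samples=168, centers=4, n_features=6, random_state=0)

cluster_sizes = {}
for method in ("ward", "single", "complete"):
    # Build the merge tree with the given linkage criterion.
    Z = linkage(X, method=method)
    # Cut the tree so that at most four flat clusters remain (labels start at 1).
    labels = fcluster(Z, t=4, criterion="maxclust")
    cluster_sizes[method] = np.bincount(labels)[1:].tolist()
    print(method, cluster_sizes[method])
```

On real data, single linkage often chains points together and yields one large cluster plus a few tiny ones, whereas Ward tends to give more compact, evenly sized groups, which is one way to make "cleanest" concrete.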

Conclusions

We notice that both methods identify almost the same set of countries as needing financial aid. The countries doing poorly in terms of child mortality also have very poor macroeconomic indicators.
