In this project we look at real world problem of clustering 168 coutries based on some Macro-economic indicators. Some examples are child mortality, exports and imports expressed as percentage of GDP etc. The dataset consists of nine such features for all the 168 countries. Our objective is to explore, if the countries fall into distinct clusters based on some similarities. This could help us identify countries desperate in need of financial need.
We start by asking by removing Outliers. These are countries with very high GDP per capita. Then we implement Principal Component Analysis ( PCA ) for dimensionality reduction. We notice that , even with 6 features we are able to retain 98% of the variance in the data.
With the number of features reduced to six, we implement K-Means cluster on the dataset. The initial choice of K is done by Elbow Method and Solhouette Score. In both cases, we notice that K=4 is optimum. Since with six dimensions, we cannot visualize the data, so we choose to take pairs of features like Life Expectency vs Child Mortality, Health Spending vs Child Mortality, income vs child mortality.
We notice our itutions coming true in the visualizations. Countries doing ver poorly in terms of child mortality are poor in these three metrics as well.
In this section we try three methods, Ward, Single and Complete. It turns out, that the Ward method gives us the cleanest clusters.
We notice that both methods give us almost the same countries which need financial-aid. The countries doing poorly in terms of child mortality are have very poor macro-economic indicators.