This dataset covers restaurants registered on Zomato in Bengaluru. It contains over 50,000 rows and 17 columns, which makes it large enough for meaningful analysis. Each row describes a restaurant's details, such as location, rating, type, cost, and more.
This project aims to provide hands-on experience with common data analysis tasks and to show how real-world problem statements can be analyzed using this data.
Dataset - Kaggle Zomato Dataset
- Identify and handle missing data so that the dataset is clean and usable.
- Analyze and understand the numerical columns in the dataset.
- Analyze and understand the categorical columns in the dataset.
- Identify relationships between different features to draw useful conclusions and find patterns.
- Read the Dataset: Import the dataset to begin the analysis.
- Understand Each Feature: Investigate and document the details of each column.
- Explore Dataset Info: Use dataset functions to get an overview, including the data types, missing values, etc.
- Describe the Data: Generate descriptive statistics for numerical columns and identify categorical and numerical features.
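The reading and exploration steps above can be sketched roughly as follows. This is a minimal sketch, not the project's actual notebook: the inline CSV is a made-up stand-in for the Kaggle file, and the column names, values, and the `zomato.csv` file name mentioned in the comment are assumptions for illustration.

```python
import io

import pandas as pd

# Tiny stand-in for the real CSV. Column names and values are illustrative
# assumptions, not taken from the actual dataset.
SAMPLE_CSV = """name,online_order,book_table,rate,votes,location,approx_cost
Cafe A,Yes,No,4.1/5,120,Indiranagar,"1,200"
Cafe B,No,No,NEW,0,Koramangala,300
Diner C,Yes,Yes,3.8/5,45,Indiranagar,800
Diner D,No,Yes,,10,BTM,
"""

# With the real file this would be something like pd.read_csv("zomato.csv")
# (file name assumed).
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Overview: number of rows and columns, dtypes, and non-null counts.
print(df.shape)
df.info()

# Count missing values per column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Split columns into numerical and categorical by dtype.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
print("numeric:", numeric_cols)
print("categorical:", categorical_cols)

# Descriptive statistics for the numerical columns.
print(df.describe())
```

In this sample, `rate` and `approx_cost` are read as text because of values such as `4.1/5` and `1,200`, so they show up as categorical until they are cleaned.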
- Delete Redundant Columns: Remove unnecessary columns to make the data more manageable.
- Rename Columns: Ensure columns have clear and consistent names.
- Drop Duplicates: Remove any duplicate rows.
- Clean Individual Columns: Address inconsistencies or anomalies in individual columns.
- Remove NaN Values: Handle missing data by removing or replacing NaN values.
- Check for More Transformations: Evaluate if any further transformations are needed for the dataset.
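The cleaning steps above might look something like the sketch below. It runs on a small hand-made frame, so the column names, the new names chosen in `rename`, the helper `parse_rating`, and the choice to fill missing ratings with the mean are all assumptions for illustration rather than the project's confirmed approach.

```python
import numpy as np
import pandas as pd

# Hand-made sample with typical problems: a redundant column, awkward
# column names, a duplicate row, text ratings, and missing values.
raw = pd.DataFrame({
    "url": ["u1", "u2", "u2", "u3", "u4"],
    "name": ["Cafe A", "Cafe B", "Cafe B", "Diner C", "Diner D"],
    "rate": ["4.1/5", "NEW", "NEW", "3.8 /5", "-"],
    "approx_cost(for two people)": ["1,200", "300", "300", "800", None],
    "listed_in(type)": ["Cafes", "Delivery", "Delivery", "Dine-out", "Dine-out"],
})


def parse_rating(value):
    """Turn '4.1/5' into 4.1; placeholders such as 'NEW' or '-' become NaN."""
    if pd.isna(value):
        return np.nan
    text = str(value).split("/")[0].strip()
    try:
        return float(text)
    except ValueError:
        return np.nan


clean = (
    raw.drop(columns=["url"])  # delete a redundant column
    .rename(columns={
        "approx_cost(for two people)": "cost_for_two",  # hypothetical new names
        "listed_in(type)": "service_type",
    })
    .drop_duplicates()  # drop duplicate rows
    .reset_index(drop=True)
)

# Clean individual columns: ratings and costs become numeric.
clean["rate"] = clean["rate"].apply(parse_rating)
clean["cost_for_two"] = pd.to_numeric(
    clean["cost_for_two"].str.replace(",", "", regex=False), errors="coerce"
)

# Handle NaN values: replace missing ratings with the mean (one possible
# choice) and remove rows that have no cost.
clean["rate"] = clean["rate"].fillna(clean["rate"].mean())
clean = clean.dropna(subset=["cost_for_two"]).reset_index(drop=True)

print(clean)
```

Whether to replace or remove missing values depends on the column; filling ratings with the mean keeps more rows but flattens the rating distribution, so it is worth checking how many rows are affected first.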
- Restaurants Delivering Online or Not: Visualize the distribution of restaurants offering online delivery.
- Restaurants Allowing Table Booking or Not: Visualize the proportion of restaurants allowing table booking.
- Table Booking Rate vs Rating: Investigate the relationship between the availability of table booking and restaurant ratings.
- Best Location: Find out which locations have the best restaurants based on rating or reviews.
- Relation Between Location and Rating: Analyze how restaurant ratings vary by location.
- Restaurant Type: Visualize the different types of restaurants and their distribution.
- Gaussian Restaurant Type and Rating: Explore the relationship between restaurant type and its rating using a Gaussian distribution.
- Types of Services: Explore the different types of services provided by restaurants.
- Relation Between Type of Service and Rating: Investigate how the type of service affects restaurant ratings.
- Cost of Restaurant: Visualize the cost distribution of restaurants in the dataset.
- No. of Restaurants in a Location: Find out which locations have the highest number of restaurants.
- Most Famous Restaurant Chains in Bengaluru: Identify the most popular restaurant chains based on data analysis.
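Most of the questions above reduce to counting values or averaging ratings per group before anything is plotted. The sketch below shows that aggregation step on a small made-up frame; the column names and values are assumptions, and counting how often a `name` appears is only one possible proxy for a "famous" chain.

```python
import pandas as pd

# Made-up, already-cleaned sample; column names and values are assumptions.
df = pd.DataFrame({
    "name": ["Cafe A", "Cafe A", "Cafe B", "Diner C", "Diner C", "Diner C"],
    "online_order": ["Yes", "Yes", "No", "Yes", "No", "No"],
    "book_table": ["No", "No", "No", "Yes", "Yes", "No"],
    "rate": [4.0, 4.1, 3.5, 4.5, 4.3, 3.9],
    "location": ["Indiranagar", "Koramangala", "BTM",
                 "Indiranagar", "Koramangala", "BTM"],
})

# Share of restaurants that accept online orders.
online_share = df["online_order"].value_counts(normalize=True)

# Average rating with and without table booking.
rating_by_booking = df.groupby("book_table")["rate"].mean()

# Average rating per location, best first.
rating_by_location = (
    df.groupby("location")["rate"].mean().sort_values(ascending=False)
)

# Number of restaurants per location.
restaurants_per_location = df["location"].value_counts()

# Names that appear most often, used here as a proxy for the largest chains.
top_chains = df["name"].value_counts().head(2)

print(online_share)
print(rating_by_booking)
print(rating_by_location)
print(restaurants_per_location)
print(top_chains)
```

Each result is a pandas Series, so in a notebook it can be handed to whichever plotting library is available to draw the corresponding bar or pie chart.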
- Write down the inferences drawn from the dataset.
- Discuss the insights gained from the analysis, including relationships between different features, patterns, and trends observed.
- Identify potential problems or challenges that can be solved using this dataset.
- Python: The primary programming language for data analysis.
- Pandas: For data manipulation and cleaning.
- NumPy: For numerical operations and handling arrays.
- Jupyter Notebooks: For an interactive data analysis environment.