I have just completed a case study of Indian food cuisines. The dataset includes each dish's name, diet type, course, description, cuisine, rating, similar dishes, ingredients, preparation time, cooking time, total time, servings, and recipe instructions. The goal of this case study is to clean and analyze the data to draw meaningful insights that can help improve data quality and support better decision-making.
Objectives
Data Cleaning: Apply best practices for data cleaning to ensure the dataset is accurate, consistent, and usable.
Data Analysis: Conduct exploratory data analysis (EDA) to uncover trends and patterns in the data.
Key Insights
Popular Cuisines: Punjabi, South Indian, and Bengali cuisines are the most represented in the dataset.
Diet Preferences: Vegetarian dishes dominate, suggesting a high preference for vegetarian options.
Cooking Time and Ratings: Dishes with longer cooking times tend to have slightly lower ratings, potentially indicating a trade-off between complexity and satisfaction.
Ingredient Complexity: Recipes with more ingredients are generally more complex, but a higher ingredient count does not always correlate with a higher rating.
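Counts like the ones behind the cuisine and diet insights above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the sample rows and the diet labels ("Vegetarian", "Non Vegetarian") are made up for illustration and are not taken from the dataset.

```python
from collections import Counter

# Hypothetical sample of (cuisine, diet type) pairs; the real dataset is much larger.
dishes = (
    [("Punjabi", "Vegetarian")] * 3
    + [("Punjabi", "Non Vegetarian")]
    + [("South Indian", "Vegetarian")] * 3
    + [("Bengali", "Non Vegetarian")] * 2
    + [("Gujarati", "Vegetarian")]
)

# How many dishes each cuisine contributes, most represented first.
cuisine_counts = Counter(cuisine for cuisine, _ in dishes)
top_three = cuisine_counts.most_common(3)

# Share of dishes labelled vegetarian (label spelling is an assumption).
veg_share = sum(1 for _, diet in dishes if diet == "Vegetarian") / len(dishes)

print(top_three)
print(f"Vegetarian share: {veg_share:.0%}")
```

Note that a high vegetarian share describes what the dataset contains; it is only indirect evidence of what people prefer.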
Findings
The dataset has no null values. A significant majority of dish ratings are clustered around 4.9. There is no correlation between a dish's preparation time and its rating, nor between the length of the recipe instructions and the rating. Some entries in the "Ratings of Dish" column had invalid data types, which would have affected the accuracy of the analysis. Overall, the data is mostly complete and consistent, and the identified issues have been corrected so that the analysis is more precise and the underlying trends and patterns are easier to see.
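As an illustration of the two checks described above, the sketch below coerces raw "Ratings of Dish" entries to numbers, drops the invalid ones, and computes Pearson's correlation between preparation time and rating. It is not the exact code used in the case study: the sample rows, the helper names `to_rating` and `pearson`, and the assumed 0 to 5 rating scale are all hypothetical.

```python
from math import sqrt

def to_rating(value):
    """Return the rating as a float, or None if the entry is not a valid rating."""
    try:
        rating = float(str(value).strip())
    except ValueError:
        return None
    # Assumption: ratings are on a 0-5 scale; anything else (including NaN) is invalid.
    return rating if 0.0 <= rating <= 5.0 else None

def pearson(xs, ys):
    """Pearson correlation coefficient; raises ZeroDivisionError if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical rows: (preparation time in minutes, raw "Ratings of Dish" entry).
rows = [(10, "4.9"), (25, "4.8"), (15, "-"), (40, "4.9"), (30, "five"), (20, 4.7)]

cleaned = [(prep, to_rating(raw)) for prep, raw in rows]
valid = [(prep, rating) for prep, rating in cleaned if rating is not None]
invalid_count = len(cleaned) - len(valid)

r = pearson([float(prep) for prep, _ in valid], [rating for _, rating in valid])
print(f"{invalid_count} invalid ratings dropped, r = {r:.2f}")
```

A value of r close to 0 is what "no correlation" means here. Because most ratings sit near 4.9, there is very little variation for any predictor to explain.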
Recommendations
Data Quality Improvement
Consistency: Ensure consistency in text fields (e.g., cuisine names, dish names) by creating standardized lists.
Completeness: Implement robust data collection methods to minimize missing values and inconsistencies.
Validation: Introduce validation rules to check for unrealistic values (e.g., negative cooking times).
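A validation rule of the kind suggested above could look like the following sketch. The field names (`prep_time`, `cook_time`, `total_time`, `servings`) and the sample dishes are hypothetical, and the rule that total time equals preparation time plus cooking time is an assumption about how the dataset defines total time.

```python
def validate_recipe(row):
    """Return a list of human-readable problems found in one recipe row."""
    problems = []
    for field in ("prep_time", "cook_time", "total_time", "servings"):
        value = row.get(field)
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{field}: not a number")
        elif value < 0:
            problems.append(f"{field}: negative value {value}")
    # Assumption: total time should be the sum of preparation and cooking time.
    if not problems and row["total_time"] != row["prep_time"] + row["cook_time"]:
        problems.append("total_time does not equal prep_time + cook_time")
    return problems

# Hypothetical rows, times in minutes.
recipes = [
    {"name": "Dal Tadka", "prep_time": 10, "cook_time": 30, "total_time": 40, "servings": 4},
    {"name": "Aloo Paratha", "prep_time": 20, "cook_time": -15, "total_time": 5, "servings": 2},
]

for recipe in recipes:
    for problem in validate_recipe(recipe):
        print(f"{recipe['name']}: {problem}")
```

Returning a list of problems rather than raising on the first one makes it easy to report every issue in a row at once.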
Future Data Collection
Enhanced Details: Collect additional metadata such as preparation difficulty and regional variations.
User Feedback: Incorporate user reviews and comments to provide more context to ratings.
Standardized Categories: Use controlled vocabularies for cuisine types, diet types, and ingredients to improve data consistency.
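One way to apply a controlled vocabulary is to map every raw cuisine label onto a fixed list and flag anything that does not match. The sketch below is illustrative only: the vocabulary contains just the three cuisines named in this write-up, and the alias table and the function name `normalize_cuisine` are hypothetical.

```python
# Controlled vocabulary; a real list would cover every cuisine in the dataset.
CUISINE_VOCAB = {"Punjabi", "South Indian", "Bengali"}

# Hypothetical spelling variants mapped to their canonical names.
ALIASES = {"south-indian": "South Indian", "southindian": "South Indian", "punjab": "Punjabi"}

_BY_LOWER = {name.lower(): name for name in CUISINE_VOCAB}

def normalize_cuisine(raw):
    """Return the canonical cuisine name, or None if the label is not in the vocabulary."""
    # Trim, lowercase, and collapse repeated whitespace before looking the label up.
    key = " ".join(raw.strip().lower().split())
    if key in _BY_LOWER:
        return _BY_LOWER[key]
    return ALIASES.get(key)

for label in ["  south   indian ", "South-Indian", "BENGALI", "Mughlai"]:
    print(repr(label), "->", normalize_cuisine(label))
```

Labels that come back as `None` can be reviewed by hand and then either added to the vocabulary or mapped through the alias table.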
Analysis and Reporting
Ongoing Analysis: Regularly update the dataset and re-evaluate the analysis to capture emerging trends.
Visualization: Use interactive dashboards to make insights accessible and actionable for stakeholders.
Conclusion
The data cleaning and analysis revealed valuable insights into Indian food cuisines, including popular dishes, cuisine types, and cooking time trends. Acting on the recommendations above can improve the quality of future data collection and support more informed decision-making.