World Cuisines Data Project

Alyssa Crawford

Notes from 2023:
This Jupyter Notebook was part of my final project for ISTA 131 "Dealing with Data" in 2020.
I used Matplotlib to explore a Kaggle competition dataset that was used to train ML models to guess the cuisine / culture of a recipe based on its ingredients. The data originated from the US recipe website Yummly.
I attempted to explore trends in ingredients used across the world. It was a fun exercise, but also majorly flawed, haha... I believe I did a lot of the preliminary data cleaning by hand, e.g. combining "chilli pepper", "chili pepper", and "chile pepper" into one ingredient. Since the data comes from an English-speaking, US-based website, the authenticity of each recipe is questionable. Maybe I should just add the "-American" suffix to each cuisine category...
Also, the Python code for this project can be found on my GitHub! It was written during my third semester at UArizona, so it might be a little rough around the edges (you can tell that I was still more comfortable with Java at the time, since I threw everything into a class). You might also notice that the styling of the figures is mostly kept default -- I could've changed that, but I think they look pretty good as is.
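
For anyone curious, the hand cleaning mentioned above (merging spelling variants like "chilli pepper" and "chile pepper") could probably be done in code instead. This is only a rough sketch, not what the project actually does: the `ALIASES` table and both function names are made up for illustration, and a real table would need many more entries.

```python
# Hypothetical alias table: each variant spelling maps to one canonical name.
# Only the chili pepper example from the notes is included here.
ALIASES = {
    "chilli pepper": "chili pepper",
    "chile pepper": "chili pepper",
}


def normalize_ingredient(name, aliases=ALIASES):
    """Lowercase, collapse extra whitespace, then map known variants to one name."""
    cleaned = " ".join(name.lower().split())
    return aliases.get(cleaned, cleaned)


def normalize_recipe(ingredients, aliases=ALIASES):
    """Normalize every ingredient and drop duplicates, keeping first-seen order."""
    seen = []
    for item in ingredients:
        canonical = normalize_ingredient(item, aliases)
        if canonical not in seen:
            seen.append(canonical)
    return seen


sample = ["Chile Pepper", "garlic", "chilli  pepper", "lime juice"]
print(normalize_recipe(sample))  # ['chili pepper', 'garlic', 'lime juice']
```

A lookup table like this still has to be written by hand, but at least the merges are recorded somewhere and can be rerun.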

In [ ]:
from world_cuisines import *
In [ ]:
# read data, clean data, and make some reused calculations
database = World_Cuisines("train.json")
clean_data1 complete
totals_data complete
make_top_ten complete
clean_data2 complete
make_top_ten complete
make_percentages complete
make_category_data complete
make_top_ten complete
make_percentages complete
In [ ]:
database.make_top_ten_graphs()
In [ ]:
database.make_top_ten_cat_graphs()
In [ ]:
database.make_animal_products_graph()

Notes from 2023 on this next graph:
This graph is definitely interesting, but the premise has some flaws, for sure. There are certainly other major sources of salt/sodium that I left out. Cheese can be salty, but isn't always a significant source of salty flavor in a dish. Plus, although it wasn't really the focus, the graph says nothing about the actual amount of each salty ingredient used. Comparing the volume of salty ingredients might look very different from comparing how often they appear.
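
To make the "frequency" idea concrete, here is a minimal sketch of counting how often salty ingredients show up per cuisine. It is not the code behind the graph: the `SALTY` set is a hypothetical stand-in for whatever list I actually used, the function name is made up, and the record shape (a `"cuisine"` string plus an `"ingredients"` list) is an assumption about the JSON layout.

```python
from collections import defaultdict

# Hypothetical list of salty ingredients; the notebook's real list is not shown.
SALTY = {"salt", "soy sauce", "fish sauce"}


def salty_frequency(recipes, salty=SALTY):
    """Per cuisine, return the fraction of recipes that use each salty ingredient."""
    totals = defaultdict(int)                      # recipes per cuisine
    hits = defaultdict(lambda: defaultdict(int))   # cuisine -> ingredient -> count
    for recipe in recipes:
        cuisine = recipe["cuisine"]
        totals[cuisine] += 1
        # set() so an ingredient listed twice in one recipe is counted once
        for ingredient in set(recipe["ingredients"]) & salty:
            hits[cuisine][ingredient] += 1
    return {
        cuisine: {ing: n / totals[cuisine] for ing, n in hits[cuisine].items()}
        for cuisine in totals
    }


# Tiny made-up records, only to show the shape of the result.
recipes = [
    {"cuisine": "thai", "ingredients": ["fish sauce", "lime juice"]},
    {"cuisine": "thai", "ingredients": ["soy sauce", "fish sauce"]},
    {"cuisine": "italian", "ingredients": ["salt", "olive oil"]},
]
print(salty_frequency(recipes))
```

Note that this only measures how many recipes mention an ingredient, which is exactly the limitation described above: a pinch of salt and a cup of soy sauce count the same.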

In [ ]:
database.make_sodium_graph()

Notes from 2023 on this next graph:
The premise here was that fatty and acidic/sour elements of food are known to balance each other out. I believe I did explore other flavors, and these two had the most interesting relationship.
I could have done a lot more statistical analysis on this data, or at the very least displayed the R².
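
As a sketch of what that could look like, the function below fits a least-squares line and returns the coefficient of determination. The function name `linear_fit_r2` is made up, the numbers are placeholders rather than values from the dataset, and it is written in plain Python so it doesn't depend on anything in `world_cuisines`.

```python
def linear_fit_r2(xs, ys):
    """Fit y = slope * x + intercept by least squares; return (slope, intercept, r2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Placeholder values only: pretend percentages of recipes per cuisine
# that contain a fatty ingredient (x) and an acidic ingredient (y).
fat_pct = [10.0, 20.0, 30.0, 40.0]
acid_pct = [15.0, 22.0, 41.0, 38.0]

slope, intercept, r2 = linear_fit_r2(fat_pct, acid_pct)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r2:.3f}")
```

The resulting value could then be shown on the figure, for example in the plot title or with Matplotlib's `ax.text`.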

In [ ]:
database.make_fat_acid_graph()