{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this practice exercise, we will analyze Chipotle order data. The file \"Chipotle.tsv\" contains this information in a tab separated file. First, read in the data. You might find the delimiter input to the pandas read_csv method useful. Use the head method to check that everything has been read in correctly. "
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" order_id | \n",
" quantity | \n",
" item_name | \n",
" choice_description | \n",
" item_price | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 1 | \n",
" Chips and Fresh Tomato Salsa | \n",
" NaN | \n",
" $2.39 | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
" Izze | \n",
" [Clementine] | \n",
" $3.39 | \n",
"
\n",
" \n",
" 2 | \n",
" 1 | \n",
" 1 | \n",
" Nantucket Nectar | \n",
" [Apple] | \n",
" $3.39 | \n",
"
\n",
" \n",
" 3 | \n",
" 1 | \n",
" 1 | \n",
" Chips and Tomatillo-Green Chili Salsa | \n",
" NaN | \n",
" $2.39 | \n",
"
\n",
" \n",
" 4 | \n",
" 2 | \n",
" 2 | \n",
" Chicken Bowl | \n",
" [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | \n",
" $16.98 | \n",
"
\n",
" \n",
" 5 | \n",
" 3 | \n",
" 1 | \n",
" Chicken Bowl | \n",
" [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... | \n",
" $10.98 | \n",
"
\n",
" \n",
" 6 | \n",
" 3 | \n",
" 1 | \n",
" Side of Chips | \n",
" NaN | \n",
" $1.69 | \n",
"
\n",
" \n",
" 7 | \n",
" 4 | \n",
" 1 | \n",
" Steak Burrito | \n",
" [Tomatillo Red Chili Salsa, [Fajita Vegetables... | \n",
" $11.75 | \n",
"
\n",
" \n",
" 8 | \n",
" 4 | \n",
" 1 | \n",
" Steak Soft Tacos | \n",
" [Tomatillo Green Chili Salsa, [Pinto Beans, Ch... | \n",
" $9.25 | \n",
"
\n",
" \n",
" 9 | \n",
" 5 | \n",
" 1 | \n",
" Steak Burrito | \n",
" [Fresh Tomato Salsa, [Rice, Black Beans, Pinto... | \n",
" $9.25 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" order_id quantity item_name \\\n",
"0 1 1 Chips and Fresh Tomato Salsa \n",
"1 1 1 Izze \n",
"2 1 1 Nantucket Nectar \n",
"3 1 1 Chips and Tomatillo-Green Chili Salsa \n",
"4 2 2 Chicken Bowl \n",
"5 3 1 Chicken Bowl \n",
"6 3 1 Side of Chips \n",
"7 4 1 Steak Burrito \n",
"8 4 1 Steak Soft Tacos \n",
"9 5 1 Steak Burrito \n",
"\n",
" choice_description item_price \n",
"0 NaN $2.39 \n",
"1 [Clementine] $3.39 \n",
"2 [Apple] $3.39 \n",
"3 NaN $2.39 \n",
"4 [Tomatillo-Red Chili Salsa (Hot), [Black Beans... $16.98 \n",
"5 [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... $10.98 \n",
"6 NaN $1.69 \n",
"7 [Tomatillo Red Chili Salsa, [Fajita Vegetables... $11.75 \n",
"8 [Tomatillo Green Chili Salsa, [Pinto Beans, Ch... $9.25 \n",
"9 [Fresh Tomato Salsa, [Rice, Black Beans, Pinto... $9.25 "
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"#Write your code here\n",
"df_chipotle = pd.read_csv(\"Data/chipotle_orders.tsv\",\\\n",
" delimiter = \"\\t\")\n",
"\n",
"df_chipotle.head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now take a look at the dtypes attribute to see how the columns are stored."
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"order_id int64\n",
"quantity int64\n",
"item_name object\n",
"choice_description object\n",
"item_price object\n",
"dtype: object"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Write your code here\n",
"df_chipotle.dtypes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will write code to answer the following questions:\n",
"\n",
"1. What fraction of customers order burritos versus burrito bowls?\n",
"2. What is the breakdown of salsa choice?\n",
"3. What is the average amount of money spent on a Chipotle order?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For question 1, we will use the apply method to create a column that will read \"Burrito\" for items that are a burrito, \"Bowl\" for items that are a bowl and \"Other\" for every other item."
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def Burrito_or_Bowl(row, extra_input):\n",
" item = row[\"item_name\"]\n",
" quantity = row[\"quantity\"]\n",
" if \"Bowl\" in item and quantity==1:\n",
" return \"Single Bowl\"\n",
" elif \"Bowl\" in item and quantity>1:\n",
" return \"Mult Bowl\"\n",
" elif \"Burrito\" in item:\n",
" return \"Burrito\"\n",
" else:\n",
" return extra_input\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create new column called \"Bowl_Burrito\" and compute fractions of bowl versus burrito orders. The value_counts() method could be useful here."
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"drink/snack 2119\n",
"Single Bowl 1269\n",
"Burrito 1172\n",
"Mult Bowl 62\n",
"Name: Bowl_Burrito, dtype: int64"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Write you code here\n",
"df_chipotle[\"Bowl_Burrito\"] = df_chipotle.\\\n",
" apply(Burrito_or_Bowl,\\\n",
" extra_input = \"drink/snack\",\\\n",
" axis = 1)\n",
"df_chipotle.Bowl_Burrito.value_counts()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For question 2, we need to write another function to be used with apply to create a column for the salsas. Let's look at the structure of the choice_description column:"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"order_id 2\n",
"quantity 2\n",
"item_name Chicken Bowl\n",
"choice_description [Tomatillo-Red Chili Salsa (Hot), [Black Beans...\n",
"item_price $16.98 \n",
"Bowl_Burrito Mult Bowl\n",
"Name: 4, dtype: object"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_chipotle.loc[4,:]"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'[Tomatillo-Red Chili Salsa (Hot), [Black Beans, Rice, Cheese, Sour Cream]]'"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Look at choice_description\n",
"df_chipotle.loc[4,\"choice_description\"]"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'[Tomatillo Red Chili Salsa, [Fajita Vegetables, Black Beans, Pinto Beans, Cheese, Sour Cream, Guacamole, Lettuce]]'"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Another one\n",
"df_chipotle.loc[7,\"choice_description\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice that this column is a string, which contains what looks like a list of lists with the salsa type as the first entry in the list. Make sure you get rid of the \"(Hot)\" in the salsa description. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, let's only look at orders without an NaN in the choice_description column."
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" order_id | \n",
" quantity | \n",
" item_name | \n",
" choice_description | \n",
" item_price | \n",
" Bowl_Burrito | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
" Izze | \n",
" [Clementine] | \n",
" $3.39 | \n",
" drink/snack | \n",
"
\n",
" \n",
" 2 | \n",
" 1 | \n",
" 1 | \n",
" Nantucket Nectar | \n",
" [Apple] | \n",
" $3.39 | \n",
" drink/snack | \n",
"
\n",
" \n",
" 4 | \n",
" 2 | \n",
" 2 | \n",
" Chicken Bowl | \n",
" [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | \n",
" $16.98 | \n",
" Mult Bowl | \n",
"
\n",
" \n",
" 5 | \n",
" 3 | \n",
" 1 | \n",
" Chicken Bowl | \n",
" [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... | \n",
" $10.98 | \n",
" Single Bowl | \n",
"
\n",
" \n",
" 7 | \n",
" 4 | \n",
" 1 | \n",
" Steak Burrito | \n",
" [Tomatillo Red Chili Salsa, [Fajita Vegetables... | \n",
" $11.75 | \n",
" Burrito | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" order_id quantity item_name \\\n",
"1 1 1 Izze \n",
"2 1 1 Nantucket Nectar \n",
"4 2 2 Chicken Bowl \n",
"5 3 1 Chicken Bowl \n",
"7 4 1 Steak Burrito \n",
"\n",
" choice_description item_price Bowl_Burrito \n",
"1 [Clementine] $3.39 drink/snack \n",
"2 [Apple] $3.39 drink/snack \n",
"4 [Tomatillo-Red Chili Salsa (Hot), [Black Beans... $16.98 Mult Bowl \n",
"5 [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... $10.98 Single Bowl \n",
"7 [Tomatillo Red Chili Salsa, [Fajita Vegetables... $11.75 Burrito "
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Drop NaN using dropna() method\n",
"\n",
"df_chipotle.dropna(inplace = True)\n",
"df_chipotle.head()"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Write function here\n",
"def Get_Salsa(description):\n",
" \n",
" if \"Salsa\" in description.split(\",\")[0]:\n",
" parsed_des = description.strip(\"[]\").\\\n",
" replace(\"-\", \" \").\\\n",
" replace(\",\", \"\").\\\n",
" split(\" \")\n",
" salsa_name = []\n",
" for word in parsed_des:\n",
" \n",
" if word!=\"Salsa\":\n",
" salsa_name.append(word)\n",
" else:\n",
" return \" \".join(salsa_name)\n",
" return parsed_des\n",
" else:\n",
" return \"No Salsa\"\n",
" \n"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Fresh Tomato 1365\n",
"Roasted Chili Corn 633\n",
"No Salsa 576\n",
"Tomatillo Red Chili 480\n",
"Tomatillo Green Chili 322\n",
"Name: choice_description, dtype: int64"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Add the new column\n",
"#df_chipotle[\"Salsa\"] = df_chipotle.choice_description.\\\n",
" #apply(Get_Salsa)\n",
" \n",
"df_chipotle.head()"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Fresh Tomato 1365\n",
"Roasted Chili Corn 633\n",
"No Salsa 576\n",
"Tomatillo Red Chili 480\n",
"Tomatillo Green Chili 322\n",
"Name: Salsa, dtype: int64"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Use value counts to get the breakdown\n",
"df_chipotle.Salsa.value_counts()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For question 3, use iterrows to calculate the average price per order. Note that the item price column is a string, so first use string methods plut the as.type() method to recast this column as a float."
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" order_id | \n",
" quantity | \n",
" item_name | \n",
" choice_description | \n",
" item_price | \n",
" Bowl_Burrito | \n",
" Salsa | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
" Izze | \n",
" [Clementine] | \n",
" 3.39 | \n",
" drink/snack | \n",
" No Salsa | \n",
"
\n",
" \n",
" 2 | \n",
" 1 | \n",
" 1 | \n",
" Nantucket Nectar | \n",
" [Apple] | \n",
" 3.39 | \n",
" drink/snack | \n",
" No Salsa | \n",
"
\n",
" \n",
" 4 | \n",
" 2 | \n",
" 2 | \n",
" Chicken Bowl | \n",
" [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | \n",
" 16.98 | \n",
" Mult Bowl | \n",
" Tomatillo Red Chili | \n",
"
\n",
" \n",
" 5 | \n",
" 3 | \n",
" 1 | \n",
" Chicken Bowl | \n",
" [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... | \n",
" 10.98 | \n",
" Single Bowl | \n",
" Fresh Tomato | \n",
"
\n",
" \n",
" 7 | \n",
" 4 | \n",
" 1 | \n",
" Steak Burrito | \n",
" [Tomatillo Red Chili Salsa, [Fajita Vegetables... | \n",
" 11.75 | \n",
" Burrito | \n",
" Tomatillo Red Chili | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" order_id quantity item_name \\\n",
"1 1 1 Izze \n",
"2 1 1 Nantucket Nectar \n",
"4 2 2 Chicken Bowl \n",
"5 3 1 Chicken Bowl \n",
"7 4 1 Steak Burrito \n",
"\n",
" choice_description item_price Bowl_Burrito \\\n",
"1 [Clementine] 3.39 drink/snack \n",
"2 [Apple] 3.39 drink/snack \n",
"4 [Tomatillo-Red Chili Salsa (Hot), [Black Beans... 16.98 Mult Bowl \n",
"5 [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... 10.98 Single Bowl \n",
"7 [Tomatillo Red Chili Salsa, [Fajita Vegetables... 11.75 Burrito \n",
"\n",
" Salsa \n",
"1 No Salsa \n",
"2 No Salsa \n",
"4 Tomatillo Red Chili \n",
"5 Fresh Tomato \n",
"7 Tomatillo Red Chili "
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Write your code here\n",
"df_chipotle[\"item_price\"] = df_chipotle.item_price\\\n",
" .str.replace(\"$\", \"\").astype(\"float\")\n",
"df_chipotle.head() "
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"16.5969558101473"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Get the average proce of each order\n",
"dict_total = {}\n",
"for index, row in df_chipotle.iterrows():\n",
" \n",
" order_id = row[\"order_id\"]\n",
" price = row[\"item_price\"]\n",
" \n",
" if order_id in dict_total:\n",
" dict_total[order_id]+=price\n",
" else:\n",
" dict_total[order_id] =price\n",
"\n",
"np.mean(list(dict_total.values()))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.0"
},
"toc": {
"nav_menu": {
"height": "12px",
"width": "252px"
},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"toc_cell": false,
"toc_position": {},
"toc_section_display": "block",
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}