{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "In this practice exercise, we will analyze Chipotle order data. The file \"Chipotle.tsv\" contains this information in a tab separated file. First, read in the data. You might find the delimiter input to the pandas read_csv method useful. Use the head method to check that everything has been read in correctly. " ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
order_idquantityitem_namechoice_descriptionitem_price
011Chips and Fresh Tomato SalsaNaN$2.39
111Izze[Clementine]$3.39
211Nantucket Nectar[Apple]$3.39
311Chips and Tomatillo-Green Chili SalsaNaN$2.39
422Chicken Bowl[Tomatillo-Red Chili Salsa (Hot), [Black Beans...$16.98
531Chicken Bowl[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...$10.98
631Side of ChipsNaN$1.69
741Steak Burrito[Tomatillo Red Chili Salsa, [Fajita Vegetables...$11.75
841Steak Soft Tacos[Tomatillo Green Chili Salsa, [Pinto Beans, Ch...$9.25
951Steak Burrito[Fresh Tomato Salsa, [Rice, Black Beans, Pinto...$9.25
\n", "
" ], "text/plain": [ " order_id quantity item_name \\\n", "0 1 1 Chips and Fresh Tomato Salsa \n", "1 1 1 Izze \n", "2 1 1 Nantucket Nectar \n", "3 1 1 Chips and Tomatillo-Green Chili Salsa \n", "4 2 2 Chicken Bowl \n", "5 3 1 Chicken Bowl \n", "6 3 1 Side of Chips \n", "7 4 1 Steak Burrito \n", "8 4 1 Steak Soft Tacos \n", "9 5 1 Steak Burrito \n", "\n", " choice_description item_price \n", "0 NaN $2.39 \n", "1 [Clementine] $3.39 \n", "2 [Apple] $3.39 \n", "3 NaN $2.39 \n", "4 [Tomatillo-Red Chili Salsa (Hot), [Black Beans... $16.98 \n", "5 [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... $10.98 \n", "6 NaN $1.69 \n", "7 [Tomatillo Red Chili Salsa, [Fajita Vegetables... $11.75 \n", "8 [Tomatillo Green Chili Salsa, [Pinto Beans, Ch... $9.25 \n", "9 [Fresh Tomato Salsa, [Rice, Black Beans, Pinto... $9.25 " ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "import numpy as np\n", "\n", "#Write your code here\n", "df_chipotle = pd.read_csv(\"Data/chipotle_orders.tsv\",\\\n", " delimiter = \"\\t\")\n", "\n", "df_chipotle.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now take a look at the dtypes attribute to see how the columns are stored." ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "order_id int64\n", "quantity int64\n", "item_name object\n", "choice_description object\n", "item_price object\n", "dtype: object" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Write your code here\n", "df_chipotle.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will write code to answer the following questions:\n", "\n", "1. What fraction of customers order burritos versus burrito bowls?\n", "2. What is the breakdown of salsa choice?\n", "3. What is the average amount of money spent on a Chipotle order?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For question 1, we will use the apply method to create a column that will read \"Burrito\" for items that are a burrito, \"Bowl\" for items that are a bowl and \"Other\" for every other item." ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def Burrito_or_Bowl(row, extra_input):\n", " item = row[\"item_name\"]\n", " quantity = row[\"quantity\"]\n", " if \"Bowl\" in item and quantity==1:\n", " return \"Single Bowl\"\n", " elif \"Bowl\" in item and quantity>1:\n", " return \"Mult Bowl\"\n", " elif \"Burrito\" in item:\n", " return \"Burrito\"\n", " else:\n", " return extra_input\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create new column called \"Bowl_Burrito\" and compute fractions of bowl versus burrito orders. The value_counts() method could be useful here." ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "drink/snack 2119\n", "Single Bowl 1269\n", "Burrito 1172\n", "Mult Bowl 62\n", "Name: Bowl_Burrito, dtype: int64" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Write you code here\n", "df_chipotle[\"Bowl_Burrito\"] = df_chipotle.\\\n", " apply(Burrito_or_Bowl,\\\n", " extra_input = \"drink/snack\",\\\n", " axis = 1)\n", "df_chipotle.Bowl_Burrito.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For question 2, we need to write another function to be used with apply to create a column for the salsas. Let's look at the structure of the choice_description column:" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "order_id 2\n", "quantity 2\n", "item_name Chicken Bowl\n", "choice_description [Tomatillo-Red Chili Salsa (Hot), [Black Beans...\n", "item_price $16.98 \n", "Bowl_Burrito Mult Bowl\n", "Name: 4, dtype: object" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_chipotle.loc[4,:]" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'[Tomatillo-Red Chili Salsa (Hot), [Black Beans, Rice, Cheese, Sour Cream]]'" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Look at choice_description\n", "df_chipotle.loc[4,\"choice_description\"]" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'[Tomatillo Red Chili Salsa, [Fajita Vegetables, Black Beans, Pinto Beans, Cheese, Sour Cream, Guacamole, Lettuce]]'" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Another one\n", "df_chipotle.loc[7,\"choice_description\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that this column is a string, which contains what looks like a list of lists with the salsa type as the first entry in the list. Make sure you get rid of the \"(Hot)\" in the salsa description. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, let's only look at orders without an NaN in the choice_description column." ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
order_idquantityitem_namechoice_descriptionitem_priceBowl_Burrito
111Izze[Clementine]$3.39drink/snack
211Nantucket Nectar[Apple]$3.39drink/snack
422Chicken Bowl[Tomatillo-Red Chili Salsa (Hot), [Black Beans...$16.98Mult Bowl
531Chicken Bowl[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...$10.98Single Bowl
741Steak Burrito[Tomatillo Red Chili Salsa, [Fajita Vegetables...$11.75Burrito
\n", "
" ], "text/plain": [ " order_id quantity item_name \\\n", "1 1 1 Izze \n", "2 1 1 Nantucket Nectar \n", "4 2 2 Chicken Bowl \n", "5 3 1 Chicken Bowl \n", "7 4 1 Steak Burrito \n", "\n", " choice_description item_price Bowl_Burrito \n", "1 [Clementine] $3.39 drink/snack \n", "2 [Apple] $3.39 drink/snack \n", "4 [Tomatillo-Red Chili Salsa (Hot), [Black Beans... $16.98 Mult Bowl \n", "5 [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... $10.98 Single Bowl \n", "7 [Tomatillo Red Chili Salsa, [Fajita Vegetables... $11.75 Burrito " ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Drop NaN using dropna() method\n", "\n", "df_chipotle.dropna(inplace = True)\n", "df_chipotle.head()" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#Write function here\n", "def Get_Salsa(description):\n", " \n", " if \"Salsa\" in description.split(\",\")[0]:\n", " parsed_des = description.strip(\"[]\").\\\n", " replace(\"-\", \" \").\\\n", " replace(\",\", \"\").\\\n", " split(\" \")\n", " salsa_name = []\n", " for word in parsed_des:\n", " \n", " if word!=\"Salsa\":\n", " salsa_name.append(word)\n", " else:\n", " return \" \".join(salsa_name)\n", " return parsed_des\n", " else:\n", " return \"No Salsa\"\n", " \n" ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Fresh Tomato 1365\n", "Roasted Chili Corn 633\n", "No Salsa 576\n", "Tomatillo Red Chili 480\n", "Tomatillo Green Chili 322\n", "Name: choice_description, dtype: int64" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Add the new column\n", "#df_chipotle[\"Salsa\"] = df_chipotle.choice_description.\\\n", " #apply(Get_Salsa)\n", " \n", "df_chipotle.head()" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Fresh Tomato 1365\n", "Roasted Chili Corn 633\n", "No Salsa 576\n", "Tomatillo Red Chili 480\n", "Tomatillo Green Chili 322\n", "Name: Salsa, dtype: int64" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Use value counts to get the breakdown\n", "df_chipotle.Salsa.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For question 3, use iterrows to calculate the average price per order. Note that the item price column is a string, so first use string methods plut the as.type() method to recast this column as a float." ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
order_idquantityitem_namechoice_descriptionitem_priceBowl_BurritoSalsa
111Izze[Clementine]3.39drink/snackNo Salsa
211Nantucket Nectar[Apple]3.39drink/snackNo Salsa
422Chicken Bowl[Tomatillo-Red Chili Salsa (Hot), [Black Beans...16.98Mult BowlTomatillo Red Chili
531Chicken Bowl[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...10.98Single BowlFresh Tomato
741Steak Burrito[Tomatillo Red Chili Salsa, [Fajita Vegetables...11.75BurritoTomatillo Red Chili
\n", "
" ], "text/plain": [ " order_id quantity item_name \\\n", "1 1 1 Izze \n", "2 1 1 Nantucket Nectar \n", "4 2 2 Chicken Bowl \n", "5 3 1 Chicken Bowl \n", "7 4 1 Steak Burrito \n", "\n", " choice_description item_price Bowl_Burrito \\\n", "1 [Clementine] 3.39 drink/snack \n", "2 [Apple] 3.39 drink/snack \n", "4 [Tomatillo-Red Chili Salsa (Hot), [Black Beans... 16.98 Mult Bowl \n", "5 [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... 10.98 Single Bowl \n", "7 [Tomatillo Red Chili Salsa, [Fajita Vegetables... 11.75 Burrito \n", "\n", " Salsa \n", "1 No Salsa \n", "2 No Salsa \n", "4 Tomatillo Red Chili \n", "5 Fresh Tomato \n", "7 Tomatillo Red Chili " ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Write your code here\n", "df_chipotle[\"item_price\"] = df_chipotle.item_price\\\n", " .str.replace(\"$\", \"\").astype(\"float\")\n", "df_chipotle.head() " ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "16.5969558101473" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Get the average proce of each order\n", "dict_total = {}\n", "for index, row in df_chipotle.iterrows():\n", " \n", " order_id = row[\"order_id\"]\n", " price = row[\"item_price\"]\n", " \n", " if order_id in dict_total:\n", " dict_total[order_id]+=price\n", " else:\n", " dict_total[order_id] =price\n", "\n", "np.mean(list(dict_total.values()))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.0" }, "toc": { "nav_menu": { "height": "12px", "width": "252px" }, "number_sections": true, "sideBar": true, "skip_h1_title": false, "toc_cell": false, "toc_position": {}, "toc_section_display": "block", "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }