{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "In this practice exercise, we will analyze Chipotle order data. The file \"Chipotle.tsv\" contains this information in a tab separated file. First, read in the data. You might find the delimiter input to the pandas read_csv method useful. Use the head method to check that everything has been read in correctly. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
order_idquantityitem_namechoice_descriptionitem_price
011Chips and Fresh Tomato SalsaNaN$2.39
111Izze[Clementine]$3.39
211Nantucket Nectar[Apple]$3.39
311Chips and Tomatillo-Green Chili SalsaNaN$2.39
422Chicken Bowl[Tomatillo-Red Chili Salsa (Hot), [Black Beans...$16.98
531Chicken Bowl[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...$10.98
631Side of ChipsNaN$1.69
741Steak Burrito[Tomatillo Red Chili Salsa, [Fajita Vegetables...$11.75
841Steak Soft Tacos[Tomatillo Green Chili Salsa, [Pinto Beans, Ch...$9.25
951Steak Burrito[Fresh Tomato Salsa, [Rice, Black Beans, Pinto...$9.25
\n", "
" ], "text/plain": [ " order_id quantity item_name \\\n", "0 1 1 Chips and Fresh Tomato Salsa \n", "1 1 1 Izze \n", "2 1 1 Nantucket Nectar \n", "3 1 1 Chips and Tomatillo-Green Chili Salsa \n", "4 2 2 Chicken Bowl \n", "5 3 1 Chicken Bowl \n", "6 3 1 Side of Chips \n", "7 4 1 Steak Burrito \n", "8 4 1 Steak Soft Tacos \n", "9 5 1 Steak Burrito \n", "\n", " choice_description item_price \n", "0 NaN $2.39 \n", "1 [Clementine] $3.39 \n", "2 [Apple] $3.39 \n", "3 NaN $2.39 \n", "4 [Tomatillo-Red Chili Salsa (Hot), [Black Beans... $16.98 \n", "5 [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... $10.98 \n", "6 NaN $1.69 \n", "7 [Tomatillo Red Chili Salsa, [Fajita Vegetables... $11.75 \n", "8 [Tomatillo Green Chili Salsa, [Pinto Beans, Ch... $9.25 \n", "9 [Fresh Tomato Salsa, [Rice, Black Beans, Pinto... $9.25 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "import numpy as np\n", "\n", "#Write your code here\n", "df_chipotle = pd.read_csv(\"Data/chipotle_orders.tsv\",\\\n", " delimiter = \"\\t\")\n", "\n", "df_chipotle.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now take a look at the dtypes attribute to see how the columns are stored." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "order_id int64\n", "quantity int64\n", "item_name object\n", "choice_description object\n", "item_price object\n", "dtype: object" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Write your code here\n", "df_chipotle.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will write code to answer the following questions:\n", "\n", "1. What fraction of customers order burritos versus burrito bowls?\n", "2. What is the breakdown of salsa choice?\n", "3. What is the average amount of money spent on a Chipotle order?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For question 1, we will use the apply method to create a column that will read \"Burrito\" for items that are a burrito, \"Bowl\" for items that are a bowl and \"Other\" for every other item." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def Burrito_or_Bowl(item):\n", " \n", " return None" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create new column called \"Bowl_Burrito\" and compute fractions of bowl versus burrito orders. The value_counts() method could be useful here." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#Write you code here\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For question 2, we need to write another function to be used with apply to create a column for the salsas. Let's look at the structure of the choice_description column:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'[Tomatillo-Red Chili Salsa (Hot), [Black Beans, Rice, Cheese, Sour Cream]]'" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Look at choice_description\n", "df_chipotle.loc[4,\"choice_description\"]" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'[Tomatillo Red Chili Salsa, [Fajita Vegetables, Black Beans, Pinto Beans, Cheese, Sour Cream, Guacamole, Lettuce]]'" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Another one\n", "df_chipotle.loc[7,\"choice_description\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that this column is a string, which contains what looks like a list of lists with the salsa type as the first entry in the list. Make sure you get rid of the \"(Hot)\" in the salsa description. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, let's only look at orders without an NaN in the choice_description column." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#Drop NaN using dropna() method\n", "\n" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#Write function here\n", "def Get_Salsa(description):\n", " \n", " return None\n", " \n", " \n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#Add the new column\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#Use value counts to get the breakdown\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For question 3, use iterrows to calculate the average price per order. Note that the item price column is a string, so first use string methods plut the as.type() method to recast this column as a float." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#Write your code here\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#Get the average proce of each order\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.0" }, "toc": { "colors": { "hover_highlight": "#DAA520", "navigate_num": "#000000", "navigate_text": "#333333", "running_highlight": "#FF0000", "selected_highlight": "#FFD700", "sidebar_border": "#EEEEEE", "wrapper_background": "#FFFFFF" }, "moveMenuLeft": true, "nav_menu": { "height": "12px", "width": "252px" }, "navigate_menu": true, "number_sections": true, "sideBar": true, "skip_h1_title": false, "threshold": 4, "toc_cell": false, "toc_position": {}, "toc_section_display": "block", "toc_window_display": false, "widenNotebook": false } }, "nbformat": 4, "nbformat_minor": 2 }