{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Practice with Merges" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "Run the cells below to create the following dataframes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "df_purchases contains information on various customer purchases." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Item_PurchasedName
0BallJake
1ShirtJoe
2ShoesEllen
3BallJoe
4ShirtEllen
\n", "
" ], "text/plain": [ " Item_Purchased Name\n", "0 Ball Jake\n", "1 Shirt Joe\n", "2 Shoes Ellen\n", "3 Ball Joe\n", "4 Shirt Ellen" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "from pandasql import sqldf\n", "\n", "names = [\"Jake\", \"Joe\", \"Ellen\", \"Joe\", \"Ellen\"]\n", "item = [\"Ball\", \"Shirt\", \"Shoes\", \"Ball\", \"Shirt\"]\n", "\n", "df_purchases = pd.DataFrame({\"Name\": names, \"Item_Purchased\": item} )\n", " \n", "df_purchases" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "df_items contains information on the prices of each item." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ItemPrice
0Ball10
1Shirt15
2Shoes75
\n", "
" ], "text/plain": [ " Item Price\n", "0 Ball 10\n", "1 Shirt 15\n", "2 Shoes 75" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "item = [\"Ball\", \"Shirt\", \"Shoes\"]\n", "price = [10,15,75]\n", "\n", "df_items = pd.DataFrame({\"Item\": item, \"Price\":price})\n", "\n", "df_items" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "df_customer contained information on the customers " ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AllowanceCustomers
015Jake
112Joe
285Ellen
\n", "
" ], "text/plain": [ " Allowance Customers\n", "0 15 Jake\n", "1 12 Joe\n", "2 85 Ellen" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "customers = [\"Jake\", \"Joe\", \"Ellen\"]\n", "allowance = [15, 12, 85]\n", "\n", "df_customers = pd.DataFrame({\"Allowance\":allowance, \"Customers\":customers})\n", "\n", "df_customers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question:** How many customers went over their allowance?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create df_1, which is df_purchases with an additional column that gives the price of each item" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create df_2, which stores the total amount that each person spent." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Created df_3, which adds each person't allowance to df_2" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Add a column to df_3 which tells whether or not they went over their allowance or not." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compute how many people went over." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Running Back Analysis Tool\n", "\n", "The goal in this practice is to analyze the number of rushing yards gained by an inputted running back over the various weeks that the data contains. We also would like to incorporate whether the given running back went up against good defenses. To do so, we will compute the average rushing yards given up by each of the given running back's opponents throughout the weeks.\n" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Place your imports here\n", "import pandas as pd\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Read in the data from the following 4 csv files: nflgames.csv, nflteams.csv, nflplayers.csv, rushingstats.csv. Filter the rushing stats so it only corresponds to running backs (PositionID = 2)" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
RushingStatsIDPlayerIDPositionIDTeamIDGameIDAttYdsLngTDRunShortRunFiveRunFifteenRunThirtyPlusRZInsideTwentyRZInsideTenRZInsideFiveRZTDRZTotal
101172111170131570010011
11127212856410610100202
121372131163220542020002
131472141466211940030013
14157215642291411010011
\n", "
" ], "text/plain": [ " RushingStatsID PlayerID PositionID TeamID GameID Att Yds Lng TD \\\n", "10 11 7 2 1 1 11 70 13 1 \n", "11 12 7 2 1 2 8 56 41 0 \n", "12 13 7 2 1 3 11 63 22 0 \n", "13 14 7 2 1 4 14 66 21 1 \n", "14 15 7 2 1 5 6 42 29 1 \n", "\n", " RunShort RunFive RunFifteen RunThirtyPlus RZInsideTwenty RZInsideTen \\\n", "10 5 7 0 0 1 0 \n", "11 6 1 0 1 0 0 \n", "12 5 4 2 0 2 0 \n", "13 9 4 0 0 3 0 \n", "14 4 1 1 0 1 0 \n", "\n", " RZInsideFive RZTD RZTotal \n", "10 0 1 1 \n", "11 2 0 2 \n", "12 0 0 2 \n", "13 0 1 3 \n", "14 0 1 1 " ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#read in the data below\n", "games = pd.read_csv(\"../Data/Football_Data/nflgames.csv\")\n", "teams = pd.read_csv(\"../Data/Football_Data/nflteams.csv\")\n", "nflPlayers = pd.read_csv(\"../Data/Football_Data/nflplayers.csv\")\n", "rushingStats=pd.read_csv(\"../Data/Football_Data/rushingstats.csv\")\n", "\n", "rushingStats = rushingStats.loc[rushingStats[\"PositionID\"] == 2, :]\n", "rushingStats.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Enter the name of the running back you would like to analyze and then find the player's player ID and Team ID." ] }, { "cell_type": "code", "execution_count": 59, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#Write your code here\n", "name = \"DeMarco Murray\"\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next select the rushing stats corresponding to this player and merge this df with the games df to find the week. In this merged df select the following columns: TeamID, GameID, Yds, Week, HomeTeamID, AwayTeamID. Sort the resulting df by Week." ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#Write your code here\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a line chart showing the rushing yards per game for this player." ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%matplotlib inline\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will continue to add to the plot above, but first we need to do some more calculations. We would like to find the avg. rushing yards given up by each of this player's opponents. Add a column to the df you just plotted that stores this player's opponent each week." ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#Write your code here\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Merge the df above with games to get all of the games which involve one of the player's opponents. First merge on the HomeTeamID and then on the AwayTeamID and then concatenate." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Select out only GameID_x, Week_x and Opponent_ID and rename the columns to get rid of _x." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now merge this with rushing states to get the rushing yards gained by opponent running backs in each of these games." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we group by Opponent_ID." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, merge this with rush_stats_w_week." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And add this average as red x's to the plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.0" }, "toc": { "nav_menu": { "height": "48px", "width": "252px" }, "number_sections": true, "sideBar": true, "skip_h1_title": false, "toc_cell": false, "toc_position": {}, "toc_section_display": "block", "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }