# List Comprehension

Together with for loops, list comprehensions are one of the most prominent contexts in which the iteration protocol is applied. Let's look at some code that adds 10 to each number in a list.

In [20]:
#Using for loop
L=[1,2,3,4,5]
for i in range(len(L)):
    L[i]+=10
L

[11, 12, 13, 14, 15]

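Using a list comprehension, we can replace the loop with a single expression that produces the desired result. Note that the comprehension builds a new list rather than changing the old one in place. It often runs faster than an equivalent manual *for* loop because the iteration is performed inside the interpreter at C language speed, although the size of the gain depends on the work done per item.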

In [21]:
#Using list comprehension - Far more concise
L=[1,2,3,4,5]
L = [x + 10 for x in L]
L


[11, 12, 13, 14, 15]
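
The speed claim is easy to check on your own machine. The following is a minimal sketch using the standard library's timeit module; the names `data`, `with_loop`, and `with_comprehension` are made up for this illustration, and the timings will differ from one machine and Python version to another.

```python
import timeit

data = list(range(1000))

def with_loop(values):
    # Build a new list by appending inside an explicit for loop
    result = []
    for x in values:
        result.append(x + 10)
    return result

def with_comprehension(values):
    # Build the same list with a single comprehension
    return [x + 10 for x in values]

# Both approaches must agree before the timings mean anything
assert with_loop(data) == with_comprehension(data)

# Total seconds for 1000 calls of each version
loop_time = timeit.timeit(lambda: with_loop(data), number=1000)
comp_time = timeit.timeit(lambda: with_comprehension(data), number=1000)
print("for loop:     ", loop_time)
print("comprehension:", comp_time)
```

Treat the two numbers as a rough comparison only; for very small lists the difference may not be noticeable.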

List comprehensions are written in square brackets because they are ultimately a way to construct a new list.  They begin with an arbitrary expression that we make up, which uses a loop variable that we make up (x+10).  That is followed by what you should now recognize as the header of a *for* loop, which names the loop variable, and an iterable object (for x in L). To run the expression, Python executes an iteration across L inside the interpreter, assigning x to each item in turn, and collects the results of running the items through the expression on the left side.
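
To make the description above concrete, here is a sketch of roughly what the comprehension does when written out by hand. The name `result` is introduced only for this illustration.

```python
L = [1, 2, 3, 4, 5]

# Roughly what [x + 10 for x in L] does, written out by hand
result = []
for x in L:
    result.append(x + 10)

print(result)                          # [11, 12, 13, 14, 15]
print(result == [x + 10 for x in L])   # True
print(L)                               # [1, 2, 3, 4, 5], the original list is unchanged
```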

## Extended List Comprehension Syntax

List comprehensions can be even more advanced in practice. As one particularly useful extension, the *for* loop can have an associated *if* clause that filters out the items for which the test is not true.

In [9]:
#Let's add 10 but only keep the results that are even
LEven = [x+10 for x in range(1,6) if (x+10)%2==0]
LEven

[12, 14]

In [10]:
#With if and else: a conditional expression chooses the value, so no items are filtered out
[x**2 if x>5 else x**3 for x in range(10)]

[0, 1, 8, 27, 64, 125, 36, 49, 64, 81]

List comprehensions may contain nested loops, coded as a series of *for* clauses. In fact, their full syntax allows for any number of *for* clauses, each of which can have an optional associated *if* clause.

In [28]:
#nested for loop in list comprehension
[x+y for x in 'abc' for y in 'def' if x+y!='ad']

['ae', 'af', 'bd', 'be', 'bf', 'cd', 'ce', 'cf']

Again, one way to understand this expression is to convert it to two nested for loops and an if statement.

In [8]:
#Without list comprehension
l=[]
for x in 'abc':
    for y in 'def':
        if x+y!='ad':
            l.append(x+y)
l

['ae', 'af', 'bd', 'be', 'bf', 'cd', 'ce', 'cf']
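
Each *for* clause can also carry its own *if* clause. The sketch below is a small made-up example (the name `pairs` is not from the lecture) that filters both loop variables separately.

```python
# Keep even values of x and odd values of y
pairs = [(x, y)
         for x in range(4) if x % 2 == 0
         for y in range(4) if y % 2 == 1]
print(pairs)   # [(0, 1), (0, 3), (2, 1), (2, 3)]
```

The clauses are read left to right, exactly like nested for loops with an if statement inside each one.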

We can also use comprehensions to build dictionaries.

In [32]:
#Comprehension with dictionaries
D = {x:x**2 for x in range(5)}
D

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

We can even throw in an *if* clause.
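
As a minimal sketch, the dictionary below keeps only the even keys. The name `D_even` is made up for this illustration.

```python
#Dictionary comprehension with an if clause: keep only the even keys
D_even = {x: x**2 for x in range(5) if x % 2 == 0}
print(D_even)   # {0: 0, 2: 4, 4: 16}
```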

## Using List Comprehension on Files

Recall that the file object has a **readlines** method that loads the file into a list of strings, where each element of the list is a line.

In [12]:
with open("myFile.txt") as f:   #the with block closes the file for us
    lines = f.readlines()
lines

['hello text file \n', 'goodbye text file \n']

This works, but every line still ends with the newline character, which we will probably need to strip off. We can accomplish this with a list comprehension.

In [9]:
#Stripping new line character
[line.strip(' \n') for line in lines]

['hello text file', 'goodbye text file']

We can actually do this even more concisely by recognizing that the *open(file)* command returns an iterable object.

In [13]:
#One line to read and strip
[line.strip(' \n') for line in open("myFile.txt")]

['hello text file', 'goodbye text file']

In [7]:
#We could also convert every letter to uppercase
[line.rstrip().upper() for line in lines]
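
One caveat with the one-line version is that the file is never closed explicitly. A possible variant, sketched below, wraps the comprehension in a with block. So that the sketch runs anywhere, it first writes a throwaway copy of the file into a temporary directory; `tmp_dir`, `file_path`, and `stripped` are names made up for this illustration.

```python
import os
import tempfile

# Create a small throwaway file so the example runs anywhere
tmp_dir = tempfile.mkdtemp()
file_path = os.path.join(tmp_dir, "myFile.txt")
with open(file_path, "w") as f:
    f.write("hello text file \ngoodbye text file \n")

# The with block closes the file as soon as the comprehension finishes
with open(file_path) as f:
    stripped = [line.strip() for line in f]

print(stripped)   # ['hello text file', 'goodbye text file']
```

Calling strip() with no arguments removes leading and trailing whitespace, which covers both the trailing space and the newline here.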

# Python Packages

Python has many important packages that help us accomplish tasks such as random number generation or matrix algebra. Some of them (os, operator, itertools) are part of the standard library, while others (Numpy, Scipy, Pandas) are third-party packages. Anaconda comes with all of these packages installed, but it is not difficult to install them yourself. We will be focusing on the following packages:

- os: operating system interface
- Numpy: random number generation, matrix algebra
- Scipy: random number generation, optimization, machine learning
- Pandas: reading csv/xlsx files, SQL
- Operator, itertools: these are used less frequently but are useful for accomplishing specific tasks that I'll cover.

## os 

This is an operating system interface that lets you create, move, and change files. Let's have a look at some of the cool things we can do.

In [4]:
import os

#Get the current working directory
os.getcwd()


'/Users/feldman/Documents/Teaching/WashU/OSCM400/2017/Lectures/Lecture_8_Packages/Lecture'

In [14]:
#Change the current working directory

os.chdir('/Users/feldman/Documents/Teaching/WashU/OSCM400/2017/Lectures/Lecture_8_Packages')
os.getcwd()

'/Users/feldman/Documents/Teaching/WashU/OSCM400/2017/Lectures/Lecture_8_Packages'

In [9]:
#List directories and files
os.listdir()

['.DS_Store', 'Lecture', 'Practice']

In [12]:
#Easily create a file path (don't have to worry about slashes)
new_path = os.path.join(os.getcwd(),'Lecture')

#listdir method can take file path as input
os.listdir(new_path)

['.ipynb_checkpoints', 'Python Packages.ipynb']

In [16]:
# Walk through all paths, directories and files

path = os.getcwd()

#os.walk yields (directory path, subdirectory names, file names) for each directory
for dir_path, dir_names, file_names in os.walk(path):
    print("Path:", dir_path)
    print("Directory:", dir_names)
    print("File:", file_names)
    
    

Path: /Users/feldman/Documents/Teaching/WashU/OSCM400/2017/Lectures/Lecture_8_Packages
Directory: ['Lecture', 'Practice']
File: ['.DS_Store']
Path: /Users/feldman/Documents/Teaching/WashU/OSCM400/2017/Lectures/Lecture_8_Packages/Lecture
Directory: ['.ipynb_checkpoints']
File: ['Python Packages.ipynb']
Path: /Users/feldman/Documents/Teaching/WashU/OSCM400/2017/Lectures/Lecture_8_Packages/Lecture/.ipynb_checkpoints
Directory: []
File: ['Python Packages-checkpoint.ipynb']
Path: /Users/feldman/Documents/Teaching/WashU/OSCM400/2017/Lectures/Lecture_8_Packages/Practice
Directory: []
File: []
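
Combining os.walk with a list comprehension gives a compact way to collect files of one type. The sketch below builds its own throwaway directory tree so that it runs anywhere; the folder and file names (`notes.ipynb`, `data.csv`) are hypothetical and are not the lecture's actual files.

```python
import os
import tempfile

# Build a small throwaway directory tree (hypothetical names)
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "Lecture"))
os.makedirs(os.path.join(root, "Practice"))
for name in ["notes.ipynb", "data.csv"]:
    with open(os.path.join(root, "Lecture", name), "w") as f:
        f.write("")

# Walk every directory under root and keep the full paths of the notebooks
notebooks = [os.path.join(dir_path, file_name)
             for dir_path, dir_names, file_names in os.walk(root)
             for file_name in file_names
             if file_name.endswith(".ipynb")]

print(notebooks)
```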



## Numpy + Scipy

Numpy, pronounced "num-pie", is the package we will rely on for random number generation. However, before we jump into the uses of any package, let's see how you import them.

In [1]:
#First way 
import numpy

#The mean function
print(numpy.mean([1,2,3]))

#We will be using the random module inside of numpy to generate a uniform [0,1] random variable
numpy.random.uniform()

2.0


0.5328561376354561

In [2]:
#Second way - give numpy a shorter name with as.  This is the form that is generally used.
import numpy as np

np.random.uniform()

0.027436863568071845

In [3]:
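# Third way - if we import everything using * we can jump straight to random.
# Wildcard imports can silently shadow names you already use, so this form is generally discouraged.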
from numpy import *

random.uniform()

0.21334445478026276

Let's move on to the tools in the numpy.random module for generating random numbers.  The ability to generate random numbers is critical in understanding randomness through simulation.  One application that we will study is Monte Carlo simulation, which will rely on our ability to generate random numbers of all types.  
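
When simulating, it is often handy to make the random draws repeatable. One way to do this, not covered in the cells of this lecture, is to seed the generator with np.random.seed. The sketch below uses an arbitrary seed value of 42, and the names `first` and `second` are made up for this illustration.

```python
import numpy as np

# Seeding the generator makes the "random" draws repeatable
np.random.seed(42)
first = np.random.uniform(0, 1, 3)

# Resetting the same seed reproduces the same draws
np.random.seed(42)
second = np.random.uniform(0, 1, 3)

print(first)
print(second)
print(np.array_equal(first, second))   # True
```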

In [4]:
import numpy as np

#Generate one uniform [0,1] random number
np.random.uniform()

0.5791555754105322

In [7]:
#Generate one uniform [a,b] random variable
np.random.uniform(5,10)

9.941979024761663

In [9]:
#Generate a normal random variable with mean 0 and stdev 1.
np.random.normal(0,1)

-0.14285249600758188

In [14]:
#Generate a random integer in the interval [a,b)
np.random.randint(5,10)

9

In [7]:
#Generate multiple random ints
np.random.randint(5,10,5)

array([5, 7, 9, 6, 5])

In all of the above examples we can also give a third input that specifies the number of random variables we want.

In [8]:
#Generate 10 uniform [0,1] random variables
randomNums = np.random.uniform(0,1,10) 
randomNums

array([ 0.2117795 ,  0.76081062,  0.23766499,  0.83641908,  0.20315904,
        0.92398427,  0.91204644,  0.69010628,  0.69096369,  0.20860248])

The result is a numpy array, which behaves in many ways just like a list. We make a slight detour from our example tour to quickly cover numpy arrays. If you don't feel like working with a numpy array, you can easily convert it to a list:

In [9]:
#Convert numpy array to a list
list(randomNums)

[0.21177950415175739,
 0.76081061885530299,
 0.23766499236350802,
 0.83641907938154092,
 0.20315903698219517,
 0.92398427378219739,
 0.91204644434311433,
 0.69010627553821768,
 0.69096369110227562,
 0.20860247683230626]

In [10]:
#Create numpy array
a = np.array([1,2,3,4])
a

array([1, 2, 3, 4])

In [11]:
#Get the number of entries.  Note that shape returns a tuple
len(a), a.shape

(4, (4,))

In [12]:
#Pick out one entry
a[0]

1

In [13]:
#Pick out a slice - exactly the same as a list
a[1:3]

array([2, 3])

In [14]:
#They are iterable
for i in a:
    print(i)

1
2
3
4
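
One place where arrays and lists differ is arithmetic. The following minimal sketch shows that operators act on every entry of an array, whereas the same operator on a list means something else.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# Arithmetic on an array is applied to every entry
print(a + 10)   # [11 12 13 14]
print(a * 2)    # [2 4 6 8]

# The same operator on a list repeats the list instead
print([1, 2, 3, 4] * 2)   # [1, 2, 3, 4, 1, 2, 3, 4]
```

This means that adding 10 to every entry, which needed a loop or a comprehension for a list, is a single expression for an array.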


Back to our example tour...

In [16]:
#Using shuffle method you can shuffle a list or numpy array in place
b=[4,5,6]
np.random.shuffle(b)
b

[6, 5, 4]

In [18]:
#Sample uniformly from a list or numpy array using the choice method.
np.random.choice(b)

4

To see all the distributions you can sample from and all the little tricks that numpy.random allows you to do, visit the [numpy.random site](https://docs.scipy.org/doc/numpy/reference/routines.random.html).  One last numpy tool I use is the linspace function, which produces evenly spaced grids on intervals.

In [17]:
#np.linspace(a,b,s) returns s evenly spaced points on the interval [a,b], including both endpoints
np.linspace(0,10,11)

array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.])
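
Because both endpoints are included, s points leave s-1 gaps, so the spacing between neighbouring points is (b-a)/(s-1). A minimal sketch with made-up values for a, b, and s:

```python
import numpy as np

a, b, s = 0, 1, 5
grid = np.linspace(a, b, s)

# 5 points on [0, 1] leave 4 gaps, so the spacing is 1/4
print(grid.tolist())       # [0.0, 0.25, 0.5, 0.75, 1.0]
print((b - a) / (s - 1))   # 0.25
```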

You can generate custom discrete distributions using the scipy.stats ([scipy.stats site](https://docs.scipy.org/doc/scipy-0.17.1/reference/stats.html)) package.  If you visit this site, you will notice that you can also do all of the random number generation we just covered with this package.

In [18]:
import scipy.stats as st

#Let's flip a biased coin 100 times, 0 is heads (0.25 prob) and 1 is tails (0.75 prob)

#First create the distribution using rv_discrete
distrib = st.rv_discrete(values=([0,1], [0.25, 0.75]))

flips = distrib.rvs(size=100)
flips

array([1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0,
       1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1,
       0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0,
       1, 1, 0, 1, 1, 1, 1, 0])

In [19]:
#How many tails were there? Foreshadowing...
sum(flips)

67
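
The same biased coin can also be sampled with the numpy choice method seen earlier, by passing the probabilities through its p input. The sketch below is an alternative to rv_discrete, not a replacement for it; the seed value and the sample size of 10000 are arbitrary choices made for this illustration.

```python
import numpy as np

np.random.seed(0)   # assumption: fixed seed, only to make the run repeatable

# Same biased coin: 0 is heads (0.25 prob) and 1 is tails (0.75 prob)
flips = np.random.choice([0, 1], size=10000, p=[0.25, 0.75])

# The fraction of tails should land close to 0.75
print(flips.mean())
```

With this many flips the printed fraction should be near 0.75, which is the idea that the tail count above foreshadows.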

## Other Useful Packages

We can use the itemgetter function from the operator package to sort by the second entry, for example, in a list of lists.

In [20]:
import operator as op

inventories = [['apples',5], ['oranges',6], ['bananas',3]]

#itemgetter returns a callable object that fetches an item from its operand

#the call getCount(r) returns r[1]
getCount = op.itemgetter(1)

getCount(inventories[1])

6

In [21]:
#We can use the key input in the sorted function to sort by the second entry (index 1)
sorted(inventories, key =getCount)

[['bananas', 3], ['apples', 5], ['oranges', 6]]
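
For comparison, a lambda that returns r[1] gives the same ordering as itemgetter, and the reverse input of sorted flips the order. A minimal sketch, where `by_lambda`, `by_getter`, and `descending` are names made up for this illustration:

```python
import operator as op

inventories = [['apples', 5], ['oranges', 6], ['bananas', 3]]

# A lambda that returns r[1] gives the same ordering as op.itemgetter(1)
by_lambda = sorted(inventories, key=lambda r: r[1])
by_getter = sorted(inventories, key=op.itemgetter(1))
print(by_lambda == by_getter)   # True

# reverse=True sorts from the largest count to the smallest
descending = sorted(inventories, key=op.itemgetter(1), reverse=True)
print(descending)   # [['oranges', 6], ['apples', 5], ['bananas', 3]]
```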

We can use the package itertools to look at all of the permutations of a given size of a list.  This might be useful in a discrete optimization problem to check all possible solutions or if you want to compute some sort of probability.

In [22]:
import itertools as it

l = [1,2,3]
#Produces an iterator, which can only be looped over once
sizeTwo = it.permutations(l,2)

sizeTwo

<itertools.permutations at 0x113fc94c0>

In [23]:
#Wrap in list function to get list of tuples
list(sizeTwo)

[(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]
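
As a toy version of checking all possible solutions, the sketch below scores every ordered pair and keeps the best one. The objective function `score` is hypothetical and chosen only because its answer is easy to verify by hand.

```python
import itertools as it

digits = [1, 2, 3]

def score(pair):
    # Hypothetical objective: read the ordered pair as a two-digit number
    return 10 * pair[0] + pair[1]

# Evaluate every ordered pair and keep the one with the highest score
best = max(it.permutations(digits, 2), key=score)
print(best, score(best))   # (3, 2) 32
```

Brute force like this is only practical for small problems, since the number of permutations grows very quickly with the size of the list.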

In [1]:
#There is also a combinations function (re-import itertools if the kernel was restarted)
import itertools as it

l = [1,2,3]
list(it.combinations(l,2))

[(1, 2), (1, 3), (2, 3)]