Create Project and Download the Data
Download the Data
- Create a folder called Sales-Win-Loss; this will be your project folder.
- Download WA_Fn-UseC_-Sales-Win-Loss.csv (Original Data Source: IBM)
- Download the starter file for the project (don't worry, no hints are given away!)
- Place both files in the project folder
- Open ann.py in Spyder
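Before moving on, you can confirm from Python that both downloads landed in the project folder. This is a minimal sketch: `missing_files` is a hypothetical helper that is not part of the starter file, and it assumes the starter file is the ann.py mentioned above and that you run it from the Sales-Win-Loss folder.

```python
import os

# File names taken from the download steps above (ann.py assumed to be the starter file)
REQUIRED_FILES = ("WA_Fn-UseC_-Sales-Win-Loss.csv", "ann.py")


def missing_files(folder, required=REQUIRED_FILES):
    """Return the names in `required` that are not files inside `folder`.

    Hypothetical helper for illustration; it is not part of the starter file.
    """
    return [name for name in required
            if not os.path.isfile(os.path.join(folder, name))]


if __name__ == "__main__":
    # Run from the Sales-Win-Loss folder so "." is the project folder
    missing = missing_files(".")
    if missing:
        print("Missing from the project folder:", ", ".join(missing))
    else:
        print("All project files found.")
```

An empty result means both files are in place; otherwise the printed names tell you which download still needs to be moved.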
Folders
- Using the Python os library, create two variables, basepath and outpath, which are used to locate input files and to place output files. basepath should be the current folder, and outpath should be a folder named out inside basepath. Create the out folder if it doesn't already exist.
Reference Material
https://www.w3schools.com/python/python_modules.asp
https://docs.python.org/3.6/library/os.path.html
https://docs.python.org/3.6/tutorial/controlflow.html
Hint 1
Import the Python os library with:
import os
Hint 2
Set basepath to the current folder:
basepath = '.'
Hint 3
Set outpath to the folder out in the current folder:
outpath = os.path.join(basepath, "out")
Hint 4
Create the out folder if it doesn’t exist:
if not os.path.exists(outpath): os.makedirs(outpath)
Full Solution
import os

# find the right path for Batch AI vs. local
basepath = '.'
outpath = os.path.join(basepath, "out")
if not os.path.exists(outpath):
    os.makedirs(outpath)
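Since Python 3.2, `os.makedirs` accepts `exist_ok=True`, which folds the existence check and the creation into a single call and also avoids an error if the folder appears between the check and the `makedirs` call. The sketch below is an alternative way to write the same step; `make_out_folder` is a hypothetical helper name, not something the starter file defines.

```python
import os


def make_out_folder(basepath="."):
    """Create `out` under basepath if needed and return its path (hypothetical helper)."""
    outpath = os.path.join(basepath, "out")
    # exist_ok=True: no error if the folder is already there
    os.makedirs(outpath, exist_ok=True)
    return outpath


if __name__ == "__main__":
    basepath = "."
    outpath = make_out_folder(basepath)
    print("Output folder:", outpath)
```

Calling it a second time is harmless, so it can stay at the top of the script across reruns.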
Hello Pandas!
- Load the data file we downloaded earlier into a variable named dataset using the pandas library.
Reference Material
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
Hint 1
Import the pandas library with:
import pandas as pd
Hint 2
The parameter to pass to read_csv is:
os.path.join(basepath, 'WA_Fn-UseC_-Sales-Win-Loss.csv')
Full Solution
import pandas as pd

# Importing the dataset
dataset = pd.read_csv(os.path.join(basepath, 'WA_Fn-UseC_-Sales-Win-Loss.csv'))
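If the CSV is not where basepath points, `pd.read_csv` raises `FileNotFoundError`. The sketch below checks for the file first and fails with a message naming the folder it looked in. `load_dataset` is a hypothetical helper, not part of the starter file; the file name is the one downloaded earlier.

```python
import os

import pandas as pd

DATA_FILE = "WA_Fn-UseC_-Sales-Win-Loss.csv"


def load_dataset(basepath=".", filename=DATA_FILE):
    """Read the sales win/loss CSV from basepath into a DataFrame (hypothetical helper)."""
    path = os.path.join(basepath, filename)
    if not os.path.isfile(path):
        # Name the absolute folder so a wrong working directory is easy to spot
        raise FileNotFoundError(
            f"Expected {filename} in {os.path.abspath(basepath)}; "
            "check that it was placed in the project folder."
        )
    return pd.read_csv(path)
```

Usage in the project script would be `dataset = load_dataset(basepath)`.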
Hello Numpy!
- Seed NumPy's random number generator with 7 so that every run produces the same random values, which keeps results consistent between runs.
Reference Material
https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.seed.html
Hint 1
Import the NumPy library with:
import numpy as np
Full Solution
import numpy as np

# fix random seed for reproducibility
seed = 7
np.random.seed(seed)
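To see what the seed buys you, the sketch below seeds NumPy, draws three numbers, re-seeds with the same value, and draws again. Both draws come out identical because re-seeding restarts the same pseudo-random sequence. The variable names are illustrative only.

```python
import numpy as np

seed = 7

# Seed, then draw three numbers
np.random.seed(seed)
first_run = np.random.rand(3)

# Re-seeding with the same value restarts the same pseudo-random sequence
np.random.seed(seed)
second_run = np.random.rand(3)

print(np.array_equal(first_run, second_run))  # True
```

`np.random.seed` sets NumPy's global generator only; Python's built-in `random` module keeps its own, separate state. Newer NumPy code (1.17 and later) often uses `np.random.default_rng(seed)` instead, which returns a seeded Generator object rather than changing global state.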
Examine the DataFrame
- Make sure the current folder in Spyder's File Explorer is Sales-Win-Loss.
- Highlight all of your code and press Ctrl+Enter to run the highlighted code.
- Wait for execution to finish in the IPython console in the lower-right pane.
- In the upper-right pane, select the Variable Explorer tab.
- Double-click dataset to examine its value. It may take a while to display; once it loads, you should see the DataFrame as a table of rows and columns.
- Look through the data to understand what type of data each column holds.
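The same column information the Variable Explorer shows can also be pulled up from code in the IPython console. This is a minimal sketch using standard pandas members (`head`, `dtypes`, `select_dtypes`); the tiny DataFrame is made-up stand-in data with hypothetical column names, so in the project you would skip that part and use the `dataset` loaded from the CSV.

```python
import pandas as pd

# Hypothetical stand-in data; in the project, use the `dataset` loaded from the CSV
dataset = pd.DataFrame({
    "region": ["North", "South", "North"],
    "amount": [1200, 850, 430],
    "won": ["Yes", "No", "Yes"],
})

print(dataset.head())    # first rows, like the top of the Variable Explorer view
print(dataset.dtypes)    # pandas dtype of each column

# Split columns into numeric and non-numeric (text/categorical) groups
numeric_cols = dataset.select_dtypes(include="number").columns.tolist()
other_cols = dataset.select_dtypes(exclude="number").columns.tolist()
print("Numeric:", numeric_cols)
print("Other:", other_cols)
```

Separating numeric from text columns this way gives a quick first picture of which columns are numbers and which hold labels or categories.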