Create Project and Download the Data
- Create a folder called Sales-Win-Loss. This will be your project folder.
- Download WA_Fn-UseC_-Sales-Win-Loss.csv (Original Data Source: IBM)
- Download the starter file for the project (Don’t worry, no hints are given away!)
- Place both of these files into the project folder
- Open ann.py in Spyder
- Using the Python os library, create two variables, one called basepath and one called outpath. These variables are used to find input files and place output files. basepath should be the current folder, and outpath should be a folder inside basepath called out.
Import the Python os library with:
import os
Set basepath to the current folder:
basepath = '.'
Set outpath to the folder out in the current folder:
outpath = os.path.join(basepath, "out")
Create the out folder if it doesn’t exist:
if not os.path.exists(outpath): os.makedirs(outpath)
import os
# find the right path for batch ai vs local
basepath = '.'
outpath = os.path.join(basepath, "out")
if not os.path.exists(outpath):
    os.makedirs(outpath)
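As an alternative to the exists-then-create check above, here is a minimal sketch that relies on the exist_ok parameter of os.makedirs (Python 3.2 and later), which does nothing when the folder is already present. The helper name ensure_outpath is hypothetical and is not part of the starter file.

```python
import os


def ensure_outpath(basepath="."):
    """Return the path of the 'out' folder under basepath, creating it if needed.

    ensure_outpath is a hypothetical helper name used only for illustration.
    """
    outpath = os.path.join(basepath, "out")
    # exist_ok=True makes this call a no-op when the folder already exists
    os.makedirs(outpath, exist_ok=True)
    return outpath


if __name__ == "__main__":
    basepath = "."
    outpath = ensure_outpath(basepath)
    print(outpath)
```

Either form leaves basepath and outpath set the same way, so the rest of the steps are unaffected by which one you use.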
- Load the data file we downloaded earlier into a variable named dataset using the pandas library.
Import the pandas Python library with:
import pandas as pd
The parameter to pass to read_csv is:
os.path.join(basepath, 'WA_Fn-UseC_-Sales-Win-Loss.csv')
import pandas as pd
# Importing the dataset
dataset = pd.read_csv(os.path.join(basepath, 'WA_Fn-UseC_-Sales-Win-Loss.csv'))
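If read_csv fails, the most likely cause is that the working folder is not the project folder. The sketch below wraps the load in a hypothetical helper, load_dataset (not part of the starter file), that checks for the file first and reports the full path it tried.

```python
import os

import pandas as pd


def load_dataset(basepath=".", filename="WA_Fn-UseC_-Sales-Win-Loss.csv"):
    """Load the CSV from basepath, with a clearer error if it is missing.

    load_dataset is a hypothetical helper name used only for illustration.
    """
    path = os.path.join(basepath, filename)
    if not os.path.isfile(path):
        # Show the absolute path so a wrong working folder is easy to spot
        raise FileNotFoundError(
            f"Could not find {os.path.abspath(path)}; "
            "check that the working folder is the project folder"
        )
    return pd.read_csv(path)
```

With this helper defined, the load step would become dataset = load_dataset(basepath).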
- Initialize numpy with a random seed of 7. Seeding makes numpy generate the same sequence of random numbers on every run, so results are consistent between runs.
Import the numpy Python library with:
import numpy as np
import numpy as np
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)
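To see what the seed does, the short sketch below resets numpy's global random state to the same seed twice and compares the numbers drawn each time. The variable names first and second are illustrative only.

```python
import numpy as np

seed = 7

# Draw three random numbers right after seeding
np.random.seed(seed)
first = np.random.rand(3)

# Re-seed with the same value and draw again
np.random.seed(seed)
second = np.random.rand(3)

# The two draws are identical because the generator restarted from the same state
print(np.array_equal(first, second))
```

Without the calls to np.random.seed, the two draws would almost certainly differ.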
- Make sure that the current folder shown in Spyder’s file explorer is Sales-Win-Loss
- Highlight all of your code and press CTRL-Enter (Run all the highlighted code)
- Wait for execution to complete in the IPython Console in the lower right pane.
- In the upper right pane, select the “Variable explorer” tab
- Examine the value of dataset by double-clicking it. It may take a while to display. You should get a view that looks like this:
- Look through the data, and get an understanding of the types of data in each column.
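Besides browsing in the variable explorer, you can summarize the columns in code. The sketch below builds a small per-column table of data type, non-null count, and number of distinct values. The helper name summarize_columns is hypothetical and is not part of the starter file.

```python
import pandas as pd


def summarize_columns(df):
    """Return one row per column with its dtype, non-null count and unique count.

    summarize_columns is a hypothetical helper name used only for illustration.
    """
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),  # data type of each column
            "non_null": df.count(),          # values that are not missing
            "unique": df.nunique(),          # distinct non-missing values
        }
    )
```

Running print(summarize_columns(dataset)) in the IPython console after the load step lists every column, and dataset.head() shows the first five rows for a quick look at the raw values.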