Create Project and Download the Data
- Create a folder called Sales-Win-Loss; this will be your project folder.
- Download WA_Fn-UseC_-Sales-Win-Loss.csv (Original Data Source: IBM)
- Download the starter file for the project (don’t worry, no hints are given away!)
- Place both files in the project folder
- Open Sales.py in Spyder
- Using the Python os library, create two variables, basepath and outpath, which are used to find input files and write output files. basepath should be the current folder, and outpath should be a folder called out inside basepath.
Import the Python os library with:
import os
Set basepath to the current folder:
basepath = '.'
Set outpath to the folder out in the current folder:
outpath = os.path.join(basepath, "out")
Create the out folder if it doesn’t exist:
if not os.path.exists(outpath): os.makedirs(outpath)
import os
# find the right path for batch ai vs local
basepath = '.'
outpath = os.path.join(basepath, "out")
if not os.path.exists(outpath):
    os.makedirs(outpath)
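As a variation on the folder setup above, here is a minimal sketch that wraps the same steps in a helper. The function name ensure_outpath is hypothetical and not part of the starter file. It relies on the exist_ok argument of os.makedirs, which makes the explicit existence check unnecessary.

```python
import os


def ensure_outpath(basepath="."):
    """Return the path of the 'out' folder under basepath, creating it if needed.

    Hypothetical helper for illustration; the tutorial does this inline.
    """
    outpath = os.path.join(basepath, "out")
    # exist_ok=True turns this into a no-op when the folder already exists
    os.makedirs(outpath, exist_ok=True)
    return outpath
```

With this in place, outpath = ensure_outpath(basepath) can be run repeatedly without raising an error on the second and later runs.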
- Load the data file we downloaded earlier into a variable named dataset using the pandas library
Import the pandas Python library with:
import pandas as pd
The parameter to pass to read_csv is:
os.path.join(basepath, 'WA_Fn-UseC_-Sales-Win-Loss.csv')
import pandas as pd
# Importing the dataset
dataset = pd.read_csv(os.path.join(basepath, 'WA_Fn-UseC_-Sales-Win-Loss.csv'))
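If the CSV is not where the script expects it, read_csv fails with a fairly long traceback. The following sketch, under the assumption that a small wrapper is acceptable in your project, checks for the file first and reports the path it looked at. The name load_dataset is hypothetical and not part of the starter file.

```python
import os

import pandas as pd


def load_dataset(basepath, filename):
    """Read a CSV from basepath into a DataFrame, failing early if it is missing.

    Hypothetical helper for illustration; the tutorial calls pd.read_csv directly.
    """
    path = os.path.join(basepath, filename)
    if not os.path.isfile(path):
        # Report the exact path so a misplaced download is easy to spot
        raise FileNotFoundError(f"Expected the data file at {path!r}")
    return pd.read_csv(path)
```

It would be called as dataset = load_dataset(basepath, 'WA_Fn-UseC_-Sales-Win-Loss.csv'), which gives the same dataset variable as the direct call.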
- Initialize numpy with a random seed of 7. Seeding makes numpy produce the same sequence of random numbers on every run, so results stay consistent between runs.
Import the numpy Python library with:
import numpy as np
import numpy as np
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)
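To see what the seed does, the short sketch below seeds numpy twice with the same value and compares the numbers drawn each time. The variable names first_draw and second_draw are only for illustration.

```python
import numpy as np

seed = 7

# Seed the global generator, then draw three numbers
np.random.seed(seed)
first_draw = np.random.rand(3)

# Re-seeding with the same value restarts the same sequence
np.random.seed(seed)
second_draw = np.random.rand(3)

print(np.array_equal(first_draw, second_draw))  # True
```

Note that np.random.seed only affects numpy's global generator; other libraries with their own random number generators need to be seeded separately.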
- Highlight all of your code and press CTRL-Enter (Run all the highlighted code)
- Wait for execution to complete in the IPython Console in the lower right pane.
- In the upper right pane, select the “Variable explorer” tab
- Examine the value of dataset by double-clicking it. It will take a while to display, but you should get a table view of the rows and columns.
- Look through the data, and get an understanding of the types of data in each column.
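Besides browsing in the variable explorer, the column types can be summarized in code. This is a minimal sketch, not part of the starter file: summarize_columns is a hypothetical helper, and the tiny example frame with its Region and Amount columns is made up so the block runs on its own.

```python
import pandas as pd


def summarize_columns(df):
    """Return one row per column: its dtype, distinct-value count, and missing count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "unique": df.nunique(),
        "missing": df.isna().sum(),
    })


# Tiny made-up frame so the sketch runs on its own; column names are hypothetical
example = pd.DataFrame({
    "Region": ["East", "West", "East"],
    "Amount": [100, 250, 75],
})
print(summarize_columns(example))
```

Running summarize_columns(dataset) in the IPython console should give the same kind of overview for the real data: text columns with few distinct values are likely categories, while numeric columns show up with an integer or float dtype.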