Adding a job to an Azure Batch AI cluster can be time-consuming and error-prone because of the many fields that need to be configured. Since all jobs end up in the same resource group, it can also be difficult to tell which is which. Automatically generated names can help solve this problem.
Adding a Batch AI job can be done in the portal UI, but the experience is pretty bad: there are a lot of fields to fill in, and many of them take some effort to figure out. Here is an example of some of the required fields:
The easiest way to avoid this is to script the job creation using the Azure CLI from PowerShell. Start by creating the following JSON file, job.json:
{
  "properties": {
    "nodeCount": 1,
    "tensorFlowSettings": {
      "pythonScriptFilePath": "$AZ_BATCHAI_MOUNT_ROOT/afs/CatOrDog/catordog.py",
      "masterCommandLineArgs": "-p"
    },
    "stdOutErrPathPrefix": "$AZ_BATCHAI_MOUNT_ROOT/afs/CatOrDog",
    "inputDirectories": [
      {
        "id": "SCRIPT",
        "path": "$AZ_BATCHAI_MOUNT_ROOT/afs/CatOrDog"
      }
    ],
    "outputDirectories": [
      {
        "id": "DEFAULT",
        "pathPrefix": "$AZ_BATCHAI_MOUNT_ROOT/afs/CatOrDog/out"
      }
    ]
  }
}
In this example file we constrain the job to a single node. This is the CatOrDog example from a previous article; I have placed all of its content in my Azure file share in the folder “CatOrDog”, which is specified in the inputDirectories section. pythonScriptFilePath specifies the Python file that is executed when the job runs, and stdOutErrPathPrefix and outputDirectories specify where the output ends up. This represents most of the information we would have to enter in the portal, and since it does not change between runs we can save the file and reuse it over and over again. Adjust it as needed to match your folder and file name structure.
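If you keep several experiments in different folders of the file share, the same structure can be generated rather than edited by hand. The following is only a sketch in Python: the helper names build_job_config and write_job_config are made up for illustration, and it assumes every job follows the same layout as the CatOrDog example (script, logs and an "out" folder under one share folder).

```python
import json

# Root of the mounted Azure file share, as used in the job.json above.
MOUNT_ROOT = "$AZ_BATCHAI_MOUNT_ROOT/afs"


def build_job_config(folder, script, node_count=1, args="-p"):
    """Return the job.json structure for a TensorFlow job stored under `folder`.

    Hypothetical helper: it assumes the script, the stdout/stderr logs and the
    "out" directory all live under the same folder of the file share.
    """
    base = f"{MOUNT_ROOT}/{folder}"
    return {
        "properties": {
            "nodeCount": node_count,
            "tensorFlowSettings": {
                "pythonScriptFilePath": f"{base}/{script}",
                "masterCommandLineArgs": args,
            },
            "stdOutErrPathPrefix": base,
            "inputDirectories": [{"id": "SCRIPT", "path": base}],
            "outputDirectories": [{"id": "DEFAULT", "pathPrefix": f"{base}/out"}],
        }
    }


def write_job_config(path, config):
    """Write the configuration to disk as indented JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2)


# Show the generated configuration for the CatOrDog example.
print(json.dumps(build_job_config("CatOrDog", "catordog.py"), indent=2))
```

Calling write_job_config("job.json", build_job_config("CatOrDog", "catordog.py")) would produce a file equivalent to the one shown above.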
Next create the file batchit.ps1:
az account set --subscription <MYSUBSCRIPTIONID>
az batchai job create --config job.json --name CatOrDog-$(Get-Date -UFormat "%Y_%m_%d-%H_%M_%S") --cluster-name <MYCLUSTERNAME> --resource-group <MYCLUSTERRESOURCEGROUP> --location <MYCLUSTERLOCATION>
Replace <MYSUBSCRIPTIONID> with the GUID of your subscription, which can be found in the portal or by querying at the CLI. Replace <MYCLUSTERNAME>, <MYCLUSTERRESOURCEGROUP>, and <MYCLUSTERLOCATION> with the details of the Batch AI cluster that this job will execute on. The script continues the CatOrDog example and uses that as the prefix of the generated job name; substitute something appropriate for your own jobs.
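The naming scheme itself is just a prefix plus a timestamp, so it is easy to reproduce outside PowerShell. As a rough sketch, here is the same idea in Python: job_name and create_job_command are hypothetical helpers, the flags simply mirror the ones in batchit.ps1 above, and the cluster values are the same placeholders you would need to replace.

```python
from datetime import datetime

# Same layout as Get-Date -UFormat "%Y_%m_%d-%H_%M_%S" in batchit.ps1.
STAMP_FORMAT = "%Y_%m_%d-%H_%M_%S"


def job_name(prefix, now=None):
    """Return '<prefix>-<timestamp>', using the current local time by default."""
    now = now or datetime.now()
    return f"{prefix}-{now.strftime(STAMP_FORMAT)}"


def create_job_command(name, cluster, resource_group, location, config="job.json"):
    """Build the argument list for the job-creation call used in batchit.ps1."""
    return [
        "az", "batchai", "job", "create",
        "--config", config,
        "--name", name,
        "--cluster-name", cluster,
        "--resource-group", resource_group,
        "--location", location,
    ]


# Placeholders, exactly as in the PowerShell script; replace before running.
command = create_job_command(
    job_name("CatOrDog"),
    cluster="<MYCLUSTERNAME>",
    resource_group="<MYCLUSTERRESOURCEGROUP>",
    location="<MYCLUSTERLOCATION>",
)
print(" ".join(command))
```

The sketch only prints the command. Passing the list to subprocess.run would execute it, provided the Azure CLI is installed and on the path.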
Next, execute your script from PowerShell. After a few runs, your list of jobs will look like this:
Notice how each job is timestamped, making it easy to see the order in which the jobs were created and to find the execution you are looking for. It is still worth cleaning out your jobs occasionally, but this helps a lot.
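Because the timestamp is embedded in the name, the creation time can be recovered from the name alone, which helps when deciding what to clean out. The sketch below assumes you already have the job names as a list of strings (how you obtain them is up to you); created_at and older_than are illustrative helpers, not part of any Azure tooling.

```python
from datetime import datetime, timedelta

# Timestamp layout produced by batchit.ps1.
STAMP_FORMAT = "%Y_%m_%d-%H_%M_%S"


def created_at(name):
    """Parse the creation time out of a '<prefix>-<date>-<time>' job name."""
    # Split from the right so that prefixes containing hyphens still work.
    _prefix, date_part, time_part = name.rsplit("-", 2)
    return datetime.strptime(f"{date_part}-{time_part}", STAMP_FORMAT)


def older_than(names, days, now=None):
    """Return the names created more than `days` days ago, oldest first."""
    cutoff = (now or datetime.now()) - timedelta(days=days)
    return sorted((n for n in names if created_at(n) < cutoff), key=created_at)


# Made-up job names for illustration only.
jobs = [
    "CatOrDog-2018_03_05-14_07_09",
    "CatOrDog-2018_03_09-08_00_00",
    "CatOrDog-2018_03_01-23_59_59",
]
for name in older_than(jobs, days=3, now=datetime(2018, 3, 10)):
    print(name)
```

Since every field in the timestamp is zero-padded, names that share a prefix also sort chronologically as plain strings.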