In this tutorial, we show how to integrate SageMaker with GitHub. The process takes a few steps, starting on the GitHub side with a personal access token.
Generate a GitHub Personal Access Token
We go to our GitHub account and click on Settings.
Then we click on Developer settings, then on Personal access tokens and then on Generate new token.
In the Note field, we give a descriptive name; we chose sagemaker. For the expiration date, we chose No expiration, although this is not recommended. Then we select the repo scope and finally click on Generate token.
Once the token is generated, we must copy it and save it, because GitHub displays it only once.
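Before moving on, it can help to confirm that the saved token works. The following is a minimal sketch, not part of the original walkthrough, that calls the GitHub REST API endpoint https://api.github.com/user with the token and reads the scopes that classic tokens report in the X-OAuth-Scopes response header. The function names are our own, and the token value is a placeholder.

```python
import urllib.request

GITHUB_USER_URL = "https://api.github.com/user"


def build_user_request(token: str) -> urllib.request.Request:
    """Build an authenticated GET request for the token owner's profile."""
    return urllib.request.Request(
        GITHUB_USER_URL,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


def parse_scopes(header_value: str) -> list[str]:
    """Split a comma-separated scopes header into a clean list."""
    return [scope.strip() for scope in header_value.split(",") if scope.strip()]


def token_scopes(token: str) -> list[str]:
    """Call GitHub and return the scopes reported for the token (needs network)."""
    with urllib.request.urlopen(build_user_request(token)) as response:
        return parse_scopes(response.headers.get("X-OAuth-Scopes", ""))
```

Calling token_scopes with the saved token should return a list that includes repo if the token was created as described above; an invalid token makes urlopen raise an HTTPError instead.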
Add a Git Repo to SageMaker
We open the AWS Console and go to the SageMaker service. Then we go to Notebook, then to Git repositories, and click on the Add repository button.
This opens the Add repository form. We give a name to the SageMaker repository; in our case, we named it GitHubExample. In the Git repository URL field, we paste the URL of the GitHub repo that we want to integrate, in our case https://github.com/pipinho13/sagemaker-github-integration.git. Finally, we choose Create secret and enter a name for the secret; we used sagemaker, the same name we gave the token in the previous step. The Username is our GitHub email and the Password is the personal access token that we saved during the first step. If everything is OK, a confirmation message appears.
We click on Add repository and the repository appears under Git repositories.
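The same registration can also be scripted instead of clicked through. The sketch below is our own illustration using boto3, not the method shown in the console steps above: it stores the credentials in AWS Secrets Manager in the username/password JSON format that SageMaker reads, then calls create_code_repository. The helper names are hypothetical, and the code assumes that AWS credentials are configured and that the notebook's IAM role is allowed to read the secret.

```python
import json


def build_secret_string(username: str, token: str) -> str:
    """Return the JSON payload SageMaker expects to find in the secret."""
    return json.dumps({"username": username, "password": token})


def build_code_repository_request(name: str, repo_url: str, secret_arn: str) -> dict:
    """Return keyword arguments for SageMaker's create_code_repository call."""
    return {
        "CodeRepositoryName": name,
        "GitConfig": {"RepositoryUrl": repo_url, "SecretArn": secret_arn},
    }


def add_repository(name, repo_url, username, token, secret_name="sagemaker"):
    """Create the secret and register the repository (needs AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without boto3

    secrets = boto3.client("secretsmanager")
    secret = secrets.create_secret(
        Name=secret_name,
        SecretString=build_secret_string(username, token),
    )
    sagemaker = boto3.client("sagemaker")
    return sagemaker.create_code_repository(
        **build_code_repository_request(name, repo_url, secret["ARN"])
    )
```

With the values from this tutorial, the call would be add_repository("GitHubExample", "https://github.com/pipinho13/sagemaker-github-integration.git", our GitHub email, the saved token).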
Integrate Git Repo with SageMaker
At this point, we can create a notebook instance that will be linked with our Git Repo. We go to Notebooks -> Notebook instances -> Create notebook instance.
We give an arbitrary name to our notebook instance, in our case PredictiveHacks, and under the Git repositories section we choose the repository that we just added, GitHubExample.
Once the notebook instance is created, we can click on Open JupyterLab.
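For readers who prefer to script this step, here is a rough boto3 equivalent. It is a sketch under assumptions: the instance type ml.t3.medium and the role ARN argument are our own placeholders (the original steps do not state them), and the helper names are hypothetical. The DefaultCodeRepository parameter is what links the instance to the repository registered earlier.

```python
def build_notebook_request(name, role_arn, repository, instance_type="ml.t3.medium"):
    """Return keyword arguments for SageMaker's create_notebook_instance call."""
    return {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,  # assumed size, pick what the project needs
        "RoleArn": role_arn,  # IAM execution role for the instance
        "DefaultCodeRepository": repository,  # name of the registered Git repo
    }


def create_notebook(name, role_arn, repository):
    """Create the notebook instance (needs AWS credentials and boto3)."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("sagemaker")
    return client.create_notebook_instance(
        **build_notebook_request(name, role_arn, repository)
    )
```

In this tutorial's terms, the name would be PredictiveHacks and the repository GitHubExample.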
As we can see, the GitHub repo is integrated and the README.md file appears in our notebook.
Let’s create a new Jupyter Notebook that we will push to the Git Repo. We have created a simple notebook called MyExample.ipynb.
Push to GitHub
Under the Git section on the left, we hover over the file and click on the plus button to stage it. We then enter a commit message and commit the change.
The first time we commit, it asks us to add our name and email, and we are done. Then we can go to the History section to see the changes.
If we want to push these changes to the GitHub repo, we click on the push button (the cloud icon with an upward arrow) in the toolbar of the Git panel.
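The stage, commit and push steps above correspond to ordinary git commands, which can also be run from a notebook cell or a terminal on the instance. The sketch below is our own illustration rather than what the Git panel runs internally; the function names, commit message and identity values are placeholders. It pushes the current branch with git push origin HEAD so that no branch name has to be assumed.

```python
import subprocess


def build_git_commands(path, message, name, email):
    """Return the git commands that stage, commit and push one file."""
    return [
        ["git", "config", "user.name", name],  # identity asked for on first commit
        ["git", "config", "user.email", email],
        ["git", "add", path],  # same as the plus button
        ["git", "commit", "-m", message],
        ["git", "push", "origin", "HEAD"],  # push the current branch
    ]


def run_git_commands(commands, repo_dir):
    """Run each command inside the cloned repository, stopping on the first error."""
    for command in commands:
        subprocess.run(command, cwd=repo_dir, check=True)
```

For example, build_git_commands("MyExample.ipynb", "Add example notebook", our name, our email) produces the command list, and run_git_commands executes it inside the repository folder on the instance.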
Finally, if we go to the GitHub repo we should see the Jupyter notebook.
Close the Notebook Instance
Do not forget to stop the notebook instance when you are done, since you are charged for as long as it is running.
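Stopping the instance can be scripted as well, which makes it harder to forget. This is a minimal boto3 sketch with hypothetical helper names: it checks the instance status, requests a stop only when the instance is in service, and then waits until SageMaker reports it as stopped. It assumes AWS credentials are configured.

```python
def should_stop(status: str) -> bool:
    """Only an instance that is currently in service can be stopped."""
    return status == "InService"


def stop_notebook(name: str) -> bool:
    """Stop the notebook instance if it is running; return True if a stop was issued."""
    import boto3  # imported lazily so should_stop works without boto3

    client = boto3.client("sagemaker")
    status = client.describe_notebook_instance(NotebookInstanceName=name)[
        "NotebookInstanceStatus"
    ]
    if not should_stop(status):
        return False
    client.stop_notebook_instance(NotebookInstanceName=name)
    # Block until the instance has fully stopped.
    client.get_waiter("notebook_instance_stopped").wait(NotebookInstanceName=name)
    return True
```

For the instance created in this tutorial, the call would be stop_notebook("PredictiveHacks").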
The Key Takeaway
Whenever we work on Data Science projects, it is important to keep track of the changes by using version control. With SageMaker we are able to integrate GitHub and GitLab with our projects.