As Data Scientists, we want our work to be reproducible, meaning that when we share our analysis, everyone should be able to re-run it and come up with the same results. This is not always easy, since we are dealing with different operating systems (macOS, Windows, Linux) and different programming language versions and packages. That is why we encourage you to work with virtual environments such as conda environments. An even more robust solution than conda environments is to work with Docker.
Scenario: We have run an analysis using Python Jupyter Notebooks on our own data, and we want to share this analysis with the Predictive Hacks community ensuring that everyone will be able to reproduce the results.
Run the Analysis Locally
For simplicity, let’s assume that I have run the following analysis:

In essence, I run a sentiment analysis on my_data.csv using the pandas, numpy and vaderSentiment libraries. Thus, I want to share this Jupyter notebook so that it is plug and play. Let's see how I can create a Docker image containing the Jupyter Notebook, as well as my data and the required libraries.
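The notebook itself is not reproduced here, so below is a minimal sketch of what such an analysis could look like. The file name my_data.csv comes from the post; the column name "text" and the sentiment thresholds are assumptions for illustration.

import pandas as pd
import numpy as np
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Load the data; we assume that the documents live in a column called "text"
df = pd.read_csv("my_data.csv")

# Score every document with VADER and keep the compound score
analyzer = SentimentIntensityAnalyzer()
df["compound"] = df["text"].apply(lambda t: analyzer.polarity_scores(str(t))["compound"])

# Label each document using the common compound-score thresholds
df["sentiment"] = np.where(df["compound"] >= 0.05, "positive",
                  np.where(df["compound"] <= -0.05, "negative", "neutral"))

df[["text", "compound", "sentiment"]].head()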
Jupyter Docker Stacks
Jupyter Docker Stacks are a set of ready-to-run Docker images containing Jupyter applications and interactive computing tools. You can use a stack image to do any of the following (and more):
- Start a personal Jupyter Notebook server in a local Docker container
- Run JupyterLab servers for a team using JupyterHub
- Write your own project Dockerfile
We will build our custom image based on jupyter/scipy-notebook.
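If you want to try the base image on its own before customizing it, you can run it directly; the --rm flag simply removes the container once you stop it:
$ docker run -it --rm -p 8888:8888 jupyter/scipy-notebook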
Create the requirements.txt File
The Jupyter Docker core images contain the most common libraries, but you may need to install some extra ones, as in our case where we want to install the vaderSentiment==3.3.2 library. This means that we have to create a requirements.txt file.
The requirements.txt file is:
vaderSentiment==3.3.2
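If you prefer to lock down every library for stricter reproducibility, you could also pin pandas and numpy in the same file; the version numbers below are only illustrative:
vaderSentiment==3.3.2
pandas==1.3.5
numpy==1.21.5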
Create the Dockerfile
Now we need to create the Dockerfile as follows:
FROM jupyter/scipy-notebook

COPY requirements.txt ./requirements.txt
COPY my_data.csv ./my_data.csv
COPY my_jupyter.ipynb ./my_jupyter.ipynb

RUN pip install -r requirements.txt
So, we start our image from jupyter/scipy-notebook, then we copy the required files from our local computer into the image. Note that we could have used paths and directories. Finally, we install the libraries listed in the requirements.txt file.
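For example, if the data and the notebooks lived in sub-folders, the corresponding COPY instructions could look like the following (the folder names are hypothetical):
COPY data/ ./data/
COPY notebooks/ ./notebooks/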
Build the Dockerfile
Since we have created the Dockerfile, we are ready to build it. The command is the following. Note that you can give the image any name; I chose to call it mysharednotebook. Tip: Do not forget the period at the end!
$ docker build -t mysharednotebook .
If you want to make sure that your image has been created, you can type:
$ docker images
to list the available Docker images.
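If the list is long, you can also narrow it down to the image we just built:
$ docker images mysharednotebook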
Run the Image
If we want to make sure that the image works as expected, we run:
$ docker run -it -p 8888:8888 mysharednotebook
And we will get a link for our Jupyter Notebook!
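The link printed in the terminal has the following form; the token is generated by the container, so yours will be different:
http://127.0.0.1:8888/?token=<your-token>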

If you want to see your containers (the -a flag also lists the ones that have stopped), you can type:
$ docker ps -a
Push your Image to Docker Hub
Once you make sure that the image works as expected, you can push it to Docker Hub so that everyone will be able to pull it. The first thing that you need to do is to tag your image.
$ docker tag 9811503b3d3a gpipis/mysharednotebook:first
The 9811503b3d3a is the Image ID, obtained from the docker images command. The gpipis is my username, and mysharednotebook is the name of the image that I created above. Finally, :first is an optional tag.
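Note that pushing requires being logged in to Docker Hub with the account that owns the repository, so if you have not done so already, first run:
$ docker login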
Now we are ready to push the image by typing:
$ docker push gpipis/mysharednotebook:first

Pull the Image from Docker Hub
The work above is done by the person who wants to share his/her work. Now, let's see how we can get this image and work with the reproducible Jupyter Notebook.
What we have to do is to pull the image by typing:
$ docker pull gpipis/mysharednotebook:first
And now we are ready to run it by typing:
$ docker run -it -p 8888:8888 gpipis/mysharednotebook:first

If we copy and paste the URL into our browser, the Jupyter Notebook opens with my_jupyter.ipynb and my_data.csv already in place.

Notice that you can change the port. For example, if you want to run on 8889, then you map the host port 8889 to the container port 8888:
$ docker run -it -p 8889:8888 gpipis/mysharednotebook:first
and you also have to change the port in your URL:
http://127.0.0.1:8889/?token=7e767d9a8dbb92e9d93ce7a5f52ba3c524a3cfcc65401714
The Takeaway
When you want to share your work with many people and you want them to be able to reproduce your analysis, the best approach is to work with Docker.