Here’s what I’m assuming: You have a machine running a *nix command line. You know how to launch your preferred terminal. You have git installed. You have Docker installed and can run docker commands from your preferred terminal.
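
If you want a quick sanity check of those prerequisites, the following commands should all succeed (exact versions will vary; any reasonably recent Git and Docker should be fine):

    git --version
    docker --version
    docker info    # an error here usually means the Docker daemon isn't running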

Here’s what I’m promising: The bare minimum needed to get up and running with Docker and Jupyter Lab on your local machine.

Note for Mac users: My understanding is that installing the Docker Desktop app is the “official” way to use Docker on a Mac. I’ve not had any trouble with this and actually prefer to use the GUI in some cases, though you never have to interact with it if you prefer not to. Further, on macOS, the Docker Desktop app needs to be running for the command line interface commands to work.
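
If you would rather not click around, you can also launch Docker Desktop straight from the terminal and wait for the daemon to come up (this assumes a standard Docker Desktop install, where the app is named Docker):

    open -a Docker    # launch the Docker Desktop app on macOS
    docker info       # succeeds once the Docker daemon is ready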

Three steps to Jupyter

  1. Clone the public repository I’ve created for this post:
    git clone git@github.com:alexmill/dockerized_jupyter.git
    
  2. “Build” the Docker image for this project. This downloads all the prerequisite software needed to run the container. Launch your preferred terminal (either Mac’s default Terminal app or something like iTerm2 running zsh) and navigate to the cloned directory on your local machine:
    cd ./dockerized_jupyter
    

    Assuming Docker has been installed properly, you can run the following command, which builds a local Docker image that we will run in the next step.

    docker build -t jupyter_lab_docker .
    
  3. Run a Jupyter Lab Server
    • List your built images
      docker image ls
      
    • Take note of the image ID (a hash) associated with the image you built and tagged jupyter_lab_docker, and substitute it for $IMAGE_ID in the run command below. Alternatively, you can run the following command (without replacing anything), which creates a shell variable named IMAGE_ID (for an even shorter route, see the note after this list):
       IMAGE_ID=$(docker image ls | grep jupyter_lab_docker | awk '{print $3}')
      

      Then run the following command to start the Jupyter Lab Notebook server:

       docker run \
        --volume $(pwd):/home/jovyan \
        --publish 8888:8888 \
        --env JUPYTER_ENABLE_LAB="yes" \
        $IMAGE_ID
      
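A note on step 3: because the build command in step 2 tagged the image, you can skip the $IMAGE_ID lookup entirely and pass the tag straight to docker run:

     docker run \
      --volume $(pwd):/home/jovyan \
      --publish 8888:8888 \
      --env JUPYTER_ENABLE_LAB="yes" \
      jupyter_lab_docker

Either way, the server prints a URL to the terminal (often with a ?token=... query string); open that link in your browser if plain http://localhost:8888 asks for a token.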

What exactly does this do?

  • Builds a Docker image from the Dockerfile in this post’s linked repository.
    • As you can see if you inspect the file yourself, this Dockerfile is based on the base-notebook Docker image developed by Project Jupyter. As configured, this image first installs Python, then installs all modules listed in requirements.txt. When run, the container launches a Jupyter Lab server locally, which you can access in your browser at http://localhost:8888. (A sketch of what a Dockerfile like this looks like follows this list.)
  • Runs a local Docker container from the folder into which this repository was cloned.
  • Shares local files between the isolated Docker container and your local machine.
    • This is achieved using Docker’s volume-mount functionality; in the run command above, the --volume flag tells Docker to share your current working directory with the container to be created.
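
For the curious, here is a minimal sketch of what a Dockerfile along these lines typically looks like. Treat this as an illustration of the pattern rather than a verbatim copy of the file in the repository; check the repo itself for the real thing:

    # Start from Project Jupyter's base-notebook image
    FROM jupyter/base-notebook

    # Install the project's Python dependencies
    COPY requirements.txt /tmp/requirements.txt
    RUN pip install --no-cache-dir -r /tmp/requirements.txt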

Does this back up or save my work?

Locally, yes. Remotely, no. The work you create from within the Jupyter Lab instance launched by this Docker command will persist locally. To back up your work elsewhere, you can use Git and GitHub. If you want to grok the Docker workflow and back up everything through Docker images, look into Docker Compose and Docker Cloud. Again, this local persistence is enabled by Docker’s --volume flag.
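
Because the project directory is mounted into the container, a plain Git workflow from your host terminal (or from a terminal opened inside Jupyter Lab) backs up your work as usual. The remote and branch names below are common defaults and may differ in your setup:

    git add .
    git commit -m "Save analysis progress"
    git push origin main    # assumes a remote named origin and a branch named main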

Why?

Why exactly would anyone want to run a local Jupyter Lab instance from within a Docker container?

  1. Curiosity.
  2. Hopes that Dockerized workflows may improve the reproducibility of quantitative science.
  3. Hopes that Dockerized workflows may one day improve the composability of science, by opening up new avenues of collaboration, structure, and incentives within science.
  4. Procrastination.

How do I take advantage of this workflow?

You will still need (and should still use) Git or some other form of version control for your data, files, and so on. The benefit of the workflow I’m explaining in this post, however, is that if you only interact with the files in this directory through the dockerized Jupyter Lab instance, it will always be possible (in theory) to recreate your entire analysis and workflow from scratch on any platform that can run Docker.
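
Concretely, recreating the whole environment on a fresh machine reduces to the same three steps from above:

    git clone git@github.com:alexmill/dockerized_jupyter.git
    cd dockerized_jupyter
    docker build -t jupyter_lab_docker .
    docker run \
      --volume $(pwd):/home/jovyan \
      --publish 8888:8888 \
      --env JUPYTER_ENABLE_LAB="yes" \
      jupyter_lab_docker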

The spirit of Docker is that any platform/installation/requirements needed for what is accomplished within an image is configured within the Docker framework (i.e., in the Dockerfile). Once you embrace and learn the Docker framework, the potential for building composable data science workflows becomes immense.