Initial Steps at Getting polynote and ZeroToJupyterHub to work together (ish)

JupyterHub has long been the de facto leader in the notebook space. Polynote is a new approach to notebooks with multi-language kernels and a focus on first-class Scala support. There is a lot of tooling to make deploying & managing JupyterHub easy for various environments. Since Polynote is relatively new, the same tooling does not yet exist. Since I am lazy, and I wanted a great Notebook for Spark (w/ Scala & Python), I figured I’d try and see if I could get my ZeroToJupyterHub deployment to launch a Polynote notebook. This way I can have multi-user Kubernetes deployments of Polynote alongside my other Kernels that I’m using (Python w/Dask, Python w/Ray, etc.). It turns out, yes it is possible, and there are of course some things I learned along the way and despite my initial thought this would "take an hour or so" it ended up being an 8 part stream, and I don’t have security set up.

The first step I did was creating a new Dockerfile for JupyterHub to launch so that we could put any customizations needed inside of it. We could view this Dockerfile as "foreshadowing" for the other changes I’m going to describe:

FROM holdenk/polynote:dev

# Being root again
USER root
# A script to ignore everything we tell it. I'm sure that's useful
COPY scripts/jupyter-fake-launch.sh  ./polynote/
# Copy config file
COPY --chown=${NB_USER}:${NB_USER} docker/jupyter/config.yml /config.yml
# Back to being a safe-ish user
USER ${NB_USER}
# Use our custom launch script
ENTRYPOINT ["./polynote/jupyter-fake-launch.sh"]

Polynote is configured by a config.yml file, and ZeroToJupyterHub mounts the user storage at /home/joyvan & expects the server to be listening on port 8888 (instead of Polynotes default of 8192). So I added a config.yml of

listen:
  host: 0.0.0.0
  port: 8888
storage:
  dir: /home/jovyan

ZeroToJupyterHub is designed to launch JupyterHub containers and so it passes some command-line arguments as expected by JupyterHub. The simplest option I could think of was to create a script that ignored most of the information from JupyterHub and reformatted the other arguments as needed.

#!/usr/bin/env bash
set -ex
# Ignore everything the Jupyter launcher tells us
cp -r /opt/notebooks/examples /home/jovyan/ || echo "Examples copied"
# Use the JUPYTERHUB_SERVICE_PREFIX as the base_uri since we need to match the reverse proxy set up by zero-to-jupyterhub.
echo "
ui:
  base_uri: $JUPYTERHUB_SERVICE_PREFIX
" >> /config.yml
./polynote/polynote.py --config /config.yml

Unfortunately, while Polynote has support for being served by reverse proxies at sub-urls it makes the assumption that the reverse proxy is always rewriting the requests to be relative to "/". Fixing this required adding a new function (normalizePath) which was relatively straightforward, although I did screw up putting it in all of the right places which led me to a lot of confusion (and is honestly part of why it was an 8 part stream).

    def normalizePath(path: String): String = {
      path.stripPrefix(userPrefix)
    }

All in all the (not very nice, but serviceable) integration is on my GitHub Polynote integrate-with-zero-to-jupyterhub branch.

In addition, while I was working on this I found various issues with the Polynote Dockerfile and build instructions not being in sync. This is pretty normal, but since I was new to the project I took several miss-steps, but I’ve submitted some PRs (including some merged) to clarify the issues for others.

While it’s possible to launch the Polynote notebooks there is still a lot of work to be done. First off is there is no security once the notebook is launched, and the second matter is including the traditional JupyterHub header so the user can stop & restart their Polynote environment as desired.