staticnotes.org

Clearing up confusion around IPython, ipykernel, and Jupyter notebooks

One of my big recurring time sinks while doing data science work used to be trying to get my colleagues’ jupyter notebooks to run on my machine. The main contributing factors:

So this is my attempt at a friendcatcher. I hope it saves you a few minutes the next time you run into similar issues.

The different components

Let’s distinguish the components that play a role in running a jupyter notebook:

graph TD;

colab(Google Colab UI) <--> ipykernel(ipykernel)
vs(VS Code UI) <--> ipykernel
ui(jupyter notebook UI) <-->  server
server(jupyter server) <-->  ipykernel
ipykernel <-->  ipython[IPython] 

Jupyter kernels vs. shell environment

One source of confusion is that a Jupyter kernel can point to a different Python executable than your shell environment does.

To get an overview of the available Jupyter kernels, you can use:

$ jupyter kernelspec list

Every Jupyter kernel folder includes a kernel.json file that points to the Python executable the kernel uses. Note that this can differ from the Python executable referenced by your current shell. Moreover, shell commands run from inside a Jupyter notebook use the environment of the process that launched the notebook, which is not necessarily the kernel's Python executable.
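To make the kernel.json link concrete, here is a minimal sketch that reads the interpreter path out of a kernel folder and compares it with the interpreter running the current code. The function names (kernel_interpreter, describe_kernel) are made up for illustration, and the sketch assumes the usual layout in which the first entry of the "argv" list is the executable that starts the kernel. Some kernel specs store only a bare "python" there, in which case the comparison is not meaningful.

```python
import json
import sys
from pathlib import Path


def kernel_interpreter(kernel_dir):
    """Return the executable recorded in <kernel_dir>/kernel.json."""
    spec_path = Path(kernel_dir) / "kernel.json"
    spec = json.loads(spec_path.read_text(encoding="utf-8"))
    # Assumption: argv[0] is the program that launches the kernel.
    return spec["argv"][0]


def describe_kernel(kernel_dir):
    """Compare the kernel's interpreter with the one running this code."""
    kernel_python = kernel_interpreter(kernel_dir)
    return {
        "kernel_python": kernel_python,
        "current_python": sys.executable,
        # Plain path comparison: symlinks are deliberately not resolved,
        # because a venv's python is often a symlink to the base interpreter.
        "same": Path(kernel_python) == Path(sys.executable),
    }
```

Calling describe_kernel on one of the folders reported by your Jupyter installation should show whether that kernel runs on the interpreter you expect.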

You can create new kernels using the ipykernel package:

$ python -m ipykernel install --user --name envname --display-name "Python (envname)"

Dependency management

Since I use VS Code as my frontend, I just need to add the ipykernel package to the virtual environment that I use to manage all the other dependencies needed to run the notebook. This ensures that the same Python executable is used for the kernel and the shell environment. This is well explained here.
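If you want to confirm that the kernel and the shell really agree, a check along these lines can be run in a notebook cell. It is only a sketch: the helper names (interpreter_prefix, kernel_matches_shell) are hypothetical, and it assumes the shell's interpreter is whatever "python" (or "python3") resolves to on the PATH of the notebook process. It compares sys.prefix values rather than file paths, since a virtual environment's python is often a symlink.

```python
import shutil
import subprocess
import sys


def interpreter_prefix(python_path):
    """Ask the interpreter at python_path for its sys.prefix (environment root)."""
    result = subprocess.run(
        [python_path, "-c", "import sys; print(sys.prefix)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def kernel_matches_shell(shell_python=None):
    """Return True if the shell's Python lives in the same environment as this process."""
    # Fall back to whatever the PATH of this process resolves to.
    shell_python = shell_python or shutil.which("python") or shutil.which("python3")
    if shell_python is None:
        return False  # no interpreter found on PATH
    return interpreter_prefix(shell_python) == sys.prefix
```

If kernel_matches_shell() returns False inside a notebook, packages installed with a plain shell pip command may be landing in a different environment than the one the kernel imports from.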

These are the steps to create a new environment for a Jupyter notebook:

  1. Create a project folder:
    $ mkdir notebook_project
    
  2. Create a new virtual environment in the folder and activate it:
    $ cd notebook_project
    $ python3 -m venv .venv
    $ source .venv/bin/activate
    
  3. Install ipykernel (and other dependencies) using pip (make sure the venv is activated):
    $ python3 -m pip install ipykernel
    $ python3 -m pip install pandas
    
  4. Create a new notebook:
    $ touch mynotebook.ipynb
    
  5. Open the notebook in VS Code and, in the top right corner, choose Select Kernel -> Python Environment -> .venv (.venv/bin/python).
  6. You should now be able to run the notebook and use the pandas package inside the notebook.
  7. To add new dependencies:
    • Use the terminal: $ python3 -m pip install <package_name>
    • Install from within a notebook cell:
      import sys
      !{sys.executable} -m pip install <package_name>
      
    • Specify your dependencies in a requirements.txt
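The reason for going through sys.executable in the notebook cell is that it pins pip to the kernel's own interpreter. The same idea can be written as plain Python, which also covers the requirements.txt case. This is a rough sketch under that assumption; pip_install_args and pip_install are illustrative names, not part of pip or Jupyter.

```python
import subprocess
import sys


def pip_install_args(*packages, requirements=None):
    """Build a pip command bound to the interpreter running this code."""
    args = [sys.executable, "-m", "pip", "install"]
    if requirements is not None:
        # Install everything listed in a requirements file.
        args += ["-r", str(requirements)]
    args += list(packages)
    return args


def pip_install(*packages, requirements=None):
    """Run the install and raise CalledProcessError if pip fails."""
    subprocess.run(pip_install_args(*packages, requirements=requirements), check=True)
```

For example, pip_install("pandas") or pip_install(requirements="requirements.txt") installs into whichever environment the kernel is running in, regardless of what the shell's PATH points to.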

I use Poetry for virtual environments and dependency management. So in step 2 I would instead use:

$ cd notebook_project
$ poetry init

and install packages via:

$ poetry add ipykernel
$ poetry add pandas

If I want to use the default Jupyter UI, I can install the jupyter metapackage into my environment and then start the UI with:

$ poetry add jupyter
$ poetry run jupyter notebook

If you have any thoughts, questions, or feedback about this post, I would love to hear it. Please reach out to me via email.

#python   #data-engineering   #data-science   #notebook