mcpar-land a day ago | next |

What is the benefit of parameterizing a jupyter notebook over just writing python that's not in a jupyter notebook? I like jupyter notebooks for rapid prototyping but once I want to dial some logic in, I switch to just writing a .py file.

singhrac a day ago | root | parent | next |

We use papermill extensively, and our team is all good programmers. The difference is plots. It is a lot easier to write (or modify) our existing notebook template to produce a plot of X vs Y than it is to build and test a script that outputs e.g. a PDF.

For example, if your notebook runs into a bug, you can just run all the cells and then examine the locals after it breaks. This is extremely common when working with data (e.g. "data is missing on date X for column Y... why?").

I think most of the "real" use cases for notebooks are data analysis of various kinds, which is why a lot of people dismiss them. I wrote a blog post about this a while ago: https://rachitsingh.com/collaborating-jupyter/
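For the curious, the template workflow is just one tagged cell plus one call (file and parameter names here are invented):

    import papermill as pm

    # plot_template.ipynb has a cell tagged "parameters" holding defaults like:
    #     ticker = "AAPL"
    #     start_date = "2024-01-01"
    # papermill injects new values after that cell, then runs top to bottom.
    pm.execute_notebook(
        "plot_template.ipynb",          # the shared template
        "runs/plot_AAPL.ipynb",         # executed copy, plots rendered inline
        parameters={"ticker": "AAPL", "start_date": "2024-01-01"},
    )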

gnulinux a day ago | root | parent | prev | next |

It's a literate programming tool. If you find literate programming useful (think of Donald Knuth's TeX, which was written as a literate program), then you can write a Jupyter notebook, add text, add LaTeX, titles, paragraphs, explanations, stories, and attach code too. Then you can just run it. I know this sounds pretty rare, but this is mostly how I write code (not in a Jupyter notebook; I use Markdown instead and write code in a combination of Obsidian and Emacs). To me, code is just writing: there is no difference between prose, poetry, musical notation, or computer programming. They're just different languages that mean something to human beings, and I think they're done best when they're treated like writing.

zelphirkalt a day ago | root | parent | next |

Does it support more of literate programming than the small feature set that a normal Jupyter notebook supports?

I always wish they would take a hint from Emacs org mode and make notebooks more useful for development.

gnulinux a day ago | root | parent |

No, it supports less, actually. Obsidian is only a Markdown editor; it does allow you to edit code fragments like code (so there is basic code highlighting, auto-indenting, etc.) but that's it. I personally find this a lot easier in some cases. I find that if the code is so complicated that you need anything more than just "seeing" it, you probably need to break it down further into its atomic elements. For certain kinds of development I do find myself needing to be in a "programming groove", and then I use Emacs. But other times I accompany the code with a story and/or technical description, so it feels like the end goal is to write the document, and not the code. Executable code is just an artifact that comes with it. It's definitely a niche application as far as e.g. the industry goes.

crabbone a day ago | root | parent | prev |

I have to disagree... Literate programming is still programming: it produces programs (just with the extra effort of writing documentation up-front).

Jupyter is a tool for doing exploratory, interactive programming. Most notebooks I've seen in my life (probably thousands at this point) are worthless as complete programs. They are more akin to shell sessions, which, for the most part, I wouldn't care to store for later.

Of course, Jupyter notebooks aren't the same as shell sessions, and there's value in being able to re-run a notebook, but they are so bad at being programs that there's probably a number N in the low two digits where, if you expect to run a notebook more than N times, you are better off writing an actual program instead.

gnulinux a day ago | root | parent | next |

Literate programming is not just "documentation + code", any more than a Calculus textbook is "documentation + CalculusCode" or a novel is "documentation + plot". It goes way beyond that: with literate programming you attach arbitrary text that accompanies the code, such that the fragments of your code are simply one part of the whole text. Literate programming is not just commenting (or super-commenting); if it were, you could just use comments. It's the practice of embedding fragments of code in a larger text, such that you can later utilize that text the same way you utilize code. When you write a literate program, your end goal is the text and the program, not just the program. You can write a literate program and publish it as-is as a textbook, poem, blog post, documentation, website, fiction, musical notation, etc... Unless you think that all human writing is documentation, literate programming is not just documentation.

crabbone 21 hours ago | root | parent |

Yes. I tried it. And, eh... it's documentation + code (you can publish code + documentation as a textbook, poem, blog post, or website just as well). No need to exaggerate. It's also very inconvenient to write, for zero benefit. It's kind of like writing prose in one language and then translating individual pieces of it into another language, while hoping that somehow the sum will still come out OK.

Some people like a challenge in their lives... and I don't blame them. For sport, I too would rewrite some silly programs in languages I never intend to use, or do some code golfing, etc. Literate programming belongs in this general area of making an extra effort to accomplish something that would've been trivial to do in a much simpler way.

dayjaby 10 hours ago | root | parent |

> it's kind of like writing prose in one language, and then translating individual pieces of it into another language

That's why I'm stuck in Tolstoy's War and Peace. You have to know French to get past the first few pages.

crabbone 7 hours ago | root | parent |

Ha! I had to read that in the 8th grade. At first it was very confusing, because I thought I for some reason got a book in French. But then I just skipped to the part where it started in Russian. Later, after a discussion in class, I got a vague idea that that part was some sort of a description of a ball and some high-society stuff... it wasn't at all useful for any further work we had to do on the book, so, I don't actually know what that part was about. All further reading and discussion focused on countess Olga and her thinking about the war (which the teacher claimed was the reflection of Tolstoy's own views).

But, more to the point of literate programming: it's not the only tool that wants programmers to write some sort of plan or sketch before writing the code. A much more popular technique is TDD, which, again, wants programmers to write something informally first and then formalize it later in code. And, as with literate programming, my experience was that it's unhelpful to the point of being a distraction.

There's a good reason to think that some sort of a sketch or a blueprint might be useful for the future program. It works like that in many other disciplines. Artists would make sketches before painting the picture, engineers make blueprints etc.

I think the reason literate programming doesn't work is that, unlike a sketch or a blueprint, one has to carry it forever (and propagate changes back into it as they are discovered) for as long as the code is being worked on. It probably would've worked better as some sort of plan that can be abandoned at any point: something to give the development an initial push, but not requiring any further maintenance.

abdullahkhalids a day ago | root | parent | prev |

> Don’t get discouraged because there’s a lot of mechanical work to writing. There is, and you can’t get out of it. I rewrote A Farewell to Arms at least fifty times. You’ve got to work it over. The first draft of anything is shit.

-- Ernest Hemingway

This is how all intellectual work proceeds. Most of the stuff you write is crap. After many iterations you produce one that is good enough for others. Should we take away the typewriter from the novel writers too, along with Jupyter notebooks from scientists, because most typed pages are crap?

crabbone 21 hours ago | root | parent |

I think you completely missed the point... I compared Jupyter notebooks to shell sessions: that doesn't make them bad (they are, however, but for a different reason). I don't think shell sessions are bad. The point I'm making is that Jupyter notebooks aren't suitable for being independent modules inside a larger program (and neither are shell sessions). The alternative is obvious: just write the program.

Can you possibly make a Jupyter notebook act like a module in a program? -- With a lot of effort and determination, yes. Should you be doing this, especially since the alternative is very accessible and produces far superior results? -- Of course not.

Using your metaphor, I'm not arguing for taking the typewriter away from the not-so-good writers. I'm arguing that maybe they can use a computer with a word processor, so that they don't waste so much paper.

reeboo a day ago | root | parent | prev | next |

As an MLE who comes from backend web dev, I have flip-flopped on notebooks. I initially felt that everything should be in a python script. But I see the utility in notebooks now.

For notebooks in an ML pipeline, I find that data issues are usually where things fail. Being able to run code "up to" a certain cell and create plots is invaluable. Creating reports by creating a data frame and displaying it as a cell is also super-handy.

You say "dial some logic in", which is asking the wrong question (in my experience, at least). The logic in ML is usually very straightforward. It's about the data coming into your process and how your models are interacting with it.

jamesblonde a day ago | root | parent |

I agree completely with this. Papermill's output is a notebook - that is the log file. You can double-click on it, it opens in 1-2 seconds, and you can see visually how far your notebook progressed, plus any plots you added for debugging.

jdiez17 a day ago | root | parent | prev | next |

There are a lot of people who are not expert Python programmers, but know enough to pull data from various sources and make plots. Jupyter{Notebook,Lab} is great for that.

As you say, from a programmer's point of view the logical thing to do is to convert the notebook to a Python module. But that's an extra step that may not be necessary in some cases.

FWIW I used papermill in my Master's thesis to analyze a whole bunch of calibration data from IMUs. This gave me a nicely readable document with the test report, conclusions etc. for each device pretty easily.

kremi a day ago | root | parent | prev | next |

Some of the replies here are pretty good, I basically agree with “if it works for your data scientists then why not”.

I'm actually a software developer with 10 years' experience who also happens to do data science. And I've found myself in situations where I parametrized a notebook to run in production. So it's not that I can't turn it into plain Python. The main reasons are:

1. I prototype in a notebook. Translating to python code requires extra work. In this case there’s no extra dev involved, it’s just me. Still it’s extra work.

2. You can isolate the code out of the notebook, and in theory you've just turned your notebook into plain py. You could even log every cell output to your standard logging system. But you lose the context of every log. Some cells might output graphs. The notebook just gives you a fast and complete picture that might be tedious to put together otherwise.

3. The saved notebook also acts as versioning. In DS work you can end up with lots of parameters or small variations of the same thing. In the end, whatever has few variations I put in plain python code; whatever is more experimental and subject to change I put in the notebook. In certain cases it's easier than going through commit logs.

4. I've never done this, but a notebook is just JSON, so in theory you could further process the output with prestodb or similar.
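On point 4, nothing beyond the standard library is needed to pick a saved notebook apart (file name here is invented):

    import json

    with open("runs/2024-01-01.ipynb") as f:
        nb = json.load(f)

    # Collect the printed/stream output of every code cell.
    for cell in nb["cells"]:
        if cell["cell_type"] == "code":
            for out in cell.get("outputs", []):
                if out.get("output_type") == "stream":
                    print("".join(out["text"]))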

mooreds a day ago | root | parent | prev | next |

It's the same tradeoff as turning an Excel spreadsheet into a proper program.

If you do so, you gain:

* the rigor of the SDLC

* reusability by other developers

* more flexible deployment

But you lose the ability for a non-programmer to make significant changes. Every change needs to go through the programmer now.

That is fine if the code is worth it, but not every bit of code is.

fifilura a day ago | root | parent |

It also implies that an engineer has a better understanding of what is supposed to be done and can discover all the error modes.

In my experience, most of the time the problem is in the input and interpretation of the data. Not fixable by a unit test.

crystal_revenge a day ago | root | parent | prev | next |

I agree. I was at a company where some DS was really excited about Papermill, and I was trying to explain that this is an excellent time to stop working in a notebook and start writing reusable code.

I was aghast to learn that this person had never written non-notebook based code.

Code notebooks are great as notebooks, but should in no way replace libraries and well structured Python projects. Papermill to me is a huge anti-pattern and a sign that your team is using notebooks wrong.

jdiez17 a day ago | root | parent |

So you think it was a good move to scoff at someone for using a computer for their work in a way that is different from your preferences?

crystal_revenge a day ago | root | parent |

Notebooks are great as notebooks, but it's very well established, even in the DS community, that they are a terrible way to write maintainable, sharable, scalable code.

It's not about preference, it's objectively a terrible idea to build complex workflows with notebooks.

The "scoff" was in my head, the action that came out of my mouth was to help them understand how to create reusable Python modules to help them organize their code.

The answer is to help these teams build an understanding of how to properly translate their notebook work into reusable packages. There is really no need for data scientists to follow terrible practices, and I've worked on plenty of teams that have successfully onboarded DS as functioning software engineers. You just need a process and a culture in which notebooks cannot be the last stage of a project.
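The mechanics of that translation are small: the logic moves into a package and the notebook shrinks to a thin driver. A sketch (module, function, and column names here are invented):

    # analytics/quality.py -- lives in a versioned, tested package
    import pandas as pd

    def missing_dates(df: pd.DataFrame, column: str) -> pd.Index:
        """Index values (dates) where `column` is NaN."""
        return df.index[df[column].isna()]

    # The notebook cell then becomes:
    #     from analytics.quality import missing_dates
    #     missing_dates(prices, "close")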

fifilura a day ago | root | parent |

The thing with data pipelines is they have a linear execution. You start from the top and work your way down.

Notebooks do that, and even leave a trace while doing it. Table outputs, plots, etc.

It is not like a python backend that listens to events and handles them as they come, sometimes even in parallel.

For data flow, the code has an inherent direction.

crystal_revenge a day ago | root | parent |

> Notebooks do that, and even leave a trace while doing it.

Perhaps the largest critique of notebooks is that they don't enforce linear execution of cells. Every data scientist I know has been bitten by this at least once (not realizing they're relying on a stale cell that should have been re-run).

Sure, you could solve this by automating the entire notebook to ensure top-down execution order, but then why in the world are you using a notebook like this? There is no case I can think of where this would be remotely better than just pulling the code out into shared libraries.

I've worked on a wide range of data science teams in my career and by far the most productive ones are the ones that have large shared libraries and have a process in place for getting code out of notebooks and into a proper production pipeline.

Normally I'm the person defending notebooks, since there's a growing number of people who outright don't want to see them used, ever. But they do have their place, as notebooks. I can't believe I'm getting downvoted for suggesting one shouldn't build complex workflows using notebooks.

jsemrau a day ago | root | parent | prev | next |

I used papermill a while ago to automate a long-running python-based data aggregation task. Airflow would log in remotely to the server, kick off papermill, and track its progress. Initially I wanted to use pure python, but the connection dropped frequently, which kept me from tracking progress, and jupyter also enabled quick debugging of whatever went wrong.

Not one of my proudest moments, but it got the job done.

swalsh a day ago | root | parent | prev | next |

My experience is more with Databricks, and their workflow system... but the concept is exactly the same.

It lets data scientists work in the environment they work best in, and it makes it easier to productionize work. If you separate them, then there's a translation process to move the code into whatever the production format is, which means extra testing and extra development.

__MatrixMan__ a day ago | root | parent | prev | next |

I think there are places where the figure-it-out-in-a-notebook part is one person's job, and then including it in a pipeline is another person's job.

If they can call the notebook like a function, the second person's job becomes much easier.
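With papermill that handoff is nearly literal - one function call (file and parameter names here are invented):

    import papermill as pm

    def run_analysis(client_id: str, out_dir: str = "runs") -> str:
        """Execute the analyst's notebook as if it were a function."""
        out_path = f"{out_dir}/analysis_{client_id}.ipynb"
        pm.execute_notebook(
            "analysis.ipynb",                     # the analyst's notebook
            out_path,                             # executed copy, kept as a log
            parameters={"client_id": client_id},  # injected into the "parameters" cell
        )
        return out_path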

crabbone a day ago | root | parent |

I've been that person, and no, it doesn't. It makes my life suck if I have to include a notebook, instead of an actual program, in a larger program. Notebooks don't compose well, they are too dependent on the specifics of the environment in which they were launched, and they carry excessive, machine-generated source code that's hard for humans to work with.

As a stop-gap solution, for cases like a single presentation / proof-of-concept that doesn't need to live on and be reused -- it would work. Anything that doesn't match this description will accumulate technical debt very quickly.

__MatrixMan__ a day ago | root | parent |

I sort of suspected that adding parameters was not the end of the story. My experience with this was just "make it work with papermill", so the notebooks I tested with were nice and self contained.

Although it does seem like packaging dependencies and handling parameters are separate problems, so I'm not sure papermill is to blame for the fact that most notebooks aren't ready to be handled like a black box, even after they're parameter-ready. Something like jupyenv is needed as well.

crabbone 21 hours ago | root | parent |

Jupyter is not the end of the story here. There are plenty of "extensions". These extensions generally go down two different paths: kernels and magics.

It's not very common for Jupyter magics to be added ad hoc by users, but they typically create a huge dependency on the environment, so no jupyenv is going to help (e.g. all the workload-manager magic for launching jobs in Slurm / OpenPBS).

Kernels... well, they can do all sorts of things... beyond your wildest dreams and imagination. And, unlike magics, they are readily available for the end user to mess with. And, of course, there are a bunch of pre-packaged ones, supplied by all sorts of vendors who want to promote their tech this way. Say, stuff like running Jupyter over Kubernetes with Ceph volumes exposed to the notebook. There's no easy way of making this into a "module" / "black box" that can be combined with some other Python code. It needs a ton of infra code to support it, if it's meant to be somewhat stand-alone.

zhoujing204 a day ago | root | parent | prev | next |

It might be a pretty useful tool for education. College courses related to Python and AI on Coursera have heavily used Jupyter Notebook for assignments and labs.

z3c0 a day ago | root | parent | prev |

Parameterizing notebooks is a feature common to modern data platforms, and most of its usefulness comes from saving the output. That makes it easier to debug ML pipelines and such, because the code, documentation, and last output are all in one place. However, I don't see any mention of what happens to the outputs with this tool.

__mharrison__ a day ago | prev | next |

I teach a lot using Jupyter. It is certainly easy to apply SWE worst practices in Jupyter.

I am often in front of folks who "aren't computer programmers" but need to use Python tools to be successful. One of my covert goals is to teach SWE best practices inside of notebooks. It requires a little more typing but eases the use of notebooks, refactoring, testing, moving to scripts, and using tooling like Papermill.

akshayka a day ago | root | parent |

Have you considered using marimo notebooks?

https://github.com/marimo-team/marimo

marimo notebooks are stored as pure Python (executable as scripts, versionable with git), and they largely eliminate the hidden state problem that affects Jupyter notebooks -- delete a variable and it's automatically removed from program memory, run a cell and all other cells that use its variables are marked as stale.

marimo notebooks are also readily parametrized with CLI arguments, so you can do: python notebook.py -- -foo 1 -bar 2 ...
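Inside the notebook, you read those values back with mo.cli_args() (see the marimo docs for the exact signature):

    import marimo as mo

    args = mo.cli_args()        # everything passed after `--` on the command line
    foo = args.get("foo", 1)    # fall back to a default when the flag is absent
    bar = args.get("bar", 2)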

Disclosure: I'm a marimo developer.

cycomanic a day ago | root | parent | next |

There is also jupytext, which converts Jupyter notebooks on the fly to a number of different formats (Markdown, python, ...). It's at the core of the Jupyter Book project IIRC, and IMO it's the best way to use Jupyter with git.
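jupytext can also be scripted from Python; a quick sketch (file names here are invented):

    import jupytext

    # Read the notebook...
    nb = jupytext.read("analysis.ipynb")
    # ...and write it out as a percent-format script for clean git diffs.
    jupytext.write(nb, "analysis.py", fmt="py:percent")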

edublancas a day ago | prev | next |

Papermill is great but has quite a few limitations because it spins up a new process to run the notebook:

- You cannot extract live variables (needed for testing)

- Cannot use pdb for debugging

- Cannot profile memory usage

You can do all of that with ploomber-engine (https://github.com/ploomber/ploomber-engine).

Disclaimer: I'm the author of this package

ziddoap a day ago | root | parent | next |

Not disclosed in this comment is that edublancas is

>Ploomber (YC W22) co-founder.

Kalanos a day ago | root | parent |

Who is a great technologist with a lot of hands-on experience. If it made sense to leverage papermill, he would have done so and focused on something else.

throwpoaster a day ago | root | parent | prev |

iirc, a few years back I was able to do all of these things with the Papermill IPython runtime.

Papermill is great, but yes: lots of room to hack on it and make it better.

edublancas a day ago | root | parent |

Has papermill deprecated the IPython runtime? I used papermill extensively in the past and I never saw that in their docs.

pplonski86 a day ago | prev | next |

I have a few ML pipelines that simply use nbconvert to execute notebooks. Regarding the python script vs. notebook debate, I think it all depends on your use case. I like that I can display plots in notebooks without any additional work.

morkalork a day ago | prev | next |

I once built an unholy combination of papermill and nbconvert to mass-produce monthly reports using a "template" notebook. All the code was imported from a .py file, so the template just took a client ID as input and called out to render_xyz(...) in each section. It was nice because it produced a bunch of self-contained static files and wrote them to a network drive. It was definitely a solution to the problem.
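The shape of that pipeline, roughly (names and paths here are invented):

    import papermill as pm
    from nbconvert import HTMLExporter

    def render_monthly_report(client_id: str) -> None:
        executed = f"reports/{client_id}.ipynb"
        # Run the template with this client's ID injected.
        pm.execute_notebook("template.ipynb", executed,
                            parameters={"client_id": client_id})
        # Freeze the executed notebook into a self-contained HTML file.
        body, _ = HTMLExporter().from_filename(executed)
        with open(f"reports/html/{client_id}.html", "w") as f:
            f.write(body)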

jamesblonde a day ago | prev | next |

For MLOps platforms, Papermill is one of the reasons why we no longer include experiment tracking out of the box in Hopsworks. You can easily see the results of training runs as notebooks - including loss curves, etc. Any models that completed get registered in the model registry along with plots, a model card, and model evaluation/validation metrics.

notpushkin a day ago | prev | next |

> Do you want to run a notebook and depending on its results, choose a particular notebook to run next?

Hell no. I want to rewrite all that as a proper script or Python module.

p4ul a day ago | root | parent |

Indeed! I feel like we as a community have taken a wrong turn with our use of notebooks. I think they have benefits in some specific use cases (e.g., teaching, demos, etc.), but otherwise, I think they mostly encourage bad practices for software development.

etbebl 20 hours ago | prev | next |

Am I the only one who cringes at the name of this tool? The implication being that it's a way to churn out low-quality papers, probably with some p-hacking along the way.

cmcconomy a day ago | prev | next |

In the past, I've used this to generate HTML outputs to reflect a series of calculations and visualizations so that we can share with clients.

Kalanos a day ago | prev | next |

Is this still being developed? The last commit to the main library was 5 months ago, and it's tied to exceptions/tests.

reeboo a day ago | root | parent | next |

It's a thin wrapper around notebooks. Does it really need more features? Not saying that it couldn't have them, but it is feature-complete for its job.

Kalanos a day ago | root | parent |

Things break due to shifting dependencies.

Also, if it isn't maintained by the company that made it, that's a good sign they are no longer using it. It suggests that there is a better solution elsewhere.

iamleppert a day ago | prev | next |

Jupyter notebooks are missing strict types, a linter and unit tests. When can those features be added?

ogrisel a day ago | root | parent | next |

With papermill you can parametrize a notebook and run it on different inputs to check that it doesn't raise uncaught exceptions. This can be wrapped into a pytest test suite, possibly via some ad-hoc pytest fixture or plugin.

If the notebooks themselves contain assertions to check that expectations on the outputs are met, then you have an automated way to check that the notebooks behave the way you want on some test inputs. For long notebooks, this is more like integration/functional tests rather than unit tests, but I think this is already an improvement over manually run notebooks.
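A minimal version of such a pytest wrapper (notebook name and parameter here are invented):

    import papermill as pm
    import pytest

    @pytest.mark.parametrize("n_rows", [10, 1000])
    def test_notebook_runs_clean(n_rows, tmp_path):
        # papermill raises PapermillExecutionError if any cell raises,
        # which pytest reports as an ordinary test failure.
        pm.execute_notebook(
            "analysis.ipynb",
            str(tmp_path / "out.ipynb"),
            parameters={"n_rows": n_rows},
        )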

Not sure about strict types: you mean running mypy on a notebook? Maybe this can be helpful:

- https://pypi.org/project/nb-mypy/

As for linters, you can install `jupyterlab-lsp` together with `python-lsp-ruff`, for instance.

big-chungus4 a day ago | root | parent | prev |

VS Code's Jupyter support uses the same extensions as VS Code, so you can get a linter and strict type checking. Not sure about tests, though.