Things I wish python packaging learned from the JVM Jars

Alon Nisser
Published in ITNEXT
Jan 28, 2022


We’ve moved forward a lot with dependency management in python, but a crucial piece is still missing.

I’ve had my share of ranting about python dependency management tooling, but you’ve got to admit python dependency management has come a really long way. Tools like poetry, built on a series of PEPs, gave us both the dependency locking capabilities of NPM and a really simple flow for package building. No more arcane `MANIFEST.in` and `setup.py` dark magic, just a clean, clear and simple `poetry build` and you’ve got your wheels.

Ljubljana Marshes Wheel, from around 3150 BCE (restored model of the oldest exactly radiocarbon dated wooden wheel part in the world).

And wheels are great! But when a wheel is installed, the installer downloads all the specified dependencies. Again, that is a logical solution for most use cases today, unless your library package depends on a package in a private repo, or it is intended for installation in an internetless “secure” environment. Then we’re in trouble.

Personally, I didn’t bump into this until recently. Most of my dependency installations happen in docker builds, in a managed CI environment where I can provide private repo credentials without exposing them in the actual artifacts. But I needed to add a private package as a dependency of a wheel, containing a spark job, that gets installed on Databricks, and suddenly I was in trouble.

There are workarounds. You might explicitly install your private dependencies before you install your wheel, or you might provide the credentials in the production environment. You could set up a mirror/cache to replace pypi with your packages. Other options include projects like Wagon or Shiv, or manually bundling all the wheels for the dependencies.
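To make the "manually bundling all the wheels" workaround concrete, here is a minimal sketch, not a definitive recipe. It only builds the two pip command lines involved: `pip wheel` to collect wheels for every requirement into a local directory on a machine that has network access and credentials, and `pip install --no-index --find-links` to install from that directory on the offline machine. The file name `requirements.txt`, the directory name `wheelhouse` and the helper function names are assumptions made up for illustration.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def build_wheelhouse_cmd(requirements: Path, wheelhouse: Path) -> list[str]:
    """Command that builds or downloads wheels for every requirement.

    Run this where the private index and its credentials are reachable.
    """
    return [
        sys.executable, "-m", "pip", "wheel",
        "--requirement", str(requirements),
        "--wheel-dir", str(wheelhouse),
    ]


def offline_install_cmd(requirements: Path, wheelhouse: Path) -> list[str]:
    """Command that installs only from the local wheelhouse, never the network."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",                      # do not contact any package index
        "--find-links", str(wheelhouse),   # resolve everything from this directory
        "--requirement", str(requirements),
    ]


def run(cmd: list[str]) -> None:
    """Execute a command and raise if it fails."""
    subprocess.run(cmd, check=True)
```

On the build machine you would call `run(build_wheelhouse_cmd(Path("requirements.txt"), Path("wheelhouse")))`, ship the `wheelhouse` directory along with your own wheel, and call `run(offline_install_cmd(...))` on the target. The wheels must match the target's platform and python version, which this sketch does not check.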

Enter the Uber Jar

The JVM ecosystem bumped into this issue a long time ago and developed tooling for it. Assembly plugins exist in all major build tools (gradle, maven, sbt for scala, etc.), allowing you to create “Uber Jars” that contain all dependencies and deliver them together to production. They also let you control what not to include (for example, if spark comes pre-installed on every cluster, you don’t need it to be part of the Uber Jar) and provide tools to resolve conflicts if needed.
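A rough python analogue of "bundle everything except what the cluster already provides" might look like the sketch below. It assumes you already have a directory of wheel files (for example one produced by `pip wheel`), and it packs them into a single zip archive while skipping distributions you list as provided. The function names and the example file names are hypothetical. The only real conventions relied on are that a wheel file name starts with the distribution name followed by `-`, and the usual name normalization that treats `-`, `_` and `.` as equivalent. This does not resolve version conflicts the way the JVM assembly plugins can.

```python
import re
import zipfile
from pathlib import Path


def normalize(name: str) -> str:
    """Normalize a distribution name so `My_Lib`, `my-lib` and `my.lib` compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def select_wheels(wheel_names, provided):
    """Keep wheel file names whose distribution is not in the provided list."""
    skip = {normalize(p) for p in provided}
    # The distribution name is the part of the file name before the first "-".
    return [w for w in wheel_names if normalize(w.split("-")[0]) not in skip]


def bundle(wheelhouse: Path, archive: Path, provided=()):
    """Zip the selected wheels into one archive and return the names that were kept."""
    names = sorted(p.name for p in wheelhouse.glob("*.whl"))
    kept = select_wheels(names, provided)
    with zipfile.ZipFile(archive, "w") as zf:
        for name in kept:
            zf.write(wheelhouse / name, arcname=name)
    return kept
```

For example, `bundle(Path("wheelhouse"), Path("bundle.zip"), provided=["pyspark"])` would leave out any `pyspark` wheel, on the assumption that the cluster already ships it.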

Epilogue

I’ll be happy to stand corrected and find a simple way to pack my dependencies into the wheel for an air-gapped environment. But if not, I sure hope the python dependency management tools will provide us with a solution. I’m sure someone is already working on that.

Special thanks to Shai Berger, Tal Einat, Meir Kriheli, Avraham Serour, and the whole Pyweb-IL group for their suggestions and input and for being a great place to learn and a community for so many years.
