In Defense of Package Managers
It’s not their fault your build broke!
Open Source package managers are one of the most maligned pieces of software in common use today. I’m here to correct that criticism and tell developers that it’s not the package managers you hate — it’s what they’ve made you become. This contains a bit of a history lesson to explain how we got here today, as well as what I think the package management world will look like in the future.
What Is A Package Manager?
A package manager is anything developers run to install packages! This category includes language-level package managers like npm, yarn, pip, go mod, maven, gradle, wapm, etc. This category also includes system-level package managers like apt-get, yum, dnf, apk, etc. They all work roughly the same at a high level:
- A user asks to install a package or set of packages
- The package manager performs some basic dependency resolution
- The package manager calculates the full set of transitive dependencies, including version conflict resolution
- The package manager installs them
This is where the similarities stop. Packages are stored and fetched in custom formats, unique to each manager. The APIs used to access package metadata are also bespoke and vary enormously. Version constraint support ranges from fuzzy matching of semver strings, to Golang’s fancy new MVS (Minimal Version Selection), to full SAT constraint solvers that would be a lot of fun to implement on a whiteboard during an interview. Some installation steps are as simple as unpacking a tarball; some are as dangerous as executing code fetched from the internet.
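To make the version-resolution step a little more concrete, here is a minimal sketch of the idea behind Minimal Version Selection: walk every requirement reachable from the root, and for each package keep the highest of the minimum versions anyone asked for. The REGISTRY contents, the package names, and the resolve function are all made up for illustration; this is not how any real tool stores metadata, and real resolvers handle far more cases.

```python
from collections import deque

# Hypothetical registry (an assumption for this sketch):
# (package, version) -> list of (dependency, minimum required version).
REGISTRY = {
    ("app", (1, 0, 0)): [("http", (1, 2, 0)), ("log", (1, 0, 0))],
    ("http", (1, 2, 0)): [("log", (1, 1, 0))],
    ("log", (1, 0, 0)): [],
    ("log", (1, 1, 0)): [],
}


def resolve(root):
    """Return {package: version} for everything reachable from root.

    For each package, the selected version is the highest of the
    minimum versions requested anywhere in the requirement graph.
    """
    selected = {}
    seen = set()
    queue = deque([root])
    while queue:
        name, version = queue.popleft()
        if (name, version) in seen:
            continue
        seen.add((name, version))
        # Keep the highest minimum version requested so far.
        if name not in selected or version > selected[name]:
            selected[name] = version
        # Follow this version's own requirements (transitive dependencies).
        queue.extend(REGISTRY[(name, version)])
    return selected
```

In this toy registry, app asks for log 1.0.0 but http asks for log 1.1.0, so resolve(("app", (1, 0, 0))) settles on log 1.1.0. No version newer than what someone explicitly requested is ever picked, which is the property that keeps this approach predictable.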
If you’re lucky, you manage to get the command to finish successfully, resulting in a correctly installed set of packages! After that, any bugs found are your problem.
Why Are Package Managers Hated?
Package managers make you think about things you don’t want to think about. You’re trying to install a piece of software to get your job done, and you get yelled at about version conflicts and deprecated transitive dependencies and maintainers that need funding and vulnerabilities everywhere. Developers have a very selfish view of the software ecosystem — they only care about their code, right now. Engineers like to pretend any code in their dependency tree is 100% stable, secure, and supported. Engineers also pretend they’ve put up a huge warning sign on their code with a waiver and terms and conditions making them immune to complaints about stability.
This obviously doesn’t make sense — no one wants to think about stability for their own code, but their code is someone else’s dependency.
Package managers are just the messenger. It’s not their fault you use 1000 unmaintained libraries full of CVEs. I’m not saying package managers are perfect and that there’s no room for improvement, but most of the criticism is misdirected.
How did we get here?
Open source has been around for ages, so why are we still reinventing and complaining about package managers? I don’t think this is a big case of NIH, although there’s certainly some of that. It’s because the problem space has changed so much, arguably for the better. The big change has been the shift from “stable distros” to “rolling distros”, and finally to “no distros”. For this to make sense, we first need to define what a “distro”, or “distribution” is, which will require some history.
Traditionally, software was distributed in release archives, or tarballs. Development happened in some kind of VCS (or not!). When the maintainers wanted to cut a release, they chose a commit (or whatever their VCS called it) and exported all of the source code into an archive that they would publish on some kind of website. Their users would then download this code and compile it into tools or add it as a library to their own applications. This was a heavyweight process, so most people only depended on a small set of libraries. The overall process looked roughly like:
This still happens today, but it’s much less common than in the past, because it’s difficult! There’s no central location to find releases of software, there’s no standard metadata about what is inside each tarball, installation steps vary, etc. Each dependency needs to be understood and examined in order to install it manually.
Enter Distributions!
If there’s something developers love to argue about as much as package managers, it’s Linux distributions! There are over 1000 active distributions today, but the first began to appear in 1992, about a year after the initial release of Linux itself! The primary selling point of these initial distributions was easy access to a large catalog of packages via package managers. And this is exactly where the confusion begins! Package managers/distros provide two very different, but complementary services: easy access to a huge catalog of software, and support for that software! Package managers make installing code easy, distros make sure the code you installed works and is secure.
Package manager UX continues to be innovated on today — I have my own opinions here :) — but the process of supporting software hasn’t changed much. This isn’t really rocket science, but it is hard, under-appreciated work. Here’s roughly what it looks like:
It’s easy to overlook, but when you run apt-get install curl, you’re not installing curl from the upstream maintainers! You’re installing a fork of curl, prepared and maintained by the Debian package maintainers. This fork contains changes to fit it into the Debian package ecosystem, as well as security fixes! This system allows the Debian maintainers to define their own support timelines, decoupled from that of the upstream maintainers. This comes in handy for long-term releases, and allows Linux distributions to make stability guarantees without placing a burden on the original authors. Note: this doesn’t really remove the support burden, it just moves it around.
The Rise Of Language Package Managers
Language package managers have been around almost as long as distro package managers, but have only started to explode in the last decade or so. CPAN for Perl launched in 1995, followed up quickly by PyPI (Python), RubyGems and Maven (Java) in 2003. NPM appeared in 2009, presenting a slightly different take on packaging software.
Unlike most other package managers (still today), NPM allowed for multiple versions of the same dependency by including the full transitive tree of every package. NPM also encouraged the creation of many small, purpose-built packages rather than a few larger, kitchen-sink ones. These new patterns, combined with the rise of open-source software in general, have pushed modern software to a breaking point.
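As a rough illustration of how a nested tree lets two versions of the same dependency coexist, the sketch below models an install tree as nested dictionaries and looks a dependency up starting from the requiring package and walking outward, in the spirit of how Node searches the nearest node_modules folder first. The package names (parser, server, util), the TREE structure, and the lookup function are hypothetical and only stand in for a real directory layout.

```python
# Hypothetical install tree (an assumption for this sketch): each package
# may carry its own private "node_modules" mapping of name -> package.
TREE = {
    "name": "app",
    "version": "1.0.0",
    "node_modules": {
        "parser": {
            "name": "parser",
            "version": "1.0.0",
            "node_modules": {"util": {"name": "util", "version": "1.4.0"}},
        },
        "server": {
            "name": "server",
            "version": "2.0.0",
            "node_modules": {"util": {"name": "util", "version": "2.1.0"}},
        },
    },
}


def lookup(tree, path, dep):
    """Find the version of dep as seen by the package at path.

    path is the list of package names leading from the root to the
    package doing the import. The search starts at that package's own
    node_modules and walks outward toward the root.
    """
    # Collect the chain of packages from the root down to the requirer.
    chain = [tree]
    for name in path:
        chain.append(chain[-1].get("node_modules", {})[name])
    # Search innermost first, so the nearest copy wins.
    for node in reversed(chain):
        found = node.get("node_modules", {}).get(dep)
        if found is not None:
            return found["version"]
    raise LookupError(f"{dep} not found from {'/'.join(path) or 'root'}")
```

Here lookup(TREE, ["parser"], "util") finds util 1.4.0 while lookup(TREE, ["server"], "util") finds util 2.1.0, so neither package has to agree with the other on a version. The price is a much larger tree and more code to keep track of.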
A small group of distro maintainers can’t support everything in the world, but language package managers made it so easy to publish code that distros couldn’t keep up. It’s possible to install Python via apt-get on Debian, and there’s even a large set of Python packages available to be installed this way! But most Python users ignore this and use pip. Why use pip when the distro packages are supported? Well, support usually means slow. This is intentional: the Debian maintainers can’t keep up with every version of every Python package, so they curate and choose a stable set. That’s not always what end users want; they often prefer fine-grained control when writing an application. If you file a bug in a library, or send a patch yourself to fix it, you shouldn’t have to wait 3 years to get this patch into your application code!
Wait So What’s the Problem Again?
Developers want it both ways. They want the convenience and speed of “living at head” with a huge selection of libraries that are updated frequently, but they also want the stability and security of a curated, maintained set of packages. They also want this for free!
No one actively chooses unmaintained, insecure code. But npm, go get, and pip install don’t go out of their way to warn you about it or prevent you from using it. Things have started to change with the growth of SCA tooling like snyk, trivy and deps.dev, but for the most part users still don’t know or don’t care where their dependencies come from. Language package managers provide all of the convenience of distro package managers, with none of the trust.
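For a sense of what this kind of tooling does at its core, here is a deliberately tiny sketch: take the versions you have installed and compare them against a list of known-bad version ranges. The ADVISORIES data, the package names, and the advisory id are entirely made up for illustration, and the audit function is a hypothetical helper; real tools pull this information from maintained vulnerability databases and deal with much messier version schemes.

```python
# Hypothetical advisory data (an assumption for this sketch):
# package -> list of (first affected version, first fixed version, advisory id).
ADVISORIES = {
    "examplelib": [((1, 0, 0), (1, 4, 2), "EXAMPLE-0001")],
}


def parse(version):
    """Turn a dotted version string like '1.3.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def audit(installed):
    """Return (package, version, advisory id) for each affected install.

    installed maps package name -> installed version string.
    """
    findings = []
    for name, version in installed.items():
        parsed = parse(version)
        for introduced, fixed, advisory_id in ADVISORIES.get(name, []):
            # Affected if the version falls in [introduced, fixed).
            if introduced <= parsed < fixed:
                findings.append((name, version, advisory_id))
    return findings
```

With this made-up data, audit({"examplelib": "1.3.0", "otherlib": "2.0.0"}) flags examplelib, and upgrading to 1.4.2 clears it. The check itself is simple; the hard part is someone maintaining the advisory data, which is exactly the kind of support work distros have always done.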
Distros with support lifecycles of 3 years are fundamentally incompatible with package ecosystems where libraries are supported for months. Each Kubernetes release is only supported by upstream maintainers for one year! The Debian maintainers have debated packaging Kubernetes on several occasions, with the unfortunate conclusion that it’s just not possible yet, and might never be.
Where Are We Headed?
I hope package managers become boring, and we start focusing on packages. Care about what you use. Use automation to stay up to date. Stop pretending that open source code is free from bugs. Use supported libraries, and help pay maintainers to support those libraries!
I actually think it’s OK if we bifurcate the package manager landscape between “stable” foundations of OS package managers and the faster-moving language package managers. We just need to acknowledge this risk, and be careful when using these faster ones.