Running 25 Containers on 20GB: The RAM Budget Reality

In April, Hercules went offline. No warning, no graceful shutdown, just silence. The Proxmox host had run out of physical RAM, the OOM killer had looked around for the fattest process, and found a 36GB QEMU PID with my name on it.

The fix was obvious in hindsight: I had 60GB allocated to Hercules on a host that couldn't actually back it up. Cut it to 20GB. Problem solved, right?

Except now I'm running Ghost, Vikunja, Mealie, OwnCloud, Emby, Audiobookshelf, Outline, Kavita, Photoprism, PostgreSQL, MariaDB, Redis, Memcached, Grafana Alloy, Piler, qBittorrent, Timetagger, a soundboard, and a homepage, all on 20GB of RAM.

Spoiler: it works. Here's why.

Most containers are mostly idle

The dirty secret of homelab services is that nobody is using them at the same time. Mealie gets hit at dinnertime. Audiobookshelf runs overnight. Emby streams on weekend evenings. The rest of the time, these processes are parked in RAM doing nothing useful.

Linux knows this. Memory pages that haven't been touched in a while get reclaimed or swapped out. The 20GB ceiling looks scary on paper but in practice free -h shows 6-8GB available most hours of the day.

$ free -h
              total        used        free      shared  buff/cache   available
Mem:            19Gi        11Gi       1.2Gi       312Mi       7.1Gi       7.7Gi
Swap:          2.0Gi       124Mi       1.9Gi

That's a typical afternoon. One active stream on Emby, nobody else doing anything. The cache is doing its job.
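
The number worth watching in that output is "available", not "free". As a minimal sketch, the same figure can be read straight from /proc/meminfo, which is where free gets it. The function names here are my own, and the script assumes a Linux host:

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Return /proc/meminfo fields as a name -> value mapping (sizes are in kB)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def available_gib(fields: dict[str, int]) -> float:
    """MemAvailable: the kernel's estimate of memory usable without swapping."""
    return fields["MemAvailable"] / 1024**2


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # only present on Linux
        info = parse_meminfo(meminfo.read_text())
        total = info["MemTotal"] / 1024**2
        print(f"available: {available_gib(info):.1f} GiB of {total:.1f} GiB")
```

Logging that one line every few minutes is enough to see whether the 6-8GB figure holds up over a full week.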

The three things that actually help

1. Swap (the cheap fix)

2GB of swap on an SSD is essentially free insurance. It won't save you during a spike (swapping video buffers is painful), but it catches the long tail of idle memory pressure that would otherwise trigger the OOM killer. At 124MB used, it's barely being touched. Just having it there changes the math.

2. Proper OOM scores (the right fix)

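Docker containers run with an oom_score_adj of 0 by default, which means the kernel ranks victims almost entirely by memory footprint: the biggest process dies first, whether or not it's the one you can least afford to lose. Setting oom_score_adj on the containers you care about (lower number = lower kill priority, on a scale from -1000 to 1000) means that if something has to die, it'll be the torrent client before the database.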

I haven't done this yet. It's on the list. The fact that things are stable right now doesn't mean the architecture is correct.
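
For when this gets done, here is a rough sketch of what a tiering scheme could look like. The tier names, the scores, and the service-to-tier mapping are all hypothetical. The only real pieces are Docker's --oom-score-adj flag and the kernel's -1000 to 1000 range:

```python
# Hypothetical tiers: illustrative values, not a measured policy.
# oom_score_adj runs from -1000 (never kill) to 1000 (kill first).
TIERS = {
    "protect": -500,    # databases: last to die
    "normal": 0,        # Docker's default
    "expendable": 500,  # first to die
}

# Hypothetical assignment of a few services from the stack to tiers.
SERVICES = {
    "postgres": "protect",
    "mariadb": "protect",
    "mealie": "normal",
    "qbittorrent": "expendable",
}


def oom_flag(tier: str) -> str:
    """Return the `docker run` flag for a tier, checking the allowed range."""
    score = TIERS[tier]
    if not -1000 <= score <= 1000:
        raise ValueError(f"oom_score_adj out of range: {score}")
    return f"--oom-score-adj={score}"


for name, tier in SERVICES.items():
    print(f"{name}: {oom_flag(tier)}")
```

The flag applies when the container is created. In a Compose file the equivalent is the oom_score_adj key on the service.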

3. Not over-allocating at the VM level (the actual fix)

The real lesson from the April incident wasn't RAM management inside Hercules. It was that the Proxmox host was lying about how much RAM was available. 36GB allocated across VMs that added up to more than the physical memory meant the host itself was the bottleneck.

Reducing Hercules to 20GB, along with similar reductions elsewhere, brought total allocation under the physical ceiling. The OOM killer hasn't fired since.
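
The arithmetic is simple enough to check before the OOM killer checks it for you. A minimal sketch follows. The 64 GiB host, the 4 GiB host reserve, and the other two VMs are made-up numbers for illustration. Only Hercules going from 60 to 20 comes from this post:

```python
def headroom_gib(host_ram_gib, vm_allocations_gib, host_reserve_gib=4.0):
    """Physical RAM left if every VM used its full allocation.

    A negative result means the host is overcommitted.
    """
    return host_ram_gib - host_reserve_gib - sum(vm_allocations_gib.values())


# Hypothetical host and neighbours; only the Hercules figures are real.
HOST_RAM_GIB = 64
before = {"hercules": 60, "vm-b": 6, "vm-c": 6}
after = {"hercules": 20, "vm-b": 6, "vm-c": 6}

print("before:", headroom_gib(HOST_RAM_GIB, before))
print("after: ", headroom_gib(HOST_RAM_GIB, after))
```

Ballooning and KSM can stretch a negative number for a while, but a positive one doesn't depend on anything going right.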

What I learned about the services

Going through each container and asking "does this actually need to be running?" is clarifying. Piler (mail archiver) idles at 40MB RSS. OwnCloud-Redis sits at 8MB. The things that eat RAM are the things doing work: PostgreSQL under load, Emby transcoding, Photoprism indexing.

The rest is just overhead that Linux handles gracefully if you let it. Stop fighting the kernel. Give it reasonable limits, set expectations, and trust that 20GB is plenty for a household infrastructure stack that peaks at three concurrent users on a good day.
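
One way to collect per-container figures like these is to parse the output of docker stats. The sketch below assumes the tab-separated format shown in the comment. The sample rows reuse the Piler and OwnCloud-Redis numbers from above, and the PostgreSQL row is invented. Note that Docker's memory usage figure is not identical to RSS:

```python
import re

# Assumed input, one container per line, produced by:
#   docker stats --no-stream --format "{{.Name}}\t{{.MemUsage}}"
_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}
_SIZE = re.compile(r"([\d.]+)\s*(B|KiB|MiB|GiB|TiB)")


def to_mib(size: str) -> float:
    """Convert a Docker size string such as '40MiB' to MiB."""
    match = _SIZE.fullmatch(size.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {size!r}")
    return float(match.group(1)) * _UNITS[match.group(2)] / 1024**2


def rank_by_memory(stats_output: str) -> list[tuple[str, float]]:
    """Return (name, MiB used) pairs, largest consumer first."""
    rows = []
    for line in stats_output.strip().splitlines():
        name, usage = line.split("\t")
        used, _, _limit = usage.partition(" / ")
        rows.append((name, to_mib(used)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Sample data: the postgres figure is made up for the example.
sample = (
    "piler\t40MiB / 19.5GiB\n"
    "owncloud-redis\t8MiB / 19.5GiB\n"
    "postgres\t1.2GiB / 19.5GiB\n"
)

for name, mib in rank_by_memory(sample):
    print(f"{name:<16}{mib:>8.0f} MiB")
```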

The server doesn't need more RAM. It needs the RAM it has to be used honestly.

We're four months past the OOM incident. Hercules hasn't gone offline since the allocation cut. The stack runs fine. Turns out the fix wasn't adding resources; it was accurately representing what we actually had. 🐾