I rebuilt this homelab from scratch last year and found that most of what I'd built previously was held together by configs I couldn't reproduce quickly. VFIO binding, NFS fstab options, startup ordering — the details that only matter when things break. The rebuild was slower because I documented everything. It runs reliably now.
The hardware is an Intel i5-12600KF with an RTX 3070, a 1TB NVMe drive, and Proxmox VE 8 as the hypervisor. Eleven guests: four VMs and seven LXC containers, sharing the same physical hardware but kept appropriately isolated from each other.
Proxmox VE 8.x
│
├── VMs (full isolation, own kernel)
│   ├── Home Assistant OS — home automation
│   ├── TrueNAS SCALE — storage (HBA passthrough, ZFS mirror)
│   ├── AI — Ollama + OpenWebUI (GPU passthrough)
│   └── Dokploy — web apps
│
└── LXC containers (shared kernel, lower overhead)
    ├── n8n — workflow automation
    ├── cloudflared — Cloudflare tunnel
    ├── docker — Portainer, Homepage, VS Code Server
    ├── paperless-ngx — document management
    ├── Proxmox Backup Server — guest snapshots
    ├── kutt — link shortener
    └── paperless-gpt — unused

The VM vs. LXC split is deliberate. VMs get full isolation and their own kernel when they need it: Home Assistant because it interacts with hardware directly, TrueNAS because ZFS wants to own its own memory management and ARC, the AI VM because GPU passthrough requires complete machine-level boundaries. Dokploy runs a WordPress install and a few smaller web apps in a PaaS-style environment that's easier to manage than raw containers for that kind of workload. Everything else runs as an LXC — faster to spin up, lower memory overhead, sufficient for services that don't need a kernel of their own.
Getting the GPU passed through to the AI VM was the most technically demanding part of the build. Before the VFIO config, the firmware prerequisites need to be right: VT-d (Intel IOMMU) enabled, Above 4G Decoding enabled, CSM disabled for UEFI, and, specific to the i5-12600KF (which has no integrated GPU), VGA Detection set to Ignore. Without that last one, the BIOS stalls on cold boot waiting for a display that will never appear on the host. With IOMMU active in the kernel cmdline, VFIO binding looks like this:
# /etc/modprobe.d/vfio.conf
options vfio-pci ids=<GPU_VENDOR_ID>:<GPU_DEVICE_ID>,<AUDIO_VENDOR_ID>:<AUDIO_DEVICE_ID>
softdep nvidia pre: vfio-pci
# /etc/modules-load.d/vfio.conf
vfio
vfio_iommu_type1
vfio_pci

The Z690 chipset puts the GPU in its own IOMMU group without needing an ACS patch, which is a common frustration on cheaper platforms where multiple devices share a group and have to be passed through together or not at all.
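If you want to verify the grouping before committing to passthrough, a quick loop over sysfs shows which devices share each group; this is a generic check, not specific to this board:

# list every PCI device by IOMMU group (run on the Proxmox host)
for d in /sys/kernel/iommu_groups/*/devices/*; do
    g=${d%/devices/*}; g=${g##*/}
    printf 'group %s: ' "$g"
    lspci -nns "${d##*/}"
done | sort -V

The GPU and its HDMI audio function should share a group with nothing else.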
The VM has two related headless settings: --vga none removes the virtual display device entirely, and x-vga=0 on the hostpci entry tells the passthrough layer not to treat the GPU as the primary VGA output. Both are needed. One without the other still causes conflicts — the virtual display competes for framebuffer ownership and breaks the noVNC console. Management is via SSH.
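Roughly what those two settings look like as qm commands, with the VMID and PCI address as placeholders (pcie=1 assumes a q35 machine type):

# headless GPU passthrough settings (VMID and PCI address are placeholders)
qm set <VMID> --vga none
qm set <VMID> --hostpci0 <PCI_ADDR>,pcie=1,x-vga=0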
One non-obvious thing: after enabling the QEMU guest agent with --agent enabled=1, the virtio-serial channel it needs won't initialize on a reboot. It requires a full stop and start of the VM. A reboot leaves the guest agent offline in Proxmox until you do a cold cycle.
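In practice that means something like:

# enable the agent, then do a full stop/start rather than a reboot
qm set <VMID> --agent enabled=1
qm shutdown <VMID> && qm start <VMID>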
Once the GPU was confirmed working — NVIDIA driver loaded, CUDA recognized, VRAM visible under inference load — Ollama ran model inference entirely on GPU. OpenWebUI sits in front of it, published externally through the tunnel.
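The confirmation itself is two commands inside the VM; the exact ollama ps output varies a little by version:

# inside the AI VM, while a prompt is being processed
nvidia-smi    # driver loaded, VRAM in use, ollama listed as a process
ollama ps     # loaded model should report the GPU as its processor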
Storage runs through TrueNAS SCALE on a ZFS mirror. The HBA passes through to TrueNAS the same way the GPU passes through to the AI VM: the Proxmox host's kernel never touches the drives. ZFS manages them from within the TrueNAS VM, where the ARC lives and where snapshot schedules run.
TrueNAS SCALE (ZFS mirror, 2× 10TB)
└── tank/
    ├── paperless/
    │   ├── consume — NFS export → paperless-ngx LXC
    │   ├── media — NFS export
    │   └── export — NFS export
    ├── media/ — SMB share
    └── backups/
        └── proxmox/ — NFS export → Proxmox Backup Server LXC

Snapshot schedules reflect how replaceable each dataset is. Paperless data gets daily snapshots with 30-day retention. Media gets weekly with 8-week retention. A missing media file is annoying; a missing scanned document is harder to recover.
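A quick way to confirm the schedules are actually firing, assuming the dataset names above:

# newest snapshots per dataset (run on TrueNAS)
zfs list -t snapshot -o name,creation -s creation -r tank/paperless | tail -5
zfs list -t snapshot -o name,creation -s creation -r tank/media | tail -5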
The Proxmox host mounts the TrueNAS NFS shares and passes them into the LXC containers as bind mounts. Keeping the mount lifecycle at the host level means the containers don't need to care about the network path:
# /etc/fstab (Proxmox host)
<TRUENAS_IP>:/mnt/tank/paperless/consume /mnt/nfs/paperless-consume nfs nofail,soft,timeo=30,retrans=3,bg 0 0
<TRUENAS_IP>:/mnt/tank/backups/proxmox /mnt/nfs/pbs nfs nofail,soft,timeo=30,retrans=3,bg 0 0
# Bind into LXC
pct set <CONTAINER_ID> -mp0 /mnt/nfs/paperless-consume,mp=/opt/paperless/consume,shared=1

The NFS options are chosen to survive reboots and network gaps. nofail keeps a missing mount from blocking boot. soft lets NFS operations time out instead of hanging. bg retries the initial mount in the background if the network isn't ready. On a home server with a single uplink, this matters.
One NFS-specific catch: inotify filesystem events don't cross network mounts. Paperless won't detect new files in the consume folder without polling. Setting PAPERLESS_CONSUMER_POLLING=10 in the paperless config puts it on a 10-second polling interval instead of relying on inotify.
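In the paperless-ngx container that's a single environment variable; where it lives depends on how paperless is installed (docker-compose.env for a compose setup, /etc/paperless.conf or similar for bare metal):

# paperless-ngx environment (file path depends on install method)
PAPERLESS_CONSUMER_POLLING=10   # seconds between consume-folder scans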
Because NFS consumers depend on TrueNAS being fully up, and the backup server depends on TrueNAS too, startup order matters:
# Proxmox VM/CT startup config
TrueNAS: order=1, up=90 # wait 90s for ZFS to finish importing
PBS: order=2, up=60
paperless-ngx: order=3
AI VM: order=4, up=60

Without explicit ordering, Proxmox starts guests roughly in parallel. paperless-ngx would try to consume from an NFS share that doesn't exist yet, fail silently, and need a manual restart.
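The same ordering expressed as commands, with the VM and container IDs as placeholders:

# startup order and delay, set per guest
qm set  <TRUENAS_VMID>   --startup order=1,up=90
pct set <PBS_CTID>       --startup order=2,up=60
pct set <PAPERLESS_CTID> --startup order=3
qm set  <AI_VMID>        --startup order=4,up=60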
Access to every service goes through a Cloudflare Zero Trust tunnel. The cloudflared LXC maintains a persistent outbound connection to Cloudflare's edge; external requests arrive there and route inward via access policy. Nothing is port-forwarded from the router.
Internet
│
├── Protected routes ──► Cloudflare Access (Google OAuth) ─┐
│                                                          │
└── Bypass routes (API paths, mobile OAuth) ──────────────►┤
                                                           │
                                                  cloudflared LXC
                                              (outbound tunnel only)
                                                           │
                                                  Internal services
                                             (no ports open on router)

Every route is gated behind Google OAuth by default. Bypass rules exist for paths where interactive auth isn't possible: mobile app callbacks, API integrations, Home Assistant's OAuth token endpoint. Each bypass is a separate Cloudflare Access application scoped to that specific path, not a policy exception on the main app.
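The Access policies and the bypass applications live in the Cloudflare dashboard; on the tunnel side there's only an ingress map. A minimal sketch for a locally managed tunnel, with the hostnames, internal addresses, and ports as placeholders:

# /etc/cloudflared/config.yml (locally managed tunnel; names and IPs are placeholders)
tunnel: <TUNNEL_ID>
credentials-file: /etc/cloudflared/<TUNNEL_ID>.json
ingress:
  - hostname: paperless.<DOMAIN>
    service: http://<PAPERLESS_LXC_IP>:8000
  - hostname: chat.<DOMAIN>
    service: http://<AI_VM_IP>:8080
  - service: http_status:404    # catch-all: refuse anything unmatched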
Backups have two layers. Proxmox Backup Server handles guest snapshots; it runs in an LXC with its datastore on TrueNAS, so all backup data lands on ZFS. But PBS only covers what's inside the guests. It doesn't cover what makes the host able to run them: the VFIO bindings, the NFS fstab, the kernel cmdline with IOMMU flags, the network bridge config.
A second nightly script handles that layer: it archives /etc/pve, the network config, fstab, the modprobe configs, and the kernel cmdline into a timestamped tarball on TrueNAS with 14-day rotation, triggered by a systemd timer:
# /etc/systemd/system/pve-host-backup.timer
[Timer]
OnCalendar=*-*-* 01:00:00
Persistent=true

A PBS restore recovers the VMs. The host config backup recovers the ability to run them.
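The script the timer triggers is a short tar-and-prune affair. A sketch, with the script path and the destination directory as assumptions; it grabs the running kernel cmdline from /proc so it doesn't care which bootloader is in use:

#!/bin/bash
# /usr/local/bin/pve-host-backup.sh (sketch; destination path is an assumption)
set -euo pipefail
DEST=/mnt/nfs/pbs/host-config                  # NFS mount backed by TrueNAS
STAMP=$(date +%Y-%m-%d_%H%M)
WORK=$(mktemp -d)
cp -a /etc/pve /etc/network/interfaces /etc/fstab \
      /etc/modprobe.d /etc/modules-load.d "$WORK/"
cat /proc/cmdline > "$WORK/kernel-cmdline"     # running cmdline, regardless of bootloader
tar czf "$DEST/pve-host-$STAMP.tar.gz" -C "$WORK" .
rm -rf "$WORK"
find "$DEST" -name 'pve-host-*.tar.gz' -mtime +14 -delete   # 14-day rotation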
A few things that only become obvious when they go wrong:
Run apt full-upgrade immediately after a fresh Proxmox install, not just apt upgrade. Stale packages from the installer break LXC template creation for newer Debian releases. Not in any guide. First thing that bites.
TrueNAS VMs should always have ballooning disabled. ZFS ARC needs a fixed memory allocation; a balloon device reclaiming memory out from under it causes more problems than a smaller static allocation would.
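Disabling it is one setting:

# disable memory ballooning for the TrueNAS VM (VMID is a placeholder)
qm set <TRUENAS_VMID> --balloon 0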
After restoring the AI VM from a PBS backup, it comes up requesting DHCP and may get a different IP than it had before. Set a static IP via netplan inside the VM immediately after the first restore, before anything else tries to reach it.
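A minimal netplan sketch; the interface name (ens18 is typical for a virtio NIC under Proxmox) and the addresses are assumptions:

# /etc/netplan/01-static.yaml (inside the AI VM; interface and addresses assumed)
network:
  version: 2
  ethernets:
    ens18:
      dhcp4: false
      addresses: [<STATIC_IP>/24]
      routes:
        - to: default
          via: <GATEWAY_IP>
      nameservers:
        addresses: [<DNS_IP>]
# apply with: netplan apply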
Ceph services can be silently enabled on a fresh Proxmox install. Check and mask them if you're not running Ceph.
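To check and silence them (unit names vary a little between Proxmox versions):

# see which Ceph units exist or are active, then mask what you don't use
systemctl list-units --all 'ceph*'
systemctl mask ceph.target    # example; mask whichever units showed up active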
Running a working homelab feels different from building one. Building it is a series of problems: wrong IOMMU group, NFS mount hanging on boot, GPU VM IP changing after a restore. Running it is quieter. The digest runs at 8am. Documents scan when they come off the printer. Models respond. The machine is doing things while I'm not thinking about it.
The interesting problems weren't the services. They were the seams: boot order, NFS mount flags, PCIe isolation requirements, the backup gap that only appears when you ask what you'd actually lose. Getting those right is what turns a homelab into infrastructure.