Running Ollama + ROCm on AMD Strix Halo in a Proxmox LXC
AMD's Strix Halo silicon (gfx1151) is fast, weird, and very new — too new for most pre-built Ollama binaries. Here's the full process to get ROCm 7.2 working in a Proxmox LXC, including every gotcha I hit along the way.
I picked up a GMKtec mini PC with the AMD Ryzen AI MAX+ 395 (Strix Halo) — 128GB unified memory, Zen 5 cores, and an RDNA 3.5 GPU that eats LLMs for breakfast. The problem: gfx1151 is so new that none of the pre-built Ollama binaries or Docker images support it. You have to build from source and wrestle with ROCm 7.2.
After a few nights of pain, I got it working. 36 tok/s on gpt-oss:120b. Three models running simultaneously. Here's exactly what I did.
Overview
Strix Halo's gfx1151 GPU is too new for most pre-built Ollama binaries. This guide documents the full process to get native ROCm working in a Proxmox LXC container, including all the gotchas.
What you'll end up with:
- Ollama running natively in an LXC with full GPU access
- ROCm 7.2 with gfx1151 support
- Models: gpt-oss:120b (~36 tok/s), qwen3:32b (~10 tok/s), qwen3:14b (~23 tok/s)
- Multiple models running simultaneously
Part 1: BIOS Configuration
Before anything else, fix the BIOS memory carve-out. GMKtec defaults to reserving ~64GB as static GPU VRAM, leaving only 62GB for the OS.
- Enter BIOS
- Find UMA Frame Buffer Size (usually under Advanced → AMD CBS → GFX Configuration)
- Set to 2GB (minimum) — do NOT leave on Auto, it defaults to huge reservation
- Save and reboot
You should now see ~120GB in the OS:
free -h # Should show ~120GB total
Part 2: Proxmox Host Configuration
2.1 Kernel Parameters
Proxmox with ZFS uses proxmox-boot-tool, not standard grub. Edit /etc/kernel/cmdline:
nano /etc/kernel/cmdline
Add iommu=pt amdgpu.gttsize=126976 ttm.pages_limit=32505856:
root=ZFS=rpool/ROOT/pve-1 boot=zfs iommu=pt amdgpu.gttsize=126976 ttm.pages_limit=32505856
Apply and reboot:
proxmox-boot-tool refresh
reboot
What these do:
| Parameter | Effect |
|---|---|
| iommu=pt | Pass-through mode, reduces GPU memory access overhead |
| amdgpu.gttsize=126976 | Sets GPU-addressable memory to ~124GB (value is in MiB) |
| ttm.pages_limit=32505856 | Allows TTM to pin ~124GB of pages (4KiB each) for the GPU |
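The two values describe the same ~124GB budget in different units (gttsize in MiB, pages_limit in 4KiB pages). If you ever tune them for a different RAM size, a quick shell check confirms they agree:

```shell
# amdgpu.gttsize is in MiB; ttm.pages_limit is in 4 KiB pages.
# Both should land on the same ~124 GiB figure.
echo "$(( 126976 / 1024 )) GiB from gttsize"
echo "$(( 32505856 * 4 / 1024 / 1024 )) GiB from pages_limit"
```

Both print 124 GiB; if they disagree, one of the two parameters is wrong.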
2.2 udev Rules (GPU Device Permissions)
cat > /etc/udev/rules.d/70-kfd.rules << 'EOF'
SUBSYSTEM=="kfd", KERNEL=="kfd", MODE="0666"
SUBSYSTEM=="drm", KERNEL=="renderD*", MODE="0666"
EOF
udevadm control --reload-rules && udevadm trigger
2.3 GPU Performance Mode
Set GPU to high performance (note: doesn't persist across reboots — see Part 6 for the fix). Check your card name first, since it may be card0 or card1:
ls /sys/class/drm/ | grep -v '\-' # Find your card name (card0 or card1)
echo 'high' > /sys/class/drm/card1/device/power_dpm_force_performance_level
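The card index isn't guaranteed stable across reboots or kernel updates. A small helper (hypothetical, not part of the original setup) resolves which cardN is bound to amdgpu instead of hardcoding it; it takes the drm class directory as a parameter, defaulting to the real sysfs path:

```shell
# Find the cardN entry whose device driver symlink points at amdgpu.
# Defaults to /sys/class/drm; pass another directory for testing.
find_amdgpu_card() {
  dir="${1:-/sys/class/drm}"
  for c in "$dir"/card[0-9]; do
    if readlink "$c/device/driver" 2>/dev/null | grep -q 'amdgpu$'; then
      basename "$c"
      return 0
    fi
  done
  return 1
}
```

Then `card=$(find_amdgpu_card)` and write to `/sys/class/drm/$card/device/power_dpm_force_performance_level`.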
Part 3: LXC Container Setup
3.1 Create the LXC
Use the Proxmox setup script from eikaramba/proxmox-setup-scripts:
cd /root
git clone https://github.com/eikaramba/proxmox-setup-scripts.git
cd proxmox-setup-scripts
./guided-install.sh
Run scripts in order: 001 (tools), 003 (AMD drivers), 007 (udev rules), 031 (create LXC).
3.2 LXC Config
Edit /etc/pve/lxc/100.conf and ensure these lines are present:
memory: 122880
lxc.cgroup2.devices.allow: c 226:* rwm
lxc.cgroup2.devices.allow: c 234:0 rwm
lxc.cgroup2.devices.allow: c 235:* rwm
lxc.mount.entry: /dev/dri/by-path/pci-0000:c5:00.0-card dev/dri/card0 none bind,optional,create=file
lxc.mount.entry: /dev/dri/by-path/pci-0000:c5:00.0-render dev/dri/renderD128 none bind,optional,create=file
lxc.mount.entry: /dev/kfd dev/kfd none bind,optional,create=file
lxc.apparmor.profile: unconfined
lxc.cap.drop:
Note: Replace `0000:c5:00.0` with your GPU's PCI address. Find it with `lspci | grep VGA` on the host.
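Once you have the PCI address, the two mount entries can be generated rather than hand-edited. A sketch (the PCI_ADDR value here is the example address from above; substitute your own):

```shell
# Emit the lxc.mount.entry lines for a given GPU PCI address.
PCI_ADDR="0000:c5:00.0"   # replace with the address from `lspci | grep VGA`
cat <<EOF
lxc.mount.entry: /dev/dri/by-path/pci-${PCI_ADDR}-card dev/dri/card0 none bind,optional,create=file
lxc.mount.entry: /dev/dri/by-path/pci-${PCI_ADDR}-render dev/dri/renderD128 none bind,optional,create=file
EOF
```

Append the output to the LXC config, or paste it in by hand.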
Critical: `c 234:0 rwm` is for `/dev/kfd` (ROCm compute). Without this, the HSA runtime gets "Operation not permitted".
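To double-check those device majors on your own host, `stat -c '%t:%T %n' /dev/kfd /dev/dri/renderD128` prints them in hex. The hex values map back to the decimal numbers in the cgroup rules:

```shell
# 0xe2 and 0xea are the hex forms of the cgroup rule majors above.
# (/dev/kfd's major can vary by kernel, so verify with stat on your host.)
printf 'drm major: %d\n' 0xe2   # 226
printf 'kfd major: %d\n' 0xea   # 234
```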
After editing, restart the LXC:
pct stop 100 && pct start 100
Part 4: ROCm 7.2 in the LXC
Inside the LXC, install ROCm 7.2 (7.1 does NOT have gfx1151 rocblas kernels):
# Add ROCm 7.2 repo
wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | gpg --dearmor | tee /etc/apt/keyrings/rocm.gpg > /dev/null
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/7.2 noble main" > /etc/apt/sources.list.d/rocm.list
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/graphics/7.1/ubuntu noble main" >> /etc/apt/sources.list.d/rocm.list
apt-get update
apt-get install -y rocm-dev rocm-libs rocm-smi rocminfo rocm-utils hipcc
Verify:
HSA_OVERRIDE_GFX_VERSION=11.5.1 rocminfo 2>&1 | head -20
# Should show HSA System Attributes and Agent 1
Part 5: Build Ollama from Source
The pre-built ollama/ollama:rocm Docker image and the standard Ollama install both bundle their own ROCm libraries that don't support gfx1151. You must build from source.
5.1 Install Go
wget https://go.dev/dl/go1.23.5.linux-amd64.tar.gz
tar -C /usr/local -xzf go1.23.5.linux-amd64.tar.gz
export PATH=$PATH:/usr/local/go/bin
5.2 Install Build Dependencies
apt-get install -y cmake git build-essential python3 zstd
5.3 Clone and Build
git clone https://github.com/ollama/ollama
cd ollama
# Install Ollama first (gets the service + base libs)
curl -fsSL https://ollama.com/install.sh | sh
systemctl stop ollama
# Build HIP library against system ROCm 7.2
cmake -B build -S . \
-DAMDGPU_TARGETS=gfx1151 \
-DCMAKE_HIP_COMPILER=/opt/rocm/lib/llvm/bin/clang \
-DCMAKE_PREFIX_PATH=/opt/rocm \
-Wno-dev
cmake --build build --parallel $(nproc) --target ggml-hip
# Takes ~5 minutes on Strix Halo (Zen 5 rips through it)
# Build the Ollama binary
export PATH=$PATH:/usr/local/go/bin
go build -o ollama-custom .
5.4 Install
# Install custom binary
cp ollama-custom /usr/local/bin/ollama
# Copy ROCm 7.2 libs into Ollama's lib dir
cp /opt/rocm/lib/libhsa-runtime64.so.1* /usr/local/lib/ollama/rocm/
cp /opt/rocm/lib/libamdhip64.so.7* /usr/local/lib/ollama/rocm/
cp /opt/rocm/lib/librocblas.so.5* /usr/local/lib/ollama/rocm/
cp -r /opt/rocm/lib/rocblas/library /usr/local/lib/ollama/rocm/rocblas/
# CRITICAL: Remove the duplicate hip lib in root dir (causes double-load crash)
rm -f /usr/local/lib/ollama/libggml-hip.so
# Copy our compiled hip lib
cp build/lib/ollama/libggml-hip.so /usr/local/lib/ollama/rocm/libggml-hip.so
5.5 Systemd Service Override
mkdir -p /etc/systemd/system/ollama.service.d
cat > /etc/systemd/system/ollama.service.d/override.conf << 'EOF'
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=11.5.1"
Environment="HSA_ENABLE_SDMA=0"
Environment="OLLAMA_HOST=0.0.0.0"
Environment="LD_LIBRARY_PATH=/usr/local/lib/ollama/rocm:/opt/rocm/lib"
EOF
systemctl daemon-reload
systemctl start ollama
5.6 Verify GPU Detection
journalctl -u ollama -n 5 | grep compute
# Should show: library=ROCm compute=gfx1151 total="160+ GiB"
Part 6: Performance Tuning
6.1 tuned Profile
apt install -y tuned
systemctl enable --now tuned
tuned-adm profile accelerator-performance
6.2 Persistent GPU Performance Mode
The power_dpm_force_performance_level resets on reboot. Create a systemd service on the Proxmox host:
cat > /etc/systemd/system/gpu-performance.service << 'EOF'
[Unit]
Description=Set GPU to high performance mode
After=multi-user.target
[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo high > /sys/class/drm/card1/device/power_dpm_force_performance_level'
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
EOF
systemctl enable --now gpu-performance
Part 7: Pull Models and Test
# Recommended models for 128GB Strix Halo
ollama pull gpt-oss:120b # OpenAI open-source MoE, ~36 tok/s
ollama pull qwen3:32b # Dense model, ~10 tok/s
ollama pull qwen3:14b # Fast, ~23 tok/s
ollama pull qwen2.5-coder:32b # Best local coding model
# Quick benchmark
curl -s http://localhost:11434/api/generate \
-d '{"model":"gpt-oss:120b","prompt":"say hello","stream":false}' | \
python3 -c "import sys,json; r=json.load(sys.stdin); print(r['response']); print(f'tok/s: {r[\"eval_count\"]/r[\"eval_duration\"]*1e9:.1f}')"
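The tok/s figure is simply `eval_count` tokens divided by `eval_duration` nanoseconds. The same math in awk, for containers without python3 (the counts below are made-up illustration values, not real benchmark output):

```shell
# tok/s = eval_count / eval_duration * 1e9 (eval_duration is in nanoseconds)
eval_count=432
eval_duration=12000000000   # 12 seconds, in ns
awk -v c="$eval_count" -v d="$eval_duration" 'BEGIN { printf "%.1f tok/s\n", c / d * 1e9 }'
```

With these example numbers it prints 36.0 tok/s, i.e. 432 tokens over 12 seconds.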
Expected Benchmarks
| Model | Parameters | Active Params | tok/s | Notes |
|---|---|---|---|---|
| gpt-oss:120b | 117B | 5.1B (MoE) | ~35-36 | OpenAI open-source |
| qwen3:32b | 32B | 32B | ~10 | |
| qwen3:14b | 14B | 14B | ~23 | |
| qwen2.5-coder:32b | 32B | 32B | ~10 | Best for coding |
Key Gotchas & Lessons Learned
These are the walls I hit. Each one cost me time. Hopefully you can skip them.
🔴 BIOS UMA Frame Buffer — Set to 2GB, not Auto
Symptom: free -h shows only ~62GB despite having 128GB installed.
"Auto" on GMKtec BIOSes statically reserves 64GB for the GPU, leaving half your RAM invisible to the OS. Set UMA Frame Buffer Size → 2GB (minimum available). The kernel GTT params handle dynamic GPU memory allocation instead.
🔴 ROCm 7.1 Won't Work — You Need 7.2
Symptom: Ollama falls back to CPU. rocminfo shows gfx1151 but inference never hits the GPU.
ROCm 7.1's rocblas library is missing gfx1151 kernels. Bootstrap validation fails silently and Ollama gives up. Install ROCm 7.2 specifically — it's the first version with proper gfx1151 support.
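Before blaming anything else, you can confirm whether an installed rocblas actually ships gfx1151 kernels. A small check (hypothetical helper; the directory parameter defaults to the standard ROCm install path):

```shell
# Count gfx1151 kernel files in a rocblas library directory.
count_gfx1151_kernels() {
  ls "${1:-/opt/rocm/lib/rocblas/library}" 2>/dev/null | grep -c 'gfx1151'
}
```

On a working ROCm 7.2 install this is well above zero; on 7.1 it returns 0.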
🔴 Ollama Bundles Old ROCm — Build from Source
Symptom: ollama serve detects GPU but model loads on CPU.
Both curl -fsSL https://ollama.com/install.sh | sh and the Docker ollama/ollama:rocm image ship with bundled ROCm 6.x libs that have no gfx1151 support. You must build Ollama from source with -DAMDGPU_TARGETS=gfx1151 AND replace the bundled libs with ROCm 7.2 versions.
🔴 Duplicate libggml-hip.so Causes Segfault
Symptom: signal arrived during cgo execution crash on model load.
Ollama installs libggml-hip.so in two places:
- /usr/local/lib/ollama/libggml-hip.so ← delete this one
- /usr/local/lib/ollama/rocm/libggml-hip.so ← keep this one
Having both causes a double-load crash. Remove the root-level copy:
rm -f /usr/local/lib/ollama/libggml-hip.so
🟡 Missing cgroup Device 234 (/dev/kfd)
Symptom: Operation not permitted from HSA runtime. GPU not detected in LXC.
The Proxmox setup scripts add cgroup rules for DRM devices but miss /dev/kfd (device major 234). Add it manually to /etc/pve/lxc/100.conf:
lxc.cgroup2.devices.allow: c 234:0 rwm
🟡 pct restart Doesn't Apply Memory Changes
Symptom: free -h still shows old memory value after pct restart.
Use pct stop && pct start instead. Restart doesn't fully reinitialise cgroup memory limits.
🟡 HSA_OVERRIDE_GFX_VERSION is Mandatory
Without HSA_OVERRIDE_GFX_VERSION=11.5.1 in the environment, the ROCm HSA runtime doesn't recognise gfx1151 and refuses to initialise. Set it in the Ollama systemd override.
🟡 GPU Power State Defaults to Low
Symptom: GPU clocks stuck at ~600MHz, tok/s much lower than expected.
The GPU idles in a low power state. Set high performance mode on the Proxmox host (not inside the LXC — it's read-only there):
echo high > /sys/class/drm/card1/device/power_dpm_force_performance_level
This doesn't persist across reboots — create a systemd service to set it on boot.
🟡 Proxmox ZFS Uses proxmox-boot-tool, Not grub
Symptom: Kernel params added to /etc/default/grub have no effect after reboot.
Proxmox with ZFS root uses a different bootloader. Edit /etc/kernel/cmdline directly and apply with:
proxmox-boot-tool refresh
reboot
Multi-Agent Setup
With 128GB unified memory you can run multiple models simultaneously. Recommended routing:
| Agent | Model | Why |
|---|---|---|
| Orchestrator | claude-sonnet (API) or gpt-oss:120b | Best reasoning |
| Coder | qwen2.5-coder:32b | Best code quality |
| Fast/monitoring | qwen3:14b | Low latency for frequent tasks |
| Research | gpt-oss:120b | Large context, strong reasoning |
Point all agents at http://192.168.1.9:11434 (or your LXC IP).
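If you script your agents in shell, a tiny role-to-model router (hypothetical helper, mirroring the table above) keeps the mapping in one place:

```shell
# Map an agent role to the model it should request from the shared endpoint.
model_for_role() {
  case "$1" in
    orchestrator|research) echo "gpt-oss:120b" ;;
    coder)                 echo "qwen2.5-coder:32b" ;;
    fast|monitoring)       echo "qwen3:14b" ;;
    *)                     echo "qwen3:14b" ;;   # default to the cheap model
  esac
}

# Example call against the shared Ollama endpoint:
# curl -s http://192.168.1.9:11434/api/generate \
#   -d "{\"model\":\"$(model_for_role coder)\",\"prompt\":\"hi\",\"stream\":false}"
```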