Running Ollama + ROCm on AMD Strix Halo in a Proxmox LXC
AMD's Strix Halo silicon (gfx1151) is fast, weird, and very new — too new for most pre-built Ollama binaries. Here's the full process to get ROCm 7.2 working in a Proxmox LXC, including every gotcha I hit along the way.
I picked up a GMKtec mini PC with the AMD Ryzen AI MAX+ 395 (Strix Halo) — 128GB unified memory, Zen 5 cores, and an RDNA 3.5 GPU that eats LLMs for breakfast. The problem: gfx1151 is so new that none of the pre-built Ollama binaries or Docker images support it. You have to build from source and wrestle with ROCm 7.2.
After a few nights of pain, I got it working. 36 tok/s on gpt-oss:120b. Three models running simultaneously. Here's exactly what I did.
Overview
Strix Halo's gfx1151 GPU is too new for most pre-built Ollama binaries. This guide documents the full process to get native ROCm working in a Proxmox LXC container, including all the gotchas.
What you'll end up with:
- Ollama running natively in an LXC with full GPU access
- ROCm 7.2 with gfx1151 support
- Models: gpt-oss:120b (~36 tok/s), qwen3:32b (~10 tok/s), qwen3:14b (~23 tok/s)
- Multiple models running simultaneously
Part 1: BIOS Configuration
Before anything else, fix the BIOS memory carve-out. GMKtec defaults to reserving ~64GB as static GPU VRAM, leaving only 62GB for the OS.
- Enter BIOS
- Find UMA Frame Buffer Size (usually under Advanced → AMD CBS → GFX Configuration)
- Set to 2GB (minimum) — do NOT leave on Auto, it defaults to huge reservation
- Save and reboot
You should now see ~120GB in the OS:
free -h # Should show ~120GB total
Part 2: Proxmox Host Configuration
2.1 Kernel Parameters
Proxmox with ZFS uses proxmox-boot-tool, not standard grub. Edit /etc/kernel/cmdline:
nano /etc/kernel/cmdline
Add iommu=pt amdgpu.gttsize=126976 ttm.pages_limit=32505856:
root=ZFS=rpool/ROOT/pve-1 boot=zfs iommu=pt amdgpu.gttsize=126976 ttm.pages_limit=32505856
Apply and reboot:
proxmox-boot-tool refresh
reboot
What these do:
| Parameter | Effect |
|---|---|
| iommu=pt | Pass-through mode, reduces GPU memory access overhead |
| amdgpu.gttsize=126976 | Sets GPU-addressable memory to ~124GB (value is in MiB) |
| ttm.pages_limit=32505856 | Allows TTM to pin ~124GB of pages (4KiB each) for the GPU |
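The two values describe the same ~124GB budget in different units (gttsize in MiB, pages_limit in 4KiB pages). If you ever tune them for a different RAM size, a quick shell check confirms they agree:

```shell
# amdgpu.gttsize is in MiB; ttm.pages_limit is in 4 KiB pages.
# Both should land on the same ~124 GiB figure.
echo "$(( 126976 / 1024 )) GiB from gttsize"
echo "$(( 32505856 * 4 / 1024 / 1024 )) GiB from pages_limit"
```

Both print 124 GiB; if they disagree, one of the two parameters is wrong.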
2.2 udev Rules (GPU Device Permissions)
cat > /etc/udev/rules.d/70-kfd.rules << 'EOF'
SUBSYSTEM=="kfd", KERNEL=="kfd", MODE="0666"
SUBSYSTEM=="drm", KERNEL=="renderD*", MODE="0666"
EOF
udevadm control --reload-rules && udevadm trigger
2.3 GPU Performance Mode
Set GPU to high performance (note: doesn't persist across reboots — see Part 6 for the fix). Check your card name first, since it may be card0 or card1:
ls /sys/class/drm/ | grep -v '\-' # Find your card name (card0 or card1)
echo 'high' > /sys/class/drm/card1/device/power_dpm_force_performance_level
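The card index isn't guaranteed stable across reboots or kernel updates. A small helper (hypothetical, not part of the original setup) resolves which cardN is bound to amdgpu instead of hardcoding it; it takes the drm class directory as a parameter, defaulting to the real sysfs path:

```shell
# Find the cardN entry whose device driver symlink points at amdgpu.
# Defaults to /sys/class/drm; pass another directory for testing.
find_amdgpu_card() {
  dir="${1:-/sys/class/drm}"
  for c in "$dir"/card[0-9]; do
    if readlink "$c/device/driver" 2>/dev/null | grep -q 'amdgpu$'; then
      basename "$c"
      return 0
    fi
  done
  return 1
}
```

Then `card=$(find_amdgpu_card)` and write to `/sys/class/drm/$card/device/power_dpm_force_performance_level`.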
Part 3: LXC Container Setup
3.1 Create the LXC
Use the Proxmox setup script from eikaramba/proxmox-setup-scripts:
cd /root
git clone https://github.com/eikaramba/proxmox-setup-scripts.git
cd proxmox-setup-scripts
./guided-install.sh
Run scripts in order: 001 (tools), 003 (AMD drivers), 007 (udev rules), 031 (create LXC).
3.2 LXC Config
Edit /etc/pve/lxc/100.conf and ensure these lines are present:
memory: 122880
lxc.cgroup2.devices.allow: c 226:* rwm
lxc.cgroup2.devices.allow: c 234:0 rwm
lxc.cgroup2.devices.allow: c 235:* rwm
lxc.mount.entry: /dev/dri/by-path/pci-0000:c5:00.0-card dev/dri/card0 none bind,optional,create=file
lxc.mount.entry: /dev/dri/by-path/pci-0000:c5:00.0-render dev/dri/renderD128 none bind,optional,create=file
lxc.mount.entry: /dev/kfd dev/kfd none bind,optional,create=file
lxc.apparmor.profile: unconfined
lxc.cap.drop:
Note: Replace `0000:c5:00.0` with your GPU's PCI address. Find it with `lspci | grep VGA` on the host.
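Once you have the PCI address, the two mount entries can be generated rather than hand-edited. A sketch (the PCI_ADDR value here is the example address from above; substitute your own):

```shell
# Emit the lxc.mount.entry lines for a given GPU PCI address.
PCI_ADDR="0000:c5:00.0"   # replace with the address from `lspci | grep VGA`
cat <<EOF
lxc.mount.entry: /dev/dri/by-path/pci-${PCI_ADDR}-card dev/dri/card0 none bind,optional,create=file
lxc.mount.entry: /dev/dri/by-path/pci-${PCI_ADDR}-render dev/dri/renderD128 none bind,optional,create=file
EOF
```

Append the output to the LXC config, or paste it in by hand.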
Critical: `c 234:0 rwm` is for `/dev/kfd` (ROCm compute). Without this, the HSA runtime gets "Operation not permitted".
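To double-check those device majors on your own host, `stat -c '%t:%T %n' /dev/kfd /dev/dri/renderD128` prints them in hex. The hex values map back to the decimal numbers in the cgroup rules:

```shell
# 0xe2 and 0xea are the hex forms of the cgroup rule majors above.
# (/dev/kfd's major can vary by kernel, so verify with stat on your host.)
printf 'drm major: %d\n' 0xe2   # 226
printf 'kfd major: %d\n' 0xea   # 234
```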
After editing, restart the LXC:
pct stop 100 && pct start 100
Part 4: ROCm 7.2 in the LXC
Inside the LXC, install ROCm 7.2 (7.1 does NOT have gfx1151 rocblas kernels):
# Add ROCm 7.2 repo
wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | gpg --dearmor | tee /etc/apt/keyrings/rocm.gpg > /dev/null
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/7.2 noble main" > /etc/apt/sources.list.d/rocm.list
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/graphics/7.1/ubuntu noble main" >> /etc/apt/sources.list.d/rocm.list
apt-get update
apt-get install -y rocm-dev rocm-libs rocm-smi rocminfo rocm-utils hipcc
Verify:
HSA_OVERRIDE_GFX_VERSION=11.5.1 rocminfo 2>&1 | head -20
# Should show HSA System Attributes and Agent 1
Part 5: Build Ollama from Source
The pre-built ollama/ollama:rocm Docker image and the standard Ollama install both bundle their own ROCm libraries that don't support gfx1151. You must build from source.
5.1 Install Go
wget https://go.dev/dl/go1.23.5.linux-amd64.tar.gz
tar -C /usr/local -xzf go1.23.5.linux-amd64.tar.gz
export PATH=$PATH:/usr/local/go/bin
5.2 Install Build Dependencies
apt-get install -y cmake git build-essential python3 zstd
5.3 Clone and Build
git clone https://github.com/ollama/ollama
cd ollama
# Install Ollama first (gets the service + base libs)
curl -fsSL https://ollama.com/install.sh | sh
systemctl stop ollama
# Build HIP library against system ROCm 7.2
cmake -B build -S . \
-DAMDGPU_TARGETS=gfx1151 \
-DCMAKE_HIP_COMPILER=/opt/rocm/lib/llvm/bin/clang \
-DCMAKE_PREFIX_PATH=/opt/rocm \
-Wno-dev
cmake --build build --parallel $(nproc) --target ggml-hip
# Takes ~5 minutes on Strix Halo (Zen 5 rips through it)
# Build the Ollama binary
export PATH=$PATH:/usr/local/go/bin
go build -o ollama-custom .
5.4 Install
# Install custom binary
cp ollama-custom /usr/local/bin/ollama
# Copy ROCm 7.2 libs into Ollama's lib dir
cp /opt/rocm/lib/libhsa-runtime64.so.1* /usr/local/lib/ollama/rocm/
cp /opt/rocm/lib/libamdhip64.so.7* /usr/local/lib/ollama/rocm/
cp /opt/rocm/lib/librocblas.so.5* /usr/local/lib/ollama/rocm/
cp -r /opt/rocm/lib/rocblas/library /usr/local/lib/ollama/rocm/rocblas/
# CRITICAL: Remove the duplicate hip lib in root dir (causes double-load crash)
rm -f /usr/local/lib/ollama/libggml-hip.so
# Copy our compiled hip lib
cp build/lib/ollama/libggml-hip.so /usr/local/lib/ollama/rocm/libggml-hip.so
5.5 Systemd Service Override
mkdir -p /etc/systemd/system/ollama.service.d
cat > /etc/systemd/system/ollama.service.d/override.conf << 'EOF'
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=11.5.1"
Environment="HSA_ENABLE_SDMA=0"
Environment="OLLAMA_HOST=0.0.0.0"
Environment="LD_LIBRARY_PATH=/usr/local/lib/ollama/rocm:/opt/rocm/lib"
EOF
systemctl daemon-reload
systemctl start ollama
5.6 Verify GPU Detection
journalctl -u ollama -n 5 | grep compute
# Should show: library=ROCm compute=gfx1151 total="160+ GiB"
Part 6: Performance Tuning
6.1 tuned Profile
apt install -y tuned
systemctl enable --now tuned
tuned-adm profile accelerator-performance
6.2 Persistent GPU Performance Mode
The power_dpm_force_performance_level resets on reboot. Create a systemd service on the Proxmox host:
cat > /etc/systemd/system/gpu-performance.service << 'EOF'
[Unit]
Description=Set GPU to high performance mode
After=multi-user.target
[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo high > /sys/class/drm/card1/device/power_dpm_force_performance_level'
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
EOF
systemctl enable --now gpu-performance
Part 7: Pull Models and Test
# Recommended models for 128GB Strix Halo
ollama pull gpt-oss:120b # OpenAI open-source MoE, ~36 tok/s
ollama pull qwen3:32b # Dense model, ~10 tok/s
ollama pull qwen3:14b # Fast, ~23 tok/s
ollama pull qwen2.5-coder:32b # Best local coding model
# Quick benchmark
curl -s http://localhost:11434/api/generate \
-d '{"model":"gpt-oss:120b","prompt":"say hello","stream":false}' | \
python3 -c "import sys,json; r=json.load(sys.stdin); print(r['response']); print(f'tok/s: {r[\"eval_count\"]/r[\"eval_duration\"]*1e9:.1f}')"
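The tok/s figure is simply `eval_count` tokens divided by `eval_duration` nanoseconds. The same math in awk, for containers without python3 (the counts below are made-up illustration values, not real benchmark output):

```shell
# tok/s = eval_count / eval_duration * 1e9 (eval_duration is in nanoseconds)
eval_count=432
eval_duration=12000000000   # 12 seconds, in ns
awk -v c="$eval_count" -v d="$eval_duration" 'BEGIN { printf "%.1f tok/s\n", c / d * 1e9 }'
```

With these example numbers it prints 36.0 tok/s, i.e. 432 tokens over 12 seconds.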
Expected Benchmarks
| Model | Parameters | Active Params | tok/s | Notes |
|---|---|---|---|---|
| gpt-oss:120b | 117B | 5.1B (MoE) | ~35-36 | OpenAI open-source |
| qwen3:32b | 32B | 32B | ~10 | |
| qwen3:14b | 14B | 14B | ~23 | |
| qwen2.5-coder:32b | 32B | 32B | ~10 | Best for coding |
Key Gotchas & Lessons Learned
These are the walls I hit. Each one cost me time. Hopefully you can skip them.
🔴 BIOS UMA Frame Buffer — Set to 2GB, not Auto
Symptom: free -h shows only ~62GB despite having 128GB installed.
"Auto" on GMKtec BIOSes statically reserves 64GB for the GPU, leaving half your RAM invisible to the OS. Set UMA Frame Buffer Size → 2GB (minimum available). The kernel GTT params handle dynamic GPU memory allocation instead.
🔴 ROCm 7.1 Won't Work — You Need 7.2
Symptom: Ollama falls back to CPU. rocminfo shows gfx1151 but inference never hits the GPU.
ROCm 7.1's rocblas library is missing gfx1151 kernels. Bootstrap validation fails silently and Ollama gives up. Install ROCm 7.2 specifically — it's the first version with proper gfx1151 support.
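Before blaming anything else, you can confirm whether an installed rocblas actually ships gfx1151 kernels. A small check (hypothetical helper; the directory parameter defaults to the standard ROCm install path):

```shell
# Count gfx1151 kernel files in a rocblas library directory.
count_gfx1151_kernels() {
  ls "${1:-/opt/rocm/lib/rocblas/library}" 2>/dev/null | grep -c 'gfx1151'
}
```

On a working ROCm 7.2 install this is well above zero; on 7.1 it returns 0.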
🔴 Ollama Bundles Old ROCm — Build from Source
Symptom: ollama serve detects GPU but model loads on CPU.
Both curl -fsSL https://ollama.com/install.sh | sh and the Docker ollama/ollama:rocm image ship with bundled ROCm 6.x libs that have no gfx1151 support. You must build Ollama from source with -DAMDGPU_TARGETS=gfx1151 AND replace the bundled libs with ROCm 7.2 versions.
🔴 Duplicate libggml-hip.so Causes Segfault
Symptom: signal arrived during cgo execution crash on model load.
Ollama installs libggml-hip.so in two places:
- /usr/local/lib/ollama/libggml-hip.so ← delete this one
- /usr/local/lib/ollama/rocm/libggml-hip.so ← keep this one
Having both causes a double-load crash. Remove the root-level copy:
rm -f /usr/local/lib/ollama/libggml-hip.so
🟡 Missing cgroup Device 234 (/dev/kfd)
Symptom: Operation not permitted from HSA runtime. GPU not detected in LXC.
The Proxmox setup scripts add cgroup rules for DRM devices but miss /dev/kfd (device major 234). Add it manually to /etc/pve/lxc/100.conf:
lxc.cgroup2.devices.allow: c 234:0 rwm
🟡 pct restart Doesn't Apply Memory Changes
Symptom: free -h still shows old memory value after pct restart.
Use pct stop && pct start instead. Restart doesn't fully reinitialise cgroup memory limits.
🟡 HSA_OVERRIDE_GFX_VERSION is Mandatory
Without HSA_OVERRIDE_GFX_VERSION=11.5.1 in the environment, the ROCm HSA runtime doesn't recognise gfx1151 and refuses to initialise. Set it in the Ollama systemd override.
🟡 GPU Power State Defaults to Low
Symptom: GPU clocks stuck at ~600MHz, tok/s much lower than expected.
The GPU idles in a low power state. Set high performance mode on the Proxmox host (not inside the LXC — it's read-only there):
echo high > /sys/class/drm/card1/device/power_dpm_force_performance_level
This doesn't persist across reboots — create a systemd service to set it on boot.
🟡 Proxmox ZFS Uses proxmox-boot-tool, Not grub
Symptom: Kernel params added to /etc/default/grub have no effect after reboot.
Proxmox with ZFS root uses a different bootloader. Edit /etc/kernel/cmdline directly and apply with:
proxmox-boot-tool refresh
reboot
Multi-Agent Setup
With 128GB unified memory you can run multiple models simultaneously. Recommended routing:
| Agent | Model | Why |
|---|---|---|
| Orchestrator | claude-sonnet (API) or gpt-oss:120b | Best reasoning |
| Coder | qwen2.5-coder:32b | Best code quality |
| Fast/monitoring | qwen3:14b | Low latency for frequent tasks |
| Research | gpt-oss:120b | Large context, strong reasoning |
Point all agents at http://192.168.1.9:11434 (or your LXC IP).
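If you script your agents in shell, a tiny role-to-model router (hypothetical helper, mirroring the table above) keeps the mapping in one place:

```shell
# Map an agent role to the model it should request from the shared endpoint.
model_for_role() {
  case "$1" in
    orchestrator|research) echo "gpt-oss:120b" ;;
    coder)                 echo "qwen2.5-coder:32b" ;;
    fast|monitoring)       echo "qwen3:14b" ;;
    *)                     echo "qwen3:14b" ;;   # default to the cheap model
  esac
}

# Example call against the shared Ollama endpoint:
# curl -s http://192.168.1.9:11434/api/generate \
#   -d "{\"model\":\"$(model_for_role coder)\",\"prompt\":\"hi\",\"stream\":false}"
```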