My first impressions on ROCm and Strix Halo

What they tried
A developer posted first impressions of running ROCm on an ASUS Strix Halo system under Ubuntu 24.04 LTS, following the official ROCm instructions. According to the post, PyTorch initially could not find the GPU and a BIOS update was required; reportedly the board's firmware connected to Wi‑Fi and downloaded the update automatically. The end result: the author says they now have 128 GB of system memory effectively shared between CPU and GPU. Handy, right?
Tuning memory and GRUB
Getting the system stable meant fiddling with the BIOS reserved‑video‑memory setting and the Linux boot options. The blog recommends keeping reserved video memory small (as low as 512 MB) and letting the bulk be shared via the GTT (Graphics Translation Table), and warns that mixing Reserved + GTT can cause fragmentation and addressing overhead. The author changed the GRUB configuration to include the line GRUB_CMDLINE_LINUX_DEFAULT="quiet splash ttm.pages_limit=32768000 amdgpu.gttsize=114688" and ran update-grub, noting that you should leave several gigabytes free for the CPU to avoid kernel instability. Some legacy titles might misread the available GPU memory as only the small reserved chunk; the author has reported no breakage so far.
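To make the two boot parameters easier to read, here is a minimal Python sketch that converts them to GiB. It assumes 4 KiB pages (the usual x86-64 default) for ttm.pages_limit, which is counted in pages, and treats amdgpu.gttsize as a value in MiB, as the amdgpu module parameter documentation describes it. The helper names are illustrative and do not come from the original post.

```python
# Sketch: translate the GRUB values quoted above into GiB.
# Assumption: 4 KiB pages, the usual default on x86-64 Linux.
PAGE_SIZE_BYTES = 4096
MIB = 1024 ** 2
GIB = 1024 ** 3


def pages_to_gib(pages: int, page_size: int = PAGE_SIZE_BYTES) -> float:
    """Convert a page count (as used by ttm.pages_limit) to GiB."""
    return pages * page_size / GIB


def mib_to_gib(mib: int) -> float:
    """Convert a MiB value (as used by amdgpu.gttsize) to GiB."""
    return mib * MIB / GIB


def gib_to_pages(gib: int, page_size: int = PAGE_SIZE_BYTES) -> int:
    """Inverse helper: how many pages correspond to a GiB budget."""
    return gib * GIB // page_size


if __name__ == "__main__":
    ttm_pages_limit = 32768000    # value from the quoted GRUB line
    amdgpu_gttsize_mib = 114688   # value from the quoted GRUB line

    print(f"ttm.pages_limit -> {pages_to_gib(ttm_pages_limit):.1f} GiB")
    print(f"amdgpu.gttsize  -> {mib_to_gib(amdgpu_gttsize_mib):.1f} GiB")
```

Under those assumptions the quoted values work out to 125 GiB for ttm.pages_limit and 112 GiB for amdgpu.gttsize. On a 128 GB machine, a 112 GiB GTT leaves memory for the CPU side, which matches the author's advice to keep several gigabytes free.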
Software plumbing: PyTorch, uv and llama.cpp
Installing PyTorch on ROCm proved fiddly because of its dependency graph. The author settled on torch==2.11.0+rocm7.2 with triton-rocm, plus a uv/uvx index configuration that pulls ROCm wheels from PyTorch's ROCm index. They also shared a shell alias that launches IPython and prints ROCm visibility checks. On the model side, the author reports running llama.cpp in a Podman container with /dev/kfd and /dev/dri exposed, setting HSA_OVERRIDE_GFX_VERSION, and serving Qwen3.6 after converting the Hugging Face model to GGUF with llama.cpp's conversion script. In short, containerized LLM serving on ROCm works, but expect some plumbing.
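The author's alias is not reproduced here, but a visibility check along these lines could look like the sketch below. It is an assumption about what such a check might print, not the author's actual code; the function name rocm_report is hypothetical. It relies on ROCm builds of PyTorch exposing the GPU through the torch.cuda API and reporting the HIP version in torch.version.hip.

```python
# Sketch: report whether PyTorch can see a ROCm GPU.
# Not the author's alias; rocm_report is a hypothetical helper name.
import importlib.util


def rocm_report() -> dict:
    """Collect basic ROCm visibility facts from PyTorch, if installed."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # torch.version.hip is None on CPU-only or CUDA builds.
        "hip_version": getattr(torch.version, "hip", None),
        # ROCm devices are surfaced through the torch.cuda namespace.
        "gpu_available": torch.cuda.is_available(),
        "devices": [],
    }
    if report["gpu_available"]:
        for index in range(torch.cuda.device_count()):
            free_bytes, total_bytes = torch.cuda.mem_get_info(index)
            report["devices"].append({
                "name": torch.cuda.get_device_name(index),
                "free_gib": round(free_bytes / 1024 ** 3, 1),
                "total_gib": round(total_bytes / 1024 ** 3, 1),
            })
    return report


if __name__ == "__main__":
    for key, value in rocm_report().items():
        print(f"{key}: {value}")
```

If hip_version is None or gpu_available is False after installing the ROCm wheel, the wheel index or the firmware and driver setup is the first place to look, which is roughly the failure the author hit before the BIOS update.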
Why this matters
This write‑up isn't a polished how‑to so much as a practical, "I got it working" snapshot. For people trying to host large models locally on AMD hardware, it's hopeful news: the ROCm ecosystem is becoming usable, and Strix Halo hardware can share most of its RAM with the GPU if you accept a bit of tweaking. Pain points remain (firmware, boot flags, dependency wrangling), but when it clicks, you can run modern LLM stacks without renting cloud GPUs. Who doesn't like saving a few bucks and keeping the model at home?
Sources: marcoinacio.com, Hacker News