Image: two NVIDIA GPUs mounted on a QNAP storage unit.

QNAP’s New Edge AI NAS Marries a 6-Year-Old Zen 2 EPYC with NVIDIA’s 96GB RTX PRO 6000 Blackwell GPU

QNAP is stepping deeper into on-premises AI with its newest AI-focused NAS, the QAI-h1290FX, built to handle modern workloads like LLM inference, RAG (retrieval-augmented generation), and a wide range of generative AI applications at the edge. Instead of relying on cloud compute, this system is positioned as a self-contained AI server that also functions as high-speed, scalable storage—ideal for teams that want faster workflows and tighter control over sensitive data.
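To make the RAG use case concrete, the pattern such a box runs locally is simply retrieval plus prompt assembly in front of an on-prem model. Here is a toy sketch of that loop; scikit-learn’s TF-IDF stands in for a real embedding model, the documents are invented, and the final LLM call is left as a print so nothing is assumed about QNAP’s software stack.

```python
# Toy RAG loop: retrieve the most relevant local document, then fold it
# into the prompt that would be sent to an on-prem LLM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in corpus; in practice these come from local files or a vector DB.
documents = [
    "Q3 revenue grew 12% year over year, driven by storage sales.",
    "The VPN policy requires hardware tokens for remote access.",
    "Blackwell GPUs add FP4 precision aimed at LLM inference.",
]

def retrieve(query: str, docs: list[str], top_k: int = 1) -> list[str]:
    """Rank docs by cosine similarity to the query; TF-IDF stands in
    for the embedding model a production RAG stack would use."""
    matrix = TfidfVectorizer().fit_transform(docs + [query])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    return [docs[i] for i in scores.argsort()[::-1][:top_k]]

def build_prompt(query: str, context: list[str]) -> str:
    """The 'augmented' step: prepend retrieved context to the question."""
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

query = "What precision do Blackwell GPUs support?"
print(build_prompt(query, retrieve(query, documents)))
# In production, this prompt goes to the model running on the NAS GPU.
```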

A notable part of the QAI-h1290FX story is its “old meets new” hardware mix. On the CPU side, it runs an AMD EPYC 7302P, a 16-core, 32-thread part built on the Zen 2 architecture that launched back in 2019. While it isn’t the newest EPYC generation, it delivers server-class parallel performance that fits edge inference, virtualization, and multi-user workloads where steady throughput matters.

Where the system can scale up dramatically is the GPU. QNAP offers two NVIDIA Blackwell workstation options, depending on the size of the models you want to run locally. Buyers can choose the RTX PRO 4500 Blackwell with 32GB of VRAM, aimed at running LLMs up to roughly the 30B range, or step up to the flagship RTX PRO 6000 Blackwell with a massive 96GB of VRAM for larger deployments, including 70B+ class models. For AI users, that VRAM headroom can be the difference between running a model smoothly on-prem and having to compromise with heavier quantization, smaller context windows, or offloading to cloud resources.
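For a rough sense of why those VRAM figures matter: model weights alone cost roughly parameter count times bytes per weight, before the KV cache and runtime overhead are added on top. The sketch below is a common back-of-envelope rule of thumb, not QNAP’s sizing method.

```python
# Back-of-envelope VRAM sizing for local LLM inference (weights only).
# Rough rule of thumb: real usage also depends on the runtime, context
# length (KV cache), and activation overhead.

def weight_vram_gb(params_b: float, bits_per_weight: float) -> float:
    """Memory for model weights: parameters x bytes per weight, in GB."""
    return params_b * 1e9 * (bits_per_weight / 8) / 1e9

for params_b in (8, 30, 70):
    fp16 = weight_vram_gb(params_b, 16)
    q4 = weight_vram_gb(params_b, 4.5)  # ~4.5 bits/weight for q4_K_M-style quants
    print(f"{params_b:>3}B model: ~{fp16:5.1f} GB at FP16, ~{q4:4.1f} GB at 4-bit")

# Output:
#   8B model: ~ 16.0 GB at FP16, ~ 4.5 GB at 4-bit
#  30B model: ~ 60.0 GB at FP16, ~16.9 GB at 4-bit
#  70B model: ~140.0 GB at FP16, ~39.4 GB at 4-bit
```

By this math, a 4-bit 70B model needs roughly 39GB for weights alone, which lines up with the ~41GB QNAP reports for deepseek-r1:70b in the benchmarks below, and explains why the 32GB RTX PRO 4500 tops out around the 30B class.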

Storage and networking are clearly built for AI data pipelines. The QAI-h1290FX uses an all-flash design with 12 U.2 bays that support NVMe and SATA SSDs, targeting the high I/O demands that come with frequent model loading, vector database lookups, and continuous data streaming. On the network side, it includes dual 25GbE ports plus dual 2.5GbE ports, with PCIe expansion available for optional 100GbE upgrades. For organizations that need to scale capacity, it also supports JBOD expansion enclosures for large AI datasets and long-term storage growth.

QNAP is also emphasizing usability for AI deployment. The system supports container environments such as Docker and LXD, along with GPU resource management designed to simplify allocation. The idea is to let users spin up AI tools through an app center and assign GPU capacity without having to live in command-line configuration—useful for teams building internal chat assistants, document search, and knowledge-base tools that need to stay local for privacy or compliance reasons.
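QNAP’s flow is GUI-driven, but the underlying mechanics are ordinary container GPU passthrough. As a generic illustration of what that looks like programmatically (not QNAP’s App Center internals), here is how the Docker SDK for Python hands a GPU to a container; the image, port, and volume path are placeholder examples.

```python
# Generic GPU passthrough to a container via the Docker SDK for Python
# (pip install docker). Illustrative only, not QNAP's App Center code.
import docker

client = docker.from_env()

# Request every available GPU for this container (count=-1).
gpu_request = docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])

container = client.containers.run(
    "ollama/ollama",             # example GPU-aware inference image
    detach=True,
    device_requests=[gpu_request],
    ports={"11434/tcp": 11434},  # expose the inference API on the host
    volumes={"/share/Models": {"bind": "/root/.ollama", "mode": "rw"}},
)
print(f"started container {container.short_id}")
```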

To show what the high-end configuration can do, QNAP shared performance data using the RTX PRO 6000 Blackwell 96GB model. In testing across several popular LLMs, throughput ranged from roughly 24 to 172 tokens per second depending on model size and quantization:

- gpt-oss:120b (MXFP4): ~90 tokens/sec, using about 63GB of VRAM
- deepseek-r1:70b (q4_K_M): ~24 tokens/sec, at around 41GB
- qwen3:8b (q4_K_M): ~172 tokens/sec, using roughly 7GB
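QNAP hasn’t published its test harness, but those model tags follow Ollama’s naming convention, and Ollama’s generate API returns the counters needed to reproduce a tokens-per-second figure. A minimal sketch, assuming an Ollama endpoint is running locally:

```python
# Measure single-stream decode throughput from a local Ollama endpoint.
# Assumes Ollama is serving on its default port; QNAP's harness is unpublished.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # point at the NAS as needed

def tokens_per_second(model: str, prompt: str) -> float:
    """Run one non-streaming generation and derive decode tokens/sec from
    Ollama's eval_count (tokens) and eval_duration (nanoseconds) fields."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=600,
    )
    resp.raise_for_status()
    data = resp.json()
    return data["eval_count"] / (data["eval_duration"] / 1e9)

print(f"{tokens_per_second('qwen3:8b', 'Explain RAID 6 briefly.'):.1f} tokens/sec")
```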

QNAP also provided concurrent inference numbers using vLLM for the same general configuration. With deepseek-ai/DeepSeek-R1-Distill-Qwen-7B, throughput scaled with thread count, peaking around 850 total tokens/sec at 50 threads (with lower average tokens per thread as concurrency rose). For openai/gpt-oss-20b, shared results showed up to around 1045 total tokens/sec at 5 threads, with different efficiency characteristics as thread counts increased.
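Aggregate figures like these are typically gathered by firing parallel requests at vLLM’s OpenAI-compatible server and summing the per-request token counts. Below is a minimal sketch of that pattern; the endpoint URL, prompt, and thread count are illustrative assumptions rather than QNAP’s actual setup.

```python
# Sketch of a concurrent-throughput test against vLLM's OpenAI-compatible
# server. Endpoint, prompt, and thread count are illustrative assumptions.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

VLLM_URL = "http://localhost:8000/v1/completions"
MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
THREADS = 50

def one_request(_: int) -> int:
    """Issue one completion and return the number of generated tokens."""
    resp = requests.post(
        VLLM_URL,
        json={"model": MODEL, "prompt": "Summarize NVMe over Fabrics.",
              "max_tokens": 128},
        timeout=600,
    )
    resp.raise_for_status()
    return resp.json()["usage"]["completion_tokens"]

start = time.time()
with ThreadPoolExecutor(max_workers=THREADS) as pool:
    token_counts = list(pool.map(one_request, range(THREADS)))
elapsed = time.time() - start

total = sum(token_counts)
print(f"{total} tokens in {elapsed:.1f}s -> {total / elapsed:.0f} total tok/s "
      f"({total / elapsed / THREADS:.1f} avg per thread)")
```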

For expansion and customization, QNAP notes support for additional storage, networking, and interface add-in cards sold separately. Memory is also sold separately, with DDR4-3200 options ranging from 8GB modules up to 64GB kits, allowing buyers to tailor the system to virtualization needs, heavier indexing, or larger on-device workflows.

Pricing is positioned firmly in the enterprise/workstation tier. The QAI-h1290FX is listed at $8,999 for the 64GB model, $13,499 for 128GB, and $15,999 for 256GB, and it includes a 5-year warranty. For organizations looking to run generative AI locally—especially those building RAG pipelines, private AI assistants, or internal search across sensitive documents—the QAI-h1290FX is clearly designed as a “bring the AI to your data” platform, combining high-speed all-flash storage with optional Blackwell-class GPU acceleration.