
AMD and Intel Unite on ACE to Supercharge AI With a New x86 Matrix-Acceleration Standard

AI Compute Extensions, better known as ACE, are shaping up to be a major step forward for AI performance on future x86 processors. Developed through collaboration between Intel and AMD, ACE is designed to dramatically speed up matrix multiplication, the core math behind modern neural networks and large language models, while improving scalability and energy efficiency across everything from everyday laptops to high-end servers and supercomputers.

ACE is part of a broader effort launched last year, when Intel and AMD joined forces to reinforce and modernize the x86 ecosystem through the x86 Ecosystem Advisory Group. The goal is straightforward: create a more unified, standardized set of capabilities across x86 so developers can count on consistent features, better compatibility, and a clearer path to future requirements. Alongside other initiatives like FRED, AVX10, and ChkTag, ACE is positioned as a key building block for x86 in the AI era.

With the newly published ACE whitepaper, Intel and AMD outline how they’ve aligned on a shared instruction-set approach that brings standardized matrix acceleration capabilities to x86. This alignment matters because it reduces fragmentation and helps ensure that developers can target a common model for acceleration rather than dealing with vendor-specific differences. Both companies also signal ongoing cooperation on the future roadmap for ACE and AVX10, aiming to capture new opportunities in AI as well as in other performance-heavy workloads.

At its core, ACE is intended to deliver a significant boost in matrix multiply performance over relying purely on traditional SIMD approaches. While SIMD instruction sets such as AVX10 can already handle matrix operations, they run into limits around scalability and compute density, since each instruction can only perform as many multiply-accumulates as a vector register has lanes. Alternative approaches, such as dedicated matrix engines or offloading to external accelerators, can improve raw performance, but they may not be as efficient or as broadly applicable. ACE is designed to address those issues by adding a more flexible and scalable matrix acceleration framework that still works smoothly with AVX10, helping software scale up without forcing developers onto specialized external accelerators.
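To make that contrast concrete, here is a minimal plain-C sketch, our own illustration rather than anything from the whitepaper, of the per-lane multiply-accumulate that SIMD instruction sets express: one FMA over a VL-lane vector performs only VL multiply-accumulates, so compute density per instruction is capped by vector width (the 16-lane width below is just an example).

```c
#define VL 16  /* e.g., 16 FP32 lanes in a 512-bit vector register */

/* One SIMD-style fused multiply-add: acc[j] += a[j] * b[j] for each
 * lane. Consuming two VL-element input vectors yields only VL MACs. */
void fma_lanes(float acc[VL], const float a[VL], const float b[VL])
{
    for (int j = 0; j < VL; j++)
        acc[j] += a[j] * b[j];
}
```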

Intel and AMD describe ACE as a standardized matrix acceleration architecture for x86, and the feature list reflects a strong focus on the numeric formats most common in AI. According to the whitepaper, ACE supports native matrix multiplication for widely used data types including INT8 and BF16, along with industry formats such as OCP FP8, OCP MXFP8, and OCP MXINT8. This matters because modern AI inference and training often depend on lower-precision formats to increase throughput and improve performance per watt.
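For readers unfamiliar with the OCP MX "microscaling" formats, the core idea is that a block of 32 elements shares a single power-of-two scale while each element is stored in a narrow type. The sketch below quantizes a block of floats into an MXINT8-like layout; it is a didactic approximation of the OCP MX spec, not ACE's actual encoding, and the struct and function names are ours.

```c
#include <math.h>
#include <stdint.h>

#define MX_BLOCK 32  /* block size defined by the OCP MX spec */

typedef struct {
    int8_t elems[MX_BLOCK];  /* narrow per-element values        */
    int8_t scale_exp;        /* shared power-of-two exponent     */
} mxint8_block;

void mx_quantize(const float in[MX_BLOCK], mxint8_block *out)
{
    /* Pick the shared scale so the largest magnitude fits in int8. */
    float amax = 0.0f;
    for (int i = 0; i < MX_BLOCK; i++)
        if (fabsf(in[i]) > amax) amax = fabsf(in[i]);

    int exp = 0;
    if (amax > 0.0f)
        exp = (int)ceilf(log2f(amax / 127.0f));
    if (exp < -127) exp = -127;  /* keep within an E8M0-like range */
    if (exp >  127) exp =  127;

    float scale = ldexpf(1.0f, exp);  /* 2^exp */
    out->scale_exp = (int8_t)exp;
    for (int i = 0; i < MX_BLOCK; i++)
        out->elems[i] = (int8_t)lrintf(in[i] / scale);
}
```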

One of the headline technical points is ACE’s matrix acceleration approach based on outer product operations, designed to pair with AVX10. The whitepaper claims the ACE outer product operation can deliver a 16x compute density advantage over an equivalent AVX10 multiply-accumulate operation while using the same number of input vectors. In practical terms, that’s a major step toward packing more AI work into the same execution footprint, which can translate into better performance and efficiency for AI-heavy tasks.
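The arithmetic behind that figure is straightforward under one plausible reading: an outer product of two 16-element vectors produces a full 16x16 tile of multiply-accumulates (256 MACs), where a lane-wise FMA over the same two vectors produces just 16. The scalar sketch below, again our own illustration rather than whitepaper code, shows the rank-1 update at the heart of the outer-product formulation and how repeating it builds a complete matrix multiply.

```c
#define N 16  /* same 16-lane width as the earlier FMA sketch */

/* Rank-1 update C += a * b^T: two N-element inputs produce N*N MACs,
 * a 16x density gain over the N MACs of a lane-wise FMA when N = 16. */
void rank1_update(float C[N][N], const float a[N], const float b[N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            C[i][j] += a[i] * b[j];
}

/* A K-deep matmul tile is just K rank-1 updates: column k of A times
 * row k of B, accumulated into the same C tile. */
void gemm_outer_product(int K, float C[N][N],
                        const float Acols[][N], const float Brows[][N])
{
    for (int k = 0; k < K; k++)
        rank1_update(C, Acols[k], Brows[k]);
}
```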

Because ACE extends AVX10, software enablement is already in progress, and the expectation is that it will show up through familiar layers of the AI and HPC software stack. The whitepaper points to work targeting deep learning and high-performance computing libraries (including lower-precision GEMMs and LLM primitives), popular Python-centric scientific computing tools like NumPy and SciPy, and major machine learning frameworks such as PyTorch and TensorFlow. That path is crucial for real adoption: new ISA features only matter when developers can access them through the libraries and frameworks they already use.
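As a rough sketch of what that enablement looks like under the hood, GEMM libraries typically detect CPU features once at startup and dispatch to the fastest available kernel, so applications pick up new instructions without code changes. Everything ACE-specific below is hypothetical: no CPUID leaf or feature bit for ACE has been published, and gemm_ace() and select_gemm() are stand-in names, not real library APIs.

```c
#include <cpuid.h>   /* GCC/Clang CPUID helpers, x86 only */
#include <stdbool.h>

/* Placeholder detection: the leaf, register, and bit below are
 * stand-ins for whatever Intel and AMD eventually document for ACE. */
static bool cpu_has_ace(void)
{
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx))
        return false;
    enum { ACE_FEATURE_BIT = 0 };  /* hypothetical bit position */
    return (edx >> ACE_FEATURE_BIT) & 1;
}

/* Reference kernel every CPU can run. */
static void gemm_generic(int n, const float *a, const float *b, float *c)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            float acc = 0.0f;
            for (int k = 0; k < n; k++)
                acc += a[i * n + k] * b[k * n + j];
            c[i * n + j] = acc;
        }
}

/* Stand-in for a kernel built around ACE outer-product instructions;
 * here it simply forwards to the reference version. */
static void gemm_ace(int n, const float *a, const float *b, float *c)
{
    gemm_generic(n, a, b, c);
}

typedef void (*gemm_fn)(int, const float *, const float *, float *);

/* Detect once, then dispatch -- callers never change their code. */
gemm_fn select_gemm(void)
{
    return cpu_has_ace() ? gemm_ace : gemm_generic;
}
```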

ACE also reflects a bigger reality in today’s computing landscape: AI performance is now a primary battleground, and general-purpose CPUs need to evolve to stay competitive. By collaborating on a unified approach to matrix acceleration, Intel and AMD are effectively working to keep x86 strong, relevant, and easier to develop for as AI workloads become increasingly mainstream.

Overall, ACE looks like a practical, ecosystem-wide attempt to bring faster and more efficient AI math directly to x86 processors, without increasing developer friction. If the promised performance density and broad software support land as intended, ACE could become one of the most important upgrades to x86 for AI workloads in years.