Microsoft has rolled out a new preview of DirectX 12 updates with Shader Model 6.10 alongside Agility SDK 1.720, giving developers more tools to push real-time graphics, ray tracing, and GPU-accelerated compute forward. This preview builds on the earlier wave of improvements that arrived with Shader Model 6.9 and DXR 1.2 in the Agility SDK 1.619 era, but the latest drop adds several meaningful features targeting modern rendering, machine learning-assisted visuals, and better GPU scheduling.
What’s new in this preview starts with Shader Model 6.10 (delivered through DXC 1.10.2605.2). The headline addition is linalg::Matrix, part of a broader Linear Algebra (LinAlg) effort. These Matrix APIs are designed to cover a wide range of use cases, with Microsoft positioning them as a way to efficiently run neural rendering techniques directly inside individual shader threads in real-time pipelines. At the same time, they’re meant to take advantage of higher-bandwidth matrix multiply-accumulate (MMA) operations for machine learning and image processing workloads, unifying graphics and ML-style math under one API approach.
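The exact linalg::Matrix surface is still in preview, so the sketch below is only an illustrative shape: the header path, template parameters, Load call, and Mul helper are assumptions standing in for whatever the final preview API exposes, and should not be copied as-is.

```hlsl
// Illustrative sketch only: names below are placeholders for the preview
// linalg::Matrix API, not the final header. Conceptually, a tiny per-thread
// neural layer evaluated inside a shader might look like this.
#include "dx/linalg.h"               // hypothetical preview header

ByteAddressBuffer Weights;           // packed weight matrix for one layer
ByteAddressBuffer Biases;            // packed bias vector

float4 EvaluateLayer(float4 input)
{
    // Load a small weight matrix from memory (dimensions and the Load
    // signature are assumed for illustration).
    linalg::Matrix<float, 4, 4> W =
        linalg::Matrix<float, 4, 4>::Load(Weights, /*byteOffset*/ 0);

    // A matrix-vector multiply that the compiler can map onto the
    // hardware's higher-bandwidth MMA paths.
    float4 result = linalg::Mul(W, input) + Biases.Load<float4>(0);
    return max(result, 0.0f);        // ReLU-style activation, in-shader
}
```

The point of the sketch is the programming model: the matrix math lives inside an ordinary shader function, so a neural rendering step can run per-thread in a real-time pipeline rather than as a separate ML dispatch.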
Shader Model 6.10 also introduces Group Wave Index, adding two new intrinsics: GetGroupWaveIndex() and GetGroupWaveCount(). These functions expose wave-level structure within a thread group for compute shaders, mesh shaders, amplification shaders, and node shaders. GetGroupWaveIndex() returns the index of the currently executing wave within its group (0 through N-1), while GetGroupWaveCount() reports how many waves are active in the group. The big win here is portability and correctness: instead of relying on risky math tricks such as dividing SV_GroupIndex by WaveGetLaneCount() (which can fail depending on hardware), developers can now write one code path that works reliably across different wave sizes. That opens the door to cleaner wave-level cooperation and specialization without fragile assumptions.
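A minimal sketch of the pattern described above, assuming a Shader Model 6.10 preview compiler and driver; ComputeValue is a hypothetical stand-in for per-thread work, and the partial-sum array size is an assumed upper bound on waves per group:

```hlsl
// Sketch: per-wave reduction using the SM 6.10 group-wave intrinsics,
// with no dependence on a particular hardware wave size.
groupshared float gPartialSums[32];     // assumed max waves per group

[numthreads(256, 1, 1)]
void CSMain(uint groupIndex : SV_GroupIndex)
{
    // Fragile pre-6.10 pattern (breaks if the wave size assumption is wrong):
    //   uint waveId = groupIndex / WaveGetLaneCount();

    // Portable SM 6.10 replacement:
    uint waveId    = GetGroupWaveIndex();   // 0 .. GetGroupWaveCount()-1
    uint waveCount = GetGroupWaveCount();   // waves actually in this group

    float laneValue = ComputeValue(groupIndex);  // hypothetical per-thread work
    float waveSum   = WaveActiveSum(laneValue);  // standard wave intrinsic

    if (WaveIsFirstLane())
        gPartialSums[waveId] = waveSum;     // one slot per wave, no divide tricks

    GroupMemoryBarrierWithGroupSync();
    // ...the first wave could now reduce gPartialSums[0..waveCount-1]...
}
```

Because waveId and waveCount come from the runtime rather than arithmetic on SV_GroupIndex, the same code path is correct whether the GPU runs 32-wide or 64-wide waves.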
Another major change is Variable Group Shared Memory. Shader Model 6.10 removes the long-standing shared memory ceiling of 32 KB (28 KB for mesh shaders) by exposing the real hardware limit via a new runtime query called MaxGroupSharedMemoryPerGroup. Developers can then declare their shader’s maximum shared memory needs using a new [GroupSharedLimit()] attribute at the entry point. This gives the compiler a portability check at compile time while letting modern GPUs use more of their actual capability. Importantly, older shaders that don’t use the new attribute continue to follow the legacy validation limits, so existing projects shouldn’t break. In practical terms, this update can enable techniques that were previously boxed in by specification limits rather than GPU hardware: think large tile culling, more flexible software rasterization binning approaches, and bigger matrix-oriented workloads.
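As a hedged sketch of how an entry point might opt in, assuming the [GroupSharedLimit()] attribute takes a byte count (the preview documentation should be checked for the final form) and that the target GPU's MaxGroupSharedMemoryPerGroup query reports at least 64 KB:

```hlsl
// Sketch: a binning shader that declares a shared-memory footprint above
// the legacy 32 KB ceiling. The 64 KB figure is an example; the real bound
// must come from the runtime's MaxGroupSharedMemoryPerGroup query.
groupshared uint gTileBins[16 * 1024];   // 64 KB of bin counters

[GroupSharedLimit(64 * 1024)]            // assumed byte-count form of the attribute
[numthreads(1024, 1, 1)]
void BinningCS(uint gi : SV_GroupIndex)
{
    // Cooperatively clear the large bin table, then bin work items into it.
    for (uint i = gi; i < 16 * 1024; i += 1024)
        gTileBins[i] = 0;
    GroupMemoryBarrierWithGroupSync();
    // ...tile-culling / binning work that previously could not fit in 32 KB...
}
```

Shaders that omit the attribute keep the legacy limits, so this is strictly opt-in per entry point.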
Beyond the Shader Model changes, DX12 also gains Batched Asynchronous Command List APIs. Today, various D3D12 operations such as CopyBufferRegion, ClearUnorderedAccessViewFloat/Uint, and ResolveSubresource behave as if they must run strictly in sequence, because the older barrier model cannot express dependencies between operations of the same type (for example, copy-dest to copy-dest). The practical result is unnecessary GPU stalls even when commands touch completely separate memory regions. The new batched async command list functionality aims to reduce that forced serialization by introducing command list methods that loosen the implicit “everything is sequential” contract. The driver and hardware can then overlap independent work inside a batch, while developers explicitly synchronize only when there is a real hazard, such as overlapping writes to the same buffer region, using enhanced barriers.
Hardware support is already lined up across the major GPU players, with NVIDIA, AMD, and Intel all participating in the preview rollout. Support levels vary by feature and architecture, which is typical for early API expansions. NVIDIA is positioned to support most features broadly across its RTX lineup. AMD and Intel are focusing some feature enablement around their newer hardware generations, such as Intel Arc B-Series and AMD’s RDNA 4-based Radeon RX 9000 family.
Here’s how support breaks down in the preview:
linalg::Matrix is supported on NVIDIA RTX hardware, supported on AMD’s Radeon RX 9000 series, and planned for a future Intel release.
Group Wave Index is supported on AMD Radeon RX 7000 (RDNA 3) and RX 9000 (RDNA 4), supported on Intel Arc B-Series, and planned for a future NVIDIA release.
Variable Group Shared Memory is supported on NVIDIA RTX hardware (with limits differing by hardware), supported on AMD Radeon RX 7000 and RX 9000, and supported on Intel Arc B-Series—though Intel is currently limited to the default memory size, with expanded limits planned in future drivers.
Ray tracing intrinsics, including TriangleObjectPositions and ClusterID, are supported on AMD Radeon RX 7000 and RX 9000, supported on Intel Arc B-Series, and supported across NVIDIA RTX hardware.
Batched Asynchronous Command List APIs are supported on AMD Radeon RX 7000 and RX 9000, supported on Intel Arc B-Series, and supported on NVIDIA RTX hardware.
For developers and enthusiasts tracking DirectX 12 progress, this preview is notable because it tackles three pressure points at once: better matrix/linear algebra tools for neural and ML-adjacent graphics workloads, more reliable wave-aware programming for modern shader stages, and command submission improvements designed to reduce artificial GPU stalls. As drivers mature and vendor support expands feature-by-feature, these additions could translate into more efficient rendering pipelines, more flexible compute strategies, and smoother performance scaling on next-generation PC graphics hardware.