AMD may be preparing the next big step in CPU cache design, and it goes beyond the well-known 3D V-Cache approach. A newly published AMD research paper and patent filing (US20260003794A1), titled “Balanced Latency Stacked Cache,” outlines methods for stacking L2 cache on future chips while keeping latency competitive—and in some cases, even improving it versus traditional designs.
Most PC enthusiasts already associate AMD’s cache stacking with 3D V-Cache, which adds extra L3 cache by placing a cache die above or below the compute chiplet. That technology has appeared across multiple product tiers, from Ryzen desktop chips to high-end EPYC server processors. Now, AMD’s latest work suggests the company is investigating stacked cache designs that extend beyond L3, with L2 cache stacking positioned as a potential next move.
The patent describes a stacked cache system built from a first cache die and at least a second cache die, arranged vertically. In one illustrative example, AMD shows a base die connected to a compute die and a cache die, with an additional compute and cache die stacked on top. The example cache module includes four 512KB regions, totaling 2MB of L2 cache, along with cache control circuitry (CCC) to manage data movement in and out of the cache. The design is also presented as scalable, with diagrams indicating configurations that can expand further, including examples up to 4MB.
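The capacity math in the example configuration is easy to sketch. The four-region, 512KB figure is from the filing as described above; the assumption that the larger 4MB configuration simply doubles the region count (keeping the 512KB region size) is ours, for illustration only:

```python
# Capacity of the example cache module: four 512 KB regions per module.
REGION_KB = 512

def module_capacity_mb(num_regions):
    """Total L2 capacity in MB for a module with the given region count."""
    return num_regions * REGION_KB / 1024

print(module_capacity_mb(4))  # 2.0 MB, the example in the filing
print(module_capacity_mb(8))  # 4.0 MB, assuming the same 512 KB region size
```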
What makes this concept especially interesting is how AMD aims to avoid a traditional drawback of larger caches: extra latency caused by longer wiring and additional routing stages. Instead of routing connections around the cache, the design places vertical through-silicon vias in the center of the stacked cache system. This central routing is intended to create "balanced latency," meaning access times remain consistent across different sections (or halves) of the cache stack. By shortening signal travel distance and minimizing additional wire stages, often described as extra pipeline stages in conventional planar cache layouts, AMD argues it can keep response times low even as cache capacity grows.
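The intuition behind central routing can be shown with a toy geometric model: with a via on one edge of a cache region, the farthest cells are much farther away than the nearest ones, while a center-placed via makes every corner equidistant. The dimensions and distance metric below are illustrative, not taken from the patent:

```python
# Toy model: worst-case signal distance from a via to any corner of a
# square cache region, for an edge-placed vs a center-placed via.

def worst_case_distance(via_x, via_y, width, height):
    """Manhattan distance from the via to the farthest corner of the region."""
    corners = [(0, 0), (width, 0), (0, height), (width, height)]
    return max(abs(cx - via_x) + abs(cy - via_y) for cx, cy in corners)

W = H = 1.0  # normalized cache region dimensions

edge   = worst_case_distance(0.0,   H / 2, W, H)  # via on the left edge
center = worst_case_distance(W / 2, H / 2, W, H)  # via in the center

print(edge)    # 1.5: the far corners are a long way from an edge via
print(center)  # 1.0: all four corners equally close, i.e. "balanced"
```

The worst-case path shrinks and, just as importantly, the spread between the nearest and farthest cells collapses, which is what lets all halves of the stack see consistent access times.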
AMD’s own comparison in the document highlights the potential payoff. Using a planar (non-stacked) design as an example, a conventional 1MB L2 cache is described as having a typical latency of 14 cycles. Under the proposed stacked approach, a 1MB stacked L2 cache could reduce that to 12 cycles. In other words, the research suggests stacked L2 cache is not just about adding more cache—it may also help achieve similar or better latency than standard single-die cache layouts.
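To put the 14-cycle versus 12-cycle figures in context, a standard average memory access time (AMAT) calculation shows how an L2 latency change propagates to overall access cost. The L2 latencies are the ones the document cites; the hit rates and the L1/memory latencies below are hypothetical, chosen only to make the arithmetic concrete:

```python
# AMAT with the cited L2 latencies: 14 cycles (planar) vs 12 (stacked).
# Hit rates and L1/memory latencies are hypothetical, for illustration.

def amat(l1_hit, l1_lat, l2_hit, l2_lat, mem_lat):
    """Average access time in cycles: try L1, then L2, then memory."""
    return l1_lat + (1 - l1_hit) * (l2_lat + (1 - l2_hit) * mem_lat)

planar  = amat(l1_hit=0.95, l1_lat=4, l2_hit=0.80, l2_lat=14, mem_lat=200)
stacked = amat(l1_hit=0.95, l1_lat=4, l2_hit=0.80, l2_lat=12, mem_lat=200)

print(round(planar, 2), round(stacked, 2))
```

Under these assumed rates the gain is modest per access, but it applies to every L1 miss, and it comes alongside the capacity increase rather than instead of it.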
Power efficiency is another major claim. According to the paper, completing cache accesses in fewer cycles can reduce the time the cache must remain active, allowing it to return to idle states sooner. Shorter wire lengths also mean lower capacitance, which can lower power draw and reduce heat generation. With signals traveling less distance and facing less electrical loading, AMD suggests the design can improve data transfer performance while conserving energy.
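The capacitance claim follows from the standard dynamic-power relation P = a·C·V²·f, where a wire's capacitance grows roughly in proportion to its length. The formula is textbook; the specific values below are illustrative and not from the paper:

```python
# Dynamic (switching) power of a net: P = activity * C * V^2 * f.
# Shorter vertical routes mean less capacitance and less switching power.
# All numeric values are illustrative, not from the paper.

def dynamic_power(activity, cap_farads, volts, freq_hz):
    """Switching power in watts for one net."""
    return activity * cap_farads * volts**2 * freq_hz

# Wire capacitance scales roughly with length, so halving the route
# halves C and, with everything else fixed, halves the power.
long_wire  = dynamic_power(activity=0.1, cap_farads=2e-15, volts=1.0, freq_hz=4e9)
short_wire = dynamic_power(activity=0.1, cap_farads=1e-15, volts=1.0, freq_hz=4e9)

print(long_wire, short_wire)  # the shorter wire burns half the switching power
```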
Of course, a patent and research paper don’t guarantee a near-term product launch, and it could take years before stacked L2 cache appears in shipping hardware. Still, with AMD’s established track record of commercializing stacked cache technology through 3D V-Cache, this new “balanced latency” approach strongly hints at where future AMD CPU—and possibly GPU—architectures could be headed.