Anthropic, the company behind Claude AI, is reportedly in early discussions with a UK startup called Fractile, exploring technology that could dramatically accelerate AI inference while cutting operating costs. If these talks move forward, they could mark an important step in Anthropic’s broader effort to secure more efficient compute as demand for AI services continues to surge.
Today, Anthropic runs its AI infrastructure on chips and cloud capacity from multiple major providers, including NVIDIA, Google, and Amazon. That diversified approach reduces the risk and bottlenecks that come with depending on a single vendor. But as inference workloads grow, driven by real-time AI assistants, enterprise deployments, and increasing model complexity, many AI companies are looking beyond standard off-the-shelf solutions. The goal is simple: higher throughput, lower latency, and lower cost per query at scale, often through custom or semi-custom silicon strategies.
Fractile has been gaining attention for what it calls a Memory Compute Fusion Architecture. The core idea is to reduce how often data needs to travel back and forth to external DRAM. By keeping more of the data movement and processing inside the chip itself—and leaning heavily on SRAM—this approach aims to reduce off-chip memory dependence, which is a common performance and cost constraint for AI inference. Less data shuttling can translate into lower latency, better throughput, and more efficient power usage, especially for inference scenarios where responsiveness and cost per query matter.
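To make that bottleneck concrete, here is a rough back-of-envelope sketch of why memory bandwidth, rather than raw compute, often sets the floor on per-token latency when generating text. Every number below is an assumption chosen for illustration; none comes from Fractile or any vendor:

```python
# Back-of-envelope: lower bound on per-token decode latency when inference
# is memory-bound, i.e. every weight byte must cross the memory interface
# once per generated token. All numbers are illustrative assumptions.

PARAMS = 70e9          # assumed model size: 70B parameters
BYTES_PER_PARAM = 2    # assumed 16-bit weights

BANDWIDTHS = {
    "off-chip DRAM/HBM": 3e12,   # assumed ~3 TB/s external memory bandwidth
    "on-chip SRAM": 80e12,       # assumed ~80 TB/s aggregate on-chip bandwidth
}

weight_bytes = PARAMS * BYTES_PER_PARAM

for label, bw in BANDWIDTHS.items():
    seconds_per_token = weight_bytes / bw  # time just to stream the weights
    print(f"{label}: >= {seconds_per_token * 1e3:.2f} ms/token "
          f"(~{1 / seconds_per_token:.0f} tokens/s ceiling)")
```

Under these assumed numbers, merely streaming the weights from off-chip memory caps a single chip at a few dozen tokens per second, while keeping them in on-chip SRAM raises that ceiling by more than an order of magnitude. That gap is the opportunity architectures like Fractile's are chasing.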
This concept mirrors a wider industry trend: packing large amounts of SRAM close to compute and pairing it with extremely high bandwidth. One well-known example is Groq's LPU (Language Processing Unit), which accelerates inference by emphasizing on-chip memory and bandwidth at rack scale. In such designs, SRAM capacity and bandwidth become key differentiators for low-latency performance in real-world inference deployments.
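The catch with SRAM is capacity: on-chip memory is orders of magnitude smaller than DRAM, so holding a large model entirely on-chip takes many accelerators working in concert. A quick sizing sketch, again using assumed rather than vendor-confirmed numbers, shows why such designs end up at rack scale:

```python
# Rough sizing: how many SRAM-heavy accelerator chips it takes to hold a
# model's weights entirely on-chip. Illustrative assumptions, not vendor specs.
import math

MODEL_PARAMS = 70e9      # assumed 70B-parameter model
BYTES_PER_PARAM = 1      # assumed 8-bit quantized weights
SRAM_PER_CHIP = 230e6    # assumed ~230 MB of on-chip SRAM per accelerator

weight_bytes = MODEL_PARAMS * BYTES_PER_PARAM
chips = math.ceil(weight_bytes / SRAM_PER_CHIP)

print(f"{weight_bytes / 1e9:.0f} GB of weights -> ~{chips} chips "
      "needed to keep every weight in SRAM")
```

At hundreds of chips per model under these assumptions, SRAM density per chip directly drives system cost, which is why capacity claims matter as much as raw speed claims.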
Fractile claims its architecture could deliver a 100x speedup for AI inference and cut costs by 10x compared with some leading SRAM-heavy inference solutions. Those are eye-catching targets, and the team's background adds credibility, with engineers reportedly coming from established names across the AI and graphics hardware ecosystem. However, Fractile has not yet produced test chips, so the technology is still early and these performance and cost figures remain aspirational until silicon is built and validated.
For Anthropic, early talks with a startup like Fractile could be a way to explore future in-house chip development or specialized inference hardware partnerships. Even with ongoing reliance on external chipmakers and large-scale compute agreements, the pressure to optimize inference for speed, cost, and scalability keeps growing. Reports also suggest Anthropic may broaden its compute mix further with additional suppliers and infrastructure options.
If these discussions progress, they could signal a deeper push by Anthropic toward purpose-built inference acceleration—an area that is quickly becoming one of the most competitive battlegrounds in AI, where milliseconds and dollars per million tokens can define the winners.