SoftBank is taking a fresh swing at making AMD’s Instinct AI GPUs more effective for modern AI workloads, and the approach is all about smarter sharing of the hardware you already have. Instead of treating one GPU as a single, monolithic resource, SoftBank has built an in-house orchestrator that can split an AMD Instinct GPU into multiple “logical” GPUs, then assign those slices to different AI jobs based on what each task actually needs.
The key idea behind this effort is GPU partitioning. With partitioning enabled, a single AMD Instinct GPU can be divided so it behaves like multiple separate devices. That lets software allocate compute capacity more precisely depending on factors like model size, how many users or requests are running at once (concurrency), and how much performance headroom is available at any given moment.
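To make that right-sizing idea concrete, here is a minimal sketch of the kind of decision such an orchestrator has to make, assuming a hypothetical per-slice HBM capacity and a rough per-request memory overhead; none of the names or numbers come from SoftBank or AMD.

```python
import math
from dataclasses import dataclass

@dataclass
class JobRequest:
    model_size_gb: float   # weights plus KV cache, in GB (assumed inputs)
    concurrency: int       # simultaneous requests the job must serve

def slices_needed(job: JobRequest, slice_memory_gb: float = 24.0,
                  headroom: float = 1.2) -> int:
    """How many logical-GPU slices to grant, assuming each slice exposes
    slice_memory_gb of HBM and roughly 0.5 GB of overhead per request."""
    required_gb = job.model_size_gb * headroom + 0.5 * job.concurrency
    return max(1, math.ceil(required_gb / slice_memory_gb))

# A small model with light traffic fits in one slice; a larger, busier
# model is granted several slices, or an entire GPU.
print(slices_needed(JobRequest(model_size_gb=14, concurrency=8)))   # -> 1
print(slices_needed(JobRequest(model_size_gb=70, concurrency=32)))  # -> 5
```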
SoftBank says the orchestrator was developed in collaboration with AMD and is designed to take advantage of Instinct’s partitioning capabilities to improve flexibility and efficiency when running AI applications. In practical terms, this means lighter jobs don’t need to hog an entire GPU, while heavier jobs can be given larger slices—or even an entire device—when required.
Under the hood, SoftBank’s orchestrator distributes compute within the GPU by separating workloads across multiple GPU instances that run on individual Accelerator Complex Dies (XCDs). The design can be configured in different modes depending on how fine-grained the split needs to be: a single-instance setup keeps the GPU as one large pool, while more segmented configurations scale up to eight instances, with granularity increasing as partitions are added.
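The article only confirms the single-instance and eight-instance configurations, so the sketch below simply assumes power-of-two splits in between; the helper and its values are illustrative, not SoftBank's or AMD's actual interface.

```python
# Illustrative placeholder, not an actual SoftBank or AMD API: the
# orchestrator is described as supporting anything from one large
# instance up to eight, so this sketch assumes power-of-two splits.

SUPPORTED_INSTANCE_COUNTS = (1, 2, 4, 8)

def pick_partitioning(instances_wanted: int) -> int:
    """Return the smallest supported split that yields at least the
    requested number of GPU instances on a single physical device."""
    for count in SUPPORTED_INSTANCE_COUNTS:
        if count >= instances_wanted:
            return count
    raise ValueError("a single GPU cannot expose that many instances")

print(pick_partitioning(1))  # -> 1: the whole GPU stays one pool
print(pick_partitioning(3))  # -> 4: finer-grained split across the XCDs
```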
It’s not just compute being divided, either. The orchestrator also taps into the GPU’s high-bandwidth memory setup by splitting memory into distinct HBM regions assigned per GPU instance. That means each partition can have its own memory pool, which helps make the slices behave more like truly separate devices rather than multiple tasks fighting over the same memory resources.
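Here is a small sketch of that "separate devices" view, assuming an even split of compute dies and HBM across instances; the field names are invented, and the eight-XCD, 192 GB figures are only meant to resemble an MI300-class part.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GPUInstance:
    index: int                 # logical instance id on the physical GPU
    xcds: tuple[int, ...]      # compute dies owned exclusively by this instance
    hbm_region_gb: float       # HBM capacity reserved for this instance alone

def split_gpu(total_xcds: int = 8, total_hbm_gb: float = 192.0,
              instances: int = 4) -> list[GPUInstance]:
    """Divide a GPU's XCDs and HBM evenly into isolated instances."""
    xcds_per = total_xcds // instances
    hbm_per = total_hbm_gb / instances
    return [
        GPUInstance(i, tuple(range(i * xcds_per, (i + 1) * xcds_per)), hbm_per)
        for i in range(instances)
    ]

for inst in split_gpu():
    print(inst)  # e.g. GPUInstance(index=0, xcds=(0, 1), hbm_region_gb=48.0)
```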
One of SoftBank’s biggest goals here is tighter, lower-level control over how GPU resources are assigned, while also maintaining strict hardware-level isolation. That isolation matters because shared accelerators can sometimes introduce unpredictable latency spikes when workloads interfere with each other. By enforcing clearer boundaries between partitions, SoftBank is aiming for more consistent performance—especially in environments where many AI tasks run side by side.
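One simple way to picture that isolation is a scheduler that never co-locates two jobs on the same logical instance, sketched below; this is an assumption about how exclusive assignment could work, not a description of SoftBank's actual scheduler.

```python
class PartitionScheduler:
    """Toy scheduler: each logical instance runs at most one job, so
    partitions never contend for the same compute or memory."""

    def __init__(self, instance_ids):
        self.free = set(instance_ids)   # instances with no job assigned
        self.assignments = {}           # job name -> instance id

    def place(self, job_name):
        """Assign a job to an idle instance; never co-locate two jobs."""
        if not self.free:
            raise RuntimeError("no idle instance; job must wait or move to another GPU")
        instance = self.free.pop()
        self.assignments[job_name] = instance
        return instance

    def release(self, job_name):
        self.free.add(self.assignments.pop(job_name))

sched = PartitionScheduler(instance_ids=[0, 1, 2, 3])
print(sched.place("chat-slm"))        # e.g. instance 0
print(sched.place("embedding-job"))   # a different instance, never shared
```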
SoftBank hasn’t shared benchmarks or performance numbers yet, but it claims the setup enables “optimal resource allocation,” with particular usefulness for smaller and mid-sized AI workloads (often discussed as SLM and MLM use cases), where right-sizing GPU resources can make a major difference in utilization and cost efficiency.
For now, SoftBank’s implementation is focused on AMD Instinct GPUs, though the company also signaled interest in exploring similar orchestrator concepts for other AI accelerators in the future. If this approach delivers on its goals, it could be a meaningful step toward making AI infrastructure more adaptable—letting organizations squeeze more real-world throughput out of expensive GPU hardware without sacrificing workload predictability.






