AMD-135M Is the Company's First Small Language Model, Targeting Speculative Decoding for Technological Progress

AMD-135M: Pioneering Speculative Decoding in Small Language Models for Technological Advancement

AMD recently entered the artificial intelligence landscape with the introduction of its first small language model, AMD-135M. The model leverages speculative decoding, aiming to enhance AI capabilities and improve inference efficiency.

In recent years, large language models (LLMs) such as GPT-4 have gained substantial recognition for their prowess in natural language processing. However, small language models (SLMs) are also proving to be invaluable, offering unique advantages for various applications. AMD’s debut small language model, AMD-135M, showcases the company’s commitment to advancing AI in a way that is inclusive, ethical, and innovative.

The AMD-135M model includes two variants: AMD-Llama-135M and AMD-Llama-135M-code. Both were trained from the ground up using AMD Instinct™ MI250 accelerators. The base AMD-Llama-135M variant was pre-trained on 670 billion tokens of general data over six days on four MI250 nodes. AMD-Llama-135M-code then underwent additional fine-tuning with 20 billion tokens of code data, taking four more days on the same hardware.

In a move to foster collaboration and innovation, AMD has open-sourced the training code, dataset, and model weights. This allows developers to reproduce the model and contribute to the training of other SLMs and LLMs.

Traditional large language models employ an autoregressive approach for inference, generating one token per forward pass. This method often leads to inefficient memory access and slower overall performance. Speculative decoding changes this process: a smaller draft model generates a batch of candidate tokens, which the larger target model then verifies. Because the target model can check several candidate tokens at once rather than producing them one at a time, this approach reduces memory traffic and increases inference speed.
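The draft-and-verify loop can be sketched in a few lines of Python. This is a minimal greedy sketch, not AMD's implementation: the callable "models" and the acceptance rule are hypothetical stand-ins, and a real system would verify all candidate tokens in a single batched forward pass of the target model.

```python
def speculative_decode(target_model, draft_model, prompt, num_new, k=4):
    """Greedy speculative decoding sketch (toy interface, for illustration).

    Both models are callables mapping a token list to their greedy next
    token. The draft proposes k tokens; the target keeps the longest
    matching prefix, then substitutes its own token at the first mismatch.
    """
    tokens = list(prompt)
    goal = len(prompt) + num_new
    while len(tokens) < goal:
        # Step 1: the small draft model speculates k tokens autoregressively.
        ctx = list(tokens)
        draft_tokens = []
        for _ in range(k):
            t = draft_model(ctx)
            draft_tokens.append(t)
            ctx.append(t)
        # Step 2: the large target model verifies the candidates in order.
        # (A real implementation checks all k positions in one forward pass.)
        for t in draft_tokens:
            expected = target_model(tokens)
            if t == expected:
                tokens.append(t)          # draft agreed with target: accept
            else:
                tokens.append(expected)   # mismatch: take the target's token
                break                     # and discard the rest of the draft
    return tokens[:goal]

# Toy "models" (purely illustrative): the target counts up by one;
# the draft is correct except immediately after multiples of 5.
target_model = lambda ts: ts[-1] + 1
draft_model = lambda ts: ts[-1] + 1 if ts[-1] % 5 else ts[-1] + 2

print(speculative_decode(target_model, draft_model, [0], 10))  # [0, 1, ..., 10]
```

Note that the output is identical to decoding with the target model alone; speculation only changes how many target-model passes are needed, which is where the speedup comes from.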

When the AMD-Llama-135M-code draft model was paired with CodeLlama-7b as the target on the MI250 accelerator and on a Ryzen™ AI processor with a Neural Processing Unit (NPU), speculative decoding delivered substantial inference speedups over traditional autoregressive decoding. The AMD-135M model thus demonstrates an end-to-end workflow, combining both training and inference on select AMD platforms.

By developing AMD-135M, AMD not only steps into the AI arena but also sets the stage for more efficient and collaborative advancements in artificial intelligence technology. The future of AI looks promising as we witness the blending of powerful hardware with innovative AI models, ensuring the benefits are far-reaching and the challenges are collectively addressed.