Apple & NVIDIA Collaborate on 'ReDrafter' for Faster LLM Text Generation
Apple and NVIDIA have collaborated on 'ReDrafter,' a new technique to speed up text generation with large language models, integrated into NVIDIA
Although Apple generally prefers its own silicon for AI workloads, the company has collaborated with NVIDIA on 'ReDrafter,' a new technique that speeds up text generation with large language models (LLMs). The collaboration highlights a shared goal of improving LLM performance, despite the complex history between the two tech giants.

'ReDrafter' Technique

Apple's open-source 'ReDrafter' combines beam search and tree attention to speed up text generation. The technique was then integrated into NVIDIA's TensorRT-LLM, a tool designed to accelerate LLMs on NVIDIA GPUs. The integration improves generation speed and reduces latency, while also decreasing power consumption.

"This research work demonstrated strong results, but its greater impact comes from being applied in production to accelerate LLM inference... ML developers using NVIDIA GPUs can now easily benefit from ReDrafter’s accelerated token generation for their production LLM application…
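
For readers wondering how beam search and tree attention fit together, here is a minimal sketch of the general idea, not Apple's or NVIDIA's actual implementation. Beam search proposes several candidate continuations, and those candidates often share prefixes; tree attention merges the shared prefixes so each distinct token is processed once and only attends to its own ancestors. The function names `build_prefix_tree` and `tree_attention_mask` and the example beams are hypothetical, chosen purely for illustration.

```python
def build_prefix_tree(beams):
    """Merge candidate token sequences that share prefixes into one tree.

    Returns (tokens, parents): tokens[i] is the token stored at node i and
    parents[i] is the index of its parent node (-1 for a first-position token).
    """
    tokens, parents = [], []
    index = {}  # (parent index, token) -> node index
    for beam in beams:
        parent = -1
        for tok in beam:
            key = (parent, tok)
            if key not in index:
                # First time this token appears under this parent: add a node.
                index[key] = len(tokens)
                tokens.append(tok)
                parents.append(parent)
            parent = index[key]
    return tokens, parents


def tree_attention_mask(parents):
    """mask[i][j] is True when node i may attend to node j (itself or an ancestor)."""
    n = len(parents)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        j = i
        while j != -1:  # walk up to the root, marking each ancestor
            mask[i][j] = True
            j = parents[j]
    return mask


# Hypothetical beam-search candidates (illustrative only).
beams = [
    ["the", "cat", "sat"],
    ["the", "cat", "ran"],
    ["the", "dog", "sat"],
]
tokens, parents = build_prefix_tree(beams)
mask = tree_attention_mask(parents)
```

In this toy example the three candidates contain nine tokens in total, but the merged tree holds only six nodes, which is where the savings would come from when a model checks all candidates in a single pass.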