1. 📘 Topic and Domain: Efficient speculative decoding for large language models through selective knowledge distillation.
2. 💡 Previous Research and New Ideas: Builds on conventional knowledge distillation and speculative decoding, proposing a selective token-filtering approach for more efficient knowledge transfer.
3. ❓ Problem: Addresses the inefficiency of traditional knowledge distillation, where small draft models cannot fully absorb the target model's knowledge due to limited capacity.
4. 🛠️ Methods: Introduces AdaSPEC, a two-phase approach that first distills into a reference model to identify hard-to-fit tokens, then distills the target model into the draft model only on the remaining, easier tokens.
5. 📊 Results and Evaluation: Consistently outperformed the state-of-the-art DistillSpec method across diverse tasks (arithmetic, instruction following, coding, summarization), achieving up to 15% higher acceptance rates on draft/target configurations of 31M/1.4B and 350M/2.7B parameters.
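The selective-distillation idea in item 4 can be sketched as follows. This is a minimal illustrative implementation, not AdaSPEC's actual code: it assumes the "hard-to-fit" tokens are ranked by per-token forward KL divergence between the target model and a reference model, and that the draft model's distillation loss is then averaged only over the easier (low-KL) tokens. The function names, the `keep_ratio` parameter, and the use of forward KL are all assumptions for the sketch.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the vocabulary axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def select_easy_tokens(ref_logits, target_logits, keep_ratio=0.7):
    # Per-token KL(target || reference); tokens the reference model
    # already fits well (low KL) are treated as "easy" and kept.
    # Shapes: (num_tokens, vocab_size). keep_ratio is an assumed knob.
    p = softmax(target_logits)
    q = softmax(ref_logits)
    kl = (p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(axis=-1)
    k = max(1, int(keep_ratio * len(kl)))
    keep = np.argsort(kl)[:k]          # indices of the easiest tokens
    mask = np.zeros(len(kl), dtype=bool)
    mask[keep] = True
    return mask

def selective_distill_loss(draft_logits, target_logits, mask):
    # Distillation loss (cross-entropy against the target's soft
    # distribution), averaged only over the selected easy tokens.
    p = softmax(target_logits)
    log_q = np.log(softmax(draft_logits) + 1e-12)
    per_token = -(p * log_q).sum(axis=-1)
    return per_token[mask].mean()
```

In this sketch, gradients for the draft model would flow only through the masked loss, so capacity is not spent fitting tokens the reference model itself could not learn.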