1. 📘 Topic and Domain: The paper focuses on entropy-adaptive fine-tuning of large language models, specifically addressing the issue of catastrophic forgetting during model adaptation across mathematical, medical, and agent domains.
2. 💡 Previous Research and New Ideas: Building on research showing that on-policy reinforcement learning (RL) preserves general capabilities better than Supervised Fine-Tuning (SFT), the paper proposes using entropy as a novel gating mechanism to identify and handle "confident conflicts" during training.
3. ❓ Problem: The paper addresses catastrophic forgetting in Supervised Fine-Tuning, where models lose their general capabilities while adapting to specific domains.
4. 🛠️ Methods: The authors developed Entropy-Adaptive Fine-Tuning (EAFT), which uses token-level entropy as a gating mechanism to modulate the training loss, down-weighting destructive updates from conflicting data while preserving the learning signal from uncertain samples.
5. 📊 Results and Evaluation: EAFT matched or exceeded baseline performance on target tasks while significantly reducing catastrophic forgetting across multiple model families (Qwen, GLM) and scales (4B to 32B parameters), demonstrating effectiveness in the mathematical, medical, and agent domains.
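The entropy-gating idea in item 4 can be illustrated with a minimal sketch. The code below is an assumption-laden toy, not the paper's implementation: it gates per-token cross-entropy by the normalized Shannon entropy of the model's predictive distribution, so confidently predicted tokens that conflict with the target contribute little to the loss, while uncertain tokens are learned at full weight. The function names (`token_entropy`, `eaft_loss`) and the choice of normalized entropy as the gate are illustrative, not taken from the paper.

```python
import math

def token_entropy(probs):
    # Shannon entropy (in nats) of one predictive distribution
    return -sum(p * math.log(p) for p in probs if p > 0)

def eaft_loss(token_probs, target_ids):
    """Entropy-gated token-level cross-entropy (illustrative sketch).

    token_probs: per-position probability distributions over the vocabulary
    target_ids:  gold token index at each position
    Gate = entropy / log(vocab_size), so it lies in [0, 1]:
    near 1 for uncertain tokens (keep the update), near 0 for
    confident tokens that conflict with the target (suppress it).
    """
    max_h = math.log(len(token_probs[0]))  # entropy of the uniform distribution
    total = 0.0
    for probs, t in zip(token_probs, target_ids):
        ce = -math.log(max(probs[t], 1e-12))     # standard cross-entropy term
        gate = token_entropy(probs) / max_h      # normalized entropy in [0, 1]
        total += gate * ce
    return total / len(target_ids)
```

For a uniform (maximally uncertain) prediction the gate is 1 and the loss reduces to plain cross-entropy; for a "confident conflict" (e.g. 97% mass on a token that disagrees with the label) the gate shrinks the update by roughly an order of magnitude, which is the qualitative behavior EAFT relies on.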