1. 📘 Topic and Domain: Development of VibeThinker-1.5B, a small language model for logical reasoning in mathematics and coding, challenging the assumption that large models are necessary for strong reasoning capabilities.
2. 💡 Previous Research and New Ideas: Building on the reasoning paradigm established by OpenAI's o1 model and subsequent large reasoning models, the work proposes a novel "Spectrum-to-Signal Principle" that enables small models to reach reasoning ability comparable to that of far larger models.
3. ❓ Problem: Challenging the industry assumption that scaling model parameters is essential for strong logical reasoning, and aiming to match large-model performance with a far smaller, more cost-effective model.
4. 🛠️ Methods: Implemented a two-stage approach: "Two-Stage Diversity-Exploring Distillation" in the SFT phase to generate a diverse spectrum of solutions, followed by "MaxEnt-Guided Policy Optimization" in the RL phase to amplify the signal of correct reasoning paths.
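The RL step above can be illustrated with a minimal sketch. It assumes (as the name "MaxEnt-Guided" suggests, though the summary does not give details) that problems whose empirical pass rate is near 0.5, the point of maximum Bernoulli entropy where the model is most uncertain, receive the largest training weight, and that this weight scales group-relative advantages. The function names `maxent_weight` and `weighted_advantages` are hypothetical, not from the paper.

```python
import math

def maxent_weight(success_rate: float, alpha: float = 1.0) -> float:
    """Hypothetical MaxEnt-guided weight: problems whose empirical pass
    rate is near 0.5 (maximum Bernoulli entropy) get the largest weight;
    trivially easy or hopeless problems contribute little."""
    p = min(max(success_rate, 1e-6), 1.0 - 1e-6)  # avoid log(0)
    entropy = -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))
    return (entropy / math.log(2.0)) ** alpha  # normalized to [0, 1]

def weighted_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantages (GRPO-style mean/std baseline over a
    group of sampled solutions), scaled by the MaxEnt weight of the
    group's empirical pass rate."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5 or 1.0
    w = maxent_weight(sum(1 for r in rewards if r > 0) / n)
    return [w * (r - mean) / std for r in rewards]
```

With a half-solved group (pass rate 0.5) the weight is 1.0 and advantages are the plain standardized rewards; a fully solved group gets advantages damped toward zero, steering updates toward problems at the edge of the model's ability.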
5. 📊 Results and Evaluation: VibeThinker-1.5B outperformed much larger models on mathematical benchmarks (AIME24: 80.3, AIME25: 74.4, HMMT25: 50.4) and coding tasks (LiveCodeBench V6: 51.1), surpassing models over 400 times its size while costing only $7,800 to train.