1. 📘 Topic and Domain: Development of P1, a family of open-source large language models specialized in physics reasoning and in solving Physics Olympiad problems, in the domain of artificial intelligence and scientific reasoning.
2. 💡 Previous Research and New Ideas: Builds on recent advances in LLMs for scientific reasoning; introduces new reinforcement learning techniques for physics problem-solving and proposes a novel multi-stage training framework with adaptive learnability adjustment.
3. ❓ Problem: Addresses the challenge of developing open-source language models capable of mastering complex physics problems at the Olympiad level, requiring deep scientific reasoning rather than simple pattern matching.
4. 🛠️ Methods: Employs reinforcement learning with Group Sequence Policy Optimization (GSPO), adaptive learnability adjustment, and test-time scaling through an agentic framework called PhysicsMinions.
5. 📊 Results and Evaluation: P1-235B-A22B achieved gold-medal performance at IPhO 2025 and won 12 gold medals across 13 competitions. P1-30B-A3B achieved silver-medal performance, surpassing most open-source models. Combining the models with PhysicsMinions improved results further.
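
To make the Methods item more concrete, here is a minimal sketch of the sequence-level clipped objective that GSPO is generally described as using: each sampled response gets a length-normalized importance ratio, which is clipped and weighted by a group-normalized advantage. This is an illustrative assumption, not the P1 training code. The function names, the toy log-probabilities, and the clipping range `eps=0.2` are all hypothetical, and adaptive learnability adjustment and PhysicsMinions are not modeled.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards):
    """Normalize rewards within one group of responses to the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All responses scored the same: no learning signal in this group.
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]


def sequence_ratio(new_logps, old_logps):
    """Length-normalized sequence-level importance ratio.

    Equals (pi_new(y|x) / pi_old(y|x)) ** (1 / len(y)), computed in log space
    from per-token log-probabilities.
    """
    n = len(new_logps)
    return math.exp((sum(new_logps) - sum(old_logps)) / n)


def gspo_objective(group_new, group_old, rewards, eps=0.2):
    """Clipped surrogate objective averaged over one group (to be maximized).

    eps is a hypothetical clipping range chosen for readability only.
    """
    advantages = group_advantages(rewards)
    terms = []
    for new_logps, old_logps, adv in zip(group_new, group_old, advantages):
        ratio = sequence_ratio(new_logps, old_logps)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Pessimistic choice between the unclipped and clipped terms.
        terms.append(min(ratio * adv, clipped * adv))
    return mean(terms)


# Toy group: two responses to one problem, each two tokens long.
old = [[-2.0, -2.0], [-1.0, -1.0]]
new = [[-1.0, -1.0], [-1.0, -1.0]]
rewards = [1.0, 0.0]  # e.g. 1 if the final answer was judged correct
objective = gspo_objective(new, old, rewards)
```

The ratio is computed per response rather than per token, so clipping accepts or limits a whole response at once. That is the main difference from token-level objectives such as PPO or GRPO.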