1. 📘 Topic and Domain: Medical large language models (LLMs) for clinical decision support, specifically focusing on transforming passive question-answering systems into active clinical-grade decision-making partners.
2. 💡 Previous Research and New Ideas: Building upon existing medical LLMs like GPT-5.2 and previous Baichuan models (M1, M2), the paper proposes a unified framework that integrates clinical inquiry with reliable reasoning through a three-stage training pipeline combining task-specific reinforcement learning, offline policy distillation, and multi-teacher online policy distillation.
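The summary above names the three training stages but not how they are wired together. A minimal, purely illustrative sketch of the stage sequencing (all function and checkpoint names here are hypothetical, not from the paper):

```python
# Hypothetical sketch of the three-stage training pipeline: task-specific RL,
# then offline policy distillation, then multi-teacher online distillation.
# Each stage consumes the previous stage's checkpoint and emits a refined one.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Stage:
    name: str
    run: Callable[[str], str]  # checkpoint id in -> checkpoint id out

def build_pipeline() -> List[Stage]:
    # Stage names follow the summary; the lambdas stand in for real training runs.
    return [
        Stage("task_specific_rl",        lambda ckpt: ckpt + "+rl"),
        Stage("offline_distillation",    lambda ckpt: ckpt + "+offline_kd"),
        Stage("online_multi_teacher_kd", lambda ckpt: ckpt + "+online_kd"),
    ]

def train(base_ckpt: str) -> str:
    ckpt = base_ckpt
    for stage in build_pipeline():
        ckpt = stage.run(ckpt)  # each stage refines the previous checkpoint
    return ckpt
```

The key design point the sketch captures is that the stages are sequential refinements of one policy, not independently trained models that are later merged.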
3. ❓ Problem: Current medical LLMs fail to keep responses evidence-grounded and uncertainty-aware in open-ended clinical interactions, exhibiting "inquiry inertia" (lacking the agency to elicit missing evidence) and struggling to control hallucinations during long-horizon medical decision-making.
4. 🛠️ Methods: The paper employs Segmented Pipeline RL with the Step-Penalized Advantage with Relative baseline (SPAR) algorithm for multi-stage clinical workflows, Dynamic Rubric Evolution for reward optimization, and Fact-Aware Reinforcement Learning with semantic claim verification for hallucination suppression.
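The summary does not give SPAR's exact formulation. As a rough, hypothetical sketch, assuming a group-relative baseline (GRPO-style) combined with a linear per-step penalty so that rollouts reaching the same reward in fewer pipeline steps receive a higher advantage:

```python
# Hypothetical sketch of a Step-Penalized Advantage with Relative baseline:
# subtract a penalty proportional to the number of workflow steps from each
# rollout's reward, then center against the group mean and normalize.
# The penalty coefficient and normalization details are assumptions.
def spar_advantages(rewards, steps, step_penalty=0.05):
    # Penalize longer multi-stage rollouts to discourage needless extra steps.
    penalized = [r - step_penalty * s for r, s in zip(rewards, steps)]
    # Relative baseline: the mean penalized reward of the rollout group.
    baseline = sum(penalized) / len(penalized)
    centered = [p - baseline for p in penalized]
    # Normalize by the group standard deviation for stable policy updates.
    std = (sum(c * c for c in centered) / len(centered)) ** 0.5
    return [c / (std + 1e-8) for c in centered]

# Two rollouts with equal reward: the one using fewer steps is favored.
adv = spar_advantages(rewards=[1.0, 1.0, 0.0], steps=[2, 5, 3])
```

Under this reading, the step penalty is what separates SPAR from a plain group-relative advantage: it builds the cost of prolonged clinical workflows directly into the learning signal.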
5. 📊 Results and Evaluation: Baichuan-M3 achieves state-of-the-art performance with 44.4 on HealthBench-Hard (outperforming GPT-5.2), top scores on ScanBench across Clinical Inquiry (74.9), Laboratory Testing (72.1), and Diagnosis (74.4), and the lowest hallucination rate of 3.5% among compared models.