1. 📘 Topic and Domain: The paper introduces Native Parallel Reasoner (NPR), a framework for enabling Large Language Models to perform parallel reasoning, falling within the domain of artificial intelligence and language model optimization.
2. 💡 Previous Research and New Ideas: Building on previous work in parallel reasoning, such as the Multiverse and MapReduce paradigms, the paper proposes a novel teacher-free approach in which models self-evolve parallel reasoning capabilities without external supervision.
3. ❓ Problem: The paper addresses the challenge of enabling language models to perform genuine parallel reasoning rather than sequential emulation, while avoiding reliance on external teacher models or supervised distillation.
4. 🛠️ Methods: The paper implements a three-stage progressive training paradigm: (1) format-follow RL to discover parallel structures, (2) parallel warm-up on self-distilled data, and (3) native-parallel RL driven by a novel Parallel-Aware Policy Optimization algorithm and the NPR Engine.
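The map-reduce-style decomposition underlying parallel reasoning can be illustrated with a minimal sketch. This is not the paper's NPR Engine; `solve_branch` and `parallel_reason` are hypothetical names, and the placeholder "reasoner" stands in for what would be a model decoding call on one branch.

```python
from concurrent.futures import ThreadPoolExecutor

def solve_branch(branch_prompt: str) -> str:
    # Placeholder reasoner: in a real system this would be a model
    # decoding call that explores one independent reasoning branch.
    return f"answer({branch_prompt})"

def parallel_reason(problem: str, branches: list[str]) -> str:
    # Map stage: explore independent sub-problems concurrently,
    # mirroring the parallel decomposition described above.
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        partials = list(pool.map(solve_branch, branches))
    # Reduce stage: aggregate the partial results into one answer.
    return " | ".join(partials)
```

Because `pool.map` preserves input order, the aggregation step sees branch results in a deterministic order even though they were computed concurrently.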
5. 📊 Results and Evaluation: Testing on eight reasoning benchmarks showed performance gains of up to 24.5%, inference speedups of up to 4.6×, and 100% genuinely parallel execution, with consistent improvements over baseline models such as Multiverse-32B and Multiverse-4B.