1. 📘 Topic and Domain: The paper presents InCoder-32B, a code foundation model specifically designed for industrial programming scenarios including chip design, GPU kernel optimization, embedded systems, compiler optimization, and 3D modeling.
2. 💡 Previous Research and New Ideas: The paper builds on existing code LLMs such as the DeepSeek, Qwen, and Claude series, and proposes what the authors describe as the first unified model to address gaps in industrial code intelligence, introducing domain-specific training with hardware-aware data and execution-grounded verification.
3. ❓ Problem: The paper addresses the significant performance degradation of existing code LLMs in industrial scenarios that require reasoning about hardware semantics, specialized language constructs, and strict resource constraints.
4. 🛠️ Methods: The authors employ a three-stage Code-Flow pipeline: pre-training on curated industrial code data, mid-training with progressive context extension (8K to 128K tokens) on synthetic industrial reasoning data, and post-training with execution-grounded verification in reconstructed industrial environments.
5. 📊 Results and Evaluation: InCoder-32B achieves competitive performance on general code benchmarks (74.8% on SWE-bench Verified, 49.14% on LiveCodeBench) while establishing the strongest open-source results across the evaluated industrial domains, outperforming larger models on specialized benchmarks such as CAD-Coder and KernelBench.
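The three-stage pipeline in point 4 can be sketched as a minimal training-schedule configuration. The stage names and the 8K-to-128K context lengths come from the summary above; the data labels, intermediate context step (32K), and the `context_schedule` helper are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of a three-stage "Code-Flow"-style schedule.
# Only the stage names and the 8K -> 128K extension are from the summary;
# everything else (data labels, the 32K intermediate step) is assumed.
STAGES = [
    {"name": "pre-training", "context_len": 8_192,
     "data": "curated industrial code"},
    {"name": "mid-training", "context_len": [8_192, 32_768, 131_072],
     "data": "synthetic industrial reasoning"},
    {"name": "post-training", "context_len": 131_072,
     "data": "execution-grounded verification"},
]

def context_schedule(stage):
    """Return the context lengths a stage trains at.

    Mid-training uses a progressive list (context extension);
    the other stages use a single fixed length.
    """
    lens = stage["context_len"]
    return lens if isinstance(lens, list) else [lens]

for stage in STAGES:
    print(f"{stage['name']}: {context_schedule(stage)} on {stage['data']}")
```

The point of modeling the schedule as data rather than code is that the same training loop can iterate over stages uniformly, with context extension expressed simply as a stage whose length field is a list.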