1. 📘 Topic and Domain: The paper presents LongCodeZip, a context compression framework for code language models that aims to process long code contexts efficiently.
2. 💡 Previous Research and New Ideas: Building on general context compression methods such as LLMLingua and on code-specific approaches, the paper introduces a novel two-stage compression strategy designed specifically for code, one that accounts for code structure and dependencies.
3. ❓ Problem: The paper addresses the challenge of handling long code contexts in language models, where processing extensive codebases leads to high API costs, increased latency, and degraded performance.
4. 🛠️ Methods: Uses a two-stage approach: (1) coarse-grained compression, which selects relevant functions using conditional perplexity, and (2) fine-grained compression, which segments the retained functions into blocks and selects an optimal subset of blocks within a token budget.
5. 📊 Results and Evaluation: Achieves a compression ratio of up to 5.6× while maintaining performance across multiple tasks (code completion, summarization, and question answering), consistently outperforming baselines and reducing generation time from 15.7s to 6.6s.
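The coarse-grained stage in item 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: we read "conditional perplexity" as "a function is relevant if conditioning on it lowers the perplexity of the query" and score it as PPL(q) − PPL(q | f). An add-one-smoothed unigram model stands in for a real code LM, and the greedy fill under a token budget is also an assumption. All names (`tokenize`, `perplexity`, `rank_functions`, `select_functions`) are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Crude lexer (assumption): lowercase runs of word characters only.
    return re.findall(r"\w+", text.lower())

def perplexity(target, context, vocab):
    # Toy stand-in for a code LM (assumption): add-one-smoothed unigram
    # model estimated from `context`. A real system would query an actual LM.
    counts = Counter(context)
    denom = len(context) + len(vocab)
    nll = -sum(math.log((counts[t] + 1) / denom) for t in target)
    return math.exp(nll / len(target))

def rank_functions(functions, query):
    """Score (name, source) pairs by how much each lowers the query's perplexity."""
    q = tokenize(query)
    vocab = set(q)
    for _, src in functions:
        vocab.update(tokenize(src))
    baseline = perplexity(q, [], vocab)  # PPL(q) with no context
    scored = [(baseline - perplexity(q, tokenize(src), vocab), name, src)
              for name, src in functions]
    # Highest score first; Python's sort is stable, so ties keep file order.
    return sorted(scored, key=lambda s: s[0], reverse=True)

def select_functions(functions, query, budget):
    """Greedily keep top-ranked functions whose token counts fit `budget`."""
    kept, used = set(), 0
    for _, name, src in rank_functions(functions, query):
        cost = len(tokenize(src))
        if used + cost <= budget:
            kept.add(name)
            used += cost
    # Restore original file order so the compressed context stays readable.
    return [name for name, _ in functions if name in kept]

functions = [
    ("add", "def add(a, b): return a + b"),
    ("read_json", "def read_json(path): return json.load(open(path))"),
    ("mul", "def mul(a, b): return a * b"),
]
print(select_functions(functions, "open path and parse json", budget=10))
# ['read_json']
```

Because the toy model only rewards token overlap, `read_json` is the only function that lowers the query's perplexity here; swapping in a real LM's token log-probabilities would change only `perplexity`.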
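The fine-grained step in item 4 ("selects an optimal subset of blocks within a token budget") can be illustrated as a 0/1 knapsack problem. This is a sketch under assumptions: the summary does not specify the optimizer, how blocks are segmented, or how they are scored, so the `(tokens, relevance)` inputs and the `select_blocks` helper are hypothetical. The dynamic program returns the subset with the highest total relevance whose token count stays within the budget.

```python
def select_blocks(blocks, budget):
    """Pick blocks maximizing total relevance with total tokens <= budget.

    blocks: list of (tokens, relevance) pairs in source order (hypothetical).
    Returns the indices of the chosen blocks in original order.
    """
    n = len(blocks)
    # best[i][b] = max relevance achievable with the first i blocks in b tokens
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i, (cost, value) in enumerate(blocks, start=1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip block i-1
            if cost <= b and best[i - 1][b - cost] + value > best[i][b]:
                best[i][b] = best[i - 1][b - cost] + value  # option 2: take it
    # Walk back through the table to recover which blocks were taken.
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= blocks[i - 1][0]
    return sorted(chosen)

blocks = [(4, 3.0), (3, 2.5), (5, 4.0), (2, 1.0)]
print(select_blocks(blocks, budget=7))
# [0, 1]
```

Unlike a greedy pass, the knapsack table can prefer two cheaper blocks (relevance 3.0 + 2.5) over the single most relevant one (4.0) when that uses the budget better.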
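To put the headline numbers in item 5 in perspective, here is the arithmetic, assuming the compression ratio is defined as original tokens over retained tokens ($N_{\text{orig}}$ and $N_{\text{comp}}$ are introduced here for illustration). The 5.6× figure is a best case ("up to"), and the summary does not say whether the latency figures come from the same setting.

```latex
\frac{N_{\text{orig}}}{N_{\text{comp}}} = 5.6
\;\Rightarrow\;
\frac{N_{\text{comp}}}{N_{\text{orig}}} = \frac{1}{5.6} \approx 0.179
\qquad
\frac{15.7\,\text{s}}{6.6\,\text{s}} \approx 2.38,
\qquad
1 - \frac{6.6}{15.7} \approx 0.58
```

So at the reported maximum, roughly 18% of the original context tokens are kept, and the reported generation time falls by about 58% (about a 2.4× speedup).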