2025-10-06 Papers


Paper 1

LongCodeZip: Compress Long Context for Code Language Models

Published: 2025-09-30

Link: http://arxiv.org/pdf/2510.00446

1. 📘 Topic and Domain: The paper presents LongCodeZip, a context compression framework for code language models, focused on the efficient processing of long code contexts.
2. 💡 Previous Research and New Ideas: Building on existing context compression methods such as LLMLingua and on code-specific approaches, it introduces a novel two-stage compression strategy designed specifically for code, taking code structure and dependencies into account.
3. ❓ Problem: The paper addresses the challenge of handling long code contexts in language models, where processing extensive codebases leads to high API costs, increased latency, and degraded performance.
4. 🛠️ Methods: The authors use a two-stage approach: (1) coarse-grained compression, which selects relevant functions using conditional perplexity, and (2) fine-grained compression, which segments functions into blocks and selects an optimal subset under a token budget.
5. 📊 Results and Evaluation: LongCodeZip achieves up to a 5.6× compression ratio while maintaining performance across multiple tasks (code completion, summarization, question answering), consistently outperforming baselines and cutting generation time from 15.7s to 6.6s.
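The coarse-grained stage ranks each function by how much it lowers the query's perplexity, AMI(c, q) = PPL(q) - PPL(q|c), then fills the token budget greedily. A minimal sketch of that idea, with an injected `nll` scorer standing in for a real code LM and character counts as a stand-in for token counts (all names here are illustrative, not the authors' implementation):

```python
import math

def rank_functions_by_ami(functions, query, nll):
    """Rank code chunks by approximate mutual information:
    AMI(c, q) = PPL(q) - PPL(q | c). A higher score means chunk c
    makes the query q more predictable, i.e., c is more relevant.
    `nll` is any callable returning the average negative log-likelihood
    of a text given an optional context (a stand-in for a real code LM).
    """
    ppl_q = math.exp(nll(query))
    scored = []
    for c in functions:
        ppl_q_given_c = math.exp(nll(query, context=c))
        scored.append((ppl_q - ppl_q_given_c, c))
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored

def select_within_budget(ranked, budget, n_tokens=len):
    """Greedy budget-constrained selection of top-ranked chunks.
    `n_tokens` defaults to character count as a crude token proxy."""
    chosen, used = [], 0
    for _score, c in ranked:
        cost = n_tokens(c)
        if used + cost <= budget:
            chosen.append(c)
            used += cost
    return chosen
```

In practice the `nll` scorer would be a forward pass of the same model family that consumes the compressed context; any tokenizer-based count can replace the character-count proxy.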


[Figure: LongCodeZip methodology flow]
- Input: long code context, task instruction, token budget B
- Stage 1, coarse-grained compression: function-level chunking (split by functions/classes); AMI-based ranking, AMI(c, q) = PPL(q) - PPL(q|c); budget-constrained selection of top-N functions
- Stage 2, fine-grained compression: perplexity-based block detection at semantic boundaries; adaptive, importance-weighted budget allocation; block selection via 0/1 knapsack (DP) to maximize relevance
- Output: compressed context, up to 5.6× compression with preserved performance
- Key innovations: conditional perplexity ranking, perplexity-based block detection, adaptive budget allocation, 0/1 knapsack optimization; training-free and model-agnostic
- Evaluation tasks: long code completion, long module summarization, repository QA (RepoQA), cross-model generalization, efficiency analysis
- Benefits: reduced API costs, faster generation, lower memory usage, preserved code structure, better than baselines
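The block-selection step of the fine-grained stage is a 0/1 knapsack: maximize total block relevance under the token budget. A self-contained sketch of the classic dynamic-programming formulation (block scores and costs are illustrative inputs; the paper's actual scoring may differ):

```python
def knapsack_select(blocks, budget):
    """0/1 knapsack over code blocks: each block is (token_cost, score).
    Returns the indices of the subset maximizing total score within the
    token budget, plus the best achievable score, via classic DP."""
    n = len(blocks)
    # dp[b] = best score achievable with budget b; keep[i][b] marks choices
    dp = [0.0] * (budget + 1)
    keep = [[False] * (budget + 1) for _ in range(n)]
    for i, (cost, score) in enumerate(blocks):
        # iterate budgets downward so each block is used at most once
        for b in range(budget, cost - 1, -1):
            if dp[b - cost] + score > dp[b]:
                dp[b] = dp[b - cost] + score
                keep[i][b] = True
    # backtrack to recover the chosen block indices
    chosen, b = [], budget
    for i in range(n - 1, -1, -1):
        if keep[i][b]:
            chosen.append(i)
            b -= blocks[i][0]
    return sorted(chosen), dp[budget]
```

With blocks `[(3, 4.0), (4, 5.0), (2, 3.0)]` and a budget of 5 tokens, the DP picks blocks 0 and 2 (total score 7.0) over block 1 alone (5.0).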
Q1. What is the main innovation of LongCodeZip compared to existing code compression methods?
- It uses machine learning to automatically compress code
- It employs a two-stage compression strategy considering code-specific structures
- It focuses only on removing comments and whitespace

Q2. In the experimental results, what was the maximum compression ratio achieved by LongCodeZip while maintaining performance?
- 2.3×
- 4.1×
- 5.6×

Q3. Which of the following is NOT a component of LongCodeZip's fine-grained compression stage?
- Perplexity-based block detection
- Syntax tree parsing and optimization
- Adaptive budget allocation

Paper 2

Apriel-1.5-15b-Thinker

Published: 2025-10-01

Link: http://arxiv.org/pdf/2510.01141

1. 📘 Topic and Domain: The paper presents Apriel-1.5-15B-Thinker, a 15-billion-parameter open-weights multimodal reasoning model in the domain of artificial intelligence and large language models.
2. 💡 Previous Research and New Ideas: Built on the Pixtral-12B architecture, it introduces a novel three-stage training methodology that emphasizes mid-training design over massive scale, challenging the conventional assumption that bigger models are always better.
3. ❓ Problem: The paper addresses the challenge of creating high-performing multimodal AI models that achieve frontier-level reasoning capabilities while remaining computationally efficient enough to run on a single GPU.
4. 🛠️ Methods: The authors use a three-stage approach: depth upscaling of the base model, staged continual pre-training for foundational and visual reasoning, and high-quality supervised fine-tuning with explicit reasoning traces.
5. 📊 Results and Evaluation: The model achieves a score of 52 on the Artificial Analysis Intelligence Index, matching larger models such as DeepSeek-R1-0528, and performs within 5 points of Gemini-2.5-Flash and Claude 3.7 Sonnet across ten image benchmarks, despite its smaller size.
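The paper's exact depth-upscaling recipe isn't detailed here; one common approach is to duplicate a contiguous span of middle layers to grow the stack from 40 to 48 layers. A toy sketch under that assumption, with plain labels standing in for transformer layers:

```python
def depth_upscale(layers, target_depth):
    """Expand a decoder from len(layers) to target_depth layers by
    repeating a contiguous middle span once (a common upscaling recipe;
    the paper's exact duplication scheme may differ).
    `layers` is any list of layer objects (here, plain labels)."""
    n = len(layers)
    extra = target_depth - n
    if extra <= 0:
        return list(layers)
    # centre the duplicated span in the stack
    start = (n - extra) // 2
    end = start + extra
    # result: [0..end) + a copy of [start..end) + [end..n)
    return layers[:end] + layers[start:end] + layers[end:]
```

For a 40-layer base and a 48-layer target, this copies layers 16-23 once, after which the projection network would be realigned and training continued, as in the staged pipeline above.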


[Figure: Apriel-1.5-15B-Thinker methodology flow]
- Stage 1, architecture: base Pixtral-12B; depth upscaling from 40 to 48 layers; projection realignment
- Stage 2, CPT stage 1 (foundational reasoning): 50% text + 20% replay + 30% multimodal; sequence length 32,768
- Stage 3, CPT stage 2 (visual reasoning): synthetic data generation; vision encoder frozen; sequence length 16,384
- Stage 4, SFT: high-quality data with explicit reasoning; text-only training; 4 epochs plus merging
- Synthetic visual tasks: image reconstruction (holistic scene priors, part-whole reasoning, region masking); visual matching (correspondence, fine-grained discrimination, cross-view matching); object detection (grounding, localization, presence identification); counting (visual elements, category-specific, precise enumeration)
- Text data domains: mathematics (reasoning traces, verification); coding (execution-based quality control); science (domain expertise, synthetic generation); tool use (function calling, interactive workflows)
- Key innovation, progressive training: staged curriculum design; cost-effective scaling; no RL or preference optimization; single-GPU deployment
- Results, frontier performance: Artificial Analysis Intelligence Index 52; AIME'25 88%, MMMU 70.2%; matches DeepSeek-R1-0528 with only 15B parameters
- Technical highlights: checkpoint averaging, selective loss computation, data decontamination, LLM-as-judge verification
- Evaluation framework: text via the Artificial Analysis Intelligence Index, 10 benchmarks (MMLU-Pro, GPQA, AIME, etc.); vision via VLMEvalKit, 10 benchmarks (MMMU, MathVista, AI2D, etc.)
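Checkpoint averaging, one of the technical highlights, takes a uniform mean of parameter tensors across saved checkpoints. A minimal sketch using plain dicts of float lists as stand-ins for real model state dicts:

```python
def average_checkpoints(checkpoints):
    """Uniformly average parameters across checkpoints (checkpoint
    averaging / model merging). Each checkpoint is a dict mapping a
    parameter name to a flat list of floats, a stand-in for the
    tensor-valued state dicts of a real training run."""
    n = len(checkpoints)
    return {
        name: [
            sum(ck[name][i] for ck in checkpoints) / n
            for i in range(len(checkpoints[0][name]))
        ]
        for name in checkpoints[0]
    }
```

In a real pipeline the same element-wise mean would be applied to each tensor of the epoch-end state dicts before the merged weights are saved.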
Q1. What is the main innovation that distinguishes Apriel-1.5-15B-Thinker from other models?
- Its massive scale and parameter count
- Its focus on mid-training design and efficiency
- Its use of reinforcement learning techniques

Q2. During the second stage of continual pre-training (CPT), which of the model's components was frozen?
- Only the decoder was frozen
- Only the vision encoder was frozen
- Both the decoder and projection network were frozen

Q3. What is the most significant practical advantage of Apriel-1.5-15B-Thinker compared to other frontier models?
- It can run on a single GPU while maintaining competitive performance
- It has the highest accuracy on all benchmarks
- It requires no training data

Paper 3

StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets?

Published: 2025-10-02

Link: http://arxiv.org/pdf/2510.02209

1. 📘 Topic and Domain: The paper introduces StockBench, a benchmark in the financial domain for evaluating Large Language Model (LLM) agents' ability to trade stocks profitably in real-world markets.
2. 💡 Previous Research and New Ideas: Whereas previous financial benchmarks focused mainly on static question-answering tasks, this paper proposes a dynamic benchmark that tests LLMs' actual trading capabilities under realistic market conditions.
3. ❓ Problem: The paper addresses the gap between existing financial benchmarks, which only test static knowledge, and the need to evaluate LLMs' ability to make continuous, profitable trading decisions in dynamic market environments.
4. 🛠️ Methods: The authors created a contamination-free benchmark with daily market signals (prices, fundamentals, news) in which LLM agents make sequential buy/sell/hold decisions over multiple months, evaluated with financial metrics such as cumulative return and the Sortino ratio.
5. 📊 Results and Evaluation: Most LLM agents struggled to outperform a simple buy-and-hold baseline, though some showed potential for higher returns and better risk management; Kimi-K2 and Qwen3-235B-Ins performed best among the tested models.
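The evaluation metrics can all be computed from a series of daily simple returns. The definitions below follow common usage (cumulative return by compounding, maximum drawdown from the equity curve, Sortino ratio as mean excess return over downside deviation); the benchmark's exact conventions, e.g. annualization, may differ:

```python
import math

def evaluate_strategy(daily_returns, target=0.0):
    """Compute (cumulative return, maximum drawdown, Sortino ratio)
    from a list of daily simple returns. `target` is the minimum
    acceptable return used in the Sortino denominator."""
    # equity curve via compounding; track the running peak for drawdown
    equity, peak, max_dd = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        max_dd = min(max_dd, equity / peak - 1.0)  # most negative dip
    cum_return = equity - 1.0
    # Sortino ratio: only below-target returns count toward risk
    excess = [r - target for r in daily_returns]
    downside = [min(e, 0.0) ** 2 for e in excess]
    dd = math.sqrt(sum(downside) / len(downside))
    sortino = (sum(excess) / len(excess)) / dd if dd > 0 else float("inf")
    return cum_return, max_dd, sortino
```

For example, two days of +10% then -5% give a cumulative return of 4.5% and a maximum drawdown of -5%.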


[Figure: StockBench LLM-agent trading workflow]
- Back-trading environment: investment targets are the top 20 DJIA stocks, March-June 2025 (contamination-free)
- Price and fundamental data: opening prices, P/E ratio, market cap, dividend yield, 52-week high/low
- News corpus: top 5 articles from the previous 48 hours, updated daily
- Evaluation metrics: final return, maximum drawdown, Sortino ratio
- Agent workflow: (1) portfolio overview: scan all stocks, recent news, current holdings, and opening prices; (2) stock analysis: select stocks for deeper fundamental analysis; (3) decision generation: increase, decrease, or hold; (4) execution: convert decisions to share quantities, validate, and execute
- Models evaluated: proprietary (GPT-5, Claude-4-Sonnet, OpenAI-O3; most struggle vs. the baseline) and open-weight (Qwen3-235B, Kimi-K2, GLM-4.5; some outperform the baseline)
- Key findings: most LLM agents fail to beat the buy-and-hold baseline; better risk management (lower max drawdown); static QA performance ≠ trading success; reasoning models don't guarantee better trading; performance varies with market conditions
- Ablation studies: news vs. fundamentals, portfolio size impact, market condition effects
- Best performance (Kimi-K2): 1.9% return, -11.8% max drawdown, 0.042 Sortino ratio; baseline (buy-and-hold): 0.4% return, -15.2% max drawdown, 0.0155 Sortino ratio
- Future directions: enhanced architectures, more market scenarios, continuous updates
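The four-step agent workflow can be sketched as a single daily trading step, with a `decide` callable standing in for the LLM's reasoning (steps 1-3) and the rest handling execution (step 4). Position-sizing and validation rules here are illustrative, not the benchmark's exact ones:

```python
def trading_step(portfolio, cash, prices, decide):
    """One simulated trading day. `decide` maps market state to
    {ticker: 'increase' | 'decrease' | 'hold'}; decisions are then
    converted to share quantities, validated against available cash
    and holdings, and executed. All names here are illustrative."""
    decisions = decide(portfolio, cash, prices)   # steps 1-3: LLM reasoning
    for ticker, action in decisions.items():      # step 4: execution
        price = prices[ticker]
        if action == "increase" and cash >= price:
            # spend up to 10% of cash (at least one share), if affordable
            qty = int(cash * 0.1 // price) or 1
            if qty * price <= cash:
                portfolio[ticker] = portfolio.get(ticker, 0) + qty
                cash -= qty * price
        elif action == "decrease" and portfolio.get(ticker, 0) > 0:
            # liquidate the position entirely
            cash += portfolio[ticker] * price
            portfolio[ticker] = 0
        # 'hold' (or an unrecognized action) leaves the position unchanged
    return portfolio, cash
```

Running this step once per trading day over the evaluation window, then feeding the resulting daily portfolio values into the metrics above, reproduces the benchmark's back-trading loop in miniature.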
Q1. What was the main limitation discovered when testing LLM agents in StockBench?
- They couldn't process real-time market data
- Most struggled to outperform a simple buy-and-hold strategy
- They were unable to read financial news

Q2. What unique feature of StockBench sets it apart from previous financial benchmarks?
- It only tests static financial knowledge
- It focuses on single-stock trading only
- It requires continuous decision-making over multiple months in dynamic markets

Q3. During which time period was StockBench's evaluation conducted to ensure no data contamination?
- January to December 2024
- March to June 2025
- January to March 2023