2026-01-19 Papers


Paper 1

The Poisoned Apple Effect: Strategic Manipulation of Mediated Markets via Technology Expansion of AI Agents

Published: 2026-01-16

Link: http://arxiv.org/pdf/2601.11496

1. 📘 Topic and Domain: Strategic manipulation of AI-mediated economic markets through technology expansion, focusing on game theory and market design.
2. 💡 Previous Research and New Ideas: Builds on classical game-theoretic models (bargaining, negotiation, persuasion); introduces the "Poisoned Apple Effect," in which releasing a technology that is never used can still manipulate market regulation.
3. ❓ Problem: Addresses how expanding available AI technologies in regulated markets can be exploited to manipulate regulatory outcomes and market equilibrium.
4. 🛠️ Methods: Used the GLEE dataset to simulate 580,000 strategic decisions across 13 LLMs in three game types (bargaining, negotiation, persuasion), analyzing meta-game equilibria before and after technology expansion.
5. 📊 Results and Evaluation: Found that in roughly 33% of cases, technology expansion produced opposite payoff changes for the two players even though the new technology was never used, demonstrating that strategic manipulation is possible; regulatory metrics worsened in 40% of cases when the market design remained static.
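As a quick sanity check on the numbers above, here is a minimal Python sketch of the paper's fairness metric, Fairness = 1 - 4(p - 0.5)², and of the payoff-reversal condition. The Alice/Bob payoffs are the illustrative figures from the paper's diagram, not newly derived data.

```python
# Sketch of the paper's fairness metric and payoff-reversal check.
# Fairness = 1 - 4*(p - 0.5)**2, where p is one player's share of the total.
# The Alice/Bob numbers mirror the paper's illustrative figures, not new data.

def fairness(p: float) -> float:
    """1.0 at an even split (p = 0.5), falling toward 0.0 as the split skews."""
    return 1.0 - 4.0 * (p - 0.5) ** 2

def payoff_reversal(before: tuple, after: tuple) -> bool:
    """True when the two players' payoffs move in opposite directions."""
    d_alice = after[0] - before[0]
    d_bob = after[1] - before[1]
    return d_alice * d_bob < 0

before = (0.49, 0.50)   # status-quo equilibrium (Market 4)
after = (0.52, 0.46)    # after the regulator switches to Market 8

print(round(fairness(before[0] / sum(before)), 3))   # ~1.0: near-even split
print(payoff_reversal(before, after))                # True: Alice up, Bob down
```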

Figure: "The Poisoned Apple Effect" method workflow

- Setup: GLEE dataset; 580K strategic decisions; 13 LLMs; 1,320 configurations; three game types (bargaining / resource division, negotiation / bilateral trade, persuasion / information transmission).
- Phase 1 (status quo): technologies A-D available; the regulator selects Market 4 at the Nash equilibrium (Alice: 0.49, Bob: 0.50; fairness 1.000).
- Technology release: Alice introduces Model E (the "poisoned apple"); Market 4 fairness drops to 0.976.
- Regulator response: re-optimizes the market design and switches to Market 8 (fairness 0.990).
- Final equilibrium: Model E is not used, yet payoffs reverse: Alice 0.52 (+0.03), Bob 0.46 (-0.04).
- Market parameters: information structure, communication form, game horizon; 8 market configurations.
- Statistical analysis (linear regression): 50,000+ simulated meta-games; 33% show zero-sum payoff shifts, with 1/3 of these occurring without adoption of the new technology; 40% of cases harmed under regulatory inertia; fairness vs. efficiency trade-offs examined.
- Key findings: technology availability ≠ adoption; strategic manipulation is possible; static regulation is vulnerable; dynamic market design is necessary; regulatory arbitrage is a threat.
- Policy implications: market design must be adaptive, monitor technology releases, account for strategic incentives, prevent regulatory capture, and remain robust to manipulation.
- Mathematical framework: Fairness = 1 - 4(p - 0.5)²; Efficiency = Σ discounted payoffs; Nash equilibria via the Lemke-Howson algorithm.
- Experimental design: baseline with N available technologies; expansion adds one new technology; measure changes in payoffs and regulatory metrics.
- Method flow: (1) establish baseline, (2) release technology, (3) regulator responds, (4) measure manipulation.
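The four-step method flow (baseline, release, regulator response, measurement) can be mimicked in a toy meta-game. All payoff tables below are invented for illustration and are not the paper's data: the point is only that when equilibrium payoffs depend on which technologies are available, a release can flip the regulator's choice without the new technology ever being played.

```python
# Toy meta-game illustrating the Poisoned Apple Effect. All payoff tables are
# invented for illustration; they are NOT the paper's data. Equilibrium payoffs
# depend on which technologies are *available* (availability shapes threat
# points), so releasing Model E can flip the regulator's market choice even
# though E is never actually played.

def fairness(alice: float, bob: float) -> float:
    p = alice / (alice + bob)
    return 1.0 - 4.0 * (p - 0.5) ** 2

# equilibrium payoffs (alice, bob) per market, keyed by "is Model E available?"
payoffs = {
    "market4": {False: (0.49, 0.50), True: (0.45, 0.53)},
    "market8": {False: (0.40, 0.55), True: (0.52, 0.46)},
}

def regulator_choice(e_available: bool) -> str:
    """The regulator picks the market with the fairest equilibrium."""
    return max(payoffs, key=lambda m: fairness(*payoffs[m][e_available]))

print(regulator_choice(False))   # market4: baseline choice
print(regulator_choice(True))    # market8: flipped by the unused release
```

With E unavailable the regulator stays in market4 (payoffs 0.49/0.50); once E exists, market8 becomes the fairness-optimal design and payoffs shift to 0.52/0.46, reproducing the reversal pattern from the figure.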
Q1
1. What is the primary mechanism of the 'Poisoned Apple Effect' described in the paper?
Releasing a new AI technology that outperforms all existing models
Releasing a new technology that remains unused but forces regulatory changes benefiting the releaser
Introducing a technology that creates perfect market equilibrium
Q2
2. In the paper's experimental framework, what percentage of cases showed payoff reversals where the new technology remained unused?
Approximately 10%
Approximately 33%
Approximately 50%
Q3
3. Which game type was NOT included in the GLEE dataset analysis?
Auction markets
Bargaining games
Persuasion games

Paper 2

Your Group-Relative Advantage Is Biased

Published: 2026-01-13

Link: http://arxiv.org/pdf/2601.08521

1. 📘 Topic and Domain: Group-based reinforcement learning for training large language models (LLMs) on reasoning tasks, specifically focusing on advantage estimation in Reinforcement Learning from Verifier Rewards (RLVR).
2. 💡 Previous Research and New Ideas: Builds on GRPO (Group Relative Policy Optimization) and its variants; presents the finding that group-relative advantage estimation is inherently biased.
3. ❓ Problem: Addresses the systematic bias in group-relative advantage estimation, which underestimates advantages for hard prompts and overestimates them for easy prompts, leading to imbalanced exploration and exploitation.
4. 🛠️ Methods: Introduced History-Aware Adaptive Difficulty Weighting (HA-DW), which adjusts advantage estimates based on an evolving difficulty anchor and training dynamics to correct biased estimation.
5. 📊 Results and Evaluation: HA-DW consistently improved performance when integrated into GRPO and its variants across five mathematical reasoning benchmarks, even outperforming GRPO run with larger numbers of rollouts.
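For context, here is a minimal sketch of the group-relative (GRPO-style) advantage the paper analyzes. The bias result concerns the expectation of this estimator over group sampling (the paper's Theorem 1), which these toy numbers do not reproduce; the sketch only shows the estimator itself, plus the degenerate all-fail case that is common on hard prompts.

```python
# Minimal sketch of a GRPO-style group-relative advantage on binary verifier
# rewards: A_i = (r_i - mean(r)) / std(r) within a group of G rollouts.
import statistics

def group_relative_advantages(rewards):
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    if sigma == 0:
        # all-correct or all-wrong group: the advantage signal vanishes,
        # a frequent outcome on very hard (all-fail) prompts
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]

hard = [1, 0, 0, 0, 0, 0, 0, 0]   # hard prompt: 1/8 rollouts succeed
easy = [1, 1, 1, 1, 1, 1, 1, 0]   # easy prompt: 7/8 rollouts succeed

print(round(group_relative_advantages(hard)[0], 3))   # 2.646
print(round(group_relative_advantages(easy)[0], 3))   # 0.378
print(group_relative_advantages([0, 0, 0, 0]))        # [0.0, 0.0, 0.0, 0.0]
```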

Figure: "Your Group-Relative Advantage Is Biased" method workflow

- Problem discovery: the group-relative advantage estimator is biased.
- Theoretical analysis (Theorem 1, bias characterization): advantages for hard prompts are underestimated; advantages for easy prompts are overestimated.
- Solution: History-Aware Adaptive Difficulty Weighting (HA-DW), a dynamic reweighting scheme.
- Phase 1 (evolving difficulty anchor): batch observation y_t = K_t / B_t; belief update C_t = (1 - η) C_{t-1} + η y_t; adaptive forgetting η_t = η σ_t.
- Phase 2 (adaptive difficulty reweighting): difficulty diff = p̂_t - C_t; direction D = -sgn(Â) sgn(diff); magnitude M = |diff|; weight Φ_{t,i} = λ_scale exp(D_{t,i} M_t).
- HA-DW enhanced objective: L_{HA-DW}(θ) = (1/G) Σ ψ(π_θ / π_{θ_old}) φ(Â_{t,i}) Φ_{t,i}; corrects the bias by boosting hard prompts and suppressing easy ones.
- Integrations: GRPO + HA-DW (clipped surrogate with reweighting); GSPO + HA-DW (sequence-level); DAPO + HA-DW (token-level).
- Experimental results: consistent improvements across five mathematical benchmarks; stronger gains on hard prompts (+3.4% on MATH Level 4-5); outperforms simply increasing rollouts while using fewer resources.
- Theoretical validation: Theorem 3 (bias mitigation); Lemma 1 (baseline rectification); extended to continuous reward distributions.
- Key innovation: cross-batch history combined with dynamic reweighting.
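The Phase 1 and Phase 2 equations translate almost directly into code. A minimal sketch follows; the hyperparameter values for eta and lam_scale are placeholders of my choosing, not the paper's.

```python
# Direct translation of the figure's Phase 1 / Phase 2 equations. The
# hyperparameters eta and lam_scale are placeholders, not the paper's values.
import math

def update_anchor(c_prev: float, k_t: int, b_t: int, eta: float = 0.1) -> float:
    """Phase 1: y_t = K_t / B_t, then C_t = (1 - eta)*C_{t-1} + eta*y_t."""
    y_t = k_t / b_t                       # observed batch success rate
    return (1 - eta) * c_prev + eta * y_t

def hadw_weight(adv: float, p_hat: float, anchor: float,
                lam_scale: float = 1.0) -> float:
    """Phase 2: Phi = lam_scale * exp(D * M), D = -sgn(adv)*sgn(diff), M = |diff|."""
    diff = p_hat - anchor                 # prompt difficulty relative to anchor
    d = -math.copysign(1.0, adv) * math.copysign(1.0, diff) if adv and diff else 0.0
    return lam_scale * math.exp(d * abs(diff))

anchor = update_anchor(c_prev=0.5, k_t=24, b_t=64)   # batch: 24/64 correct
# hard prompt (p_hat below anchor) with positive advantage: boosted (> 1)
print(hadw_weight(adv=1.0, p_hat=0.2, anchor=anchor) > 1.0)   # True
# easy prompt (p_hat above anchor) with positive advantage: suppressed (< 1)
print(hadw_weight(adv=1.0, p_hat=0.9, anchor=anchor) < 1.0)   # True
```

This is exactly the corrective direction the figure describes: positive advantages on harder-than-anchor prompts are amplified, while the same advantage on an easier-than-anchor prompt is damped.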
Q1
1. What is the fundamental issue discovered in group-based RL according to the paper?
Group-relative advantage estimator has high variance across different model architectures
Group-relative advantage estimator systematically exhibits bias based on prompt difficulty
Group-relative advantage estimator requires too many computational resources
Q2
2. How does HA-DW improve upon existing group-based RL methods?
By completely replacing the group-relative advantage estimation
By using a larger number of rollouts for each prompt
By dynamically adjusting advantage weights based on evolving difficulty anchors
Q3
3. Which of the following best describes how the bias manifests in group-relative advantage estimation?
It overestimates advantages for both easy and hard prompts
It underestimates advantages for hard prompts while overestimating for easy prompts
It provides unbiased estimates for hard prompts but biased estimates for easy ones

Paper 3

Controlled Self-Evolution for Algorithmic Code Optimization

Published: 2026-01-12

Link: http://arxiv.org/pdf/2601.07348

1. 📘 Topic and Domain: The paper focuses on algorithmic code optimization using controlled self-evolution in the domain of code generation with Large Language Models.
2. 💡 Previous Research and New Ideas: Based on previous self-evolution methods that use "generate-verify-refine" cycles, it introduces new ideas of diversified planning initialization, genetic evolution with feedback-guided mechanisms, and hierarchical evolution memory.
3. ❓ Problem: The paper addresses the low exploration efficiency of existing self-evolution methods that fail to discover solutions with superior complexity within limited computational budgets.
4. 🛠️ Methods: The paper implements Controlled Self-Evolution (CSE) with three key components: diversified planning initialization for broad solution space coverage, genetic evolution with feedback-guided mutation and crossover, and hierarchical evolution memory for capturing both inter-task and intra-task experiences.
5. 📊 Results and Evaluation: Testing on EffiBench-X demonstrated that CSE consistently outperformed all baselines across various LLM backbones, achieving higher efficiency from early generations and maintaining continuous improvement throughout evolution.
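To make the three components concrete, here is a schematic sketch of a CSE-style loop. Every operator here is a toy stand-in for the paper's LLM-driven generate-verify-refine machinery: solutions are integers, the verifier's reward is distance to a hidden optimum, and "feedback-guided mutation" nudges a parent in the direction the verifier suggests.

```python
# Schematic CSE-style loop. Every piece is a toy stand-in for the paper's
# LLM-driven machinery: "solutions" are integers, the verifier scores distance
# to a hidden optimum, and feedback-guided mutation nudges a parent in the
# direction the verifier suggests.
import random

OPTIMUM = 42                               # hidden target the verifier knows

def reward(y: int) -> int:                 # stand-in for the reward F(y, x)
    return -abs(y - OPTIMUM)

def feedback(y: int) -> int:               # verifier hint: which way to move
    return (OPTIMUM > y) - (OPTIMUM < y)

def mutate(parent: int) -> int:            # controlled, feedback-guided mutation
    return parent + feedback(parent)

def crossover(a: int, b: int) -> int:      # compositional crossover
    return (a + b) // 2

def evolve(pop_size: int = 8, generations: int = 100, seed: int = 0) -> int:
    random.seed(seed)
    population = [random.randint(0, 100) for _ in range(pop_size)]  # P0
    for _ in range(generations):
        population.sort(key=reward, reverse=True)
        parents = population[: pop_size // 2]              # parent selection
        children = [mutate(p) for p in parents]
        children += [crossover(*random.sample(parents, 2)) for _ in parents]
        # elitism: parents survive, so the best solution never regresses
        population = sorted(parents + children, key=reward, reverse=True)[:pop_size]
    return population[0]

print(evolve(), reward(evolve()))   # 42 0: the loop recovers the optimum
```

Because mutation always moves the elite parent one step toward the optimum and elitism preserves it, the best solution improves monotonically, a simplified version of the "higher efficiency from early generations, continuous improvement" behavior reported on EffiBench-X.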

Figure: Controlled Self-Evolution (CSE) workflow

- Input: problem specification x.
- Diversified planning initialization: generate strategy sketches Z and sample the initial population P₀ (Eq. 1: y₀ᵢ ~ A_θ(y | x, zᵢ)); multiple sketches give broad solution-space coverage, reduce initialization bias, enable parallel exploration, and help avoid local optima.
- Genetic evolution: parent selection, controlled (feedback-guided) mutation, and compositional crossover under reward F(y, x) (Eqs. 3-4, controlled operators); preserves good components and makes targeted, surgical improvements.
- Hierarchical evolution memory: local memory for intra-task experience and global memory for inter-task patterns (Eqs. 5-6, memory operations); stores success/failure lessons via reflect-and-store, guides search via retrieve-and-guide, avoids repeated mistakes, and accelerates convergence.
- Evolutionary loop (T iterations): select → evolve → update memory; the population evolves P₀ → P₁ → P₂ → ... → P_T, yielding the best solution y*.
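The hierarchical memory can likewise be sketched as a two-level store. The task-to-lesson data model below is a hypothetical illustration of the local/global split, not the paper's actual schema.

```python
# Two-level evolution memory in the spirit of CSE's hierarchical memory.
# The task -> lesson data model is a hypothetical illustration only.
from collections import defaultdict

class EvolutionMemory:
    def __init__(self):
        self.local = defaultdict(list)   # intra-task: lessons keyed by task id
        self.global_ = []                # inter-task: patterns reused across tasks

    def reflect(self, task_id: str, lesson: str, generalizes: bool = False):
        """Store a lesson from a verify/refine round; promote reusable ones."""
        self.local[task_id].append(lesson)
        if generalizes:
            self.global_.append(lesson)

    def retrieve(self, task_id: str, k: int = 3) -> list:
        """Guide the next generation: recent task lessons plus global patterns."""
        return self.local[task_id][-k:] + self.global_[-k:]

mem = EvolutionMemory()
mem.reflect("two-sum", "hash-map lookup beats nested loops", generalizes=True)
mem.reflect("two-sum", "watch for duplicate indices")
print(mem.retrieve("two-sum"))     # both local lessons plus the global pattern
print(mem.retrieve("three-sum"))   # a brand-new task still sees global patterns
```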
Q1
1. What is the main limitation that CSE aims to address in existing self-evolution methods?
High computational costs of training
Low exploration efficiency in finding optimal solutions
Inability to generate syntactically correct code
Q2
2. Which component of CSE helps prevent getting trapped in poor solution regions during initialization?
Hierarchical Evolution Memory
Genetic Evolution
Diversified Planning Initialization
Q3
3. When comparing CSE with baseline methods on EffiBench-X, what unique characteristic did CSE demonstrate?
It achieved perfect accuracy on all test cases
It required fewer computational resources
It showed continuous improvement throughout evolution while starting with higher efficiency