2025-11-25 Papers

Paper 1

AutoEnv: Automated Environments for Measuring Cross-Environment Agent Learning

Published: 2025-11-24

Link: http://arxiv.org/pdf/2511.19304

1. 📘 Topic and Domain: The paper focuses on automated environment generation and cross-environment agent learning evaluation in artificial intelligence, specifically developing a framework called AutoEnv for creating and measuring how well AI agents learn across different environments.
2. 💡 Previous Research and New Ideas: Building on prior work in single-environment agent learning and human-designed environments, it introduces two new ideas: AutoEnv, an automated environment-generation framework, and a formal component-centric learning process with Selection, Optimization, and Evaluation stages.
3. ❓ Problem: The paper addresses the lack of diverse, controllable environments for testing AI agents' cross-environment learning abilities and the absence of a unified way to represent how agents learn across different environments.
4. 🛠️ Methods: The authors developed AutoEnv to automatically generate environments by treating them as factorizable distributions over transitions, observations, and rewards, and created AutoEnv-36 (a dataset of 36 environments with 358 validated levels) to test eight different learning methods.
5. 📊 Results and Evaluation: Seven language models achieved only 12-49% normalized reward on AutoEnv-36; the benefit of any single fixed learning method diminished as environment diversity increased, while environment-adaptive method selection improved performance but showed diminishing returns as the method space expanded.
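The eight learning methods come from crossing two selection strategies, two optimization targets, and two learnable agent components. A minimal sketch of that 2 × 2 × 2 grid, assuming the axis labels from the paper's overview (the dict layout is an illustrative assumption, not the paper's actual API):

```python
from itertools import product

# Axis labels as listed in the paper's overview figure; the dict layout
# below is an illustrative assumption, not the paper's actual interface.
SELECTION = ("best", "pareto")              # how candidate components are kept
OPTIMIZATION = ("dynamics", "instruction")  # what the optimizer targets
COMPONENT = ("prompt", "agent_code")        # which agent component is learned

def enumerate_methods():
    """Instantiate all 2 x 2 x 2 = 8 learning-method combinations."""
    return [
        {"selection": s, "optimization": o, "component": c}
        for s, o, c in product(SELECTION, OPTIMIZATION, COMPONENT)
    ]

for m in enumerate_methods():
    print(m["selection"], m["optimization"], m["component"])
```

Enumerating the grid this way makes clear why "environment-adaptive selection" is a search over a small, discrete method space, and why its returns diminish as that space grows.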

Overview figure: AutoEnv framework (details recovered from the diagram)

- Environment generation pipeline: environment theme → DSL design (YAML) → code synthesis (BaseEnv/ObsEnv/SkinEnv) → self-repair loop → three-stage verification (execution, level generation, reliability) → AutoEnv-36.
- Environment formulation: E = (S, A, T, R, Ω, τ), with a three-layer abstraction BaseEnv → ObsEnv → SkinEnv.
- AutoEnv-36 statistics: 36 environments, 358 levels; binary/accumulative rewards 50%/50%; full/partial observability 42%/58%; generation cost $4.12 per environment; 90% execution success rate.
- Component-centric learning: 2 selection strategies (best/Pareto) × 2 optimization targets (dynamics/instruction) × 2 learned components (prompt/agent code) = 8 instantiated methods; reward-based evaluation; a learning upper bound is defined.
- Evaluation: 7 language models reach only 12-49% normalized reward; performance is analyzed by reward type and observation semantics.
- Performance gap: best single method 42.40% vs. a learning upper bound of 47.75%, leaving 5.35 points of headroom.
- Key insights: fixed learning methods break down as environment diversity increases; environment-adaptive selection substantially improves performance but shows diminishing returns; heterogeneous environments provide diverse learning-signal characteristics; future direction: automatic learning-strategy design for cross-environment generalization.
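The three-layer abstraction (BaseEnv → ObsEnv → SkinEnv) separates dynamics, observation, and surface theme. A minimal sketch of that layering, assuming a toy corridor environment; the interface and the "frozen lake" skin are hypothetical, not the paper's generated code:

```python
class BaseEnv:
    """Core dynamics layer: states S, actions A, transition T, reward R,
    horizon tau. (Hypothetical minimal interface for illustration.)"""
    def __init__(self, size=5, horizon=10):
        self.size, self.horizon = size, horizon
        self.state, self.t = 0, 0

    def step(self, action):
        # T(s' | s, a): move along a corridor, clipped to [0, size-1]
        self.state = max(0, min(self.size - 1, self.state + action))
        self.t += 1
        # R(s, a): binary reward for reaching the last tile
        reward = 1.0 if self.state == self.size - 1 else 0.0
        done = self.t >= self.horizon or reward > 0
        return self.state, reward, done

class ObsEnv(BaseEnv):
    """Observation layer: emits Omega(o | s); could hide state for
    partial observability."""
    def observe(self):
        return {"position": self.state}  # fully observable here

class SkinEnv(ObsEnv):
    """Skin layer: re-themes observations without changing dynamics."""
    def observe(self):
        obs = super().observe()
        return f"You stand at tile {obs['position']} of a frozen lake."
```

Because the skin only rewrites observations, two SkinEnvs over the same BaseEnv share dynamics but look entirely different to an agent, which is what lets AutoEnv vary observation semantics independently of transitions and rewards.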
Q1
1. What surprising finding did the researchers discover about environments with inverse semantics in AutoEnv-36?
They were too difficult for agents to handle
They yielded higher scores than aligned semantic environments
They required more computational resources to generate
Q2
2. What was the average cost per environment generation using AutoEnv?
$4.12
$14.20
$40.12
Q3
3. How does a fixed learning method's performance change as environment diversity increases?
It stays constant regardless of environment diversity
It improves with more diverse environments
Its benefit quickly diminishes with more environments

Paper 2

DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation

Published: 2025-11-24

Link: http://arxiv.org/pdf/2511.19365

1. 📘 Topic and Domain: The paper presents DeCo, a novel frequency-decoupled pixel diffusion framework for end-to-end image generation in computer vision and deep learning.
2. 💡 Previous Research and New Ideas: Building on previous pixel-diffusion and latent-diffusion models, it proposes an architecture that separates high- and low-frequency components during image generation, unlike traditional methods that model both jointly.
3. ❓ Problem: The paper addresses the inefficiency of existing pixel diffusion models that struggle to jointly model complex high-frequency signals and low-frequency semantics within a single diffusion transformer.
4. 🛠️ Methods: The method pairs a Diffusion Transformer (DiT) for low-frequency semantics with a lightweight pixel decoder for high-frequency details, and trains with a frequency-aware flow-matching loss inspired by JPEG compression.
5. 📊 Results and Evaluation: DeCo achieves FID scores of 1.62 (256×256) and 2.22 (512×512) on ImageNet and a leading GenEval score of 0.86, outperforming existing pixel-diffusion methods while matching two-stage latent-diffusion approaches.
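The two-branch split can be sketched shape-wise in NumPy: the DiT branch sees only a heavily downsampled view, while the decoder works at full resolution conditioned on the semantic code. The pooling factor, shapes, and the stand-in "networks" below are illustrative assumptions, not the paper's DiT or decoder:

```python
import numpy as np

rng = np.random.default_rng(0)

def avg_pool(x, p):
    """Patchify by averaging p x p blocks (stand-in for patch embedding)."""
    h, w = x.shape
    return x.reshape(h // p, p, w // p, p).mean(axis=(1, 3))

def dit_branch(x_low, t):
    """Placeholder for the DiT: maps the low-res view to a semantic code c."""
    return np.tanh(x_low + t)            # real model: transformer over patches

def pixel_decoder(x_t, t, c, p=16):
    """Placeholder for the lightweight decoder: predicts a per-pixel
    velocity, conditioned on the upsampled semantic code (cf. AdaLN)."""
    c_up = np.kron(c, np.ones((p, p)))   # broadcast c back to full resolution
    return x_t * 0.1 + c_up              # real model: attention-free MLP stack

x_t = rng.standard_normal((64, 64))      # noisy image at timestep t
t = 0.3
x_low = avg_pool(x_t, 16)                # 4 x 4 low-res view for the DiT
c = dit_branch(x_low, t)
v_pred = pixel_decoder(x_t, t, c)
print(x_low.shape, v_pred.shape)         # (4, 4) (64, 64)
```

The point of the split is visible in the shapes: the expensive transformer only processes the 4×4 semantic grid, while the cheap decoder handles the full 64×64 pixel field.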

Overview figure: DeCo framework (details recovered from the diagram)

- Pipeline: original image x₀ (512×512) → add noise x_t = (1−t)x₀ + t·x₁ → multi-scale split into a high-res view x_t (original) and a low-res view x̄_t (patch size 16).
- Diffusion Transformer (DiT) models low-frequency semantics: c = θ_DiT(x̄_t, t, y).
- Lightweight pixel decoder recovers high-frequency details: v_θ = θ_Dec(x_t, t, c); decoder components: dense query (patch size 1), AdaLN modulation, 3 linear layers, attention-free, 8.5M parameters.
- Losses: standard flow matching L_FM = E[‖v_θ − v_t‖²] (pixel-level supervision) and frequency-aware flow matching L_FreqFM = E[w‖V_θ − V_t‖²], where V is obtained via RGB → YCbCr → DCT and the weights w come from JPEG quality tables.
- Key advantages: frequency decoupling (DiT focuses on semantics, decoder handles details); ~10× faster convergence with reduced computational overhead; FID 1.62 (256×256) and 2.22 (512×512) on ImageNet.
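The frequency-aware flow-matching loss compares predicted and target velocities in DCT space with per-frequency weights. A minimal NumPy sketch, assuming an orthonormal DCT-II and a simple low-frequency-favoring weight map as a stand-in for the paper's JPEG-derived weights:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (the transform behind JPEG)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0] /= np.sqrt(2.0)
    return C

def freq_aware_fm_loss(v_pred, v_target, weights):
    """L_FreqFM = E[ w * ||V_pred - V_target||^2 ] with V = DCT(v).
    `weights` is illustrative; the paper derives it from JPEG quality
    tables after an RGB -> YCbCr conversion."""
    n = v_pred.shape[-1]
    C = dct_matrix(n)
    V_pred = C @ v_pred @ C.T        # 2-D DCT: transform rows, then columns
    V_target = C @ v_target @ C.T
    return float(np.mean(weights * (V_pred - V_target) ** 2))

# Flow-matching setup: x_t = (1 - t) x0 + t x1, target velocity v = x1 - x0
rng = np.random.default_rng(0)
x0, x1 = rng.standard_normal((8, 8)), rng.standard_normal((8, 8))
t = 0.3
x_t = (1 - t) * x0 + t * x1
v_target = x1 - x0

# Emphasize low frequencies (top-left of the DCT grid), as JPEG does
fu, fv = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
weights = 1.0 / (1.0 + fu + fv)

loss = freq_aware_fm_loss(v_target + 0.1, v_target, weights)
```

Because the DCT is orthonormal, a perfect prediction still gives zero loss; the weights only re-balance *where* in the spectrum errors are penalized, which is how the loss steers the decoder toward perceptually relevant bands.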
Q1
1. What is the main innovation of DeCo compared to traditional pixel diffusion models?
It uses a larger transformer architecture
It separates high and low frequency components during generation
It completely eliminates the need for neural networks
Q2
2. The frequency-aware flow-matching loss in DeCo is inspired by which technology?
MP3 audio compression
PNG image format
JPEG compression
Q3
3. What practical advantage does DeCo demonstrate in terms of training efficiency?
It achieves the same FID score 10x faster than baseline
It requires 50% less memory during training
It can only be trained on small datasets

Paper 3

General Agentic Memory Via Deep Research

Published: 2025-11-23

Link: http://arxiv.org/pdf/2511.18423

1. 📘 Topic and Domain: The paper presents a novel memory framework called General Agentic Memory (GAM) in the domain of artificial intelligence, specifically focusing on memory systems for large language models.
2. 💡 Previous Research and New Ideas: The paper frames previous static memory systems as Ahead-of-Time (AOT) compilation and proposes a Just-in-Time (JIT) alternative that constructs optimized contexts at runtime while keeping the offline memory simple.
3. ❓ Problem: The paper aims to solve the limitations of static memory systems which suffer from information loss and lack of flexibility in adapting to unforeseen requests.
4. 🛠️ Methods: The paper implements a dual-agent framework consisting of a Memorizer that creates lightweight memory while storing complete history in a page-store, and a Researcher that performs deep research to retrieve and integrate relevant information for requests.
5. 📊 Results and Evaluation: The system achieved substantial improvements over existing memory methods across multiple benchmarks including LoCoMo, HotpotQA, RULER, and NarrativeQA, with particularly strong performance on multi-step retrieval and reasoning tasks.

Overview figure: GAM workflow (details recovered from the diagram)

- Offline stage: input history (sess₁, sess₂, …, sessₙ) → Memorizer (memorizing, then paging into headers and pages) → memory page store.
- Online stage: request → Researcher runs a deep-research loop of planning, searching, reflection, and integration, using vector search, BM25 retrieval, and page-ID access, to produce an optimized context.
- JIT vs. AOT paradigm: traditional AOT memory does heavy offline computation and loses information; GAM's JIT approach optimizes at runtime with lossless retrieval.
- Key advantages: high fidelity, adaptability, and domain generalizability.
- End-to-end optimization via reinforcement learning: a Memorizer policy (θₘ) learns memory construction, a Researcher policy (θᵣ) learns search strategy, and a reward function (Γ) scores task-completion quality.
- Test-time scalability: increasing reflection depth and the number of retrieved pages improves performance.
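The Memorizer/Researcher split above can be sketched with a toy page store: the Memorizer keeps the full text losslessly and only a light header per page, and the Researcher searches headers at request time. The keyword-overlap scoring below is a stand-in for the paper's vector search and BM25 tools, and all class names mirror the figure rather than any released code:

```python
from dataclasses import dataclass, field

@dataclass
class PageStore:
    """Lossless store of the full history, split into pages with headers."""
    pages: dict = field(default_factory=dict)
    headers: dict = field(default_factory=dict)

class Memorizer:
    """Offline agent: writes each session to a page, keeps a light header.
    (Sketch only; the paper's Memorizer is a learned LLM policy.)"""
    def __init__(self, store):
        self.store = store

    def memorize(self, page_id, text):
        self.store.pages[page_id] = text
        # A real header would be an LLM summary; a keyword set stands in.
        self.store.headers[page_id] = set(text.lower().split())

class Researcher:
    """Online agent: searches the store and integrates a request context."""
    def __init__(self, store):
        self.store = store

    def search(self, query, top_k=2):
        # Stand-in for vector search / BM25: rank pages by keyword overlap.
        terms = set(query.lower().split())
        ranked = sorted(
            self.store.headers,
            key=lambda pid: len(terms & self.store.headers[pid]),
            reverse=True,
        )
        return ranked[:top_k]

    def build_context(self, query):
        # "Integration": concatenate retrieved pages into the JIT context.
        return "\n".join(self.store.pages[pid] for pid in self.search(query))
```

The JIT property lives in `build_context`: nothing is summarized away offline, so any detail in a stored page can still surface at request time if the search reaches it.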
Q1
1. What is the key innovation in GAM's approach compared to traditional memory systems?
It uses static memory compilation only
It employs Just-in-Time compilation with runtime optimization
It completely eliminates the need for memory storage
Q2
2. Which component of GAM is most sensitive to the size/capacity of the underlying LLM?
The Memorizer module
The Researcher module
The Page-store component
Q3
3. On which type of tasks did GAM show particularly strong performance?
Simple single-hop retrieval tasks
Basic text classification tasks
Multi-step retrieval and reasoning tasks