1. 📘 Topic and Domain: The paper addresses AI agent reliability evaluation, proposing a multi-dimensional framework for measuring how consistently, robustly, predictably, and safely AI agents perform beyond simple accuracy metrics.
2. 💡 Previous Research and New Ideas: Building on safety-critical engineering practices from aviation, nuclear power, and automotive domains, the paper introduces a novel decomposition of agent reliability into four dimensions with 12 concrete metrics, moving beyond traditional single-score accuracy evaluations.
3. ❓ Problem: Current AI agent evaluations rely primarily on mean task success rates, which obscure critical operational flaws like inconsistent behavior across runs, sensitivity to input variations, unpredictable failures, and unbounded error severity.
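A minimal hypothetical illustration of why mean success rates obscure inconsistency: the two agents below have identical average accuracy, yet one solves the same tasks deterministically while the other flips between runs. The data and metric names (`mean_success`, `all_runs_pass`) are invented for this sketch, not taken from the paper.

```python
from statistics import mean

# Hypothetical per-task outcomes (1 = success) over 5 independent runs.
# Both agents succeed on 60% of run-task pairs overall.
agent_a = [[1] * 5, [1] * 5, [1] * 5, [0] * 5, [0] * 5]   # consistent
agent_b = [[1, 0, 1, 1, 0], [0, 1, 1, 0, 1], [1, 1, 0, 1, 0],
           [0, 1, 1, 1, 0], [1, 0, 0, 1, 1]]              # inconsistent

def mean_success(runs):
    """Overall success rate across all run-task pairs."""
    return mean(r for task in runs for r in task)

def all_runs_pass(runs):
    """Fraction of tasks solved in every run (a consistency-style metric)."""
    return mean(all(task) for task in runs)

print(mean_success(agent_a), mean_success(agent_b))   # both 0.6
print(all_runs_pass(agent_a), all_runs_pass(agent_b)) # very different
```

Under the mean-accuracy lens the agents are indistinguishable; only a multi-run consistency metric separates them.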
4. 🛠️ Methods: The authors evaluate 14 agentic models across two benchmarks (GAIA and τ-bench) using multi-run protocols, prompt perturbations, fault injection, environment modifications, and LLM-based safety analysis to compute metrics across consistency, robustness, predictability, and safety dimensions.
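A hedged sketch of one ingredient of such a protocol: measuring robustness as the drop in success rate under semantically equivalent prompt perturbations. `run_agent`, the perturbation list, and the placeholder success behaviour are all assumptions for illustration, not the paper's actual harness.

```python
import random

def run_agent(prompt: str, seed: int) -> bool:
    """Hypothetical stand-in for a real agent call; deterministic per (prompt, seed)."""
    rng = random.Random(len(prompt) * 1000 + seed)
    return rng.random() < 0.7  # placeholder success behaviour

# Benign, meaning-preserving rewrites of the task prompt (baseline first).
PERTURBATIONS = [
    lambda p: p,                           # identity (baseline)
    lambda p: p.lower(),                   # casing change
    lambda p: p + " Please be concise.",   # harmless suffix
]

def success_rate(prompt, perturb, n_runs=20):
    """Multi-run estimate of success probability for one prompt variant."""
    return sum(run_agent(perturb(prompt), s) for s in range(n_runs)) / n_runs

task = "Book a flight to Berlin"
base = success_rate(task, PERTURBATIONS[0])
worst = min(success_rate(task, f) for f in PERTURBATIONS)
robustness_gap = base - worst  # smaller gap = more robust agent
```

The same multi-run pattern extends to the other interventions the paper describes (fault injection, environment modifications) by swapping the perturbation functions for the corresponding manipulations.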
5. 📊 Results and Evaluation: Despite steady accuracy gains over 18 months of capability improvements, reliability lags well behind: consistency and discrimination emerge as the weakest areas and the most urgent targets for research.