2026-01-16 Papers


Paper 1

Urban Socio-Semantic Segmentation with Vision-Language Reasoning

Published: 2026-01-15

Link: http://arxiv.org/pdf/2601.10477

1. 📘 Topic and Domain: Urban socio-semantic segmentation in computer vision, focusing on segmenting socially-defined entities (like schools, parks) from satellite imagery and digital maps.
2. 💡 Previous Research and New Ideas: Builds on vision-language models and semantic segmentation research; introduces the novel ideas of rendering heterogeneous geospatial data into unified map images and of a two-stage reasoning process that mimics human annotation.
3. ❓ Problem: Current segmentation models struggle with socially-defined categories in urban areas, as these entities are defined by social attributes rather than distinct visual appearances.
4. 🛠️ Methods: Developed the SocioReasoner framework, which uses two-stage vision-language reasoning (localization, then refinement) optimized with reinforcement learning and operates on both satellite imagery and digital maps.
5. 📊 Results and Evaluation: Outperformed state-of-the-art baselines across all metrics on the new SocioSeg dataset, demonstrating strong zero-shot generalization in socio-semantic segmentation tasks.

SocioReasoner framework overview (from the paper's figure):

- Input data: satellite image I_s and digital map I_m; a frozen SAM serves as the segmentation model.
- Stage 1 (localization): the VLM generates bounding boxes B = F(I_s, I_m, t_b), and SAM produces a coarse mask M_c = S(I_s, prompt = B).
- Render & reflect: boxes and mask are overlaid onto both modalities, I_s,r = D(I_s, B, M_c) and I_m,r = D(I_m, B, M_c).
- Stage 2 (refinement): the VLM generates boxes and points {B, P} = F(I_s,r, I_m,r, t_p), and SAM produces the final mask M_f = S(I_s, prompt = {B, P}).
- GRPO training: format, accuracy, and length rewards; policy updates L_1(θ) for Stage 1 and L_2(θ) for Stage 2, with KL regularization.
- SocioSeg dataset: hierarchical tasks over socio-names (5,000+), socio-classes (90+), and socio-functions (10+), built from satellite images and digital maps.
- Key innovations: two-stage reasoning that mimics human annotation, digital-map rendering for multi-modal fusion, reinforcement-learning optimization of a non-differentiable workflow, hierarchical socio-semantic tasks, and zero-shot generalization; reported performance is superior to SOTA methods across all metrics.
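The two-stage localize-then-refine loop can be sketched in runnable form. This is a minimal illustration, not the authors' implementation: the `vlm`, `sam`, and `render_overlay` functions below are hypothetical stubs standing in for the paper's vision-language model F, frozen SAM S, and renderer D, and all prompts and shapes are invented for the example.

```python
import numpy as np

def vlm(sat_img, map_img, task_prompt):
    # Stub for the VLM F: returns one bounding box (x0, y0, x1, y1)
    # and one positive point prompt at the image center.
    h, w = sat_img.shape[:2]
    boxes = [(w // 4, h // 4, 3 * w // 4, 3 * h // 4)]
    points = [((w // 2, h // 2), 1)]  # (coords, positive label)
    return boxes, points

def sam(sat_img, boxes, points=None):
    # Stub for the frozen SAM S: fills the union of box regions
    # with 1s as a binary mask (points are ignored in this stub).
    mask = np.zeros(sat_img.shape[:2], dtype=np.uint8)
    for x0, y0, x1, y1 in boxes:
        mask[y0:y1, x0:x1] = 1
    return mask

def render_overlay(img, boxes, mask):
    # Stub for the renderer D: paints the mask region back onto the
    # image so the VLM can "reflect" on its own Stage-1 output.
    out = img.copy()
    out[mask.astype(bool)] = 255
    return out

def socio_reasoner(sat_img, map_img):
    # Stage 1 (localization): boxes B from the VLM, coarse mask Mc from SAM.
    boxes, _ = vlm(sat_img, map_img, task_prompt="locate")
    coarse = sam(sat_img, boxes)
    # Render & reflect: overlay Stage-1 output on both modalities.
    sat_r = render_overlay(sat_img, boxes, coarse)
    map_r = render_overlay(map_img, boxes, coarse)
    # Stage 2 (refinement): boxes and points from the VLM, final mask Mf.
    boxes2, points = vlm(sat_r, map_r, task_prompt="refine")
    return sam(sat_img, boxes2, points)
```

In the paper this loop is non-differentiable (SAM is frozen and the overlay is a rendering step), which is why the VLM policy is trained with GRPO-style reinforcement learning rather than backpropagation through the workflow.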
Q1. What is the key innovation in how SocioReasoner handles multi-modal geospatial data?
- It processes raw POI data and satellite imagery separately
- It unifies diverse geospatial data into a single digital map layer
- It only uses satellite imagery and ignores other data sources

Q2. Why does SocioReasoner use a two-stage reasoning process?
- To reduce computational costs and processing time
- To simulate how humans annotate semantic entities
- To comply with technical limitations of vision-language models

Q3. What distinguishes socio-semantic entities from physical semantic entities in urban areas?
- Socio-semantic entities are larger in physical size
- Socio-semantic entities are more numerous in cities
- Socio-semantic entities are defined by social attributes rather than visual appearances

Paper 2

Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs

Published: 2026-01-13

Link: http://arxiv.org/pdf/2601.08763

1. 📘 Topic and Domain: Reinforcement learning for improving large language models' creative problem-solving abilities, specifically focusing on maintaining solution diversity during RL training.
2. 💡 Previous Research and New Ideas: Based on previous work in RL for LLMs that focused on token-level diversity and entropy bonuses; introduces a novel approach that rewards uniqueness at the solution strategy level rather than just token level.
3. ❓ Problem: Addresses "exploration collapse" in RL-trained LLMs where models converge to a small set of dominant reasoning patterns, limiting their ability to find diverse solutions.
4. 🛠️ Methods: Introduces "Uniqueness-Aware RL" that uses an LLM judge to cluster solution rollouts based on high-level strategies, then reweights policy advantages inversely with cluster size to reward rare but correct solutions.
5. 📊 Results and Evaluation: Achieved consistent pass@k improvements across mathematics, physics, and medical reasoning benchmarks and maintained solution diversity and exploration better than baselines; results were validated with both quantitative metrics and human evaluation of solution strategies.

Uniqueness-Aware RL pipeline (from the paper's figure):

- Generate K rollouts: the policy π_θ samples multiple solutions to a training problem (math, physics, or medical).
- Verifier: assesses quality and assigns r_m,k ∈ {0, 1} to each rollout.
- LLM judge: clusters rollouts by high-level strategy (e.g. factorization: 4 rollouts, quadratic formula: 2, geometric: 1).
- Uniqueness weight: w_m,k = 1 / f_m,k^α, where f_m,k is the rollout's cluster size and α ∈ [0, 1] controls the strength.
- GRPO advantage: z_m,k = (r_m,k − μ_m) / (σ_m + ε), the group-normalized reward advantage.
- Final advantage: advantage_m,k = w_m,k × z_m,k, which rewards rare strategies more when the solution is correct.
- Policy optimization: J(θ) = E[advantage_m,k × log π_θ(p_m,k | m)], updating the policy to favor diverse correct strategies.
- Reported outcomes: improved pass@k coverage at higher sampling budgets, sustained exploration (entropy maintained, mode collapse prevented), higher strategy diversity (cover@n on human solution methods), and better area under the pass@k curve (AUC@K).
- Key innovation: rollout-level strategy uniqueness; instead of token-level diversity, correct solutions that use rare high-level strategies are rewarded, preventing exploration collapse while maintaining solution quality.
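The advantage reweighting at the heart of the method is easy to sketch from the formulas above (w_m,k = 1 / f_m,k^α applied to a group-normalized GRPO advantage). The following is a hypothetical NumPy illustration: the function name and example data are invented, and in the actual method the cluster labels would come from an LLM judge rather than being given.

```python
import numpy as np

def uniqueness_weighted_advantages(rewards, cluster_ids, alpha=0.5, eps=1e-6):
    """Reweight group-normalized (GRPO-style) advantages so that
    correct solutions from rare strategy clusters get larger updates.

    rewards     : verifier scores r_k in {0, 1} for K rollouts
    cluster_ids : strategy cluster label per rollout (from an LLM judge)
    alpha       : uniqueness strength, in [0, 1]
    """
    r = np.asarray(rewards, dtype=float)
    # Group-normalized advantage: z_k = (r_k - mean) / (std + eps)
    z = (r - r.mean()) / (r.std() + eps)
    # Cluster frequency f_k = size of the rollout's strategy cluster
    ids = np.asarray(cluster_ids)
    freq = np.array([(ids == c).sum() for c in ids], dtype=float)
    # Uniqueness weight w_k = 1 / f_k^alpha; final advantage = w_k * z_k
    w = 1.0 / freq ** alpha
    return w * z

# Example mirroring the figure: clusters of size 4, 2, and 1.
rewards = [1, 1, 1, 0, 1, 1, 1]
clusters = ["factorization"] * 4 + ["quadratic"] * 2 + ["geometric"]
adv = uniqueness_weighted_advantages(rewards, clusters, alpha=1.0)
```

With α = 1.0, the single correct "geometric" rollout receives the full normalized advantage, while each correct "factorization" rollout receives a quarter of it, so gradient mass shifts toward the rare strategy without rewarding incorrect answers (their advantage stays negative).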
Q1. What is the main innovation in how this paper approaches diversity in RL training compared to previous methods?
- It focuses on token-level entropy bonuses to increase randomness
- It rewards uniqueness at the solution strategy level using clustering
- It introduces a new pass@k training objective

Q2. In the paper's method, how is the 'uniqueness weight' for each solution calculated?
- Based on the solution's embedding distance from other solutions
- Using an entropy score of the generated tokens
- Inversely proportional to the size of its strategy cluster

Q3. What unexpected benefit did the authors' method demonstrate in the experiments?
- It improved pass@k performance without sacrificing pass@1 accuracy
- It reduced the computational cost of RL training
- It eliminated the need for human verification

Paper 3

Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning

Published: 2026-01-14

Link: http://arxiv.org/pdf/2601.09667

1. 📘 Topic and Domain: The paper introduces Multi-Agent Test-Time Reinforcement Learning (MATTRL) for improving collaborative reasoning among large language models across medicine, math, and education domains.
2. 💡 Previous Research and New Ideas: The paper builds on recent work in multi-agent LLM systems and reinforcement learning for reasoning, proposing a novel framework that injects structured textual experience into multi-agent deliberation at inference time rather than requiring expensive training.
3. ❓ Problem: The paper addresses the challenges of multi-agent reinforcement learning, which is resource-intensive and unstable due to non-stationarity from co-adapting teammates and sparse, high-variance rewards.
4. 🛠️ Methods: MATTRL forms specialized expert teams for multi-turn discussions, uses credit-assignment strategies to build an experience pool from high-value interactions, and injects these experiences at test time to improve collaborative reasoning.
5. 📊 Results and Evaluation: Across medical diagnosis, math problem-solving, and educational tasks, MATTRL improved accuracy by an average of 3.67% over multi-agent baselines and 8.67% over single-agent approaches, with detailed ablation studies validating different credit assignment schemes.

MATTRL framework overview (from the paper's figure):

- Stage I (team formation): a coordinator agent selects an expert team from a specialist pool SP: TEAM ← LLM_Coo(X, SP).
- Stage II (experience-augmented consensus building): multi-round deliberation with experience retrieval, up to R_max rounds.
- Stage III (report synthesis & decision): the coordinator synthesizes the discussion into a final answer: A ← LLM_Coo(X, DR, ER).
- Test-time experience construction: each utterance receives an individual score s_i,t = φ_LLM(u_i,t, H_i,t; Rubric) ∈ [0, 1]; a credit term c_i,t is computed via naive, difference, or Shapley-style assignment; the terminal reward is r_i,t = λ·s_i,t + (1 − λ)·G·w_t·c_i,t with round discount w_t = γ^(R − t).
- Experience pool: high-scoring utterances are stored as textual experience and retrieved with FAISS top-K cosine similarity.
- Domain applications: medicine (multi-disciplinary team, rare-disease diagnosis, Hit@k and MRR metrics, +3.67%), mathematics (expert-level collaborative problem solving, exact-match accuracy, +9%), and education (teaching collaboration, pre/post-test design, +17% learning gains).
- Key features: textual experience injection only (no weight updates), robustness to distribution shift, structured multi-agent collaboration, and multiple credit-assignment strategies; overall gains of +3.67% over multi-agent and +8.67% over single-agent baselines, stable and efficient with no training required.
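The reward-shaping step that decides which utterances enter the experience pool can be illustrated directly from the formula r_i,t = λ·s_i,t + (1 − λ)·G·w_t·c_i,t with w_t = γ^(R − t). The sketch below is a hypothetical Python rendering of that equation; the function name, default hyperparameters, and example scores are invented, and the credit terms c_i,t would come from one of the paper's assignment schemes (naive, difference, or Shapley-style).

```python
def terminal_reward(scores, credits, lam=0.5, G=1.0, gamma=0.9):
    """Per-utterance reward for one agent across R discussion rounds:
        r_{i,t} = lam * s_{i,t} + (1 - lam) * G * w_t * c_{i,t},
    with round discount w_t = gamma ** (R - t), so contributions in
    later rounds (closer to the final decision) are weighted higher.

    scores  : individual quality scores s_{i,t} in [0, 1], one per round
    credits : credit-assignment terms c_{i,t}, one per round
    G       : terminal group outcome (e.g. 1 if the team answer is correct)
    """
    R = len(scores)
    rewards = []
    for t, (s, c) in enumerate(zip(scores, credits), start=1):
        w_t = gamma ** (R - t)
        rewards.append(lam * s + (1 - lam) * G * w_t * c)
    return rewards

# Example: three rounds; the last round gets w_t = 1 (no discount),
# so rewards[-1] = 0.5*0.9 + 0.5*1.0*1.0*0.7 = 0.80.
rewards = terminal_reward([0.8, 0.6, 0.9], [0.5, 0.2, 0.7])
```

Utterances with high r_i,t would then be distilled into textual experience entries and indexed for retrieval; note that this whole loop adjusts only the experience pool, never the model weights.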
Q1. What is the main innovation of MATTRL compared to traditional multi-agent reinforcement learning approaches?
- It uses more advanced neural network architectures
- It injects structured textual experience at inference time without model updates
- It requires less computing power by using smaller language models

Q2. In the educational task experiment, what role did GPT-4o play?
- The teacher providing instructions
- The coordinator managing multiple agents
- The student taking pre-test and post-test

Q3. According to the experimental results, what was the most effective credit assignment strategy for experience construction?
- Shapley-style approximations
- Difference Rewards
- Naive averaging