2025-07-31 Papers

Paper 1

ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents

Published: 2025-07-30

Link: http://arxiv.org/pdf/2507.22827

1. 📘 Topic and Domain: Automating the conversion of UI designs into front-end code using vision-language models and multi-agent systems.
2. 💡 Previous Research and New Ideas: Builds on prior vision-language model and UI-to-code generation research, proposing a modular multi-agent framework that decomposes the task into grounding, planning, and generation stages.
3. ❓ Problem: Addressing the limitations of existing text-based and vision-based code generation systems that struggle with capturing spatial layouts and visual design intent in UI development.
4. 🛠️ Methods: Implements a three-stage pipeline with a grounding agent for UI component detection, a planning agent for hierarchical layout construction, and a generation agent for HTML/CSS code synthesis, plus dual-stage post-training of vision-language models.
5. 📊 Results and Evaluation: Achieves state-of-the-art performance across five metrics (block match, text similarity, position alignment, color consistency, and CLIP similarity), outperforming existing open-source models and competing with proprietary systems.
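The three-stage pipeline can be sketched as a chain of agent functions. This is an illustrative sketch only: the class and function names are hypothetical, and the grounding step stands in for the paper's actual vision-language model call.

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    """A detected UI component: semantic label plus bounding box."""
    label: str                      # e.g. "header", "sidebar", "content"
    bbox: tuple                     # (x, y, width, height) in pixels
    children: list = field(default_factory=list)

def grounding_agent(screenshot) -> list:
    """Detect major UI regions and label them (stand-in for the VLM call)."""
    # In the paper this is a vision-language model; here we return a fixed example.
    return [Component("header", (0, 0, 1280, 80)),
            Component("sidebar", (0, 80, 240, 720)),
            Component("content", (240, 80, 1040, 720))]

def planning_agent(components: list) -> Component:
    """Organize detected components into a hierarchical layout tree."""
    root = Component("page", (0, 0, 1280, 800))
    # A real planner would apply spatial heuristics; we simply order by position.
    root.children = sorted(components, key=lambda c: (c.bbox[1], c.bbox[0]))
    return root

def generation_agent(tree: Component) -> str:
    """Emit HTML with grid placeholders for each node of the layout tree."""
    inner = "\n".join(f'  <div class="{c.label}"><!-- placeholder --></div>'
                      for c in tree.children)
    return f'<div class="page" style="display: grid;">\n{inner}\n</div>'

html = generation_agent(planning_agent(grounding_agent(screenshot=None)))
```

The split mirrors the paper's design: perception errors stay in the grounding stage, layout errors in planning, and code-style issues in generation, so each agent can be improved independently.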

[Diagram] ScreenCoder: Modular Multimodal Agents Framework
- Input: UI screenshots and design sketches.
- Grounding Agent: vision-language model for component detection and semantic labeling; outputs bounding boxes with labels (sidebar, header, navigation, content).
- Planning Agent: builds a hierarchical layout tree using CSS Grid-based design and spatial heuristics; outputs layout tree T with grid configurations.
- Generation Agent: adaptive prompt-based HTML/CSS synthesis with interactive design support; outputs HTML/CSS code with placeholders.
- Placeholder mapping: UI element detection (UIED) plus image restoration.
- Scalable data engine for VLM enhancement: generates 50K UI-code pairs from diverse domains and real-world sources; cold-start supervised fine-tuning of Qwen2.5-VL (autoregressive training), then reinforcement learning with GRPO optimization and a multi-reward system for visual-semantic alignment.
- Result: enhanced VLM with improved UI understanding, better code quality, and state-of-the-art performance.
- Evaluation metrics: block match, text similarity, position, color, CLIP.
Q1
1. What is the main limitation of existing text-based UI-to-code generation systems that ScreenCoder aims to address?
They are too slow in processing user inputs
They require extremely long and verbose prompts to capture spatial relationships
They can only work with simple layouts
Q2
2. Which stage in ScreenCoder's pipeline is responsible for organizing UI components into a hierarchical structure?
The Grounding Agent
The Planning Agent
The Generation Agent
Q3
3. How does ScreenCoder improve the training of vision-language models?
By using transfer learning from existing code repositories
By creating synthetic UI designs randomly
By functioning as a data engine to generate large-scale image-code pairs

Paper 2

VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning

Published: 2025-07-30

Link: http://arxiv.org/pdf/2507.22607

1. 📘 Topic and Domain: The paper focuses on multimodal reasoning in AI, specifically developing a reinforcement learning approach to improve visual-language models' reasoning capabilities across diverse tasks.
2. 💡 Previous Research and New Ideas: It builds on previous reinforcement learning work in language models and extends it to multimodal reasoning, proposing a novel progressive curriculum learning framework with dynamic length rewards.
3. ❓ Problem: The paper aims to solve the challenge of unstable performance of multimodal models across different domains and difficulty levels of reasoning tasks.
4. 🛠️ Methods: The authors developed PCuRL (Progressive Curriculum Reinforcement Learning) framework with two key components: online difficulty soft weighting for curriculum learning and dynamic length reward mechanism to adapt reasoning path lengths.
5. 📊 Results and Evaluation: VL-Cogito achieved state-of-the-art or highly competitive performance across multiple multimodal benchmarks spanning mathematics, science, logic and general understanding domains, demonstrating consistent improvements without requiring cold-start initialization.
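The two PCuRL components lend themselves to a quick sketch. The shapes below (a sine-plus-constant weight peaking at accuracy ≈ 0.5, and a cosine-shaped length reward peaking at the target length) follow the paper's description, but they are plausible interpretations rather than the paper's exact formulas.

```python
import math

def odsw_weight(acc: float) -> float:
    """Illustrative online difficulty soft weight for a rollout group with
    empirical accuracy `acc`; peaks at acc == 0.5 (maximal learnability)."""
    base = 0.1                               # constant floor so no sample is ignored
    return base + (1 - base) * math.sin(math.pi * acc)

def dynamic_length_reward(length: int, target: float) -> float:
    """Illustrative DyLR: highest when the response length matches the
    average length of correct responses (the target)."""
    if target <= 0:
        return 0.0
    ratio = min(length / target, 2.0)        # cap the deviation at 2x target
    return 0.5 * (1 + math.cos(math.pi * (ratio - 1.0)))  # peaks at ratio == 1
```

The soft weighting keeps gradient signal concentrated on problems the model solves about half the time, while the length reward discourages both truncated and padded reasoning chains.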

[Diagram] VL-Cogito: Progressive Curriculum Reinforcement Learning (PCuRL)
- Data curation: 23 datasets across 6 task categories, converted to open-ended format, with difficulty sampling (>50% accuracy filter).
- Easy stage (100 steps): ODSW-easy weighting, focus on simple tasks, accuracy + format rewards, stable foundation building.
- Medium stage (100 steps): ODSW-medium weighting, moderate difficulty, accuracy + format rewards, progressive enhancement.
- Hard stage (~200 steps, 1 epoch): ODSW-hard weighting, complex tasks, accuracy + format + dynamic length rewards, deep reasoning capability.
- Online Difficulty Soft Weighting (ODSW): dynamic weight adjustment based on rollout accuracy via a sine-plus-constant function F(Acc); emphasizes optimal learnability (Acc ≈ 0.5) with smooth transitions between difficulty levels.
- Dynamic Length Reward (DyLR): adapts reasoning length to task complexity; the target length is the average length of correct responses, and a cosine function computes the length reward, balancing efficiency with reasoning depth.
- Foundation: Group Relative Policy Optimization (GRPO) with advantage estimation within response groups and a clipped surrogate objective.
- Output: the VL-Cogito model with enhanced multimodal reasoning capabilities.
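The GRPO foundation reduces to two pieces: advantages normalized within each rollout group, and a PPO-style clipped surrogate term. A minimal sketch of both (standard GRPO form, not code from the paper):

```python
import statistics

def group_relative_advantages(rewards: list) -> list:
    """GRPO-style advantage: normalize each reward against its rollout group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:                     # all rollouts scored equally: no signal
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]

def clipped_surrogate(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Clipped surrogate term for one response, given the policy probability
    ratio (new / old) and its group-relative advantage."""
    clipped_ratio = max(min(ratio, 1 + eps), 1 - eps)
    return min(ratio * advantage, clipped_ratio * advantage)
```

Because the advantage is computed within a group of responses to the same prompt, GRPO needs no separate value network, which is part of why it pairs well with curriculum-style reweighting.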
Q1
1. What is the main innovation of the PCuRL framework that distinguishes it from previous approaches?
Its use of binary weighting for task difficulty
Its dynamic length reward mechanism that adapts to task complexity
Its requirement for cold-start initialization
Q2
2. Why did the authors introduce the dynamic length reward only in the hard stage of curriculum learning?
To save computational resources in earlier stages
Because it wasn't necessary for simpler tasks
To allow free exploration in early stages while strengthening complex reasoning capabilities later
Q3
3. What unique feature of VL-Cogito's training process sets it apart from competing models like R1-VL and OpenVLThinker?
It achieves superior performance without requiring cold-start warm-up
It uses a much larger training dataset
It requires more computational resources

Paper 3

ChemDFM-R: An Chemical Reasoner LLM Enhanced with Atomized Chemical Knowledge

Published: 2025-07-29

Link: http://arxiv.org/pdf/2507.21990

1. 📘 Topic and Domain: The paper presents ChemDFM-R, a chemical reasoner large language model enhanced with atomized chemical knowledge, in the domain of chemistry and artificial intelligence.
2. 💡 Previous Research and New Ideas: Based on previous work in general domain reasoning LLMs and chemical LLMs, it proposes incorporating atomized functional group knowledge and developing chemical-specific reasoning capabilities through a novel training pipeline.
3. ❓ Problem: The paper addresses the limitations of current LLMs in chemistry: shallow domain understanding and limited reasoning capabilities that hinder reliable practical applications.
4. 🛠️ Methods: The method involves: 1) Constructing a functional group-centric pretraining corpus (ChemFG), 2) Domain pretraining and instruction tuning, 3) Mix-sourced distillation combining expert knowledge with general reasoning skills, 4) Domain-specific reinforcement learning.
5. 📊 Results and Evaluation: ChemDFM-R achieved state-of-the-art performance on chemical benchmarks while providing interpretable rationales. It demonstrated strong chemical reasoning capabilities and enabled reliable human-AI collaboration in chemistry research scenarios.
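The mix-sourced distillation ratio (70% direct, 22% pseudo CoT, 8% teacher-distilled) can be illustrated with a simple weighted sampler. `sample_source` is hypothetical: the paper builds a fixed training corpus with these proportions rather than sampling online.

```python
import random

# Mixing ratio reported for ChemDFM-R's mix-sourced distillation corpus.
MIX = [("direct", 0.70), ("pseudo_cot", 0.22), ("teacher", 0.08)]

def sample_source(rng: random.Random) -> str:
    """Pick which corpus a training example is drawn from, by the 70/22/8 mix."""
    r = rng.random()
    cumulative = 0.0
    for name, weight in MIX:
        cumulative += weight
        if r < cumulative:
            return name
    return MIX[-1][0]                # guard against floating-point round-off

rng = random.Random(0)
counts = {name: 0 for name, _ in MIX}
for _ in range(10_000):
    counts[sample_source(rng)] += 1
```

The heavy weighting toward direct examples keeps domain knowledge dominant, while the small teacher slice (DeepSeek-R1, o3-mini) injects general reasoning style without overwhelming the chemical signal.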

[Diagram] ChemDFM-R: Chemical Reasoner LLM with Atomized Knowledge
- Phase 1, ChemFG dataset construction: raw data collection (101B tokens) from literature (12M papers), molecules (30M compounds), and reactions (7M reactions); a functional group identification toolkit covering 241 functional groups.
- Phase 2, atomized knowledge enhancement: domain pre-training on ChemFG (base: Qwen2.5-14B), then instruction tuning on chemical + general tasks (1:2 ratio), yielding the ChemDFM-I model.
- Phase 3, chemical rationale learning: mix-sourced distillation (70% direct + 22% pseudo CoT + 8% teacher; teachers: DeepSeek-R1, o3-mini), followed by reinforcement learning with DAPO using format + accuracy rewards, yielding ChemDFM-R.
- Evaluation results: SciKnowEval 0.70 and ChemEval 0.78, outperforming GPT-4o and DeepSeek-R1; rationale analysis finds 67% high-quality rationales, 23% with minor flaws, and 10% with substantial issues; human-AI collaboration gains enhanced reliability, transparent reasoning, and error detection capability.
- Technical details: literature corpus of 79B tokens; 30M molecules with properties; 7M reactions with structural changes; functional group annotation >90% accuracy with expert quality control.
- Functional group categories (241 total): nitrogen (62), oxygen (36), sulfur (85), halogen (14), phosphorus (17), others (27); covers hydrocarbon, boron, silicon, organometallic, and aromatic groups, enabling precise molecular analysis and reaction mechanism understanding.
- Key innovations: functional group toolkit, mix-sourced distillation, chemical reasoning focus; positioned as the first chemical reasoner LLM with atomized knowledge and transparent reasoning.
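A toy illustration of the functional-group identification idea. Real toolkits, including the paper's 241-group toolkit, use proper substructure matching (e.g. SMARTS patterns via RDKit); this sketch only does naive substring checks on SMILES strings and the pattern table is illustrative, not the paper's.

```python
# Toy functional-group spotter: substring checks on SMILES (illustrative only;
# real chemistry toolkits use SMARTS-based substructure matching instead).
FUNCTIONAL_GROUPS = {
    "carboxylic_acid": "C(=O)O",
    "amide": "C(=O)N",
    "nitrile": "C#N",
}

def detect_groups(smiles: str) -> set:
    """Return the names of toy patterns found as substrings of `smiles`."""
    return {name for name, pattern in FUNCTIONAL_GROUPS.items()
            if pattern in smiles}
```

Annotating each molecule with its functional groups is what the paper means by "atomized" knowledge: reasoning can then proceed group by group instead of treating a molecule as an opaque string.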
Q1
1. What is the key innovation in ChemDFM-R's training data compared to previous chemical LLMs?
Using a larger volume of chemical literature
Incorporating atomized functional group knowledge
Including more chemical reaction examples
Q2
2. How does ChemDFM-R improve the reliability of human-AI collaboration?
By generating more accurate predictions
By providing interpretable reasoning chains
By having a larger knowledge base
Q3
3. What unique approach does ChemDFM-R take in its distillation process?
Using only expert-curated knowledge
Relying solely on general domain reasoning skills
Combining expert knowledge with general reasoning capabilities