1. 📘 Topic and Domain: A technical report introducing Ovis-U1, a 3-billion-parameter unified multimodal AI model for image understanding, text-to-image generation, and image editing.
2. 💡 Previous Research and New Ideas: Inspired by GPT-4o-style unified models and building on the earlier Ovis series, the report proposes a unified training approach that starts from a language model rather than a frozen multimodal large language model (MLLM).
3. ❓ Problem: How to endow a multimodal understanding model with image-generation capability, and how to train a single unified model effectively on both understanding and generation tasks.
4. 🛠️ Methods: Implements a diffusion-based visual decoder with a bidirectional token refiner, trained through a 6-stage unified training process that combines understanding, generation, and editing tasks.
5. 📊 Results and Evaluation: Achieves 69.6 on the OpenCompass Multi-modal Academic Benchmark, 83.72 on DPG-Bench, 0.89 on GenEval, and 4.00 and 6.42 on ImgEdit-Bench and GEdit-Bench-EN, respectively, surpassing several state-of-the-art models.
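The "bidirectional token refiner" in the methods item can be pictured as a self-attention block with no causal mask, so every visual token can attend to every other token before conditioning the diffusion decoder. The sketch below is a minimal single-head illustration under assumed shapes (16 tokens, width 32); the function names, dimensions, and single residual block are assumptions for exposition, not the report's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def bidirectional_refine(tokens, wq, wk, wv, wo):
    """One illustrative refiner block: full (non-causal) self-attention
    plus a residual connection, so information flows in both directions."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    d = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d))   # (n, n); no causal mask applied
    return tokens + (attn @ v) @ wo        # residual connection

rng = np.random.default_rng(0)
n, d = 16, 32                              # assumed token count and width
tokens = rng.normal(size=(n, d))
wq, wk, wv, wo = (rng.normal(scale=d**-0.5, size=(d, d)) for _ in range(4))
refined = bidirectional_refine(tokens, wq, wk, wv, wo)
print(refined.shape)  # (16, 32)
```

The absence of a causal mask is the key contrast with autoregressive text decoding: refined visual tokens may depend on tokens that appear later in the sequence.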