1. 📘 Topic and Domain: The paper introduces Medical AI Scientist, an autonomous framework for end-to-end automation of clinical medical AI research, spanning hypothesis generation, experimental validation, and manuscript drafting.
2. 💡 Previous Research and New Ideas: Building on existing AI Scientist systems (AI Scientist-v2, AI-Researcher, Agent Laboratory) and medical AI applications, the paper proposes a novel clinician-engineer co-reasoning mechanism and three operational modes (Reproduction, Innovation, Exploration) tailored specifically to clinical medicine.
3. ❓ Problem: Existing AI Scientists are domain-agnostic, lacking mechanisms to ground hypotheses in medical evidence, handle heterogeneous clinical data formats, or ensure ethical compliance, making them unsuitable for clinical autonomous research.
4. 🛠️ Methods: The framework comprises three components: an Idea Proposer with clinician-engineer co-reasoning for evidence-grounded hypothesis generation, an Experimental Executor orchestrating domain-specific medical toolboxes in Dockerized environments, and a hierarchical Manuscript Composer enforcing structured medical writing with ethical review.
5. 📊 Results and Evaluation: Across 171 evaluation cases (19 tasks, 6 modalities), the system outperforms GPT-5 and Gemini-2.5-Pro on idea quality across six dimensions; achieves experimental success rates of 86–93%; and generates manuscripts scoring 4.60±0.56 under the Stanford Agentic Reviewer, competitive with MICCAI/ISBI/BIBM publications in double-blind human evaluation, with one manuscript accepted at ICAIS 2025.
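The three-component architecture in point 4 can be sketched as a simple sequential pipeline. This is a minimal illustrative sketch: every class name, method name, and return value below is an assumption for exposition, not the paper's actual API, and the clinician-engineer co-reasoning, Dockerized tool execution, and ethical review are reduced to stubs.

```python
from dataclasses import dataclass, field


@dataclass
class Idea:
    hypothesis: str
    evidence: list = field(default_factory=list)


class IdeaProposer:
    """Sketch of clinician-engineer co-reasoning: an 'engineer' role drafts
    a hypothesis and a 'clinician' role attaches grounding evidence (stubbed)."""

    def propose(self, topic: str) -> Idea:
        draft = f"Hypothesis about {topic}"               # engineer role (stub)
        evidence = [f"literature support for {topic}"]    # clinician role (stub)
        return Idea(hypothesis=draft, evidence=evidence)


class ExperimentalExecutor:
    """Would orchestrate domain-specific medical toolboxes in Docker
    containers; here it just returns a placeholder result."""

    def run(self, idea: Idea) -> dict:
        return {"idea": idea.hypothesis, "success": True}


class ManuscriptComposer:
    """Hierarchical composer enforcing a structured medical-writing outline,
    with an ethics section standing in for the ethical-review step."""

    SECTIONS = ["Abstract", "Introduction", "Methods", "Results", "Ethics"]

    def compose(self, idea: Idea, results: dict) -> str:
        body = "\n".join(f"## {s}" for s in self.SECTIONS)
        return f"# {idea.hypothesis}\n{body}\n\nExperiment success: {results['success']}"


def run_pipeline(topic: str) -> str:
    """End-to-end flow: idea -> experiment -> manuscript."""
    idea = IdeaProposer().propose(topic)
    results = ExperimentalExecutor().run(idea)
    return ManuscriptComposer().compose(idea, results)


print(run_pipeline("sepsis prediction from EHR data"))
```

The sequential hand-off mirrors the summary's pipeline; in the actual framework each stage would involve iterative agent loops rather than single function calls.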