1. 📘 Topic and Domain: Evaluating DINOv3, a self-supervised vision transformer trained on natural images, for medical imaging tasks including 2D/3D classification and segmentation.
2. 💡 Previous Research and New Ideas: Builds on the DINO series of self-supervised models and contrasts with medical-domain vision models such as BiomedCLIP; proposes using natural-image-trained DINOv3 as a universal encoder for medical imaging, without any domain-specific pre-training.
3. ❓ Problem: Whether DINOv3's visual features, learned solely from natural images, transfer effectively to specialized medical imaging tasks without medical-domain pre-training.
4. 🛠️ Methods: Comprehensive benchmarking across multiple medical imaging tasks using linear probing, k-NN evaluation, and multiple instance learning, evaluated across several model sizes (DINOv3-S/B/L) and input resolutions.
5. 📊 Results and Evaluation: DINOv3 performed strongly on X-ray and CT tasks but struggled in more specialized domains such as pathology slides and PET scans; scaling up model size or input resolution yielded inconsistent gains across tasks and modalities.
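The frozen-encoder evaluation protocol in item 4 can be sketched as follows. This is a minimal illustration, not the paper's code: randomly generated feature vectors stand in for DINOv3 embeddings (in the real protocol, features would be extracted once from the frozen backbone, e.g. 384-dimensional CLS tokens for DINOv3-S), and scikit-learn provides the linear probe and k-NN classifier.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in for frozen DINOv3 embeddings: synthetic 384-d features with
# binary labels. In the actual benchmark these would be precomputed from
# the frozen encoder on medical images (e.g. X-ray or CT slices).
X, y = make_classification(n_samples=1000, n_features=384,
                           n_informative=32, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Linear probing: a single logistic-regression layer trained on frozen
# features; the backbone receives no gradient updates.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probe_acc = probe.score(X_test, y_test)

# k-NN evaluation: no training at all; classify each test feature by its
# nearest neighbors in the training feature set.
knn = KNeighborsClassifier(n_neighbors=20).fit(X_train, y_train)
knn_acc = knn.score(X_test, y_test)

print(f"linear probe acc: {probe_acc:.3f}, k-NN acc: {knn_acc:.3f}")
```

Both evaluations keep the encoder fixed, so they measure the quality of the pretrained features themselves rather than the model's fine-tunability, which is the question the benchmark is designed to answer.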