We show that Vision-Language Models (VLMs) fail to identify the relative positions of anatomical structures in medical images, a fundamental requirement for clinical applicability. We analyze visual-marker strategies for improving performance, demonstrate that VLMs rely on prior anatomical knowledge rather than image content, and introduce MIRP, an open benchmark dataset for relative-positioning tasks in medical imaging. Our evaluation serves as a critical first step toward enabling VLMs for clinical use.
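To make the task concrete, below is a minimal, hypothetical sketch of how a relative-positioning query could be posed to a VLM and scored. The PositionQuery fields, the prompt template, and the ask_vlm() stub are illustrative assumptions only, not the actual MIRP question format or evaluation code.

from dataclasses import dataclass

@dataclass
class PositionQuery:
    image_path: str    # medical image slice showing both structures
    structure_a: str   # e.g. "liver"
    structure_b: str   # e.g. "spleen"
    relation: str      # queried relation, e.g. "to the left of"
    label: bool        # whether the relation actually holds in the image

def build_prompt(q: PositionQuery) -> str:
    """Phrase the relative-position question as a yes/no prompt."""
    return (f"In the attached image, is the {q.structure_a} "
            f"{q.relation} the {q.structure_b}? Answer 'yes' or 'no'.")

def ask_vlm(image_path: str, prompt: str) -> str:
    """Placeholder for a call to any vision-language model (API or local)."""
    raise NotImplementedError("plug in a VLM of your choice here")

def accuracy(queries: list[PositionQuery]) -> float:
    """Score yes/no answers against the ground-truth labels."""
    correct = 0
    for q in queries:
        answer = ask_vlm(q.image_path, build_prompt(q)).strip().lower()
        correct += answer.startswith("yes") == q.label
    return correct / len(queries)

A visual-marker variant of this sketch would overlay labeled markers on the two structures before calling the model, which is one way to test whether answers are grounded in the image rather than in prior anatomical knowledge.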
Coming soon.