Frontier Signal
Dermatology MLLMs Face ‘Benchmark-to-Bedside’ Gap
New research finds that multimodal LLMs, including GPT-4.1, perform significantly worse on real-world dermatology diagnosis and triage than on public benchmarks, exposing a critical 'benchmark-to-bedside' gap.