Healthcare

Medical Diagnosis Chain-of-Thought Evaluation

Comprehensive evaluation of LLM diagnostic reasoning capabilities using real clinical cases. Measures diagnostic accuracy, reasoning coherence, and consideration of alternative diagnoses.

Dr. Sarah Chen · Stanford Medical · 9/10/2025 · 4 models evaluated

Sample Prompt

Patient presents with chest pain, shortness of breath, and diaphoresis. ECG shows ST elevation in leads II, III, and aVF. What is the most likely diagnosis and immediate management?

Scoring criteria: domain-specific metrics per evaluation.
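
The page names three measured dimensions (diagnostic accuracy, reasoning coherence, and consideration of alternative diagnoses) but does not say how they are scored or combined. As a rough sketch only, one way to hold per-response scores and fold them into a single number might look like the following. The names `RubricScores` and `composite_score`, the [0, 1] scale, and the equal default weights are all assumptions for illustration, not the leaderboard's actual method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricScores:
    """Hypothetical per-response scores, each assumed to lie in [0, 1]."""

    diagnostic_accuracy: float
    reasoning_coherence: float
    alternatives_considered: float

    def __post_init__(self) -> None:
        # Reject values outside the assumed [0, 1] scale.
        for name, value in vars(self).items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")


def composite_score(
    scores: RubricScores,
    weights: tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> float:
    """Weighted mean of the three dimensions.

    Equal weights are an assumption; the source does not state a weighting.
    """
    if any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative and not all zero")
    values = (
        scores.diagnostic_accuracy,
        scores.reasoning_coherence,
        scores.alternatives_considered,
    )
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```

A grader (human or automated) would fill in one `RubricScores` per model response to a prompt such as the sample above, and `composite_score` would reduce it to a single value for ranking. Passing different `weights` lets accuracy count more than the other two dimensions if that is preferred.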

Chainforge.ai Evaluation Flow

Model Performance Rankings
