Pediatrics

Comprehensive Summary

This study evaluated how effectively two large language models (LLMs), ChatGPT-4o and Grok-3, answered questions about cleft lip and palate (CLP) and presurgical infant orthopedics (PSIO). Six structured questions (three addressing common patient and caregiver concerns and three covering the clinical aspects of presurgical orthopedic treatment) were submitted to each model under zero-context conditions. The resulting 12 responses were independently rated by 45 specialists (15 orthodontists, 15 pediatricians, and 15 plastic surgeons) using the validated DISCERN instrument and the Global Quality Scale (GQS). Pediatricians consistently rated both models' responses more favorably than orthodontists and plastic surgeons did, particularly for patient-focused topics. For Grok-3, ratings differed significantly among specialties on both reliability and treatment-related content, and GQS scores also varied across groups. ChatGPT-4o showed similar trends, with significant variation across specialties in reliability and treatment-related measures; pediatricians' GQS scores averaged 4.33–4.60, compared with orthodontists' lower ratings of 3.00–3.67. Despite these differences, overall DISCERN and GQS scores did not differ significantly between the two models (P > 0.05). ChatGPT-4o was noted for more structured explanations, while Grok-3 performed slightly better on presurgical content; both achieved moderate-to-good reliability, though neither offered the depth expected by surgical specialists.

Outcomes and Implications

Accurate, comprehensible information on CLP and PSIO is essential for patients and caregivers, especially as they navigate early treatment decisions. While both models show promise for patient education, the variation in scoring highlights that LLMs still lack the technical depth required for specialized clinical guidance. ChatGPT-4o and Grok-3 meet patient-friendly communication standards but still require professional oversight in clinical settings. No subgroup performance analysis or cross-validation was conducted, and the one-time prompt design limits reproducibility. While the study does not claim clinical readiness, integrating LLMs into multidisciplinary care could help expand access to cleft care resources in underserved settings.

AIIM Research

Articles

© 2025 AIIM. Created by AIIM IT Team