Hypertension

Large language models in hypertension care: opportunities and limits

Scoping review (PRISMA-ScR) of 33 studies published between 2023 and 2025 on the use of large language models (LLMs) in hypertension care. Applications include clinical decision support, patient education, medical education, research support, and administrative functions.

GPT models dominate; evaluation methods are not yet standardized. Validity and clinical usability have received only limited study. The field calls for more robust study designs, clinical outcome measures, and attention to bias and privacy.

Abstract (original)

Large language models have emerged as potential tools to support hypertension care, including diagnosis, treatment decision-making, and patient education. However, evidence regarding their validity, performance, and clinical applicability remains limited. The objective is to map current applications of large language models in hypertension care, with emphasis on model optimization strategies, evaluation approaches, and reported limitations. We conducted a Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews–compliant scoping review of primary studies published between 2023 and 2025 evaluating large language models in hypertension. Thirty-three studies were included. Data were charted on clinical use cases, model optimization techniques, evaluation metrics, data sets, and limitations. Applications were categorized into clinical decision support systems, patient education, medical education, research support, and administrative functions. GPT-based models predominated (82%). Model optimization was limited: 89% relied exclusively on prompt engineering. Most applications focused on patient education (52%) and clinical decision support systems (24%). In clinical decision support systems, reported accuracy ranged from 65% to 100%, reaching 87% to 91% for ambulatory blood pressure monitoring interpretation. Patient education applications showed accuracy between 80% and 90%, but frequent issues included excessive language complexity and occasional unsafe outputs. Across domains, evaluation methods were heterogeneous, reproducibility was inconsistently assessed, and safety concerns, including hallucinations and outdated knowledge, were commonly reported. Current evidence suggests that large language models may support selected tasks in hypertension care; however, their clinical reliability remains uncertain. 
The limited methodological rigor, minimal use of advanced optimization techniques, and narrow scope of evaluated applications preclude conclusions regarding routine clinical use. Further rigorously designed studies are required before broader implementation can be considered.

This article is a summary of a publication in Hypertension. For the full article, all details, and references, please refer to the original source.


DOI: 10.1161/HYPERTENSIONAHA.126.27004
