Abstract
STUDY OBJECTIVES: To compare the physician-answered surprise question (SQ; "Would you be surprised if this patient died in the next 6 months?"), the Geriatric End-of-Life Screening Tool (GEST) artificial intelligence (AI) model, and a new collaborative GEST+SQ model for predicting 6-month mortality in older emergency department (ED) patients.
METHODS: This was a single-site prospective cohort study (November 2022 to June 2023) of patients aged 65 years and older at a tertiary academic ED. Answers to the SQ were collected within the electronic health record at ED disposition, and GEST scores were calculated from available records using laboratory, vital sign, demographic, and historical data. Six-month mortality was adjudicated via electronic health record and state records. SQ and GEST were compared using sensitivity and specificity. A new logistic regression model combining SQ and GEST (GEST+SQ) was developed and compared with GEST alone, using the area under the receiver-operating characteristic curve (ROC-AUC) for discrimination and expected calibration error for calibration. We modeled a sequential screening pathway in which low- and high-risk patients received only GEST screening, whereas intermediate-risk patients received both GEST and SQ, and we report the proportion of patients for whom adding the SQ to GEST would change a theoretical referral to intervention.
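As an illustration only, the two less familiar quantities above (expected calibration error and the share of patients routed to the SQ in a sequential pathway) might be computed along the following lines. This is a minimal sketch, not the study's code: the equal-width 10-bin scheme, the function names, and the `low_cut`/`high_cut` thresholds are assumptions introduced here, since the abstract does not specify them.

```python
from typing import Sequence


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE (binning scheme is an assumption, not from the study).

    Returns the sample-weighted mean of |observed event rate - mean predicted
    probability| across non-empty bins.
    """
    if len(probs) != len(labels) or len(probs) == 0:
        raise ValueError("probs and labels must be non-empty and equal length")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # A probability of exactly 1.0 goes into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_pred = sum(p for p, _ in b) / len(b)
        event_rate = sum(y for _, y in b) / len(b)
        ece += (len(b) / n) * abs(event_rate - mean_pred)
    return ece


def fraction_needing_sq(
    gest_scores: Sequence[float], low_cut: float, high_cut: float
) -> float:
    """Share of patients with an intermediate GEST score.

    Patients with low_cut <= score < high_cut are treated as intermediate risk
    and would be sent for SQ screening; both cutoffs are hypothetical.
    """
    if len(gest_scores) == 0:
        raise ValueError("gest_scores must be non-empty")
    intermediate = sum(1 for s in gest_scores if low_cut <= s < high_cut)
    return intermediate / len(gest_scores)
```

In this sketch, narrowing the intermediate band between `low_cut` and `high_cut` lowers the fraction of patients who would need a physician-answered SQ, which is the quantity the sequential pathway is meant to reduce.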
RESULTS: Of 9,256 eligible patients, 3,479 had SQ responses (37.6%), with 13.3% 6-month mortality. When matching GEST sensitivity to SQ (83.8%), GEST had greater specificity than the SQ (61.5% [56.7 to 67.1] vs. 50.8% [49.1 to 52.6]). At matching specificity (50.8%), GEST sensitivity (90.0% [87.0 to 92.7]) exceeded that of the SQ (83.8% [80.3 to 87.0]). GEST had a ROC-AUC of 0.79 [0.77 to 0.81], whereas the GEST+SQ model had a ROC-AUC of 0.80 [0.78 to 0.82]. The GEST+SQ model had a significantly lower expected calibration error than GEST alone (0.01 [0.01 to 0.02] vs. 0.042 [0.03 to 0.05]). In a sequential screening pathway, as few as 5% of patients required SQ screening following GEST risk scoring.
CONCLUSION: GEST modestly outperformed the SQ for predicting 6-month mortality. A GEST+SQ collaborative model did not improve discrimination (ROC-AUC) over GEST alone but did improve calibration. Sequential screening using GEST and then the SQ for intermediate-risk patients could decrease physician screening burden by up to 95% relative to manual, SQ-only screening. Collaborative approaches integrating automated tools with targeted physician input may enhance ED mortality risk assessment while reducing clinician effort.