Lung ultrasound (LUS) is an important imaging modality used by emergency physicians to assess pulmonary congestion at the patient bedside. B-line artifacts in LUS videos are key findings associated with pulmonary congestion. Not only can the interpretation of LUS be challenging for novice operators, but visual quantification of B-lines remains subject to observer variability. In this work, we investigate the strengths and weaknesses of multiple deep learning approaches for automated B-line detection and localization in LUS videos. We curate and publish BEDLUS, a new ultrasound dataset comprising 1,419 videos from 113 patients with a total of 15,755 expert-annotated B-lines. Based on this dataset, we present a benchmark of established deep learning methods applied to the task of B-line detection. To pave the way for interpretable quantification of B-lines, we propose a novel "single-point" approach to B-line localization using only the point of origin. Our results show that (a) the area under the receiver operating characteristic curve ranges from 0.864 to 0.955 for the benchmarked detection methods, (b) within this range, the best performance is achieved by models that leverage multiple successive frames as input, and (c) the proposed single-point approach for B-line localization reaches an F1-score of 0.65, performing on par with the inter-observer agreement. The dataset and developed methods can facilitate further biomedical research on automated interpretation of lung ultrasound with the potential to expand the clinical utility.
Publications
2023
BACKGROUND: Managing acute pain is a common challenge in the emergency department (ED). Though widely used in perioperative settings, ED-based ultrasound-guided nerve blocks (UGNBs) have been slow to gain traction. Here, we develop a low-cost, low-fidelity, simulation-based training curriculum in UGNBs for emergency physicians to improve procedural competence and confidence.
METHODS: In this pre-/postintervention study, ED physicians were enrolled to participate in a 2-h, in-person simulation training session composed of a didactic session followed by rotation through stations using handmade pork-based UGNB models. Learner confidence with performing and supervising UGNBs as well as knowledge and procedural-based competence were assessed pre- and posttraining via electronic survey quizzes. One-way repeated-measures ANOVAs and pairwise comparisons were conducted. The numbers of nerve blocks performed clinically in the department pre- and postintervention were compared.
RESULTS: In total, 36 participants enrolled in training sessions; eight participants completed surveys at all three data collection time points. Of enrolled participants, 56% were trainees, 39% were faculty, 56% were female, and 53% self-identified as White. Knowledge and competency scores increased immediately postintervention (mean ± SD t0 score 66.9 ± 8.9 vs. t1 score 90.4 ± 11.7; p < 0.001), and decreased 3 months postintervention but remained elevated above baseline (t2 score 77.2 ± 11.5, compared to t0; p = 0.03). Self-reported confidence in performing UGNBs increased posttraining (t0 score 5.0 ± 2.3 compared to t1 score 7.1 ± 1.5; p = 0.002) but decreased to baseline levels 3 months postintervention (t2 score 6.0 ± 1.9, compared to t0; p = 0.30).
CONCLUSIONS: A low-cost, low-fidelity simulation curriculum can improve ED provider procedural-based competence and confidence in performing UGNBs in the short term, with a trend toward sustained improvement in knowledge and confidence. Curriculum adjustments to achieve sustained improvement in confidence performing and supervising UGNBs long term are key to increased ED-based UGNB use.
OBJECTIVES: Non-contrast computed tomography (NCCT) is the gold standard for nephrolithiasis evaluation in the emergency department (ED). However, Choosing Wisely guidelines recommend against ordering NCCT for patients with suspected nephrolithiasis who are <50 years old with a history of kidney stones. Our primary objective was to estimate the national annual cost savings from using a point-of-care ultrasound (POCUS)-first approach for patients with suspected nephrolithiasis meeting Choosing Wisely criteria. Our secondary objectives were to estimate reductions in ED length of stay (LOS) and preventable radiation exposure.
METHODS: We created a Monte Carlo simulation using available estimates for the frequency of ED visits for nephrolithiasis and eligibility for a POCUS-first approach. The study population included all ED patients diagnosed with nephrolithiasis. Based on 1000 trials of our simulation, we estimated national cost savings in averted advanced imaging from this strategy. We applied the same model to estimate the reduction in ED LOS and preventable radiation exposure.
RESULTS: Using this model, we estimate a POCUS-first approach for evaluating nephrolithiasis meeting Choosing Wisely guidelines to save a mean (±SD) of $16.5 million (±$2.1 million) by avoiding 159,000 (±18,000) NCCT scans annually. This resulted in a national cumulative decrease of 166,000 (±165,000) annual bed-hours in ED LOS. Additionally, this resulted in a national cumulative reduction in radiation exposure of 1.9 million person-mSv, which could potentially prevent 232 (±81) excess cancer cases and 118 (±43) excess cancer deaths annually.
CONCLUSION: If adopted widely, a POCUS-first approach for suspected nephrolithiasis in patients meeting Choosing Wisely criteria could yield significant national cost savings and a reduction in ED LOS and preventable radiation exposure. Further research is needed to explore the barriers to widespread adoption of this clinical workflow as well as the benefits of a POCUS-first approach in other patient populations.
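The Monte Carlo approach described in the METHODS above can be illustrated with a minimal sketch. The abstract does not report the input distributions, so every parameter below (annual visit count, eligible fraction, cost per NCCT, and the choice of normal distributions) is a hypothetical placeholder rather than a value from the study; only the 1000-trial count comes from the text.

```python
import random
import statistics


def simulate_savings(n_trials=1000, seed=0):
    """Draw n_trials samples of avoided NCCT scans and dollars saved.

    All distribution parameters are HYPOTHETICAL placeholders,
    not the inputs used in the study.
    """
    rng = random.Random(seed)
    scans, savings = [], []
    for _ in range(n_trials):
        # Hypothetical: annual ED visits for suspected nephrolithiasis
        visits = max(0.0, rng.gauss(1_000_000, 100_000))
        # Hypothetical: fraction eligible for a POCUS-first approach
        eligible = min(1.0, max(0.0, rng.gauss(0.15, 0.02)))
        # Hypothetical: cost in dollars of one averted NCCT
        cost = max(0.0, rng.gauss(100.0, 10.0))
        avoided = visits * eligible
        scans.append(avoided)
        savings.append(avoided * cost)
    return scans, savings


def summarize(values):
    """Return the (mean, sample SD) pair used to report each estimate."""
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    scans, savings = simulate_savings()
    print("avoided scans: mean=%.0f sd=%.0f" % summarize(scans))
    print("savings ($):   mean=%.0f sd=%.0f" % summarize(savings))
```

Each trial samples the uncertain inputs once and propagates them to the outcome, so the spread across trials gives the ±SD attached to each estimate. The same loop could carry additional outcomes such as bed-hours or radiation dose by multiplying the avoided scans by a sampled per-scan quantity.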
BACKGROUND: Point-of-care ultrasound (US) has been suggested as the primary imaging in evaluating patients with suspected diverticulitis. Discrimination between simple and complicated diverticulitis may help to expedite emergent surgical consults and determine the risk of complications. This study aimed to: (1) determine the accuracy of an US protocol (TICS) for diagnosing diverticulitis in the emergency department (ED) setting and (2) assess the ability of TICS to distinguish between simple and complicated diverticulitis.
METHODS: Patients with clinically suspected diverticulitis who underwent a diagnostic computed tomography (CT) scan were identified prospectively in the ED. Emergency US faculty and fellows blinded to the CT results performed and interpreted US scans. The presence of simple or complicated diverticulitis was recorded after each US evaluation. The diagnostic ability of the US was compared to CT as the criterion standard. Modified Hinchey classification was used to distinguish between simple and complicated diverticulitis.
RESULTS: A total of 149 patients (55% female, mean ± SD age 58 ± 16 years) were enrolled and included in the final analyses. Diverticulitis was the final diagnosis in 75 of 149 patients (50.3%), of whom 53 had simple diverticulitis and 22 had perforated diverticulitis (29.4%). TICS protocol's test characteristics for simple diverticulitis include a sensitivity of 95% (95% confidence interval [CI] 87%-99%), specificity of 76% (95% CI 65%-86%), positive predictive value of 80% (95% CI 71%-88%), and negative predictive value of 93% (95% CI 84%-98%). TICS protocol correctly identified 12 of 22 patients with complicated diverticulitis (sensitivity 55% [95% CI 32%-76%]) and specificity was 96% (95% CI 91%-99%). Eight of 10 missed diagnoses of complicated diverticulitis were identified as simple diverticulitis, and two were recorded as negative.
CONCLUSIONS: In ED patients with suspected diverticulitis, US demonstrated high accuracy in ruling out or diagnosing diverticulitis, but its reliability in differentiating complicated from simple diverticulitis is unsatisfactory.
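As a worked illustration of the test characteristics reported above, the sketch below computes sensitivity, specificity, and predictive values from 2x2 counts. The function names are illustrative, and the only counts taken from the abstract are the 12 detected and 10 missed cases of complicated diverticulitis; the remaining cells of the table are not reported, so they are not filled in here.

```python
def sensitivity(tp, fn):
    """Proportion of diseased patients the test correctly flags."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of non-diseased patients the test correctly clears."""
    return tn / (tn + fp)


def positive_predictive_value(tp, fp):
    """Proportion of positive tests that are true positives."""
    return tp / (tp + fp)


def negative_predictive_value(tn, fn):
    """Proportion of negative tests that are true negatives."""
    return tn / (tn + fn)


if __name__ == "__main__":
    # From the abstract: 12 of 22 complicated cases were identified.
    sens = sensitivity(tp=12, fn=10)
    print(f"sensitivity for complicated diverticulitis: {sens:.0%}")
```

With 12 true positives and 10 false negatives, the sensitivity is 12/22, which rounds to the 55% stated in the RESULTS.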
AIM: Acute decompensated heart failure (ADHF) is the leading cause of cardiovascular hospitalizations in the United States. Detecting B-lines through lung ultrasound (LUS) can enhance clinicians' prognostic and diagnostic capabilities. Artificial intelligence/machine learning (AI/ML)-based automated guidance systems may allow novice users to apply LUS to clinical care. We investigated whether an AI/ML automated LUS congestion score correlates with experts' interpretations of B-line quantification from an external patient dataset.
METHODS AND RESULTS: This was a secondary analysis from the BLUSHED-AHF study which investigated the effect of LUS-guided therapy on patients with ADHF. In BLUSHED-AHF, LUS was performed and B-lines were quantified by ultrasound operators. Two experts then separately quantified the number of B-lines per ultrasound video clip recorded. Here, an AI/ML-based lung congestion score (LCS) was calculated for all LUS clips from BLUSHED-AHF. Spearman correlation was computed between LCS and counts from each of the original three raters. A total of 3858 LUS clips were analysed on 130 patients. The LCS demonstrated good agreement with the two experts' B-line quantification score (r = 0.894, 0.882). Both experts' B-line quantification scores had significantly better agreement with the LCS than they did with the ultrasound operator's score (p < 0.005, p < 0.001).
CONCLUSION: Artificial intelligence/machine learning-based LCS correlated with expert-level B-line quantification. Future studies are needed to determine whether automated tools may assist novice users in LUS interpretation.
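The Spearman correlation used above to compare the LCS with rater counts is the Pearson correlation of the ranked values. The sketch below is one standard way to compute it, assuming tied values receive their average rank; the abstract does not state how ties were handled, and the example numbers are toy values, not study data.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Toy values only: a hypothetical score and B-line count per clip.
    score = [0.1, 0.4, 0.35, 0.8, 0.9]
    count = [0, 2, 1, 5, 4]
    print(round(spearman(score, count), 3))
```

Because it depends only on rank order, this statistic suits a comparison between a continuous score and integer B-line counts, where a linear relationship is not expected.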
2022
INTRODUCTION: Hypertension is often incidentally discovered in the emergency department (ED); these patients may benefit from close follow-up. We developed a module to automatically include discharge instructions for patients with elevated blood pressure (BP) in the ED, aiming to improve 30-day follow-up.
AIM: This study sought to determine if automated discharge instructions for patients with elevated blood pressure in the ED improved 30-day follow-up with a patient's primary care physician (PCP).
METHODS: We developed an automated module with standardized instructions for patients with elevated BP. These were read upon discharge and e-mailed to the PCP. We analyzed 193 patients during a 1-month interval after implementation, and 207 during a 1-month interval the year prior. The groups were compared using Fisher's exact test.
RESULTS: Thirty-day follow-up was 52.2% pre-implementation and 48.4% post-implementation, with no significant difference noted. For patients without known hypertension, follow-up slightly improved, but not significantly. For hypertensive patients, follow-up rates significantly decreased post-implementation.
CONCLUSIONS: Despite implementation of automated discharge instructions, we found no improvement in 30-day follow-up. Patients without hypertension trended towards improved follow-up, possibly being more attentive to new abnormal BP readings. However, known hypertensive patients followed up at a lower rate, which was unexpected and requires further investigation.
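The group comparison named in the METHODS above, Fisher's exact test on a 2x2 table, can be sketched with the standard library alone. This is a minimal two-sided version that sums the probabilities of all tables no more likely than the observed one. The abstract does not report the underlying cell counts, so the counts in the example are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Rows are groups (e.g. pre/post), columns are outcomes
    (e.g. followed up / did not follow up).
    """
    r1, r2, c1 = a + b, c + d, a + c
    denom = comb(r1 + r2, c1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        # with all margins held fixed.
        return comb(r1, x) * comb(r2, c1 - x) / denom

    lo, hi = max(0, c1 - r2), min(r1, c1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    # HYPOTHETICAL counts: followed up vs. not, in two groups.
    print(fisher_exact_two_sided(30, 20, 25, 25))
```

An exact test of this kind makes no large-sample approximation, which is why it is a common choice for comparing two proportions in modestly sized groups.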
BACKGROUND: Post-discharge opioid consumption is a crucial patient-reported outcome informing opioid prescribing guidelines, but its collection is resource-intensive and vulnerable to inaccuracy due to nonresponse bias.
METHODS: We developed a post-discharge text message-to-web survey system for efficient collection of patient-reported pain outcomes. We prospectively recruited surgical patients at Beth Israel Deaconess Medical Center in Boston, Massachusetts from March 2019 through October 2020, sending an SMS link to a secure web survey to quantify opioids consumed after discharge from hospitalization. Patient factors extracted from the electronic health record were tested for nonresponse bias and observable confounding. Following targeted learning-based nonresponse adjustment, procedure-specific opioid consumption quantiles (medians and 75th percentiles) were estimated and compared to a previous telephone-based reference survey.
RESULTS: 6553 patients were included. Opioid consumption was measured in 44% of patients (2868), including 21% (1342) through survey response. Characteristics associated with inability to measure opioid consumption included age, tobacco use, and prescribed opioid dose. Among the 10 most common procedures, median consumption was only 36% of the median prescription size; 64% of prescribed opioids were not consumed. Among those procedures, nonresponse adjustment corrected the median opioid consumption by an average of 37% (IQR: 7, 65%) compared to unadjusted estimates, and corrected the 75th percentile by an average of 5% (IQR: 0, 12%). This brought median estimates for 5/10 procedures closer to telephone survey-based consumption estimates, and 75th percentile estimates for 2/10 procedures closer to telephone survey-based estimates.
CONCLUSIONS: SMS-recruited online surveying can generate reliable opioid consumption estimates after nonresponse adjustment using patient factors recorded in the electronic health record, protecting patients from the risk of inaccurate prescription guidelines.
Background: Since 2017, states, insurers, and pharmacies have placed blanket limits on the duration and quantity of opioid prescriptions. In many states, overlapping duration and daily dose limits yield maximum prescription limits of 150-350 morphine milligram equivalents (MMEs). There is limited knowledge of how these restrictions compare with actual patient opioid consumption; while changes in prescription patterns and opioid misuse rates have been studied, these are, at best, weak proxies for actual pain control consumption. We sought to determine how patients undergoing surgery would be affected by opioid prescribing restrictions using actual patient opioid consumption data.
Methods: We constructed a prospective database of post-discharge opioid consumption: patients undergoing surgery at one institution were called after discharge to collect opioid consumption data. Patients whose opioid consumption exceeded 150 and 350 MME were identified.
Results: Two thousand nine hundred and seventy-one patients undergoing 54 common surgical procedures were included in our study. Twenty-one percent of patients consumed more than the 150 MME limit. Only 7% of patients consumed above the 350 MME limit. Typical (non-outlier) opioid consumption, defined as less than the 75th percentile of consumption for any given procedure, exceeded the 150 MME and 350 MME limits for 41 and 7% of procedures, respectively. Orthopedic, spinal/neurosurgical, and complex abdominal procedures most commonly exceeded these limits.
Conclusions: While most patients undergoing surgery are unaffected by recent blanket prescribing limits, those undergoing a specific subset of procedures are likely to require more opioids than the restrictions permit; providers should be aware that these patients may require a refill to adequately control post-surgical pain. Real consumption data should be used to guide these restrictions and inform future interventions so the risk of worsened pain control (and its troublesome effects on opioid misuse) is minimized. Procedure-specific prescribing limits may be one approach to prevent misuse, while also optimizing post-operative pain control.
OBJECTIVE: We compare consensus recommendations for 5 surgical procedures to prospectively collected patient consumption data. To address local variation, we combined data from multiple hospitals across the country.
SUMMARY OF BACKGROUND DATA: One approach to address the opioid epidemic has been to create prescribing consensus reports for common surgical procedures. However, it is unclear how these guidelines compare to patient-reported data from multiple hospital systems.
METHODS: Prospective observational studies of surgery patients were completed between 3/2017 and 12/2018. Data were collected utilizing post-discharge surveys and chart reviews from 5 hospitals (representing 3 hospital systems) in 5 states across the USA. Prescribing recommendations for 5 common surgical procedures identified in 2 recent consensus reports were compared to the prospectively collected aggregated data. Surgeries included: laparoscopic cholecystectomy, open inguinal hernia repair, laparoscopic inguinal hernia repair, partial mastectomy without sentinel lymph node biopsy, and partial mastectomy with sentinel lymph node biopsy.
RESULTS: Eight hundred forty-seven opioid-naïve patients who underwent 1 of the 5 studied procedures reported counts of unused opioid pills after discharge. Forty-one percent did not take any opioid medications, and across all surgeries, the median consumption was three 5-mg oxycodone pills or fewer. Generally, consensus reports recommended opioid quantities that were greater than the 75th percentile of consumption, and for 2 procedures, recommendations exceeded the 90th percentile of consumption.
CONCLUSIONS: Although consensus recommendations were an important first step to address opioid prescribing, our data suggest that following these recommendations would result in 47%-56% of pills prescribed remaining unused. Future multi-institutional efforts should be directed toward refining and personalizing prescribing recommendations.