Published Clinical Decision Aids May Lack Validation

The physician life has never been “easier.” We live in a fortunate future, replete with information technology at our fingertips, along with the decision support to suit our every clinical need. What tremendous satisfaction we all must take with the various epiphanies and pearls presented to us by our electronic health records. Independent thought, along with clinical judgment, is being rendered obsolete.

A steady diet of academic research inflates our fulsome girth of clinical calculators, shortcuts, and acronyms. NEXUS! PERC! HEART! HAS-BLED! The siren’s call of simplicity and cognitive unburdening is insidiously appealing. With progress, unfortunately, also comes folly. Are these tools actually smarter than the average bear? Does this ever-expanding cornucopia of “decision support” actually outperform a trained clinician?

Perhaps a better question is: Is anyone even asking that question?

It’s troubling that this does not appear to be the case. A recent historical review in Annals of Emergency Medicine looked back at 171 research articles evaluating the performance of decision aids.¹ For a decision aid intended for routine practice, it seems reasonable to require not only that its predictions be statistically validated but also that it outperform current clinical practice.

Of the 171 decision aids included in their survey, the authors were able to identify only 21 publications, in Annals or another journal, in which the aid was compared directly with clinician judgment. For the remainder, no comparison was made or could be found in the external literature. Among the handful that were compared, the results were, unfortunately, discouraging. In these 21 comparisons, the decision aid was clearly superior to clinician judgment in only two: a prognostic neural network for outcomes in patients presenting with chest pain, effective but too unwieldy for widespread use, and the useful and well-studied Canadian C-Spine Rule. Conversely, six decision aids clearly underperformed clinician judgment, and the remainder were a wash. Popular decision instruments found either inferior to or no different from clinician judgment included the Alvarado score for appendicitis, a general evaluation of pediatric head injury rules, risk-stratification rules for pulmonary embolism, and the San Francisco Syncope Rule.

These 21 publications represent barely more than a tenth of the aids surveyed, and it would be erroneous to assume that those left untested are equally unreliable. It is also reasonable to suggest that the decision aids showing no difference may have suffered from flawed comparator study design rather than from a failing of the decision aid itself. Regardless, this record should not instill undue confidence in clinical decision aids as a replacement for thoughtful clinical judgment and experience.

A salient contemporary example of a decision aid of questionable value relative to clinical judgment is the Ottawa Subarachnoid Hemorrhage Rule.² This rule, originally derived and described in JAMA and recently validated prospectively in the Canadian Medical Association Journal, targets an important clinical question: Which patients with acute headache should be evaluated for subarachnoid hemorrhage (SAH)?²,³ Patients in whom an initial SAH or sentinel bleed is missed tend to have poor and potentially avoidable outcomes. The flip side, however, is excessive resource use, whether through CT scanning or invasive procedures such as lumbar puncture. A decision aid superior to clinician judgment could add a great deal of value in this clinical scenario.

The good news first: Sensitivity for SAH was the same in the validation as in the derivation, effectively 100 percent. Applied as constructed, the Ottawa SAH Rule would virtually never miss a serious outcome in a patient with acute headache who matches the study’s inclusion criteria. That said, in their pursuit of absolute sensitivity, the authors have also followed the breadcrumbs laid out by their statistical analysis to a somewhat inane conclusion: The only path to zero misses involves evaluating virtually everyone. The specificity of their rule was 13.6 percent, meaning it captures almost all comers in pursuit of a small handful of true positives.

This is an example of a decision aid that, after seven years and thousands of patients, likely cannot be shown to be superior to physician judgment when explicitly studied. Although no direct comparison was performed, physicians in these studies already investigated 85 to 90 percent of patients with either CT or lumbar puncture, so the rule’s incremental impact would be negligible. More concerning is how a rule with such low specificity will perform when used outside the narrow inclusion criteria and high prevalence of the academic referral settings in which it was studied. It is possible, even likely, that misapplying these criteria would lead to many more patient evaluations than current clinical judgment does, without any detectable benefit in patient-oriented outcomes.
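To make the specificity argument concrete, here is a minimal Python sketch, offered as a hypothetical illustration rather than anything from the cited studies. It takes the validation sensitivity and specificity quoted above and asks what share of all eligible patients the rule would flag for CT or lumbar puncture. The SAH prevalence values are assumptions chosen only to show the range, and the function name is invented for this example.

```python
# Hypothetical illustration: share of a headache cohort that a rule flags for
# investigation, given its sensitivity and specificity.
# Sensitivity and specificity are the validation figures quoted in the text;
# the SAH prevalence values are ASSUMPTIONS, not study data.

def fraction_flagged(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Return the share of all patients the rule labels 'investigate'.

    True positives  = sensitivity * prevalence
    False positives = (1 - specificity) * (1 - prevalence)
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos + false_pos


OTTAWA_SENSITIVITY = 1.00   # "effectively 100 percent"
OTTAWA_SPECIFICITY = 0.136  # 13.6 percent

if __name__ == "__main__":
    for prevalence in (0.02, 0.05, 0.10):  # assumed prevalences, for illustration only
        share = fraction_flagged(OTTAWA_SENSITIVITY, OTTAWA_SPECIFICITY, prevalence)
        print(f"assumed prevalence {prevalence:.0%}: rule flags {share:.1%} of patients")
```

Because sensitivity is 100 percent, the flagged share simplifies to 0.864 + 0.136 × prevalence. It therefore never falls below 86.4 percent at any plausible prevalence, which sits squarely within the 85 to 90 percent of patients physicians were already investigating.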

A rule such as this is a prime example of why all decision aids should be tested in practice against physician judgment before their widespread use is encouraged. Given the underwhelming track record of decision aids in direct comparisons, this rule and countless other substitutes for clinician judgment should be viewed with skepticism rather than idolatry.

This should not suggest that decision aids can’t inform clinical judgment before formal testing, only that their limitations ought to be considered when they are used. Decision aids are derived and tested in unavoidably limited populations, outcomes are measured against flawed or incomplete gold standards, and the prioritization and weighting of different elements in the statistical analysis may have profound effects on the final model. And even for aids ultimately tested against physician judgment, the same generalizability concerns persist, along with the confounding questions of practice culture and environment and how closely local clinicians resemble those who were studied.

The future of digital cognitive enhancement is bright, and computers may yet replace substantial portions of clinical decision making—but not today!

References

1. Schriger DL, Elder JW, Cooper RJ. Structured clinical decision aids are seldom compared with subjective physician judgment, and are seldom superior. Ann Emerg Med. 2017;70(3):338-344.e3.
2. Perry JJ, Sivilotti MLA, Sutherland J, et al. Validation of the Ottawa Subarachnoid Hemorrhage Rule in patients with acute headache. CMAJ. 2017;189(45):E1379-E1385.
3. Perry JJ, Stiell IG, Sivilotti ML, et al. Clinical decision rules to rule out subarachnoid hemorrhage for acute headache. JAMA. 2013;310(12):1248-1255.

About the Author

Ryan Patrick Radecki, MD, MS