By Howard Rodenberg, MD, MPH, CCDS for ACDIS CDI Blog
Many are familiar with the scene from A Few Good Men where Jack Nicholson’s old-school Marine tells Tom Cruise’s younger antagonist, “You can’t handle the truth!” My favorite application of that same line comes from Carl Reiner and Mel Brooks’ 2000-Year-Old Man:
“You have some kind of thing with the truth? Everything gotta be with the truth? We do these records together and you chase me into the corner for the truth?”
“But because you lived so long you could tell us what’s the truth.”
“If I told you the truth you couldn’t handle the truth!”
(This is followed by a discussion of the lost 11th Commandment, “Thou shalt not squint.” And since I can’t write in dialect, I’ll just refer you to the scene itself.)
Even when one can handle the truth, it’s often difficult to know just what it is. I’ve always subscribed to the idea that if you get a story about the same thing from two different people, the truth lies somewhere in the middle. But it seems like now we’re in a scenario where cultural relativism—the idea that beliefs are validated in the context of one’s own culture and cannot be judged by outsiders—has evolved into factual relativism, where the truth is only the truth if you concur.
Remembering that truth may be only a construct has been helpful to me as I think about natural language processing (NLP) and artificial intelligence (AI) systems for CDI. There’s a subtle difference between the two. NLP programs search through documents to find keywords, phrases, or findings associated with specific clinical conditions to suggest opportunities to enhance documentation. AI programs use these same elements to build a clinical picture that suggests an underlying condition.
To use an admittedly crude example, an NLP program might see the word “pneumonia” within the medical record and prompt a query for specificity. An AI program might pick up on “pneumonia” too, but also note that the patient has a history of a cerebrovascular accident and a feeding tube, and so suggest the possibility of aspiration pneumonia. (In reality, the distinction isn’t always this clear, and the inner workings of these programs tend to blur the lines.)
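For readers who think in code, here is a toy sketch of that crude example. It is only an illustration: the function names and the hand-written rule are hypothetical, and a real AI product would rely on a trained model rather than a fixed list of terms.

```python
# Toy illustration only. Both functions and the term list are hypothetical
# and are not taken from any vendor's product.

def keyword_prompt(note: str) -> bool:
    """NLP-style check: flag the chart if the word 'pneumonia' appears."""
    return "pneumonia" in note.lower()


def context_suggestion(note: str):
    """Stand-in for the AI-style approach: combine several findings.

    A hand-written rule is used here purely to show the idea of building
    a clinical picture from more than one element in the record.
    """
    text = note.lower()
    risk_findings = ["cerebrovascular accident", "feeding tube"]
    if "pneumonia" in text and all(term in text for term in risk_findings):
        return "possible aspiration pneumonia"
    return None


note = "Pneumonia. History of cerebrovascular accident. Feeding tube in place."
print(keyword_prompt(note))       # prompt a query for specificity
print(context_suggestion(note))   # suggest a more specific condition
```

Both approaches start from the same word in the chart; the difference is how much surrounding context is used before a suggestion is made.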
While their mechanisms are slightly different, both types of CDI software claim to be able to identify medical records with opportunities to better define and document clinical conditions. Vendors cite the potential for significant return on investment (ROI) if only CDI programs use these systems and clinicians respond to the program’s prompts. The program is considered the “Source of Truth,” and if only one follows the truth, significant financial gains are to be had.
This is where I hit a roadblock, as I’m not convinced that these programs really are the source of truth. You may have heard the terms sensitivity and specificity applied to lab testing, but they apply here as well. Sensitivity is the chance (expressed as a percentage) that a test or process correctly identifies those who have a disease, while specificity is the chance that it correctly identifies those who do not. Together with how common the condition is, these values determine the positive and negative predictive values (the chance that a patient with a positive result really has the condition, and the chance that a patient with a negative result really does not). Truth, as we think of it in an absolute sense, must be 100% sensitive and 100% specific, with 100% positive and negative predictive values. So, we would expect that, if the CDI software is giving us the truth, 100% of its suggestions would be accepted by the clinician.
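To make those four measures concrete, here is a minimal sketch that computes them from a two-by-two table of software flags versus what the chart actually showed after review. The counts are invented for illustration and do not come from any real program.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute the four measures from a 2x2 table.

    tp: flagged by the software, condition truly present
    fp: flagged by the software, condition absent
    fn: not flagged, condition truly present
    tn: not flagged, condition absent
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that were flagged
        "specificity": tn / (tn + fp),  # share of non-cases correctly left alone
        "ppv": tp / (tp + fp),          # share of flags that were correct
        "npv": tn / (tn + fn),          # share of non-flags that were correct
    }


# Hypothetical counts: 1,000 charts, 100 of which truly have the condition.
metrics = diagnostic_metrics(tp=80, fp=40, fn=20, tn=860)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these made-up numbers the software looks strong on sensitivity (80%) and specificity (about 96%), yet only about two of every three flags are correct. That gap between a good-looking test and a less impressive predictive value is one reason not every suggestion will be accepted.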
Of course, that’s not how it works in real life. Clinical care occurs along a spectrum, and while the old diagnostic adage, “If you hear hoof-beats, look for horses and not zebras” is generally true, there are enough other even-toed ungulates out there that the truth often remains elusive. And the timeframe also matters. We already know that a principal diagnosis is not assigned until “after study.” It’s difficult for a clinician, let alone a software entity, to establish the truth before the workup is completed.
So, what’s the best way to evaluate claims for the efficacy of these programs? I would contend that the key metric is not a projection of revenue based on the use of the tool as the “Source of Truth.” Instead, I would look at two other measures: how often CDI specialists concur with the program’s suggestion that a query is needed, and how often clinicians respond to, and agree with, the queries the software prompts. These measures may help to better understand the true potential impact of these systems as filtered through the informed judgement of the CDI specialist and the clinical realities of patient care. Projections of fiscal impact can then be adjusted to better reflect a pragmatic ROI.
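One way to picture that adjustment is the rough sketch below. It assumes, simplistically, that revenue scales in proportion to the share of software prompts that survive both CDI review and clinician agreement. The function name, the scaling rule, and every number are my own assumptions for illustration, not a vendor's method.

```python
def pragmatic_roi(projected_revenue: float, prompts: int, cdi_agreed: int,
                  queries_answered: int, clinician_agreed: int) -> dict:
    """Scale a vendor projection by observed agreement rates.

    Assumes the projection treats every prompt as accepted, and that
    revenue is proportional to the prompts a clinician finally agrees with.
    Expects each count to be no larger than the one before it.
    """
    if min(prompts, cdi_agreed, queries_answered) <= 0:
        raise ValueError("prompts, cdi_agreed and queries_answered must be positive")
    end_to_end = clinician_agreed / prompts
    return {
        "cdi_agreement_rate": cdi_agreed / prompts,
        "clinician_response_rate": queries_answered / cdi_agreed,
        "clinician_agreement_rate": clinician_agreed / queries_answered,
        "end_to_end_yield": end_to_end,
        "adjusted_projection": projected_revenue * end_to_end,
    }


# Hypothetical figures for one quarter.
result = pragmatic_roi(projected_revenue=1_000_000, prompts=1000,
                       cdi_agreed=600, queries_answered=480,
                       clinician_agreed=360)
for name, value in result.items():
    print(f"{name}: {value:,.2f}")
```

In this invented scenario, a $1,000,000 projection shrinks to $360,000 once the prompts pass through the CDI specialist and the clinician, which is the kind of reality check worth asking for before signing a contract.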
There’s another impact to relying on CDI software as the “Source of Truth.” Software that generates queries generally places its templates within the clinical workflow. If these queries show up as a “hard stop,” clinicians will do whatever it takes to make them go away. They may well simply check a box, any box, to clear the screen. This can lead to inappropriate documentation and incorrect coding, and it risks an allegation of fraud.
Admittedly, I’ve been speaking in broad terms of CDI software marketed as a turnkey operation that does chart review, develops queries, and issues reports as a single package. Some vendors, however, recognize the presence of clinical variation, and will recommend their software be used to prioritize a record for CDI review. I do think this model provides some built-in safeguards to screen out NLP or AI recommendations that don’t make clinical sense. To me, this seems like a better way to use these systems. Prioritizing records with significant documentation opportunities while deescalating those charts that are “cut and dried” is a valid way to use these tools in a fashion that takes advantage of CDI skillsets and maximizes productivity.
I’ve spilled some ink on the problem of thinking about NLP and AI systems as the “Source of Truth.” But I would be remiss not to note that these programs are here to stay. My own sense is that the technology is not yet reliable enough to function independently, but I suspect that in a decade or so (when I had better be retired; if not, I’ve clearly done something wrong) issues of sensitivity, specificity, and predictive values will be retooled and refined. I would not be surprised if the cost savings from not employing larger numbers of CDI staff to review individual charts, coupled with the 24/7 nature of electronic chart review and the increasing comfort of new clinicians with EHR systems, put semi-autonomous CDI software on the front lines of provider interactions.
And that’s the truth.