
UTILIZATION MANAGEMENT (UM) and the Biases of ARTIFICIAL INTELLIGENCE (AI)

Written by James Haering, DO, SFHM | 8/31/21 12:58 PM

The use of Artificial Intelligence (AI) in medicine dates formally to the mid-1950s, and it is more widespread than most people would think. The healthcare industry could easily be the greatest adopter and beneficiary of AI, whether in the form of machine learning or deep learning. All of us, at some level, will interact with and be impacted by medical AI. And along with its great promise come risks and limitations that need to be considered. If we are not vigilant about these issues, the very systems we rely upon to improve patient care and outcomes may lead to negative impacts or even patient harm. Here we will discuss some of the pitfalls of healthcare AI.

WHAT IS AI?

In its most basic form, AI is the use of computer systems to perform tasks that normally require human intelligence. In healthcare this may include:

  • Object Detection. Common examples include screening pathology slides for cancer or examining retinal images for pathology.
  • Solving Complex Problems. Examples include monitoring population-level big data to determine which patients are at risk of hospitalization or of developing certain diseases, such as heart failure.
  • Medical Decision Making. This may include applying algorithms to determine the best antibiotic choice for a patient's infection. In utilization management, this may include determining who is appropriate for inpatient status versus observation services.

WHY USE AI?

Working smarter is the objective. Using technology to perform monotonous tasks allows for more personalized, better patient care. If we can hand off certain time-consuming, low-risk tasks to automation, we open up additional avenues for growth and improvement. Here are some ways in which automation works to our advantage:

  • Calculations. Especially those involving large data sets. AI may perform calculations and solve problems faster than a human. An example is the ability of computer programs to process complex radiological studies, such as CT and MRI, in a matter of minutes when the previous process required hours.
  • Consistency and Performance. Once developed, an AI may perform the same function repeatedly with nearly 100% consistency. In contrast, the interrater reliability of determinations made by humans rarely reaches 100%, and often falls well below it. As an example, the goal for interrater reliability of secondary review by a physician advisor for the determination of inpatient versus observation services is 95%; unfortunately, this goal is infrequently achieved (a simple agreement calculation appears after this list).
  • Increased Monitoring. Programs may run 24/7 and provide nearly continuous monitoring, depending on their design and function. Using Health Level Seven International's (HL7) Fast Healthcare Interoperability Resources (FHIR), an AI may monitor for certain triggers or events of interest to Utilization Management (UM), such as the placement of an inpatient order (see the FHIR polling sketch after this list). In contrast, the typical case manager, with 20+ patients, may screen each case only one or two times per day.
  • Cost. Beyond its initial development, most AI programs have little to no maintenance cost. In contrast, providing 24/7 UM staffing to cover all cases is (although we may personally disagree) cost prohibitive for hospitals.
  • Accuracy. In certain instances, AI may be more accurate than the UM team. It is unreasonable to expect a human to constantly maintain an up-to-date knowledge base of all the rules and regulations of UM, including the variations by payer, by state, and by other factors. Despite my 20+ years in UM, I cannot recite all the elements of national coverage determinations, local coverage determinations, state-by-state variations in Medicaid rules, etc.
  • Depth. Depending on its structure, AI may have access to a larger and wider data set than its human counterpart. In UM this may translate into knowledge of an updated insurance policy, awareness that the patient was hospitalized at another facility within the past few days, or that a prior authorization was obtained for inpatient status.
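
To make the interrater-reliability point concrete, here is a minimal sketch in Python of how agreement between two physician advisors might be measured, both as raw percent agreement and as Cohen's kappa (which corrects for chance agreement). The determinations shown are made-up illustrative data.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Share of cases where two reviewers reached the same determination."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for chance: kappa = (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Expected chance agreement from each rater's marginal frequencies
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Illustrative determinations by two physician advisors on five cases
a = ["inpatient", "observation", "inpatient", "observation", "inpatient"]
b = ["inpatient", "observation", "observation", "observation", "inpatient"]
print(percent_agreement(a, b))       # 0.8 -- below the 95% goal
print(round(cohens_kappa(a, b), 2))  # 0.62 -- chance-corrected agreement
```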
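And here is a minimal sketch of the kind of FHIR-based monitoring described under Increased Monitoring: polling a FHIR R4 server for newly started inpatient encounters. The server URL is hypothetical, authentication is omitted, and a production system would more likely use FHIR Subscriptions so the server pushes events rather than being polled.

```python
import requests

# Hypothetical FHIR R4 endpoint; a real deployment would use the
# hospital's own base URL plus proper OAuth2 credentials.
FHIR_BASE = "https://fhir.example-hospital.org/r4"

def new_inpatient_encounters(since_iso: str) -> list:
    """Search for inpatient encounters (class IMP) starting on or
    after the given timestamp; each hit is a potential UM trigger."""
    params = {
        "class": "IMP",            # standard FHIR code for an inpatient encounter
        "date": f"ge{since_iso}",  # encounter period begins at/after this time
        "_sort": "date",
    }
    resp = requests.get(f"{FHIR_BASE}/Encounter", params=params, timeout=30)
    resp.raise_for_status()
    bundle = resp.json()
    return [entry["resource"] for entry in bundle.get("entry", [])]

for enc in new_inpatient_encounters("2021-08-31T00:00:00Z"):
    print("UM trigger: inpatient order for", enc["subject"]["reference"])
```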

AI BIAS

Algorithms are only as good as those who design and build them, and they will only deliver the results their design allows. This means that unintended bias may creep into the system and lead to unexpected results. Examples include:

  • Coverage Bias. When developing an AI algorithm, it is critical to ensure that the training data set (the information used to train the AI) includes cases representative of the population in which the AI will be used. As a simplified example, imagine an AI designed to estimate the risk of myocardial infarction from coronary calcium scores that was trained on a data set of young, healthy women in their 20s, but then applied to the general population of a hospital (a sketch of one such coverage check follows this list).
    • Consider the commercially available data sets used by UM on a daily basis to determine the medical necessity of inpatient hospitalization. The criteria used to determine admission may include certain threshold values, such as creatinine or hemoglobin level. These values are typically derived from well-sourced, peer-reviewed scientific studies. However, these studies may not have included adequate representation of certain subpopulations (age, gender, race, etc.). In other words, the inclusion/exclusion criteria may not match the patient's true clinical risk.
    • Consider a model designed to predict the length of stay and probable discharge date. This would be useful for prioritizing UM review of patients with an anticipated early discharge. If bias is not considered, case managers may be preferentially assigned to patients with more resources (wealth, "better" insurance, resource-rich neighborhoods) over underprivileged patients, who may need more support and who, due to that lack of resources, typically have a longer length of stay.
  • Conformity Bias. This is the tendency for individuals to adhere to group norms of behavior. In UM, this form of bias may develop in a feedback loop between the AI and the Physician Advisor (PA). As an example, a PA group has a large data set of previous PA determinations and establishes an algorithm for decision support. The AI determines that when a patient with chest pain meets certain criteria, 80% of PA determinations are for observation services. Future cases of chest pain are presented to a PA whose performance review requires 95% interrater reliability with the peer group. Armed with the knowledge that 80% of peers made the determination of observation, the PA makes the same determination. Subsequent cycles of case review lead to an increasing rate of observation for patients presenting with chest pain (a toy simulation of this loop follows the list). This specific example sacrifices accuracy of the determination for consistency.
  • Informational Bias. This is a type of cognitive bias: the idea that more information equals better decision making, even if the extra information is irrelevant to the specific AI process. In healthcare, this may mean making an AI process unnecessarily complex without improving the overall value of the output. Conversely, UM AI may suffer from the inappropriate exclusion of information relevant to the determination. Consider the social determinants of hospitalization as an inpatient. Despite the fact that CMS recognizes social factors as a key component of the decision to admit, these are often discounted by payers, and even by the UM team, physician advisors included. As an example, an elderly patient living alone, three hours away from the hospital, with moderate cognitive deficits, has an elevated risk of adverse events without inpatient management; in combination with other factors, this may lead to the determination that medical necessity for hospitalization is present.
  • Perfection Bias. This is less a bias incorporated into the AI itself, and more a barrier to the broader acceptance of AI in healthcare. Even with deep learning, high levels of accuracy are elusive. Since a failure to diagnose a clinical condition may be devastating to the patient, and is a common reason for lawsuits in healthcare, there is hesitancy in implementing AI when sensitivity and specificity are low.
    • Consider a program designed to screen chest x-rays for changes of cancer. No one would expect the program to replace a radiologist if it missed 10% of cancers.
    • The AI team may struggle for years to achieve the lofty goal of 99.99% sensitivity for detecting cancer.
    • Early on, however, the same program may be able to prescreen imaging and, with near-100% accuracy, exclude malignancy from 40% of the studies. This has the effect of eliminating non-value-added imaging reviews, allowing the radiologist to focus on the higher-risk studies.
    • The practical application of AI in the interpretation of EKGs is a familiar example to most physicians. Initially, these programs were very inaccurate in their interpretations, especially of heart rhythms, and it was imperative that physicians carefully review each EKG; with subsequent refinements, machine interpretation has become highly accurate, especially for normal tracings. An argument can be made that a large percentage of EKGs do not require formal interpretation by a physician, and a large amount of physician time is lost in this non-value-added work. The practice appears to continue due to a combination of fear of litigation and insurers' continued payment of the physician interpretation fee.
  • Other Bias. Healthcare AI is primarily developed by software engineers and data scientists, (hopefully) with involvement of clinicians familiar with the real-world environment where the AI will operate. Unfortunately, assumptions are made regarding end users' actions, and these assumptions can be wrong. Here are some examples:
    • We developed an HL7-based monitoring system to notify case management when a traditional Medicare patient hospitalized as an inpatient has a discharge order placed before the second midnight of the admission. The purpose is to identify cases for urgent review, ensuring that either the documentation adequately supports the short inpatient stay or the patient is converted to observation services (a sketch of this trigger logic follows the list). At one hospital, it was common practice to place conditional discharge orders, e.g., "discharge if okay with cardiology." This was taken to the extreme by physicians placing discharge orders at the time of the initial evaluation in the emergency department. Understandably, this led to multiple false alarms. The algorithm functioned appropriately, but we failed to take into consideration the unusual behavior at this facility.
    • In another case, a natural language processing (NLP) program failed to accurately identify patients who were medically appropriate for inpatient status. An audit showed that physicians typically auto-populated their history and physical with the objective findings, past medical history, and a generic normal review of systems and exam. When presented with normal findings and a lack of physician documentation of the medical necessity for hospitalization, the AI more often than not determined that inpatient status was not appropriate. The programmers did not take into consideration the frequent lack of adequate physician documentation.
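
Returning to the coverage-bias example above, here is a minimal sketch of the kind of pre-deployment check a team might run: comparing each subgroup's share of the training data with its share of the population where the model will be deployed. The field names and the flagging threshold are illustrative assumptions, not a standard.

```python
from collections import Counter

def coverage_gaps(train, deploy, attr, min_ratio=0.5):
    """Flag subgroups whose share of the training set is less than
    `min_ratio` times their share of the deployment population."""
    train_share = {k: v / len(train)
                   for k, v in Counter(r[attr] for r in train).items()}
    deploy_share = {k: v / len(deploy)
                    for k, v in Counter(r[attr] for r in deploy).items()}
    gaps = []
    for group, share in deploy_share.items():
        if train_share.get(group, 0.0) < min_ratio * share:
            gaps.append((group, train_share.get(group, 0.0), share))
    return gaps

# Hypothetical rows: the calcium-score model trained mostly on young women,
# deployed to a hospital population that skews much older.
train = [{"age_band": "20-29"}] * 90 + [{"age_band": "70+"}] * 10
deploy = [{"age_band": "20-29"}] * 10 + [{"age_band": "70+"}] * 90
print(coverage_gaps(train, deploy, "age_band"))
# [('70+', 0.1, 0.9)] -- elderly patients are badly under-represented
```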
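The conformity-bias feedback loop can also be shown with a toy model. The parameters below (an independent observation rate of 60%, a 90% tendency to match the displayed peer majority) are invented for illustration; the point is only that the recorded observation rate ratchets up once advisors anchor to the tool's output.

```python
def simulate_conformity(peer_obs_rate=0.80, independent_obs_rate=0.60,
                        conform_prob=0.90, cycles=5):
    """Each cycle, a physician advisor matches the displayed peer
    majority with probability `conform_prob`, and otherwise judges
    independently; this cycle's determinations become the peer rate
    the tool displays next cycle."""
    rate = peer_obs_rate
    for cycle in range(1, cycles + 1):
        majority = 1.0 if rate >= 0.5 else 0.0
        rate = conform_prob * majority + (1 - conform_prob) * independent_obs_rate
        print(f"cycle {cycle}: observation rate {rate:.2f}")

simulate_conformity()
# cycle 1: observation rate 0.96 ... and it stays there, well above the
# 60% rate the advisors would produce on independent judgment alone.
```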
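Finally, here is a minimal sketch of the discharge-order trigger from the first "Other Bias" example, including the filter for conditional orders that the false alarms taught us we needed. Field names and the payer code are hypothetical.

```python
from datetime import datetime, timedelta

def second_midnight(admit: datetime) -> datetime:
    """The second midnight following the inpatient admission time."""
    first = (admit + timedelta(days=1)).replace(hour=0, minute=0,
                                                second=0, microsecond=0)
    return first + timedelta(days=1)

def urgent_um_review(payer: str, status: str, admit: datetime,
                     discharge_order: datetime, conditional: bool) -> bool:
    """Trigger: traditional Medicare inpatient with a discharge order
    placed before the second midnight.  The `conditional` flag is the
    lesson from the false alarms: 'discharge if okay with cardiology'-
    style orders must be filtered out before the trigger fires."""
    return (payer == "medicare_traditional"
            and status == "inpatient"
            and not conditional
            and discharge_order < second_midnight(admit))

admit = datetime(2021, 8, 30, 14, 0)
order = datetime(2021, 8, 31, 9, 0)  # before the second midnight (Sep 1)
print(urgent_um_review("medicare_traditional", "inpatient",
                       admit, order, conditional=False))  # True
```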

SOLUTIONS

The developers of healthcare AI need to ensure that there is "machine fairness." It is equally important to require constant monitoring to ensure that unintended bias is identified, mitigated, or eliminated.

When considering the use of AI in healthcare, especially UM, perfection need not be the goal. AI should be leveraged to decrease the overall burden to the UM team. This includes systems that:

  • Continuously Monitor. Systems can work 24/7/365 and be programmed to alert on significant events, or "triggers," that require UM evaluation and possible intervention. E.g., a patient with traditional Medicare, hospitalized as an inpatient, with a discharge order on day two, potentially requiring a Condition Code 44 change.
  • Decrease "Unnecessary" Reviews. Well-designed systems can make appropriate inferences. E.g., a patient hospitalized in the ICU, on a ventilator, with pressor agents and septic shock, and with an inpatient order already placed (see the sketch after this list).
  • Provide Decision Support. Promoting increased accuracy in UM determinations and better decision making is a great upside of AI. E.g., flagging procedures that are "inpatient only," or supplying prompts regarding obscure rules and regulations relevant to the individual case.
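
As a sketch of the "obvious inpatient" inference above: a simple rule that lets clearly appropriate cases bypass the manual review queue. The case fields are hypothetical, and a real system would read them from the EHR rather than a dictionary.

```python
def skip_manual_review(case: dict) -> bool:
    """Obvious-inpatient inference from the text: ICU, ventilated, on
    pressors, in septic shock, inpatient order already placed -- no UM
    reviewer needs to screen this chart."""
    return (case.get("location") == "ICU"
            and bool(case.get("ventilator"))
            and bool(case.get("pressors"))
            and "septic shock" in case.get("diagnoses", [])
            and case.get("status_order") == "inpatient")

case = {"location": "ICU", "ventilator": True, "pressors": True,
        "diagnoses": ["septic shock"], "status_order": "inpatient"}
print(skip_manual_review(case))  # True -- the review queue skips this case
```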

Ultimately, in the foreseeable future, AI will remain an assistive process in healthcare, not a replacement for the healthcare profession. We can get excited about the future of healthcare, including more precision medicine. Thanks to increased patient data analytics and the presence of AI, technology assistance will (hopefully) become our best-new-data-friend, allowing for more intelligent workflows and faster, more accurate, value-based care. As we evolve with AI, we can expect to see improved data protection and patient monitoring, increased diagnostic accuracy, and boosts in clinical performance, along with additional fraud and data-security solutions. The future is looking bright with appropriately designed assistance from AI.

JBH Solutions is proficient in FHIR and offers physician leadership to help design better systems. Contact us today to build better solutions together.