Alcoholic liver disease: A registry view on comorbidities and disease prediction
Research output: Contribution to journal › Journal article › peer-review
Alcohol-related liver disease (ALD) is the cause of more than half of all liver-related deaths. Sustained excess drinking causes fatty liver and alcohol-related steatohepatitis, which may progress to alcoholic liver fibrosis (ALF) and eventually to alcohol-related liver cirrhosis (ALC). Unfortunately, it is difficult to identify patients with early-stage ALD, as these patients are largely asymptomatic. Consequently, the majority of ALD patients are diagnosed only once ALD has reached decompensated cirrhosis, a symptomatic phase marked by the development of complications such as bleeding and ascites. The main goal of this study is to discover relevant upstream diagnoses that help to explain the development of ALD, and to highlight meaningful downstream diagnoses that represent its progression to liver failure. Here, we use data from the Danish health registries, covering the entire population of Denmark over nineteen years (1996-2014), to examine whether it is possible to identify patients likely to develop ALF or ALC based on their past medical history. To this end, we explore a knowledge discovery approach, using high-dimensional statistical and machine learning techniques to extract and analyze data from the Danish National Patient Registry. Consistent with the late diagnosis of ALD, we find that ALC is the most common form of ALD in the registry data and that ALC patients have a strong over-representation of diagnoses associated with liver dysfunction. By contrast, we identify a small number of patients diagnosed with ALF who appear to be much less sick than those with ALC. We perform a matched case-control study using the group of patients with ALC as cases and matched patients without ALD as controls. Machine learning models (SVM, RF, LightGBM and NaiveBayes) trained and tested on the set of ALC patients achieve high classification performance (AUC = 0.89).
When the same trained models are tested on the small set of ALF patients, their performance unsurprisingly drops substantially (AUC = 0.67 for NaiveBayes). The statistical and machine learning results underscore small groups of upstream and downstream comorbidities that accurately detect ALC patients and show promise in the prediction of ALF. Some of these groups are conditions caused either by alcohol or by malnutrition associated with alcohol overuse. Others are comorbidities related either to trauma and lifestyle or to complications of cirrhosis, such as oesophageal varices. Our findings highlight the potential of this approach to uncover knowledge in registry data related to ALD.
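To make the reported AUC values easier to interpret, the following minimal sketch computes AUC directly from its rank-based definition: the probability that a randomly chosen case receives a higher model score than a randomly chosen control, with ties counted as one half. The function name `auc` and the toy labels and scores are illustrative assumptions only; they are not registry data and do not reproduce the study's models or results.

```python
from itertools import product


def auc(labels, scores):
    """Rank-based AUC for binary labels (1 = case, 0 = control).

    Returns the fraction of (case, control) pairs in which the case has
    the higher score; tied pairs contribute 0.5.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for case_score, control_score in product(cases, controls):
        if case_score > control_score:
            wins += 1.0
        elif case_score == control_score:
            wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy, made-up example (NOT registry data): three cases, three controls.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.5, 0.4, 0.6, 0.3, 0.2]
print(f"AUC = {auc(labels, scores):.2f}")
```

Under this definition, an AUC of 0.5 corresponds to scores that carry no information about case status and 1.0 to perfect separation, which is one way to read the gap between the ALC and ALF results above.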
Author summary
Alcoholic liver disease (ALD) is one of the most common chronic liver diseases worldwide. It progresses from fatty liver to alcoholic liver fibrosis and then to cirrhosis. Unfortunately, people with early-stage ALD have almost no symptoms, which is why most patients are diagnosed only when it is already too late. We have thus worked on finding an effective way to detect ALD at an early stage by searching for signs of alcohol overuse among patients. To this end, we analyzed large-scale data from the Danish National Patient Registry, which covers the whole population of Denmark. We found that only 499 patients were diagnosed with alcoholic liver fibrosis and that alcoholic liver cirrhosis is the most frequent form of ALD in the registry data. We identified typical diagnoses seen in patients before they developed cirrhosis, many of which relate to the liver not functioning properly. However, we found that patients with fibrosis were much harder to identify based on their past medical records, since most of them show very few signs of being sick.
|Journal|PLoS One|
|Number of pages|19|
|Publication status|Published - 2020|
- DATA QUALITY, CIRRHOSIS, RISK, MANAGEMENT, PATIENT, HISTORY, BURDEN