
Opportunities for Automated Liver Disease Risk Prediction in the Finnish Healthcare Environment
  • Viljami Männikkö,
  • Janne Tommola,
  • Emmi Tikkanen,
  • Olli-Pekka Hätinen,
  • Fredrik Åberg
Corresponding author: Viljami Männikkö



Chronic liver disease incidence and mortality have been rising worldwide. Liver disease is often detected late, in the symptomatic stage, although earlier detection would be crucial for initiating preventive actions early. The Chronic Liver Disease (CLivD) score is a risk prediction model developed with Finnish healthcare data that predicts a person's risk of developing chronic liver disease in the coming years. In this study, the Finnish national real-world health data repository Kanta was used as the data source for calculating the CLivD score. Our dataset consisted of 96 200 individuals from Kanta. We had two main objectives: 1) to evaluate the feasibility of implementing automatic CLivD score calculation on the current Kanta platform, and 2) to identify and suggest improvements to Kanta that would enable accurate automatic risk detection. We found that Kanta currently lacks many CLivD input parameters in the structured format required to calculate precise risk scores. However, the risk scores can be improved by utilizing the unstructured free text in patient reports and by approximating variables from other health data, such as diagnosis information. Using structured data alone, we were able to assign only 33 of 51 275 persons to the "low risk" category and under 1% to the "moderate risk" category. By adding diagnosis-based approximations and free-text utilization, we were able to assign 37% of persons to the "low risk" category and 4% to the "moderate risk" category. In neither case could any person be assigned to the "high risk" category, because waist-hip ratio measurements were missing. We evaluated three scenarios for improving the coverage of waist-hip ratio data in Kanta, and these yielded the most substantial improvement in prediction accuracy. We conclude that the current structured Kanta data are insufficient for precise risk calculation for CLivD or for other diseases in which obesity, smoking and alcohol use are important risk factors.
Our simulations show an improvement of up to 14% in risk detection when additional data sources are considered. Kanta shows potential as a platform for nationwide automated risk detection models, which could improve disease prevention and public health.
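The abstract explains that missing inputs, above all the waist-hip ratio, keep many persons from being placed in a risk category, and that approximating inputs from diagnosis data helps. The Python sketch below illustrates that idea under loudly stated assumptions. The linear score, its weights, the input ranges and the category cut-offs (`WEIGHTS`, `LOW_MAX`, `MODERATE_MAX`) are hypothetical placeholders, not the published CLivD coefficients or thresholds. Treating a missing input as spanning its whole plausible range (interval bounds) is an illustrative technique, not necessarily the authors' method. The hypothetical `approximate_diabetes` proxy uses ICD-10 codes E10-E14 (diabetes mellitus) and assumes that the absence of such a code means no diabetes.

```python
# Minimal sketch with HYPOTHETICAL placeholders: these weights, ranges and
# cut-offs are NOT the published CLivD coefficients or risk thresholds.
from typing import Dict, Iterable, Optional, Tuple

# name -> (weight per unit, lowest plausible value, highest plausible value)
WEIGHTS: Dict[str, Tuple[float, float, float]] = {
    "age": (0.05, 18.0, 100.0),
    "male": (0.5, 0.0, 1.0),
    "diabetes": (0.7, 0.0, 1.0),
    "smoker": (0.4, 0.0, 1.0),
    "alcohol_drinks_per_week": (0.03, 0.0, 70.0),
    "waist_hip_ratio": (2.0, 0.6, 1.3),
}
LOW_MAX = 4.0       # hypothetical: scores below this are "low"
MODERATE_MAX = 6.0  # hypothetical: scores in [LOW_MAX, MODERATE_MAX) are "moderate"

# ICD-10 codes E10-E14 cover diabetes mellitus.
DIABETES_PREFIXES = ("E10", "E11", "E12", "E13", "E14")


def approximate_diabetes(icd10_codes: Iterable[str]) -> float:
    """Hypothetical proxy: 1.0 if any diagnosis is in E10-E14, else 0.0.

    Assumes that the absence of a recorded diabetes diagnosis means no diabetes.
    """
    return 1.0 if any(c.startswith(DIABETES_PREFIXES) for c in icd10_codes) else 0.0


def score_bounds(inputs: Dict[str, Optional[float]]) -> Tuple[float, float]:
    """Lowest and highest achievable score; a missing input spans its whole range."""
    low = high = 0.0
    for name, (weight, vmin, vmax) in WEIGHTS.items():
        value = inputs.get(name)
        if value is None:
            # Missing input: add the smallest and largest term it could contribute.
            a, b = weight * vmin, weight * vmax
            low += min(a, b)
            high += max(a, b)
        else:
            low += weight * value
            high += weight * value
    return low, high


def category(score: float) -> str:
    """Map a single score to a category using the hypothetical cut-offs."""
    if score < LOW_MAX:
        return "low"
    if score < MODERATE_MAX:
        return "moderate"
    return "high"


def risk_category(inputs: Dict[str, Optional[float]]) -> str:
    """Assign a category only if both score bounds fall in the same category."""
    low, high = score_bounds(inputs)
    cat_low, cat_high = category(low), category(high)
    return cat_low if cat_low == cat_high else "undetermined"


if __name__ == "__main__":
    # Diabetes approximated from diagnoses; waist-hip ratio not recorded.
    base = {"male": 0, "smoker": 0, "alcohol_drinks_per_week": 0,
            "diabetes": approximate_diabetes(["I10", "J45"])}
    print(risk_category(dict(base, age=20)))                       # low
    print(risk_category(dict(base, age=40)))                       # undetermined
    print(risk_category(dict(base, age=40, waist_hip_ratio=0.8)))  # low
```

In this sketch a person is placed in a category only when the lowest and highest achievable scores agree, so every additional recorded or approximated input narrows the interval and can move a person out of "undetermined".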
24 May 2024: Submitted to TechRxiv
30 May 2024: Published in TechRxiv