Overview of Finnish national patient data repository for research on medical risk assessment
  • Viljami Männikkö,
  • Klaus Förger,
  • Henna Urhonen,
  • Jani Tikkanen,
  • Simo Antikainen,
  • Joona Munukka
Corresponding author: Viljami Männikkö

The Kanta Patient Data Repository (Kanta PDR) contains healthcare data on the population of Finland spanning more than a decade. The repository is a continuously expanding real-world dataset produced by many information systems and healthcare service providers. Kanta data has been accessible for secondary uses such as scientific research since 2019, and it can be requested from the Finnish authority Findata. However, before a request has been accepted, it is difficult to assess whether the accumulated data can answer a specific research question. Publicly available descriptions of the data structures in Kanta do not indicate how much those structures are used in practice. This publication supports future data use cases by providing a view of the overall availability of structured health data types in Kanta PDR, based on a sample of 96 200 medical histories of patients over 18 years old. We conclude that Kanta PDR is a promising source of real-world data for the development and evaluation of medical risk calculators within the Finnish population. The wide coverage of the Finnish population and the timeliness of the data are its strengths as a source of research data, also outside the Finnish context. However, limitations in data availability at the variable level need to be considered on a case-by-case basis. The main challenges in using Kanta data are multiple code systems for laboratory results, short durations of recorded data for specific data types, and structured formats that are missing or very rarely used, e.g., for tobacco and alcohol use.
26 Feb 2024: Submitted to TechRxiv
27 Feb 2024: Published in TechRxiv