New Method Utilizing Continuous Time Markov Chains to Analyze Evolution in the Nine States Model of Non-Alcoholic Fatty Liver Disease

It handles the disease in a more extended multistate model.


Introduction
Continuous-time Markov chains (CTMCs) are commonly used to model data obtained from longitudinal studies in medical research and to investigate the evolution and progression of diseases over time. Estes et al. [1] used multistate Markov chains to model the epidemic of nonalcoholic fatty liver disease (NAFLD). Younossi et al. [2] used multistate Markov chains to demonstrate the economic and clinical burden of NAFLD in the United States and Europe. According to the American Association for the Study of Liver Diseases, the American College of Gastroenterology, and the American Gastroenterological Association, the definition of NAFLD requires that (a) there is evidence of hepatic steatosis (HS), either by imaging or by histology, and (b) there are no causes of secondary hepatic fat accumulation such as significant alcohol consumption, use of steatogenic medications, or hereditary disorders [3]. This is the same definition established by the European Association for the Study of the Liver (EASL), the European Association for the Study of Diabetes (EASD), and the European Association for the Study of Obesity (EASO) [4]. NAFLD can be categorized histologically into nonalcoholic fatty liver (NAFL) or nonalcoholic steatohepatitis (NASH). NAFL is defined as the presence of ≥ 5% HS without evidence of hepatocellular injury in the form of hepatocyte ballooning. NASH is defined as the presence of ≥ 5% HS and inflammation with hepatocyte injury (ballooning), with or without any fibrosis.
NAFLD is a multistage disease process consisting of 9 stages, as depicted in Figure 1 [2]. As shown in the figure, the patient can move across the stages of the disease process. While remission is allowed from stage 4 (compensated liver cirrhosis, CC) to the earlier stages, once a patient reaches stage 5 (decompensated liver cirrhosis, DCC) the patient can only progress to HCC and liver transplantation, and remission is not allowed. The death state can be reached from any state. The patient can move from any of the first 5 stages to stage 8 (hepatocellular carcinoma, HCC), with a higher rate of progression from stage 4 (CC) or stage 5 (DCC) to stage 8 (HCC) than from the first 3 stages. A brief definition of each stage is given below Figure 1. The NAFLD stages are modeled as a time-homogeneous CTMC, meaning that the transition probability $p_{ij}(s, s+t)$ depends on the elapsed time $t$ and not on the starting time $s$, with transition intensities that are constant over time, exponentially distributed time spent within each state, and patients' events following a Poisson distribution. The states are finite and can be defined or identified based on various aspects such as clinical symptoms and invasive or noninvasive investigations. The gold standard for classifying histopathological changes in the liver is the invasive liver biopsy. It is presently the most trustworthy procedure for diagnosing the presence of steatohepatitis and fibrosis in NAFLD patients [5]. The limitations of this procedure are cost, sampling error, and procedure-related morbidity and mortality. MR imaging, by spectroscopy [6] or by proton density fat fraction [7], is an excellent noninvasive technique for quantifying HS and is widely used in NAFLD clinical trials [8]. The use of transient elastography (TE) to obtain the controlled attenuation parameter is a promising tool for quantifying hepatic fat in an ambulatory setting [9]. However, noninvasive quantification of HS in patients with NAFLD is limited in routine clinical care.
Among the most recent biological markers are keratin 18 (K18) and its caspase-cleaved fragments (cK18). There are many scoring systems that can identify the stages of the disease process [10]. NAFLD has a higher prevalence in individuals with risk factors such as visceral obesity, type 2 diabetes mellitus (T2DM), dyslipidemia, older age, male sex, and Hispanic ethnicity [11].
For simplicity, all individuals are assumed to enter the disease process at stage one and they are all followed up with the same length of time interval between measurements.
The paper is divided into 7 sections. Section 1 discusses the transition probabilities and transition rates. Section 2 reviews the mean sojourn time and its variance. Section 3 discusses the state probability distribution and its covariance matrix. Section 4 considers the life expectancy of the patients. Section 5 obtains the expected number of patients in each state. A hypothetical numerical example is used in section 6 to illustrate the above concepts. Lastly, a brief summary is given in section 7. In the supplementary materials, appendix A provides the Matlab code for rate matrix estimation, while appendix B provides the Matlab code for probability matrix estimation, using Matlab version 14.

1. Transition Rates and Probabilities
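NAFLD is modeled by a multistate Markov chain, which defines a stochastic process $\{X(t), t \ge 0\}$ over the nine states. For this multistate Markov model of the NAFLD disease process, the forward Kolmogorov differential equations are
$$\frac{d}{dt}P(t) = P(t)\,Q .$$
Solving the Kolmogorov differential equations gives the transition probability matrix
$$P(t) = e^{Qt},$$
which satisfies the following properties:
$$P(0) = I, \qquad p_{ij}(t) \ge 0, \qquad \sum_{j} p_{ij}(t) = 1 .$$
The Q matrix satisfies the following conditions:
$$q_{ij} \ge 0 \ (i \ne j), \qquad q_{ii} = -\sum_{j \ne i} q_{ij}, \qquad \sum_{j} q_{ij} = 0,$$
where $q_{ij}$ is the $(i,j)$ entry of the Q matrix and $p_{ij}(t)$ is the $(i,j)$ entry of $P(t)$, emphasizing that $p_{ij}(t)$ depends only on the length $t$ of the interval between observations and not on its starting time.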

Maximum Likelihood Estimation of the Q Matrix
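Let $n_{ijr}$ be the number of individuals in state $i$ at time $t_{r-1}$ and in state $j$ at time $t_r$, for $r = 1, \dots, m$. Conditioning on the distribution of individuals among states at $t_0$, the likelihood function for the vector of transition rates $\theta$ is
$$L(\theta) = \prod_{r=1}^{m} \prod_{i,j} \left[ p_{ij}(t_r - t_{r-1}) \right]^{n_{ijr}} .$$

According to Kalbfleisch and Lawless [12], applying the Quasi-Newton method to estimate the rates requires calculating the score function $S(\theta)$, a vector-valued function of the required rates. Its entries are the first derivatives of the log-likelihood with respect to $\theta$ and therefore involve the first derivatives of the transition probabilities:
$$S_u(\theta) = \frac{\partial \log L}{\partial \theta_u} = \sum_{r=1}^{m} \sum_{i,j} n_{ijr}\, \frac{\partial p_{ij}(w_r)/\partial \theta_u}{p_{ij}(w_r)}, \qquad w_r = t_r - t_{r-1}.$$
The term containing the second derivatives is assumed to be zero, so that the information matrix is
$$M_{uv}(\theta) = \sum_{r=1}^{m} \sum_{i,j} \frac{N_i(t_{r-1})}{p_{ij}(w_r)}\, \frac{\partial p_{ij}(w_r)}{\partial \theta_u}\, \frac{\partial p_{ij}(w_r)}{\partial \theta_v}, \qquad N_i(t_{r-1}) = \sum_{j} n_{ijr}.$$
The Quasi-Newton formula is
$$\theta_1 = \theta_0 + M(\theta_0)^{-1} S(\theta_0),$$
with the initial value $\theta_0$ chosen according to Klotz and Sharples [13].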
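The scoring iteration can be illustrated on the smallest possible case. The sketch below assumes a hypothetical two-state chain (state 1 moves to an absorbing state 2 at a single rate `theta`) observed once after an interval `w`, so that $p_{11} = e^{-\theta w}$ and $p_{12} = 1 - e^{-\theta w}$. It is not the nine-state estimation of appendix A; the function name, the counts, and the starting value are assumptions made for illustration.

```python
import math

def fisher_scoring(n_stay, n_move, w, theta0, iterations=25):
    """Scoring iteration theta <- theta + S(theta) / M(theta), one-rate model."""
    theta = theta0
    n_start = n_stay + n_move            # individuals in state 1 at the start
    for _ in range(iterations):
        p_stay = math.exp(-theta * w)    # p_11(w)
        p_move = 1.0 - p_stay            # p_12(w)
        d_stay = -w * p_stay             # derivative of p_11 w.r.t. theta
        d_move = w * p_stay              # derivative of p_12 w.r.t. theta
        # Score: sum of n_ij * (dp_ij / dtheta) / p_ij over the two cells.
        score = n_stay * d_stay / p_stay + n_move * d_move / p_move
        # Information: N_1 * sum of (dp_ij / dtheta)**2 / p_ij.
        info = n_start * (d_stay ** 2 / p_stay + d_move ** 2 / p_move)
        theta += score / info
    return theta

# Hypothetical counts: 80 of 100 individuals remain in state 1 after w = 1.
theta_hat = fisher_scoring(n_stay=80, n_move=20, w=1.0, theta0=0.1)
```

For this one-parameter model the maximum likelihood estimate has the closed form $-\log(n_{11}/N_1)/w$, which gives a simple check on the iteration.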

Mean Sojourn Time:
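It is the mean time spent by a patient in a given state $i$ of the process. It is calculated from the estimated transition rates $\hat{\theta}$. These times are independent and exponentially distributed random variables with mean $s_i = 1/\lambda_i$, where $\lambda_i = -q_{ii} = \sum_{j \ne i} q_{ij}$.
According to Kalbfleisch and Lawless [12], the asymptotic variance of this time is calculated by applying the multivariate delta method:
$$\operatorname{var}(\hat{s}_i) \approx \left[ q_{ii}(\hat{\theta}) \right]^{-4} \sum_{u} \sum_{v} \frac{\partial q_{ii}}{\partial \theta_u}\, \frac{\partial q_{ii}}{\partial \theta_v}\, \left[ M(\hat{\theta})^{-1} \right]_{uv},$$
where $M(\hat{\theta})^{-1}$ is the estimated asymptotic covariance matrix of $\hat{\theta}$.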
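As a small illustration of the sojourn-time calculation, the sketch below reads the mean sojourn times off the diagonal of a rate matrix and applies the delta method in the simplest case, where the exit rate of a state is a single estimated parameter with known variance. The function names and the numbers are hypothetical; the general multi-parameter variance needs the full covariance matrix of the estimated rates.

```python
import math

def mean_sojourn_times(Q):
    """Mean sojourn time -1 / q_ii per state (infinite for absorbing states)."""
    return [(-1.0 / Q[i][i]) if Q[i][i] < 0 else math.inf
            for i in range(len(Q))]

def sojourn_variance(exit_rate, exit_rate_variance):
    """Delta method for s = 1 / rate: var(s) is about var(rate) / rate**4."""
    return exit_rate_variance / exit_rate ** 4

# Hypothetical 3-state chain with one absorbing state.
Q_toy = [[-0.30, 0.20, 0.10],
         [0.05, -0.25, 0.20],
         [0.00, 0.00, 0.00]]
sojourn = mean_sojourn_times(Q_toy)
```

The absorbing state is reported as infinite because it is never left.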

State Probability Distribution
According to Cassandras and Lafortune [14], the state probability distribution $\pi(t)$ gives the probability of each state at a specific time point given the initial probability distribution $\pi(0)$. Using the rule of total probability, a solution describing the transient behavior of a chain characterized by Q and an initial condition $\pi(0)$ is obtained by direct substitution:
$$\pi(t) = \pi(0)\,P(t) = \pi(0)\,e^{Qt}.$$
Differentiating both sides of this equation with respect to $t$ gives
$$\frac{d}{dt}\pi(t) = \pi(t)\,Q,$$
and $\pi(t)$ at a specific time point is obtained by solving this system of differential equations. Solving these differential equations for this complex chain is not a trivial matter. The stationary probability distribution is the distribution reached as $t$ goes to infinity, in other words when the process no longer depends on time. If the limit $\pi = \lim_{t \to \infty} \pi(t)$ exists, there is a stationary or steady-state distribution, and $d\pi(t)/dt \to 0$ as $t \to \infty$, since $\pi$ does not depend on time. Therefore $d\pi(t)/dt = \pi(t)\,Q$ reduces to $\pi Q = 0$. The stationary state probability distribution is obtained by solving $\pi Q = 0$ subject to $\sum_{j} \pi_j = 1$.
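Since the state probabilities follow a linear system of differential equations, they can be approximated numerically when a closed form is inconvenient. The sketch below integrates $d\pi(t)/dt = \pi(t)Q$ with a classical fourth-order Runge-Kutta scheme; the choice of integrator, the function name, and the toy chain are assumptions for illustration and are not taken from the paper.

```python
def state_distribution(pi0, Q, t, steps=1000):
    """Integrate d(pi)/dt = pi Q from 0 to t with fixed-step RK4."""
    n = len(Q)
    h = t / steps

    def deriv(pi):
        # Row vector times matrix: component j is sum_i pi_i * q_ij.
        return [sum(pi[i] * Q[i][j] for i in range(n)) for j in range(n)]

    pi = list(pi0)
    for _ in range(steps):
        k1 = deriv(pi)
        k2 = deriv([pi[j] + 0.5 * h * k1[j] for j in range(n)])
        k3 = deriv([pi[j] + 0.5 * h * k2[j] for j in range(n)])
        k4 = deriv([pi[j] + h * k3[j] for j in range(n)])
        pi = [pi[j] + h / 6.0 * (k1[j] + 2 * k2[j] + 2 * k3[j] + k4[j])
              for j in range(n)]
    return pi

# Hypothetical 3-state chain; everyone starts in state 1.
Q_toy = [[-0.30, 0.20, 0.10],
         [0.05, -0.25, 0.20],
         [0.00, 0.00, 0.00]]
pi_5 = state_distribution([1.0, 0.0, 0.0], Q_toy, 5.0)
```

In this toy chain the third state is absorbing and reachable from the others, so for large `t` nearly all probability accumulates there.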

Asymptotic Covariance of the State Probability Distribution:
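The multivariate delta method is applied to the function $\pi(\theta)$ to obtain the asymptotic covariance matrix of the state probability distribution, as $\pi$ is not a simple function of $\theta$. The stationary equation $\pi Q = 0$ is differentiated implicitly with respect to $\theta_u$ in the following manner:
$$\frac{\partial \pi}{\partial \theta_u}\,Q + \pi\,\frac{\partial Q}{\partial \theta_u} = 0, \qquad \sum_{j} \frac{\partial \pi_j}{\partial \theta_u} = 0 .$$
With $A(\theta)$ denoting the matrix whose $(j,u)$ entry is $\partial \pi_j / \partial \theta_u$, the asymptotic covariance matrix is
$$\operatorname{cov}(\hat{\pi}) \approx A(\hat{\theta})\, \operatorname{cov}(\hat{\theta})\, A(\hat{\theta})^{T} .$$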

Life Expectancy of Patient in NAFLD Disease Process:
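The disease process is composed of 8 transient states and one absorbing state (the death state). The Q matrix is therefore partitioned into 4 blocks, written here as
$$Q = \begin{pmatrix} B & A \\ 0 & 0 \end{pmatrix},$$
where $B$ is the $8 \times 8$ matrix of transition rates among the transient states, $A$ is the $8 \times 1$ vector of transition rates from the transient states to death, and the last row is zero. The differential equations can be partitioned in the same way; for the transient block, $\frac{d}{dt}P_B(t) = P_B(t)\,B$, so that $P_B(t) = e^{Bt}$. The time to death from transient state $i$ then has the cumulative distribution function
$$F_i(t) = p_{i9}(t) = 1 - \sum_{j=1}^{8} \left[ e^{Bt} \right]_{ij} .$$
The moment theory for the Laplace transform can be used to obtain the mean of the time that has the above cumulative distribution function. The CTMC can be written in terms of the Laplace transform $P^{*}(s) = \int_0^{\infty} e^{-st} P(t)\,dt$ such that
$$s\,P^{*}(s) - P(0) = P^{*}(s)\,Q, \qquad P^{*}(s) = (sI - Q)^{-1} .$$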
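For a chain with a single absorbing state, the mean time to absorption from each transient state solves a linear system built from the transient block of Q: with $m_i$ the expected time to death from state $i$, $(-B)\,m = \mathbf{1}$, where $B$ holds the rates among the transient states. This is a standard absorbing-chain result, used here as a sketch of the life-expectancy calculation rather than as the paper's own derivation; the function name and the two-state block are hypothetical.

```python
def life_expectancy(B):
    """Solve (-B) m = 1 by Gauss-Jordan elimination with partial pivoting."""
    n = len(B)
    # Augmented matrix [-B | 1].
    A = [[-B[i][j] for j in range(n)] + [1.0] for i in range(n)]
    for col in range(n):
        # Pick the row with the largest pivot to keep the elimination stable.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

# Hypothetical transient block: state 1 -> state 2 at rate 1, state 1 -> death
# at rate 1, state 2 -> death at rate 2 (death column omitted from B).
B_toy = [[-2.0, 1.0],
         [0.0, -2.0]]
expected_years = life_expectancy(B_toy)
```

In the toy block, state 2 is left at rate 2, so its mean time to death is 0.5; state 1 spends 0.5 on average before leaving and then moves to state 2 half of the time, giving 0.5 + 0.5 × 0.5 = 0.75.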

Expected Number of Patients in Each State
Let $u_j(t)$ be the number of patients in state $j$ at time $t$. The initial number of patients is $u(0) = \sum_{j=1}^{8} u_j(0)$, as there are 8 transient states and 1 absorbing state, where $u_j(0)$ is the initial number of patients in state $j$ at time $t = 0$, given that $u_9(0) = 0$, i.e. the initial number of patients in state 9 (the absorbing death state) is zero. As the transitions of the patients among the states are independent, at the end of the whole time interval $(0, t)$, and according to Chiang [15], the expected number of patients in state $j$ is
$$E\left[ u_j(t) \right] = \sum_{i=1}^{8} u_i(0)\, p_{ij}(t),$$
so that there will be $\sum_{j=1}^{8} u_j(t)$ patients in the transient states at time $t$ and $u_9(t)$ patients in state 9 (the death state) at time $t$.
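The expected number of patients per state is a vector-matrix product of the initial counts with the transition probability matrix. A minimal sketch follows, with a hypothetical function name and a made-up three-state probability matrix standing in for the nine-state one:

```python
def expected_counts(initial_counts, P):
    """Expected patients in state j: sum over i of u_i(0) * p_ij(t)."""
    n = len(P)
    return [sum(initial_counts[i] * P[i][j] for i in range(n))
            for j in range(n)]

# Hypothetical transition probabilities over one interval (rows sum to 1);
# the last state is absorbing and starts empty.
P_toy = [[0.7, 0.2, 0.1],
         [0.1, 0.6, 0.3],
         [0.0, 0.0, 1.0]]
counts_after = expected_counts([100, 50, 0], P_toy)
```

Because every row of the probability matrix sums to one, the expected counts add up to the initial cohort size.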

6. Hypothetical Numerical Example
To illustrate the above concepts and discussion, a hypothetical numerical example is introduced. It does not represent real data and is for demonstrative purposes only (see supplementary information, section 6). A study was conducted over 15 years on 1050 patients with risk factors for developing NAFLD, such as type 2 diabetes mellitus, obesity, and hypertension, acting alone or together as a metabolic syndrome. The patients were scheduled to be followed up every year by liver biopsy to identify the NAFLD cases; the actual observations were recorded as shown in the supplementary material. The transition rate matrix Q is estimated from these observations. The goodness of fit of the multistate model is assessed with a procedure like the one used for contingency tables; the statistic is calculated in each interval and then summed:
Step 1: tabulate the observed transition counts in the interval.
Step 2: calculate the transition probability matrix $P(t) = e^{\hat{Q}t}$ for the interval.
Step 3: calculate the expected counts in this interval by multiplying each row of the probability matrix by the corresponding marginal (row) total of the observed transition counts matrix in the same interval.
Step 4: calculate the contribution $\sum_{i,j} (O_{ij} - E_{ij})^2 / E_{ij}$ of the interval from the observed counts $O_{ij}$ and the expected counts $E_{ij}$.
Step 5: sum the above results over all intervals to obtain the overall statistic.
From these results, the null hypothesis is rejected and the alternative hypothesis is accepted; the model fits the data, meaning that the future state depends on the current state, with the estimated transition rate and probability matrices as obtained.
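The expected-count step and the contingency-table comparison can be sketched for a single interval as follows. The Pearson form of the statistic is an assumption based on the contingency-table analogy in the text, and the function name, counts, and probabilities are made up for illustration; cells with zero expected count are skipped.

```python
def pearson_chi_square(observed, P):
    """Sum of (O - E)**2 / E over cells, with E = row total * p_ij."""
    statistic = 0.0
    for observed_row, prob_row in zip(observed, P):
        row_total = sum(observed_row)          # patients starting in this state
        for o, p in zip(observed_row, prob_row):
            expected = row_total * p
            if expected > 0:                   # skip impossible transitions
                statistic += (o - expected) ** 2 / expected
    return statistic

# Hypothetical observed transition counts and model probabilities for one
# interval of a two-state chain.
observed_toy = [[70, 30],
                [10, 40]]
P_toy = [[0.80, 0.20],
         [0.25, 0.75]]
chi_square_interval = pearson_chi_square(observed_toy, P_toy)
```

Here the expected counts are 80, 20, 12.5, and 37.5, so the interval contributes 1.25 + 5 + 0.5 + 0.1667 ≈ 6.92; the overall statistic is the sum of such contributions over all intervals.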

Conclusion and Summary
Continuous-time Markov chains are suitable mathematical and statistical tools for analyzing disease evolution over time. CTMCs, being a type of multistate model, are used to study this evolution in NAFLD patients, with its main phenotypes NAFL and NASH, as well as the associated presence of fibrosis and its stages. The prevalence of NAFLD is rapidly increasing worldwide and parallels the epidemics of obesity and type 2 diabetes. Metabolic syndrome is a well-known risk factor.
In the present study, NAFLD is modeled in a more elaborate, expanded form that includes nine states: the first eight are the states of disease progression as time elapses, while the ninth is the death state. The importance of such analysis is that health policy makers can predict the number of affected patients at each stage, the investigations and medications needed for each of them, and the costs and budgets that medical insurance should assign to this disease burden. This analysis is also of great value to physicians, as they can conduct longitudinal studies to explore further investigations that define each stage more specifically and efficiently, as well as further treatment needed for each stage. An example of a noninvasive diagnostic tool is the circulating level of cytokeratin-18 fragments; although promising, it is not available in a clinical care setting and there is no established cut-off value for identifying steatohepatitis (NASH) [16]. Genetic variants of the patatin-like phospholipase domain-containing protein 3 gene (PNPLA3) are associated with NASH and advanced fibrosis; however, testing for these variants in routine clinical care is not supported and needs further study.
The hypothetical example with fictitious data is used to emphasize the attributes that need to be estimated. Such analysis may give better insights to physicians, especially when new drug classes are soon to be released in the market. Which drug classes are to be used first? How should the disease be monitored throughout the course of treatment? Which investigations should be used in such monitoring? How should the drug treatment be modified? What is the target that needs to be achieved, and how can it be maintained? Moreover, in the late stage of the disease, when patients suffer from decompensated liver cirrhosis, liver transplantation is the treatment of choice, which adds to the economic burden of NAFLD already incurred by treatment during the early stages. Further questions are which economical noninvasive tests are best used in primary health care units for stratification and identification of high-risk patients, whether to do genetic tests in a health insurance setting, and when to refer for liver biopsy in secondary or specialized clinics. All these questions can be answered from such longitudinal studies conducted on susceptible individuals. In addition, some of the recently investigated noninvasive scoring systems for fibrosis need further external validation so that they can be generalized to ethnicities other than the one in which they were tested. There are some controversies about the cutoff points of these scoring systems among countries, and among ethnicities within the same country. Although liver biopsy is considered the standard method for diagnosing and staging NAFLD, its limitations encourage the development of various noninvasive tests. This requires better correlation between the biopsy findings and the results of these tests, to minimize the misclassification errors that hamper good diagnosis and prognosis.
These tests should be easy, feasible, convenient, and safe enough to be used repeatedly for patient follow-up in such longitudinal studies.
The multistate model represented by a CTMC is a valuable statistical methodology for longitudinal studies in medical research. It helps to better understand the pathophysiology, or mechanism, of the NAFLD process and the interactions between its external and internal modifiers. The external modifiers are bad dietary habits with excessive fat and carbohydrate ingestion, as well as a sedentary life. The internal modifiers are genetic factors affecting the metabolism of food (fat and carbohydrates) and other cellular functions, such as risk factors for fibrogenesis (formation of fibrous tissue), since fibrosis is a detrimental predictor of disease progression to liver cirrhosis and its complications. Such understanding helps to reveal which genes should be tested if ever needed, for whom such a test should be done, and whether it should be among the services offered by medical insurance. Moreover, should the degradation byproducts resulting from extracellular matrix destruction be used in routine clinical practice to mirror the fibrosis stages?
In Egypt, data about the prevalence of NAFLD and its phenotypes are scarce, or perhaps not available at all. Guidelines for risk stratification and identification are also lacking. Thus, more longitudinal studies are needed to cover these issues.
Multistate models can also be used for the analysis of competing risks of death in such patients, as the first and second most common causes of death in NAFLD patients are cardiovascular diseases (CVD) and kidney diseases, while liver-related mortality is the third most common cause of death.
Other statistical methodologies, such as semi-Markov and hidden Markov chains, can be used to model NAFLD; in particular, a hidden Markov CTMC can be used to model the misclassification errors encountered in studies conducted with a time-homogeneous CTMC.

Declarations:
Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data and material
Not applicable. Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

Competing interests
The author declares no competing interests.

Funding
No funding source. No funding body had a role in the design of the study, in the collection, analysis, and interpretation of data, or in writing the manuscript.

Authors' contribution
The author carried out the mathematical analysis and applied these mathematical and statistical concepts to the hypothetical example.