Preventing Non-attendance in Outpatient Appointments: Predictive Model Development, Validation, and Clinical Assessment

Background: Non-attendance at scheduled hospital outpatient appointments may compromise healthcare resource planning, which ultimately reduces the quality of healthcare provision by delaying assessments and increasing waiting lists. We developed a model for predicting non-attendance and assessed the effectiveness of an intervention for reducing non-attendance based on the model. Methods: Candidate models were built using retrospective data from appointments scheduled between January 1, 2015, and November 30, 2018, in the dermatology and pneumology outpatient services of the Hospital Municipal de Badalona (Spain). The predictive capacity of the selected model was then validated prospectively with appointments scheduled between January 7 and February 8, 2019. The effectiveness of selective phone call reminders to patients at high risk of non-attendance according to the model was assessed on all consecutive patients with at least one appointment scheduled between February 25 and April 19, 2019. Patients identified by the model as high risk of non-attendance were randomly assigned to either a control (no intervention) or an intervention group, the latter receiving phone call reminders one week before the appointment. Results: Models were trained and selected using 33,329 appointments in the dermatology service and 21,050 in the pneumology service. Average results for specificity and balanced accuracy for the prediction of non-attendance were 79.90% and 73.49% for dermatology, and 71.38% and 64.61% for pneumology outpatient services. The prospective validation showed a specificity of 78.34% (95%CI 71.07, 84.51) and a balanced accuracy of 70.45% for dermatology, and a specificity of 69.83% (95%CI 60.61, 78.00) and a balanced accuracy of 65.53% for pneumology. The effectiveness of the intervention was assessed on 1,311 individuals identified as high risk of non-attendance according to the selected model. Overall, the intervention resulted in a significant reduction in the non-attendance rate in both the dermatology and pneumology services, with a decrease of 50.61% (p<0.001) and 39.33% (p=0.048), respectively. Conclusions: The risk of non-attendance can be adequately estimated using patient information stored in medical records. The patient stratification according to the non-attendance


Background
Non-attendance, defined as a missed appointment without prior notification, is an important obstacle for adequate management of healthcare centers. High non-attendance rates are associated with increased waiting lists and healthcare and societal costs, as well as reduced effectiveness and efficiency of the healthcare system [1,2]. At the patient level, missed appointments may lead to inadequate follow-up and late diagnosis or complication management, thus increasing the health risk of non-attendees. Reported non-attendance rates worldwide are highly heterogeneous and range from 13.2% (average countries in Oceania) to 43.0% (Africa); the estimated average rate in Europe is 19.3% [3].
Various authors have proposed interventions to reduce the harmful effects of non-attendance, such as overbooking [4] and open access [5], or to improve attendance rates directly, for example, by providing information, reminders, and incentives to patients [6][7][8]. Of these, appointment reminders based on short message services (SMS) and telephone calls have been widely used [9][10][11]. Although current evidence suggests equal effectiveness of both interventions, reported results are heterogeneous, and most studies have a low-quality design [10].
Regardless of the reminding strategy, identifying patients at higher risk of non-attendance may reduce costs and resources, thus increasing the sustainability of the intervention. The determinants of non-attendance are complex and may include patient-related factors (e.g., age and gender), their previous attendance history, and factors associated with the given appointment (e.g., time elapsed since the scheduling date, and weekday and season of the appointment) [12,13]. In the last few years, a growing number of models for predicting no-shows have been proposed; however, most of them achieved an accuracy lower than the attendance rate [14]. The poor performance may be attributed to multiple factors that challenge model development, such as the type of data available or the sample size. Furthermore, the high variability of non-attendance rates worldwide suggests that behavioral determinants of non-attendance and the effectiveness of mitigating measures may depend on the country and healthcare system organization.
Therefore, we aimed to develop a model for predicting patients' non-attendance and assess the effectiveness of selective phone calls to patients at high risk of non-attendance according to the resulting model.

Overview of study design
This study was conducted at two outpatient services (i.e., dermatology and pneumology) of the Hospital Municipal de Badalona (Spain) and included three phases: (1) the development of a non-attendance predictive model for each outpatient service, (2) the prospective validation of the resulting models, and (3) a pilot study to assess the effectiveness of integrating the predictive model into the organization of the healthcare provider.
Candidate models were developed using retrospective data from appointments scheduled between January 1, 2015, and November 30, 2018. Data were randomly assigned to one of the following two sets: 75% of the collected data were used for model building and algorithm training, and the remaining 25% were used in a retrospective validation of the model. The predictive capacity of the selected model was then validated prospectively using data from appointments scheduled between January 7 and February 8, 2019. Finally, we conducted a pilot study to assess the effectiveness of a preventive intervention based on selective phone call reminders to patients identified as high-risk of non-attendance according to the selected model. The pilot study was conducted between February 25 and April 19, 2019.
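The random 75%/25% assignment described above can be illustrated with a minimal sketch. The function name, the seed, and the integer identifiers standing in for appointment records are assumptions made for illustration; the study's own analyses were run in R and its exact splitting procedure is not reported.

```python
import random


def split_train_validation(records, train_fraction=0.75, seed=2019):
    """Randomly assign records to a training set and a validation set."""
    rng = random.Random(seed)      # fixed seed (an assumption) for reproducibility
    shuffled = list(records)       # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical appointment identifiers standing in for the real records.
appointments = list(range(1000))
train, validation = split_train_validation(appointments)
print(len(train), len(validation))
```

Every record ends up in exactly one of the two sets, so performance measured on the validation set is not influenced by records seen during training.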
All data, including retrospective information for model building and prospective information of the pilot study, were collected anonymously and handled according to the General Data Protection Regulation 2016/679 on data protection and privacy for all individuals within the European Union and the local regulatory framework regarding data protection. All the experimental protocols were conducted in accordance with the Declaration of Helsinki and approved by the independent research committee of Badalona Serveis Assistencials, which waived the need for obtaining informed consent.

Variables collected for model development and validation
We collected three types of variables from the Electronic Medical Record database: sociodemographic characteristics of patients, characteristics of the appointment, and history of patients' attendance. Sociodemographic characteristics included gender, age, nationality, marital status, and home address, which was used to calculate the distance from the patient's home to the hospital. Characteristics of the appointment included hour, weekday, month, type of visit (first, second, successive), the reason for the visit, treatment category, physician, lead time (days from scheduling to the appointment date), and rescheduling. Variables regarding the record of the patient's attendance included the history of previous attendance, number of prior visits, days since the last appointment, and the last appointment status.
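As an illustration of how some of the date-based appointment variables listed above could be derived, the following sketch computes lead time, weekday, month, and days since the last appointment from three dates. The function, the field names, and the example dates are hypothetical and do not come from the study database.

```python
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def appointment_features(scheduled_on, appointment_on, last_appointment_on=None):
    """Derive date-based predictors for one appointment."""
    days_since_last = None
    if last_appointment_on is not None:
        days_since_last = (appointment_on - last_appointment_on).days
    return {
        # Lead time: days from the scheduling date to the appointment date.
        "lead_time_days": (appointment_on - scheduled_on).days,
        "weekday": WEEKDAYS[appointment_on.weekday()],
        "month": appointment_on.month,
        "days_since_last": days_since_last,
    }


# Hypothetical example dates.
features = appointment_features(
    scheduled_on=date(2019, 1, 2),
    appointment_on=date(2019, 2, 8),
    last_appointment_on=date(2018, 11, 30),
)
print(features)
```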

Predictive model development and validation
All variables with a significant association with non-attendance were included in training algorithms based on the following models: decision trees, XGBoost, Support Vector Machines (SVM), and k-nearest neighbor (kNN). A 5-fold cross-validation was used in the training, and the class imbalance (approximately 80% of attendees and 20% of non-attendees) was addressed by stratified random sampling. For the final model, we selected the combination of variables with the best capacity for predicting non-attendance.
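One possible reading of the stratified 5-fold scheme is sketched below: indices are shuffled within each class and dealt round-robin into folds, so that every fold keeps roughly the overall 80/20 ratio. This is an assumption about how stratification could be implemented, not a description of the code used in the study; the labels are synthetic.

```python
import random
from collections import defaultdict


def stratified_k_folds(labels, k=5, seed=0):
    """Return k lists of indices, each with roughly the overall class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)   # deal indices round-robin
    return folds


# Synthetic labels: 1 = non-attended (20%), 0 = attended (80%).
labels = [1] * 200 + [0] * 800
folds = stratified_k_folds(labels)

for held_out, validation_idx in enumerate(folds):
    # All remaining folds form the training set for this iteration.
    training_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
    missed = sum(labels[i] for i in validation_idx)
    print(held_out, len(training_idx), len(validation_idx), missed)
```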
The performance of the obtained model was retrospectively assessed using the dataset reserved for this purpose. Because the model was intended to identify patients at high risk of non-attendance, specificity, defined as the proportion of real non-attendees identified by the algorithm as high-risk (i.e., ≥50% likelihood of non-attendance), was used for measuring performance. Sensitivity (i.e., the proportion of real attendees identified as low-risk) and accuracy (i.e., the proportion of appointments predicted correctly) were also estimated. The model performance in predicting non-attendance was prospectively validated using the same definitions of performance as for the retrospective validation. The only exception was considering balanced accuracy instead of raw accuracy because of class imbalance in the prospective validation.
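The sketch below shows how predicted probabilities could be turned into the contingency table from which these measures are derived. It assumes that attendance is treated as the positive class, so that specificity is the share of real non-attendees flagged as high-risk (the reading that matches the validation counts reported in this paper). The function name and the example values are hypothetical.

```python
def contingency_table(non_attendance_probabilities, attended, threshold=0.5):
    """Tally predictions, treating attendance as the positive class."""
    counts = {"TP": 0, "FN": 0, "TN": 0, "FP": 0}
    for probability, did_attend in zip(non_attendance_probabilities, attended):
        high_risk = probability >= threshold
        if did_attend and not high_risk:
            counts["TP"] += 1      # attendee correctly labeled low-risk
        elif did_attend and high_risk:
            counts["FN"] += 1      # attendee flagged as high-risk
        elif high_risk:
            counts["TN"] += 1      # non-attendee correctly flagged as high-risk
        else:
            counts["FP"] += 1      # non-attendee missed by the model
    return counts


# Hypothetical predictions for six appointments.
probabilities = [0.8, 0.6, 0.3, 0.1, 0.7, 0.2]
attended = [False, True, True, True, False, False]
table = contingency_table(probabilities, attended)

specificity = table["TN"] / (table["TN"] + table["FP"])
sensitivity = table["TP"] / (table["TP"] + table["FN"])
accuracy = (table["TP"] + table["TN"]) / sum(table.values())
print(table, specificity, sensitivity, accuracy)
```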

Pilot study
The pilot study included all consecutive patients with at least one appointment scheduled between February 25 and April 19, 2019, in either of the two involved services. The primary endpoint of the pilot study was the reduction of the non-attendance rate among patients at high risk of non-attendance (i.e., ≥50% likelihood of non-attendance according to the predictive model obtained). The week before the appointment, patients who were considered at high risk of non-attendance were randomly assigned to either a control or an intervention group, balanced regarding age and gender. Right after randomization (i.e., one week before the appointment), patients allocated to the intervention group received a reminder phone call (up to three contact attempts) in which they were encouraged to either attend or cancel the visit in advance, whereas those in the control group did not receive any reminder. The outcomes related to the appointment reminder (i.e., whether the patient was reached, appointment cancellation or rescheduling, appointment attendance) were recorded. A post-intervention self-guided debriefing session was conducted on April 26, 2019, following a 3-phase conversational structure, including reaction, analysis, and summary phases [15]. Two dermatology and two pneumology specialists, together with the head of administrative management and three directors (Medical Officer, Information Officer, and Management Officer), participated in the conversation.
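The text does not specify how balance regarding age and gender was achieved. One common option, shown here purely as an assumption, is stratified randomization: patients are grouped by age band and gender, and arms are alternated within each shuffled stratum. The 20-year age bands, the field names, and the synthetic patients are all hypothetical.

```python
import random
from collections import defaultdict


def randomize_balanced(patients, seed=0):
    """Assign patients to arms within (age band, gender) strata."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for patient in patients:
        age_band = patient["age"] // 20          # assumed 20-year age bands
        strata[(age_band, patient["gender"])].append(patient["id"])
    assignment = {}
    for ids in strata.values():
        rng.shuffle(ids)
        offset = rng.randrange(2)   # so odd-sized strata do not always favor one arm
        for position, patient_id in enumerate(ids):
            arm = "intervention" if (position + offset) % 2 == 0 else "control"
            assignment[patient_id] = arm
    return assignment


# Synthetic high-risk patients.
patients = [
    {"id": i, "age": 20 + (i * 7) % 60, "gender": "F" if i % 2 else "M"}
    for i in range(40)
]
assignment = randomize_balanced(patients)
arms = list(assignment.values())
print(arms.count("intervention"), arms.count("control"))
```

Within each stratum the two arms differ in size by at most one patient, which keeps the age and gender distributions similar across groups.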

Statistical analysis
Quantitative variables were presented as the mean and standard deviation (SD), and qualitative variables as frequency and percentage. Non-attendance rates were calculated by dividing the number of non-attended visits by the number of scheduled visits in a given period. Remote appointments and appointments with a negative waiting time (i.e., entered into the program after the visit) were excluded from the analysis. Sensitivity, specificity, and accuracy were estimated directly from the contingency table of predicted and real missed appointments, whereas balanced accuracy was calculated as (sensitivity + specificity)/2. Qualitative variables were compared using the Chi-Square test, whereas quantitative variables were compared using analysis of variance (ANOVA). Correlations between quantitative variables were analyzed using the Pearson correlation test, whereas correlations between qualitative variables were analyzed with Cramer's V coefficient. The significance threshold was set at a two-sided alpha value of 0.05. All analyses were performed using the R software (version 3.6.1).
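For readers unfamiliar with Cramer's V, the following sketch computes it from a contingency table using the standard definition based on the Pearson chi-square statistic, without continuity or bias correction. It is an illustration in Python rather than the R code used in the study, and the example tables are synthetic.

```python
from math import sqrt


def cramers_v(table):
    """Cramer's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    smaller_dimension = min(len(table), len(table[0]))
    return sqrt(chi2 / (n * (smaller_dimension - 1)))


# Synthetic tables: perfect association and no association.
perfect = cramers_v([[10, 0], [0, 10]])
independent = cramers_v([[5, 5], [5, 5]])
print(perfect, independent)
```

The coefficient ranges from 0 (no association) to 1 (perfect association), which makes it comparable across tables of different sizes.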

Variable analysis
Non-attendance algorithms were developed using data from 33,329 appointments scheduled in the dermatology service and 21,050 in the pneumology service. The global non-attendance rates of these appointments were 20.90% and 18.37% for dermatology and pneumology outpatient services, respectively. When comparing the sociodemographic characteristics, appointment characteristics, and attendance history of patients who attended the appointment in the dermatology outpatient service and those who did not, significant differences were observed in all variables except gender and marital status (Table S2, Supplementary file 1). Similarly, all variables showed a significant association with non-attendance in appointments in the pneumology outpatient service, except gender, physician, and number of reschedules (Table S3). We found no strong correlations between variables, either categorical or numerical (Table S4).
After assessing the specificity, accuracy, and sensitivity of four training algorithms, we selected the decision trees algorithm for model development. The details regarding model training and selection are provided in Supplementary methods and Table S1 (Supplementary file 1). Figure 1A and Figure 2A show the design of the resulting predictive models for dermatology and pneumology outpatient services, respectively. In the dermatology predictive model, the patient's history of previous attendance was the most relevant factor to predict non-attendance in the future, followed by major ambulatory surgery, the status of the last appointment, number of prior visits, and age (Figure 1B). This model displayed a specificity of 79.90%, a sensitivity of 67.09%, and an accuracy of 73.49%. Similarly, in the pneumology predictive model, the patient's previous attendance history was also the most important variable to predict non-attendance, followed by lead time, the status of the last appointment, number of prior visits, and number of days since the last visit (Figure 2B). The specificity, sensitivity, and accuracy of this model were 71.38%, 57.84%, and 64.61%, respectively.

Model validation
The prospective validation of the non-attendance predictive models included 758 and 637 appointments in the services of dermatology and pneumology, respectively. In the dermatology service, the predictive model identified 348 (45.91%) appointments at high risk (i.e., ≥50% likelihood) of non-attendance, 123 of which were actually missed appointments. The total number of real non-attendances was 157, thus yielding a specificity of the model of 78.34% (95%CI 71.07, 84.51). The sensitivity and balanced accuracy of this model were 62.56% (95%CI 71.07, 84.51) and 70.45%, respectively. Correspondingly, 283 (44.43%) appointments scheduled in the pneumology service were identified as high risk of non-attendance, 81 of which were missed appointments. The total number of real non-attendances was 116, resulting in a specificity of 69.83% (95%CI 60.61, 78.00). The sensitivity and balanced accuracy of the pneumology model were 61.23% (95%CI 56.89, 65.43) and 65.53%, respectively. Compared with the retrospective validation used during model development, specificity in the prospective validation was reduced by approximately 2%.
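The point estimates above can be reproduced from the counts given in the text. The sketch below assumes that specificity is the share of real non-attendances flagged as high risk and that sensitivity is the share of attended appointments classified as low risk; the function and argument names are illustrative. Confidence intervals are not recomputed because the interval method is not stated.

```python
def validation_metrics(total, high_risk, missed, missed_high_risk):
    """Recompute performance (in %) from the counts reported in the text."""
    attended = total - missed
    attended_high_risk = high_risk - missed_high_risk
    attended_low_risk = attended - attended_high_risk
    specificity = missed_high_risk / missed        # non-attendees flagged
    sensitivity = attended_low_risk / attended     # attendees labeled low-risk
    return {
        "specificity": round(100 * specificity, 2),
        "sensitivity": round(100 * sensitivity, 2),
        "balanced_accuracy": round(100 * (specificity + sensitivity) / 2, 2),
    }


dermatology = validation_metrics(total=758, high_risk=348, missed=157, missed_high_risk=123)
pneumology = validation_metrics(total=637, high_risk=283, missed=116, missed_high_risk=81)
print(dermatology)   # specificity 78.34, sensitivity 62.56, balanced accuracy 70.45
print(pneumology)    # specificity 69.83, sensitivity 61.23, balanced accuracy 65.53
```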

Pilot study
During the study period, 1,311 individuals had at least one appointment at either the dermatology or pneumology outpatient services that was identified as at high risk of non-attendance according to the selected model. Among them, 1,108 (805 and 303 in the dermatology and pneumology services, respectively) had available data and were, therefore, included in the analysis. Of the 805 patients with scheduled visits in the dermatology service, 390 (48.45%) were allocated to the intervention group and 415 (51.55%) to the control group. Correspondingly, 303 individuals had scheduled visits to the pneumology service, with 146 (48.18%) and 157 (51.82%) allocated to the intervention and control groups, respectively. Table 1 summarizes the baseline characteristics of the individuals enrolled in the pilot study. None of the variables showed significant differences between control and intervention groups, except the time from the last visit among individuals visited at the pneumology service, which was higher in the intervention group than in the control group. In the dermatology setting, 267 (68.46%) individuals allocated to the intervention arm were successfully contacted by phone. Of these, 251 attended the appointment, and 16 missed it (non-attendance rate 5.99%). Regarding the pneumology service, 95 (65.07%) individuals of the intervention group were successfully contacted; 86 of them attended the appointment, and 9 did not (non-attendance rate 9.47%). Table 2 summarizes the non-attendance rate of each group in each clinical setting. Overall, the interventions applied resulted in a significant decrease of the non-attendance rate for both dermatology and pneumology services, with a reduction of non-attendance of 50.61% and 39.33%, respectively.
In both services, non-attendance rates were significantly lower among individuals in the intervention group who were successfully contacted than among those who could not be reached (79.54% and 62.85% reductions for dermatology and pneumology services, respectively). All participants in the post-study debriefing consistently perceived the intervention as successful. However, two issues were identified: (1) the overload of the hospital agenda after preventing non-shows, and (2) the overburden of the administrative staff associated with phone calls to patients at high risk of non-attendance.
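The non-attendance rates among successfully contacted patients follow directly from the counts in the text, as the sketch below shows. The helper for relative reduction assumes that the reported reductions are relative to the comparison group; because the group-level rates are given only in Table 2, the values passed to that helper here are hypothetical.

```python
def non_attendance_rate(missed, scheduled):
    """Non-attendance rate (%) = missed visits / scheduled visits."""
    return 100 * missed / scheduled


def relative_reduction(control_rate, intervention_rate):
    """Relative reduction (%) of the intervention rate versus the control rate."""
    return 100 * (control_rate - intervention_rate) / control_rate


# Counts reported for patients successfully contacted by phone.
dermatology_contacted = round(non_attendance_rate(missed=16, scheduled=267), 2)
pneumology_contacted = round(non_attendance_rate(missed=9, scheduled=95), 2)
print(dermatology_contacted, pneumology_contacted)   # 5.99 9.47

# Hypothetical rates, only to show how the helper behaves.
example = relative_reduction(control_rate=20.0, intervention_rate=10.0)
print(example)   # 50.0
```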

Discussion
We found that the models that best predicted non-attendance in dermatology and pneumology outpatient services were based on decision trees and included the following variables: patient's history of previous attendance, major ambulatory surgery, status of the last appointment, number of previous visits, and age, for dermatology, and patient's history of previous attendance, lead time, status of the last appointment, number of previous visits, and number of days since the last visit, for pneumology. Using the prediction models to identify individuals at high risk of non-attendance for selective phone call reminders reduced the non-attendance rate by approximately 50% and 40% in dermatology and pneumology services, respectively.
The systematic review conducted by Carreras et al. showed that at least half of the studies on no-show prediction identified age, gender, distance from home to the healthcare center, weekday, visit time, lead time, and history of previous attendance as predictors of non-attendance; marital status and visit type (first or successive) were also frequently used [14]. Our findings were mostly in line with the results reported by Carreras et al., although we did not find an association between gender and non-attendance, as reported elsewhere [16,17]. Other studies described that non-attendance was associated with the number of previous appointments [18,19], the status of the last appointment [20,21], and the treatment category (e.g., surgery) [22], which was also consistent with our results. Regarding the relative importance of each variable in the model, the status of the last appointment, age, time of the day, lead time, and history of previous attendance are among the most important variables in the non-attendance predictive models presented in various analyses [12,20,23]. In our study, the history of previous attendance and the status of the last appointment also had a high weight in both models. In contrast, lead time and age were mainly important in pneumology and dermatology models, respectively. The time of the day had a small weight in both models.
Based on the performance results of the training algorithms, we chose decision trees to build our models, which was the second most frequently used algorithm to develop predictive models in the review of Carreras et al., after logistic regression [14]. The accuracy values reported in the review for models based on decision trees ranged from 76.5% to 89.6%, higher than the accuracy found in our analysis. However, most studies had a limited sample size and/or used the same dataset for training algorithms and assessing their performance. In contrast to this approach, which may lead to overfitting, we used an independent dataset for model validation. Therefore, although lower than reported elsewhere, we think our results may better reflect the expected accuracy of the model when applied in the real world.
Regardless of the validation approach, most studies reported accuracy values lower than the attendance rate [14]. This trend, also observed in our analysis, may be explained by the lack of data from other domains such as social, cultural, and socioeconomic factors that might have a relevant contribution to non-attendance behavior. Finally, we observed a poorer performance of the pneumology model compared with the dermatology model, which might also be due to differences in outpatient procedures and patient complexity between services. These findings suggest that service-specific characteristics and predictors from other domains should be included in the development of prediction models for non-attendance.
As in our pilot study, other authors have reported non-attendance reductions after implementing reminding strategies based on phone calls [24] or, most frequently, short message services (SMS) [9][10][11]. However, phone calls are more expensive than SMS [9,25], and both interventions have high costs for healthcare centers. Irrespective of the type of reminder, predictive algorithms may help to prioritize patients at higher risk of non-attendance, which is likely to improve the cost-effectiveness of the intervention. Furthermore, the quantitative approach to the prediction of non-attendance allows combining more or less intensive interventions based on different thresholds of non-attendance risk (e.g., SMS at risk between 50% and 90%, and phone calls at risk ≥90%).
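The tiered strategy suggested in the example above could be expressed as a simple rule on the predicted risk. The thresholds are those of the example in the text; the function name and the channel labels are hypothetical, and such a rule was not evaluated in the pilot study.

```python
def choose_reminder(non_attendance_risk, sms_threshold=0.50, call_threshold=0.90):
    """Map a predicted non-attendance risk (0 to 1) to a reminder channel."""
    if not 0.0 <= non_attendance_risk <= 1.0:
        raise ValueError("risk must be a probability between 0 and 1")
    if non_attendance_risk >= call_threshold:
        return "phone_call"      # highest risk: most intensive reminder
    if non_attendance_risk >= sms_threshold:
        return "sms"             # intermediate risk: cheaper reminder
    return "none"                # low risk: no reminder


# Hypothetical predicted risks for four appointments.
risks = [0.12, 0.55, 0.90, 0.97]
channels = [choose_reminder(risk) for risk in risks]
print(channels)   # ['none', 'sms', 'phone_call', 'phone_call']
```

Keeping the thresholds as parameters makes it straightforward to explore other cut-offs when weighing reminder costs against the expected reduction in missed visits.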
A remarkable consequence of our intervention for reducing non-attendance was the overloading of hospital agendas, highlighted during the debriefing held after the pilot study. This perception, which is consistent with the effectiveness of the measure, indicates that medical appointments were routinely scheduled on an overbooking basis, assuming a certain level of non-attendance. Hence, the potential consequences of improving efficiency in healthcare systems should be considered before implementing these types of solutions. Another concern raised during the debriefing session was the cost (in terms of time spent by administrative staff) associated with phone calls to individuals at higher risk of non-attendance. The economic impact of this solution can be minimized by implementing call centers shared by various centers or investigating the optimal cut-off of non-attendance risk for a patient to be included in the intervention. Nevertheless, cost-effectiveness analyses that consider the cost associated with non-attendance should be conducted before drawing conclusions on the actual economic impact of this intervention.
The interpretation of our results is limited by the simultaneous assessment of the predictive model and the intervention itself (i.e., phone call reminder), which precluded appraising the contribution of each feature to the non-attendance reduction. However, the main purpose of our pilot study was to assess the applicability of the whole concept to day-to-day practice. Another limitation was the unavailability of data with potential influence on the non-attendance rate, such as the economic status [26,27], education level [28,29], or certain medical conditions [18,30]. As discussed previously, the lack of social information is common in the development of predictive algorithms elsewhere. Regardless of the future inclusion of these data, the model should undergo continual learning by retraining to ensure its validity over time. The model has to account for new patients and new categorical features, and it has to be trained on up-to-date data so that it captures the latest trends of non-attendance in each hospital service.

Conclusions
The results of our study show that non-attendance predictive models can be a valuable tool to identify patients at higher risk of missing a medical appointment, who should therefore be prioritized for active reminders such as phone calls. The overloading of the hospital agenda experienced as a consequence of the effectiveness of the intervention underscores the need to consider organizational changes when implementing interventions for reducing non-attendance rates. The free availability of our algorithm allows future research to adapt it to other patient profiles and assess the cost-effectiveness of interventions based on patient stratification according to the risk of non-attendance.

Supplementary Files
This is a list of supplementary files associated with this preprint. NonattendanceSupplmat25OCT2021.docx