Multilayer Perceptron Neural Network Approach to Classifying Learning Modalities Under the New Normal

Because of community quarantines and lockdowns during the COVID-19 pandemic, the Philippines’ Department of Education (DepEd) implemented blended learning (BL) [both online and offline distance learning modalities (LM)] among basic educational institutions in the hope of continuing learners’ learning experiences amidst the pandemic. Learners’ LM are classified through the use of an Algorithm for Learning Delivery Modality as recommended by DepEd. An initial investigation, however, revealed mismatches in learners’ LM, resulting in massive shifting of learners from one LM to another in the middle of the school year. In this study, we introduced an approach to classifying learners’ LM using machine learning (ML) techniques. We compared the effectiveness of five ML classifiers, namely the random forest (RF), multilayer perceptron neural network (MLP NN), K-nearest neighbor (KNN), support vector machine (SVM), and Naïve Bayes (NB). Learner’s enrolment and survey form (LESF) data from the repository of a local private high school in the Philippines are used in model formulation. We also compared three existing feature selection (FS) algorithms [recursive feature elimination (RFE), Boruta algorithm (BA), and ReliefF], integrated into the five ML classifiers as feature reduction techniques. Results show that the combination of MLP NN and BA yielded considerably higher performance than the rest of the formulated models. Sensitivity analysis revealed that the asynchronous LM is most sensitive to the “existing health condition” feature; the modified asynchronous LM is highly characterized by low educational attainment and unstable employment status of parents or guardians; and synchronous learners have high socio-economic status compared with learners in other LM.

Gernel S. Lumacad and Rhoda A. Namoco

I. INTRODUCTION
Novel coronavirus was declared by the World Health Organization (WHO) a Public Health Emergency of International Concern (PHEIC) on 30 January 2020. COVID-19 was announced by the WHO as a new name for the novel coronavirus disease on 11 February 2020 [1]. Because of the continuing threat of COVID-19, different sectors are greatly affected. For the education sector, the International Labor Organization (ILO) suggested that universal distance education should be adopted by teachers and that the "safety of learners and teachers should be paramount" [2]. WHO, on "Key Messages and Actions for COVID-19 Prevention and Control in Schools," proposed a plan for continuity of learning, including the use of online/e-learning strategies; utilizing radio, podcast, or TV broadcasts for academic content; and reviewing/developing accelerated education strategies.

Gernel S. Lumacad is with the Higher Education Department, St. Rita's College, Balingasag, Misamis Oriental 9005, Philippines (e-mail: gernellumacad@srcb.edu.ph).
Rhoda A. Namoco is with the Department of Applied Mathematics, University of Science and Technology of Southern Philippines Cagayan de Oro Campus, Cagayan de Oro City 9000, Philippines (e-mail: rhoda.namoco@ustp.edu.ph).
Digital Object Identifier 10.1109/TCSS.2023.3251566

The pandemic led the Philippine government to declare a state of calamity throughout the country and to implement lockdowns and community quarantines to contain the virus and prevent further transmission. The salient guidelines on Universal and Mandatory Safety Measures for schools, colleges, universities, and other learning/training institutions under House Bill (HB) 6623 [3], known as the New Normal for the Workplace and Public Spaces Act of 2020, include integrating online classes and promoting creative learning modalities (LM) and forms of engagement that do not require physical contact or reporting to the classroom.
In response to the salient guidelines under HB 6623, the Department of Education (DepEd) and private education sectors suggested LM for blended learning (BL) to address the situation, based on the recommendations of Almario and Austria [4]. DepEd issued guidelines on enrolment, including the learners' enrolment and survey form (LESF) [5], a form that asks for the student's information, the parent's/guardian's information, the household capacity, and the student's access to distance learning. For learning modality classification, DepEd recommended using the Algorithm for Learning Delivery Modalities (ALDM) [6], which provides a set of questions and conditions that determine the learning modality into which a child is classified.
Tupas and Lagunda [7] suggested that both the negative and positive experiences in implementing BL should be recorded. Based on an initial investigation conducted at St. Rita's College of Balingasag (SRCB), a local private high school in the Philippines, implementing BL with the ALDM as the basis for assigning each learner's primary learning modality had a negative result: a massive shift of learners from one learning modality to another was observed in the middle of the school year due to learning modality misclassification. Many features in the LESF are not included in the ALDM, which limits the decision-making process. These shifts greatly affected teachers through overlapping and delayed work-related requirements, for instance, the time allotted for preparing instruction, the cost of each preparation (printed modules, etc.), and the delayed checking of learners' outputs. The ALDM thus did not appear to be a robust decision-making method for learning modality classification. Cabual [8] asserted that mismatches in teaching-learning strategies (such as that of learning modality) may result in a learning barrier whose consequence is a delay or an end of the learning process. Understanding the profile of the learners is helpful for effective decision-making and for implementing BL in a way that efficiently addresses learners' needs [9]. Moreover, further studies related to the planning and implementation of BL are encouraged to support the education system in the new normal [7].

A. BL in the New Normal
Learning styles emerge due to the demands of the current situation [10]. The global pandemic demanded a transition from traditional face-to-face learning to different learning styles/modalities: online and offline distance learning. Safe delivery of instruction for continuity of learning is the primary rationale of DepEd's formulation of the Basic Education Learning Continuity Plan in pursuance of HB 6623. Among the LM suggested, this study focuses on three generalized LM: synchronous, asynchronous, and modified asynchronous learning.
1) Synchronous Learning: This is a type of remote learning wherein the class is conducted in real time [11]. The teacher and students log in to a single platform where the class happens during an allotted time. Synchronous learning may be done using online tools such as video conferences, audio chats, or messaging apps. In synchronous learning [12], students can ask questions in real time, feel a greater sense of connection to their peers, and are more engaged in their learning through stronger collaboration. One identified disadvantage of synchronous learning, however, is Internet connection problems [13]; moreover, the Philippines has been reported to have the slowest connectivity in Asia [14].
2) Asynchronous Learning: In an asynchronous setup, students are provided with content and tasks that they need to accomplish within a time frame using an online platform such as a learning management system (LMS). Interaction between the teacher and students and among students does not take place in real time. Asynchronous learning [12] allows students to progress in their learning when and where they want; to have more time to reflect on what they learned; to feel more comfortable interacting with their teacher or peers, since they have time to compose emails rather than feeling pressured to speak up about what they learned; and to participate in the same activities regardless of time zone. In the absence of social interaction with their colearners and teachers, however, learners may feel dissatisfied; also, the contents of the subject matter may be misunderstood due to the absence of real-time interaction.
3) Modified Asynchronous Learning: Modified asynchronous learning is conducted by providing learning content and tasks in the form of a module (either electronic or printed) that a learner is responsible for accomplishing within a given time frame [15]. The module can either be picked up from the school or be delivered to the learner's residence via courier. In this setup, there is no interaction between the teacher and the learner; moreover, face-to-face learning for both on-campus and online modules does not take place. Modular learning allows a learner to work on tasks at his or her own pace. In the new normal, modified asynchronous learning is an offline distance learning modality with different modes of instruction: radio, television, printed materials, and soft copies, to name a few.

B. Algorithm for Learning Delivery Modalities
In terms of learning modality classification, DepEd strongly recommended using the ALDM as a decision-making tool for choosing the right learning modality and an affordable way of learning for a child. The ALDM (see Fig. 1) provides sets of questions and conditions as a guide for classifying a learner into a specific learning modality: 1) face-to-face learning; 2) synchronous; 3) asynchronous; 4) digital offline modular; 5) home-based EdTV or RBI + printed modules; and 6) home-based printed modules. Note that 4)-6) are subforms of the modified asynchronous learning modality.

C. Machine Learning (ML) in the Educational System
In the past decades, ML techniques have penetrated the education system, where they are used for the benefit of students, faculty and personnel, administrators, stakeholders, and other members of the academic community. ML is fundamentally transforming education by changing traditional teaching, learning, and educational research. As highlighted by Asthana and Hazela [16], ML is applied in the following educational aspects: automated assessment, intelligent learning environments, and career planning and prediction. Other studies have shown how ML has become a significant tool in predicting 1) student performance [17], where students' final grades can be precisely predicted using log data stored in LMSs; 2) retention [18], where freshmen students enrolled in Science and Engineering streams are classified as "risk," "intermediate," or "advanced"; and 3) placement probability [19], where the use of ML techniques in learning analytics is highlighted as a powerful aid in developing student recruitment, policies, educational needs, and even financial decisions. The aforementioned literature provides evidence of the robustness of ML techniques, which have significantly improved data-driven decisions in educational systems. Scholars and researchers continue to accelerate educational research with ML to unlock new discoveries, knowledge, and insights.
The learning modality classification problem tackled in this article is a novel problem in the Philippine education system, arising from its aim to continue students' learning under the new normal setup. The main task is to formulate a predictive model that automatically classifies upcoming learners into their appropriate learning modality, whether asynchronous, modified asynchronous, or synchronous. To our knowledge, no existing literature yet discusses solving this particular classification problem, specifically with ML models.
We present in this article a comparative analysis of different ML methods for classifying students' learning modalities under the new normal. Various existing feature selection (FS) algorithms integrated into each ML method considered in this study are also compared. Each ML method is trained using learners' LESF information as input features. In the latter part of this article, we further investigate the sensitivity of each learning modality to each feature using the partial derivatives method (PDM). In summary, the key contributions of this article are as follows.

II. ML CLASSIFIERS OVERVIEW
In this section, we outline the ML classifiers utilized in this study to formulate a predictive model for classifying students' learning modality in the new normal. These classifiers are RF, MLP NN, KNN, SVM, and NB.

A. RF Algorithm
The RF algorithm, also called the "random decision forest algorithm," is an upgraded version of the bagging algorithm: an ensemble learning classifier based on multiple decision trees derived from classification and regression trees (CART) [20]. RF splits each node using the best split among a subset of predictors randomly selected at that node rather than the best split among all variables (see Fig. 2). The classification trees that make up the RF model are produced independently from bootstrap samples. The RF model is trained using about two-thirds of the data and validated using the remaining one-third [the out-of-bag (OOB) samples] [21].
In a classification task, the final predicted value is the average of the individual tree predictions:

$$A_{m,\mathrm{pred}} = \frac{1}{N}\sum_{n=1}^{N} A_{m,n,\mathrm{pred}} \tag{1}$$

where $A_{m,\mathrm{pred}}$ is the value predicted for the mth sample by the RF model; N is the total number of trees in the RF model; and $A_{m,n,\mathrm{pred}}$ is the value predicted for the mth sample by the nth tree. Each tree grows based on the rules demonstrated in Algorithm 1.
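As a small illustration of how per-tree outputs are combined, the following Python sketch (the study itself used the R package "randomForest") averages hypothetical per-tree predictions for one learner and, as an alternative that is common when trees emit class labels, takes a majority vote. The tree outputs and label names are invented for the example.

```python
from collections import Counter


def average_prediction(tree_predictions):
    """Mean of the N individual tree predictions for one sample."""
    return sum(tree_predictions) / len(tree_predictions)


def majority_vote(tree_labels):
    """Most frequent class label among the trees (first seen wins a tie)."""
    return Counter(tree_labels).most_common(1)[0][0]


# Hypothetical outputs of N = 5 trees for a single learner (not study data).
probs_sync = [0.8, 0.6, 0.7, 0.4, 0.9]  # per-tree probability of "sync"
labels = ["sync", "sync", "async", "sync", "modified"]

print(average_prediction(probs_sync))
print(majority_vote(labels))
```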

B. Multilayer Perceptron Neural Network
An artificial neural network, a subfield of artificial intelligence [23], is a computational system and mathematical model inspired by the biological human brain [24]. Generally, an artificial neural network that comprises at least three layers of nodes (an input layer, a hidden layer, and an output layer) is called an "MLP NN" [25]. Each node in the network is called an "artificial neuron." MLP NN is a robust global approximator for nonlinear functions, even in sparse data sets. Pattern classification, recognition, prediction, and approximation are major uses of MLP NN [26]. MLP NN can find patterns in a classification problem even in a very high-dimensional nonlinear data structure. For an MLP NN model with multiple input features and multiple output classes, as shown in Fig. 3, the final output $y_j$ of neuron j is obtained as follows:

$$y_j = f_j\left(\sum_{i=1}^{n} \omega_{ij} I_i + \beta_j\right) \tag{2}$$

where n is the total number of input parameters, $I_i$ is the input variable i, $\beta_j$ is a bias value, and $\omega_{ij}$ are the connection weights. The term $\sum_{i=1}^{n} \omega_{ij} I_i + \beta_j$ is the weighted sum of the inputs plus the bias.
The function $f_j(x)$ is an activation function. An activation function helps the neural network learn complex patterns in a given information system [27] and subsequently decides whether a particular neuron should be activated, depending on its importance to the prediction process. Many activation functions are available when building an MLP NN model, among them sigmoid, softmax, tanh, ReLU, leaky ReLU, parametric ReLU, H-Swish, and softplus [27]. The choice of activation function depends on the task for which the MLP NN is modeled.
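To make the neuron computation concrete, the sketch below evaluates a single neuron as an activation applied to the weighted sum of inputs plus a bias, and shows softmax as one way to turn output-layer scores into class probabilities. It is written in plain Python for illustration only (the study used the R package "neuralnet"), and the inputs, weights, and bias are hypothetical values.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


def neuron_output(inputs, weights, bias, activation):
    """f_j(sum_i w_ij * I_i + beta_j) for a single neuron j."""
    z = sum(w * i for w, i in zip(weights, inputs)) + bias
    return activation(z)


def softmax(scores):
    """Turn output-layer scores into class probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical inputs, weights, and bias (the weighted sum here is 0).
inputs = [1.0, 0.0, 2.0]
weights = [0.5, -1.0, 0.25]
print(neuron_output(inputs, weights, -1.0, sigmoid))
print(neuron_output(inputs, weights, -1.0, relu))
print(softmax([1.0, 2.0, 3.0]))
```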

C. KNN Algorithm
The KNN classifier is a typical example of a lazy learner. At its core, the KNN classifier measures the differences or similarities between instances in a given dataset [28], as shown in Fig. 4. For a given instance x in a test set, the class of x, denoted by c(x), is determined with respect to its k closest neighbors $y_1, \ldots, y_k$ by the following equations [29]:

$$c(x) = \arg\max_{c \in C} \sum_{i=1}^{k} \delta\bigl(c, c(y_i)\bigr) \tag{3}$$

where $c(y_i)$ is the class of $y_i$, and $\delta$ is a function such that

$$\delta(u, v) = \begin{cases} 1, & u = v \\ 0, & \text{otherwise.} \end{cases} \tag{4}$$

Simple voting can also be performed for the KNN classifier to yield an estimate p(c|x), the fraction of instances of class c among the k nearest neighbors:

$$p(c \mid x) = \frac{1}{k}\sum_{i=1}^{k} \delta\bigl(c, c(y_i)\bigr). \tag{5}$$

Consequently, KNN may also be viewed as a probability-based classifier, as given in the equation below [25]:

$$c(x) = \arg\max_{c \in C} p(c \mid x). \tag{6}$$

The prediction quality of a formulated KNN model depends on the distance measure used.
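The voting rule can be sketched in a few lines of Python (an illustration only; the study used the R package "class"). The sketch assumes Euclidean distance, which the text does not specify, and the toy points and labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Return (predicted class, {class: p(c|x)}) from the k nearest neighbors."""
    # Rank training instances by Euclidean distance to x (an assumed metric).
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    probs = {c: n / k for c, n in votes.items()}
    return max(probs, key=probs.get), probs


# Hypothetical two-class toy data.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["A", "A", "A", "B", "B", "B"]

label, probs = knn_predict(train_X, train_y, (1, 1), k=5)
print(label, probs)
```

With k = 5, three of the five nearest neighbors of (1, 1) belong to class "A", so the estimate p("A"|x) is 3/5 and "A" is predicted.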

D. Support Vector Machine
SVM is a supervised ML algorithm for classification and regression tasks invented by Vapnik [30]. In SVM, nonlinear mapping of the input vectors into a high-dimensional feature space is used to establish nonlinear class boundaries. The optimal classification hyperplane and the application of kernel functions are the two main principles of SVM [31]. An illustration of how SVM works is presented in Fig. 5. Circles and squares denote two different types of samples. $H_1$ and $H_2$ are lines parallel to the classification line H that separates the two types of samples. $H_1$ and $H_2$ run through the sample points closest to H, called the support vectors. The classification margin is the distance between $H_1$ and $H_2$. The main objective of the optimal classification hyperplane is to correctly separate the two types of samples while maximizing the classification margin. The kernel function, on the other hand, transforms the input samples into a high-dimensional space so that they can be classified linearly [32].
The maximum margin hyperplane can be expressed by the equation

$$f(x) = b + \sum_{i} \alpha_i y_i \, x(i) \cdot x \tag{7}$$

where $y_i$ is the class of the training sample x(i), x is a vector denoting the test sample, the vectors x(i) are the support vectors, and b and $\alpha_i$ are hyperplane parameters. A high-dimensional version is expressed as follows:

$$f(x) = b + \sum_{i} \alpha_i y_i \, K\bigl(x(i), x\bigr). \tag{8}$$

The function K(x(i), x) is the kernel function. Common examples of kernel functions are the Gaussian radial basis, polynomial, sigmoid, hyperbolic tangent, string, and tree kernel functions [33].
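The kernelized decision value can be illustrated with the following Python sketch (the study used the R package "e1071"). It assumes a Gaussian radial basis kernel with a hypothetical width parameter gamma; the support vectors, multipliers, and bias are invented rather than fitted to any data.

```python
import math


def rbf_kernel(u, v, gamma=0.5):
    """Gaussian radial basis kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def decision_value(x, support_vectors, labels, alphas, b, kernel=rbf_kernel):
    """b + sum_i alpha_i * y_i * K(x(i), x); its sign gives the predicted class."""
    return b + sum(a * y * kernel(sv, x)
                   for sv, y, a in zip(support_vectors, labels, alphas))


# Hypothetical support vectors and parameters (not fitted to any data).
svs = [(0.0, 0.0), (2.0, 2.0)]
ys = [-1, +1]
alphas = [1.0, 1.0]
b = 0.0

for point in [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]:
    print(point, decision_value(point, svs, ys, alphas, b))
```

Points nearer the positive support vector receive a positive value, points nearer the negative one a negative value, and the midpoint (1, 1) lies on the decision boundary.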

E. NB Algorithm
NB is a supervised learning algorithm and a typical example of a simple probabilistic classifier. NB is based on Bayes' theorem and assumes strong independence between variables/features [34]. This implies that the probability of one particular feature neither affects nor is affected by the probability of any other feature in a given information system [35]. The classification process via NB, as discussed by Sharmila and Geethanjali [36], is carried out as given below.
Consider a training set D with n classes, in which each sample has an attribute vector Y and an associated class label. The posterior probability of class $C_i$ given Y follows from Bayes' theorem, and Y is assigned to the class with the highest posterior probability:

$$P(C_i \mid Y) = \frac{P(Y \mid C_i)\,P(C_i)}{P(Y)} \tag{9}$$

where

$$P(Y \mid C_i) = \prod_{k} P(y_k \mid C_i). \tag{10}$$

The probabilities $P(y_k \mid C_i)$ are obtained from the training set, and $y_k$ denotes the kth attribute value of Y. $P(Y \mid C_i)P(C_i)$ is evaluated for each class $C_i$ to estimate the class label of Y. The classifier assigns the class label $C_i$ to Y when

$$P(Y \mid C_i)\,P(C_i) > P(Y \mid C_j)\,P(C_j), \quad 1 \le j \le n,\; j \ne i. \tag{11}$$

III. FS ALGORITHMS OVERVIEW
The main purpose of FS is to reduce the complexity of the dataset by removing features that are found to be irrelevant for the classification/prediction task and, consequently, to improve prediction accuracy. In this section, we present the three FS algorithms used in combination with each ML classifier in predicting students' learning modality under the new normal. These FS algorithms are RFE, BA, and the ReliefF algorithm (RA).

A. RFE Algorithm
RFE is a wrapper-type FS algorithm. In wrapper methods, different combinations of features are evaluated against an evaluation criterion following a greedy search approach [37]. Based on some measure of feature importance, RFE recursively ranks the features in a given dataset [38]; the least important features are removed, while features with greater importance for the classification task are retained. RFE requires the number of features to retain to be specified in advance; alternatively, a variable importance measure, such as the varImp() function from the "caret" package in R [39], can be used to obtain the final ranking of features by importance.
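The elimination loop can be sketched as follows in Python (illustrative only; the study worked in R). As a loudly stated simplification, the importance measure here is the absolute Pearson correlation between a feature and the target, a stand-in for the model-based importance a real RFE run would recompute by refitting a classifier each round. The feature names and values are hypothetical.

```python
def abs_correlation(column, y):
    """Stand-in importance: |Pearson correlation| between one feature and y."""
    n = len(y)
    mx, my = sum(column) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(column, y))
    sx = sum((a - mx) ** 2 for a in column) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return abs(cov / (sx * sy)) if sx and sy else 0.0


def rfe(features, y, n_keep, importance=abs_correlation):
    """Drop the least important feature until n_keep remain.

    features: dict mapping feature name -> list of values.
    Returns (kept names, eliminated names in order of removal).
    """
    remaining = dict(features)
    eliminated = []
    while len(remaining) > n_keep:
        # Importance is recomputed on the surviving features each round.
        scores = {name: importance(col, y) for name, col in remaining.items()}
        worst = min(scores, key=scores.get)
        eliminated.append(worst)
        del remaining[worst]
    return list(remaining), eliminated


# Hypothetical toy data: one informative, one weak, and one noisy feature.
y = [0, 0, 1, 1, 0, 1]
features = {
    "laptop": [0, 0, 1, 1, 0, 1],
    "weak": [0, 1, 1, 1, 0, 1],
    "noise": [1, 0, 1, 0, 1, 0],
}
print(rfe(features, y, n_keep=1))
```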

B. Boruta Algorithm
The BA is a wrapper-type FS algorithm built around the RF classifier. Given an information system, BA iteratively captures relevant features and removes redundant or irrelevant features. As discussed by Kursa and Rudnicki [41], BA consists of the following steps.
1) Additional copies of the features (shadow features) are added to extend the given information system.
2) The added attributes are then shuffled to remove their correlations with the response.
3) An RF classifier is trained on the extended information system, and the computed Z-scores are gathered.
4) The maximum Z-score among the shadow features is identified. A hit is then assigned to each feature that scored better than the maximum Z-score among the shadow features.
5) Each feature with undetermined importance is then evaluated with a two-sided test of equality against the maximum Z-score among the shadow features.
6) Features with significantly lower importance than the maximum Z-score among the shadow features are labeled as "unimportant" and permanently removed from the information system.
7) Features with significantly higher importance than the maximum Z-score among the shadow features are labeled as "important."
8) The shadow features are then removed.
9) The procedure is repeated until importance is assigned to all attributes or the algorithm reaches the previously set limit of RF runs.
The pseudocode of the BA can be seen in Algorithm 3. Most conventional FS algorithms address the minimal-optimal problem; in contrast, the BA is an all-relevant FS algorithm. A detailed discussion of this can be found in [40].
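The shadow-feature and hit-counting idea of steps 1)-4) can be sketched in Python as below (illustrative only; the study worked in R). This is a deliberately reduced sketch under two stated assumptions: the RF Z-score importance is replaced by a stand-in (absolute Pearson correlation with the response), and the statistical test and feature removal of steps 5)-9) are omitted. Feature names and values are hypothetical.

```python
import random


def abs_correlation(column, y):
    """Stand-in importance (NOT the RF Z-score used by Boruta itself)."""
    n = len(y)
    mx, my = sum(column) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(column, y))
    sx = sum((a - mx) ** 2 for a in column) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return abs(cov / (sx * sy)) if sx and sy else 0.0


def boruta_hits(features, y, n_runs=20, importance=abs_correlation, seed=0):
    """Count, per feature, the runs in which it beats the best shadow feature."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_runs):
        # Steps 1-2: shadow features are shuffled copies of the real ones.
        shadow_scores = []
        for col in features.values():
            shadow = list(col)
            rng.shuffle(shadow)
            shadow_scores.append(importance(shadow, y))
        # Step 4: a feature scores a hit if it beats the best shadow score.
        max_shadow = max(shadow_scores)
        for name, col in features.items():
            if importance(col, y) > max_shadow:
                hits[name] += 1
    return hits


# Hypothetical toy data: one informative feature and one constant feature.
y = [0, 0, 0, 1, 1, 1, 0, 1]
features = {"informative": [0, 0, 0, 1, 1, 1, 0, 1], "constant": [1] * 8}
print(boruta_hits(features, y, n_runs=10))
```

A feature that accumulates hits in most runs is a candidate for the "important" label, whereas a feature that never beats the best shadow, such as the constant one here, is a candidate for removal.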

C. ReliefF Algorithm
The ReliefF algorithm (RA) is a feature estimator that estimates the quality of each feature in a given data set, even when there are strong dependencies between features. ReliefF is highly efficient in dealing with complex datasets, both continuous and discrete [43], and can also handle incomplete and noisy data [44]. The key idea of ReliefF is to estimate the quality of features according to how well their values distinguish between instances that are similar to each other. The measure of quality $W_i$ for each feature is updated by the equation

$$W_i = W_i - \sum_{k=1}^{K} \frac{D_H(k)}{nK} + \sum_{c \neq \mathrm{class}(R_j)} p_c \sum_{k=1}^{K} \frac{D_M(k)}{nK} \tag{12}$$

where $D_H(k)$ [or $D_M(k)$] is the distance between the selected instance $R_j$ and its kth nearest neighbor in H (or M), n is the number of repetitions, K is the number of neighbors, and $p_c$ is the prior probability of class c. A more detailed explanation of (12) can be found in [45], and a comprehensive flow of ReliefF is presented in Algorithm 4.

IV. EXPERIMENTAL METHOD
The framework of the experimental method is shown in Fig. 6. The proposed experimental method consists of data collection and data preprocessing, partitioning of the LESF data into consistent and inconsistent datasets, FS, ML model development, model evaluation, model testing on the inconsistent dataset, and sensitivity analysis. Details of each step are elaborated throughout this section. The experimental processes of this study are mostly implemented in RStudio using the R programming language [46].
Algorithm 4 Pseudocode for ReliefF [44]
Input: feature data matrix D; number of repetitions n; number of neighbors K
Output: vector W of feature weights for ranking
Begin
  for j = 1 to n do
    Randomly select an instance R_j
    Find the K nearest hits H and nearest misses M
    for i = 1 to all features do
      Update the estimate W_i by Equation (12)
    end
  end
End
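Algorithm 4 can be sketched in Python as follows (illustrative only; the study worked in R). The sketch assumes a common form of the ReliefF weight update, in which nearest hits lower a feature's weight and nearest misses from each other class raise it in proportion to that class's prior probability. It further assumes numeric features scaled to [0, 1] and Manhattan distance for finding neighbors. The toy data are hypothetical.

```python
import random


def relieff(X, y, n_iter, K, seed=0):
    """Return one weight per feature; a higher weight means more relevant.

    X: list of rows with numeric features assumed to be scaled to [0, 1].
    y: list of class labels, one per row.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    priors = {c: y.count(c) / len(y) for c in set(y)}
    W = [0.0] * n_features

    def distance(a, b):
        return sum(abs(p - q) for p, q in zip(a, b))

    for _ in range(n_iter):
        r = rng.randrange(len(X))  # randomly selected instance R_j
        for c in priors:
            # K nearest neighbors of instance r within class c (excluding r).
            idx = [i for i in range(len(X)) if y[i] == c and i != r]
            idx.sort(key=lambda i: distance(X[i], X[r]))
            nearest = idx[:K]
            for f in range(n_features):
                d = sum(abs(X[i][f] - X[r][f]) for i in nearest) / (n_iter * K)
                if c == y[r]:
                    W[f] -= d              # nearest hits lower the weight
                else:
                    W[f] += priors[c] * d  # nearest misses raise it
    return W


# Hypothetical data: feature 0 separates the classes, feature 1 is constant.
X = [[0.0, 0.5], [0.0, 0.5], [0.0, 0.5], [1.0, 0.5], [1.0, 0.5], [1.0, 0.5]]
y = ["a", "a", "a", "b", "b", "b"]
print(relieff(X, y, n_iter=10, K=2))
```

The class-separating feature receives a positive weight, while the constant feature, which cannot distinguish any pair of instances, keeps a weight of zero.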

A. Data Collection
The experimental method begins with collecting the LESF dataset of learners from the database of a private high school in the Philippines. A summary of the dataset is presented in Table I. The LESF dataset contains the learner's information, the parent's/guardian's information, the household capacity, and the learner's access to distance learning. The dataset has a total of 41 input features and an output variable with three classes: synchronous, asynchronous, and modified asynchronous. It includes the LESF of grade 7 to grade 12 learners, with a total of 1326 samples.

B. Data Preprocessing
After the LESF dataset is gathered, it is cleaned by omitting samples with missing values. Categorical responses are transformed using binary encoding (for dichotomous responses), ordinal encoding (for rank variables), and one-hot encoding (for nominal variables, specifically the LM). Data preprocessing also includes the removal of the learner's name, the learner's classroom assignment, and other unnecessary information. After data cleaning and transformation, 1214 of the 1326 original samples remain.
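The three encodings can be illustrated with a short Python sketch (the study performed preprocessing in R). The response values and the ordered education levels below are hypothetical placeholders, not the actual LESF coding scheme.

```python
def binary_encode(value, positive="Yes"):
    """Dichotomous response -> 1 for the positive answer, else 0."""
    return 1 if value == positive else 0


def ordinal_encode(value, ordered_levels):
    """Rank variable -> position of the value in its ordered level list."""
    return ordered_levels.index(value)


def one_hot_encode(value, categories):
    """Nominal variable -> indicator vector with a single 1."""
    return [1 if value == c else 0 for c in categories]


MODALITIES = ["synchronous", "asynchronous", "modified asynchronous"]
EDUCATION = ["elementary", "high school", "college"]  # hypothetical levels

print(binary_encode("Yes"))
print(ordinal_encode("college", EDUCATION))
print(one_hot_encode("asynchronous", MODALITIES))
```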

C. Data Partitioning
The remaining 1214 samples are partitioned into two datasets: the "consistent" dataset, with 561 samples of learners who did not shift LM throughout the academic year, and the "inconsistent" dataset, with 653 samples of learners who shifted LM in the middle of the academic year.
It is postulated that knowledge and patterns are better extracted from the "consistent" dataset alone than from a combination of the "consistent" and "inconsistent" datasets, which may introduce contamination; hence, the "consistent" dataset is used for formulating an ML model for learning modality classification of students.

D. Feature Selection
In building any ML model, feeding redundant data from a large dataset may make it harder for the model to interpret the data, which leads to unsatisfactory performance metrics. FS plays an important role here by extracting the important features. The dimension of the LESF dataset, with 41 input features, is explored using three FS algorithms, namely the RFE algorithm, the BA, and the RA. The purpose is to capture all important variables that will be used for building an ML model for the learning modality classification of students.

E. ML Model Development
In this article, we utilized five ML classifiers for predicting/classifying learners' learning modalities under the new normal. Specifically, we applied RF, MLP NN, KNN, SVM, and NB. Training and building of these ML classifiers are implemented using the following R packages: "randomForest" [48] for the RF algorithm; "neuralnet" [49] for MLP NN; "class" [50] for the KNN algorithm; "e1071" [51] for SVM; and "naivebayes" [52] for the NB algorithm. For each training of an ML classifier, the k-fold cross validation (KFCV) technique is utilized to validate the performance of the developed ML model. KFCV is a resampling technique used to evaluate ML models on a limited data sample. It works well on datasets with a small number of observations and has lower bias than other cross-validation methods [53]. Specifically, tenfold cross validation is used. The "consistent" dataset with 561 observations is randomly partitioned into ten approximately equal parts. In each of the K learning trials, K − 1 folds are used for training and the remaining fold for testing. Every data point in the "consistent" dataset is in the test set exactly once and in the training set K − 1 times. Training and testing of the ML classifier are repeated for each fold until all K trials are completed. The hyperparameters for each ML classifier are shown in Table II. The hyperparameters recorded in Table II are the values that performed best under manual search, an ad hoc approach to finding the best hyperparameter values for training an ML algorithm. The idea is to first take large steps in the values and then smaller steps around a value that performed better.
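The fold construction described above can be sketched in Python as follows (illustrative only; the study implemented cross validation in R). The function name and the seed are hypothetical; the sketch only produces index splits and does not train any classifier.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k approximately equal folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# 561 is the size of the "consistent" dataset reported in the text.
splits = list(k_fold_indices(561, k=10))
for train, test in splits[:2]:
    print(len(train), len(test))
```

Each observation lands in exactly one test fold and in the training set of the other nine trials, matching the description of tenfold cross validation.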

F. ML Model Evaluations
Confusion matrix analysis (CMA) is conducted for each developed ML classifier. A total of five confusion matrices are compared to determine which of the formulated ML models is best. CMA is a robust method for model evaluation that gives a precise evaluation of validity and further information on the type of error [54]. Accuracy, Kappa-score, and F-score are then derived from the generated confusion matrices.
Accuracy for each developed ML model is calculated using the equation

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{13}$$

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. The Kappa coefficient measures the interrater reliability for each trained ML model. Under the interpretation in [55], Kappa values are read on a scale from 0 to 1, where a score of 0.81-1.00 indicates that the model is almost perfect. A high F-measure [56] value implies that the model achieves both high precision and high recall in learning modality classification. The Kappa coefficient and F-measure are given by the following equations:

$$K = \frac{P_o - P_e}{1 - P_e} \tag{14}$$

where K is the Kappa coefficient, $P_o$ is the probability of correct classification, and $P_e$ is the probability of classification agreement expected by chance; and

$$F = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{15}$$

where Recall is TP/(TP + FN) and Precision is TP/(TP + FP).
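The three metrics can be computed from a multiclass confusion matrix as in the Python sketch below (illustrative only; the study derived them in R). Two assumptions are made: rows hold actual classes and columns hold predicted classes, and the per-class F-scores are macro-averaged, since the text does not state how they are aggregated across the three LM. The matrix entries are hypothetical.

```python
def accuracy(cm):
    """Share of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def cohen_kappa(cm):
    """(P_o - P_e) / (1 - P_e) with P_e from row and column marginals."""
    total = sum(sum(row) for row in cm)
    p_o = sum(cm[i][i] for i in range(len(cm))) / total
    p_e = sum(sum(cm[i]) * sum(row[i] for row in cm)
              for i in range(len(cm))) / total ** 2
    return (p_o - p_e) / (1 - p_e)


def macro_f_score(cm):
    """Unweighted mean of per-class F-scores (macro-averaging is assumed)."""
    scores = []
    for i in range(len(cm)):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                 # row i = actual class i
        fp = sum(row[i] for row in cm) - tp  # column i = predicted class i
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical 3-class confusion matrix (not results from the study).
cm = [[40, 5, 5],
      [4, 30, 6],
      [6, 4, 50]]
print(accuracy(cm), cohen_kappa(cm), macro_f_score(cm))
```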

G. Analysis
Sensitivity analysis is then implemented on the best-trained ML classifier using PDM. In this study, since the MLP NN showed considerably higher performance than RF, SVM, KNN, and NB, PDM is applied for sensitivity analysis because it is an appropriate method for MLP NN. The purpose of sensitivity analysis is to identify which LESF features have the greatest influence on each learning modality. PDM consists of calculating the derivatives of the outputs with respect to the inputs of the MLP NN model [57]. These partial derivatives are called sensitivities and are defined as

$$s_{ik}\big|_{x_n} = \frac{\partial y_k}{\partial x_i}(x_n) \tag{16}$$

where $x_n$ refers to the nth sample of the dataset, and $s_{ik}|_{x_n}$ is the sensitivity of the output $y_k$ of the kth neuron in the output layer with respect to the input $x_i$ of the ith neuron in the input layer, evaluated at $x_n$. Pizarroso et al. [57] proposed the following sensitivity measures for multiclass classification to analyze the results and summarize the acquired information. They are obtained by evaluating the sensitivity of the outputs for all N input samples $x_n$ of the provided dataset. The first is the mean sensitivity with respect to the ith input variable

$$S^{\mathrm{avg}}_{ik} = \frac{1}{N}\sum_{n=1}^{N} s_{ik}\big|_{x_n}. \tag{17}$$

The second is the sensitivity standard deviation with respect to the ith input variable

$$S^{\mathrm{sd}}_{ik} = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(s_{ik}\big|_{x_n} - S^{\mathrm{avg}}_{ik}\right)^2}. \tag{18}$$

The third is the mean squared sensitivity with respect to the ith input variable [59]

$$S^{\mathrm{sq}}_{ik} = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(s_{ik}\big|_{x_n}\right)^2}. \tag{19}$$

The mean sensitivity and the sensitivity standard deviation indicate the type of relationship (linear or nonlinear) between an input variable and an output variable, while the mean squared sensitivity indicates whether the output has a low or high sensitivity to each input variable.
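The three sensitivity measures can be illustrated with the Python sketch below (the study worked in R). As a stated simplification, the partial derivatives are approximated by central finite differences rather than by the analytic chain rule through the network, and a small hand-written function stands in for a trained MLP NN. The mean squared sensitivity is taken here as the root of the mean of squared sensitivities, which is an assumption about the definition in [57].

```python
import math


def sensitivities(model, samples, i, k, h=1e-5):
    """s_ik at each sample: d(output k)/d(input i), by central differences."""
    values = []
    for x in samples:
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        values.append((model(up)[k] - model(down)[k]) / (2 * h))
    return values


def summarize(values):
    """Return (mean, standard deviation, mean squared) sensitivity."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in values) / n)
    mean_sq = math.sqrt(sum(s ** 2 for s in values) / n)
    return mean, sd, mean_sq


def toy_model(x):
    """Hypothetical two-input, two-output stand-in for a trained MLP NN."""
    return [2.0 * x[0] - 3.0 * x[1], math.tanh(x[0])]


samples = [[0.0, 1.0], [1.0, 0.0], [-1.0, 2.0]]
print(summarize(sensitivities(toy_model, samples, i=0, k=0)))  # linear output
print(summarize(sensitivities(toy_model, samples, i=0, k=1)))  # nonlinear output
```

For the linear output, the sensitivity is the same at every sample, so the standard deviation is essentially zero; for the tanh output, the sensitivity varies across samples, which shows up as a clearly nonzero standard deviation.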

H. Testing the MLP NN Model to Inconsistent Dataset
Checking the results of any developed ML model for accuracy against the real world is essential to conclude that the developed model is good at interpreting uncertainties in the real world. Developed ML models are deemed to be powerless unless they can be applied to practice [58]. The best-formulated ML model among RF, MLP NN, KNN, SVM, and NB is used to predict the shifted LM in the inconsistent dataset to check the interpretability of the model as applied to a real-world context.
The appearance of the MEA, MES, FEA, GEA, and GES variables in the results of RFE, BA, and ReliefF supports the claim of De Villa and Manalo [9] that the educational attainment and employment status of both parents and the guardian are helpful information for implementing BL, especially for learning modality classification. In addition to the factors noted in [9], socioeconomic status (SES) is also an important consideration. In this study, it is perhaps depicted by the variables DPC, Laptop, and CS, which indicate a learner's capacity to have a technological advantage for distance learning. The existing health condition (EHC) variable is also seen as important based on the results of the three FS algorithms.

B. ML Model Development
Experimental results using the five ML classifiers (RF, MLP NN, KNN, SVM, and NB) integrated with various FS algorithms for the learning modality classification problem are presented in Table II. It can be observed from the obtained results that the use of RFE, BA, and ReliefF as FS algorithms aided the five ML classifiers in yielding better results. This can be seen from the numerical results for each ML classifier without an FS algorithm, which yielded the lowest performance. For instance, RF without FS obtained its lowest performance as compared to RF with FS algorithms, with an overall accuracy = 0.6434, Kappa-score = 0.3010, and F-score = 0.5246.
Respective performances of the other four ML classifiers without the utilization of FS algorithms are as follows: MLP NN (overall accuracy = 0.7203, Kappa-score = 0.5090, and F-score = 0.6242); KNN (overall accuracy = 0.6953, Kappa-score = 0.3940, and F-score = 0.5807); SVM (overall accuracy = 0.7500, Kappa-score = 0.4230, and F-score = 0.5762); and NB (overall accuracy = 0.6171, Kappa-score = 0.3150, and F-score = 0.5247). These numerical results confirm that the utilization of FS algorithms is an acceptable method to increase the performance of ML classifiers in predicting learners' learning modality. Applying feature selection reduced the complexity of the dataset, which allowed the ML classifiers to learn well.
Additionally, all formulated ML models for learning modality classification are found to have the least accuracy in predicting the asynchronous class compared to the modified asynchronous and synchronous categories. This is perhaps because fewer instances in the dataset belong to the asynchronous category than to the modified asynchronous and synchronous categories. The formulated ML classifiers are best at predicting the synchronous class.
Among the formulated ML classifiers, both without FS algorithms and with the integration of RFE, BA, and ReliefF, the results showed that the MLP NN model integrated with BA as FS technique yielded the highest performance. Compared to all formulated ML models for learning modality classification, the MLP NN with BA obtained the following performance metrics: overall accuracy = 0.8992, Kappa-score = 0.8160, and F-score = 0.9105. The resulting accuracy, Kappa-score, and F-score indicate that the formulated MLP NN model with BA is an "almost perfect" ML model for classifying learners' learning modality.

C. Sensitivity Analysis
The sensitivity of input features for each learning modality is evaluated based on sensitivity measures using PDM. The main function of sensitivity analysis as conducted on the formulated MLP NN model is to decipher how sensitive the output variables are to changes in the input features. Table III summarizes the sensitivities of input features for the three LM. As observed for each input variable in every learning modality, all sensitivity standard deviations were different from zero regardless of the corresponding mean sensitivities. This indicated that all LM have nonlinear relationships with the input variables. Mean squared sensitivity, as a valid indicator of input variable importance for each learning modality, is also presented in Table III. Existing health condition has the highest mean squared sensitivity value of 1.764 for asynchronous learning modality. This means that EHC is the most important variable for asynchronous learning modality. Aside from EHC, the address of the learner, MES, GWH, Laptop, OBI, and GP also contribute higher importance to asynchronous learning modality, with respective mean squared sensitivity values of 0.559, 0.505, 0.542, 0.530, 0.662, and 0.534.
The employment status of the mother and the educational attainment of both mother and father are the most important input variables for modified asynchronous learning, with respective mean squared sensitivity values of 1.383, 1.404, and 2.209. With these mean squared sensitivity values, the corresponding mean sensitivity values are −1.057, −0.981, and −1.076. These measures indicated that modified asynchronous learning modality is specifically highly sensitive to the following: a "self-employed" or "unemployed" MES and low educational attainment of both mother and father (vocational graduate, high school graduate, and elementary graduate).
The input variable with the highest impact on synchronous learning is "existing health condition," with a mean squared sensitivity value and mean sensitivity value of 2.004 and −1.260, respectively. This implied that synchronous learning modality requires a learner with likely no existing health condition.
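To make the reading of these measures concrete, the short sketch below ranks the asynchronous-modality features by the mean squared sensitivity values quoted above and applies the rule that a nonzero sensitivity standard deviation signals a nonlinear relationship. The helper names `rank_by_importance` and `relationship`, and the numerical tolerance used to decide whether a standard deviation counts as zero, are assumptions made for illustration.

```python
# Mean squared sensitivities for the asynchronous modality, as quoted in the text.
asynchronous_msq = {
    "EHC": 1.764,
    "Address": 0.559,
    "MES": 0.505,
    "GWH": 0.542,
    "Laptop": 0.530,
    "OBI": 0.662,
    "GP": 0.534,
}


def rank_by_importance(msq):
    """Order features from highest to lowest mean squared sensitivity."""
    return sorted(msq, key=msq.get, reverse=True)


def relationship(sensitivity_sd, tolerance=1e-9):
    """A standard deviation of (numerically) zero means a constant derivative."""
    return "nonlinear" if abs(sensitivity_sd) > tolerance else "linear"


ranking = rank_by_importance(asynchronous_msq)
print(ranking)
```

The ranking places EHC first by a wide margin, followed by OBI, which matches the statement that EHC is the most important variable for the asynchronous modality.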
Employment status and educational attainment of both parents and the guardian also contribute to a higher impact on synchronous learning. MEA, FEA, and GEA have mean squared sensitivity values of 1.421, 1.397, and 0.614 and mean sensitivity values of 1.016, 0.893, and 0.673, respectively. This means that high educational attainments of both parents and/or the guardian are indicators that a learner should be classified to synchronous learning modality. The mean squared sensitivity values of MES and GES are 0.884 and 0.992, with respective mean sensitivity values of 0.705 and −0.882, which showed that MES is either part-time or full-time, while GES is either unemployed or self-employed. Moreover, the sensitivity result also indicated that synchronous learning modality is best for learners residing in suburban or urban areas, where the address has a mean squared sensitivity value of 0.468 and a mean sensitivity value of 0.642. DPC and Laptop also contribute higher importance for synchronous learning modality, with respective mean squared sensitivity values of 1.001 and 0.748 and mean sensitivity values of 0.880 and 0.727.
The results of this study revealed factors that indirectly affect the classification of LM, as previously discussed. Parents' educational attainment is one of the three dimensions of family SES [60]. The other two dimensions are family income and occupational prestige. Parents and guardians with high educational attainment of at least a college degree tend to have higher incomes, as stressed by Gooding [61]. Families with high incomes are, in turn, more likely to reside in urban or suburban areas, perhaps due to their employment statuses. Their residential location is fit for synchronous learning modality, where Internet access is possible. Also, parents/guardians with high income are able to support their child in synchronous LM by providing technological resources, such as desktop personal computers, laptops, smartphones, and ways to connect to the Internet.
Corlatean [62] argued that students and their families with low statuses have been struggling significantly in terms of Internet and technological resources. These findings of Corlatean [62] support the results of this study, where learners under modified asynchronous learning modality struggle to go for synchronous and asynchronous LM, perhaps due to low SES, which makes it difficult for parents and guardians to support their child in terms of technological resources and access to a stable Internet connection. Although some modified asynchronous learners are able to connect to the Internet and have technological resources, the problem lies in sustaining online learning, since their parents have unstable employment statuses.
According to Villanueva and Nuñez [63], there is a link between SES and the online learning experience; however, despite the availability of technical resources, other factors also affect students' online learning experiences, such as the availability of gadgets, stable Internet connection, study area or even comfort, including health conditions. As asynchronous learning modality is investigated in this study, it was found that technological resources alone are not enough to classify learners into asynchronous learning modality. As previously discussed, several factors that affect asynchronous learning modality need to be examined and considered in classifying learners into this kind of learning modality. One of these is the existing health condition, which has a greater impact on asynchronous learning modality. Health conditions that may arise in an online class are not limited to eye problems like computer vision syndrome, eye strain, and eye infection [64] but also include increased stress and anxiety, virtual learning fatigue, and other mental health issues related to online learning [65]. These health conditions perhaps caused learners under asynchronous learning modality to be unable to settle for synchronous learning.

D. Testing the MLP NN Model to "Inconsistent" Dataset
In this study, MLP NN as the best-developed ML model is tested on the inconsistent dataset. Out of 653 samples from the inconsistent dataset, the shifted learning modality of 73.81% or 482 samples was correctly predicted by the MLP NN model. The shifted LM of the remaining 26.19% or 171 samples were not correctly predicted, perhaps due to significant changes on the learners' end, such as enhancement of living, change of residence, and adjustments to learning style within the academic year. The developed MLP NN model is able to classify a learner to its appropriate learning modality by analyzing the learner's information and providing a probability estimate for its corresponding learning modality. The 73.81% correct predictions obtained when the developed MLP NN model is tested on the "inconsistent" dataset showed that it is possible to minimize the number of LM shifters within a school year. In general, technological resources, Internet connectivity, and independent learning should not be the only factors to be considered in classifying learners into synchronous, asynchronous, and modified asynchronous LM, as suggested by the ALDM. Other LESF features also have nonlinear relationships that have an indirect impact on learning modality classification, as shown in the results of the FS and sensitivity analysis. These results implied that schools must not limit themselves to the factors suggested by ALDM for learning modality classification but should also assess other important LESF features in the decision-making process. Hence, the formulated MLP NN model integrated with BA as FS method is a capable tool for this kind of decision-making process, namely learning modality classification.
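The reported shares can be re-derived from the sample counts given above. The variable names in this small check are chosen for illustration only.

```python
total_samples = 653  # size of the inconsistent dataset
correct = 482        # shifted LM correctly predicted by the MLP NN model

incorrect = total_samples - correct
share_correct = round(100 * correct / total_samples, 2)
share_incorrect = round(100 * incorrect / total_samples, 2)

print(incorrect, share_correct, share_incorrect)
```

The two shares sum to 100%, confirming that 482 correct and 171 incorrect predictions account for all 653 samples.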

VI. CONCLUSION AND RECOMMENDATION
In this study, we compared the effectiveness of five state-of-the-art ML classifiers (RF, MLP NN, KNN, SVM, and NB) for learning modality classification problems under the new normal. We also compared the effectiveness of existing FS algorithms (RFE, BA, and ReliefF) as integrated into each ML classifier considered in this study. Among all the formulated ML models, MLP NN integrated with BA for FS is found to be the most effective model for classifying learners' LM under the new normal; it classifies a learner into synchronous, asynchronous, or modified asynchronous learning modality using the learner's LESF data as input parameters. The obtained overall accuracy of 0.8992 and Kappa statistic of 0.8160, with an F-score of 0.9105, implied that the model is "almost perfect." Based on these performance metrics, we concluded that the developed MLP NN model integrated with BA as FS method is an acceptable tool for classifying learners' LM under the new normal.
In addition, based on the results generated from BA during FS and from PDM during sensitivity analysis, we further conclude the following.
1) Important factors to consider in learning modality classification are existing health conditions, learning distractions, computer shop, employment status and educational attainments of both parents and/or guardian, laptop, indigenous people, desktop personal computer, smartphone, conflicts of activities while learning, independent learning, available gadgets, grandparent/s as instructional support, 4Ps member, address of the learner, broadband Internet, sex of the learner, and available space for studying. These factors have nonlinear relationships to synchronous, asynchronous, and modified asynchronous LM.

2) Asynchronous learning modality is most sensitive to the existing health condition variable. Asynchronous learning modality also has high sensitivities to the following factors: address of the learner (either rural or suburban), mother's employment status, guardian working from home, laptop, own broadband Internet, and grandparents as instructional support. Although many parents/guardians of asynchronous learners are capable of supporting their child with enough technical resources and stable Internet connectivity, health conditions hinder these learners from going for synchronous online learning, so they settle in asynchronous learning modality. These health conditions may include common eye disorders and diseases, mental health issues, and even virtual learning fatigue, anxiety, and stress, which possibly caused these types of learners to stay in this kind of learning modality, since the learning phase does not require virtual conferences and learners can decide when to be on screen and when to rest away from the computer screen.

3) Modified asynchronous learning modality is characterized by the following factors: low educational attainment of both parents and/or the guardian with an unstable employment status, no desktop personal computer, and inability to take up independent learning. Learners in this learning modality need instructional support since they struggle with independent learning. Also, learners in this learning modality have low socioeconomic status. Low SES (educational attainment and employment status) has a huge impact in modified asynchronous learning modality, where parents/guardians struggle to support their child not only in providing technological resources and Internet access but also in sustaining the needs of online learning. Low educational level perhaps caused parents/guardians to have an unstable employment status, which caused them to have low income and to struggle in supporting their child for online classes.

4) Synchronous learning modality is highly characterized by the following variables: no existing health condition, stable employment status and high educational attainment of both parents and/or guardian, a learner's address that is either suburban or urban, and available gadgets. Also, synchronous learners have higher socioeconomic status as compared to learners from asynchronous and modified asynchronous LM. High educational level possibly caused parents/guardians to have a stable employment status with high incomes, which enabled them to support their child in synchronous learning modality. Stable employment status allowed them to easily sustain the needs of their child for online distance learning by providing enough gadgets. Learners under this learning modality can stay on screen longer for online meetings and other online activities since they do not have existing health conditions that may lead them to struggle in this kind of learning modality.

The developed MLP NN model for learning modality classification has considerably high predictive performance. Its high performance metrics aided the sensitivity analysis phase in determining which of the input variables greatly affect each LM. The results of the sensitivity analysis may aid DepEd in formulating guidelines for objective classification of learners' LM. A similar study may be conducted using the LESF of other basic education institutions to promote result generalization. Researchers are encouraged to conduct more analysis on the features.

Algorithm 1
Pseudo Code for RF [22]
Input: training dataset D of size N × P and number of trees (B)
For each variable i ∈ P do
    For b = 1 to B:
        1. Draw a bootstrap sample Z* of size N from the training data.
        2. Grow a random-forest tree T_b to the 2/3 of bootstrapped data.
        3. Predict classification of the remaining 1/3 using the tree, and calculate the misclassification rate = out-of-bag error rate (OOB), e_b.
        4. For variable i, permute the value of the variable and compute OOB (E_b), subtract the original OOB error, d_b = E_b − e_b; the increase is an indication of the variable importance.
    End for
Aggregate total OOB error rate from all trees and calculate the variance.
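Step 4 of the pseudo code, the permutation-based importance d_b = E_b − e_b, can be sketched as follows. This is a simplified illustration under stated assumptions: it uses an arbitrary `predict` callable and a fixed evaluation set instead of a grown tree and its out-of-bag samples, and the function names `error_rate` and `permutation_importance` are hypothetical.

```python
import random


def error_rate(predict, rows, labels):
    """Misclassification rate of `predict` on the given rows."""
    wrong = sum(1 for row, y in zip(rows, labels) if predict(row) != y)
    return wrong / len(rows)


def permutation_importance(predict, rows, labels, column, seed=0):
    """Increase in error after shuffling one column: d_b = E_b - e_b."""
    rng = random.Random(seed)
    baseline = error_rate(predict, rows, labels)  # e_b

    shuffled = [row[column] for row in rows]
    rng.shuffle(shuffled)
    permuted_rows = [
        row[:column] + [value] + row[column + 1:]
        for row, value in zip(rows, shuffled)
    ]
    permuted = error_rate(predict, permuted_rows, labels)  # E_b
    return permuted - baseline


# Made-up data: the label depends only on the first feature.
rows = [[0, 1], [1, 0], [0, 0], [1, 1], [0, 1], [1, 0]]
labels = [row[0] for row in rows]


def predict(row):
    # Hypothetical classifier that reads only feature 0.
    return row[0]


print(permutation_importance(predict, rows, labels, column=0))
print(permutation_importance(predict, rows, labels, column=1))
```

Shuffling a feature the classifier ignores leaves the error unchanged, so its importance is zero, while shuffling a feature the classifier relies on can only keep or raise the error in this toy setting.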
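$$P(C_i \mid Y) = \frac{P(Y \mid C_i)\,P(C_i)}{P(Y)}$$

where $P(C_i)$ are the class probabilities, $P(Y)$ is the prior probability of $Y$, $P(C_i \mid Y)$ is the posterior probability, and $P(Y \mid C_i)$ is the posterior probability of $Y$ conditioned on $C_i$. Since $P(Y)$ remains constant for all classes, only $P(Y \mid C_i)P(C_i)$ is to be maximized. The estimate of $P(Y \mid C_i)$ can be simplified by assuming that the features are conditionally independent of each other given the class. Thus,

$$P(Y \mid C_i) = \prod_{k=1}^{n} P(y_k \mid C_i)$$

where $y_k$ is the value of the $k$th feature in $Y$ and $n$ is the number of features.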
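The decision rule of maximizing P(Y|C_i)P(C_i) under conditional independence can be illustrated with a small categorical classifier. This is a sketch, not the NB implementation used in the study: the function names, the toy rows, and the use of Laplace (add-one) smoothing for unseen feature values are all assumptions introduced here.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    feature_values = defaultdict(set)    # feature index -> distinct values
    for row, c in zip(rows, labels):
        for k, value in enumerate(row):
            value_counts[(c, k)][value] += 1
            feature_values[k].add(value)
    return class_counts, value_counts, feature_values


def predict_naive_bayes(model, row):
    """Return the class maximizing P(C_i) times the product of P(y_k | C_i)."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for c, count in class_counts.items():
        score = count / total  # P(C_i)
        for k, value in enumerate(row):
            # Laplace-smoothed estimate of P(y_k | C_i) (assumption).
            numerator = value_counts[(c, k)][value] + 1
            denominator = count + len(feature_values[k])
            score *= numerator / denominator
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Made-up rows: (has laptop, Internet connection); not data from the LESF.
rows = [
    ["yes", "stable"],
    ["yes", "stable"],
    ["no", "none"],
    ["no", "unstable"],
    ["yes", "unstable"],
]
labels = ["synchronous", "synchronous", "modified", "modified", "asynchronous"]

model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ["yes", "stable"]))
```

P(Y) never appears in the code because it is the same for every class and therefore does not change which class attains the maximum.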
A specific ML model starts training the given dataset in the RFE algorithm (see Algorithm 2). Feature importance is then evaluated for each iteration. Less important features are then

Algorithm 2
Pseudo Code for RFE [40]
Tune/Train the model on the training set using all predictors
Calculate model performance
Calculate variable importance or rankings
for each subset size S_i, i = 1, 2, ..., S do
    Keep the S_i most important variables
    [Optional] Preprocess the data
    Tune/Train the model on the training set using S_i predictors
    Calculate model performance
    [Optional] Recalculate the rankings for each predictor
end
Calculate the performance profile over the S_i
Determine the appropriate number of predictors
Use the model corresponding to the optimal S_i

Algorithm 3
Pseudo Code for Boruta [42]
Inputs: originalData - input dataset; RFruns - the number of random forest runs.
Output: finalSet that contains relevant and irrelevant features
confirmedSet = Ø

TABLE I
SUMMARY OF THE LESF DATASET

TABLE II
COMPARATIVE RESULTS OF PERFORMANCE MEASURES OF FIVE ML CLASSIFIERS WITH VARIOUS FS ALGORITHMS

TABLE III
SENSITIVITIES OF INPUT VARIABLES FOR THE THREE LM