Algorithm for DT Model Construction
Initialize: Set appropriate values for the minimum purity threshold (Pmin) and the minimum number of samples (Smin) a node must contain to be split.
Input: Training dataset D = {(x1, y1), (x2, y2), …, (xn, yn)}, where xi is a feature vector of size m and yi is a real-valued target variable.
1. Construct a root node N holding the training dataset D.
2. For each node N, calculate the purity of the target variable in N (here taken as the variance, a standard purity measure for regression) by purity(N) = (1/|D|) Σ_{(xi, yi) ∈ D} (yi − ȳ)², where ȳ is the mean target value of the instances in N.
3. If the purity of node N is less than Pmin, or the number of instances in D is below Smin, mark N as a leaf node and return the mean target value ȳ as the prediction.
4. Otherwise, for each feature i, calculate the purity loss obtained by splitting the instances in D on the values of feature i: ΔP(i) = purity(N) − (|Dleft|/|D|) purity(Nleft) − (|Dright|/|D|) purity(Nright).
5. Select the feature with the highest purity loss as the splitting criterion for node N.
6. Split the instances in D into two subsets, Dleft and Dright, according to the selected feature and splitting value.
7. Create two child nodes of N, Nleft and Nright, and recursively apply steps 2-6 to each child node with its corresponding subset of instances.
8. Return the decision tree T.
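The procedure above can be sketched in Python. This is an illustrative implementation, not the authors' code: it assumes variance as the purity measure (consistent with mean-value leaf predictions for a real-valued target), binary threshold splits on numeric features, and hypothetical helper names (purity, purity_loss, build_tree, predict).

```python
def purity(y):
    # Variance of the target values in a node (lower = purer).
    if not y:
        return 0.0
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y) / len(y)

def purity_loss(y, y_left, y_right):
    # Reduction in weighted variance achieved by a candidate split.
    n = len(y)
    return (purity(y)
            - (len(y_left) / n) * purity(y_left)
            - (len(y_right) / n) * purity(y_right))

def build_tree(X, y, p_min=1e-3, s_min=2):
    # Stopping rule: the node is pure enough (purity < Pmin) or too
    # small (fewer than Smin instances) -> leaf predicting the mean.
    if purity(y) < p_min or len(y) < s_min:
        return {"leaf": True, "value": sum(y) / len(y)}
    best = None  # (purity loss, feature index, threshold)
    n_features = len(X[0])
    for i in range(n_features):
        for t in sorted({x[i] for x in X})[:-1]:  # candidate thresholds
            y_left = [yv for x, yv in zip(X, y) if x[i] <= t]
            y_right = [yv for x, yv in zip(X, y) if x[i] > t]
            loss = purity_loss(y, y_left, y_right)
            if best is None or loss > best[0]:
                best = (loss, i, t)
    if best is None or best[0] <= 0:
        # No useful split exists; fall back to a leaf.
        return {"leaf": True, "value": sum(y) / len(y)}
    _, i, t = best
    # Split D into Dleft / Dright and recurse on each child node.
    left = [(x, yv) for x, yv in zip(X, y) if x[i] <= t]
    right = [(x, yv) for x, yv in zip(X, y) if x[i] > t]
    return {"leaf": False, "feature": i, "threshold": t,
            "left": build_tree([x for x, _ in left],
                               [yv for _, yv in left], p_min, s_min),
            "right": build_tree([x for x, _ in right],
                                [yv for _, yv in right], p_min, s_min)}

def predict(tree, x):
    # Walk down the tree until a leaf is reached.
    while not tree["leaf"]:
        tree = (tree["left"] if x[tree["feature"]] <= tree["threshold"]
                else tree["right"])
    return tree["value"]
```

For example, fitting four one-dimensional points with targets (0, 0, 10, 10) yields a single split, and predict returns the mean of the matching leaf. A real implementation would also cap tree depth and consider midpoints between sorted feature values as thresholds.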