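Suppose we have $n$ observations $(x_i, y_i)$, where each $x_i$ is a tuple of continuous independent variables and $y_i$ is a continuous response. For a tree $T$, the sum of squared errors is $S = \sum_{c \in \mathrm{leaves}(T)} n_c V_c$, where $n_c$ is the number of points in leaf $c$ and $V_c = \frac{1}{n_c} \sum_{i \in c} (y_i - m_c)^2$ is the within-leaf variance of leaf $c$,
where $m_c = \frac{1}{n_c} \sum_{i \in c} y_i$ is the mean response of the points in leaf $c$.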
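As a small illustration of these quantities, the sketch below computes a leaf mean, a within-leaf variance, and their size-weighted sum over all leaves. It assumes that $S$ is the sum over leaves of the leaf size times the within-leaf variance, which is the usual squared-error criterion for regression trees; the function names are hypothetical and chosen only for this example.

```python
def leaf_mean(ys):
    """Mean response of the points that fall in one leaf."""
    return sum(ys) / len(ys)


def within_leaf_variance(ys):
    """Average squared deviation from the leaf mean (divides by n, not n - 1)."""
    m = leaf_mean(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)


def tree_sse(leaves):
    """Assumed form of S: sum over leaves of (leaf size) * (within-leaf variance)."""
    return sum(len(ys) * within_leaf_variance(ys) for ys in leaves)


# Two hypothetical leaves, each given as the list of responses it contains.
leaves = [[1.0, 2.0, 3.0], [10.0, 12.0]]
total = tree_sse(leaves)
```

Because each term is the leaf size times the leaf variance, `tree_sse` equals the total squared deviation of every point from the mean of its own leaf, so a split that makes the leaves more homogeneous lowers it.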
The basic algorithm is as follows:
- Start with a single node containing all points. Calculate its mean response $m_c$ and the sum of squared errors $S$.
- If all the points in the node have the same values for all the independent variables, stop. Otherwise, search over all possible binary splits and take the one that minimizes $S$. If the resulting decrease in $S$ is smaller than some chosen threshold, or a new node would contain fewer than some minimum number of points, stop. Otherwise, take the split to create two new nodes.
- In each new node, go back to the first step.
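The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: it assumes squared error as the splitting criterion, tries midpoints between consecutive distinct values as candidate thresholds, and uses two hypothetical parameters, `min_decrease` and `min_leaf`, for the stopping threshold and the minimum node size. All function names and default values are assumptions made for this example.

```python
def sse(ys):
    """Sum of squared deviations of ys from their mean (0 for an empty list)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(X, y):
    """Return (sse_after_split, feature, threshold) for the best binary split,
    or None when every point has identical predictor values."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            total = sse(left) + sse(right)
            if best is None or total < best[0]:
                best = (total, j, t)
    return best


def grow_tree(X, y, min_decrease=1e-9, min_leaf=1):
    """Recursively grow a regression tree stored as nested dicts.
    min_decrease and min_leaf are hypothetical stopping parameters."""
    node = {"mean": sum(y) / len(y), "n": len(y)}
    split = best_split(X, y)
    if split is None:  # all points share the same predictor values
        return node
    total, j, t = split
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]
    # Stop if the split helps too little or would create a node that is too small.
    if sse(y) - total < min_decrease or min(len(left), len(right)) < min_leaf:
        return node
    node["feature"], node["threshold"] = j, t
    node["left"] = grow_tree([X[i] for i in left], [y[i] for i in left],
                             min_decrease, min_leaf)
    node["right"] = grow_tree([X[i] for i in right], [y[i] for i in right],
                              min_decrease, min_leaf)
    return node


def predict(node, x):
    """Follow the splits down to a leaf and return that leaf's mean response."""
    while "feature" in node:
        go_left = x[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["mean"]


# Hypothetical data: one predictor, two well-separated groups of responses.
X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
tree = grow_tree(X, y, min_leaf=2)
```

With `min_leaf=2`, the sketch splits once between the two groups and then stops, because the best further split inside either group would leave a node with a single point. Note that it stops when the best split violates the size limit, as the steps describe; a common variant instead restricts the search to splits that satisfy the limit.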