Clearly Explained: Top 2 Types of Decision Trees - CHAID & CART



Let’s dive in to understand the CHAID Decision tree algorithm first.

CHAID: Chi-Squared Automatic Interaction Detection

This algorithm was originally proposed by Kass in 1980. As the name suggests, it is based on the chi-square statistic. A chi-square test yields a probability value (p-value) between 0 and 1. A p-value closer to 0 indicates that there is a significant difference between the two classes being compared, while a p-value closer to 1 indicates that there is no significant difference between them.
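To make the p-value behavior concrete, here is a minimal sketch using SciPy's chi2_contingency on two made-up 2x2 contingency tables (the counts are illustrative assumptions, not data from the article):

```python
# Minimal sketch: chi-square tests on two hypothetical contingency tables.
# Rows = predictor categories, columns = outcome classes (made-up counts).
import numpy as np
from scipy.stats import chi2_contingency

# Categories with very different outcome distributions -> p-value near 0
table_different = np.array([[90, 10],
                            [30, 70]])
_, p_diff, _, _ = chi2_contingency(table_different)

# Categories with nearly identical outcome distributions -> p-value near 1
table_similar = np.array([[50, 50],
                          [51, 49]])
_, p_sim, _, _ = chi2_contingency(table_similar)

print(f"p-value (different categories): {p_diff:.4f}")  # ~0.0000 -> significant
print(f"p-value (similar categories):   {p_sim:.4f}")   # ~1.0    -> not significant
```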

Variable types used in the CHAID algorithm:

Variable to be predicted, i.e., the dependent variable: Continuous OR Categorical

Independent variables: Categorical ONLY (can have more than 2 categories)

Thus, if there are continuous predictor variables, then we need to transform them into categorical variables before they can be supplied to the CHAID algorithm.
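As a concrete illustration of that transformation, here is one common approach using pandas quantile binning; the column name income, the synthetic data, and the choice of five bins are assumptions for the sketch, not requirements of CHAID:

```python
# Minimal sketch: discretize a continuous predictor into quantile bins
# so it can be fed to CHAID as a categorical variable.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df = pd.DataFrame({"income": rng.normal(50_000, 15_000, size=1_000)})  # synthetic

# Equal-frequency (quantile) bins turn the continuous column into an
# ordered categorical column.
df["income_band"] = pd.qcut(
    df["income"], q=5,
    labels=["very_low", "low", "mid", "high", "very_high"])

print(df["income_band"].value_counts())  # ~200 rows per bin
```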

Statistical Tests used to determine the next best split:

Continuous Dependent Variable: F-Test (Regression Problems)

Categorical Dependent Variable: Chi-Square (Classification Problems)
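For the regression case, the F-test amounts to a one-way ANOVA comparing the target values across the categories of a predictor. Here is a minimal sketch on synthetic data (group means and sizes are made up):

```python
# Minimal sketch: one-way ANOVA F-test comparing a continuous target
# across three predictor categories (synthetic data).
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
group_a = rng.normal(10.0, 2.0, 50)  # target values in category A
group_b = rng.normal(10.2, 2.0, 50)  # barely different from A
group_c = rng.normal(14.0, 2.0, 50)  # clearly different from both

f_stat, p_value = f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.6f}")  # tiny p: the categories differ
```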

Let’s understand Bonferroni Adjustment/Correction before we progress further.

Bonferroni Adjustment/Correction

In statistics, the Bonferroni correction is one of several methods used to counteract the problem of multiple comparisons.

This adjustment addresses the fact that the more tests you perform, the greater the risk of a Type 1 error (false positive), i.e., it appears as if you have stumbled upon something significant when in reality you haven't.

If we take an alpha value of 0.05 and conduct a single test, there is a 95% probability that we avoid a Type 1 error. But as the number of tests grows to 100, the probability of avoiding a Type 1 error across all of them drops to roughly 0.6% (0.95^100). To counter this effect, we calculate an adjusted alpha value in tandem with the number of tests. As long as we use this new adjusted value of alpha, we are theoretically in a safe zone.

Observe the adjusted alpha value at 100 tests (0.05 / 100 = 0.0005): it has become so low that the tree will stop growing, because it will not be able to find any variables that achieve that level of super-significance.
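The arithmetic behind both observations fits in a few lines; this sketch uses the same alpha of 0.05 as the discussion above:

```python
# Minimal sketch: family-wise Type 1 error risk vs. number of tests,
# and the Bonferroni-adjusted alpha that counteracts it.
alpha = 0.05

for n_tests in (1, 10, 100):
    p_no_type1 = (1 - alpha) ** n_tests  # chance of avoiding every false positive
    adjusted_alpha = alpha / n_tests     # Bonferroni correction
    print(f"{n_tests:>3} tests: P(no Type 1 error) = {p_no_type1:.1%}, "
          f"adjusted alpha = {adjusted_alpha:.5f}")

# Output:
#   1 tests: P(no Type 1 error) = 95.0%, adjusted alpha = 0.05000
#  10 tests: P(no Type 1 error) = 59.9%, adjusted alpha = 0.00500
# 100 tests: P(no Type 1 error) = 0.6%, adjusted alpha = 0.00050
```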

Most decision-tree software gives the modeler an option to turn the Bonferroni adjustment off; the setting should generally be left on. If the tree is not growing and you would like to experiment with turning Bonferroni off, consider making the alpha value lower than the usual 0.05 to guard against the Type 1 error risk discussed above.

Also, always, always validate your tree once the modeling stage has been completed.

Under-the-Hood Process of the CHAID Algorithm

  1. Cycle through all the predictors one by one to determine the pair of (predictor) categories that is least significantly different with respect to the dependent variable. A chi-square statistic is computed for classification problems (where the dependent variable is categorical as well), and an F-test for regression problems (where the dependent variable is continuous).
  2. If the test for a given pair of predictor categories is not statistically significant (as defined by an alpha-to-merge value), the algorithm merges those predictor categories and repeats the first step (i.e., finds the next pair of categories, which now may include previously merged categories); see the merge-loop sketch after this list.
  3. If the test for the respective pair of predictor categories is significant (i.e., its p-value is less than the alpha-to-merge value), the algorithm computes a Bonferroni-adjusted p-value for the resulting set of categories of that predictor, if the setting is enabled.
  4. This step selects the split variable. The predictor variable with the smallest adjusted p-value, i.e., the predictor that yields the most significant split, is chosen for the next split in the tree. If the smallest (Bonferroni-) adjusted p-value for every predictor is greater than some alpha-to-split value, no further splits are performed and the respective node becomes a terminal node.
  5. This process will continue iteratively until no further splits can be performed (given the alpha-to-merge and alpha-to-split values).
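To make steps 1 and 2 concrete, here is a deliberately simplified sketch of the merge loop for one nominal predictor and a categorical target. It omits the Bonferroni adjustment, the adjacent-categories constraint for ordinal predictors, and the split selection itself; the column names region and churned and all counts are hypothetical:

```python
# Simplified sketch of CHAID's category-merging loop (steps 1-2 above).
from itertools import combinations

import pandas as pd
from scipy.stats import chi2_contingency

def merge_categories(df, predictor, target, alpha_merge=0.05):
    """Greedily merge the least significantly different pair of predictor
    categories until every remaining pair differs significantly."""
    work = df[[predictor, target]].copy()
    work[predictor] = work[predictor].astype(str)

    while work[predictor].nunique() > 2:   # a split needs >= 2 categories
        best_pair, best_p = None, -1.0
        for a, b in combinations(work[predictor].unique(), 2):
            sub = work[work[predictor].isin([a, b])]
            table = pd.crosstab(sub[predictor], sub[target])
            _, p, _, _ = chi2_contingency(table)
            if p > best_p:                 # track the most similar pair
                best_pair, best_p = (a, b), p
        if best_p <= alpha_merge:          # all pairs differ significantly: stop
            break
        merged = "+".join(best_pair)       # merge the most similar pair
        work[predictor] = work[predictor].replace(
            {best_pair[0]: merged, best_pair[1]: merged})
    return work[predictor]

# Toy data: "north" and "south" behave alike, "west" does not.
df = pd.DataFrame({
    "region":  ["north"] * 40 + ["south"] * 40 + ["west"] * 40,
    "churned": [1] * 20 + [0] * 20 + [1] * 19 + [0] * 21 + [1] * 35 + [0] * 5,
})
print(merge_categories(df, "region", "churned").unique())
# Expected: ['north+south' 'west']
```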

How does CHAID handle different types of variables?

Nominal Variable: Automatically groups the categories as per step 2 above

Ordinal Variable: Automatically groups the categories as per step 2 above

Continuous Variable: Converted into segments/deciles before performing step 2

The nature of the CHAID algorithm, with its multiway splits, is to create WIDE trees.

