Description
Decision tree regression trains a decision tree based machine learning model.
Decision tree models represent data by constructing a tree of branched binary decisions. These branches end in leaf nodes that represent the average target value of the observations in the node.
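The "leaf nodes average the target" behaviour can be seen directly in a minimal sketch. This assumes scikit-learn (the article does not name a library) and uses a toy one-feature dataset invented for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data: two clusters of rows with different target levels.
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([1.0, 1.2, 0.8, 5.0, 5.2, 4.8])

# A depth-1 tree makes a single binary decision, leaving two leaf nodes.
tree = DecisionTreeRegressor(max_depth=1).fit(X, y)

# Each prediction is the mean target value of the training rows in that leaf.
print(tree.predict([[2.5]]))   # mean of [1.0, 1.2, 0.8] = 1.0
print(tree.predict([[11.5]]))  # mean of [5.0, 5.2, 4.8] = 5.0
```

Any input falling on the same side of the learned split receives the same leaf average, which is why single-tree predictions are piecewise constant.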
Application
Similar to other machine learning regression models, decision tree regression can be used to predict one or more “target” output values from “feature” inputs. See the last section for the advantages and disadvantages of this type of model and when it is (and is not) a good choice.
How to use
Decision tree regression requires three inputs.
| Input | Description |
| --- | --- |
| Data | The data used to train the model. |
| Inputs | The columns in the data to use as inputs (“features”) to make predictions from. |
| Outputs | The column(s) in the data to use as the output(s) of the model. These are the values the model will predict based on the input features. |
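As a sketch of how the three inputs map onto a fit, the snippet below uses scikit-learn (an assumption; the article names no library) with synthetic arrays standing in for the data, input columns, and output columns. Note that multiple output columns are supported:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: two input ("feature") columns.
rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 2))

# Two output ("target") columns derived from the features for illustration.
Y = np.column_stack([X.sum(axis=1), X[:, 0] - X[:, 1]])

# Data = (X, Y); Inputs = columns of X; Outputs = columns of Y.
model = DecisionTreeRegressor().fit(X, Y)

pred = model.predict(X[:5])
print(pred.shape)  # (5, 2): one prediction per row for each output column
```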
Advanced Options
| Option | Description |
| --- | --- |
| Maximum depth | Controls how deep the tree can grow, i.e. how many binary decisions (binary = boolean, yes/no, true/false) can be made in a branch. The same limit applies across all branches. Deeper trees fit the training data more closely, but are more likely to overfit. |
| Minimum split samples | The number of rows of data required to add a new split to a branch. Lower values allow branches to grow deeper and increase overfitting. |
| Minimum leaf samples | The number of rows of data required to create a leaf (final) node in a branch. Lower values allow branches to grow deeper and increase overfitting. |
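The three options above correspond to the `max_depth`, `min_samples_split`, and `min_samples_leaf` parameters in scikit-learn (assuming that library; the mapping is illustrative). A quick sketch on noisy synthetic data shows how constraining them shrinks the tree:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(size=(200, 1))
y = np.sin(6 * X[:, 0]) + rng.normal(scale=0.1, size=200)

# Unconstrained tree: grows until leaves are (nearly) pure, fitting the noise.
deep = DecisionTreeRegressor().fit(X, y)

# Constrained tree: the three options limit growth and reduce overfitting.
shallow = DecisionTreeRegressor(
    max_depth=4,            # Maximum depth
    min_samples_split=10,   # Minimum split samples
    min_samples_leaf=5,     # Minimum leaf samples
).fit(X, y)

print(deep.get_depth(), deep.get_n_leaves())        # large, overfit tree
print(shallow.get_depth(), shallow.get_n_leaves())  # small, regularised tree
```

In practice these values are usually tuned by cross-validation rather than set by hand.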
Examples
This article provides a good and vivid illustration of how a decision tree works: A visual introduction to machine learning.
Decision trees learn chains of multiple simple rules to represent data. These rules can be visualised and interpreted to understand how the model works.
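One way to inspect those rules, assuming scikit-learn, is `export_text`, which prints the learned decision chain as nested if/else conditions (toy data invented for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0])

tree = DecisionTreeRegressor(max_depth=2).fit(X, y)

# Each line is one binary decision; leaves show the predicted value.
rules = export_text(tree, feature_names=["x"])
print(rules)
```

For deeper trees, `sklearn.tree.plot_tree` produces the same structure as a diagram.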
Advantages
- Fast to train and scale well with data size.
- Robust to data types, quality, and preparation methods - works well with both continuous and categorical data, and is insensitive to feature scaling or normalisation.
- Relatively easy to interpret.
- Non-parametric - meaning there’s no set underlying mathematical equation predefining the interactions between features (as there is with some other model types, such as linear regression). How features interact is learned from the data, rather than imposed by assumption.
Disadvantages
- Training can be unstable, with small variations in training data resulting in different trees being trained.
- Prone to overfitting the data.
- The predictions aren’t necessarily smooth or continuous, which may poorly represent the underlying physical process being modeled if there is not enough training data.
Most of the limitations of single decision tree models can be mitigated by combining multiple trees into a Random Forest model.
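A small sketch of that mitigation, assuming scikit-learn and synthetic data: averaging many trees trained on resampled data smooths out the noise a single unconstrained tree memorises, which typically lowers error on unseen data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 1))
y = np.sin(6 * X[:, 0]) + rng.normal(scale=0.2, size=300)  # noisy targets

tree = DecisionTreeRegressor().fit(X, y)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Evaluate against the noiseless underlying function on fresh inputs.
X_test = rng.uniform(size=(1000, 1))
y_true = np.sin(6 * X_test[:, 0])
tree_err = np.mean((tree.predict(X_test) - y_true) ** 2)
forest_err = np.mean((forest.predict(X_test) - y_true) ** 2)
print(tree_err, forest_err)  # the forest averages away single-tree overfitting
```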