Decision Tree Regression

Modified on Tue, 18 Jul 2023 at 02:03 PM


Decision tree regression trains a decision-tree-based machine learning model.

Decision tree models represent data by constructing a tree of branched binary decisions. These branches end in leaf nodes whose value is the average target value of the observations that reach them.


Similar to other machine learning regression models, decision tree regression can be used to predict single or multiple “target” output values from “feature” inputs. See the last section for the advantages and disadvantages of this type of model and guidance on when to use it.

How to use

Decision tree regression requires three inputs:

  • The data used to train the model.
  • The columns in the data to use as inputs (“features”) to make predictions from.
  • The column(s) in the data to use as the output(s) of the model. These are the values the model will predict based on the input features.
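The three inputs above can be sketched in scikit-learn (the library shown in the example figure below); the column names and values here are purely illustrative, not part of the product:

```python
# Minimal sketch of decision tree regression with scikit-learn.
# The dataset, the feature columns ("temperature", "humidity") and the
# target column ("yield") are made-up stand-ins for real data.
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

data = pd.DataFrame({
    "temperature": [15.0, 18.0, 21.0, 24.0, 27.0, 30.0],
    "humidity":    [80.0, 72.0, 65.0, 60.0, 55.0, 50.0],
    "yield":       [2.1, 2.8, 3.5, 3.9, 3.7, 3.2],
})

features = data[["temperature", "humidity"]]  # input ("feature") columns
target = data["yield"]                        # output ("target") column

model = DecisionTreeRegressor(random_state=0)
model.fit(features, target)

# Predictions for the same feature rows used in training.
predictions = model.predict(features)
```

With default settings the tree grows until every training row is represented exactly, which is why limiting its growth (see Advanced Options) usually matters.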

Advanced Options

Maximum depth
Controls how deep the tree can grow, i.e. how many binary decisions (binary = boolean, yes/no, true/false) can be made in a branch. The same depth is applied across all branches. Deeper trees represent the training data better, but are more likely to overfit.
Minimum split samples
The minimum number of rows of data a node must contain before it can be split. Lower values allow branches to grow deeper and increase overfitting.
Minimum leaf samples
The minimum number of rows of data required in each leaf (final) node of a branch. Lower values allow branches to grow deeper and increase overfitting.
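In scikit-learn terms, these three options correspond to the max_depth, min_samples_split and min_samples_leaf parameters; a sketch on synthetic data (the values chosen are illustrative, not defaults):

```python
# Sketch: constraining tree growth with the three advanced options.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic one-feature dataset standing in for real training data.
X = np.arange(100).reshape(-1, 1)
y = np.sin(X.ravel() / 10.0)

model = DecisionTreeRegressor(
    max_depth=3,           # maximum depth of any branch
    min_samples_split=10,  # minimum split samples
    min_samples_leaf=5,    # minimum leaf samples
)
model.fit(X, y)

# The fitted tree can never exceed the requested depth.
print(model.get_depth())
```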


This article provides a good and vivid illustration of how a decision tree works: A visual introduction to machine learning.

More on this step

Decision trees learn chains of multiple simple rules to represent data. These rules can be visualised and interpreted to understand how the model works.
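Those learned rules can be printed directly; a small sketch using scikit-learn's export_text on a made-up one-feature dataset (the feature name "x" is illustrative):

```python
# Sketch: inspecting a fitted tree's rules as readable text.
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([10.0, 10.0, 20.0, 20.0])

model = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)

# Each line is one binary decision; leaves show the predicted value.
rules = export_text(model, feature_names=["x"])
print(rules)
```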

Example decision tree from scikit-learn documentation

Advantages

  • Fast to train and scale well with data size.
  • Robust to data types, quality, and preparation methods - works well with continuous and categorical data, doesn’t care about scaling or normalisation.
  • Relatively easy to interpret.
  • Non-parametric - meaning there’s no set underlying mathematical equation predefining the interactions between features (as there is with some other model types, such as linear regression). How features interact is learned from the data, rather than imposed by assumption.

Disadvantages

  • Training can be unstable, with small variations in training data resulting in different trees being trained. 
  • Prone to overfitting the data.
  • The predictions aren’t necessarily smooth or continuous, which may poorly represent the underlying physical process being modeled if there is not enough training data.

Most of the limitations of single decision tree models can be mitigated by combining multiple trees into a Random Forest model.
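As a sketch of that mitigation, a random forest averages many trees trained on resampled data, which damps the instability of any single tree (the data here is synthetic and the parameter values are illustrative):

```python
# Sketch: a random forest as an ensemble of decision trees.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X.ravel()) + rng.normal(scale=0.1, size=200)

# 100 trees, each fit on a bootstrap sample; predictions are averaged.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)

pred = forest.predict([[5.0]])
```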
