Polynomial Regression

Modified on Tue, 18 Jul 2023 at 08:32 AM

Description

Polynomial Regression is a machine learning model that can be used to fit a non-linear relationship between an output variable and one or more input variables.
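
Under the hood, polynomial regression expands the inputs into polynomial terms and then fits a linear model to them. As an illustrative sketch (not the platform's actual implementation), using scikit-learn:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Quadratic ground truth with a little noise
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 2.0 + 0.5 * x.ravel() - 1.5 * x.ravel() ** 2 + rng.normal(0, 0.1, 50)

# Degree-2 polynomial regression: expand features, then fit linearly
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)
print(model.predict([[1.0]]))  # close to 2.0 + 0.5 - 1.5 = 1.0
```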


Application

Polynomial Regression is a useful model when the relationship between the outputs and inputs is non-linear, as it can be used to detect and model more complex patterns in the data. See the last section for advantages and disadvantages of this type of model and when to use or not to use it.


How to use

  • Under the Data input choose a tabular dataset to build the model on
  • Under Inputs select the input features you would like to use for the model
  • Under Outputs select the output features you would like to predict with this model
  • The Output Polynomial Coefficients? tick box determines whether the coefficients of the fitted model are included in the output. See the section below for details.
  • Under the Name input, type the name you would like to use to refer to the created model.
  • Click Apply to train the model.
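
For reference, the steps above correspond roughly to the following scikit-learn sketch (the column names and data are hypothetical, and the platform's actual internals may differ):

```python
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LassoCV

# Hypothetical tabular dataset ("Data" input) with two inputs and one output
df = pd.DataFrame({
    "speed":  [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "load":   [0.5, 1.2, 0.9, 2.1, 1.8, 3.0],
    "stress": [1.3, 4.1, 8.0, 15.2, 21.8, 32.5],
})

X = df[["speed", "load"]]  # "Inputs"
y = df["stress"]           # "Outputs"

# LassoCV mirrors the step's default regularised model type
model = make_pipeline(PolynomialFeatures(degree=2), LassoCV(cv=3))
model.fit(X, y)            # "Apply" trains the model
```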

Output of Polynomial Coefficients

If the option Output Polynomial Coefficients? is enabled, further fields become available to customise the output.

Option: Select format for coefficients

You can select between two different output formats:

Inline Equation (Pi Compatible)

The entire polynomial equation is printed in an info box.

You can copy and paste that string into your downstream system. Depending on the format that system expects, you may need to adapt the string accordingly.

Table (CSV Exportable)

The polynomial equation is converted into a table. The first column contains the term (the inputs and their exponents), and the second column contains the corresponding coefficient.

Export the table with coefficients to a .csv file by clicking the “Export Dataset” button at the top right of the step.
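
Once exported, the coefficient table can be reused outside the platform. A hedged sketch of evaluating the polynomial from such a table (the column names Term and Coefficient and the term syntax x^2 are assumptions, so check them against your actual export):

```python
import io
import pandas as pd

# Hypothetical contents of an exported coefficients .csv
csv_text = """Term,Coefficient
1,2.0000
x,0.5000
x^2,-1.5000
"""

coeffs = pd.read_csv(io.StringIO(csv_text))

# Evaluate a single term like "1", "x", or "x^2" at a given input value
def term_value(term, x):
    if term == "1":
        return 1.0
    if "^" in term:
        _, power = term.split("^")
        return x ** int(power)
    return x

# Sum coefficient * term over all rows to evaluate the polynomial at x = 2
x = 2.0
y = sum(c * term_value(t, x) for t, c in zip(coeffs["Term"], coeffs["Coefficient"]))
print(y)  # 2.0 + 0.5*2 - 1.5*4 = -3.0
```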

Option: Select decimal places to round coefficients to (for display purposes only)

This option controls the display precision of the coefficients. By default it is set to 10, meaning each coefficient is printed and exported with 10 decimal places. Increase this number if you need higher precision, or decrease it if less precision is sufficient. The example screenshots above were created with a value of 4.

This option does not affect the precision of the coefficients used internally for model training and predictions. That is controlled by the hyperparameter Rounding in the Advanced Options, see below.

Advanced Options Summary

The polynomial regression model has the following hyperparameters:

Model Type

The model type enables the user to choose regularised variants of the polynomial regression model. This is especially important when the data contains many inputs and/or few data points. Regularised variants help prevent the model from overfitting by keeping the coefficients as small as possible. The available options are:

  • Basic Polynomial regression: this model fits a polynomial curve to the data without regularisation. This option provides a flexible modeling approach but may be susceptible to overfitting, especially with limited data.
  • RidgeCV (regularised): this model adds regularisation to the model, helping prevent overfitting by introducing a penalty term to the loss function. It constrains the model's coefficients, leading to a smoother and more robust fit.
  • LassoCV (regularised, focused on main inputs): this is the default model for the step, which also includes regularisation but with a focus on identifying and prioritising the most important input variables. It applies a penalty that encourages the model to favor a sparse solution, effectively performing feature selection. This can be particularly beneficial when dealing with high-dimensional datasets where feature relevance is crucial. This makes it particularly attractive for polynomial regression, where combinations of input powers create a large input space.
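
The three model types can be illustrated with their scikit-learn counterparts (a sketch under the assumption that the step behaves like LinearRegression, RidgeCV, and LassoCV fitted on expanded polynomial features):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression, RidgeCV, LassoCV

# Synthetic data: only x0 and x1^2 actually matter
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.1, 40)

counts = {}
for name, reg in [("Basic", LinearRegression()),
                  ("RidgeCV", RidgeCV()),
                  ("LassoCV", LassoCV(cv=5))]:
    pipe = make_pipeline(PolynomialFeatures(degree=2), reg)
    pipe.fit(X, y)
    # Count non-zero coefficients: LassoCV tends to zero out irrelevant terms
    counts[name] = int(np.sum(np.abs(pipe[-1].coef_) > 1e-6))
print(counts)
```

The sparse LassoCV solution is what makes the exported equation shorter and easier to interpret.
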

Degree of Polynomial

The Degree of Polynomial determines the maximum degree of polynomial terms in the model. Higher values allow more complex relationships to be modeled, but also make the model more prone to overfitting.

This parameter has to be set to the highest degree you want to use in your polynomial.
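
To see why the degree matters, note how quickly the number of polynomial terms grows with it. A small sketch using scikit-learn's PolynomialFeatures with 3 input features:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Count of polynomial terms (including the bias term "1") for 3 input features
X = np.zeros((1, 3))
terms = {d: PolynomialFeatures(degree=d).fit(X).n_output_features_
         for d in [1, 2, 3, 4]}
print(terms)  # {1: 4, 2: 10, 3: 20, 4: 35}
```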

Max Degree of N

If you assign an input feature to one of these parameters, all terms involving that feature are limited to a polynomial degree of at most N. This overrides the global maximum degree of the polynomial for the assigned input feature.

This parameter is available for N = 1, 2, 3, and 4.

Use this feature to restrict input features to lower degrees than the globally set Degree of Polynomial. Currently you can only restrict the maximum degree. It is not possible to increase the maximum degree for an input compared to the global setting.

That is, if Degree of Polynomial is set to 3 you can restrict certain inputs to a maximum degree of 1 or 2 by means of the corresponding field. You cannot use the field Max Degree of 4 to increase one feature to a higher degree than 3; it would just be ignored.

All fields Max Degree of N with N ≥ Degree of Polynomial are ignored by the model.
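
scikit-learn has no built-in per-feature degree cap, but the behaviour can be sketched by filtering the expanded terms via their exponents (the feature index and cap below are illustrative assumptions):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Global Degree of Polynomial = 3, but cap feature index 1 at degree 1,
# mimicking the step's Max Degree of 1 field
X = np.random.default_rng(0).normal(size=(10, 2))
pf = PolynomialFeatures(degree=3).fit(X)

max_degree = {1: 1}  # feature index -> per-feature cap
# pf.powers_ holds the exponent of each feature in each output term;
# keep only terms that respect every per-feature cap
keep = [all(p <= max_degree.get(i, 3) for i, p in enumerate(powers))
        for powers in pf.powers_]
X_poly = pf.transform(X)[:, keep]
print(X_poly.shape)  # (10, 7): 7 of the 10 degree-3 terms survive
```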

Rounding

This parameter defines the numerical precision with which the polynomial coefficients are returned.

The available options are:

  • None
  • float16
  • float32
  • float64

The default selection is None, which means the internal precision of the Python framework is used (float64).

It is highly recommended to leave this parameter unchanged unless you have a clear requirement!
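
Assuming these options map to the corresponding NumPy dtypes, the effect of choosing a lower-precision setting can be sketched as:

```python
import numpy as np

# Hypothetical fitted coefficients at full float64 precision
coef = np.array([1.23456789, -0.000123456789, 987.654321])

# Casting mimics the Rounding options; float16 visibly loses precision
for dtype in (np.float64, np.float32, np.float16):
    print(dtype.__name__, coef.astype(dtype))
```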

More on this step

Advantages

  • Powerful tool for modeling nonlinear data. It is capable of fitting more complex relationships between the inputs and outputs than linear regression. Additionally, a lot of engineering data follows physical laws that are often polynomial, which makes this family of models often suitable.
  • Can handle a wide range of data, including data that is not normally distributed. 
  • Can extrapolate beyond the range of the original data points, generating predictions outside the training range.
  • Is explainable by returning an equation that can help to better understand the product response, but also can be easily exported and embedded outside of the platform. Most of the more complex models (e.g. Neural Network) don't have this degree of explainability.

Disadvantages

  • Can be prone to overfitting, which can lead to unreliable results. 
  • Can be computationally expensive and may require significant computing resources for large datasets. 
  • May be susceptible to outliers.
