Neural Network

Modified on Thu, 17 Aug, 2023 at 9:27 AM

Description

Neural Network is an AI model step that can learn to predict numeric values. Neural Networks use an algorithm inspired by the processes in animal brains. Data is passed through a series of connected neurons, and the network learns to perform tasks or find relationships in your data.

Application

Neural Networks are widely applicable to a large range of machine learning problems. With a large enough dataset, Neural Networks can learn to predict extremely complex non-linear relationships. Neural Networks have many parameter choices which affect the accuracy and performance of the model predictions. In the Monolith platform, we have simplified these choices down to the most common ones for Engineering use-cases.

A common criticism of Neural Networks is that they are more difficult to explain than other simpler models: it can be hard to understand why the model has made a specific prediction. We have several features in the Monolith platform to help with this problem, for instance the Sensitivity Analysis and Explain Predictions steps.

How to use

As with all AI models, you must select a dataset to train the model. Then select the Input and Output columns for the model to learn.

Neural Networks have many parameters that you can use to tune the prediction performance of your model. We have selected sensible default choices for all these parameters, so you can just press Apply and train a model immediately. The rest of this section describes which Neural Network parameters you can change in the Monolith platform, with a short description of how each parameter usually affects the model prediction performance. Many of the parameters appear in the Advanced Options panel, which you can expand by clicking on the Show/Hide Advanced Options button.

Advanced Options

Hidden Layers	One of the most important choices is the shape of the network. The neurons in a Neural Network are arranged in a series of layers known as “hidden layers”. In the Monolith platform you can choose both the number and the size of these hidden layers. Both choices affect the model performance: a model with more layers (or larger layers) will be able to fit complex relationships but is more liable to overfitting. A model with fewer layers (or smaller layers) will only be able to capture simpler relationships but is less prone to overfitting. In general, finding the best choice is a balance between these two extremes (underfitting and overfitting). You can train and compare many models with different architectures, to see which architecture gives you the best performance both on training data and on unseen testing data. See this page for more details on searching for the best parameters.
Number of training steps	The second most important choice is how many training steps to use. Using more training steps will make the model fit more closely to the training data. It is usually a good thing to make the model predictions match the training data more closely. However, if you train the model for too many steps then you can end up overfitting to the training data. Overfitting means that the model makes good predictions for training data, but worse predictions for unseen data. It’s hard to know how many training steps to use the first time you run a model. A common approach is to use the default value and then use the loss history curves to decide how many training steps to use. Check this page for more details.
Batch Size	Neural Networks look at the data in chunks or “batches”. The number of rows in each batch (“Batch Size”) can affect how quickly the model trains, and can also affect the final performance. We use a common default size of 32 rows per batch, but you can try using larger or smaller values to see how this affects the model performance. Usual values are between 16 and your full dataset size, although you might see much slower training times for Batch Size values larger than a few thousand.
Dropout Fraction	This parameter can be used to control the randomness in your network. A higher dropout fraction increases the randomness in your model, by randomly turning off (“dropping out”) some fraction of the neurons in each training step. Higher dropout fraction can help reduce the chance of overfitting (through a process known as “regularization”), but it can mean that your model will require more training steps to learn to fit the data. Usual values are between 0 and 0.2. If you want uncertainty estimates with your model then you should use a non-zero value for the Dropout Fraction.
Intermediate Layer Activation Function	This parameter sets the non-linear function applied to the output of each hidden layer, which is what allows the network to capture complex patterns in data. The available options are: Relu (Rectified Linear Unit), a widely used, simple function that captures non-linear relationships efficiently; Elu (Exponential Linear Unit), which addresses limitations of Relu, potentially improving performance and avoiding the “dying Relu” problem; and Swish, a smooth function (the input multiplied by its sigmoid) that is close to linear for large positive inputs and non-linear near zero.
Include Uncertainty	Our Neural Networks can provide uncertainty estimates along with their predictions. This is only possible if you have used a non-zero value for Dropout Fraction. Read here for more details on how uncertainty is calculated and how it differs from the model error. (This option doesn't appear in the Advanced Options panel; it is a checkbox among the regular settings.)
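
The activation functions and the link between Dropout Fraction and uncertainty can be illustrated with a short sketch. It uses only the Python standard library and is not the platform's implementation. The weights are hypothetical and untrained, and the uncertainty estimate uses Monte Carlo dropout (repeating a prediction with dropout switched on and measuring the spread). That technique is consistent with the non-zero dropout requirement above, but it is an assumption that the platform works this way.

```python
import math
import random
import statistics


def relu(x):
    """Rectified Linear Unit: passes positive values, zeroes out negatives."""
    return max(0.0, x)


def elu(x, alpha=1.0):
    """Exponential Linear Unit: smooth negative tail instead of a hard zero."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def swish(x):
    """Swish: x * sigmoid(x), written to avoid overflow for large |x|."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    return x * math.exp(x) / (1.0 + math.exp(x))


def dropout(values, fraction, rng):
    """Zero each value with probability `fraction` and rescale the survivors."""
    keep = 1.0 - fraction
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def predict_once(x, weights_in, weights_out, fraction, rng):
    """One pass through a single hidden layer with dropout applied."""
    hidden = [relu(w * x) for w in weights_in]
    hidden = dropout(hidden, fraction, rng)
    return sum(h * w for h, w in zip(hidden, weights_out))


def predict_with_uncertainty(x, weights_in, weights_out, fraction,
                             n_samples=200, seed=0):
    """Monte Carlo dropout (assumed technique): mean and spread of repeated predictions."""
    rng = random.Random(seed)
    samples = [predict_once(x, weights_in, weights_out, fraction, rng)
               for _ in range(n_samples)]
    return statistics.mean(samples), statistics.pstdev(samples)


# Hypothetical, untrained weights used purely for illustration.
WEIGHTS_IN = [0.5, -1.0, 1.5, 2.0]
WEIGHTS_OUT = [1.0, 0.5, -0.5, 0.25]

mean_no_dropout, spread_no_dropout = predict_with_uncertainty(
    2.0, WEIGHTS_IN, WEIGHTS_OUT, fraction=0.0)
mean_dropout, spread_dropout = predict_with_uncertainty(
    2.0, WEIGHTS_IN, WEIGHTS_OUT, fraction=0.2)
print("spread without dropout:", spread_no_dropout)
print("spread with dropout 0.2:", spread_dropout)
```

With a Dropout Fraction of 0 every repeated prediction is identical, so the spread is zero and carries no information. This is one way to see why a non-zero value is needed before an uncertainty estimate of this kind becomes meaningful.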

Examples

See tutorial 1.2 in the Getting Started page on the Monolith platform for a walkthrough of how to train a Neural Network.

More on this step

Artificial neural networks are computing systems that are inspired by, but not identical to, biological neural networks that constitute animal brains. Such systems “learn” to perform tasks by considering examples, generally without being programmed with task-specific rules. The structure is normally composed of many layers, themselves composed of many neurons.

The different layers of a neural network are:

An input layer (in blue), which gathers all the inputs that will be learnt from (e.g. speed, angle of attack, length, …),
One or more hidden layers (in red), where the neurons are connected,
An output layer (in yellow), which contains all desired outputs (e.g. drag, strength, pressure, …).
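
The three kinds of layer listed above can be sketched in plain Python. This is a minimal illustration rather than the platform's code: the function names, the random initial weights, the choice of Relu on the hidden layers, and the example of three inputs and two outputs are all assumptions made for the sketch.

```python
import random


def init_network(n_inputs, hidden_sizes, n_outputs, seed=0):
    """Build one (weights, biases) pair per layer, with random initial weights."""
    rng = random.Random(seed)
    sizes = [n_inputs, *hidden_sizes, n_outputs]
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        biases = [0.0] * fan_out
        layers.append((weights, biases))
    return layers


def forward(layers, inputs):
    """Pass the inputs through each layer in turn and return the outputs."""
    values = list(inputs)
    for index, (weights, biases) in enumerate(layers):
        # Each neuron computes a weighted sum of the previous layer plus a bias.
        values = [sum(w * v for w, v in zip(row, values)) + b
                  for row, b in zip(weights, biases)]
        # Hidden layers apply a non-linearity (Relu here); the output layer does not.
        if index < len(layers) - 1:
            values = [max(0.0, v) for v in values]
    return values


def count_parameters(layers):
    """Total number of tunable weights and biases in the network."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


# Three inputs (e.g. speed, angle of attack, length), two hidden layers of
# eight neurons each, and two outputs (e.g. drag, pressure).
network = init_network(n_inputs=3, hidden_sizes=[8, 8], n_outputs=2)
outputs = forward(network, [0.3, 0.1, 1.2])
print("number of outputs:", len(outputs))
print("tunable parameters:", count_parameters(network))
```

Adding layers or widening them increases the parameter count quickly, which is the source of both the extra flexibility and the extra risk of overfitting.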

During the training phase, the different “links” (or weights) between these neurons will be progressively “tuned” so that the network becomes able to accurately predict the outputs from the inputs.
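
The tuning of weights can be sketched as mini-batch gradient descent on a tiny network with one hidden layer. This is an illustrative toy, not the platform's training code. The target function (y = x squared), the tanh activation, the learning rate, and the reading of a "training step" as one pass over the data are all assumptions chosen to keep the example short.

```python
import math
import random


def train(xs, ys, hidden_size=8, training_steps=200, batch_size=32,
          learning_rate=0.05, seed=0):
    """Fit a one-hidden-layer network and return the loss after each step."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
    b1 = [0.0] * hidden_size
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
    b2 = 0.0

    def predict(x):
        hidden = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
        return sum(h * w for h, w in zip(hidden, w2)) + b2, hidden

    def mean_squared_error():
        return sum((predict(x)[0] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    loss_history = [mean_squared_error()]
    rows = list(zip(xs, ys))
    for _ in range(training_steps):
        rng.shuffle(rows)
        for start in range(0, len(rows), batch_size):
            batch = rows[start:start + batch_size]
            g_w1 = [0.0] * hidden_size
            g_b1 = [0.0] * hidden_size
            g_w2 = [0.0] * hidden_size
            g_b2 = 0.0
            for x, y in batch:
                pred, hidden = predict(x)
                # Gradient of the batch mean squared error w.r.t. the prediction.
                err = 2.0 * (pred - y) / len(batch)
                for j in range(hidden_size):
                    g_w2[j] += err * hidden[j]
                    # Backpropagate through tanh: its derivative is 1 - tanh^2.
                    back = err * w2[j] * (1.0 - hidden[j] ** 2)
                    g_w1[j] += back * x
                    g_b1[j] += back
                g_b2 += err
            # Nudge every weight a small step against its gradient.
            for j in range(hidden_size):
                w1[j] -= learning_rate * g_w1[j]
                b1[j] -= learning_rate * g_b1[j]
                w2[j] -= learning_rate * g_w2[j]
            b2 -= learning_rate * g_b2
        loss_history.append(mean_squared_error())
    return loss_history


# Toy dataset: 64 evenly spaced inputs and a simple non-linear target.
xs = [-1.0 + 2.0 * i / 63 for i in range(64)]
ys = [x * x for x in xs]
history = train(xs, ys)
print("loss before training:", history[0])
print("loss after training:", history[-1])
```

The returned list plays the role of a loss history curve: plotting it against the step number shows how quickly the error falls and where further training stops helping.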

In general, if the structure is too small, the model will be too simple to capture the relationships in the data, which can lead to inaccurate predictions (underfitting). Conversely, if the structure is too large and complex, the network will “overfit” to the training data, which can lead to errors on new data.