Workflow if only a small amount of data is available

Modified on Thu, 01 Jun 2023 at 01:11 PM

If only a small amount of data is available, so that model performance still improves with every data point added, holding back data points for testing (which you must do to evaluate model performance) has a noticeable impact on model performance. The obvious solution is to collect more data if possible. If that takes too much effort or is impossible, the workflow described here lets you both test the model and give end users the best achievable model performance.

This approach is most often useful for 3D models, which usually need large amounts of data to perform well while only small amounts are available. In these situations it is crucial to use as many data points as possible to train the final model. At the same time, you need some unseen data to evaluate model improvements, because ML is usually an iterative process: you move towards the final model step by step, and each iteration needs to be evaluated on unseen data.

  • Create three datasets
    • Dataset 1: All data points
    • Dataset 2: Training data, a subset of Dataset 1 (80% of the data is most often a good starting point)
    • Dataset 3: Test data (the remaining 20% of Dataset 1, not used in Dataset 2)
  • Train two models
    • Model A for release, trained on Dataset 1 (all data)
    • Model B for evaluation, trained on Dataset 2 (80% of the data)
    • Use the same model type and the same set of hyperparameters for both models!
  • Purpose of Model B
    • Whenever you iterate on the current modelling approach, use this model to test how model performance evolves.
    • Use the unseen Dataset 3 for model testing.
  • Purpose of Model A
    • This is the best model that can be achieved with the available data.
    • Its performance is likely as good as or better than that of Model B, because it is trained on more data. Improvements seen with Model B are therefore expected to carry over to Model A. Model A cannot be tested on Dataset 3 itself, because that data is part of its training set.
    • This model can be exposed to model consumers and used for predictions.
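
The steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the product's own implementation: the names `split_datasets`, `run_iteration`, `fit_line`, and `mean_abs_error` are hypothetical, the toy "model" is a straight-line fit standing in for whatever model type you actually use, and the fixed random seed is an assumption made so that every iteration is tested on the same Dataset 3.

```python
import random


def split_datasets(data, train_fraction=0.8, seed=0):
    """Return (dataset_1, dataset_2, dataset_3) for the workflow above."""
    dataset_1 = list(data)                   # Dataset 1: all data points
    shuffled = dataset_1[:]
    random.Random(seed).shuffle(shuffled)    # fixed seed: same split in every iteration
    n_train = round(train_fraction * len(shuffled))
    dataset_2 = shuffled[:n_train]           # Dataset 2: training subset
    dataset_3 = shuffled[n_train:]           # Dataset 3: held-out test subset
    return dataset_1, dataset_2, dataset_3


def run_iteration(data, train_fn, score_fn, hyperparameters):
    """Train Model A and Model B with identical settings; test only Model B."""
    dataset_1, dataset_2, dataset_3 = split_datasets(data)
    model_a = train_fn(dataset_1, **hyperparameters)   # release model, all data
    model_b = train_fn(dataset_2, **hyperparameters)   # evaluation model
    test_score = score_fn(model_b, dataset_3)          # scored on unseen data only
    return model_a, model_b, test_score


# Hypothetical toy model: least-squares line through (x, y) pairs.
def fit_line(points, l2=0.0):
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) + l2   # l2 shrinks the slope
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, points):
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in points) / len(points)


data = [(x, 2 * x + 1) for x in range(10)]
model_a, model_b, test_score = run_iteration(
    data, fit_line, mean_abs_error, {"l2": 0.0}
)
```

When you change the modelling approach, rerun `run_iteration` with the new settings and compare `test_score` between iterations. Only `model_a` would be handed to model consumers, since no score computed on Dataset 3 is valid for it.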
