Random Subset

Modified on Tue, 8 Nov, 2022 at 12:00 PM

Description

This function creates a randomly selected subset of your dataset.


Application


Working on a large dataset can become cumbersome because processing the data can take a significant amount of time, depending on what you want to do. You can instead build your initial workflow on a much smaller dataset by using Random Subset at the beginning of your notebook. That gives you a dataset that still represents your problem (in terms of available columns and the statistics of the data), so you can build the notebook quickly. Later, you can remove the Random Subset step and refresh the notebook with the full dataset to get the intended results.

You can also use this function to manually create a learning curve for your model(s) and check whether you already have enough data (or possibly more than you need). To do so, create random subsets of different sizes (e.g. 10%, 30%, 50%) and compare the performance of your model(s) on each subset.
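The manual learning-curve idea can be sketched outside the platform in plain Python. This is only an illustration under stated assumptions: the data is synthetic, the "model" is a simple least-squares line, and the names fit_line, mean_squared_error and learning_curve are hypothetical helpers, not functions of the platform.

```python
import random


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(model, points):
    """Average squared prediction error of the fitted line on `points`."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in points) / len(points)


def learning_curve(train, test, percentages, seed=0):
    """Fit on random subsets of `train` and score each fit on the same `test` set."""
    rng = random.Random(seed)
    curve = []
    for pct in percentages:
        # Subset size as a percentage of the training data (at least 2 rows to fit a line).
        size = max(2, round(len(train) * pct / 100))
        subset = rng.sample(train, size)  # random subset without replacement
        model = fit_line(subset)
        curve.append((pct, size, mean_squared_error(model, test)))
    return curve


# Synthetic stand-in data (an assumption): y = 2x + 1 plus Gaussian noise.
rng = random.Random(1)
points = [(x, 2 * x + 1 + rng.gauss(0, 1))
          for x in (rng.uniform(0, 10) for _ in range(600))]
train, test = points[:500], points[500:]

curve = learning_curve(train, test, [10, 30, 50, 100])
for pct, size, mse in curve:
    print(f"{pct:>3}% ({size} rows): test MSE = {mse:.3f}")
```

If the error stops improving as the subsets grow, more data is unlikely to help much. If it is still dropping at 100%, the model would probably benefit from additional data.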


How to use

  • Assign the dataset from which you want to create a subset in the field Data.
  • Use the Percentage slider to set the size of the subset as a percentage of the original dataset (a value between 0.01% and 100%).
  • To save the subset as a new dataset, enable the option Save the output under new name. If you do not enable that option, your original dataset will be overwritten.
  • Click Apply. Your random subset will be generated.
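For readers who want to reproduce the effect of these steps in code, the following is a minimal sketch in plain Python. It is not the platform's implementation: the function name random_subset, the seed parameter, the rounding rule and sampling without replacement are all assumptions made for illustration.

```python
import random


def random_subset(rows, percentage, seed=None):
    """Return a random subset holding roughly `percentage` percent of `rows`."""
    # Mirror the slider range described above (0.01% to 100%).
    if not 0.01 <= percentage <= 100:
        raise ValueError("percentage must be between 0.01 and 100")
    rng = random.Random(seed)
    # Assumed rounding rule: nearest whole row, but at least one row if any exist.
    size = max(1, round(len(rows) * percentage / 100)) if rows else 0
    # sample() draws without replacement, so no row appears twice.
    return rng.sample(rows, size)


data = list(range(1000))                    # stand-in for the rows of a dataset
subset = random_subset(data, 10, seed=42)   # keep 10% of the rows
print(len(subset))                          # 100 rows
```

Returning a new list here corresponds to saving the output under a new name: the original data stays untouched. Passing a fixed seed makes the selection repeatable, which is convenient while a workflow is still being built.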

More on this step

This function produces a random subset of your dataset. The Random Sample function, in contrast, is an exploration tool that shows you N random rows of your dataset as a preview.
