Description
With this function you can remove rows with missing data from a dataset.
Application
Missing data can either be an empty field or entries like NaN. NaN values are typically the result of failed calculations (e.g. division by zero or missing upstream values). Empty values can result from sensor failure during tests or from data sheets that were not filled in consistently.
Empty strings in categorical columns are also considered missing values and can be removed with this function. If a categorical column contains NaN strings, these are considered missing values as well.
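To illustrate which values count as missing, the following pandas sketch flags NaN values, empty fields, empty strings, and the literal string NaN. It is only an analogy: the function's internal implementation is not described in this article, and the helper name `is_missing` and the whitespace trimming are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def is_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a boolean mask that is True wherever a cell counts as missing.

    Hypothetical helper (not part of the tool): treats NaN/None, empty
    strings, and the literal string "NaN" as missing.
    """
    mask = df.isna()  # real NaN / None values
    for col in df.columns:
        # Flag text cells that are empty or spell out "NaN" (assumed rule).
        text_missing = df[col].map(
            lambda v: isinstance(v, str) and v.strip() in ("", "NaN")
        )
        mask[col] = mask[col] | text_missing.astype(bool)
    return mask


df = pd.DataFrame(
    {
        "value": [1.0, np.nan, 3.0, 4.0],   # NaN from a failed calculation
        "label": ["ok", "", "NaN", None],   # empty string, "NaN" string, empty field
    }
)
mask = is_missing(df)
print(mask)
```

In this sketch, the second entry of `value` and the last three entries of `label` are flagged as missing.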
How to use
- Select the dataset to work on in the field Data.
- In the field Columns select all columns of the dataset in which to search for missing data. You can either select a single column or multiple columns.
- Select the Method that determines how missing data is handled across multiple columns.
Any | Remove a row if any of the selected columns have a missing value. This is equivalent to a |
All | Remove a row only if all selected columns have a missing value. This is equivalent to a |
- You can either overwrite the existing dataset or enable Save output under different name to save a copy of the dataset.
- Click Apply to run the step.
- When the step is finished, it shows how many rows were removed from the dataset.
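For readers who want to reproduce the steps above in a script, here is a minimal pandas sketch. It assumes the dataset is a `DataFrame`; the function name `remove_missing_rows`, its parameters, and the column names in the usage part are hypothetical and do not reflect the tool's actual API.

```python
import numpy as np
import pandas as pd


def remove_missing_rows(df, columns, method="any"):
    """Drop rows with missing data in `columns`.

    Hypothetical helper. Returns (result, rows_removed).
    method="any": drop a row if at least one selected column is missing.
    method="all": drop a row only if every selected column is missing.
    """
    if method not in ("any", "all"):
        raise ValueError("method must be 'any' or 'all'")
    columns = [columns] if isinstance(columns, str) else list(columns)
    # Assumption: empty strings and the literal string "NaN" count as missing.
    missing = df[columns].isna() | df[columns].isin(["", "NaN"])
    drop = missing.any(axis=1) if method == "any" else missing.all(axis=1)
    result = df.loc[~drop].copy()
    return result, int(drop.sum())


data = pd.DataFrame(
    {
        "temp": [20.5, np.nan, 22.1, np.nan],
        "sensor": ["a", "b", "", ""],
    }
)

# Keep the original and store the result under a different name ...
cleaned, removed = remove_missing_rows(data, ["temp", "sensor"], method="any")
print(f"{removed} rows removed")

# ... or overwrite the existing dataset instead:
# data, removed = remove_missing_rows(data, ["temp", "sensor"], method="any")
```

Returning the number of removed rows alongside the result mirrors the info message shown when the step finishes.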
Examples
Consider the following example table:
Row | A | B |
---|---|---|
1 | 1 | 4 |
2 | 2 | |
3 | 3 | |
4 | | |
5 | 5 | 5 |
- Selecting both Columns A and B and the Method All would only remove Row 4 in the example above.
- Selecting both Columns A and B and the Method Any would remove Rows 2, 3, and 4 in the example above.
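The two results can be checked with a short pandas sketch. pandas is not part of this article; `DataFrame.dropna` with `how="all"` or `how="any"` is used here only as a comparable operation, and blank cells of the example table are represented as NaN.

```python
import numpy as np
import pandas as pd

# The example table; blank cells are represented as NaN.
table = pd.DataFrame(
    {
        "A": [1, 2, 3, np.nan, 5],
        "B": [4, np.nan, np.nan, np.nan, 5],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="Row"),
)

# Method All: drop a row only if both A and B are missing.
kept_all = table.dropna(subset=["A", "B"], how="all")
# Method Any: drop a row if A or B is missing.
kept_any = table.dropna(subset=["A", "B"], how="any")

removed_all = table.index.difference(kept_all.index).tolist()
removed_any = table.index.difference(kept_any.index).tolist()
print(removed_all)  # [4]
print(removed_any)  # [2, 3, 4]
```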
You can use the function Plot Missing Data first to visually check your data for missing data.