Remove Missing

Modified on Tue, 3 Jan, 2023 at 12:16 AM

Description

With this function you can remove rows with missing data from a dataset.

Application

Missing data can be either an empty field or an entry such as NaN. NaN values are typically the result of failed calculations (e.g. division by zero or missing upstream values). Empty values can result from sensor failures during tests or from data sheets that were not filled in consistently.

Empty strings in categorical columns are also considered missing values and can be removed with this function. If a categorical column contains the string NaN, those entries are treated as missing values as well.
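For readers who prepare data outside the tool, the treatment of empty strings and NaN strings described above can be approximated in Python with pandas. This is only a sketch under stated assumptions: the function name normalize_missing, the marker list, and the column names are hypothetical, and the tool's internal handling may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical list of text markers to treat as missing; adjust to your data.
MISSING_MARKERS = ["", "NaN"]

def normalize_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which marker strings are replaced by real missing values."""
    is_marker = df.isin(MISSING_MARKERS)  # True where a cell equals "" or "NaN"
    return df.mask(is_marker)             # masked cells become missing (NaN)

# Illustrative data: one categorical column and one numeric column.
df = pd.DataFrame({
    "operator": ["Kim", "", "NaN", "Lee"],
    "force": [1.5, 2.0, np.nan, 4.0],
})
cleaned = normalize_missing(df)
print(cleaned.isna())
```

After this step, empty strings, NaN strings, and numeric NaN values are all reported as missing by isna().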

How to use

Select the dataset to work on in the field Data.
In the field Columns, select the columns of the dataset in which to search for missing data. You can select a single column or multiple columns.
Select the Method that determines how missing data is handled across multiple columns.

Any

Remove a row if any of the selected columns has a missing value. This is equivalent to an OR condition:

if column A is empty OR column B is empty then remove the row.

All

Remove a row only if all selected columns have a missing value. This is equivalent to an AND condition:

if column A is empty AND column B is empty then remove the row.

You can either overwrite the existing dataset or enable Save output under different name to save a copy of the dataset.
Click Apply to run the step.
When the step is finished, it reports how many rows were removed from the dataset.
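The steps above can be sketched in Python with pandas. This is a rough equivalent, not the tool's implementation: the function name remove_missing and the column names temperature and pressure are assumptions made for illustration. pandas' dropna accepts the same two options, "any" and "all", through its how parameter.

```python
import numpy as np
import pandas as pd

def remove_missing(df, columns, method="any"):
    """Drop rows with missing values in `columns`; return (result, rows_removed)."""
    if method not in ("any", "all"):
        raise ValueError("method must be 'any' or 'all'")
    # dropna returns a new DataFrame, so the input dataset is left unchanged.
    result = df.dropna(subset=columns, how=method)
    return result, len(df) - len(result)

# Illustrative dataset with hypothetical column names.
data = pd.DataFrame({
    "temperature": [20.1, np.nan, 22.3, np.nan],
    "pressure": [1.0, 1.1, np.nan, np.nan],
})

kept_any, removed_any = remove_missing(data, ["temperature", "pressure"], "any")
kept_all, removed_all = remove_missing(data, ["temperature", "pressure"], "all")
print(f"Any: {removed_any} rows removed, All: {removed_all} rows removed")
```

Because the function returns a new DataFrame, assigning the result back to the original variable corresponds to overwriting the dataset, while assigning it to a new variable corresponds to saving the output under a different name.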

Examples

Consider the following example table:

Row	A	B
1	1	4
2	(empty)	2
3	3	(empty)
4	(empty)	(empty)
5	5	5

Selecting both Columns A and B and the Method All would only remove Row 4 in the example above.
Selecting both Columns A and B and the Method Any would remove Rows 2, 3, and 4 in the example above.
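The same result can be checked with a short pandas sketch that builds the example table and expresses both methods as boolean conditions. The variable names are illustrative, and empty cells are represented here as NaN.

```python
import numpy as np
import pandas as pd

# The example table; empty cells are represented as NaN.
table = pd.DataFrame({
    "Row": [1, 2, 3, 4, 5],
    "A": [1, np.nan, 3, np.nan, 5],
    "B": [4, 2, np.nan, np.nan, 5],
})

missing = table[["A", "B"]].isna()  # True where a cell is missing

# All: A is empty AND B is empty.
removed_by_all = table.loc[missing.all(axis=1), "Row"].tolist()
# Any: A is empty OR B is empty.
removed_by_any = table.loc[missing.any(axis=1), "Row"].tolist()

print(removed_by_all)  # [4]
print(removed_by_any)  # [2, 3, 4]
```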

More on this step

You can use the function Plot Missing Data first to visually check your dataset for missing values.