Which of the following should not be used to mark missing data in a data set?

Multiple Choice

Explanation:
Using a real numeric value to denote missing data is risky because that value can be a legitimate observation. Zero often represents an actual measurement (for example, a count, a temperature, or a score). If you mark missing data with 0, analyses such as computing averages, totals, or correlations will interpret those records as if the quantity truly equals zero, which can bias results and hide the fact that data are incomplete. This misrepresentation can propagate through imputation, modeling, and reporting, leading to incorrect conclusions.
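To make the bias concrete, here is a minimal sketch using only the Python standard library. The temperature readings are made-up illustrative values, and None stands in for whatever missing-value marker a given tool provides.

```python
from statistics import mean

# Hypothetical daily temperatures in degrees Celsius; two readings were never recorded.
readings_with_zero = [21.0, 0, 19.0, 0, 20.0]        # missing coded as 0
readings_with_none = [21.0, None, 19.0, None, 20.0]  # missing coded as None

# Zero-coded gaps cannot be told apart from real 0-degree readings,
# so they are averaged in as if they were observations.
mean_zero_coded = mean(readings_with_zero)

# An explicit marker lets us exclude the gaps and also count them.
observed = [r for r in readings_with_none if r is not None]
mean_observed = mean(observed)
missing_count = len(readings_with_none) - len(observed)

print(mean_zero_coded)  # 12.0
print(mean_observed)    # 20.0
print(missing_count)    # 2
```

With zeros as the marker, the average drops from 20.0 to 12.0 and nothing in the data reveals that two readings were absent.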

Markers like NA or NULL are designed to indicate absence, and most data tools treat them specially, for example by skipping them in aggregates or propagating them through calculations, so they are not mistaken for real values. A placeholder like ? is also common in human-readable data files, and many systems map it to a missing value on import. For these reasons, 0 should not be used to signify missing data: it blurs the line between observed values and missingness.
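As a rough illustration of how text markers can be mapped to a missing value on import, the sketch below parses a small in-memory CSV with the standard library. The marker set, the parse_cell helper, and the sample data are assumptions made for this example, not the behavior of any particular tool.

```python
import csv
import io

# Assumed set of missing-value markers; adjust it to match the data source.
MISSING_MARKERS = {"", "NA", "NULL", "?"}

def parse_cell(text):
    """Return None for a missing marker, otherwise the value as a float."""
    text = text.strip()
    if text in MISSING_MARKERS:
        return None
    return float(text)

# Hypothetical file contents: row 3 holds a genuine score of 0.
raw = "id,score\n1,7.5\n2,?\n3,0\n4,NA\n"
rows = list(csv.DictReader(io.StringIO(raw)))
scores = [parse_cell(row["score"]) for row in rows]

print(scores)  # [7.5, None, 0.0, None]
```

The real score of 0 survives as 0.0 while both markers become None, which keeps observed zeros and missing entries distinguishable.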