True or false: Decision trees are better than more numerical approaches (such as linear regression) at handling attributes that have missing or inconsistent values.



Explanation:

Decision trees handle missing or inconsistent attribute values more gracefully than numerical models such as linear regression. A tree routes each instance by testing one attribute at a time, so a missing value affects only the nodes that test that attribute rather than the whole prediction. Several tree algorithms build on this. CART uses surrogate splits: if the primary splitting attribute is missing, it falls back on another attribute whose split closely mimics the primary one to route the instance. Other approaches treat "missing" as its own category or, as in C4.5, distribute an instance with a missing value across branches in proportion to how the known cases split. Treating missingness as a category also lets the model learn from patterns in how missingness relates to the target. This flexibility reduces the need for extensive imputation or preprocessing, although not every implementation supports all of these strategies.
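The surrogate-split idea can be sketched in a few lines of plain Python. This is a simplified illustration, not the procedure of any particular library: the feature layout (column 0 as the primary attribute, column 1 as a candidate surrogate), the toy rows, and the helper names `best_surrogate` and `route` are all assumptions made for the example, and a full CART implementation would also consider surrogates that split in the reverse direction.

```python
from typing import List, Optional, Sequence, Tuple

Row = Sequence[Optional[float]]


def goes_left(value: float, threshold: float) -> bool:
    """A split sends a value left when it is at or below the threshold."""
    return value <= threshold


def best_surrogate(
    rows: List[Row], primary: int, primary_thr: float, candidates: List[int]
) -> Optional[Tuple[int, float, float]]:
    """Find the (feature, threshold, agreement) that best mimics the primary split.

    Agreement is the fraction of rows, among those where both values are
    present, that the candidate split sends the same way as the primary split.
    """
    best = None
    for feature in candidates:
        both = [r for r in rows if r[primary] is not None and r[feature] is not None]
        if not both:
            continue
        for thr in sorted({r[feature] for r in both}):
            agree = sum(
                goes_left(r[primary], primary_thr) == goes_left(r[feature], thr)
                for r in both
            )
            score = agree / len(both)
            if best is None or score > best[2]:
                best = (feature, thr, score)
    return best


def route(
    row: Row,
    primary: int,
    primary_thr: float,
    surrogate: Optional[Tuple[int, float, float]],
    majority_left: bool,
) -> str:
    """Route a row: primary split first, then the surrogate, then the majority branch."""
    if row[primary] is not None:
        return "left" if goes_left(row[primary], primary_thr) else "right"
    if surrogate is not None and row[surrogate[0]] is not None:
        return "left" if goes_left(row[surrogate[0]], surrogate[1]) else "right"
    return "left" if majority_left else "right"


# Hypothetical training rows: (attribute 0, attribute 1), strongly correlated.
train = [(25, 30), (30, 35), (35, 40), (50, 80), (55, 85), (60, 90)]
surrogate = best_surrogate(train, primary=0, primary_thr=40, candidates=[1])

print(surrogate)
print(route((None, 38), 0, 40, surrogate, majority_left=True))
print(route((None, 82), 0, 40, surrogate, majority_left=True))
print(route((None, None), 0, 40, surrogate, majority_left=True))
```

The instance is never dropped or imputed: when attribute 0 is missing, the stand-in split on attribute 1 decides the branch, and only when both are missing does the node fall back to its majority direction.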

By contrast, linear regression and other numerical approaches compute a weighted sum over all attributes, so they typically require complete data: rows with missing values must be dropped or imputed before the model is fitted. Imputation introduces additional assumptions and potential bias, and because a least-squares fit penalizes squared error, a few badly inconsistent values can pull the whole fit. Both effects make these models less robust to missing or inconsistent values.
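To make the contrast concrete, here is a minimal sketch of a one-attribute least-squares fit on made-up data, comparing two common workarounds for missing inputs: dropping incomplete rows and mean imputation. The data values and the helper names `fit_line`, `drop_missing`, and `impute_mean` are assumptions for illustration only. Passing the raw list containing `None` straight to `fit_line` would fail, which is the point: the arithmetic needs a number in every position.

```python
from statistics import mean
from typing import List, Optional, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept (closed form)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def drop_missing(
    xs: List[Optional[float]], ys: List[float]
) -> Tuple[List[float], List[float]]:
    """Complete-case analysis: keep only rows whose x value is present."""
    kept = [(x, y) for x, y in zip(xs, ys) if x is not None]
    return [x for x, _ in kept], [y for _, y in kept]


def impute_mean(xs: List[Optional[float]]) -> List[float]:
    """Replace each missing x with the mean of the observed x values."""
    fill = mean(x for x in xs if x is not None)
    return [fill if x is None else x for x in xs]


# Made-up data: the two rows with a missing x happen to have large y values,
# so missingness is related to the target.
xs = [1.0, 2.0, 3.0, 4.0, None, None]
ys = [2.0, 4.0, 6.0, 8.0, 20.0, 22.0]

complete_case = fit_line(*drop_missing(xs, ys))
mean_imputed = fit_line(impute_mean(xs), ys)

print("complete-case (slope, intercept):", complete_case)
print("mean-imputed  (slope, intercept):", mean_imputed)
```

On this toy data the two fits share the same slope but disagree on the intercept, because the imputed rows carry large targets at an assumed "average" input. Neither workaround uses the fact that the value was missing, whereas a tree can route on it directly.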

So, for datasets where attributes are frequently missing or measured inconsistently, decision trees often offer more reliable performance without heavy preprocessing.
