Which coding method is described for transforming nominal attributes like eye color into numeric features in some models?


Multiple Choice


Explanation:
Models that require numeric inputs often handle nominal attributes by turning each category into a binary indicator. Dummy coding creates one indicator for every category except a single reference (baseline) category, which is represented implicitly by all indicators being zero. For k categories this yields k - 1 columns, which avoids redundancy and makes the coefficients straightforward to interpret: each one shows the effect of being in that category relative to the baseline.
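To make the "relative to the baseline" reading concrete, here is a small worked formula. It assumes a linear model with an intercept; the symbols (y, the betas, the indicators d_j, and the error term) are illustrative and do not come from the question itself.

```latex
y = \beta_0 + \sum_{j=1}^{k-1} \beta_j d_j + \varepsilon,
\qquad
d_j =
\begin{cases}
1 & \text{if the observation is in category } j \\
0 & \text{otherwise}
\end{cases}
```

Under that assumption, the expected value for the baseline category is \beta_0 (all indicators are zero), and for category j it is \beta_0 + \beta_j. Each \beta_j is therefore the difference between category j and the baseline.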

For eye color, suppose the categories are blue, brown, and green. Choose brown as the baseline and create two indicators: is_blue and is_green. An observation with blue eyes has is_blue = 1 and is_green = 0; brown eyes give 0 and 0; green eyes give 0 and 1. The model can then learn how blue or green eye color differs from brown, without implying any natural order among colors.
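The eye-color example can be sketched in a few lines of plain Python. This is a minimal illustration, not a library API: the helper name dummy_code and its parameters are hypothetical, chosen only for this sketch.

```python
def dummy_code(values, categories, baseline):
    """Return one dict of 0/1 indicators per value, omitting the baseline category."""
    if baseline not in categories:
        raise ValueError(f"baseline {baseline!r} is not a known category")
    # One indicator column per non-baseline category.
    columns = [c for c in categories if c != baseline]
    rows = []
    for value in values:
        if value not in categories:
            raise ValueError(f"unknown category: {value!r}")
        # The baseline is encoded implicitly: every indicator is 0.
        rows.append({f"is_{c}": int(value == c) for c in columns})
    return rows


eye_colors = ["blue", "brown", "green"]
encoded = dummy_code(
    eye_colors,
    categories=["blue", "brown", "green"],
    baseline="brown",
)

for color, row in zip(eye_colors, encoded):
    print(color, row)
# blue {'is_blue': 1, 'is_green': 0}
# brown {'is_blue': 0, 'is_green': 0}
# green {'is_blue': 0, 'is_green': 1}
```

In practice a data library would usually do this step; for instance, pandas offers get_dummies with a drop_first option, though which category it drops depends on the category ordering.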

Ordinal encoding would wrongly suggest an order among colors, which isn’t appropriate for nominal attributes. One-hot encoding is similar in spirit but uses a binary column for every category; because those columns always sum to one, the full set is redundant in models with an intercept unless one category is dropped. Dummy coding, with its baseline category, is the method described here for converting nominal categories into numeric features.
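The three encodings can be compared side by side on the same eye-color categories. This is a rough sketch in plain Python; the function names and the CATEGORIES list are assumptions made for illustration only.

```python
CATEGORIES = ["blue", "brown", "green"]


def ordinal_encode(value):
    # Maps blue -> 0, brown -> 1, green -> 2, which implies an order
    # (blue < brown < green) that has no meaning for eye color.
    return CATEGORIES.index(value)


def one_hot_encode(value):
    # One binary column per category, including the would-be baseline.
    return [int(value == c) for c in CATEGORIES]


def dummy_encode(value, baseline="brown"):
    # One binary column per category except the baseline.
    return [int(value == c) for c in CATEGORIES if c != baseline]


for color in CATEGORIES:
    print(color, ordinal_encode(color), one_hot_encode(color), dummy_encode(color))

# Every one-hot row sums to 1, so the columns together duplicate an
# intercept term; dropping one category removes that redundancy.
row_sums = [sum(one_hot_encode(c)) for c in CATEGORIES]
print(row_sums)
```

The constant row sum is the redundancy mentioned above: any one of the one-hot columns can be recovered from the others, which is why dummy coding keeps only k - 1 of them.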
