Which of the following statements is true about using binary values in k-Means?


Multiple Choice

Which of the following statements is true about using binary values in k-Means?

Explanation:
Binary values can be used in k-Means because the algorithm operates on numeric features and minimizes the sum of squared Euclidean distances from points to their cluster centroids. With 0/1 features, each centroid is the mean of its members' values, so its components lie between 0 and 1. This fractional centroid can still be used to compute distances to each data point, and each point is assigned to the nearest centroid. Each centroid component can be read as the proportion of cluster members for which that feature is 1, which helps in interpreting the result. Because 0 and 1 already share a common, bounded range, binary features do not need to be scaled relative to one another, though scaling may be needed when they are mixed with other types of features. Hence, binary values can be used in k-Means.
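To make the fractional centroid concrete, here is a minimal sketch of the k-Means loop on 0/1 data in plain Python. The toy data, the choice of initial centroids, and the helper names (squared_distance, mean_vector, kmeans) are assumptions made for illustration and are not part of the original question.

```python
def squared_distance(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points):
    # Component-wise mean; for 0/1 data every component lies in [0, 1].
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def kmeans(points, centroids, iterations=10):
    # Plain Lloyd iterations: assign each point to its nearest centroid,
    # then move each centroid to the mean of its assigned points.
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: squared_distance(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [
            mean_vector(members) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical toy data: six points with three binary features each.
points = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
]

# Assumed initialisation: start from two of the data points.
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])

for centroid, members in zip(centroids, clusters):
    print([round(c, 2) for c in centroid], "members:", len(members))
```

In this toy run, the second component of the first centroid comes out at about 0.67 because two of the three points in that cluster have the feature set to 1, which matches the proportion reading described above.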

