The data type of the dependent variable in linear regression must be numeric.

Prepare for the Data Mining Test with our comprehensive quizzes. Practice with various question types, each with hints and explanations. Boost your understanding and ensure success on your exam!

Multiple Choice

The data type of the dependent variable in linear regression must be numeric.

Explanation:
In linear regression, the dependent variable is treated as a numeric quantity that can be added, subtracted, and squared to measure errors. The method estimates a straight-line relationship by minimizing the sum of squared differences between observed numeric values and their predictions. This relies on having a true numeric scale for the outcome, so the residuals and the fit are meaningful. Categorical outcomes don’t have a natural numeric distance between categories, so forcing them into a linear combination of predictors leads to misinterpretation of coefficients and unreliable predictions. While a binary outcome can be coded as 0/1, using linear regression for it ignores the probability structure and can yield predictions outside the 0–1 range; logistic regression is typically preferred because it models probabilities directly. Ordinal outcomes have an ordered ranking but unequal gaps between levels, which a simple linear model would incorrectly assume; ordinal or multinomial models preserve the appropriate structure. Therefore, the appropriate data type for the dependent variable in standard linear regression is numeric.

In linear regression, the dependent variable is treated as a numeric quantity that can be added, subtracted, and squared to measure errors. The method estimates a straight-line relationship by minimizing the sum of squared differences between observed numeric values and their predictions. This relies on having a true numeric scale for the outcome, so the residuals and the fit are meaningful.

Categorical outcomes don’t have a natural numeric distance between categories, so forcing them into a linear combination of predictors leads to misinterpretation of coefficients and unreliable predictions. While a binary outcome can be coded as 0/1, using linear regression for it ignores the probability structure and can yield predictions outside the 0–1 range; logistic regression is typically preferred because it models probabilities directly. Ordinal outcomes have an ordered ranking but unequal gaps between levels, which a simple linear model would incorrectly assume; ordinal or multinomial models preserve the appropriate structure.

Therefore, the appropriate data type for the dependent variable in standard linear regression is numeric.

Subscribe

Get the latest from Passetra

You can unsubscribe at any time. Read our privacy policy