Which cross-validation technique ensures each fold preserves the overall distribution of the target variable?

Get ready for the GARP Risk and AI Exam with flashcards and multiple choice questions. Each question comes with hints and explanations. Prepare for success!

Multiple Choice

Which cross-validation technique ensures each fold preserves the overall distribution of the target variable?

Explanation:
Preserving the target distribution across folds is what stratified cross-validation achieves. When you split data for validation, you want each fold to reflect the same mix of target values as the full dataset, so the evaluation metrics aren’t biased by unusual fold compositions. Stratified splitting ensures each fold contains roughly the same proportion of each class (or distribution shape in classification tasks) as the entire data, which is especially important when classes are imbalanced. This leads to more reliable estimates of model performance. Other approaches don’t guarantee this balance: random cross-validation can produce folds with differing class proportions, leave-one-out validation uses single samples and can be unstable for small or imbalanced datasets, and time series split preserves temporal order rather than maintaining a representative distribution across folds.

Preserving the target distribution across folds is what stratified cross-validation achieves. When you split data for validation, you want each fold to reflect the same mix of target values as the full dataset, so the evaluation metrics aren’t biased by unusual fold compositions. Stratified splitting ensures each fold contains roughly the same proportion of each class (or distribution shape in classification tasks) as the entire data, which is especially important when classes are imbalanced. This leads to more reliable estimates of model performance. Other approaches don’t guarantee this balance: random cross-validation can produce folds with differing class proportions, leave-one-out validation uses single samples and can be unstable for small or imbalanced datasets, and time series split preserves temporal order rather than maintaining a representative distribution across folds.

Subscribe

Get the latest from Examzify

You can unsubscribe at any time. Read our privacy policy