Episode 36 — Use cross-validation correctly: folds, leakage avoidance, and time-aware splits

This episode breaks down cross-validation as a method for estimating performance more reliably, and it emphasizes the two DY0-001 failure modes that matter most: leakage and using the wrong split strategy for the data. You’ll learn how k-fold cross-validation works, what “stratified” means for imbalanced classification, and why repeated CV can reduce sensitivity to a lucky split. We’ll also cover when cross-validation is the wrong tool, such as strict time series problems where shuffling breaks temporal order and produces inflated results. You’ll practice recognizing time-aware alternatives like rolling or expanding windows, and you’ll learn how to keep preprocessing, feature selection, and imputation inside each fold so you don’t train on information you shouldn’t have.

Troubleshooting includes spotting “too good to be true” scores, diagnosing fold leakage from target encoding or scaling, and choosing fold counts that balance compute cost with estimate stability.

Produced by BareMetalCyber.com, where you’ll find more cyber audio courses, books, and information to strengthen your educational path. Also, if you want to stay up to date with the latest news, visit DailyCyber.News for a newsletter you can use, and a daily podcast you can commute with.
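To make the episode's main ideas concrete, here is a minimal, stdlib-only sketch of the three mechanics discussed: k-fold index splitting, an expanding-window split that preserves temporal order, and fitting a preprocessing statistic on the training fold only so it never sees test data. The function names (`kfold_indices`, `expanding_window_splits`, `fold_mean_scale`) are illustrative, not from any particular library; in practice a framework's own CV utilities and pipelines serve the same purpose.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle indices once, then yield (train, test) index lists for k folds.

    Shuffling is appropriate for i.i.d. data; for time series, use a
    time-aware split instead (see expanding_window_splits below).
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Round-robin assignment keeps fold sizes within one element of each other.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

def expanding_window_splits(n, n_splits, test_size):
    """Time-aware splits: each test block strictly follows its training data.

    The training window grows ('expands') with each split; a rolling window
    would instead keep the training length fixed.
    """
    for i in range(n_splits):
        test_start = n - (n_splits - i) * test_size
        yield list(range(test_start)), list(range(test_start, test_start + test_size))

def fold_mean_scale(train_vals, test_vals):
    """Center both folds using a statistic fit on the training fold ONLY.

    Computing the mean over all data before splitting would leak test
    information into training -- exactly the scaling leakage the episode warns about.
    """
    mu = sum(train_vals) / len(train_vals)
    return [v - mu for v in train_vals], [v - mu for v in test_vals]
```

A quick usage pattern: inside each `(train, test)` pair from `kfold_indices`, call `fold_mean_scale` (or your real preprocessing) on the training slice first, then apply it to the test slice, never the reverse. Note that `expanding_window_splits` never shuffles, which is the point: shuffling a time series lets the model train on the future it is supposed to predict.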