Brian Richmond's profile

ML Basics: Prediction and Confidence Intervals

Data Scientist Brian Richmond currently works for Aura Health in San Francisco. As leader of the product intelligence team, he creates machine learning (ML) models to improve the company's products. An important part of modeling is quantifying the uncertainty around any estimate or prediction, which is what confidence intervals and prediction intervals do.

Prediction intervals differ from confidence intervals in their purpose. A confidence interval gives the range within which an estimated quantity, such as a population average, is likely to lie. A prediction interval gives the range within which a single future observation is likely to fall, so it must account for both the uncertainty in the estimate and the natural variation of individual values around it. So, for the same data and confidence level, the prediction interval is always wider than the confidence interval.
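To make the difference in width concrete, here is a minimal sketch that computes both intervals for the mean of a small sample. It assumes roughly normal data and uses a normal quantile for simplicity (a t quantile would be more appropriate for small samples). The function name `mean_intervals` and the sample values are hypothetical, chosen only for illustration.

```python
import math
import statistics


def mean_intervals(sample, level=0.95):
    """Return (confidence_interval, prediction_interval) around the sample mean.

    Normal approximation: an assumption made for brevity, not a
    recommendation for small samples.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)

    # Confidence interval: uncertainty about the population mean only.
    ci_half = z * s / math.sqrt(n)
    # Prediction interval: the same uncertainty plus the spread of one new value.
    pi_half = z * s * math.sqrt(1 + 1 / n)

    return (mean - ci_half, mean + ci_half), (mean - pi_half, mean + pi_half)


# Hypothetical measurements, for illustration only.
sample = [9.8, 10.4, 10.1, 9.5, 10.9, 10.2, 9.7, 10.6]
ci, pi = mean_intervals(sample)
print("confidence interval:", ci)
print("prediction interval:", pi)
```

Because the prediction interval multiplies the standard deviation by sqrt(1 + 1/n) rather than 1/sqrt(n), it stays wide even as the sample grows, while the confidence interval shrinks toward the mean.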

Prediction intervals work in ML by putting a range around a forecast of a future value, such as the number of employees in a company a year from now, based on past growth together with data on business trajectory. If a company had 1,000 employees two years ago, 2,000 one year ago, and 4,000 now, an ML model might predict 2x growth each year, or 8,000 employees a year from now. However, the number of employees is unlikely to be exactly 8,000. So, if the 95% prediction interval is 7,000-9,000, then there is a 95% chance that the company’s headcount will fall in that range, assuming the model’s data is robust and the future business environment does not change dramatically.
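The arithmetic behind the headcount example can be sketched as follows. This is a deliberately simple stand-in for an ML model: it applies the average year-over-year growth ratio to the latest value. The function name `forecast_next` is hypothetical, and the 7,000-9,000 interval is taken from the example above rather than computed, since three data points are too few to estimate one.

```python
def forecast_next(headcounts):
    """Forecast the next value by applying the mean year-over-year growth ratio."""
    ratios = [later / earlier for earlier, later in zip(headcounts, headcounts[1:])]
    mean_ratio = sum(ratios) / len(ratios)
    return headcounts[-1] * mean_ratio


# Headcount two years ago, one year ago, and now (from the example).
history = [1000, 2000, 4000]
point_forecast = forecast_next(history)

# Interval quoted in the example; assumed here, not derived from the data.
interval = (7000, 9000)
inside = interval[0] <= point_forecast <= interval[1]

print(point_forecast, inside)  # 8000.0 True
```

The point forecast is a single number; the prediction interval is what communicates how far the actual headcount could plausibly land from it.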

Confidence intervals function differently in that they describe how precisely a sample estimates a fixed property of a population, such as its mean. They are calculated from the sample mean, the sample standard deviation, and the sample size. A 95% confidence interval means that if the sampling were repeated many times, about 95% of the intervals built this way would contain the true population value.
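One way to see what a confidence level means across many samples is a small simulation: draw repeated samples from a population whose mean is known, build an interval from each, and count how often the interval contains the true mean. The sketch below assumes a normal population and a normal-quantile interval; the population parameters, sample size, and function names are hypothetical choices for illustration.

```python
import math
import random
import statistics


def ci_for_mean(sample, level=0.95):
    """Normal-approximation confidence interval for the mean of a sample."""
    n = len(sample)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    half = z * statistics.stdev(sample) / math.sqrt(n)
    m = statistics.mean(sample)
    return m - half, m + half


def coverage(true_mean=10.0, true_sd=2.0, n=50, trials=2000, seed=0):
    """Fraction of simulated samples whose interval contains the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        lo, hi = ci_for_mean(sample)
        hits += lo <= true_mean <= hi
    return hits / trials


print(coverage())  # expected to land close to 0.95
```

Each individual interval either contains the true mean or it does not; the 95% figure describes how often the procedure succeeds over many repetitions.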

For example, if a researcher is trying to estimate the percentage of Americans who love basketball, they would survey a sample of people and calculate the percentage in that sample who love basketball. Different samples would give slightly different percentages, and those percentages would cluster around the true value in a roughly normal distribution. A 95% confidence interval spans about two standard errors on either side of the sample percentage (-2 to +2), where the standard error measures how much the percentage varies from sample to sample. So, the confidence interval tells you about the likely location of the true population parameter.
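For the basketball survey, a confidence interval for a percentage can be sketched with the standard normal-approximation (Wald) formula. The survey numbers below (620 of 1,000 respondents) are invented for illustration, and the function name `proportion_ci` is hypothetical. The Wald formula is a simplifying assumption; it is less reliable for small samples or percentages near 0% or 100%.

```python
import math
import statistics


def proportion_ci(successes, n, level=0.95):
    """Wald (normal-approximation) confidence interval for a proportion."""
    p = successes / n
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    # Standard error: how much the sample proportion varies between samples.
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


# Hypothetical survey: 620 of 1,000 respondents say they love basketball.
lo, hi = proportion_ci(620, 1000)
print(f"{lo:.3f} to {hi:.3f}")
```

A larger survey shrinks the standard error, so the interval around the sample percentage narrows as more people are polled.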