Standard Deviation :
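A measure of how spread out the values are around the mean. For a population of $N$ values $x_1, \dots, x_N$ with mean $\mu$, it is given by
$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$
The sum $\sum_{i=1}^{N} (x_i - \mu)^2$ adds up the squared deviation of each value from the mean.
The square root brings the result back to the same units as the data.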
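As a small illustration of the calculation, here is a minimal sketch that assumes the population form of the standard deviation (dividing by the number of values). The function name `population_std` and the data values are made up for the example; `statistics.pstdev` is Python's standard-library equivalent and is used only as a cross-check.

```python
import math
import statistics


def population_std(values):
    """Square root of the mean squared deviation from the mean."""
    mu = sum(values) / len(values)
    squared_deviations = [(x - mu) ** 2 for x in values]  # the "sum" step
    return math.sqrt(sum(squared_deviations) / len(values))  # the "square root" step


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, not from the text
print(population_std(data))
print(statistics.pstdev(data))  # standard-library cross-check
```

If the data are a sample rather than the whole population, the divisor is usually the number of values minus one, which `statistics.stdev` implements.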
Standard Normal Distribution :
This refers to a specific type of probability distribution.
It is a normal distribution (a type of bell curve) that has a mean of 0 and a standard deviation of 1.
The graph of a standard normal distribution is a bell-shaped curve where :
the x-axis represents the z-scores
the y-axis represents the probability density
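To make the two axes concrete, the sketch below evaluates the probability density at a few z-scores. It assumes the usual density of the standard normal distribution, $e^{-z^2/2} / \sqrt{2\pi}$; the function name `standard_normal_pdf` is chosen only for this example.

```python
import math
from statistics import NormalDist


def standard_normal_pdf(z):
    """Height of the bell curve (probability density) at z-score z."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


# x-axis value (z-score) next to the y-axis value (density)
for z in (-2, -1, 0, 1, 2):
    print(z, round(standard_normal_pdf(z), 4))
```

The density is highest at z = 0 and is the same at z and -z, which is the symmetry of the bell shape.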
Cumulative Distribution Function (CDF) :
The CDF of any probability distribution, including the standard normal distribution,
is a function that gives the probability that a random variable is less than or equal to a certain value.
In the case of the standard normal distribution, the CDF is denoted as $\Phi(z)$.
Confidence Level : The confidence level is the long-run proportion of confidence intervals, each calculated from a new sample, that would contain the true value of the parameter.
Common choices for the confidence level are 90%, 95%, and 99%.
For instance, if you set a confidence level of 95%, you are stating that if the same population were sampled multiple times, approximately 95% of the resulting confidence intervals would contain the true population parameter.
Confidence Interval : The confidence interval is the range of values, derived from the statistical analysis, that is likely to contain the true parameter with a specified level of confidence.
It provides a range estimate for unknown population parameters.
The width of the confidence interval gives us some idea about how uncertain we are about the unknown parameter.
A wider interval might indicate that we need more data to get a precise estimate of the parameter, while a narrow interval suggests our estimate is quite precise.
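The repeated-sampling reading of the confidence level can be checked with a small simulation. This is only a sketch under stated assumptions: the data are drawn from a normal distribution with a known standard deviation, and the true mean, standard deviation, sample size, and number of trials are all hypothetical values picked for the example.

```python
import random
from statistics import NormalDist


def coverage(true_mean=16.0, sigma=4.0, n=100, level=0.95, trials=2000, seed=0):
    """Fraction of simulated confidence intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf((1 + level) / 2)  # two-sided z-score for this level
    margin = z * sigma / n ** 0.5              # half-width of each interval
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        if sample_mean - margin <= true_mean <= sample_mean + margin:
            hits += 1
    return hits / trials


print(coverage())  # should land close to the chosen confidence level
```

Raising `n` shrinks `margin`, which is the sense in which more data gives a narrower interval.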
The CDF of the standard normal distribution is often denoted by $\Phi(z)$.
This returns :
the area under the curve from negative infinity to $z$
the probability that a random variable from a standard normal distribution is less than or equal to a certain value
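The "area under the curve" description can be verified numerically. The sketch below approximates the area with the trapezoid rule and compares it with `statistics.NormalDist().cdf`. Using -10 as a stand-in for negative infinity and 20,000 steps are assumptions made for the example.

```python
import math
from statistics import NormalDist


def pdf(z):
    """Standard normal probability density."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def cdf_by_integration(z, lower=-10.0, steps=20_000):
    """Trapezoid-rule area under the density from `lower` up to z."""
    h = (z - lower) / steps
    total = 0.5 * (pdf(lower) + pdf(z))
    for i in range(1, steps):
        total += pdf(lower + i * h)
    return total * h


z = 1.0
print(cdf_by_integration(z))
print(NormalDist().cdf(z))  # library value for comparison
```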
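Z-Score Example : To find the z-score $z$ such that $P(-z \le Z \le z) = 0.95$
Because the standard normal distribution is symmetric around 0, $\Phi(-z) = 1 - \Phi(z)$, where $\Phi$ is the standard normal CDF :
$$P(-z \le Z \le z) = \Phi(z) - \Phi(-z) = 2\Phi(z) - 1$$
Therefore, the equation becomes :
$$2\Phi(z) - 1 = 0.95$$
Solving for $\Phi(z)$ :
$$\Phi(z) = 0.975$$
So we're looking for the z-score where the cumulative distribution function $\Phi(z)$ equals 0.975.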
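The lookup can be done directly with Python's standard library. This sketch assumes a two-sided 95% level, so that the CDF value being sought is (1 + 0.95) / 2 = 0.975; `NormalDist().inv_cdf` is the standard-library inverse CDF.

```python
from statistics import NormalDist

level = 0.95                 # assumed two-sided confidence level
target = (1 + level) / 2     # CDF value to invert; 0.975 for a 95% level
z = NormalDist().inv_cdf(target)

print(round(z, 3))
# Check: the central area between -z and z should match the level
print(NormalDist().cdf(z) - NormalDist().cdf(-z))
```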
This is the step that involves the inverse cumulative distribution function, or the percent-point function.
This function is generally denoted $\Phi^{-1}(p)$.
The problem is, there's no simple closed-form solution for the inverse of the cumulative distribution function $\Phi$.
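The standard normal cumulative distribution function $\Phi$ can be written in terms of the error function $\operatorname{erf}$ :
$$\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{z}{\sqrt{2}}\right)\right]$$
In the computation of $\Phi^{-1}(p)$, the probability $p$ is rescaled to $2p - 1$.
This is because the standard normal distribution is symmetric around 0,
and the error function, commonly used in statistics, is a similar function,
but with values ranging over $(-1, 1)$ instead of $(0, 1)$.
For a probability $p$ :
$$\Phi^{-1}(p) = \sqrt{2}\,\operatorname{erf}^{-1}(2p - 1)$$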
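The link between the CDF and the error function can be tried out with `math.erf`. The sketch assumes the standard identity $\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}(z / \sqrt{2})\right]$. Because Python's standard library has no inverse error function, the inverse here is found by bisection, which is a stand-in technique chosen for the example rather than what any particular software does.

```python
import math
from statistics import NormalDist


def phi(z):
    """Standard normal CDF written in terms of the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def inverse_phi(p, lo=-10.0, hi=10.0, iterations=100):
    """Numerically invert phi by bisection (valid for 0 < p < 1)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(phi(1.96))
print(inverse_phi(0.975))
```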
Excel and calculator software compute the value of the inverse CDF with a numerical approximation rather than an exact formula.
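Simple Series approximation :
$$\operatorname{erf}^{-1}(x) = \sum_{k=0}^{\infty} \frac{c_k}{2k+1} \left(\frac{\sqrt{\pi}}{2}\, x\right)^{2k+1}$$
where $c_0 = 1$ and
$$c_k = \sum_{m=0}^{k-1} \frac{c_m \, c_{k-1-m}}{(m+1)(2m+1)}$$
This is a power series that is summed over all integer values of $k$ from 0 to infinity.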
This power series provides a mathematical approximation for the inverse error function, which can be used to compute the inverse of the cumulative distribution function of the standard normal distribution.
However, this series converges slowly when its argument is close to $\pm 1$ (that is, for probabilities near 0 or 1), so it is rarely used for numerical computation in practice.
Instead, other faster-converging numerical approximations are used in software and tables.
The implementation details of these faster methods are usually more complex and are beyond the scope of a basic understanding of statistical theory.
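For reference, a truncated version of such a power series can be sketched in a few lines. This assumes the standard Maclaurin series of the inverse error function, with coefficients $c_0 = 1$ and $c_k = \sum_{m=0}^{k-1} c_m c_{k-1-m} / ((m+1)(2m+1))$, together with the relation $\Phi^{-1}(p) = \sqrt{2}\,\operatorname{erf}^{-1}(2p - 1)$. The function names and the default of 60 terms are choices made for the example.

```python
import math
from statistics import NormalDist


def erfinv_coefficients(n_terms):
    """Coefficients c_0 .. c_{n_terms-1} of the inverse error function series."""
    c = [1.0]
    for k in range(1, n_terms):
        c.append(sum(c[m] * c[k - 1 - m] / ((m + 1) * (2 * m + 1))
                     for m in range(k)))
    return c


def erfinv_series(x, n_terms=60):
    """Truncated power series for the inverse error function, for |x| < 1."""
    c = erfinv_coefficients(n_terms)
    t = math.sqrt(math.pi) / 2 * x
    return sum(c[k] / (2 * k + 1) * t ** (2 * k + 1) for k in range(n_terms))


def inverse_cdf_series(p, n_terms=60):
    """Inverse standard normal CDF via the truncated series."""
    return math.sqrt(2) * erfinv_series(2 * p - 1, n_terms)


exact = NormalDist().inv_cdf(0.975)
for n in (5, 20, 60):
    # The gap to the library value shrinks only gradually as terms are added
    print(n, inverse_cdf_series(0.975, n), exact)
```

For a probability such as 0.75 the truncated sum settles quickly, while for 0.975 the series argument is 0.95 and many more terms are needed, which illustrates why faster methods are preferred.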
The confidence level corresponds to a Z-score, which is a constant value for any given confidence level.
Technically, a Z-score is the number of standard deviations a data point is away from the mean.
Here are the Z-scores for the most common confidence levels :
90% - Z score = 1.645
95% - Z score = 1.96
99% - Z score = 2.576
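These values can be reproduced with the standard library. The sketch assumes two-sided intervals, so each confidence level is converted to the CDF value (1 + level) / 2 before inverting; the helper name `z_for_confidence` is made up for the example.

```python
from statistics import NormalDist


def z_for_confidence(level):
    """Two-sided z-score for a confidence level given as a fraction (e.g. 0.95)."""
    return NormalDist().inv_cdf((1 + level) / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} - Z score = {z_for_confidence(level):.3f}")
```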
To calculate an appropriate sample size :
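One common form, assuming the goal is to estimate a mean with a known standard deviation $\sigma$ and a desired margin of error $E$, is $n = (Z \sigma / E)^2$, rounded up to the next whole number. The sketch below uses that form. The formula choice, the function name `sample_size_for_mean`, and the input values are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, level=0.95):
    """Smallest n with z * sigma / sqrt(n) <= margin (assumes known sigma)."""
    z = NormalDist().inv_cdf((1 + level) / 2)
    return math.ceil((z * sigma / margin) ** 2)


# Hypothetical inputs: standard deviation 4.0, margin of error 0.8
print(sample_size_for_mean(sigma=4.0, margin=0.8))
```

Halving the margin of error roughly quadruples the required sample size, because the margin enters the formula squared.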
Example of a 95% confidence level :
if you conducted the same experiment 100 times, you'd expect about 95 of the 100 calculated ranges to contain the true value
with a confidence interval of $\pm 0.8$ :
if the mean is estimated to be 16, the range would be from 15.2 to 16.8,
suggesting that we are 95% confident the true mean falls within this range
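The arithmetic of this example can be written out as follows. The margin of 0.8 is inferred from the stated range of 15.2 to 16.8 around an estimate of 16, and the helper name `interval_from_margin` is made up for the example.

```python
def interval_from_margin(estimate, margin):
    """Lower and upper bounds of an interval written as estimate ± margin."""
    return estimate - margin, estimate + margin


low, high = interval_from_margin(16, 0.8)
print(f"{low:.1f} to {high:.1f}")
```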