Standard Deviation :
$\sigma = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} (x_{i} - \mu)^{2}}$
- $\sigma$ represents the standard deviation.
- $N$ is the number of data points.
- $x_i$ is the $i^{th}$ data point in the dataset.
- $\mu$ is the mean (average) of the dataset.
  - $\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$
- $\sum_{i=1}^{N} (x_i - \mu)^2$ calculates the squared deviation of each data point from the mean, and then sums these squared deviations.
- $\frac{1}{N}$ averages the squared deviations, which gives the variance.
- $\sqrt{\cdot}$ is taken to bring the units back to the original units of the data, since squaring the deviations squared the units.
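
The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `population_std` and the sample data are made up for the example, and the result is checked against the standard library's `statistics.pstdev`.

```python
import math
import statistics

def population_std(data):
    """Population standard deviation: square root of the mean squared deviation."""
    n = len(data)
    mu = sum(data) / n                                  # mean of the dataset
    squared_devs = sum((x - mu) ** 2 for x in data)     # sum of squared deviations
    return math.sqrt(squared_devs / n)                  # divide by N, then take the root

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical example data, mean is 5
sigma = population_std(data)
print(sigma)                      # 2.0
print(statistics.pstdev(data))    # 2.0, same population formula
```
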
Standard Normal Distribution :
- This refers to a specific type of probability distribution.
- It is a normal distribution with a mean of $0$ and a standard deviation of $1$.
- The graph of a standard normal distribution is a bell-shaped curve where :
  - the x-axis represents the z-scores
  - the y-axis represents the probability density
Cumulative Distribution Function (CDF) :
- The CDF of any probability distribution, including the standard normal distribution,
  - is a function that gives the probability that a random variable is less than or equal to a certain value.
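- For the standard normal distribution, the CDF is written $\Phi(z)$, where $z$ is a z-score : $\Phi(z)$ is the probability that the variable is less than or equal to $z$.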
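
As a rough illustration, the standard normal CDF can be evaluated with the error function from Python's `math` module, using the identity $\Phi(z) = \frac{1}{2}\left(1 + \text{erf}(z / \sqrt{2})\right)$. The helper name `phi` is an assumption made for this sketch.

```python
import math

def phi(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(phi(0.0))               # 0.5, half of the area lies left of the mean
print(round(phi(1.96), 4))    # 0.975
print(round(phi(-1.96), 4))   # 0.025
```
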
Confidence Level : The confidence level is the probability that the procedure used to calculate a confidence interval produces an interval containing the true value of the parameter.
- Common choices for the confidence level are 90%, 95%, and 99%.
- For instance, if you set a confidence level of 95%, you are stating that if the same population were sampled multiple times, approximately 95% of the resulting confidence intervals would contain the true population parameter.
Confidence Interval : The confidence interval is the range of values, derived from the statistical analysis, that is likely to contain the true parameter with a specified level of confidence.
- It provides a range estimate for unknown population parameters.
- The width of the confidence interval gives us some idea about how uncertain we are about the unknown parameter.
- A wider interval might indicate that we need more data to get a precise estimate of the parameter, while a narrow interval suggests our estimate is quite precise.
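
The repeated-sampling reading of the confidence level can be checked with a small simulation. This is a sketch under stated assumptions: the data are drawn from a normal population whose standard deviation is known, the interval is the textbook `mean ± z * sigma / sqrt(n)` form, and the function name `coverage` and all parameter values are hypothetical.

```python
import random

def coverage(trials=2000, n=50, mu=10.0, sigma=2.0, z=1.96, seed=0):
    """Fraction of simulated 95% intervals that contain the true mean mu."""
    rng = random.Random(seed)                # fixed seed so the run is repeatable
    half_width = z * sigma / n ** 0.5        # interval half-width for known sigma
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(sample) / n
        if mean - half_width <= mu <= mean + half_width:
            hits += 1
    return hits / trials

rate = coverage()
print(rate)   # expected to land close to 0.95
```
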

Z-Score

The CDF of the standard normal distribution, $\Phi(z)$, is given by :

$\Phi(z) = \frac{1}{\sqrt{2 \pi}} \int_{-\infty}^{z} e^{-\frac{t^{2}}{2}} \, dt$

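This returns :
- the probability that a standard normal variable is less than or equal to $z$
- equivalently, the area under the bell curve to the left of $z$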

For a $95\%$ confidence level, the area between $+z$ and $-z$ must be 0.95, so we look for the $z$ that satisfies :

$\Phi(z) - \Phi(-z) = 0.95$

Because the standard normal distribution is symmetric around 0 :
$\Phi(-z) = 1 - \Phi(z)$
Therefore, the equation becomes :

$2 \Phi(z) - 1 = 0.95$

Solving for $\Phi(z)$ :

$\Phi(z) = 0.975$

We need the value of $z$ at which $\Phi(z)$ equals $0.975$.
This is the step that involves the inverse cumulative distribution function, or the percent-point function.
- The inverse CDF is written $\Phi^{-1}$, so we solve :

$z = \Phi^{-1}(0.975)$
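
In practice this inverse is usually looked up or computed by a library. As one example, Python's standard library provides `statistics.NormalDist`, whose `inv_cdf` method evaluates $\Phi^{-1}$. The sketch below also checks that the area between $-z$ and $+z$ comes back as 0.95.

```python
from statistics import NormalDist

std_normal = NormalDist()            # mean 0, standard deviation 1 by default

z = std_normal.inv_cdf(0.975)        # solve Phi(z) = 0.975
print(round(z, 4))                   # 1.96

area = std_normal.cdf(z) - std_normal.cdf(-z)   # Phi(z) - Phi(-z)
print(round(area, 4))                # 0.95
```
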

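There is no closed-form expression for the inverse of $\Phi(z)$ for the standard normal distribution.
$\Phi(z)$ involves an integral that cannot be expressed in terms of elementary functions.
$\Phi^{-1}(0.975)$ is therefore computed numerically, here through a helper function $I(x)$.
- $I(x)$ is centered at $0$,
  - and the error function, commonly used in statistics, is a similar function,
    - so the probability is shifted from the range $[0, 1]$ to $[-0.5, 0.5]$
- For $p$ = $0.975$, the argument passed to $I(x)$ is $0.975 - 0.5 = 0.475$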
Excel or calculator software computes the value of the inverse CDF with a numerical approximation.
Simple Series approximation :
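$z = \Phi^{-1}(p) = \sqrt{2} \, I(p - 0.5)$
- $I$ is the inverse error function with its argument doubled, $I(x) = \text{erf}^{-1}(2x)$, and is given by :
$I(x) = \sum_{n = 0}^{\infty} \frac{c_{n}}{2 n + 1} \left( \sqrt{\pi} \, x \right)^{2 n + 1}$
- $n$ runs from $0$ to infinity
- the coefficients are $c_0 = 1$ and $c_n = \sum_{m = 0}^{n - 1} \frac{c_m c_{n - 1 - m}}{(m + 1)(2 m + 1)}$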
- This power series provides a mathematical approximation for the inverse error function, which can be used to compute the inverse of the cumulative distribution function of the standard normal distribution.
- However, this series converges very slowly and is not practically used for numerical computation.
- Instead, other faster-converging numerical approximations are used in software and tables.
- The implementation details of these faster methods are usually more complex and are beyond the scope of a basic understanding of statistical theory.
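
To see how slowly a power series converges here, the sketch below sums the standard Maclaurin series of the inverse error function, $\text{erf}^{-1}(y) = \sum_{k} \frac{c_k}{2k+1} \left(\frac{\sqrt{\pi}}{2} y\right)^{2k+1}$ with $c_0 = 1$ and $c_k = \sum_{m=0}^{k-1} \frac{c_m c_{k-1-m}}{(m+1)(2m+1)}$, and uses $\Phi^{-1}(p) = \sqrt{2} \, \text{erf}^{-1}(2p - 1)$. The function names are hypothetical, and `statistics.NormalDist.inv_cdf` serves only as the comparison value.

```python
import math
from statistics import NormalDist

def inv_erf_series(y, terms):
    """Partial sum (first `terms` terms) of the Maclaurin series of erf^-1(y)."""
    c = [1.0]                                        # c_0 = 1
    for k in range(1, terms):
        c.append(sum(c[m] * c[k - 1 - m] / ((m + 1) * (2 * m + 1))
                     for m in range(k)))             # recurrence for c_k
    w = math.sqrt(math.pi) / 2.0 * y
    return sum(c[k] / (2 * k + 1) * w ** (2 * k + 1) for k in range(terms))

def inv_phi_series(p, terms):
    """Truncated-series approximation of the inverse standard normal CDF."""
    return math.sqrt(2.0) * inv_erf_series(2.0 * p - 1.0, terms)

exact = NormalDist().inv_cdf(0.975)
for terms in (1, 5, 20, 80):
    approx = inv_phi_series(0.975, terms)
    print(terms, round(approx, 4), round(exact - approx, 4))   # terms, value, remaining gap
```

Every term of the series is positive for $p > 0.5$, so the partial sums approach the true value from below, and the gap shrinks only gradually as more terms are added.
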

Sample Size

The confidence level corresponds to a Z-score, which is a constant value for any given confidence level.
- Technically, it is the number of standard deviations a data point lies from the mean.
Here are the Z-scores for the most common confidence levels :
- 90% - Z score = 1.645
- 95% - Z score = 1.96
- 99% - Z score = 2.576
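
These constants can be reproduced from the inverse CDF: for a two-sided confidence level $c$, the Z-score solves $\Phi(z) = (1 + c) / 2$. A short sketch using Python's `statistics.NormalDist` follows; the helper name `z_for_confidence` is an assumption for the example.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided critical value: the z with Phi(z) = (1 + level) / 2."""
    return NormalDist().inv_cdf((1.0 + level) / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_for_confidence(level):.3f}")
# 90%: z = 1.645
# 95%: z = 1.960
# 99%: z = 2.576
```
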
To calculate an appropriate sample size :

$\text{Necessary Sample Size} = \frac{(\text{Z-Score})^{2} \cdot \text{StdDev} \cdot (1 - \text{StdDev})}{(\text{Margin of Error})^{2}}$

For example, with a $95\%$ confidence level, a standard deviation of $0.5$, and a margin of error (confidence interval) of $\pm 5\%$ :
- 95% confidence level :
  - if you conducted the same experiment 100 times, you'd expect the true value to fall within your calculated range about 95 times
- Margin of error of $\pm 5\%$ :
  - if the mean is estimated to be 16, the range would be from 15.2 to 16.8,
    - suggesting that we are 95% confident the true mean falls within this range

$\text{Sample Size} = \frac{(1.96)^{2} \cdot 0.5 \cdot (1 - 0.5)}{(0.05)^{2}}$

$\text{Sample Size} = \frac{3.8416 \cdot 0.25}{0.0025}$

$\text{Sample Size} \approx 384.16$, which rounds up to 385.
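
A minimal sketch of the calculation, assuming the standard form of the formula in which the Z-score is squared, $n = Z^{2} \cdot p (1 - p) / e^{2}$. The function name `sample_size` and its parameter names are hypothetical; `p` plays the role of the 0.5 "StdDev" value and `margin` is the $\pm 5\%$ margin of error.

```python
import math

def sample_size(z, p, margin):
    """n = z^2 * p * (1 - p) / margin^2, rounded up to a whole observation."""
    return math.ceil(z ** 2 * p * (1.0 - p) / margin ** 2)

print(sample_size(1.96, 0.5, 0.05))    # 385  (95% confidence, +/- 5%)
print(sample_size(2.576, 0.5, 0.05))   # 664  (99% confidence, +/- 5%)
```

Rounding up rather than to the nearest integer keeps the margin of error at or below the target.
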