Agenda
• Sampling distribution
• Central Limit Theorem
• Confidence intervals
• Hypothesis formulation
• Null and alternative hypothesis
• Type I and Type II errors
• Hypothesis testing
– One-tailed vs. two-tailed test
– Test of mean
– Test of proportion
– Test of variance
• Examples
Concepts of sampling distribution
• Why do we need sampling?
• We analyse the sample and make inferences about the population.
• Sample statistic vs. population parameter
• Sampling distribution – the distribution of a particular sample statistic over all possible samples that can be drawn from a population, e.g. the sampling distribution of the mean.
Sampling Distribution: CLT
• If samples of size n are drawn from a population that has a mean µ and standard deviation σ:
• The sampling distribution of the sample mean follows a normal distribution with:
• Mean: µ
• Standard deviation: σ / √n (also called the Standard Error)
• The corresponding z-score transformation is:
z = (x̄ − µ) / (σ / √n)
• If the population is normal, this holds true even for smaller sample sizes.
• However, if the population is not normal, this holds true only for sufficiently large sample sizes.
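As a quick check of these formulas, the following sketch draws repeated samples from a deliberately non-normal (skewed) population and compares the mean and spread of the sample means with µ and σ / √n. The population, sample size, and seed are all illustrative choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# A skewed (non-normal) population with known mean and standard deviation
population = rng.exponential(scale=2.0, size=100_000)
mu, sigma = population.mean(), population.std()

# Draw many samples of size n and record each sample mean
n = 40
sample_means = np.array([
    rng.choice(population, size=n).mean() for _ in range(5_000)
])

print(f"population mean {mu:.3f} vs mean of sample means {sample_means.mean():.3f}")
print(f"standard error sigma/sqrt(n) {sigma / np.sqrt(n):.3f} "
      f"vs std of sample means {sample_means.std():.3f}")
```

Despite the skewed population, the two printed pairs should agree closely, matching the CLT's claim.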
Central Limit Theorem
• "The sampling distribution of the mean of any independent random variable will be normal."
• This applies to both discrete and continuous distributions.
• The random variable should have a well-defined mean and variance (standard deviation).
• It is applicable even when the original variable is not normally distributed.
• Assumptions:
• The data must be randomly sampled.
• The sample values must be independent of each other.
• The 10% condition: when the sample is drawn without replacement, the sample size n should be no more than 10% of the population.
• The sample size must be sufficiently large.
– In general, a sample size of 30 is considered sufficient.
– If the population is skewed, a fairly large sample size is needed.
– For a symmetric population, even small samples are acceptable.
Central Limit Theorem (contd.)
Assume a die is rolled in sets of 4 trials and the face values are recorded.
This is repeated for a month (30 days).
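A minimal simulation of this setup, assuming a fair six-sided die (the seed is arbitrary): each day's mean of 4 rolls is one draw from the sampling distribution of the mean, whose theoretical centre is 3.5 with standard error √(35/12) / √4 ≈ 0.854.

```python
import random
import statistics

random.seed(42)

# One month: each day, roll a fair die 4 times and record the mean face value
daily_means = [
    statistics.mean(random.randint(1, 6) for _ in range(4))
    for _ in range(30)
]

# Theory: mean of the sampling distribution = 3.5,
# standard error = sigma / sqrt(n) = sqrt(35/12) / sqrt(4) ~ 0.854
print(statistics.mean(daily_means), statistics.stdev(daily_means))
```

With only 30 daily means the histogram is rough, but repeating over many "months" it visibly approaches a normal curve centred at 3.5.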
Null and Alternative Hypothesis
All statistical conclusions are made in reference to the null hypothesis.
We either reject the null hypothesis or fail to reject it; we do not "accept" the null hypothesis.
From the start, we assume the null hypothesis to be true; later, the assumption is either rejected or we fail to reject it.
• When we reject the null hypothesis, we can conclude that the alternative hypothesis is supported.
• If we fail to reject the null hypothesis, it does not mean that we have proven the null hypothesis is true.
– Failure to reject the null hypothesis does not equate to proving that it is true.
– It merely upholds our assumption, i.e. the status quo.
Types of hypothesis tests
• Single sample, or two or more samples
• One-tailed or two-tailed
• Tests of mean, proportion, or variance
Example problem – Single sample z-test of mean
• You are the manager of a fast-food restaurant. You want to determine whether the population mean waiting time has changed from its past value of 4.5 minutes. You can assume that the population standard deviation is 1.2 minutes. You select a sample of 25 orders in an hour; the sample mean is 5.1 minutes. Use the relevant hypothesis test to determine whether the population mean has changed from the past value of 4.5 minutes.
Steps to solve the problem:
• One-tailed or two-tailed?
• What are H0 and Ha?
• Determine the critical value Z and the test statistic Zstat
• Draw the normal curve
• Reject or fail to reject H0?
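Following those steps for the restaurant problem (a two-tailed test with H0: µ = 4.5 and Ha: µ ≠ 4.5), a standard-library-only sketch of the calculation:

```python
from math import sqrt, erf

mu0, sigma, n, xbar, alpha = 4.5, 1.2, 25, 5.1, 0.05

# Test statistic: z = (xbar - mu0) / (sigma / sqrt(n)) = 0.6 / 0.24 = 2.5
z_stat = (xbar - mu0) / (sigma / sqrt(n))

# Standard normal CDF via the error function (no scipy needed)
phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))

# Two-tailed p-value
p_value = 2 * (1 - phi(abs(z_stat)))

print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
if p_value < alpha:
    print("Reject H0: the mean waiting time has changed from 4.5 minutes")
```

Here z = 2.5 exceeds the two-tailed critical value of 1.96 (p ≈ 0.0124 < 0.05), so we reject H0.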
Hypothesis Tests using Python
z-test
statsmodels.stats.weightstats.ztest(x1, x2=None, value=0, alternative='two-sided')
Link to refer -
https://www.statsmodels.org/stable/generated/statsmodels.stats.weightstats.ztest.html
t-test
scipy.stats.ttest_ind(a, b, axis=0, equal_var=True, nan_policy='propagate')
Link to refer -
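A sketch of `scipy.stats.ttest_ind` using made-up waiting times for two hypothetical restaurant branches (the data, means, and seed are illustrative, not from the slides):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical waiting times (minutes) at two branches of a restaurant
branch_a = rng.normal(loc=4.5, scale=1.2, size=30)
branch_b = rng.normal(loc=5.3, scale=1.2, size=30)

# Independent two-sample t-test assuming equal variances
t_stat, p_value = stats.ttest_ind(branch_a, branch_b, equal_var=True)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

Set `equal_var=False` to run Welch's t-test when the equal-variance assumption is doubtful.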
Chi-square (χ²) test
scipy.stats.chisquare(f_obs, f_exp=None)
Link to refer -
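For instance, a goodness-of-fit test of whether a die is fair, with made-up counts from 60 rolls:

```python
from scipy import stats

# Observed face counts from 60 rolls of a die (illustrative data)
observed = [8, 9, 13, 7, 12, 11]
expected = [10] * 6            # a fair die: 60 / 6 = 10 per face

# chi2 = sum((obs - exp)^2 / exp) = 2.8 here, with 5 degrees of freedom
chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

The large p-value (well above 0.05) means we fail to reject the hypothesis that the die is fair.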
F-test
alpha = 0.05  # or whatever you want your alpha to be
p_value = 1 - scipy.stats.f.cdf(F, df1, df2)  # one-tailed p-value, with F = larger sample variance / smaller sample variance
if p_value < alpha:  # reject the null hypothesis that Var(X) == Var(Y)
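Putting that snippet together into a runnable one-tailed sketch (the samples and seed are made up; the F statistic is formed as the larger sample variance over the smaller one):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(0, 1.0, size=40)   # standard deviation 1
y = rng.normal(0, 2.0, size=40)   # standard deviation 2

# F statistic: ratio of sample variances, larger over smaller
var_x, var_y = np.var(x, ddof=1), np.var(y, ddof=1)
F = max(var_x, var_y) / min(var_x, var_y)
df1 = df2 = len(x) - 1

alpha = 0.05
p_value = 1 - stats.f.cdf(F, df1, df2)   # one-tailed p-value
if p_value < alpha:
    print("Reject the null hypothesis that Var(X) == Var(Y)")
```

Because the two populations really do have different variances here, the test should reject H0.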
Hypothesis Testing Using Python
Two Sample Testing
Some important functions:
1. t_statistic, p_value = ttest_ind(group1, group2)
2. u, p_value = mannwhitneyu(group1, group2)
3. t_statistic, p_value = ttest_1samp(post - pre, 0)
4. z_statistic, p_value = wilcoxon(post - pre)
5. levene(pre, post)
6. shapiro(post)
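An illustrative paired pre/post example exercising functions 3 to 6 above (the scores are made up; `pre` and `post` are measurements on the same ten subjects):

```python
import numpy as np
from scipy.stats import ttest_1samp, wilcoxon, shapiro, levene

# Hypothetical pre/post scores for the same 10 subjects (paired data)
pre = np.array([72, 68, 75, 70, 64, 69, 77, 71, 66, 73])
post = np.array([75, 70, 78, 74, 66, 72, 80, 73, 69, 76])

# Paired t-test: is the mean of the differences zero?
t_stat, p_t = ttest_1samp(post - pre, 0)

# Non-parametric alternative on the same differences
w_stat, p_w = wilcoxon(post - pre)
print(f"paired t: p = {p_t:.4f}, wilcoxon: p = {p_w:.4f}")

# Assumption checks: normality of differences, equality of variances
print(shapiro(post - pre))
print(levene(pre, post))
```

Every subject improved here, so both tests report a very small p-value and we reject the hypothesis of no change.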
ANOVA – One-Way Classification
• The samples drawn from different populations are independent and random.
• The response variables of all the populations are normally distributed.
• The variances of all the populations are equal.
Hypothesis of One-Way ANOVA
• H0: µ1 = µ2 = µ3 = µ4 = … = µk
– All population means are equal
• H1: Not all of the population means are equal
– For at least one pair, the population means are unequal.
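One common implementation of this test is `scipy.stats.f_oneway`; a sketch with three made-up independent groups, one of which has a clearly higher mean:

```python
from scipy.stats import f_oneway

# Hypothetical response measurements from three independent groups
group1 = [23, 25, 21, 27, 24]
group2 = [30, 31, 28, 33, 29]   # noticeably higher mean
group3 = [22, 24, 26, 23, 25]

f_stat, p_value = f_oneway(group1, group2, group3)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: not all group means are equal")
```

ANOVA only says that at least one pair of means differs; a post-hoc test (e.g. Tukey's HSD) is needed to identify which pair.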
Solutions and examples in the next part.