Tuesday, April 28, 2020

Hypothesis Testing


Agenda




          Sampling distribution

          Central Limit Theorem

          Confidence intervals

          Hypothesis Formulation

          Null and Alternative Hypothesis

          Type I and Type II Errors

          Hypothesis Testing

  One-tailed vs. two-tailed test

  Test of mean

  Test of proportion

  Test of variance

          Examples



Concepts of sampling distribution




       Why do we need sampling?

       Analyse the sample and make inferences about the population

       Sample statistic vs population parameter

       Sampling distribution: the distribution of a particular sample statistic over all possible samples of a given size that can be drawn from a population, e.g. the sampling distribution of the mean




 Sampling Distribution: CLT




          If samples of size n are drawn from a population that has a mean µ and standard deviation σ:

          The sampling distribution of the sample mean (x̄) is approximately normal, with:

          Mean: µ

          Standard Deviation: σ / √n (also called the Standard Error)

          The corresponding z-score transformation is:

            z = (x̄ - µ) / (σ / √n)

          If the population is normal, this holds true even for small sample sizes.

          However, if the population is not normal, this holds true only for sufficiently large sample sizes.
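
          As a small illustration of the standard error and z-score formulas above, the sketch below computes both for a sample mean. The numbers (µ = 100, σ = 15, n = 36, sample mean 105) and the function names are made up for illustration and do not come from this post.

```python
import math


def standard_error(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def z_score(sample_mean, mu, sigma, n):
    """z = (x_bar - mu) / (sigma / sqrt(n))."""
    return (sample_mean - mu) / standard_error(sigma, n)


# Hypothetical values, chosen only to keep the arithmetic simple.
se = standard_error(sigma=15, n=36)                    # 15 / 6 = 2.5
z = z_score(sample_mean=105, mu=100, sigma=15, n=36)   # 5 / 2.5 = 2.0
print("standard error:", se)
print("z-score:", z)
```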



 Central Limit Theorem




         “For a sufficiently large sample size, the sampling distribution of the mean of independent random observations will be approximately normal”

         This applies to both discrete and continuous distributions.

         The random variable should have a well defined mean and variance (standard deviation).

         Applicable even when the original variable is not normally distributed.



         Assumptions:

         The data must be randomly sampled.

         The sample values must be independent of each other.

         The 10% condition: When the sample is drawn without replacement, the sample size n should be no more than 10% of the population.

         The sample size must be sufficiently large.

–  In general, a sample size of 30 is considered sufficient.

–  If the population is skewed, a fairly large sample size is needed.

–  For a symmetric population, even small samples are acceptable.

Central Limit Theorem (contd.)

 Assume a die is rolled in sets of 4 trials and the faces are 

This is repeated for a month (30 days)
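
The die experiment can be simulated in a few lines. This is only a sketch: the sentence above is cut off, so the code assumes that the statistic recorded for each set of 4 rolls is the mean of the faces. The seed and variable names are arbitrary choices for illustration.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

ROLLS_PER_SET = 4   # one set of 4 rolls (from the text)
DAYS = 30           # repeated for 30 days (from the text)

# Assumption: each day we record the mean of the 4 faces rolled.
daily_means = []
for _ in range(DAYS):
    rolls = [random.randint(1, 6) for _ in range(ROLLS_PER_SET)]
    daily_means.append(statistics.mean(rolls))

# Population values for a fair six-sided die.
faces = [1, 2, 3, 4, 5, 6]
population_mean = statistics.mean(faces)     # 3.5
population_sd = statistics.pstdev(faces)     # sqrt(35/12)
theoretical_se = population_sd / math.sqrt(ROLLS_PER_SET)

print("mean of the daily means:", statistics.mean(daily_means))
print("population mean:", population_mean)
print("std dev of the daily means:", statistics.stdev(daily_means))
print("theoretical standard error:", theoretical_se)
```

With only 30 sets the simulated values will not match the theoretical ones exactly; they should get closer as the number of sets grows.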








 Null and Alternative Hypothesis



All statistical conclusions are made in reference to the null hypothesis.

We either reject the null hypothesis or fail to reject the null hypothesis; we do not accept the null hypothesis.

From the start, we assume the null hypothesis to be true; later, we either reject this assumption or fail to reject it.

        When we reject the null hypothesis, we can conclude that the alternative hypothesis is supported.

        If we fail to reject the null hypothesis, it does not mean that we have proven the null hypothesis is true.

–  It only means the sample did not provide enough evidence against it.

–  Our initial assumption, the status quo, simply stands.










Types of hypothesis tests



        Single sample or two or more samples

        One tailed or two tailed

        Tests of mean, proportion or variance















Example problem - Single-sample z-test of mean




          You are the manager of a fast food restaurant. You want to determine whether the population mean waiting time has changed from its past value of 4.5 minutes. Assume that the population standard deviation is 1.2 minutes. You select a sample of 25 orders placed within an hour; the sample mean is 5.1 minutes. Use the relevant hypothesis test to determine whether the population mean has changed from the past value of 4.5 minutes.








Steps to solve the problem…



          One-tailed or two-tailed?

          What are H0 and Ha?

          Determine Z (critical value) and Zstat

          Draw the normal curve

          Reject/Fail to reject H0?
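
          The steps above can be sketched in Python for the waiting-time example, using only the standard library. The problem statement does not give a significance level, so α = 0.05 is an assumption here. Drawing the curve is skipped.

```python
import math
from statistics import NormalDist

mu_0 = 4.5     # past (hypothesised) population mean, minutes
sigma = 1.2    # known population standard deviation, minutes
n = 25         # sample size
x_bar = 5.1    # sample mean, minutes
alpha = 0.05   # ASSUMPTION: not stated in the problem

# Steps 1 and 2: "has changed" means a two-tailed test.
# H0: mu = 4.5    Ha: mu != 4.5

# Step 3: test statistic and critical value.
z_stat = (x_bar - mu_0) / (sigma / math.sqrt(n))   # about 2.5
z_crit = NormalDist().inv_cdf(1 - alpha / 2)       # about 1.96

# Two-tailed p-value, as an alternative to comparing with z_crit.
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

# Step 5: decision.
reject_h0 = abs(z_stat) > z_crit

print("z_stat:", round(z_stat, 3))
print("z_crit:", round(z_crit, 3))
print("p_value:", round(p_value, 4))
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

          Under the assumed α = 0.05, the statistic of about 2.5 lies beyond the critical value of about 1.96, so H0 would be rejected.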





















Hypothesis Tests using Python


z-test

statsmodels.stats.weightstats.ztest(x1, x2=None, value=0, alternative='two-sided')

Link to refer -
https://www.statsmodels.org/stable/generated/statsmodels.stats.weightstats.ztest.html
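
A minimal usage sketch for a one-sample z-test follows. The waiting times are simulated, not real data, and the variable names are hypothetical. Note that this function estimates the standard deviation from the sample rather than taking a known σ as input.

```python
import numpy as np
from statsmodels.stats.weightstats import ztest

rng = np.random.default_rng(0)

# Hypothetical waiting times in minutes (simulated for illustration only).
waiting_times = rng.normal(loc=5.0, scale=1.2, size=40)

# H0: population mean == 4.5, tested against a two-sided alternative.
z_statistic, p_value = ztest(waiting_times, value=4.5, alternative='two-sided')

print("z statistic:", z_statistic)
print("p-value:", p_value)
```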

t-test



scipy.stats.ttest_ind(a, b, axis=0, equal_var=True, nan_policy='propagate')

Link to refer -

Chi-square (χ²) test



scipy.stats.chisquare(f_obs, f_exp=None)

Link to refer -
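
A short goodness-of-fit sketch with scipy.stats.chisquare. The observed counts are invented for illustration (60 hypothetical rolls of a die). When f_exp is omitted, all categories are taken to be equally likely.

```python
from scipy.stats import chisquare

# Hypothetical counts of each face in 60 rolls of a die.
observed = [8, 12, 9, 11, 10, 10]

# f_exp is omitted, so the expected count is 10 for every face.
# H0: the die is fair (all faces equally likely).
chi2_statistic, p_value = chisquare(observed)

print("chi-square statistic:", chi2_statistic)
print("p-value:", p_value)
```

A large p-value here means the counts are consistent with a fair die; it does not prove the die is fair.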
F-test


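import scipy.stats
alpha = 0.05  # Or whatever you want your alpha to be.
# F = Var(X) / Var(Y); df1 and df2 are the numerator and denominator degrees of freedom.
p_value = scipy.stats.f.sf(F, df1, df2)  # Right-tail probability, i.e. 1 - cdf
if p_value < alpha:
    print("Reject the null hypothesis that Var(X) == Var(Y)")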
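
A self-contained sketch of an F-test for equal variances is given below. The two samples are invented for illustration. The larger sample variance is placed in the numerator and the right-tail probability is doubled, which is one common way to get a two-sided p-value.

```python
import statistics
from scipy.stats import f

alpha = 0.05  # assumed significance level

# Hypothetical samples (illustration only).
x = [21.5, 24.1, 19.8, 23.3, 22.0, 25.4, 20.7, 22.9]
y = [22.1, 22.4, 21.9, 22.6, 22.0, 22.3, 22.2]

var_x = statistics.variance(x)  # sample variance (n - 1 in the denominator)
var_y = statistics.variance(y)

# Put the larger variance on top so that F >= 1.
if var_x >= var_y:
    F, df1, df2 = var_x / var_y, len(x) - 1, len(y) - 1
else:
    F, df1, df2 = var_y / var_x, len(y) - 1, len(x) - 1

# H0: Var(X) == Var(Y), Ha: Var(X) != Var(Y).
p_value = min(1.0, 2 * f.sf(F, df1, df2))

if p_value < alpha:
    print("Reject H0: the variances differ")
else:
    print("Fail to reject H0")
```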







Hypothesis Testing Using Python




Two Sample Testing

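Some important functions (all from scipy.stats):





1. t_statistic, p_value = ttest_ind(group1, group2)   # independent two-sample t-test

2. u, p_value = mannwhitneyu(group1, group2)   # Mann-Whitney U test (non-parametric)

3. t_statistic, p_value = ttest_1samp(post - pre, 0)   # paired t-test on the differences

4. w_statistic, p_value = wilcoxon(post - pre)   # Wilcoxon signed-rank test (non-parametric)


5. statistic, p_value = levene(pre, post)   # Levene's test for equal variances
6. statistic, p_value = shapiro(post)   # Shapiro-Wilk test for normality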
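
The functions listed above can be tried together on a small data set. The sketch below uses simulated pre and post measurements, which are assumptions made purely for illustration. The arrays are NumPy arrays so that post - pre gives element-wise differences.

```python
import numpy as np
from scipy.stats import (levene, mannwhitneyu, shapiro, ttest_1samp,
                         ttest_ind, wilcoxon)

rng = np.random.default_rng(0)

# Hypothetical paired measurements on the same 20 subjects.
pre = rng.normal(loc=70, scale=5, size=20)
post = pre + rng.normal(loc=2, scale=3, size=20)
diff = post - pre

# Checks on assumptions.
_, p_normal = shapiro(post)        # H0: the data are normally distributed
_, p_equal_var = levene(pre, post) # H0: the groups have equal variances

# Treating the groups as independent samples.
t_ind, p_ind = ttest_ind(pre, post)
u_stat, p_mwu = mannwhitneyu(pre, post)

# Treating the data as paired (tests on the differences).
t_paired, p_paired = ttest_1samp(diff, 0)
w_stat, p_wilcoxon = wilcoxon(diff)

for name, p in [("shapiro", p_normal), ("levene", p_equal_var),
                ("ttest_ind", p_ind), ("mannwhitneyu", p_mwu),
                ("ttest_1samp", p_paired), ("wilcoxon", p_wilcoxon)]:
    print(name, "p-value:", p)
```

The paired tests are the appropriate choice when pre and post come from the same subjects; the independent-sample tests are shown only to illustrate the calls.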















ANOVA - One-Way Classification




                Assumptions:

                The samples drawn from different populations are independent and random.




                The response variables of all the populations are normally distributed.




                The variances of all the populations are equal.



























Hypotheses of One-Way ANOVA



            H0 : µ1 = µ2 = µ3 = µ4 = …= µk



–  All population means are equal




                  H1 : Not all of the population means are equal




–  For at least one pair, the population means are unequal.
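
These hypotheses can be tested with scipy.stats.f_oneway. The sketch below uses three invented groups; the data, group names and α = 0.05 are assumptions for illustration only.

```python
from scipy.stats import f_oneway

alpha = 0.05  # assumed significance level

# Hypothetical measurements from three independent groups.
group_a = [23.1, 25.4, 24.8, 22.9, 26.0, 24.3]
group_b = [27.5, 28.1, 26.9, 29.3, 27.8, 28.6]
group_c = [24.0, 23.5, 25.1, 24.7, 23.9, 24.4]

# H0: all group means are equal. H1: not all group means are equal.
f_statistic, p_value = f_oneway(group_a, group_b, group_c)

print("F statistic:", f_statistic)
print("p-value:", p_value)
if p_value < alpha:
    print("Reject H0: not all population means are equal")
else:
    print("Fail to reject H0")
```

Rejecting H0 only says that not all means are equal; it does not say which groups differ.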











Solutions and examples follow in the next part.


