Thursday, April 23, 2020

Statistical Learning - Probability and Distributions





Probability – Meaning & Concepts



Probability refers to the chance or likelihood of a particular event taking place.


An event is an outcome of an experiment.


An experiment is a process that is performed to understand and observe possible outcomes.


The set of all possible outcomes of an experiment is called the sample space.





Example

       In a manufacturing unit three parts from the assembly are selected. You are observing whether they are defective or non-defective. Determine

a)            The sample space.

b)            The event of getting at least two defective parts.
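
The two parts of this example can be worked out by enumeration. The following is a minimal sketch, assuming each part is labelled "D" (defective) or "N" (non-defective); these labels and the variable names are chosen here for illustration and are not from the original.

```python
from itertools import product

# "D" = defective, "N" = non-defective (labels assumed for illustration).
outcomes = ["D", "N"]

# a) Sample space: every ordered triple of outcomes for the three selected parts.
sample_space = ["".join(parts) for parts in product(outcomes, repeat=3)]

# b) Event: at least two of the three parts are defective.
at_least_two_defective = [s for s in sample_space if s.count("D") >= 2]

print("Sample space:", sample_space)
print("At least two defective:", at_least_two_defective)
```

The sample space has 2 x 2 x 2 = 8 outcomes, and the event in (b) is the subset of those outcomes that contain two or three D's.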



Definition of Probability















Marginal Probability


      A contingency table lists two attributes, one along the rows and one along the columns, each at different levels, with a frequency (count) in each cell. It is a matrix of frequencies assigned to rows and columns.

      The term marginal indicates that the probabilities are calculated from the row and column totals in the margins of a contingency table (also called a joint probability table when the counts are expressed as proportions of the grand total).





Solution


a)          What is the probability that a randomly selected family is a buyer of the car?

      80/200 = 0.40.



b)          What is the probability that a randomly selected family is both a buyer of the car and in the income group of Rs. 10 lakhs and above?

      42/200 = 0.21.



c)         A family selected at random is found to belong to the income group of Rs. 10 lakhs and above. What is the probability that this family is a buyer of the car?

      42/80 = 0.525. Note that this is a conditional probability: the probability of being a buyer given that the income is Rs. 10 lakhs and above.
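
The three answers can be checked with a few lines of arithmetic. This is only a sketch: the counts are those quoted in the solution, and the figure of 80 families in the income group of Rs. 10 lakhs and above is an assumption read off the denominator in part (c), since the contingency table itself is not reproduced here.

```python
# Counts quoted in the solution; the full contingency table is not shown.
total_families = 200
car_buyers = 80                # numerator in part (a)
buyers_and_high_income = 42    # numerator in part (b)
high_income_families = 80      # ASSUMPTION: taken from the denominator in part (c)

# a) Marginal probability of being a buyer.
p_buyer = car_buyers / total_families

# b) Joint probability of being a buyer AND in the high-income group.
p_buyer_and_high = buyers_and_high_income / total_families

# c) Conditional probability: P(buyer | high income) = P(buyer and high) / P(high).
p_high = high_income_families / total_families
p_buyer_given_high = p_buyer_and_high / p_high

print(round(p_buyer, 3), round(p_buyer_and_high, 3), round(p_buyer_given_high, 3))
```

Dividing the joint probability by the marginal probability of the conditioning event gives the same result as dividing the counts directly (42/80).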



Bayes’ Theorem




      Bayes’ Theorem is used to revise previously calculated probabilities based on new information.

      It was developed by Thomas Bayes in the 18th century.

      It is an extension of conditional probability.
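
For reference, the theorem is usually stated as follows for two events A and B with P(B) > 0. The symbols A and B are generic names used here, not notation taken from the original.

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)},
\qquad
P(B) = P(B \mid A)\, P(A) + P(B \mid A^{c})\, P(A^{c})
```

Here P(A) is the previously calculated (prior) probability, and P(A | B) is the revised (posterior) probability once B has been observed.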






Many modern machine learning techniques rely on Bayes' theorem. For instance, spam filters use Bayesian updating to determine whether an email is real or spam, given the words in the email. Additionally, many specific techniques in statistics, such as calculating p-values or interpreting medical results, are best described in terms of how they contribute to updating hypotheses using Bayes' theorem.
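
The spam-filter idea can be sketched with a single word as evidence. All the numbers below are hypothetical and chosen only to illustrate the update; the function name `bayes_posterior` is likewise made up for this example.

```python
def bayes_posterior(prior, likelihood, likelihood_if_not):
    """Return P(H | E) from P(H), P(E | H) and P(E | not H) via Bayes' theorem."""
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / evidence

# HYPOTHETICAL inputs, for illustration only.
p_spam = 0.20              # prior: share of all emails that are spam
p_word_given_spam = 0.50   # the word appears in half of spam emails
p_word_given_real = 0.05   # the word appears in 5% of real emails

p_spam_given_word = bayes_posterior(p_spam, p_word_given_spam, p_word_given_real)
print(f"P(spam | word) = {p_spam_given_word:.3f}")
```

With these assumed inputs, seeing the word raises the probability of spam from the prior of 0.20 to 0.10 / 0.14, roughly 0.71.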







What is a Probability Distribution



      In precise terms, a probability distribution is a complete listing of the values a random variable can take, along with the corresponding probability of each value. A real-life example is the pattern of machine breakdowns in a manufacturing unit.

      The random variable in this example is the number of machine breakdowns, which can take various values.

      The probability corresponding to each number of breakdowns is its relative frequency of occurrence.

      The probability distribution for this example is constructed from the actual breakdown pattern observed over a period of time. Statisticians call this the “observed distribution” of breakdowns.
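
An observed distribution of this kind can be built by counting how often each value occurs and dividing by the number of observations. The sketch below uses made-up daily breakdown counts; the data and variable names are hypothetical and not from the original.

```python
from collections import Counter

# HYPOTHETICAL data: machine breakdowns recorded on each of 20 days.
daily_breakdowns = [0, 1, 0, 2, 1, 0, 3, 1, 0, 1,
                    2, 0, 1, 0, 0, 1, 2, 1, 0, 1]

counts = Counter(daily_breakdowns)
n_days = len(daily_breakdowns)

# Relative frequency of each value = observed probability of that value.
observed_distribution = {value: counts[value] / n_days for value in sorted(counts)}

for value, prob in observed_distribution.items():
    print(f"{value} breakdowns: {prob:.2f}")
```

The relative frequencies sum to 1, as the probabilities in any probability distribution must.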




Binomial Distribution




      The binomial distribution is a widely used probability distribution of a discrete random variable.

      It plays a major role in quality control and quality assurance. Manufacturing units use the binomial distribution for analysing defectives.

      Reducing the number of defectives using the proportion defective control chart (p chart) is an accepted practice in manufacturing organizations.

      The binomial distribution is also used in service organizations such as banks and insurance corporations to estimate the proportion of customers who are satisfied with the service quality.


Conditions for Applying Binomial Distribution
(Bernoulli Process)


      Trials are independent and random.

      There is a fixed number of trials (n trials).

      Each trial has only two outcomes, designated as success or failure.

      The probability of success is constant throughout the n trials.





Example for Binomial Distribution


A bank issues credit cards to customers under the scheme of Master Card. Based on the past data, the bank has found out that 60% of all accounts pay on time following the bill. If a sample of 7 accounts is selected at random from the current database, construct the Binomial Probability Distribution of accounts paying on time.
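
One way to construct this distribution is to evaluate the binomial formula P(X = k) = C(n, k) p^k (1 - p)^(n - k) for k = 0 to 7, with n = 7 and p = 0.60 from the example. The following is a minimal sketch, not necessarily the method used in the original solution; it assumes Python 3.8 or later for `math.comb`, and the function name `binomial_pmf` is chosen here for illustration.

```python
from math import comb

n = 7      # number of accounts sampled
p = 0.60   # probability that an account pays on time

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Full distribution: number of accounts paying on time -> probability.
distribution = {k: binomial_pmf(k, n, p) for k in range(n + 1)}

for k, prob in distribution.items():
    print(f"{k} accounts pay on time: {prob:.4f}")
```

For example, the probability that exactly 4 of the 7 accounts pay on time is 35 x 0.6^4 x 0.4^3, about 0.2903, and the eight probabilities sum to 1.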




























