Probability Distributions every Data Scientist should know (1/3)

A probability distribution maps each value a random variable can take to its probability of occurrence.
It describes the probability mass of a discrete variable or the probability density of a continuous variable. Hence, distributions are classified as discrete probability distributions and continuous probability distributions.
In this post, we are going to discuss two discrete distributions: the Bernoulli distribution and the binomial distribution.

Bernoulli Distribution


A Bernoulli distribution describes a single trial with only two possible outcomes, namely 1 (success) and 0 (failure). A random variable X that follows a Bernoulli distribution takes the value 1 with the probability of success, say p, and the value 0 with the probability of failure, say q = 1-p.


A Bernoulli experiment has the following properties:

  1. It consists of a single trial.
  2. There are only two outcomes, 1 or 0, i.e., success or failure. Success and failure need not be equally likely.
  3. If the probability of success is p, then the probability of failure is 1-p.

Probability Mass Function

The probability mass function is given by:

P(X = x) = p^x (1-p)^(1-x), for x in {0, 1}

That is, P(X = 1) = p and P(X = 0) = 1-p.

Examples of Bernoulli trials:

  1. Whether a single coin toss lands heads up.
  2. Whether a single roll of two dice results in a double six.
  3. Whether a newborn is a boy or a girl.
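
The Bernoulli examples above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper names `bernoulli_pmf` and `bernoulli_sample` are made up for this post, and the coin and dice are assumed to be fair.

```python
import random
from fractions import Fraction


def bernoulli_pmf(x, p):
    """Return P(X = x) for a Bernoulli variable with success probability p."""
    if x not in (0, 1):
        return 0  # a Bernoulli variable only takes the values 0 and 1
    return p if x == 1 else 1 - p


def bernoulli_sample(p, rng):
    """Draw one trial: 1 (success) with probability p, otherwise 0 (failure)."""
    return 1 if rng.random() < p else 0


# Coin toss: success = heads (assumes a fair coin, so p = 1/2).
p_head = Fraction(1, 2)

# Two dice: success = double six (assumes fair dice, so p = 1/6 * 1/6 = 1/36).
p_double_six = Fraction(1, 6) * Fraction(1, 6)

print(bernoulli_pmf(1, p_head))        # probability of heads
print(bernoulli_pmf(0, p_double_six))  # probability of NOT rolling a double six

rng = random.Random(0)  # arbitrary fixed seed, only for repeatability
print(bernoulli_sample(float(p_double_six), rng))  # one simulated roll: 1 or 0
```

Exact fractions are used here so that the two probabilities visibly add up to 1; plain floats would work just as well for the simulation.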

Binomial Distribution


An experiment with only two possible outcomes, repeated n times, is called a binomial experiment, and the number of successes across those n trials follows a binomial distribution. We can also think of it as a Bernoulli experiment with n trials.


A random variable X following a binomial distribution with n trials and a probability of success p in each trial is denoted by:

X ~ B(n,p)
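
One way to read this notation is that a single draw of X is the count of successes in n Bernoulli trials. The sketch below simulates that directly; the function name `binomial_sample` and the values n = 10 and p = 0.3 are illustrative assumptions, not taken from any library or from the examples in this post.

```python
import random


def binomial_sample(n, p, rng):
    """Draw one value of X ~ B(n, p) by counting successes in n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


rng = random.Random(42)  # arbitrary fixed seed, only for repeatability
n, p = 10, 0.3           # illustrative values, chosen for this sketch

draws = [binomial_sample(n, p, rng) for _ in range(10_000)]
sample_mean = sum(draws) / len(draws)
print(sample_mean)  # should land close to n * p = 3
```

Every draw is an integer between 0 and n, and over many draws the average settles near n * p, which is the mean of a binomial distribution.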


A binomial experiment follows these rules:

  1. The experiment is repeated a fixed number of times (n times).
  2. There are only two outcomes, i.e., success or failure, each time. The probability of success is the same for each trial, and so is the probability of failure.
  3. If the probability of success is p, then the probability of failure is 1-p, and this remains the same across each successive trial.
  4. Every trial must be independent. That means the outcome of one trial must have no effect on the probabilities in any other trial; each event must be completely separate and have nothing to do with the previous event.

Winning a scratch-off lottery is an independent event: your odds of winning on one ticket are the same as on any other ticket, so each ticket can be treated as a binomial trial. Drawing red balls from a bag of 50 colored balls without putting them back, however, is a dependent event: the probability of drawing a red ball changes after every draw, which violates the fourth rule.
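
To see why the bag example breaks independence, the sketch below compares the probability of drawing a red ball when balls are put back after each draw against the case where they are not. The count of 20 red balls is an assumption made purely for illustration, and `p_red_next` is a hypothetical helper.

```python
from fractions import Fraction

TOTAL_BALLS = 50  # from the bag example above
RED_BALLS = 20    # assumed for illustration; no count is given in the example


def p_red_next(red_left, total_left):
    """Probability that the next ball drawn is red."""
    return Fraction(red_left, total_left)


# With replacement: the bag is restored after each draw, so p never changes.
with_replacement = [p_red_next(RED_BALLS, TOTAL_BALLS) for _ in range(3)]

# Without replacement: suppose every draw so far came up red.
without_replacement = [p_red_next(RED_BALLS - k, TOTAL_BALLS - k) for k in range(3)]

print(with_replacement)     # the same fraction three times
print(without_replacement)  # a different fraction on each draw
```

Only the first scenario keeps p constant from trial to trial, so only it qualifies as a sequence of binomial trials.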

Probability Mass Function

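The probability mass function is given by:

P(X = k) = C(n, k) p^k (1-p)^(n-k), for k = 0, 1, ..., n

where C(n, k) = n! / (k! (n-k)!) is the number of ways to choose which k of the n trials are successes.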

If n = 1, i.e., a single trial, the binomial distribution reduces to the Bernoulli distribution. Hence, the Bernoulli distribution is also referred to as the point binomial.
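
The reduction to the Bernoulli case can be checked numerically. The sketch below assumes the standard binomial form P(X = k) = C(n, k) p^k (1-p)^(n-k); the function name `binomial_pmf` and the value p = 0.3 are illustrative choices, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Return P(X = k) for X ~ B(n, p)."""
    if not 0 <= k <= n:
        return 0.0  # k successes are impossible outside 0..n
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p = 0.3  # illustrative success probability, chosen for this sketch

# With n = 1 the binomial PMF gives 1 - p for k = 0 and p for k = 1,
# which is exactly the Bernoulli distribution.
print(binomial_pmf(0, 1, p), binomial_pmf(1, 1, p))

# The probabilities over k = 0..n sum to 1 (up to floating-point rounding).
total = sum(binomial_pmf(k, 8, p) for k in range(9))
print(total)
```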


Examples of binomial random variables:

  1. The number of heads when a coin is flipped 12 times.
  2. The number of successful baskets in 4 independent free throws by a basketball player.
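
As a rough worked version of these two examples, the sketch below evaluates the binomial probabilities directly. It assumes a fair coin, and the 0.8 free-throw success rate is a made-up figure; `binomial_pmf` is the same hypothetical helper name used for illustration, built on `math.comb` (Python 3.8 or later).

```python
from math import comb


def binomial_pmf(k, n, p):
    """Return P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Example 1: number of heads in 12 flips of a coin (assumed fair, p = 0.5).
heads_pmf = [binomial_pmf(k, 12, 0.5) for k in range(13)]
most_likely_heads = max(range(13), key=lambda k: heads_pmf[k])
print(most_likely_heads, heads_pmf[most_likely_heads])

# Example 2: baskets made in 4 independent free throws.
# The 0.8 success rate is a made-up figure for illustration.
p_basket = 0.8
p_all_four = binomial_pmf(4, 4, p_basket)
p_at_least_three = binomial_pmf(3, 4, p_basket) + p_all_four
print(p_all_four, p_at_least_three)
```

For the fair coin, 6 heads is the most likely count, with probability C(12, 6) / 2^12 = 924 / 4096, or about 0.226.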

In practice, the Bernoulli distribution is rarely used on its own, because in fields like data science we seldom see scenarios that consist of a single trial. It is, however, the foundation for the binomial distribution (explained above) and the Poisson distribution, which we are going to see in our next post.

