Binomial Distribution

Binomial distribution of D atoms in methylcyclopentane results from repeating the forward and backward reaction of the H- (or D-) transfer step to form methylcyclopentane.

From: Studies in Surface Science and Catalysis, 2007

Random Variables

Kumar Molugaram, G. Shanker Rao, in Statistical Techniques for Transportation Engineering, 2017

4.20.1 Binomial Distribution

The binomial distribution is a discrete probability distribution and one of the most commonly used. It was developed to represent the many discrete phenomena that occur in business, the social sciences, the natural sciences, and medical research.

The binomial distribution is widely used because its probabilities are the terms of the binomial expansion. The following conditions must be satisfied for the binomial distribution to apply:

1. The experiment consists of n identical trials, where n is finite.

2. Each trial has only two possible outcomes, i.e., each trial is a Bernoulli trial. We denote one outcome by S (for success) and the other by F (for failure).

3. The probability of S remains the same from trial to trial. The probability of success is denoted by p and the probability of failure by q (where p + q = 1).

4. All the trials are independent.

5. The binomial random variable X is the number of successes in the n trials.

If X denotes the number of successes in n trials under the conditions stated above, then X is said to follow a binomial distribution with parameters n and p.

Definition

(Binomial distribution) A discrete random variable X taking the values 0, 1, 2, …, n is said to follow a binomial distribution with parameters n and p if its pmf is given by

P(X = x) = p(x) = nCx p^x q^(n−x),  x = 0, 1, 2, …, n, where 0 < p < 1 and q = 1 − p
P(X = x) = 0, otherwise

If X follows the binomial distribution with parameters n and p, we write X ~ B(n, p) or B(x; n, p).

A binomial random variable is the number of successes x in n repeated trials of a binomial experiment. The probability distribution of a binomial random variable is called a binomial distribution. (The special case n = 1 is the Bernoulli distribution.)

A cumulative binomial probability refers to the probability that the binomial random variable falls within a specified range.

Remark

We have

Σ(x=0 to n) P(X = x) = Σ(x=0 to n) nCx p^x q^(n−x) = (q + p)^n = 1

The probabilities are the terms in the binomial expansion of (q + p)^n (i.e., (p + q)^n), hence the name binomial distribution.

The binomial distribution is used to analyze the error in experimental results that estimate the proportion of individuals in a population that satisfy a condition of interest.
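As a numerical check of the definition above, a minimal Python sketch (standard library only; n = 8 and p = 0.3 are arbitrary illustrative values) evaluates the pmf and confirms that the probabilities sum to 1, as the remark shows via (q + p)^n:

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# The pmf terms are the terms of the binomial expansion of (q + p)^n,
# so they must sum to 1.
n, p = 8, 0.3
total = sum(binom_pmf(x, n, p) for x in range(n + 1))
print(round(total, 10))  # 1.0
```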

URL: https://www.sciencedirect.com/science/article/pii/B9780128115558000040

Methods to develop mathematical models: traditional statistical analysis

Jorge Garza-Ulloa, in Applied Biomechatronics using Mathematical Models, 2018

5.1.4.2.1 Discrete binomial distribution

The discrete binomial distribution describes the number of successes in a sequence of independent experiments; its pmf is given in Eq. (5.43) and its cdf in Eq. (5.44).

(5.43) Discrete binomial distribution pmf: f(x) = (n choose x) p^x (1 − p)^(n−x)

where x = 0, 1, 2, …, n, (n choose x) = n!/(x!(n − x)!), and n! = n(n − 1)(n − 2)⋯(2)(1)

(5.44) Discrete binomial distribution cdf: F(x) = P(X ≤ x) = Σ(i=0 to ⌊x⌋) (n choose i) p^i (1 − p)^(n−i)

Note: The discrete binomial distribution with n = 10 and p = 0.5 is indicated in Fig. 5.5A.

Figure 5.5. Chart Examples for Probability Distributions:

(A) discrete binomial distribution pmf with n = 10 and p = 0.5, (B) discrete Poisson distribution pmf with λ = 5, and (C) continuous exponential distribution pdf with λ = 2.5.
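A small Python sketch (standard library only; the parameters n = 10 and p = 0.5 mirror the case of Fig. 5.5A) evaluates Eqs. (5.43) and (5.44) directly:

```python
from math import comb, floor

n, p = 10, 0.5

def pmf(x):  # Eq. (5.43)
    return comb(n, x) * p**x * (1 - p)**(n - x)

def cdf(x):  # Eq. (5.44)
    return sum(pmf(i) for i in range(floor(x) + 1))

print(round(pmf(5), 4))  # 0.2461  (the peak visible in Fig. 5.5A)
print(round(cdf(5), 4))  # 0.623
```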

URL: https://www.sciencedirect.com/science/article/pii/B9780128125946000056

Mathematics

In Standard Handbook of Petroleum and Natural Gas Engineering (Third Edition), 2016

1.8.4 Probability Distributions for Discrete Random Variables

The binomial distribution applies to random variables where there are only two possible outcomes (A or B) for each trial and where the outcome probability is constant over all n trials. If the probability of A occurring on any one trial is denoted as p and the number of occurrences of A is denoted as x, then the binomial coefficient is given by

(n choose x) = n!/(x!(n − x)!)

and the probability of getting x occurrences of A in n trials is

b(x; n, p) = (n choose x) p^x (1 − p)^(n−x)  for x = 0, 1, 2, …, n

The cumulative probability of the binomial distribution is given by

B(x; n, p) = Σ(i=0 to x) b(i; n, p)

For the binomial distribution

μ = np,  σ = √(np(1 − p))

For np ≥ 5 and n(1 − p) ≥ 5, an approximation of binomial probabilities is given by the standard normal distribution, where z is a standard normal deviate and

z = (x − np)/√(np(1 − p))
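Under the stated rule of thumb (np ≥ 5 and n(1 − p) ≥ 5), the z-statistic above can be evaluated with the standard library; the values n = 50, p = 0.3, x = 12 are illustrative assumptions, not from the text:

```python
from math import sqrt, erf

def normal_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 50, 0.3          # np = 15 >= 5 and n(1 - p) = 35 >= 5
x = 12
z = (x - n * p) / sqrt(n * p * (1 - p))
print(round(z, 4), round(normal_cdf(z), 4))
```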

The negative binomial distribution defines the probability of the kth occurrence of an outcome occurring on the xth trial as

b*(x; k, p) = ((x − 1) choose (k − 1)) p^k (1 − p)^(x−k)  for x = k, k + 1, k + 2, …

and

μ = k(1 − p)/p,  σ² = k(1 − p)/p²

(these are the mean and variance of the number of failures, x − k; the mean of the trial number x itself is k/p)
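A quick numerical check of these moments (k = 3 and p = 0.4 are arbitrary choices; the sums run over the failure count x − k, since the pmf above is written in terms of the trial number x):

```python
from math import comb

def negbin_pmf(x, k, p):
    """Probability that the k-th success occurs on trial x (x = k, k+1, ...)."""
    return comb(x - 1, k - 1) * p**k * (1 - p)**(x - k)

k, p = 3, 0.4
# Mean and variance of the number of failures, x - k; the tail beyond
# x = 500 is negligible for these parameters.
mean_fail = sum((x - k) * negbin_pmf(x, k, p) for x in range(k, 500))
var_fail = sum((x - k - mean_fail)**2 * negbin_pmf(x, k, p) for x in range(k, 500))
print(round(mean_fail, 4), round(var_fail, 4))  # k(1-p)/p = 4.5, k(1-p)/p^2 = 11.25
```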

If the probabilities do not remain constant over the trials and if there are k (rather than two) possible outcomes of each trial, the hypergeometric distribution applies. For a sample of size N of a population of size T, where

t1 + t2 + ⋯ + tk = T  and  n1 + n2 + ⋯ + nk = N

the probability is

h(ni; N, ti, T) = [(t1 choose n1)(t2 choose n2)⋯(tk choose nk)] / (T choose N)

The Poisson distribution can be used to determine probabilities for discrete random variables where the random variable is the number of times that an event occurs in a single trial (unit of time, space, etc.). The probability function for a Poisson random variable is

P(x; μ) = e^(−μ) μ^x / x!  for x = 0, 1, 2, …

where μ=mean of the probability function (and also the variance)

The cumulative probability function is

F(x; μ) = Σ(i=0 to x) P(i; μ)

URL: https://www.sciencedirect.com/science/article/pii/B9780123838469000011

Theoretical distributions

J. Hayavadana, in Statistics for Textile and Apparel Management, 2012

Characteristics

1. The binomial distribution is a distribution of a discrete variable.

2. The formula for the distribution is P(x) = nCx p^x q^(n−x), or

P(x) = [n!/(x!(n − x)!)] p^x (1 − p)^(n−x)

3. An example of the binomial distribution: P(x) is the probability of x defective items in a sample of size n when sampling from an infinite universe of which the fraction p is defective.

4. The mean of the binomial distribution is x̄ = np.

5. The standard deviation is σx = √(nP(1 − P)), or σx = √(npq). These are the formulas used in "acceptance sampling" and in control charts.

6. When P = 0.5, the binomial distribution is symmetrical about its mean.

7. When P > 0.5, the right-hand tail of the distribution is longer.

8. When P < 0.5, the left-hand tail of the distribution is longer.

9. The standard deviation of the binomial distribution has its maximum value when P = 0.5.

Example 22: 10 coins are thrown simultaneously. Find the probability of getting at least seven heads.

Solution: P = Probability of getting a head = ½; q = Probability of not getting a head = ½.

The probability of getting x heads in a throw of n coins is P(x) = nCx p^x q^(n−x)

P(7) = 10C7 p^7 q^(10−7) = 10C7 (1/2)^10
P(8) = 10C8 (1/2)^10
P(9) = 10C9 (1/2)^10
P(10) = 10C10 (1/2)^10

Or probability of getting at least 7 heads

P(x ≥ 7) = P(7) + P(8) + P(9) + P(10) = (1/2)^10 [10C7 + 10C8 + 10C9 + 10C10] = (1/2)^10 [120 + 45 + 10 + 1] = 176/1024 = 0.1719
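Example 22 can be verified directly in Python:

```python
from math import comb

# P(at least 7 heads in 10 tosses of a fair coin)
# = (1/2)^10 * (10C7 + 10C8 + 10C9 + 10C10)
p_at_least_7 = sum(comb(10, x) for x in range(7, 11)) / 2**10
print(p_at_least_7)  # 0.171875  (= 176/1024)
```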

URL: https://www.sciencedirect.com/science/article/pii/B9780857090027500059

Mathematical Foundations

Xin-She Yang, in Nature-Inspired Optimization Algorithms (Second Edition), 2021

2.6.2 Common Probability Distributions

The binomial distribution concerns a binary variable that takes only two possible outcomes: success/yes (i.e., 1) with probability p, or failure/no (i.e., 0) with probability 1 − p. For n independent trials, the probability of X taking the value k is

(2.54) B(n, p) = (n choose k) p^k (1 − p)^(n−k),  (k = 0, 1, 2, …, n),

where

(2.55) (n choose k) = n!/(k!(n − k)!).

Its mean and variance are μ = np and σ² = np(1 − p), respectively.

A very widely used probability distribution is the Poisson distribution, which can be considered as the limiting case of a binomial distribution for small-probability events with a large number of independent trials. This requires that λ = np > 0 be a finite value with n ≫ 1 (but 0 < p ≪ 1). The Poisson distribution is given by

(2.56) P(X = n) = λ^n e^(−λ) / n!,  (n = 0, 1, 2, …),

where its mean is λ. It is also easy to verify that its variance is also λ.
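The limiting-case claim can be checked numerically: holding λ = np fixed while n grows, the binomial probabilities approach the Poisson ones. A small sketch (λ = 2 and k = 3 are arbitrary choices):

```python
from math import comb, exp, factorial

lam, k = 2.0, 3
for n in (10, 100, 10_000):
    p = lam / n                                   # keep lambda = n*p fixed
    binom = comb(n, k) * p**k * (1 - p)**(n - k)  # B(n, p) at k
    print(n, round(binom, 6))

poisson = exp(-lam) * lam**k / factorial(k)       # Eq. (2.56)
print("poisson", round(poisson, 6))               # 0.180447
```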

Probably, the most widely used distribution is the Gaussian distribution or Gaussian normal distribution of continuous random variables. The Gaussian distribution is given by

(2.57) p(x) = (1/(σ√(2π))) exp{−(x − μ)²/(2σ²)},

where μ and σ² are the mean and variance, respectively, of the random variable X. Since the domain of X is the whole real line, the total probability requires that

(2.58) ∫(−∞ to ∞) p(x) dx = 1.

Its cumulative probability function (CPF) can be obtained by integrating

(2.59) F(x) = P(X < x) = (1/√(2πσ²)) ∫(−∞ to x) e^(−(u − μ)²/(2σ²)) du = (1/2)[1 + erf((x − μ)/(√2 σ))],

where the error function is defined by

(2.60) erf(x) = (2/√π) ∫(0 to x) e^(−ζ²) dζ.

The Gaussian normal distribution is usually denoted by N(μ, σ²) in the literature. If μ = 0 and σ = 1, it becomes the standard normal distribution N(0, 1).

In the context of nature-inspired algorithms and their initialization, the uniform distribution is commonly used, which is defined by a constant probability p over an interval [a, b]

(2.61) p(x) = 1/(b − a),  x ∈ [a, b].

By simple integration, it is straightforward to show that its mean is E[X] = (a + b)/2 and its variance is σ² = (b − a)²/12.

Another important distribution is Student's t-distribution

(2.62) p(t) = A (1 + t²/n)^(−(n+1)/2),  A = Γ((n + 1)/2) / (√(nπ) Γ(n/2)),

where −∞ < t < +∞ and n is the number of degrees of freedom. Here the special Γ-function is given by

(2.63) Γ(ν) = ∫(0 to ∞) x^(ν−1) e^(−x) dx,

which leads to the factorial Γ(n) = (n − 1)! when ν = n is a positive integer.

URL: https://www.sciencedirect.com/science/article/pii/B9780128219867000093

Engineering mathematics

John Barron, ... (Section 17.6), in Mechanical Engineer's Reference Book (Twelfth Edition), 1994

17.6.8 Probability Distributions

There are several mathematical formulae with well-defined characteristics and these are known as probability distributions. If a problem can be made to fit one of these distributions then its solution is simplified. Distributions can be discrete when the characteristic can only take certain specific values, such as 0, 1, 2, etc., or they can be continuous where the characteristic can take any value.

17.6.8.1 Binomial Distribution

The binomial probability distribution is given by

(17.64) (p + q)^n = q^n + nC1 p q^(n−1) + nC2 p^2 q^(n−2) + ⋯ + nCx p^x q^(n−x) + ⋯ + p^n

where p is the probability of an event occurring, q(= 1 − p) is the probability of an event not occurring and n is the number of selections.

The probability of an event occurring m successive times is given by the binomial distribution as P(m) = nCm p^m q^(n−m).

The binomial distribution is used for discrete events and is applicable if the probability of occurrence p of an event is constant on each trial. The mean of the distribution B(M) and the standard deviation B(S) are given by

(17.66) B(M) = np,  B(S) = √(npq)

17.6.8.2 Poisson Distribution

The Poisson distribution is used for discrete events and, like the binomial distribution, it applies to mutually independent events. It is used in cases where p and q cannot both be defined. For example, one can state the number of goals which were scored in a football match, but not the goals which were not scored.

The Poisson distribution may be considered to be the limiting case of the binomial when n is large and p is small.

The probability of an event occurring m successive times is given by the Poisson distribution as P(m) = ((np)^m / m!) e^(−np).

The mean P(M) and standard deviation P(S) of the Poisson distribution are given by P(M) = np and P(S) = √(np).

Poisson probability calculations can be done by the use of probability charts as shown in Figure 17.29. This shows the probability that an event will occur at least m times when the mean (or expected) value np is known.

Figure 17.29. Poisson probability paper

17.6.8.3 Normal Distribution

The normal distribution represents continuous events and is shown plotted in Figure 17.30. The x-axis gives the event and the y-axis the probability of the event occurring. The curve shows that most of the events occur close to the mean value and this is usually the case in nature. The equation of the normal curve is given by

Figure 17.30. The normal curve

(17.71) y = (1/(σ√(2π))) e^(−(x − x̄)²/(2σ²))

where x ¯ is the mean of the values making up the curve and σ is their standard deviation.

Different distributions will have varying means and standard deviations but if they are distributed normally then their curves will all follow equation (17.71). These distributions can all be normalized to a standard form by moving the origin of their normal curve to their mean value, shown as B in Figure 17.30. The deviation from the mean is now represented on a new scale of units given by

(17.72) ω = (x − x̄)/σ

The equation for the standardized normal curve now becomes

(17.73) y = (1/√(2π)) e^(−ω²/2)

The total area under the standardized normal curve is unity and the area between any two values of ω is the probability of an item from the distribution falling between these values. The normal curve extends infinitely in either direction but 68.26% of its values (area) fall between ±σ, 95.46% between ±2σ, 99.73% between ±3σ and 99.994% between ±4σ.

Table 17.1 gives the area under the normal curve for different values of ω. Since the normal curve is symmetrical the area from + ω to + ∞ is the same as from -ω to -∞. As an example of the use of this table, suppose that 5000 street lamps have been installed in a city and that the lamps have a mean life of 1000 hours with a standard deviation of 100 hours.

Table 17.1. Area under the normal curve from −∞ to ω

ω 0.00 0.02 0.04 0.06 0.08
0.0 0.500 0.508 0.516 0.524 0.532
0.1 0.540 0.548 0.556 0.564 0.571
0.2 0.579 0.587 0.595 0.603 0.610
0.3 0.618 0.626 0.633 0.640 0.648
0.4 0.655 0.663 0.670 0.677 0.684
0.5 0.692 0.700 0.705 0.712 0.719
0.6 0.726 0.732 0.739 0.745 0.752
0.7 0.758 0.764 0.770 0.776 0.782
0.8 0.788 0.794 0.800 0.805 0.811
0.9 0.816 0.821 0.826 0.832 0.837
1.0 0.841 0.846 0.851 0.855 0.860
1.1 0.864 0.869 0.873 0.877 0.881
1.2 0.885 0.889 0.893 0.896 0.900
1.3 0.903 0.907 0.910 0.913 0.916
1.4 0.919 0.922 0.925 0.928 0.931
1.5 0.933 0.936 0.938 0.941 0.943
1.6 0.945 0.947 0.950 0.952 0.954
1.7 0.955 0.957 0.959 0.961 0.963
1.8 0.964 0.966 0.967 0.969 0.970
1.9 0.971 0.973 0.974 0.975 0.976
2.0 0.977 0.978 0.979 0.980 0.981
2.1 0.982 0.983 0.984 0.985 0.985
2.2 0.986 0.987 0.988 0.988 0.989
2.3 0.989 0.990 0.990 0.991 0.991
2.4 0.992 0.992 0.993 0.993 0.993
2.5 0.994 0.994 0.995 0.995 0.995
2.6 0.995 0.996 0.996 0.996 0.996
2.7 0.997 0.997 0.997 0.997 0.997
2.8 0.997 0.998 0.998 0.998 0.998
2.9 0.998 0.998 0.998 0.998 0.999
3.0 0.999 0.999 0.999 0.999 0.999

Column 1 lists the ordinal values of ω and the corresponding values of area are presented in column 2. Interpolation between ordinal values can be achieved in steps of 0.02 by using the remaining 4 columns.

How many lamps will fail in the first 800 hours? From equation (17.72),

ω = (800 − 1000)/100 = −2

Ignoring the negative sign, Table 17.1 gives the probability of a lamp not failing as 0.977, so that the probability of failure is 1 − 0.977 = 0.023. Therefore 5000 × 0.023, or 115 lamps, are expected to fail within the first 800 hours.
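The street-lamp calculation can be reproduced without the table; note that the exact Φ(−2) = 0.02275 gives about 114 lamps, while the table's two-decimal rounding (0.023) gives the 115 quoted above:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cdf (area from -infinity to z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mean, sd, lamps = 1000, 100, 5000
omega = (800 - mean) / sd        # -2.0, as in equation (17.72)
p_fail = phi(omega)              # P(lamp life < 800 h)
print(round(p_fail, 3), round(lamps * p_fail))  # 0.023 114
```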

17.6.8.4 Exponential Distribution

The exponential probability distribution is a continuous distribution and is shown in Figure 17.31. It has the equation

(17.74) y = (1/x̄) e^(−x/x̄)

Figure 17.31. The exponential curve

where x̄ is the mean of the distribution. Whereas in the normal distribution the mean value divides the population in half, for the exponential distribution 36.8% of the population is above the average and 63.2% below it. Table 17.2 shows the area under the exponential curve for different values of the ratio K = x/x̄, this area being shown shaded in Figure 17.31.

Table 17.2. Area under the exponential curve from K to + ∞

K 0.00 0.02 0.04 0.06 0.08
0.0 1.000 0.980 0.961 0.942 0.923
0.1 0.905 0.886 0.869 0.852 0.835
0.2 0.819 0.803 0.787 0.771 0.756
0.3 0.741 0.726 0.712 0.698 0.684
0.4 0.670 0.657 0.644 0.631 0.619
0.5 0.607 0.595 0.583 0.571 0.560
0.6 0.549 0.538 0.527 0.517 0.507
0.7 0.497 0.487 0.477 0.468 0.458
0.8 0.449 0.440 0.432 0.423 0.415
0.9 0.407 0.399 0.391 0.383 0.375

Column 1 lists the ordinal values of K and the corresponding values of area are presented in column 2. Interpolation between ordinal values can be achieved in steps of 0.02 by using the remaining 4 columns.

As an example suppose that the time between failures of a piece of equipment is found to vary exponentially. If results indicate that the mean time between failures is 1000 hours, then what is the probability that the equipment will work for 700 hours or more without a failure? Calculating K as 700/1000 = 0.7 then from Table 17.2 the area beyond 0.7 is 0.497 which is the probability that the equipment will still be working after 700 hours.
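The same calculation in Python (the 1000-hour mean and 700-hour horizon come from the example above); the tail area from K to +∞ of the exponential curve is simply e^(−K):

```python
from math import exp

mean_tbf = 1000.0                # mean time between failures, hours
t = 700.0
K = t / mean_tbf                 # 0.7
p_survive = exp(-K)              # P(no failure before t) = area beyond K
print(round(p_survive, 3))       # 0.497
```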

17.6.8.5 Weibull Distribution

This is a continuous probability distribution and its equation is given by

(17.75) y = αβ(x − γ)^(β−1) e^(−α(x − γ)^β)

where α is called the scale factor, β the shape factor and γ the location factor.

The shape of the Weibull curve varies depending on the values of its factors. β is the most important, as shown in Figure 17.32, and the Weibull curve varies from an exponential (β = 1.0) to an approximately normal distribution (β = 3.5). In practice β varies from about 1/3 to 5. Because the Weibull distribution can be made to fit a variety of different sets of data, it is a popular choice for probability distributions.

Figure 17.32. Weibull curves (α = 1)

Analytical calculations using the Weibull distribution are cumbersome. Usually predictions are made using Weibull probability paper. The data are plotted on this paper and the probability predictions read from the graph.

URL: https://www.sciencedirect.com/science/article/pii/B978075061195450021X

Intelligent procurement systems to support fast fashion supply chains in the apparel industry

D.A. Serel, in Information Systems for the Fashion and Apparel Industry, 2016

7.2.2 Price-sensitive demand

Assuming negative binomial distribution for demand with a price-dependent distribution parameter, Subrahmanyan and Shoemaker (1996) use dynamic programming to find optimal prices and order quantities in a two-period problem. They also present numerical results for a three-period problem in which lead time for the second order is one period.

Petruzzi and Dada (2001) study a two-period problem in which expected demand decreases linearly in price. Using an additive demand model, they consider a setting in which orders for both periods are placed before the first period. After observing demand in period 1, the buyer can cancel a part of the order previously given for period 2, or can add new units to this standing order to increase the stocking level for period 2. In addition to these stocking decisions, the buyer also needs to choose the optimal selling price in each period. Petruzzi and Dada (2001) show that finding the optimal solution to this problem can be reduced to a search for one decision variable.

URL: https://www.sciencedirect.com/science/article/pii/B9780081005712000075

Multi-Service Systems

Christofer Larsson, in Design of Modern Communication Networks, 2014

10.6 Summary

The three traffic distributions binomial, Poisson and negative binomial (referred to as the BPP family) can all be used to formulate systems that are reversible and have product form. The three traffic types correspond to different values of peakedness, Z < 1 , Z = 1 and Z > 1 , respectively. The binomial and negative binomial distributions can be regarded as extensions of Poisson traffic to allow for lower and higher variance than the mean. The binomial distribution may be used to model smooth traffic and the negative binomial distribution bursty traffic, such as data traffic.

The product form of systems offered Poisson traffic makes it possible to solve for state probabilities efficiently by an Erlang-type recursion (known as the Fortet-Grandjean or Kaufman-Roberts algorithm), or by convolution. Iversen (2013) describes the generalization of these methods to BPP traffic.

We also consider admission control by trunk reservation and suggest a simulation approach, which is relatively easy to implement. These are all models for loss systems.

A different class of models are systems with processor sharing. These represent a hybrid between loss and delay systems, and, even if the models do not have product form, it is possible to formulate iterative algorithms for such systems; see Iversen (2013). In this case, we restrict the discussion to Poisson traffic.

All these models are reversible, which means that they lend themselves to network analysis in an "as if independent" manner. We may use analytical methods or fixed-point methods to analyse networks consisting of such queues.

URL: https://www.sciencedirect.com/science/article/pii/B9780124072381000105

Introductory tables and mathematical information

In Smithells Metals Reference Book (Eighth Edition), 2004

THE NORMAL APPROXIMATION TO THE BINOMIAL

Since a binomial rv XBin is the sum of n independent Bernoulli rvs, i.e., XBin = Σ(i=1 to n) Xi, where each Xi is Bernoulli (i.e., Xi has a binomial distribution with the parameter n ≡ 1) with E(Xi) = p and V(Xi) = pq, it follows from the CLT that, as n → ∞, the binomial pmf can be approximated by a normal pdf with E(X) = np and V(X) = npq. However, because a binomial rv has a discrete range space, a correction for continuity has to be applied, as will be illustrated in the following example. The approximation is adequate when n is large enough such that the product np > 15 and 0.10 < p < 0.90. If np < 15, we would recommend the Poisson approximation to the binomial. The general rule of thumb was that n must be large enough such that n > 25α3² and simultaneously n > 5α4, where for a single Bernoulli trial α3² = (q − p)²/(pq) and α4 = (1 − 3p + 3p²)/(pq) = (1 − 3pq)/(pq). It can be shown that for a Bin(n, p) distribution the skewness is α3 = (q − p)/√(npq) and the binomial α4 = E[((X − μ)/σ)⁴] = E(Z⁴) is given by α4 = (1 + 3(n − 2)pq)/(npq).

Example 10

Consider a binomial distribution with n = 50 trials and p = 0.30 (so that μ = np = 15). Note that we would like to have np > 15; but 25α3² = 19.05 implies that the skewness requirement n = 50 > 25α3² is easily satisfied, and similarly 5α4 = 14.8762 shows that the α4 requirement is also satisfied because n > 14.8762. Then the range space of the discrete rv XD is Rx = {0, 1, 2, 3, …, 50}. The exact pr of attaining exactly 12 successes in 50 trials is given by b(12; 50, 0.30) = P(XD = 12) = 50C12 (0.30)^12 (0.70)^38 = 0.083 83.

To apply the normal distribution (with μ = 15 and σ² = npq = 10.50) in order to approximate P(XD = 12), we first have to select an interval on a continuous scale, XC, to represent XD = 12. Clearly this continuous interval on XC has to be (11.5, 12.5), i.e., we will have a correction of 0.50 for continuity. In short, XD = 12 ≅ (11.5, 12.5)C. Thus,

P(XD = 12) ≅ P(11.5 ≤ XC ≤ 12.5) = P((11.5 − 15)/3.240 4 ≤ Z ≤ (12.5 − 15)/3.240 4) = P(−1.080 1 ≤ Z ≤ −0.771 52) = Φ(−0.771 52) − Φ(−1.080 1) = 0.220 200 − 0.140 044 = 0.080 157.

Since the exact pr was 0.083 83, the percent relative error in the above approximation is

% Relative Error = ((0.083 83 − 0.080 157)/0.083 83) × 100 = 4.382%.

Next we compute the exact pr that X exceeds 12, i.e. P(XD > 12) = 1 − B(12; 50, 0.30) = 1 − 0.222 865 8 = 0.777 134. The normal approximation to this binomial pr is

P(XC ≥ 12.5) = P(Z ≥ −0.771 52) = 0.779 80 ≈ P(XD > 12).

The % error in this approximation is 0.343%.

As stated earlier, the normal approximation to the binomial should improve as n increases and as p → 0.50. To illustrate this fact, consider a binomial distribution with parameters n = 65 and p = 0.40. Then, μ = np = 26, σ = (npq)1/2 = 3.949 7, and P(XD = 22) = 65C22(0.40)22(0.60)43 = 0.061 7. The normal approximation to this b(22; 65, 0.40) is P(21.5 ≤XC ≤ 22.5) = P(−1.139 33 ≤ Z ≤ − 0.886 15) = 0.187 77 − 0.127 28 = 0.060 5 with relative % error of 1.935%.

Further, P(XD < 23) = B(22; 65, 0.40) = Σ(x=0 to 22) 65Cx (0.40)^x (0.60)^(65−x) = 0.188 327. The normal approximation to this binomial cdf corrected for continuity (cfc) is P(XC ≤ 22.5) = P(Z ≤ −0.886 15) = Φ(−0.886 15) = 0.187 769 2, with an error of 0.296 2%.
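Example 10's continuity-corrected approximation can be reproduced with the standard library (math.erf gives Φ without tables); the parameters n = 50, p = 0.30 are those of the example:

```python
from math import comb, erf, sqrt

def phi(z):
    """Standard normal cdf."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 50, 0.30
mu, sigma = n * p, sqrt(n * p * (1 - p))   # 15, 3.2404

# Exact binomial probability of exactly 12 successes
exact = comb(n, 12) * p**12 * (1 - p)**38
# Normal approximation with the 0.5 continuity correction:
# P(X = 12) ~ P(11.5 <= Xc <= 12.5)
approx = phi((12.5 - mu) / sigma) - phi((11.5 - mu) / sigma)
print(round(exact, 4), round(approx, 4))   # 0.0838 0.0802
```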

URL: https://www.sciencedirect.com/science/article/pii/B9780750675093500051

Probability and Stochastic Processes

Sergios Theodoridis, in Machine Learning, 2015

The Multinomial distribution

This is a generalization of the binomial distribution to the case where the outcome of each experiment is not binary but can take one out of K possible values. For example, instead of tossing a coin, a die with K sides is thrown. Each one of the K possible outcomes has probability P1, P2, …, PK, respectively, of occurring, and we denote

P = [P1, P2, …, PK]^T.

After n experiments, assume that sides x = 1, x = 2, …, x = K occurred x1, x2, …, xK times, respectively. We say that the random (discrete) vector

(2.57) x = [x1, x2, …, xK]^T,

follows a multinomial distribution, x ∼Mult( x |n, P ), if

(2.58) P(x) = Mult(x|n, P) := (n choose x1, x2, …, xK) Π(k=1 to K) Pk^(xk),

where

(n choose x1, x2, …, xK) := n!/(x1! x2! ⋯ xK!).

Note that the variables x1, …, xK are subject to the constraint

Σ(k=1 to K) xk = n,

and also

Σ(k=1 to K) Pk = 1.

The mean value, the variances, and the covariances are given by

(2.59) E[x] = nP,  σk² = nPk(1 − Pk), k = 1, 2, …, K,  cov(xi, xj) = −nPi Pj, i ≠ j.
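Eq. (2.58) can be evaluated directly; a minimal sketch (the fair-die example with K = 6 and n = 6 throws is an illustrative assumption):

```python
from math import factorial

def multinomial_pmf(x, n, P):
    """Mult(x | n, P): multinomial coefficient times the product of Pk^xk."""
    assert sum(x) == n
    coef = factorial(n)
    for xk in x:
        coef //= factorial(xk)
    prob = float(coef)
    for xk, Pk in zip(x, P):
        prob *= Pk**xk
    return prob

# Fair die, K = 6 sides, n = 6 throws: each face appearing exactly once.
P = [1/6] * 6
print(round(multinomial_pmf([1] * 6, 6, P), 5))  # 0.01543  (= 6!/6^6)
```

For K = 2 the pmf reduces to the ordinary binomial, which gives a quick sanity check of the coefficient.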

URL: https://www.sciencedirect.com/science/article/pii/B9780128015223000021