Distribution simulation
by RS  admin@robinsnyder.com : 1024 x 640


1. Distribution simulation
Simulating a known distribution can be important.

This page looks at how one can simulate a distribution.

2. Distribution
For simplicity, let us use the following distribution. At one time, the typical distribution of a plain package of M&M's was as follows.

3. Distribution history
Try performing an Internet search to see how the distributions have changed over time.

See, for example, https://qz.com/918008/the-color-distribution-of-mms-as-determined-by-a-phd-in-statistics/ (as of 2020-04-17).

4. Binning
The process of binning provides a set of values and a count of those values in each bin. Let us take the above distribution percentages and operationalize them with the following binned list.

To see how to take raw data and summarize it in binned form, see "Summarizing data: The M&M Problem".

5. Binned data
Here is the example binned data.
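
The original listing is not reproduced here, so the following is only a minimal sketch of what such binned data could look like. The color names and the name countDict1 are assumptions for illustration; only the counts 6, 4, 4, 2, 2, 2 (total 20) are taken from the table later on this page.

```python
# Binned data: each key is a bin (a color) and each value is its count.
# The color names are assumed; the counts match the table in section 8.
countDict1 = {
    "brown": 6,
    "yellow": 4,
    "red": 4,
    "orange": 2,
    "green": 2,
    "tan": 2,
}

# Show each bin and its count in insertion order.
for color1, count1 in countDict1.items():
    print(color1, count1)
```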

The total of the counts is needed.

6. Count the values
One way to get the total of the counts in the (binned) dictionary is to use the reduce function (from the functools module).

The first argument to reduce is a lambda function that takes the cumulative value x and the key key1. The second argument to reduce is the dictionary itself, and iterating over a dictionary yields its keys, which is why the lambda receives a key rather than a count.

The third argument to reduce is the starting value 0 for the binary reduction operations, since 0 is the identity element for addition.
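
Here is a short sketch of the reduce call described above. The dictionary contents and the names countDict1 and total1 are assumptions for illustration; the argument order follows the description.

```python
from functools import reduce

# Assumed binned data (color names are hypothetical).
countDict1 = {"brown": 6, "yellow": 4, "red": 4,
              "orange": 2, "green": 2, "tan": 2}

# Iterating over a dictionary yields its keys, so the lambda receives
# the running total x and the next key key1, and looks up the count.
total1 = reduce(lambda x, key1: x + countDict1[key1], countDict1, 0)

print(total1)  # 20
```

For this particular task, sum(countDict1.values()) gives the same total; reduce is the more general tool when the combining operation is not plain addition.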

7. Approach
For each random selection, the total count (20, above) and the count of each individual item need to be known. A random number from 0.0 to 1.0 is drawn.

Then one goes through the list, accumulating each item's share of the total, until the cumulative fraction is greater than (or equal to) the random number.

8. Example table
Here is a table of individual and cumulative contributions.
count=fraction  cumulative count  cumulative fraction
6=0.300          6                0.300
4=0.200         10                0.500
4=0.200         14                0.700
2=0.100         16                0.800
2=0.100         18                0.900
2=0.100         20                1.000
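
The table above can be reproduced with a few lines of code. This is a sketch under the assumption that the binned data is held in a dictionary named countDict1 with hypothetical color names; the names rows1 and cumCount1 are also introduced here only for illustration.

```python
# Assumed binned data (color names are hypothetical).
countDict1 = {"brown": 6, "yellow": 4, "red": 4,
              "orange": 2, "green": 2, "tan": 2}

total1 = sum(countDict1.values())

# Build one row per bin: count, fraction, cumulative count, cumulative fraction.
cumCount1 = 0
rows1 = []
for color1, count1 in countDict1.items():
    cumCount1 += count1
    rows1.append((count1, count1 / total1, cumCount1, cumCount1 / total1))

for count1, frac1, cum1, cumFrac1 in rows1:
    print(f"{count1}={frac1:.3f} {cum1:2d} {cumFrac1:.3f}")
```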


9. Code to find the color
Here is the code to go through the dictionary of counts.

The color color1 at the exit of the for loop is the color selected.
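
Since the original listing is not available here, the following is only a sketch of one way to write such a loop. The function name pickColor, the dictionary contents, and the seed are assumptions; color1 and rand1 follow the names used in the text. The comparison uses strictly greater than, which still always breaks because the random number is less than 1.0.

```python
import random

# Assumed binned data (color names are hypothetical).
countDict1 = {"brown": 6, "yellow": 4, "red": 4,
              "orange": 2, "green": 2, "tan": 2}
total1 = sum(countDict1.values())

# A seeded generator (the seed is an arbitrary choice) for repeatable runs.
rand1 = random.Random(1)

def pickColor(r1):
    """Return the first color whose cumulative fraction exceeds r1."""
    cum1 = 0
    color1 = None
    for color1 in countDict1:
        cum1 += countDict1[color1]
        if cum1 / total1 > r1:
            break
    # color1 at the exit of the loop is the selected color.
    return color1

print(pickColor(rand1.random()))
```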

10. Technical point
A technical point: The random number generated using rand1.random() is greater than or equal to 0.0 but less than 1.0. Since the cumulative fraction reaches 1.0 at the last item, the for or while loop used will always exit with a break and never run to completion.

Otherwise, this is one place where one might use the else part of a for or while loop, which runs only when the loop ends without a break, so that case could be handled.

Here is the Python code [#4]

Here is the output of the Python code.
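
As a separate illustration of the else part of a for loop mentioned above, here is a small sketch. It is not the page's listing; the function name pickColorChecked and the error message are assumptions. The else branch runs only if the loop finishes without a break, for example when the dictionary is empty or the number passed in is 1.0 or more.

```python
def pickColorChecked(countDict1, r1):
    """Select a key by cumulative fraction; fail loudly if nothing is selected."""
    total1 = sum(countDict1.values())
    cum1 = 0
    for color1 in countDict1:
        cum1 += countDict1[color1]
        if cum1 / total1 > r1:
            break
    else:
        # Reached only when the loop ended without a break.
        raise ValueError("no item selected; r1 must satisfy 0.0 <= r1 < 1.0")
    return color1

print(pickColorChecked({"a": 1, "b": 1}, 0.75))  # b
```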


11. Efficiency notes
Note that going through the dictionary many times can be inefficient. But the dictionary approach can be useful if there are not too many items in the samples and data sets and one wants to make it easy to have a very dynamic way to generate samples from data sets.

12. NumPy arrays
One way to achieve some machine efficiency is to convert the Python dictionary to a NumPy array.

Note that such efficiency only becomes measurably apparent as the size of the dictionary increases and/or the number of simulated samples increases.

Note that the following code shows more than is strictly needed, in order to show various ways of processing and transforming dictionaries and lists.

13. Python dictionary
Here is the code to show the Python dictionary.


14. Python list
Here is the code to convert and show the Python lists from a Python dictionary.
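
A sketch of two common ways to do this conversion follows. The dictionary contents and the names keyList1, valueList1, keyTuple1, and valueTuple1 are assumptions for illustration.

```python
# Assumed binned data (color names are hypothetical).
countDict1 = {"brown": 6, "yellow": 4, "red": 4,
              "orange": 2, "green": 2, "tan": 2}

# Way 1: separate lists of keys and values (both keep insertion order).
keyList1 = list(countDict1.keys())
valueList1 = list(countDict1.values())
print(keyList1)
print(valueList1)

# Way 2: unzip the (key, value) pairs; this yields tuples, not lists.
keyTuple1, valueTuple1 = zip(*countDict1.items())
print(keyTuple1)
print(valueTuple1)
```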


15. NumPy array
Here is the code to create and show the NumPy array from a Python list.
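
A minimal sketch of this step, assuming the counts are already in a list named valueList1 (the list contents mirror the table above; valueArray1 is the name used in the text):

```python
import numpy as np

# Counts from the binned data, in bin order.
valueList1 = [6, 4, 4, 2, 2, 2]

# Convert the Python list to a NumPy array.
valueArray1 = np.array(valueList1)

print(valueArray1)        # [6 4 4 2 2 2]
print(valueArray1.shape)  # (6,)
print(valueArray1.sum())  # 20
```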

The above distribution simulation code is then added but using the NumPy array rather than the Python dictionary.

Here is the code to go through the array of counts in valueArray1 to simulate the samples.

The position spos1 at the exit of the while loop is the index of the color selected.
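
The following sketch shows one way such a while loop could look. It is not the page's listing: the function name pickIndex, the list keyList1 with its hypothetical color names, the seed, and the sample size are assumptions; valueArray1 and spos1 follow the names used in the text.

```python
import random
import numpy as np

# Assumed parallel structures: color names (hypothetical) and their counts.
keyList1 = ["brown", "yellow", "red", "orange", "green", "tan"]
valueArray1 = np.array([6, 4, 4, 2, 2, 2])
total1 = valueArray1.sum()

def pickIndex(r1):
    """Return the index of the first bin whose cumulative fraction exceeds r1."""
    spos1 = 0
    cum1 = 0
    while True:
        cum1 += valueArray1[spos1]
        if cum1 / total1 > r1:
            break
        spos1 += 1
    return spos1

# Simulate some samples and tally how often each index is selected.
rand1 = random.Random(1)  # arbitrary seed for repeatability
tallyArray1 = np.zeros(len(valueArray1), dtype=int)
for _ in range(1000):
    tallyArray1[pickIndex(rand1.random())] += 1

for color1, tally1 in zip(keyList1, tallyArray1):
    print(color1, tally1)
```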

Note that a potentially infinite while loop can be used here since, if the code is correct, the loop will always break. Here is the Python code [#9]

Here is the output of the Python code.

Note that what appears to be a list structure in NumPy is actually an array but displayed using square brackets. For large arrays, NumPy will attempt to display the first and last few elements and use some shortcuts to make the text more compact.
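
The display behavior can be seen with a short sketch. The array names and the size 2000 are arbitrary choices for illustration, and the abbreviation shown assumes NumPy's default print options.

```python
import numpy as np

smallArray1 = np.array([6, 4, 4, 2, 2, 2])

# An array prints with square brackets but without commas.
print(smallArray1)           # [6 4 4 2 2 2]

# The equivalent Python list prints with commas.
print(smallArray1.tolist())  # [6, 4, 4, 2, 2, 2]

# A large array is abbreviated with "..." between the first and last elements.
largeArray1 = np.arange(2000)
print(largeArray1)
```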
