Question about sets and random lists

For the problem below, I am practicing the set() data type. The first problem was easy, so I wanted to try using randomly generated lists to practice something more difficult.
Unlike the first program, which returns {1, 2, 3, 5, 8, 13}, my randomly-generated list program returns solely set()

Why would this be? Is there a condition of the random import that I’m not understanding?
Code is below.
Thanks!


import random

#write a program that returns a list that contains 
# only the elements that are common between the lists (without duplicates).
a = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
new_set = set(a).intersection(b)
print(new_set)

#perform same program but with randomly generated lists
list_a = random.sample(range(100), 5)
list_b = random.sample(range(100), 5)
random_set = set(list_a).intersection(list_b)
print(random_set)


that happens if both of list_a and list_b have no common elements. So your program returns empty set.

I was lucky and for me your code generated the same eleemtn for both list_a and list_b, and result is following:

import random

#write a program that returns a list that contains 
# only the elements that are common between the lists (without duplicates).
a = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
new_set = set(a).intersection(b)
# print(new_set)

#perform same program but with randomly generated lists
list_a = random.sample(range(100), 5)
print(list_a) #[15, 63, 71, 59, 48]
list_b = random.sample(range(100), 5)
print(list_b) #[68, 42, 96, 63, 97]
random_set = set(list_a).intersection(list_b)
print(random_set)#{63}

try to run your code more times, you will probably bump into similar situation.

It is just not high probability for your code to have many common elements for both sets

To increase this probability you can either increase size of sample or decrease range

A mathematical question follows then: what is the probability that the 2 lists have no common element?

My reasoning is like this: The first number in list a is randomly drawn from 100 numbers, of these 100 numbers, 5 of which are in list b, and the probability that the first number is not in list b is 95/100. As the random.sample function draws sample without replacement, the second number in list a is randomly drawn from 99 numbers, of which 94 are not in list b, given list b does not contain the first number. And the probability that both numbers are not in list b is 95/100 x 94/99. With the same logic, the probability that all 5 numbers in list a are not in list b is 95/100 x 94/99 x 93/98 x 92/97 x 91/96, which is roughly equals to 76.96%.

Do I get it right?

I am not competent enough in probability theory to answer this really.

The above statements are based more on common logic, if you will, not on some prob-s formulas.

However, I think your thoughts can be checked by some simulation.

One could wrap the whole code into a loop. With, lets say 10000 steps. then run it. And count number of iterations where both lists have/or not have common elements.
Also, could count different stuff - how much times both lists has 2 common elements for example.

The higher will be number of steps - the more precise stats can be gathered.

I have run some simulations on it, based on up to 1 million trials, and the resulting proportion of having no common elements is around 77%, but the figures are not very stable, even different million times simulations can have differences as large as 0.5%.

1 Like

Well, I guess 77% is close enough to 76.96%, and we can say that your reasoning above was correct.
Figures won’t be super stable with simulations.

Probably this code is not the best example, but, as far as I’ve learned, complexity of probability maths grows very rapidly if apply it to more complex problems… This statement based on lecture from MIT open course ware
Sometimes it is more reasonable approach: to run a simulation to get not ideal result instead of actually applying formulas.

1 Like

When I rethought about it, I recognized it is exactly the calculation of lottery probabilities. A search of Wikipedia yields the general formula of calculating all alternative matching odds:

COMBIN(K, B) * COMBIN(N-K, K-B) / COMBIN(N, K)

where N is the number of balls in lottery, K is the number of balls in a single ticket, and B is the number of matching balls for a winning ticket. And COMBIN is combination function, where COMBIN(n, k) = n! / k!(n - k)!

Here N = 100, K = 5, the probability of getting no matching number does equal to about 0.7696,
have 1 matching number: 15917725 / 75287520 ~ 0.2114,
have 2 matching numbers: 1384150 / 75287520 ~ 0.01838,
have 3 matching numbers: 44650 / 75287520 ~ 0.0005930,
have 4 matching numbers: 475 / 75287520 ~ 0.000006309,
have 5 matching numbers: 1 / 75287520 ~1.328e-8.

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.