Error in test from Python probability calculator project

Tell us what’s happening:

I will be glad if I missed something but I believe the following to be correct.

In the test_module.UnitTests.test_hat_draw() method are two tests. The first one asks for an exact response when the results are random. The code below is the actual test method.

Here is a link to the forkable project

The test code from the forkable project

class UnitTests(unittest.TestCase):
    # ...
    def test_hat_draw(self):
        hat = prob_calculator.Hat(red=5, blue=2)
        actual = hat.draw(2)
        expected = ['blue', 'red']
        self.assertEqual(actual, expected, 'Expected hat draw to return two random items from hat contents.')
        actual = len(hat.contents)
        expected = 5
        self.assertEqual(actual, expected, 'Expected hat draw to reduce number of items in contents.')
    # ...

Because lists are ordered it requires exactly that the first random ball drawn is blue and the second random ball is red. In a correctly formed program the first value named 'expected’above will only occasionally be [‘blue’, ‘red’] (2/7 * 5/6) or around 24% of the time. That means (ignoring the ‘contributor’ to the problem described below) a correctly formed program will fail around 76% of the time.

Possible contributor to the problem

The problem is made worse (guaranteeing repeated results in a given system) by the line at the top of the file prob_calculator.random.seed(95). A hard coded seed will generate the same sequence of random numbers (from for example random.randint using the same arguments repeatedly)in a given system repeatedly. That means if tests are run in the same sequence (for example a single unit test repetaedly) and it passes on a system (as determined by the repeated ‘random’ number sequence), it will continue to pass 100% of the time and the same if it fails. The default seed is generated from the current system time. That problem is removed by removing that line of code allowing random to use the default seed.

Addendum regarding prob_calculator.random.seed(95)

All of the above was theory. My actual code seems to confirm.

When running the tests from main and using the uncommented line prob_calculator.random.seed(95) in test_modules, my actual results in the tests UnitTests.test_hat_draw() and UnitTests.test_prob_experiment() never vary and the tests both pass every time, but they shouldn’t.

If I comment it out and allow the default random seed, the actual results vary in both methods (as they should) and sometimes they succeed sometimes fail. The test_hat_draw should pass about 25% of the time. The test_prob_experiment seems to pass about 1/2 the time or more. So the full run from main should succeed about 13% of the time. I got actual 7 out of 50.

After all that, I am waiting for someone to point out the elephant in the room that I misunderstood. If you can confirm or disagree, or made similar observations please post.

Your browser information:

User Agent is: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36.

Challenge: Probability Calculator

Link to the challenge:

Isn’t the point of random.seed(95) to ensure you will get reproducible results during testing?

2 Likes

Greets sanity. I would think that is generally true. But this particular project is a probability calculator and the probabilities are experiental. It relies on a random output to mimic actually picking out balls from a hat.

Yes and it’s calculating results based on certain number of performed experiments. But as that may differ due to the randomness random.seed(95) allows to check, whether in the specific (reproducible) circumstances results are as expected. This is for testing only. If you run calculator outside of testing, the results will vary every time. When main.py is run, first it calculates probability based on 3000 experiments without the seed set, notice that this result is different with each run. Later tests are run separately from the previous results.

1 Like

None of that applies to the specific errors in these tests.

UnitTests.test_hat_draw(self):

The first assert, the hard-coded seed produces a false output. Simply changing the seed value gives a different output every time as if we were testing the seed value instead of the method.

prob_calculator.random.seed(95),
actual = hat.draw(2) is always ["blue", "red"] Test passes every time

prob_calculator.random.seed(96),
actual = hat.draw(2) is always ["red", "red"] Test fails every time

UnitTests.test_prob_experiment(self):

The first assert similarly relies on the seed(95) value to pass the test.

prob_calculator.random.seed(95),
probability = prob_calculator.experiment() with 1000 experiments
probability is always 0.277. Test passes every time

prob_calculator.random.seed(96),
probability = prob_calculator.experiment() with 1000 experiments
probability is always 0.241. Test fails every time

If the test suite did not select a specific seed value for the pesudo-random number generator, then the test suite would not know what specific output to expect from your code. The seed value is required for the test suite to work, but your code should work for any seed value and provide different results for each seed.

Those two tests are invalid or at least they don’t present a meaningful test. A properly functioning program should have varied results from the specific calls in those two tests. Getting [‘blue’, ‘red’] from the draw function, for example, is a fluke that allows the test to pass

The tests are perfectly valid. A properly designed function should have varied results. A properly designed test should have a singe expected result.

The entire purpose behind pinning a seed value for a test suite is ensuring that the output of the function remains unchanged as the code being tested is changed. It is much harder to test pesudo-random results.

1 Like

How can the test be valid if a properly functioning program will fail by changing the seed value?

The test is

“If the seed value is [95] and I do [procedure foo] then the expected output is [output baz]”

If you change the preconditions of the test, then the test should fail.

1 Like

I don’t think I am doing a great job explaining.

There are three ideas going on here

  1. pseudo-random number generation
  2. running code in ‘production’
  3. running code in ‘testing’

Looking at each idea in turn:

  1. Python’s ‘random’ module does not actually create true random numbers. The module takes a seed value (usually system clock time) and passes it through a very complex formula to make a value between 0 and 1 that looks like a random number. Sometimes its helpful to pick the seed value so that we know exactly what number we will get out of the random module.

  2. When we run our code in ‘production’, we are trying to run an actual simulation. In this case, we want to seed the random module with the system time and observe pseudo-random results.

  3. When we run our code in ‘testing’, we are trying to verify that our code is functioning correctly. In this case we want to verify that a known input generates the correct output. When we set the seed to 95, we know the number that will be generated by the random module and therefore we know what the code should do.

1 Like

I see and understand your point. Something is indeed tested , just not the proper random output from hat. Your point that ‘your code should work for any seed value’ is untrue. A properly functioning program will fail both tests with seed(96) set.

Sure. We could make a test to verify pseudo-random behavior, but those sorts of tests are tricky and prone to false negatives. (probability states that they have to eventually be wrong)

What I mean by “your code should work for any seed value” is that your code should work in ‘production’ with any seed value but your code needs to pass the test suite with the specific seed value chosen.

You shouldn’t change the test suite at all.

1 Like

I am having the same issue with this project. I just posted on here as well. Did you find a solution?

Nevermind. I was able to fix the problem.

I just posted, had a similar issue, but I was failing because I was shuffling my contents to get a more random draw, I did not know about the seed.

I know this is an older thread, but I’m having the exact same issue trying to get past this unit test. Everything else in this project passes without issue, but trying to deal with the random seed is proving problematic, to put it diplomatically.

My current draw method – which works as the instructions state – shuffles the contents of the hat and then once randomized takes a slice of the first N balls where N is the number to be returned. A copy of that result is stored as the “answer”, and then the same first N items are deleted from the original contents using a slice.

    def draw(self, numOfBalls):
        if numOfBalls >= len(self.contents):
            answer = self.contents
            self.contents = []
        else:
            random.shuffle(self.contents)
            answer = self.contents[:numOfBalls]
            del self.contents[:numOfBalls]
        return answer

This method works fine in terms of fulfilling the actual specification of the problem, but it doesn’t ‘play nice’ with the random seed in the unit test.

I’ve edited your post for readability. When you enter a code block into a forum post, please precede it with a separate line of three backticks and follow it with a separate line of three backticks to make it easier to read.

You can also use the “preformatted text” tool in the editor (</>) to add backticks around text.

See this post to find the backtick on your keyboard.
Note: Backticks (`) are not single quotes (’).

This post explains what is happening with the different ways of drawing balls randomly.

Thanks Jeremy. I did eventually solve the problem the ‘right’ way. But what was weird is that I was passing the test mentioned in that post you linked. The test I was failing with the shuffle method was whether or not the balls were coming out random. And empirically they were as they were different almost every time. That last unit test - the one you described in the linked post - basically ensures that the balls are coming out random, so if that test passes (which it eventuallywas for me nearly most of the time), that proves that the balls are coming out randomly regardless of the method used.

And for the record, I was having difficulty on that part as well early on and did the math by hand as a sanity check on my program results. That’s why I was so frustrated when I was still failing the ‘randomness’ test even after I was able to solve this problem successfully with the shuffle method. Anyway, the conditional probability works out to exactly 29 out of 110 attempts ≈ 26.36%, and that result can be obtained repeatedly with multiple different draw methods although only one of those draw methods will allow the class and experiment function to pass all the tests because of the random seed effects on the tests.

Maybe a hint should be included in the instructions to not use the random.shuffle() method in the draw function even though that will technically work to randomize the contents?