Am I right about the problem at least? Probability calculator

I’m rather tired and can’t quite figure out how to fix it yet, but am I right that the reason I am getting the wrong output for the probability calculator is that when I write:

if(all([j in contents for j in experiment_list])):

then the fact that certain strings appear twice in the lists means that it gets ticked off as true when it shouldn’t?

Link to code

I’m getting the output of 0.37 when it should be 0.272. The fact it’s that close is what makes me think the problem is with the duplicate strings. But if I cast the lists to sets, that won’t work because I need to test for duplicates as well.
I don’t want to be told the solution, but I’d really appreciate it if I knew I was on the right track. And then I can sleep on it!

If you suspect your test is the problem (it’s part of it), then log the positive tests and compare the expected and actual values with something like

            print("supposed positive draw")
            print(f"expected: {contents} actual: {experiment_list}")

which printed results like

supposed positive draw
expected: ['blue', 'blue', 'green'] actual: ['green', 'green', 'green', 'blue']
supposed positive draw
expected: ['blue', 'blue', 'green'] actual: ['green', 'green', 'green', 'green']
supposed positive draw
expected: ['blue', 'blue', 'green'] actual: ['blue', 'green', 'green', 'green']

which should all be failures. So, your test is not yet correct. I would start by expanding the list comprehension into a loop and then inserting print() statements to track the actual execution of the loop.
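As a rough sketch of that suggestion, the comprehension could be expanded like this. The function name `all_drawn_in_contents` is made up for illustration; `contents` and `experiment_list` follow the names used in the thread, and the sample values are taken from the logged output above.

```python
def all_drawn_in_contents(contents, experiment_list):
    """Loop form of all([j in contents for j in experiment_list]), with tracing."""
    for j in experiment_list:
        found = j in contents  # membership only: ignores how many times j is required
        print(f"{j!r} in {contents}? {found}")
        if not found:
            return False
    return True


contents = ['blue', 'blue', 'green']
experiment_list = ['green', 'green', 'green', 'blue']
result = all_drawn_in_contents(contents, experiment_list)
print("counted as positive:", result)
```

Every drawn ball is individually present in `contents`, so the check passes even though only one blue was drawn. A membership test never looks at counts.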

Hi - thanks for the help. I now have a loop that picks out the elements in common. It works correctly, according to print() statements. I compare that to the contents list with an equality operator after sorting the lists. That should be picking out only the times all the elements appear in experiment_list. I’m pretty darn sure it is, so the problem lies in my draw function. I altered it so that it resets the contents list to original contents every time it is called, so I’m not getting the skewed results of a reduced list, but that isn’t working either. My random statement seems OK to me - it pops a random list element n number of times and adds it to the list that is returned. That is correct. Completely stumped. There’s something obvious happening, and I have no idea what it is.

You’ll need to log your original experiment list that was drawn to see the problem as there are still trials like

good exp: ['green', 'green', 'green', 'blue'] contents: ['blue', 'blue', 'green']

which is not a good draw, as expected since you are overcounting right now. Logging just the modified exp list hides this detail. You probably need to consider some way of counting the items in each list and then deciding whether the draw meets or exceeds the requirements.
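One possible way to do that kind of counting, offered as a sketch and not as the required solution, is `collections.Counter`. The function name `meets_requirements` and its parameter names are hypothetical.

```python
from collections import Counter


def meets_requirements(required, drawn):
    """Return True if drawn has at least as many of each colour as required."""
    required_counts = Counter(required)
    drawn_counts = Counter(drawn)  # a missing colour counts as 0
    return all(drawn_counts[colour] >= needed
               for colour, needed in required_counts.items())


required = ['blue', 'blue', 'green']
bad_draw = ['green', 'green', 'green', 'blue']   # only one blue
good_draw = ['blue', 'green', 'red', 'blue']     # two blues, one green

print(meets_requirements(required, bad_draw))
print(meets_requirements(required, good_draw))
```

Keeping the check in its own function also makes it easy to feed in hand-picked draws and confirm the classification without running the whole experiment.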

There are easier ways to implement the copying functionality but I’m not convinced yours is incorrect.
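For instance, `copy.deepcopy` gives each trial its own independent object. The `Hat` class below is a minimal stand-in written for this example, not the code from the thread; the ball counts match the hat discussed here (3 blue, 2 red, 6 green).

```python
import copy


class Hat:
    """Hypothetical minimal stand-in for the hat class in the thread."""

    def __init__(self, contents):
        self.contents = list(contents)


hat = Hat(['blue'] * 3 + ['red'] * 2 + ['green'] * 6)

trial_hat = copy.deepcopy(hat)   # independent copy for one trial
trial_hat.contents.pop()         # drawing from the copy...

print(len(hat.contents), len(trial_hat.contents))  # ...leaves the original at 11
```

For a plain list of strings, `list(hat.contents)` would also be enough, since strings are immutable.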

Wait. The hat contains 3 blue, 2 red, 6 green at this point, so why is 3 green one blue a bad draw? If they are being taken by random (well, pseudorandom) in groups of 4, that is a possible draw, surely? The contents list is a bit irrelevant since that doesn’t change, they’re set by the test module.

I was using your variable names: contents should be the required balls, and exp or experiment_list was the draw. The code called the draw good, but the requirement was 2 blues and 1 green. The draw had 1 blue and 3 greens, so the greens matched but not the blues, and it should have been marked as a bad draw. exp was being processed from experiment_list and was obscuring the problem. The contents of the hat are irrelevant since they are sufficient to allow a successful draw to occur. Whether a draw is good or bad is determined by comparing the drawn balls to the required list of balls.

Regardless, an overcount means the code is including false positives from somewhere and as it stands it still looks like the code that is determining good and bad draws is the problem. I would break that part out as a separate function so that you can more easily debug it in isolation.

No, actually. exp is not the same as the draw, which is experiment_list. exp only has the items in experiment_list that match the items in contents. So, if contents is [‘blue’, ‘blue’, ‘green’], exp can only contain up to 2 blues and up to 1 green and nothing else. Did you try running the code and printing the contents of exp? I don’t think you’ve read it properly.

However, I agree - there are false positives coming from somewhere. Just stumped as to from where. I have broken down the code in multiple places and it works correctly in those micro steps. Something is going wrong when it is all put together.

No matter, I’ve found the problem. Thanks for helping talk me through it : )

EDIT: nope. I am still getting the wrong probability but by just 0.02, but I went through the output and manually checked and it’s correct. 2 blues and one green if they appear in the draw. Infuriating. I am tempted to ditch the lot and write from scratch in case I missed something.

That’s over 1000 trials, so that’s 20 trials different than expected. I checked the last version of your classifier and now it seems to work, so that means the draw() method is the problem now. The problematic code is here

        for i in range(0, balls_to_draw):
            # fails
            # ball_list.append(self.contents.pop(int(random.random()*len(self.contents))))
            # ball_list.append(self.contents.pop(math.floor(random.random()*len(self.contents))))
            # works
            # ball_list.append(self.contents.pop(random.randrange(len(self.contents))))
            ball_list.append(self.contents.pop(random.randint(0, len(self.contents) - 1)))

The failing methods are not guaranteed to generate uniformly distributed integers, which could skew the results slightly. The other two are actually equivalent (random.randint(a, b) is an alias for random.randrange(a, b + 1)), are nearly uniform, and were probably how the original probabilities were calculated. The documentation for the random module has more details.
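A small experiment can make the difference visible. The sketch below assumes the expected probability was produced with a fixed seed, which the thread does not confirm. The seed value `0`, the helper names, and the list of sizes are all arbitrary choices for illustration. On CPython, `randint(0, n - 1)` and `randrange(n)` consume the generator in the same way, so under the same seed they yield the same indices, while the scaled-float approach generally follows a different sequence.

```python
import random


def draw_indices(method, sizes, seed):
    """Reseed, then pick one index per size using the given method."""
    random.seed(seed)
    return [method(n) for n in sizes]


def via_randint(n):
    return random.randint(0, n - 1)


def via_randrange(n):
    return random.randrange(n)


def via_scaled_float(n):
    return int(random.random() * n)


# Shrinking sizes mimic popping 4 balls from an 11-ball hat, repeated 5 times.
sizes = [11, 10, 9, 8] * 5

a = draw_indices(via_randint, sizes, seed=0)
b = draw_indices(via_randrange, sizes, seed=0)
c = draw_indices(via_scaled_float, sizes, seed=0)

print("randint == randrange:", a == b)
print("randint == scaled float:", a == c)
```

If a test compares against a value computed from one particular seeded sequence, any method that produces different indices will give a slightly different estimate, even when each method is statistically reasonable on its own.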

I see. random.random() would work in different situations, but here, where I am drawing for a specific situation and from 4 int values, it isn’t working. I’m not sure about the difference between random.randint and random.random, as both are distributions, but I guess it’s because the result of random.random() needs to be converted to int.
Thanks for the explanation.
