Playing With Perl 5.10

Friday, February 8, 2008 by Mistlee

Playing With Perl 5.10

By A.P. Lawrence

There is a simple mistake I make frequently with random numbers. I'll use Perl to illustrate, but trust me: I can screw this up in just about any language. And I have...

To make it both easy and obvious, let's say we want to randomly choose one of three things. Again to make it easy, we'll say that the integers 0, 1 and 2 will work for whatever it is we have in mind. So here's some code:



The "x" method is strongly biased toward the value of "1". That's because rand() returns values between 0 and 2 (but never 2), and int() always "rounds down" - it returns "1" for any number greater than 1 and less than 2. That biases the output toward 1. If that doesn't make sense to you, pretend for a moment that rand() could only generate integers or the .5 value between two integers - in other words, it could produce 0, 0.5, 1.0, and 1.5 only for rand(2) and 0, 0.5, 1.0, 1.5, 2.0 and 2.5 for rand(3). Here's what happens:


Do you see it now? Because int() doesn't round up as many of us seem to think it might, the x value clusters around "1"

I've carelessly and unwittingly made that mistake many times. I know better, but I find myself doing this quite often. No doubt it comes from a desire to "round up", even though that's completely unnecessary for the task at hand. The most likely explanation is that seeing "int()" triggers memories of rounding and overrides my sense of what I'm actually trying to accomplish. I'm a bit on "automatic pilot" there, letting my subconscious fill in the details, and it wants to add .5 when it sees "int()".

When you are trying to choose three or more items, this mistake does damage, but limits it to the lower and upper elements. For example, the output when we use :



Only 0 and 5 get shorted. Not great, but at least we do get some "randomness" (biased as it is). However, when limited to just two elements, we may not even notice the mistake at all because it's unbiased. To see that, this time pretend that rand() can only pick in increments of .25:


So if just picking betweeen two choices, the mistake is harmless.. but because it is harmful for larger sets, you really shouldn't get into the habit.

Comments

*Originally published at APLawrence.com




News Archives About Us Feedback PerlProNews.com About Article Archive News Downloads WebProWorld Forums iEntry Advertise Contact Jayde

0 comments: