I think the best way to start would be with an XKCD comic strip on the same topic.
Computers were designed to be deterministic in nature. If I ask what 1 + 1 is, a computer always returns 2 and nothing else. This is an important and non-negotiable prerequisite for us to rely on computers. This sheer predictability of their behavior makes computers such a powerful tool that today's generation can't even imagine a world without them.
On the other hand, random sequences are a starting point for a lot of statistical analyses. Even for developing machine learning models, we generally start with random numbers and iteratively build from there. Personally, I have been asking computers to generate random numbers for quite some time now. And as far as my tiny brain can comprehend, they appear to be doing a decent job of it, because every time I ask for a random number... well, I get a different number, and it appears random to me.
The words deterministic and random are pure antonyms of each other, yet somehow deterministic computers are able to generate random numbers. This weird combination made me curious about the inner workings of this seemingly banal but widespread problem of today's computing world. With this newfound curiosity, I started researching the problem a bit more, and it turns out that a lot of people have worked very hard over the years so that we can make computers churn out random numbers (or seemingly random numbers... I'll delve into the details shortly).
Let's start by understanding the concept of randomness. Randomness loosely means unpredictability. The more certain we are about something, the less random it appears to be. As an example, let's say someone I don't know asks me to guess their name. I can take a guess, but I can never be certain about the answer. I can start with a list of the most common names in my head, but I can never be sure unless someone or something tells me the answer. It's a situation with a large degree of randomness and almost zero certainty: anything can be the correct answer and I have absolutely no idea which one it is. On the other hand, if I am in a park and have a sudden urge to find which direction north is, then I'd be much more comfortable. Based on the approximate time (whether it's morning or evening) and the knowledge that the sun rises in the east and sets in the west, I can point to a fixed direction and say that's north. There would still be some randomness in it, because north may lie a bit to the left (or to the right), but in this situation there is a lot more certainty and significantly less randomness.
Coming to the case of numbers, a number is truly random only when all numbers have an equal probability of turning up (i.e. we have no idea what exactly the next number will be). Also, the concept of random numbers is meaningful only when we have to generate numbers repeatedly. For example, if we need to generate only one number, then 4 (or any other number) is a perfectly good answer. It is only when we have to repeat the same task again and again that the concept of randomness comes into the picture. If our random number generator is programmed to always give 4 as the answer, then it's not random anymore: I can always predict with fair certainty what the next number will be. Any experiment that uses this random number generator would end up biased, because the results would say nothing about what happens when 6 (or any other number) comes up.
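To make the point concrete, here's a tiny Python illustration (my own example, not any particular library's generator): tally the outputs of a generator hard-coded to always return 4 against one that picks digits uniformly.

```python
import random
from collections import Counter

def always_four():
    """A 'generator' that is perfectly predictable: it always answers 4."""
    return 4

def uniform_digit():
    """Every digit from 0 to 9 has an equal chance of turning up."""
    return random.randint(0, 9)

# Tally 10,000 draws from each generator.
print(Counter(always_four() for _ in range(10_000)))    # all the mass sits on 4
print(Counter(uniform_digit() for _ in range(10_000)))  # roughly 1,000 of each digit
```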
For a random number generator to be perfect:
Like a lot of things, the Internet seems to be divided on the topic. This Reddit post suggests that humans don't do a very good job of generating random numbers. According to the post, if a human being is asked to generate a random number between 0 and 10, there is a very high probability that the answer will be 7 and a very low probability that it will be 0. So even though the numbers would be random, they wouldn't be uniformly distributed. An answer on Quora, however, suggests otherwise: the author explains that if we apply some post-processing on top of human-generated numbers, the results are fairly acceptable.
PS: I'll start my clock around the time of World War II, because that's when electronic computers were invented and we are primarily interested in how computers do the job.
While researching this article, I came across mention of a book from the 1950s called A Million Random Digits with 100,000 Normal Deviates, which has 400 pages filled with random numbers. If you are interested in flipping through it, a .pdf version is freely available on the publisher's website (linked from the name). That was a time, unlike today, when electronic computers were just being built and the internet hadn't been invented yet. Acquiring information took a considerable amount of effort. If someone had to run statistical experiments, this book was a lifesaver, because it took one thing off their list: they'd already have an acceptable list of random numbers. With availability of the numbers solved for, speed still remained a bottleneck. I can only imagine how slow those experiments must have been if someone had to manually copy huge lists of these random numbers before starting an experiment.
A team of scientists working on the hydrogen bomb in the late 1940s at Los Alamos National Laboratory came across an interesting problem. In scientific terms, it was related to neutron diffusion in fissionable materials. In simple terms, they were trying to estimate: once a neutron enters a fissionable material, what will the subsequent chain reaction look like? A self-sustaining chain reaction would mean an explosive release of energy in a short span of time, so understanding it was critical to the research. The calculation involved the following components:
Being a secret, this work required a code name. A colleague of von Neumann and Ulam, Nicholas Metropolis, suggested using the name Monte Carlo, which refers to the Monte Carlo Casino in Monaco where Ulam's uncle would borrow money from relatives to gamble.
OK, coming back to the topic at hand. Using precompiled lists of random numbers was extremely slow. A person would have to manually feed the list to the computer, and at that point in time people used punch cards to feed data, so feeding data was a slow and complicated process. It was then that von Neumann developed a way to calculate pseudorandom numbers (I'll talk about them in the next section; loosely speaking, a pseudorandom number is not random in reality but gives the appearance of randomness). He called the algorithm the middle-square method (again, described below). Though this method has been criticized as crude, von Neumann was aware of this: he justified it as being faster than any other method at his disposal, and also noted that when it went off the rails it did so in an obvious fashion, unlike methods that could be subtly incorrect. Later on, a paper was published describing the algorithm. In that paper, von Neumann wrote the famous words: "Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin."
The above story has been pieced together from the sources mentioned below. Some phrases have been copied over verbatim, while others come from my own understanding of the story.
I think by this time you will have realized that true random numbers cannot be generated by arithmetic algorithms: once we know the algorithm, we know exactly what the results will look like. Using tables of random numbers was effective but slow. Additionally, the available tables at that time were of limited size, becoming a bottleneck to scaling the experiments, so von Neumann did the next best thing. He created an algorithm that addressed two important points.
The algorithm used a simple mathematical operation to generate n-digit random numbers: square the current number, pad the square with leading zeros to 2n digits, and take the middle n digits as the next number in the sequence.
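Here's a minimal sketch of that step in Python (my own illustration, not von Neumann's original code), assuming 4-digit numbers by default:

```python
def middle_square(number, n_digits=4):
    """One middle-square step: square the number, pad the square with
    leading zeros to 2*n_digits, and return the middle n_digits."""
    square = str(number ** 2).zfill(2 * n_digits)   # e.g. 6239**2 -> "38925121"
    start = (len(square) - n_digits) // 2           # index where the middle slice begins
    return int(square[start:start + n_digits])      # e.g. middle 4 digits -> 9251

# Each output becomes the seed for the next step.
value = 6239
for _ in range(5):
    value = middle_square(value)
    print(value)
```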
I couldn't figure out what numbers von Neumann used as seeds in his experiments, but based on my experience, nowadays we generally use either a fixed number picked on a whim (for reproducibility) or something fluctuating (if reproducibility is not necessary). The most common fluctuating quantity is time (more specifically, the number of milliseconds elapsed since 1970).
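As a quick aside, this is what the two styles of seeding look like with Python's built-in random module (just an illustration of the seeding choices, not of the middle-square method):

```python
import random
import time

# Fixed seed picked on a whim: every run reproduces the same "random" sequence.
random.seed(42)
print([random.randint(0, 9) for _ in range(5)])

# Fluctuating seed: milliseconds elapsed since 1970, so every run differs.
random.seed(int(time.time() * 1000))
print([random.randint(0, 9) for _ in range(5)])
```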
As mentioned above, the algorithm was not too good. It got stuck in cycles too quickly, and more often than not the sequences decayed to 0. Below is a toy PRNG based on this algorithm for generating random numbers of 4 digits. You can move the slider around to adjust the seed, and you'll see how short the cycles are. For example, if you start with 3792 as the seed, you'll see the sequence repeating immediately, i.e. the chain has length 1 (just the seed as its only element), whereas if you select 6239 as your seed, the cycle only starts after 111 numbers.
Some statistics on how the chain lengths vary as the number of digits increases (a small script to reproduce such measurements follows the table):
Sl No. | # digits | Minimum Chain Length | Maximum Chain Length | Average Chain Length |
---|---|---|---|---|
1 | 2 | 1 (sample seeds: 10, 60) | 15 (sample seeds: 42, 69) | 6 |
2 | 4 | 1 (sample seeds: 2500, 3792) | 111 (sample seed: 6239) | 44 |
3 | 6 | 1 (sample seed: 495475) | 943 (sample seed: 229339) | 327 |
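If you'd like to reproduce numbers of this kind yourself, here's a rough sketch of how such chain lengths can be measured (my own script, so the exact counting convention may differ slightly from the figures above): keep applying the middle-square step until a value repeats, and count the distinct numbers seen along the way, seed included.

```python
def middle_square(number, n_digits):
    """One middle-square step: square, pad to 2*n_digits, keep the middle digits."""
    square = str(number ** 2).zfill(2 * n_digits)
    start = (len(square) - n_digits) // 2
    return int(square[start:start + n_digits])

def chain_length(seed, n_digits):
    """Count the distinct numbers (seed included) produced before a value repeats."""
    seen = {seed}
    value = seed
    while True:
        value = middle_square(value, n_digits)
        if value in seen:
            return len(seen)
        seen.add(value)

# Statistics over every possible 4-digit seed.
lengths = [chain_length(seed, 4) for seed in range(10 ** 4)]
print(min(lengths), max(lengths), sum(lengths) / len(lengths))
```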
The middle-square method is now a relic of the past. Many new algorithms have been developed since then; Wikipedia has a list with a short description of each. Interestingly, in 2017 a modification to the middle-square algorithm was suggested that rids it of its follies. Here's a link to the paper describing the modification.
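From what I understood, the core idea of the modification is to mix a Weyl sequence (a counter that grows by a large odd constant every step) into the squaring, so the state can neither decay to zero nor fall into short cycles. Below is a rough Python sketch of that idea; the constant and the exact bit manipulations follow my reading of the paper, so treat it as illustrative rather than a reference implementation.

```python
MASK64 = (1 << 64) - 1          # keep the state within 64 bits, as C's uint64_t would

def middle_square_weyl(s=0xB5AD4ECEDA1CE2A9):
    """Generator sketch of the middle-square Weyl-sequence idea."""
    x = w = 0
    while True:
        x = (x * x) & MASK64                    # classic middle-square: square the state
        w = (w + s) & MASK64                    # advance the Weyl sequence
        x = (x + w) & MASK64                    # mix it in so the state never gets stuck
        x = ((x >> 32) | (x << 32)) & MASK64    # rotate so the "middle" bits come to the bottom
        yield x & 0xFFFFFFFF                    # emit 32 pseudorandom bits

gen = middle_square_weyl()
print([next(gen) for _ in range(5)])
```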
Computers cannot generate true randomness; however, they have access to signals that appear random to us. Using these signals, a pseudorandom number generator can produce numbers that look random. Numerous algorithms have been developed over the years to perform this task. The first one, called the middle-square method, was developed while researching the hydrogen bomb. Coincidentally, that project also popularized Monte Carlo simulations (which, in general, rely on random number generators to function). The algorithm was not very good because it kept getting stuck in cycles, i.e. it kept generating the same sequence of numbers after a while, but it ultimately paved the way for research on computational PRNGs.