now is the time for all good men...

The problem is number space. ...

I understand what you're saying. I've done a lot of work with pseudorandom generators.

I'm still confused about your comparing the seed length to four characters, though. Can you explain that?

"The " would consume the entire seed

That is what I am confused about.
 
If a seed of '1' would eventually produce "It was the best of times, it was the...", and '2' would eventually produce "It was a dark and stor...", and '3' would eventually produce "ALL'S ONE, OR, ONE OF THE FOUR PLAYS..."

(Not that anything more than a random snippet would be generated by any of the runtime random number generators.)

The seed length is merely a representation of the number space that a pseudo-random number generator will produce.

If the seed is 32 bits, there are only 4,294,967,296 possible number streams that can be generated by the random number generator, and most of these will repeat after a few million to a few tens of millions of iterations.
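
To make that concrete, here's a minimal sketch (Python, not any particular runtime library) of why the seed is the whole story: two generators seeded identically produce identical streams, so an n-bit seed can never select more than 2^n distinct streams.

    import random

    # Python's Mersenne Twister actually carries ~19937 bits of internal
    # state; the 32-bit seed here just stands in for the small-state
    # library generators under discussion.
    a = random.Random(12345)
    b = random.Random(12345)
    assert [a.randrange(256) for _ in range(1000)] == \
           [b.randrange(256) for _ in range(1000)]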

The likelihood that (given an exhaustive search) you would find even one paragraph of a book is mind-bogglingly tiny. However, you could perform a further exhaustive search against 'encoding methods' (as the second link explains) to attempt further 'discoveries'.

The problem is, to reproduce a page of text by this method, you would probably need to make a pseudo-random number generator with a very, very long 'BIGNUM' span, and it would need internal state about the size of a book to have any hope of producing a book. Of course, if you want to be sure to 'find' a book, a much more certain method would be to clear a book-sized piece of memory, treat it as a bignum, and increment from zero.
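
A sketch of that 'certain method', using plain Python big integers (the function name is mine):

    def all_books(n_bytes):
        # Counting an n-byte buffer up from zero enumerates every possible
        # n-byte 'book' exactly once -- certain, but utterly infeasible:
        # a 500 KB book means 256**512000 steps.
        for i in range(256 ** n_bytes):
            yield i.to_bytes(n_bytes, "big")

    # e.g. all_books(2) yields all 65,536 possible two-byte 'books' in order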

I suppose another model I could hold up to explain my 'seed' statement would be data compression.

Basically, from a naive perspective, we could assume that all you have to do to get a 1-bit-long file would be to keep compressing it and re-compressing it. Of course, we both know that can't work, because then there would only be two possible books. If we compressed down to 1 byte, there would only be 256 possible books.

In the case of 7-bit ASCII encoding, packing the data down to seven bits per character automatically saves 12.5% of the file's size. LZ and Huffman encoding make better strides. In the end, with a text file containing book text, you will typically end up with a file between 30% and 50% of the size of the original.

Eventually, something has to 'give' in the lossless encoding of your data. A certain amount is overhead to keep track of things in the file. At some point, the file becomes 'incompressible', because there's nothing more that you can take away from it and still have the original contents recoverable.
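
You can watch this happen with any stock compressor; a quick sketch (assuming some large text file 'book.txt' is on hand):

    import zlib

    data = open("book.txt", "rb").read()
    for i in range(5):
        out = zlib.compress(data, 9)   # maximum compression level
        print(f"pass {i}: {len(data)} -> {len(out)} bytes")
        data = out
    # The first pass shrinks the text substantially; later passes make the
    # now-'incompressible' output slightly *larger*, from header overhead.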

Of course, you can still keep compressing the data with other algorithms, but then the list of algorithms and parameters applied that were necessary to compress the data (and would be necessary to decompress the data) would begin to creep up to some portion of the size of the original data.

So, from the perspective of diminishing returns in data compression, your 'randomly derived' book compression could only produce 2 to the 'n' possible books, per the size of the seed.


Sorry this took the long way around, but I often remember the effect, and not the cause, and it takes a while to work my way back around to it from "go".
 
The seed length is merely a representation of the number space that a pseudo-random number generator will produce.

Yeah, I said that:

What's the seed size got to do with it? -- besides determining how many characters you can generate before you've cycled - which is nowhere near three.

I'm just hung up on this that you said:

"The " would consume the entire seed, unless you wanted to break down text to five bits per characters.
 
So, thinking of the seed as compressed data representing the book doesn't help?

I'm at a loss for the moment.
 
I just don't see how it's relevant. I mean, you clearly agree that you can get a string longer than four characters by this method. How are the - I assume by your example - four eight-bit character representations you can fit into a 32 bit seed relevant in any sense? Or six five-bit?
 
So, now let's state that what we're really attempting to do is expand data with the randomatic decompression algorithm.

It is assumed that to compress "BOOK", randomatic did some sort of exhaustive search to find a random number key that, when applied through a pseudo-random number generator, reproduces "BOOK" somewhere in its stream.

Is merely four bytes going to be enough to specify what is needed to extract a "BOOK"?

Well, first off, we probably need to know how many iterations of the random number generator are needed to find "BOOK" within its seeded output.

That offset will probably itself take on the order of 32 bits to tell us at what point in the random output stream "BOOK" will appear, assuming a 32 bit seed was enough to eventually produce "BOOK".

So far, we're storing 64 bits to store "BOOK". That's twice as much space as "BOOK" took to store all by itself.
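
For the curious, a brute-force sketch of the hypothetical 'randomatic' compressor (Python's generator standing in, names mine, uppercase letters only to keep the search feasible):

    import random
    import string

    TARGET = "BOOK"
    STREAM_LEN = 2_000_000   # letters examined per seed

    def randomatic_compress(target):
        # Search (seed, offset) pairs until the target appears in the
        # seeded stream of uppercase letters.
        for seed in range(1_000):
            rng = random.Random(seed)
            stream = "".join(rng.choice(string.ascii_uppercase)
                             for _ in range(STREAM_LEN))
            offset = stream.find(target)
            if offset >= 0:
                return seed, offset   # two 32-bit numbers to 'store' 32 bits

    print(randomatic_compress(TARGET))   # a 4-letter hit turns up quickly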

So, let's look at "now is the time for all good men". That's fairly concise, but quintillions upon quintillions of times less likely to appear in a stream than "BOOK".

Ignoring indexing, how much data would it take to guarantee that this string will be generated by a general-purpose random number generator? I guarantee that it will require more than four bytes. It will probably require an amount of data roughly equivalent to the most tightly packed representation of the data you could hope to produce: a number space of log2(27^32), about 152 bits, to represent 32 characters drawn from a-z plus space, per the original sample. You will either consume those bits in your 'customizations' and specifications to the pseudo-random number generator, or you will consume them in the seed.
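
The arithmetic, for the record:

    import math

    # 32 characters drawn from a 27-symbol alphabet (a-z plus space):
    print(32 * math.log2(27))   # ~152.2 bits, about 19 bytes of irreducible
                                # choice -- far more than a 32-bit seed holds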

The data won't pack down past a certain point if you want to guarantee that a particular pattern (i.e. the original text) will appear.

If you want to further increase the odds of hitting a phrase, you can compress the number space much further by creating a standard word lookup dictionary. Assign 16 bits of data for glyphs, words, punctuation, etc.

Now words of a fairly large dictionary of terms can be representable in 16 bits or less. Assuming all the words of your phrase are found in the dictionary, then you will substantially increase the odds of getting longer runs because all the 'bogus' misspellings will be removed from your output.
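
A sketch of that dictionary encoding (the tiny dictionary here is hypothetical; a real one could hold 65,536 entries):

    import struct

    dictionary = ["now", "is", "the", "time", "for", "all", "good", "men"]
    index = {word: i for i, word in enumerate(dictionary)}

    words = "now is the time for all good men".split()
    encoded = struct.pack(f">{len(words)}H", *(index[w] for w in words))
    print(len(encoded))   # 16 bytes for 8 words, and a random stream of
                          # indices can only decode to correctly spelled words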

Further savings might be made by reducing the phrase to an alternate language based on multiple dictionaries per sentence element, with rigorously defined grammatical rules, such that meaningless phrases can not possibly be generated, just as meaningless words could not be when we specified a dictionary.

Once you've gone this far, you might want to consider assigning a number space to whole phrases. After all, if you have a syntactical pattern like verb:noun:target:modifier, you could probably turn most sentences into 64 or 128 bit value(s). Now it's impossible for the random number generator to make any sentence that is not meaningful. Of course, this system could not reproduce something like "now is the time for all good men", because that's a sentence fragment.
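
As a sketch, with all four slot dictionaries left hypothetical:

    # Pack a verb:noun:target:modifier pattern into one 64-bit value,
    # 16 bits per slot.
    def pack_sentence(verb: int, noun: int, target: int, modifier: int) -> int:
        return (verb << 48) | (noun << 32) | (target << 16) | modifier

    # Every 64-bit value now decodes to *some* grammatical sentence, so a
    # random generator over this space cannot emit a meaningless one.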

Of course, rather than a polynomial generator, some form of fractal representation of the end data could be better suited. The problem then becomes how to encode a seed and set of function specifications/parameters to reproduce the value "now is the time for all good men". This technique is probably more suitable for approximate, lossy results, such as storing images or audio samples, though. After all, you don't need the original, every bit of noise intact, if an approximation will be good enough. For text, however, you generally want an exact representation of the original.

Is this more helpful, or more confusing, yet?
 
It's totally irrelevant, Dave. You've changed the example to something where you are assigning a value to the concept "is this a good way to compress data?"

Your answer being, I gather, "no." Which is all well and good, but it's totally irrelevant to the monkeys-at-typewriters simulation. And leaves out the storage of longer strings... like the 15 character strings this project has already churned out. If you could store that in 64 bits, you have to agree it's a huge savings.

So far, we're storing 64 bits to store "BOOK". That's twice as much space as "BOOK" took to store all by itself.

Yes, but that's nothing like the illustration we're working with here. Here we are studying a stream of randomly generated characters to see if "BOOK" turns up. We're not guaranteeing it will. That would defeat the purpose of running the simulation.

Well, first off, we probably need to know how many iterations of the random number generator are needed to find "BOOK" within its seeded output.

Or, indeed, if it appears at all. THIS is what we're doing here, and you've not discussed that at all.

You said if we have a 32 bit seed, it could only be enough to represent "BOOK". You're wrong. It represents an index into a table of random sequences, and "BOOK" or even a *much* longer string *could* appear in that sequence.

That represents only the first three or four characters as ASCII code. "The " would consume the entire seed, unless you wanted to break down text to five bits per character. Then you might get "THE PL".

It's that statement. It's true, you can represent four eight-bit characters or six five bit in that seed. But you don't. The seed isn't the answer, it's the question.

Your hypothetical random-number-compression-routine is irrelevant to the topic, but let me point out a couple of things while we're here. You are saying we search a set of random strings for our selected target string anywhere in their length, and store the seed and offset to the selected data.

If we can do that, why can't we just store a seed that will point directly to our search string, appropriately terminated? If we did that, we wouldn't need any storage aside from the seed itself to represent any sequence our number generator could output.

The *worst* you would have to store is all the state for your random number generator at the point it began your sequence (assuming the "get next number in sequence" is a one-way function and decoding it to a new seed is impossible).

Now, do you feel I misunderstand anything?
 
AK-Dave said:
I had always heard it as "If you have enough monkeys banging randomly on typewriters, they will eventually type the works of William Shakespeare."
Fortunately, in this age of the internet and programmers with too much time on their hands, there is a way to find out:
The Monkey Shakespeare Simulator
Ah, that's right, I think the version I heard was Hamlet. And was it monkeys or chimps?

Chimps + Hamlet is way funnier than Monkeys + All good men!
 
Re: Re: now is the time for all good men...

Iacchus said:
Are you speaking of the whole arbitrariness of existence here?
Not intentionally, but I suppose I could claim so in retrospect. ;) It was just one of those days where it was hard to muster enthusiasm for the normally joyous task of toiling for greenbacks.
 
evildave said:
You still would:

The monkeys would be prone to break the keyboards, eat the keys. Someone would have to feed and clean up after them and make sure they're producing, the keyboards are all still plugged in and working, replace dead monkeys, etc.

Of course, most of the time, the output would be a bit like "ddddddddddddddfdddddeddedddedddgvuief bndddddddeddddsdsdsdffdddddd", as the monkey keeps pressing the same key over and over, missing occasionally in between key smashes.

A bit like the output from fundies, except the monkey doesn't claim it's 'ultimate truth', which puts them a notch up in my esteem.

Was it Letterman who said "I don't know what you would get if you put a million monkeys in a room with a million typewriters, but I DO know that the smell would be horrible"?
 
scribble, all he was doing was comparing sizes, not making a mathematical connection. I agree it's a bit misleading, but Dave's underlying logic is fine.

Look at it again by reducing the problem: a 1-bit seed. That means you only have 2 random sequences available to you. Let's assume each sequence repeats after 2^32 outputs, giving 16 billion 8-bit characters (four per 32-bit output).

But since you only have 2 seeds, that means you can only generate 2 different books. However, the number of possible books of 16 billion characters is something like 60^16 billion (all caps + all lower case + punctuation).

2 vs 60^16,000,000,000.

You ain't likely to find a readable book in a solution space of 2.

Likewise, a seed of 32 bits will give you 4 billion different random sequences, but that works out to

4 billion vs 60^16,000,000,000.

It's still really just about 0 chance of finding a readable book.
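
The logarithms make the mismatch vivid (a quick sketch):

    import math

    print(math.log10(2 ** 32))               # ~9.6: the seed count has 10 digits
    print(16_000_000_000 * math.log10(60))   # ~2.8e10: the book count has
                                             # about 28 billion digits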
 
I always liked the Redneck version..



Given an infinite number of Rednecks, in an infinite number of pick-up trucks,
armed with shotguns, shooting at an infinite number of road signs..

They will eventually duplicate the works of Shakespeare in Braille..
 
And my simple point has been that it's not bloody likely for a fast, simple runtime-library polynomial generator to produce much text (or particular text) from a seed.

32 bits of internal state is not enough.

It's not an index, per se. It's simply a starting point that could yield the string. Possibly there is more than one seed value that will yield a given string at different points, but much more likely there is no seed at all for a simple random number generator that will generate the output you're looking for. There is not enough 'noise' potential in the 32-bit random number generator.

You need better random numbers than the computer simulation is generating, and if it's going to be a deterministic pseudo-random generator, that means bigger seeds and more internal state, and slower random numbers. Keep in mind that it's not truly random at all to start with.

The compression example is entirely relevant, in that it nicely covers the reason why you're not likely to ever get a whole book from a small seed; otherwise there would be only 2^seedsize possible books. Any saved internal state of the pseudo-random number generator will be on a par with the compressed size of the book's data (at best).

The fact that the square root of 2 and pi are non-terminating and non-repeating in no way implies they will *ever* produce what you are looking for. There are lots of ways to not repeat and not terminate. 1.01001000100001... is a non-terminating, non-repeating value. Would that sequence produce a particular book? Of course not.
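
That counterexample is easy to make precise; a sketch of it as a digit generator:

    def digits():
        # 1.01001000100001...: a one, then an ever-growing run of zeros.
        # Non-terminating and non-repeating, yet it provably never contains
        # the substring "11", let alone a book.
        yield 1
        run = 1
        while True:
            for _ in range(run):
                yield 0
            yield 1
            run += 1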

Perhaps the square root of (or polynomials generated from) some larger prime(s)? Possibly. There's probably a prime number with about 10,000 places that might produce your book if you took a particular root of it. There is enough information in THAT seed to visit a different enough variety of numbers to possibly produce a particular book.

A 32, 64, 96, 128 or even 1024 bit random number generator? Not likely. But at 96 on upwards you get a better likelihood for your "now is the time for all good men" snippet to appear.

Now then, the 'Monkey Shakespeare Simulator' is most likely a poor example of 'science', in that it is not (being written in JavaScript) turning out anything like the statistics it is reporting. For starters, is it supposedly scanning the text of several of Shakespeare's works with each random set of numbers it generates? Why is it that it tends to always generate lines like "Leonato. I lea" or "GLOUCESTER. N", always at the starts of lines? You'd think there would be a lot of nice, long matches like "nce of the " from the middles of lines. Watch those numbers. Even scanning only the starts of lines, you would have me believe the number of tries grows two orders of magnitude, from 10^20 to 10^22, in a few minutes? In a Java applet?

No, the monkey simulator is something a lot more like Progress Quest. Believing that the monkey sim is doing what it is reporting to you is on the order of believing that Progress Quest has a 3D MMORPG with thousands of players hacking away at each other and at monsters behind the window.
 
Well, the first post was clearly written to discredit evolution. But that's not how evolution works. Evolution has a sort of blueprint, written by life and death; that's why we walk on two legs instead of four. Germ A mutates into Germ A', Germ A'', A''', A'''', and so forth, you get the picture, right? All these germs die except germ A''', which is now a little better than its parent. Let's call germ A''' germ B, or simply B. B mutates into B', B'', B''', B'''', etc. Only B'''''' survives. Let's call him, or her, C. C mutates to C', well, you get the picture!

No one, except fundies, thinks that there was ONE cycle where germ A went to B to C and eventually to humans in one single go! There were many trials along the way, where many a germ went down in the battle called evolution.

So, as a simulation of evolution, the first post gets it wrong. The correct time would be about 15 minutes (one keystroke per second), if we want to use the same laws as evolution uses.
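
That 15-minute figure resembles cumulative selection in the style of Dawkins's 'weasel' program; a minimal sketch (parameters and names are mine, and the fixed target is itself a simplification):

    import random
    import string

    TARGET = "now is the time for all good men"
    ALPHABET = string.ascii_lowercase + " "

    def mutate(parent, rate=0.05):
        return "".join(random.choice(ALPHABET) if random.random() < rate else c
                       for c in parent)

    parent = "".join(random.choice(ALPHABET) for _ in TARGET)
    generations = 0
    while parent != TARGET:
        generations += 1
        offspring = [mutate(parent) for _ in range(100)]
        # selection: only the closest match 'survives' to reproduce
        parent = max(offspring, key=lambda s: sum(a == b
                                                  for a, b in zip(s, TARGET)))
    print(generations)   # typically a few hundred generations, not 10^40 tries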

But very interesting posts on the very complicated issue of true random numbers, though.
 
If we're going to get technical, germ A''' is not 'better' than germ A. It may only be luckier or slightly more efficient at reproduction. The various germs do die, but generally while most of the other variations still exist. It may take ages for A to die out, unless some specific condition occurs that A cannot survive while the others can.

Life doesn't do a few serial experiments that die out leaving a 'winner'. It keeps millions of similar experiments going indefinitely.

'Better' is a subjective thing. If A''' went on to overrun the whole habitat, consume all of its food and go extinct, while an 'inferior' A'' had a few stragglers in a pocket somewhere, A'' 'wins'.

It might be a little easier to model with a Texas Hold'Em game. The community cards represent the conditions in the world that all the things share, and the hole cards are the creatures. If you have a Qh3d hole, and the community cards contain AhKhJh10h5s, you clearly have the 'best' hand possible. Would Qh3d be such a great hole if the community cards weren't conducive to a royal flush? It's 'perfect' for a given situation, but any competent human player would judge it a relatively weak hand with poor chances of winning, and they'd probably fold it before the flop.
 
Diogenes -

Given an infinite number of Rednecks, in an infinite number of pick-up trucks,
armed with shotguns, shooting at an infinite number of road signs..

They will eventually duplicate the works of Shakespeare in Braille.

This is hilarious! Thanks!
 
evildave said:
If we're going to get technical, germ A''' is not 'better' than germ A. It may only be luckier or slightly more efficient at reproduction. The various germs do die, but generally while most of the other variations still exist. It may take ages for A to die out, unless some specific condition occurs that A cannot survive while the others can.

Life doesn't do a few serial experiments that die out leaving a 'winner'. It keeps millions of similar experiments going indefinitely.

'Better' is a subjective thing. If A''' went on to overrun the whole habitat, consume all of its food and go extinct, while an 'inferior' A'' had a few stragglers in a pocket somewhere, A'' 'wins'.

It might be a little easier to model with a Texas Hold'Em game. The community cards represent the conditions in the world that all the things share, and the hole cards are the creatures. If you have a Qh3d hole, and the community cards contain AhKhJh10h5s, you clearly have the 'best' hand possible. Would Qh3d be such a great hole if the community cards weren't conducive to a royal flush? It's 'perfect' for a given situation, but any competent human player would judge it a relatively weak hand with poor chances of winning, and they'd probably fold it before the flop.
OK, the different germs A, A'', etc., maybe don't die, but they take no more part in that particular germ's species, and it's still natural selection.

The first post is maybe not criticism of the theory of evolution, but it's very similar to the strange and stupid analogy that fundies use: that evolution is as likely as a strong wind blowing together a 747 in a hangar if only the pieces were there. But that analogy and the monkey theory do not take into account natural selection, which my modification of the monkey theory indeed does.
 
I always liked the Barenaked Ladies song that in part goes...

"If a hundred monkeys each could get their own show, perhaps one day a chimp would say 'we all have faith, you just have to use it, saith the Lord'".

Reminds me of some evangelical preachers I've heard! :D
 
