Techie Snapshot of 20Q
The 20Q program is made up of three integral elements: the game engine, the application of user input to the knowledgebase, and the CGI/HTTP interface. The program is implemented by a sizeable program written in C.
Fundamentally, the knowledgebase is a matrix of objects by questions. Every element in the matrix represents whether the answer is yes or no. It also represents the weighting of the answer to a question for an object; unknown answers have a weighting of zero.
Every time a player answers a question, the URL of the answer contains three pieces of information: the current state of their game, the current question index, and their answer. To generate the next page, the game determines which objects are currently probable by comparing the current responses to the knowledgebase. If an element in the knowledgebase is unknown, the algorithm ignores that response for that object. The program then determines the number of yes and no answers for each question for the probable objects; this may include several unknown answers. The program then chooses the question that has the best ratio of yes-to-no answers (a 50/50 split would be ideal). The actual weightings are also worked into this calculation.
Finally, the game compares the probability of the objects and the effectiveness of the question and decides whether to guess at an object or ask another question (this is aided by some prodding at the appropriate time). This information is used to generate a new page, with new links, to reflect the new state.
At the end of a game, when the program knows the object, the program makes a pass over the knowledgebase and updates the answers for that object. If the knowledgebase doesn't have an answer, the value is set. However, if the answer that the player gave matches the answer in the knowledgebase, the weighting is incremented. If, on the other hand, the answer contradicts, the weighting is halved.
That is the heart of the system. The only other things that need to be done are things like: if the program is stumped, it asks the user for the object they were thinking of and then checks to see if it already knows it. If the program was stumped, or had to ask a lot of questions, or guessed about too many objects, it will ask for a new question. Last, the program looks at objects that are similar to the target object and draws conclusions; the author calls this "spontaneous knowledge".
This description makes Twenty Questions sound more like a database manager, and to a certain extent, that is part of the system. But from another perspective, every question is input into a 2D neural network, the objects are the output side. By performing a large number of simple calculations, the game is tolerant of missing knowledge and contradictory input.