Assuming a magic number generator like in Robin's example or my example, and assuming you press the button and receive a value of 123, what is the unconditional probability that you would receive a 123? What value would you estimate it as, and why?
(rocketdodger actually said "1", not "123". I changed it to a number that can't be a probability, to make the discussion below perhaps clearer. Nothing essential depends on which number is used.)
There are two 'levels' of probability here, which should be distinguished.
One is what you're calling "the unconditional probability that you would receive a 123". This is supposed to be a real property of the machine, with a single definite value between 0 and 1, which we happen not to know. Let's call this number "X". (The value of X determines, for example, how often, over the long term, the machine will display 123 rather than any other number if we repeatedly press its button.)
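To make that long-run-frequency reading of X concrete, here's a minimal sketch in Python. It assumes the machine displays 123 with some fixed probability X on each press; the value 0.37 and the fallback number 456 are arbitrary stand-ins, chosen only so the simulation can run:

```python
import random

# Arbitrary stand-in for the machine's fixed, unknown property X.
# In reality we don't know this number; 0.37 is chosen only so the
# simulation can run.
X = 0.37

def press_button():
    """Simulate one button press: display 123 with probability X."""
    return 123 if random.random() < X else 456  # 456: any other number

presses = 100_000
hits = sum(press_button() == 123 for _ in range(presses))
print(hits / presses)  # long-run frequency of 123; approaches X as presses grows
```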
Now, according to the Bayesian approach to probability, whenever we are uncertain about something, we can use probability to describe the nature of our uncertainty. Let's forget for the moment that X is a (different kind of) probability, and just treat it as a number about which we are uncertain. Then we can say things like, for example, "P(0.3 < X < 0.4) = 0.2" (translation: "there's a 20% probability that X is between 0.3 and 0.4"), where this probability is not an objective property of the machine, but only a way of characterizing our limited knowledge of the machine.

X doesn't 'really' have a 20% probability of being between 0.3 and 0.4, or any other probability either. It is a single number, which either is in that range or not. We just don't know which. But we might have some ideas about which, and we quantify these ideas by saying that it has a 20% probability of being there.
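As a hedged illustration of such a statement: suppose our uncertainty about X happened to be described by a Beta(2, 5) distribution (an arbitrary choice on my part, not anything implied by the machine itself). Then the interval probability is just a difference of CDF values:

```python
from scipy.stats import beta

# Hypothetical description of our uncertainty about X; Beta(2, 5) is
# an arbitrary example, not a property of the machine.
our_uncertainty = beta(2, 5)

# "P(0.3 < X < 0.4)" under this description of our knowledge:
p = our_uncertainty.cdf(0.4) - our_uncertainty.cdf(0.3)
print(p)  # "there's a probability p that X is between 0.3 and 0.4"
```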
Of course, 0.3-to-0.4 is just an example, and we can talk about the various probabilities that X has of being in various other ranges as well. All these probabilities together constitute a 'probability distribution for X'. Again, a probability distribution for X is not an objective property of the machine, but only a way for us to express, as precisely as we can, our more or less vague ideas about the machine. ("We don't know exactly which number X is, but here's a summary of what we do know about X.")
A probability distribution for X contains a lot more data than a single 'estimate' of X (to quote rocketdodger's question). From a distribution, we can, if we wish, derive various single estimates---for example, the mean (or the median, etc.). But we can't forget the entire distribution and just remember the mean if we want to be able to change our ideas about X appropriately when we get new information relevant to X---for example, when we press the machine's button and see what number it displays.
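For instance, still under the arbitrary Beta(2, 5) example, those single estimates are one-line derivations from the full distribution:

```python
from scipy.stats import beta

our_uncertainty = beta(2, 5)  # the same arbitrary example

# Single-number summaries derived from the full distribution:
print(our_uncertainty.mean())    # 2 / (2 + 5), about 0.286
print(our_uncertainty.median())  # about 0.26
```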
The appropriate way to change our ideas about X is to use Bayes's theorem to produce a new distribution for X from the old distribution. The old mean alone is not enough to enable us to produce even a new mean, let alone an entire new distribution.