Here are my thoughts on these questions since Bill is absent.
1. For a mound-shaped distribution, what is a decent way to estimate the standard deviation from the range?
Assuming approximate normality, take the range/4 for small data sets, or range/5 or range/6 for larger datasets.
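The range rule is easy to check against the usual sample standard deviation. A minimal sketch; the function name `sd_from_range` and the sample values are made up for illustration.

```python
import statistics

def sd_from_range(data, divisor=4):
    """Rough SD estimate: range divided by 4 (small samples) up to 6 (large samples)."""
    return (max(data) - min(data)) / divisor

# Hypothetical small, roughly mound-shaped sample.
sample = [48, 50, 51, 52, 53, 53, 54, 55, 56, 58]

rough = sd_from_range(sample)       # (58 - 48) / 4
exact = statistics.stdev(sample)    # sample SD with the n - 1 divisor
print(rough, exact)
```

The two numbers will not agree exactly; the rule is only meant to get you into the right neighborhood.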
2. If you were to do a test to see if Passing and Failing an examination were independent of Before and After applying a drug, what is the name of a test you would use?
It is like a chi-square test for independence, but it is looking at Before and After, so it is actually a McNemar's test.
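For reference, McNemar's statistic only uses the subjects whose result changed between Before and After. A small sketch without the continuity correction; the counts 15 and 5 are hypothetical.

```python
def mcnemar_statistic(b, c):
    """McNemar chi-square statistic (1 df, no continuity correction).

    b = subjects who failed before and passed after,
    c = subjects who passed before and failed after.
    Subjects whose result did not change do not enter the statistic.
    """
    if b + c == 0:
        raise ValueError("no discordant pairs, statistic is undefined")
    return (b - c) ** 2 / (b + c)

stat = mcnemar_statistic(15, 5)   # hypothetical discordant counts
significant = stat > 3.841        # 5% critical value of chi-square with 1 df
print(stat, significant)
```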
3. If you wanted to see if a digital blood pressure cuff's results can be used in place of a traditional blood pressure cuff's results (i.e., are "the same"), what statistic/test would you use?
It is an error to use the correlation coefficient, because one cuff could always register 5 points higher than the other: there would be perfect correlation, but the cuffs wouldn't be reading the same at all. It is also an error to use a t-test by itself or in a regression. I did my oral exam on this topic. The statistic is called an Overall Concordance Correlation Coefficient, and it looks at the expected value of the squared difference between measurements. Lin (1989 in Biometrics, and later) did a lot of work on this subject.
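The "always 5 points higher" case can be checked numerically. The sketch below uses Lin's two-method concordance correlation coefficient with divisor-n moments, which is an assumption about which variant is meant; the readings are invented.

```python
import statistics

def _moments(x, y):
    """Means plus divisor-n variances and covariance of two paired samples."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return mx, my, sxx, syy, sxy

def pearson_r(x, y):
    _, _, sxx, syy, sxy = _moments(x, y)
    return sxy / (sxx * syy) ** 0.5

def concordance_cc(x, y):
    """Lin's CCC: penalizes both scatter and any shift between the two methods."""
    mx, my, sxx, syy, sxy = _moments(x, y)
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)

traditional = [110, 118, 125, 131, 140, 152]   # hypothetical readings
digital = [v + 5 for v in traditional]         # always 5 points higher

print(pearson_r(traditional, digital))       # perfect correlation
print(concordance_cc(traditional, digital))  # below 1 because of the shift
```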
4. Is it possible to have a two-sided alternative hypothesis for a test using the chi-square distribution?
Yes. When testing variances, for example.
5. Is it possible for degrees of freedom to be a non-integer?
Yes. The Satterthwaite approximation, for example.
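As an illustration, here is the two-sample form of the approximation (the one used in the unequal-variance t-test). The variances and sample sizes are hypothetical.

```python
def satterthwaite_df(var1, n1, var2, n2):
    """Approximate degrees of freedom for the difference of two sample means."""
    a = var1 / n1
    b = var2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# Hypothetical summary statistics: sample variances 4 and 9, sizes 10 and 15.
df = satterthwaite_df(4.0, 10, 9.0, 15)
print(df)   # not a whole number
```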
6. P(x) = e^(a+bx)/[1+e^(a+bx)] is a linear model, true or false.
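True, in the generalized linear model sense. The linear part refers to how the parameters (the a and b) enter in the model: the log-odds, log[P(x)/(1-P(x))] = a+bx, are linear in a and b.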
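One way to see where the linearity sits is to put P(x) on the log-odds scale, where it collapses to a+bx. A short numerical check; the values of a and b are arbitrary.

```python
import math

def p(x, a, b):
    """The model from the question: e^(a+bx) / [1 + e^(a+bx)]."""
    return math.exp(a + b * x) / (1 + math.exp(a + b * x))

def logit(prob):
    """Log-odds of a probability."""
    return math.log(prob / (1 - prob))

a, b = -1.5, 0.8   # arbitrary illustrative parameters
for x in (-2.0, 0.0, 3.5):
    print(x, logit(p(x, a, b)), a + b * x)   # last two columns agree
```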
7. If two individual 2x2 tables in a chi-square test for independence showed non-significance, but the combined 2x2 table (i.e., literally table 1 + table 2) showed significance, how would you interpret that?
You usually go with the individual tables, because the combined table ignores the effect of whatever distinguishes the two tables. For example, if the individual tables were for males and females and we looked at the combined 2x2 table, we'd be ignoring the gender effect.
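A made-up pair of tables shows how this can happen: within each table the rows have identical success rates, yet adding the tables manufactures an association. The counts are invented, and the statistic is the plain Pearson chi-square without continuity correction.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table [[a, b], [c, d]] (1 df)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical strata: rows are two groups, columns are success/failure.
table_1 = [[90, 10], [9, 1]]    # both rows 90% success
table_2 = [[1, 9], [10, 90]]    # both rows 10% success
combined = [[table_1[i][j] + table_2[i][j] for j in range(2)] for i in range(2)]

for name, t in (("table 1", table_1), ("table 2", table_2), ("combined", combined)):
    print(name, chi_square_2x2(t))   # compare with 3.841, the 5% critical value
```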
8. Why is testing if the population correlation is 0 equivalent to testing if the population slope coefficient is 0?
In simple linear regression, b = r*(Sy/Sx), so testing if b=0 is the same as testing if r=0.
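The identity b = r*(Sy/Sx) can be confirmed on any small data set. A sketch with invented numbers, computing everything from sums of squares so nothing depends on a particular regression routine.

```python
import math
import statistics

# Hypothetical paired data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

mx, my = statistics.fmean(x), statistics.fmean(y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

slope = sxy / sxx                 # least-squares slope b
r = sxy / math.sqrt(sxx * syy)    # Pearson correlation
sx, sy = statistics.stdev(x), statistics.stdev(y)

print(slope, r * sy / sx)         # the two agree up to rounding
```

Since Sy/Sx is positive whenever both variables vary, b is zero exactly when r is zero.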
9. If we only observe the outcomes from coin flipping (Heads = 1, Tails = 0): 1000100010101, what is the most sensible estimate of the probability of Heads? What general mathematical technique would you use here?
5/13 would be a sensible estimate. Call each head a success, with probability p, and each tail a failure, with probability 1-p. From our string of 1's and 0's, our likelihood function (multiplying them all together) is p^5*(1-p)^8. Differentiating this and setting it equal to 0 yields p^=5/13. (Of course, you still have to show it is a maximum.) The general technique is maximum likelihood estimation.
The question arose: 'What do we do if we get a string like 111111, or 0000? Does the above Maximum Likelihood estimation method break down?' There is something called a Wilson estimate that overcomes this. Instead of estimating p as X/n, where X is the number of successes, it estimates p as (X+2)/(n+4).
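Both estimates are simple to compute from the string of flips. A small sketch; the function names are my own, and the grid search is just a crude check that the likelihood peaks near X/n.

```python
from fractions import Fraction

def mle(flips):
    """Maximum likelihood estimate X/n from a string of 1's and 0's."""
    return Fraction(flips.count("1"), len(flips))

def plus_four(flips):
    """The (X+2)/(n+4) estimate: add two successes and two failures."""
    return Fraction(flips.count("1") + 2, len(flips) + 4)

def likelihood(p, flips):
    heads = flips.count("1")
    tails = len(flips) - heads
    return p ** heads * (1 - p) ** tails

observed = "1000100010101"
grid = [k / 1000 for k in range(1, 1000)]
best_on_grid = max(grid, key=lambda p: likelihood(p, observed))

print(mle(observed), plus_four(observed), best_on_grid)
print(mle("111111"), plus_four("111111"))   # X/n sits on the boundary, (X+2)/(n+4) does not
```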
10. If I am studying bugs in a statistics class, what statistics am I studying?
BUGS is a computer program for Bayesian inference Using Gibbs Sampling.
11. What was the name of the person who gave an example of the correlation coefficients being the same for multiple sets of data, but their plots looking completely different?
Anscombe. A very well-known example.
12. If X's are distributed normally, what do you do to them to get to a log-normal distribution?
e^X.
13. If X and Y are individually distributed as chi-squares, what do you do to them to get to a F distribution?
(X/v1) / (Y/v2), where X and Y are independent and v1 and v2 are their respective degrees of freedom.
14. If X and Y are individually distributed normally, what do you do to them to get to Cauchy?
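X/Y, provided X and Y are independent with mean 0 (for two standard normals, the ratio is a standard Cauchy).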
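A quick simulation makes the Cauchy claim plausible. The quartiles of a standard Cauchy are -1 and +1, so about half of the simulated ratios should fall inside [-1, 1]. This is only a sanity check under the assumption of independent standard normals, not a proof; the seed and sample size are arbitrary.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable

# Ratio of two independent standard normal draws, repeated many times.
ratios = [rng.gauss(0, 1) / rng.gauss(0, 1) for _ in range(100_000)]

median_abs = statistics.median(abs(r) for r in ratios)
share_inside = sum(-1 <= r <= 1 for r in ratios) / len(ratios)

print(median_abs)     # should be close to 1
print(share_inside)   # should be close to 0.5
```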
15. How do you transform between a similarity measure and a distance measure?
It is called the Standard Transformation. If you have objects A and B, with distance measure d(A,B) and similarity measure s(A,B), then:
d(A,B) = sqrt(s(A,A)-2s(A,B)+s(B,B)).
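If the similarity happens to be the ordinary dot product, this formula gives back Euclidean distance, which makes a handy check. A sketch; the choice of dot product as the similarity is an assumption for illustration only.

```python
import math

def dot(a, b):
    """Dot-product similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def distance_from_similarity(s, a, b):
    """d(A,B) = sqrt(s(A,A) - 2 s(A,B) + s(B,B)) for a similarity function s."""
    value = s(a, a) - 2 * s(a, b) + s(b, b)
    return math.sqrt(max(value, 0.0))   # guard against tiny negative rounding error

print(distance_from_similarity(dot, (0, 0), (3, 4)))   # the 3-4-5 triangle
print(distance_from_similarity(dot, (1, 2), (4, 6)))
```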
16. If X's are distributed as an Exponential, what do you do to them to get to a Double Exponential?
Take their difference, like X1-X2.
17. Are partial derivatives important in asymptotic normality? If so, how?
Yes. They are used in calculating the variance.
18. What are the conditions for a distribution belonging to the Regular Exponential Family?
It is from an exponential family if we can write its probability function as:
f(x;theta) = a(theta)*h(x)*exp{SUM b_j(theta)*R_j(x) (j=1 to k) }
It is regular if:
a) k=p (where p is the number of parameters)
b) theta contains a p-dimensional rectangle
c) b_j(theta) are differentiable
19. How do you determine if a statistic, T(x), is sufficient for estimating, say, the parameter, theta?
Take the ratio f_x(x,theta)/f_T(T(x),theta), and see if theta cancels out.
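For coin flips with T(x) = number of heads, the ratio can be evaluated numerically for several values of the parameter to see that it does not move. A sketch for the Bernoulli case only; it reuses the flip string from question 9, and the trial values of p are arbitrary.

```python
import math

def joint_pmf(xs, p):
    """Probability of the exact 0/1 sequence xs under independent Bernoulli(p) flips."""
    return math.prod(p if x == 1 else 1 - p for x in xs)

def pmf_of_sum(t, n, p):
    """Binomial probability that the number of heads T equals t."""
    return math.comb(n, t) * p ** t * (1 - p) ** (n - t)

xs = [int(ch) for ch in "1000100010101"]
t, n = sum(xs), len(xs)

ratios = [joint_pmf(xs, p) / pmf_of_sum(t, n, p) for p in (0.2, 0.5, 0.7)]
print(ratios)   # every entry equals 1 / C(13, 5), whatever p is
```

Because p drops out of the ratio, the number of heads is sufficient for p.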
20. Why is ancillarity important in theoretical statistics?
Basu's Theorem for one (determining independence knowing some other conditions)
21. Canberra and Bhattacharyya formulas are used for...?
Distance measures. There are many others too, like Euclidean, city block distance, and various types of squared distances.
22. What is a sensible plotting technique in a repeated measure analysis?
Profile plots.
23. Why is the F distribution called an F distribution?
Fisher.
24. Let's say forty subjects are randomly assigned to four treatment groups, ten to each group. Three responses are measured on each subject. What specific distribution would you use to draw inferences about the differences between means in the different treatment groups?
An F-distribution, with 3 and 36 degrees of freedom. This is from multivariate statistics, using Hotelling's T^2. In general, use F with q and n1+n2+n3+n4-q-1 degrees of freedom, where q is the number of responses, and the n's are the number of people in each group.
25. Who worked with the quincunx?
Galton.
26. Does this make sense: "The probability of the population mean being in the interval [26.4, 28.8]mg's is 95%"?
Yes, but only in a Bayesian interpretation.
27. What statistician had a ladder?
Tukey. (used in transformations)
28. What are the differences between a confidence interval, prediction interval, and a statistical tolerance interval?
Ugg, too long to answer this one completely.

But confidence intervals are used to say that we are so and so confident that the parameter is between so and so endpoints. A prediction interval is used to say that we are so and so confident that a future value of the response is between so and so endpoints. A tolerance interval is used to say that we are so and so confident that a stated proportion of the population falls between so and so endpoints. It is often used to see if a process is out of control or not: a machine producing weights that are outside so and so endpoints hints that the process needs to be examined.
29. Is a MVUE accurate, or precise?
Both. It is minimum variance and unbiased.
30. For what distributions is range/SD >= sqrt(2) ?
All of them. I found this in my notes from a theory class (without proof). I'm trying to prove it, but am having a hard time.
31. How do you interpret: "The Pearson correlation coefficient of age and gender is .96."
You interpret it as an error.

The Pearson correlation coefficient can only be done on two continuous variables. Gender is not a continuous variable.
32. What is the average of a dataset with n elements if you repeatedly take a 1st level Winsorized mean?
I think the maximum of the dataset.
33. Why do we worry about S (the sample standard deviation)? Why not just focus on S^2?
Because it gets us back to the original units of the data.
34. If X_i = K*Y_i+C, then what is the mean and the standard deviation of X?
("_i" is a subscript)
E[X_i] = E[K*Y_i+C] = K*E[Y_i]+C
V[X_i] = V[K*Y_i+C] = K^2*V[Y_i], so the standard deviation is sqrt(K^2*V[Y_i]) = |K|*SD[Y_i]
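A numerical check with a negative K shows that the shift C moves only the mean, and that the spread scales with the size of K rather than its sign. The data and constants are invented.

```python
import math
import statistics

y = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]   # hypothetical data
K, C = -3.0, 10.0                    # hypothetical constants, K deliberately negative
x = [K * v + C for v in y]

mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
sd_x, sd_y = statistics.pstdev(x), statistics.pstdev(y)

print(mean_x, K * mean_y + C)   # mean picks up both K and C
print(sd_x, abs(K) * sd_y)      # SD picks up only the size of K
```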
35. What is the general name for the mean of the means?
The grand mean.
36. Show mathematically that for a standard normal random variable Z, that the variance of Z is 1.
Properly, it is shown by an integration. But I'll do it just using a standardized variable:
Z = (x-mu)/s, so
V[Z] = V[(x-mu)/s] =
1/s^2*V[x-mu] =
1/s^2*V[x] = s^2/s^2 = 1.