Okay, several things, mostly off the top of my head.
1) Randomization does not control for many of the things I wrote about in my post. I reread it and realized I left much implied. Any successful experiment must demonstrate that Beth is not moving the flame by physical means. For example, breathing on the glass during the test changes the boundary conditions experienced by the air column inside the glass, and it is quite feasible that this could alter the flame's movement. Randomization will not correct for this, because the cue accompanies the real trials no matter how those trials are assigned. Please note that my assumption is that Beth is doing this unconsciously, but purposefully. In other words, one way or another she behaves differently during a real trial than during a control: she may lean closer, tap the tabletop because she is concentrating, or have unconsciously learned how to move the flame.
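To make the randomization point concrete, here is a minimal simulation sketch in Python. It assumes a hypothetical model in which there is no mental effect at all: each trial is randomly assigned to "real" or "control", the flame deflection is ordinary Gaussian flicker, and on real trials only a small unconscious physical cue (leaning in, tapping, breathing on the glass) adds a fixed `bias`. The function name, units, and parameter values are illustrative assumptions, not measurements.

```python
import random
import statistics


def simulate(n_trials=2000, bias=0.0, seed=1):
    """Return mean(real) - mean(control) deflection under a hypothetical model.

    Trial type is randomized, but `bias` is added only on real trials,
    standing in for a physical cue that accompanies Beth's effort.
    No mental effect is modeled anywhere.
    """
    rng = random.Random(seed)
    real, control = [], []
    for _ in range(n_trials):
        is_real = rng.random() < 0.5          # randomized assignment
        deflection = rng.gauss(0.0, 1.0)      # ordinary flicker (arbitrary units)
        if is_real:
            real.append(deflection + bias)    # cue rides along with the real trial
        else:
            control.append(deflection)
    return statistics.mean(real) - statistics.mean(control)


print(f"no physical cue:    {simulate(bias=0.0):+.3f}")
print(f"small physical cue: {simulate(bias=0.5):+.3f}")
```

With the same seed, the two runs differ by the injected bias (up to rounding): randomization balances everything that is independent of the trial type, but a cue triggered by the trial type shows up as an apparent effect however the trials are assigned.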
2) It's hard to opine about this setup without studying what it does. So last night at 12:30 am I set a candle in a short glass and measured the temperature with a very sensitive type J thermocouple. My setup was NOT the same as Beth's: I believe my glass was shorter, and I didn't have the wax ring. Nonetheless, I believe I gathered some suggestive data.
I believe what happens is that air is drawn in from underneath and beside the flame, gets heated by the candle, and rises above the flame. At the same time, the burning wick generates gases, which also rise. I established this in two ways. First, I took a lot of temperature measurements around the flame. I could place the probe almost in the flame on the side and get readings in the 140 to 155 range (all temperatures in °F). Lifting the probe slightly would quickly register higher values, up to 180 to 190, but no more. Second, the heat caused visual distortions in the surrounding air, so I was able to observe the flow to some extent by eye.
As I moved the probe above the flame, the temperatures increased rapidly. I backed off at 450F, as I didn't want to damage the probe; I got those readings about an inch above the flame.
About 3 inches above the top of the flame, the readings were very erratic. At one-second intervals, you might see: 148 173 179 199 223 257 221 199 178 170 158 ... (those numbers are made up for illustration, not actual data).
In other words, the swings were very significant. They far exceeded the movements of the flame, which was just dancing around a bit, as a candle is wont to do.
Moving the probe just a bit off center, only half an inch, would produce a temperature drop of 100 degrees or more. Basically, there was a very narrow tunnel of hot gases rising, surrounded by a boundary of much cooler air.
However, this tunnel was very susceptible to small perturbations in the environment. For example, if I very lightly tapped the counter, the temperature would vary wildly. I didn't measure the force of the tap, but we are talking about something halfway between setting your finger down without any force and the kind of tap you might make unconsciously. This tap was not strong enough to produce visually obvious movements in the flame; sometimes you'd think you saw a reaction, but sometimes not.
So, much as I suspected, Beth has constructed a device that vastly amplifies tiny movements in the flame. In a sense this is good, because amplified signals are easier to measure. The downside, however, is that the amplifier is in no way shielded: it reacts just as wildly to small variations in the physical environment. Furthermore, the response seems quite underdamped: a tiny impulse injected into the amplifier produces a prolonged, oscillating output. Finally, the spatial (X/Y) extent of the output is rather small. Even when the temperature swung wildly right over the flame, it hardly varied at all just an inch or so off to the side.
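The "underdamped amplifier" comparison can be made concrete with the textbook second-order system. The sketch below is a minimal Python illustration, assuming (hypothetically) that the plume behaves like x'' + 2*zeta*omega_n*x' + omega_n**2*x = omega_n**2*u(t), with u a unit impulse; the natural frequency `omega_n` and damping ratio `zeta` are placeholder values, not anything measured in Beth's glass or mine.

```python
import math


def impulse_response(t, omega_n=2 * math.pi, zeta=0.1):
    """Unit-impulse response of omega_n**2 / (s**2 + 2*zeta*omega_n*s + omega_n**2).

    Valid for the underdamped case 0 < zeta < 1. omega_n (rad/s) and zeta
    are hypothetical placeholders, not measured properties of the candle.
    """
    if not 0 < zeta < 1:
        raise ValueError("closed form below covers only 0 < zeta < 1")
    root = math.sqrt(1 - zeta ** 2)
    omega_d = omega_n * root                  # damped ringing frequency
    return (omega_n / root) * math.exp(-zeta * omega_n * t) * math.sin(omega_d * t)


def ring_time(omega_n=2 * math.pi, zeta=0.1, fraction=0.05):
    """Time for the oscillation envelope exp(-zeta*omega_n*t) to fall to `fraction`."""
    return math.log(1 / fraction) / (zeta * omega_n)


for zeta in (0.1, 0.7):
    print(f"zeta={zeta}: rings for about {ring_time(zeta=zeta):.2f} s")
```

The response swings negative after the first half-cycle, which is the "vacillating output" described above, and with these placeholder values the lightly damped case keeps ringing seven times longer than the well-damped one (the ratio is simply 0.7/0.1).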
As I see it, this has at least two negative consequences for Beth. First, if she is exhibiting a real but very small effect, it may well be swamped by the noise in the system. Second, it makes it much harder to prove that any results come from her mind, because, despite the randomization of trials, she will very likely have ways to affect the flame physically unless a much more expensive apparatus is used. In short, with the wide and rapid fluctuations I was seeing, I could never really tell whether what I was doing was affecting the flame or whether I was just seeing the normal variation caused by the flickering flame.
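The "swamped by noise" point can be put in numbers with the standard normal-approximation sample-size formula for comparing two group means: trials per condition of roughly 2 * ((z_(1-alpha/2) + z_(power)) * sigma / delta)**2. The Python sketch below assumes, purely for illustration, a trial-to-trial spread `sigma` and an effect size `delta` in degrees F; neither number comes from my measurements.

```python
import math
from statistics import NormalDist


def trials_per_condition(sigma, delta, alpha=0.05, power=0.8):
    """Approximate trials needed in each condition (real vs. control) to detect
    a mean shift `delta` against trial-to-trial noise `sigma`, using the
    two-sided, two-sample normal approximation. Only random noise is modeled;
    a systematic physical cue is NOT reduced by adding trials."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


# Hypothetical numbers: 30 F of noise, effects of 10 F and 5 F.
for delta in (10.0, 5.0):
    print(f"delta={delta} F -> about {trials_per_condition(30.0, delta)} trials per condition")
```

Halving the effect roughly quadruples the trials required, which is why a small real effect riding on 100-degree swings would need a very long series of runs, and why no number of runs fixes the second problem of unaccounted physical influences.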
3) Dave's thermocouple idea: I was thinking basically the same thing when I got out my thermocouple last night. If, for example, you had thermocouples in the North and South positions and instructed Beth to move the flame N, S, or not at all, you might be able to record, say, 10 seconds' worth of readings and average them. I would be impressed if, for example, you could never make the N sensor exceed 280F by thumping on the table, blowing at the flame, etc., but Beth repeatedly reached 300F or better using her mind. But if the data is essentially the same as in the control runs, and significant statistical analysis is required to extract the results, then I would fear, and assume, that small physical forces have not been accounted for.
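Here is a minimal Python sketch of the averaging-and-threshold test described above, using only the N sensor as in the example. The function names, the choice to set the ceiling from the best deliberate physical-disturbance run, and the 20-degree margin (mirroring the 280F vs. 300F example) are my assumptions; the readings in the usage lines are placeholders, not data.

```python
from statistics import mean


def window_mean(readings_f):
    """Average one trial's N-sensor readings (e.g. ~10 s sampled once per second)."""
    return mean(readings_f)


def beats_physical_ceiling(mind_trials, physical_trials, margin_f=20.0):
    """True only if EVERY mind trial's average exceeds the best average reached
    by deliberate physical disturbance (thumping, blowing, ...) by margin_f."""
    ceiling = max(window_mean(t) for t in physical_trials)
    return all(window_mean(t) >= ceiling + margin_f for t in mind_trials)


# Placeholder readings in F, NOT measurements.
physical = [[250.0] * 10, [265.0] * 10, [280.0] * 10]
mind = [[305.0] * 10, [300.0] * 10]
print(beats_physical_ceiling(mind, physical))
```

Requiring every mind trial to clear the physical ceiling is a deliberately blunt criterion: it matches the "impressed" standard in the text and needs no elaborate statistics to interpret.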
--------------
I'm musing about a different setup altogether. It just came to me, so even if it does work it'll need refinement. It occurs to me that burning is a typical way to measure the caloric content of an item. So I'm wondering whether it would be possible to, say, put two candles very close together in a chamber used for measuring caloric content (a calorimeter). Only one candle would be lit. Beth's task would be to move the lit candle's flame toward the unlit candle, so that the unlit one is partially vaporized, thus increasing the caloric energy produced.
Thinking about it more, I think it is still too sensitive to initial conditions: each candle will have a different caloric content within a certain range, so Beth would have to exceed that variation. I'm guessing her effect, if any, would be swamped by the normal variance between candles.
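The candle-to-candle problem amounts to a signal-to-noise comparison: a trial's measured heat has to stand well clear of the spread among ordinary candle pairs. A minimal Python sketch, assuming hypothetical heat values in arbitrary units from runs in which Beth is not involved:

```python
from statistics import mean, stdev


def z_score(trial_heat, control_heats):
    """Number of control standard deviations by which a trial's heat output
    exceeds the mean of ordinary (no-Beth) candle pairs."""
    return (trial_heat - mean(control_heats)) / stdev(control_heats)


# Hypothetical control runs (arbitrary heat units), NOT measurements.
controls = [100.0, 102.0, 98.0, 101.0, 99.0]
for trial in (101.0, 105.0):
    print(f"trial={trial}: z = {z_score(trial, controls):.2f}")
```

Only a trial several standard deviations out (the second case) would say much; a shift comparable to the natural spread (the first case) is exactly the "swamped" outcome.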