The HD and other leakages, and the stimulus, are rather minor points.
By low-pass filtering, taking the incremental skin value, delaying, and using a wide sampling window on the skin resistance input, he is conditioning that input so as to decrease the variance within each subject and level it out between subjects.
This is tantamount to selecting the subject(s) for a particular characteristic.
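To put rough numbers on the variance point, here is a minimal sketch. Everything in it is my own assumption, not his processing chain: the two subjects are just Gaussian noise with made-up spreads, `simulate_subject` and `moving_average` are hypothetical helpers, and a plain moving average stands in for whatever low-pass filtering and sampling window he actually used.

```python
import random
import statistics

def moving_average(samples, window):
    """Crude low-pass filter: mean of each run of `window` consecutive samples."""
    return [
        sum(samples[i:i + window]) / window
        for i in range(len(samples) - window + 1)
    ]

def simulate_subject(noise_sd, n=2000, seed=0):
    """Hypothetical raw skin readings: a flat level plus Gaussian noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, noise_sd) for _ in range(n)]

# Two made-up subjects: one steady, one with much noisier readings.
calm = simulate_subject(noise_sd=1.0, seed=1)
jumpy = simulate_subject(noise_sd=3.0, seed=2)

variances = {}
for name, raw in (("calm", calm), ("jumpy", jumpy)):
    smooth = moving_average(raw, window=20)
    variances[name] = (statistics.pvariance(raw), statistics.pvariance(smooth))
    print(name, "raw:", round(variances[name][0], 3),
          "filtered:", round(variances[name][1], 3))
```

Each subject's variance drops after filtering, and the absolute gap between the two subjects shrinks with it. A linear filter like this scales every subject by roughly the same factor, so it is the absolute difference that gets levelled, not the ratio.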
He knows that he must do better than what is predicted by the GF. By conditioning the skin readings to resemble the simulation model (monotonic response), he ensures that his manipulations will not work against him in that area, or be obvious in the data. That's the baseline from which he must get a favourable result.
Because the input is now less variable, or tamed as I earlier put it, he is now in a better position to reliably apply any pattern of behaviour, additional stimulus, or carried-over mental state to his advantage.
One means may be to keep the average level of stimulus high, so that the already processed skin readings are compressed to within an upper band (the response will 'top out' at some point).
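A toy illustration of that 'topping out', assuming a saturating response curve. The exponential form, `CEILING` and `SCALE` are my own choices for the sketch and are not taken from the experiment.

```python
import math

CEILING = 10.0   # hypothetical maximum of the processed reading
SCALE = 2.0      # hypothetical stimulus level at which saturation sets in

def response(stimulus):
    """Assumed saturating response: rises quickly, then flattens towards CEILING."""
    return CEILING * (1.0 - math.exp(-stimulus / SCALE))

def spread(baseline, extra=1.0):
    """Difference in response between an image adding `extra` stimulus and one adding none."""
    return response(baseline + extra) - response(baseline)

low_spread = spread(baseline=0.5)    # subject kept at a low average stimulus
high_spread = spread(baseline=6.0)   # subject kept at a high average stimulus
print("spread at low baseline: ", round(low_spread, 3))
print("spread at high baseline:", round(high_spread, 3))
```

With the baseline held high, the same extra stimulus moves the reading far less, so all the readings end up squeezed into a narrow band near the ceiling.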
If he uses small runs, the subject will relax between them. The GF effect will be present at the start of each run, and the response to the image will be higher than in a longer run, where the subject becomes fatigued, inured, bored, etc.
If the time constants in the input filtering are long enough, then there will be inter-symbol interference caused by information carried from one image to the next. The events are no longer independent, and no longer in agreement with the behaviour that would be attributed to GF. He can exploit this difference to his advantage.
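The carry-over is easy to see in a small sketch. I am assuming a first-order (RC-style) low-pass filter here, and the trial timings (`TRIAL_LEN`, `IMAGE_LEN`) are hypothetical values chosen for the example, not the ones in the experiment.

```python
def low_pass(signal, tau, dt=1.0):
    """First-order (RC-style) low-pass filter with time constant `tau` seconds."""
    alpha = dt / (tau + dt)
    out, y = [], 0.0
    for x in signal:
        y += alpha * (x - y)   # move a fraction alpha towards the new input
        out.append(y)
    return out

TRIAL_LEN = 15   # hypothetical seconds between image onsets
IMAGE_LEN = 3    # hypothetical seconds the image is shown

# Two trials: trial 0 shows an arousing image (1.0), trial 1 a calm one (0.0).
signal = [0.0] * (2 * TRIAL_LEN)
for t in range(IMAGE_LEN):
    signal[t] = 1.0

def carry_over(tau):
    """Filtered level at the onset of trial 1, caused only by trial 0's image."""
    return low_pass(signal, tau)[TRIAL_LEN]

short_tau = carry_over(tau=1.0)
long_tau = carry_over(tau=10.0)
print("carry-over with tau = 1 s: ", short_tau)
print("carry-over with tau = 10 s:", long_tau)
```

With the short time constant, trial 1 starts from practically zero. With the long one, a noticeable part of trial 0's response is still sitting in the filter when trial 1 begins, so the two trials are no longer independent.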
For a more visceral explanation, take a look at Fig 2 of Experiment one. The response begins to climb just before the image goes off at t=3, and peaks roughly 1.5 secs later.
Where would the presentiment of the photo going off be? Does the future show us only the leading edge of discrete-time events?
If the change at t=3 is the result of the start of the image, but delayed, then wouldn't you expect something to happen around t=10? Couldn't that output carry on to the next trial at t=18? Wouldn't any post-image stimulus then have an effect?
Keep in mind that this chart represents the change in skin resistance, not the absolute level, and at no time is the actual measured voltage reset to a reference value.
Summary:
By filtering and controlling the subject's response at the analogue and sampling levels, he gets a reliable approximation that he may use in his simulations, and expect in practice. From that point, he finds a means of manipulation that serves his desired outcome, and tests it on the simulator. This manipulation is built into the experiment (which indeed looks contrived), slack is added to allow some real-time tweaking, expected criticism is diverted to matters of statistical bias, and the magic trick is done. (Look out for the faux accuracy.)