I don't understand this discussion of "pre-recording" or "matching movements." It suggests that some posters don't quite understand how this video trick works. Since I did this myself years ago, let me try to explain from a different angle. And, yes, it is a trivial effect, but it does require some care to avoid giving it away.
No prerecording is necessary. We will also assume that the camera was on a stand or dolly and any apparent camera movement was artificially added in the control room.
It's a normal camera and normal output up until Darren steps over to the TV, at which point the camera settles into a fixed, medium-wide position. At that moment, a digital snapshot of a single frame is taken and stored in a buffer.
Live video mixers can accept feeds from two sources and blend them into a single output, frame by frame, in real time. Using a slider, it's possible to take input number one (the snapshot) for the left half of the output and input number two (the continuing live shot) for the right half. The place where the two meet is a "soft" blend rather than a hard edge, and as long as the studio lighting doesn't change, shadows don't interfere, no one makes a wrong move and the camera doesn't get bumped, it can be absolutely perfect and seamless.
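For anyone who thinks better in code, here is a minimal sketch of that kind of soft-edged split, in plain Python on tiny grayscale "frames" (lists of rows). The names wipe_weights and mix_frame, the linear ramp and the softness parameter are my own assumptions for illustration; a real mixer does this in hardware and its exact blend curve may well differ.

```python
def wipe_weights(width, split, softness):
    """Per-column weight of the live feed: 0.0 on the snapshot side,
    1.0 on the live side, with a linear ramp of about `softness` columns
    around `split` (assumed ramp shape, for illustration only)."""
    weights = []
    for x in range(width):
        if softness <= 0:
            w = 0.0 if x < split else 1.0   # hard edge
        else:
            w = (x - split) / softness + 0.5
            w = min(1.0, max(0.0, w))       # clamp to [0, 1]
        weights.append(w)
    return weights


def mix_frame(snapshot, live, split, softness):
    """Blend one output frame: snapshot on the left, live feed on the right."""
    weights = wipe_weights(len(live[0]), split, softness)
    return [
        [(1.0 - w) * s + w * l for s, l, w in zip(s_row, l_row, weights)]
        for s_row, l_row in zip(snapshot, live)
    ]


# One-row example: the live feed has changed at column 1 (left) and column 6 (right).
snapshot = [[10, 10, 10, 10, 10, 10, 10, 10]]
live = [[10, 99, 10, 10, 10, 10, 50, 10]]
out = mix_frame(snapshot, live, split=4, softness=2)
```

In this example the change at column 1 never reaches the output, while the change at column 6 passes straight through. In the narrow ramp around the seam both inputs contribute, which is why the two pictures have to match there for the join to stay invisible.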
While the video mixer is combining inputs 1 and 2, anyone walking on the left side of the frame, where the balls are, will be invisible in the output, since the left side of the output image comes not from the live camera but from the snapshot. So a technician rolls in a desk with a set of numbered balls, removes the blank ones from the rack, and places each numbered ball in sequence as he hears it announced on the TV. As soon as he is done, he rolls the desk with the leftovers off stage and the control room moves the slider back to normal. This is where the one ball appears to jump: frame N is from the snapshot, frame N+1 is from the live feed, and one ball just happened to be placed slightly off from where the blank one had been (s. happens).
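The sequence above can be mimicked with a toy one-row "frame" to show both why the technician never appears and why the ball jumps at the cut back. Everything here is made up for illustration: the pixel values, the SPLIT column, the helper names, and the hard edge used instead of a soft blend to keep it short.

```python
def output_row(snapshot, live, split, wipe_on):
    """One output row: frozen snapshot left of `split` while the wipe is on,
    otherwise the untouched live row."""
    if not wipe_on:
        return list(live)
    return snapshot[:split] + live[split:]


def ball_position(row):
    """Column of the ball (marked with a 1) in the row."""
    return row.index(1)


SPLIT = 6                                  # hypothetical seam column
snapshot = [0, 0, 1, 0, 0, 0, 7, 7]        # ball at column 2 when the frame was frozen
live_during = [9, 9, 9, 0, 0, 0, 7, 7]     # technician (9s) standing in front of the rack
live_after = [0, 0, 0, 1, 0, 0, 7, 7]      # numbered ball set down one column off

frame_hidden = output_row(snapshot, live_during, SPLIT, wipe_on=True)
frame_n = output_row(snapshot, live_after, SPLIT, wipe_on=True)
frame_n_plus_1 = output_row(snapshot, live_after, SPLIT, wipe_on=False)

print(9 in frame_hidden)                   # is the technician visible in the output?
print(ball_position(frame_n), ball_position(frame_n_plus_1))
```

With the wipe on, the output still shows the ball at column 2 and no technician; the first frame after the wipe is released shows it at column 3. That one-column difference between two consecutive frames is the "jump".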
From then on, it's just a normal camera and TV show.