You haven't gotten enough information yet.
These types of trials are highly subjective, not in terms of selection, but in terms of confidence. You need a framing protocol. If you don't use a framing protocol then the person being tested can simply say that it was a bad day (as Mojo predicted). Lack of such a protocol is one of the major cheats used for things like telepathy and clairvoyance.
Instead of buckets I would suggest either plastic or glass containers so that you can tell just by looking at them whether they contain sand or water. I like Loss Leader's suggestion of covering the containers with boxes.
Before you start, you place the containers on the ground at least three paces apart so that there is no confusion about which box a hit is for. Then cover them with the boxes. For this framing test the subject knows which box has the water, having seen the containers before they were covered. You ask them to check the boxes and see if they can get a hit. If they do then you can continue with the test. You repeat this framing test again when you complete the trial. In other words, if the test subject says that they can detect the water (under the same conditions) before the test and after the test then it would be difficult to argue that it was an off day.
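The before-and-after framing check can be written down as a simple rule. This is only a sketch: the function name and the three labels are made up for illustration, and treating a failed closing frame as "inconclusive" is my assumption, since the only case spelled out above is the one where both frames succeed.

```python
def classify_run(frame_before: bool, frame_after: bool) -> str:
    """Decide how to treat a test run from the two framing checks.

    frame_before / frame_after: did the subject report a hit on the
    known water container before and after the blind tries?
    """
    if not frame_before:
        # The subject could not detect known water, so the blind tries
        # should not be started under these conditions.
        return "do not start"
    if not frame_after:
        # Assumption: a failed closing frame leaves room for the
        # "off day" argument, so the run is set aside.
        return "inconclusive"
    # Both frames succeeded, so an "off day" is hard to argue.
    return "counts"
```

Only a run where `classify_run(True, True)` gives "counts" would be scored against the claim.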
The next problem you run into is bird-in-the-hand selection. Let's say you do five tries in a test run. Let's say the test subject knows the results of each try. After three negatives, the subject can say that they aren't feeling it and need a break. In other words, they can abort a negative run. But let's say that they got a match on the first try. They could continue and do the full test run knowing that they have one success in hand. The aborted negative/continued success is another major way of skewing the results.
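To see how much aborted negative/continued success can skew things, here is a rough simulation. It assumes a subject with no ability at all guessing among six boxes (so one chance in six per try), five tries per run, and a subject who abandons any run whose first three tries are all misses. Those numbers and the function name are assumptions picked for illustration, not part of any agreed protocol.

```python
import random


def simulate(runs, tries_per_run=5, boxes=6, abort_after=3, seed=0):
    """Compare the hit rate over all runs with the hit rate over only
    the runs that a result-aware subject would choose to finish."""
    rng = random.Random(seed)
    all_hits = all_tries = 0
    kept_hits = kept_tries = 0
    for _ in range(runs):
        # Pure guessing: each try hits with probability 1/boxes.
        results = [rng.randrange(boxes) == 0 for _ in range(tries_per_run)]
        all_hits += sum(results)
        all_tries += tries_per_run
        # The subject finishes the run only if there is already a hit
        # in hand after the first few tries; otherwise it is aborted
        # and never counted.
        if any(results[:abort_after]):
            kept_hits += sum(results)
            kept_tries += tries_per_run
    honest_rate = all_hits / all_tries
    selective_rate = kept_hits / kept_tries if kept_tries else 0.0
    return honest_rate, selective_rate


honest, selective = simulate(20000)
print(f"all runs counted: {honest:.3f}")
print(f"aborted runs dropped: {selective:.3f}")
```

With these assumptions the counted-everything rate stays near chance, while the rate over the surviving runs comes out well above it even though nothing but guessing is going on.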
This then leads into the issue of cheating. If you don't observe the test subject then they can cheat. However, if you watch the test subject while knowing where the positive sample is located then you might give unconscious cues that help them find it. If the test subject knows the result of each try then they can use aborted negative/continued success to get a higher score. However, if they don't know the results then they can claim that you are cheating. Or, if they want to be nice, they can claim that you made some alteration that you weren't aware would affect the result. This is where it gets complicated.
The easiest way to handle this would be to use a video camera to watch the test area. This way you could do the test run double blind (the person who knows the location of the sample is not present) without the possibility of cheating. After the test, you can both watch the video to make certain where the sample was located and that the test subject didn't cheat to try to find it. All you have to do is make sure that the containers are visible to the camera when placed and when uncovered. In other words, you don't stand in between the camera and the boxes where you could be doing something out of camera view. The test subject would also have to stay on the opposite side of the box.
If you don't have a video camera you can get a third person to help. One person sets the samples and then leaves. The test subject does the try with the second person present so there is no question of cheating. Then the test subject leaves and the second person checks the boxes and scores the try. Then the second person leaves before the boxes are set again. Typically they would go to the same location as the test subject to make certain that there was no peeking. So, neither the test subject nor the sample setter knows the score. The second person knows the score but does not know where the sample is located during the try. The obvious way this would fail is if the test subject asks the scorer, "How am I doing so far?" The only way of avoiding this is to add one more person, an observer who watches the test subject during the sample setup and during the try. Then the observer and the test subject both leave before the try is scored by a separate scorer. This keeps the person who knows the score from interacting with the test subject.
If you don't have extra people to help then you typically resort to disposable test samples. This is what they try to do in the Soapbox Derby except this doesn't actually work because the rules don't allow for swapping wheels that have bad bearings. The use of brand new wheels keeps the contestants from cheating. However, the lack of swapping means that you can end up with a wheel with a bad bearing and there is nothing you can do. In other words, the Soapbox Derby sometimes cheats against the contestants.
The disposable protocol works something like this. You make up a full set for each try. So, if you use six boxes and have five tries you need 30 boxes. You also need at least one more set for the framing protocol. So that is a minimum of 36 boxes if you use the framing boxes twice or 42 boxes if you use a different set for first and last frames. Basically, with each try you use a new set of boxes. Each set would have something like an identifying number so that you can see that they are all from the same set. For the try, the test subject has one sticker to put on a box. You use the same sticker for each set. You collect and set aside each set for each try. There is no reason to score since the sticker shows which was selected. In fact, you don't even have to worry about the double blind protocol if the tester can't tell which box has the positive sample (except for the framing sample which must be known). Then you simply score when you are done. This can be observed by the test subject.
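The box count is just sets times boxes per set. A quick sketch of the arithmetic, with a made-up function name and parameters, in case you want to try other numbers of boxes or tries:

```python
def boxes_needed(boxes_per_set: int, tries: int, framing_sets: int) -> int:
    """Total boxes for the disposable protocol.

    One fresh set per try, plus either one framing set (reused for the
    first and last frame) or two (a different set for each frame).
    """
    return boxes_per_set * (tries + framing_sets)


print(boxes_needed(6, 5, 1))  # 36: framing set used twice
print(boxes_needed(6, 5, 2))  # 42: separate first and last framing sets
```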
To do another sample run you first put the same sticker on all the boxes in the same location. Then you change to a new sticker for the next test. It's better to use stickers than marks from different-colored felt-tipped markers because it can be difficult to get the marks exactly the same. The stickers will all be the same. The idea of making the boxes anonymous again by using the same sticker in the same place on all boxes within a set is the Morgiana protocol from Ali Baba and the Forty Thieves.