I have no idea about the heritability numbers, but whoever did this study is banking on people knowing very little about intelligence testing. WAIS-III, the test that I am most familiar with, and one of the most respected and widely-used IQ tests out there, gives a score with error margins. A 3-point error margin would be a minimal margin, and it's not that uncommon to see 10 point error margins, or higher, depending on which confidence interval one uses, and the degree of accuracy in the administration of the test. Furthermore, the same person being tested twice with this test would, on average, have scores that are different by about 5 points. If that person had been trained in the kinds of tasks used in the test in between the two tests, the difference would be even higher. So to me, it's ridiculous to claim that IQ tests are precise to the point that they won't change more than 3 points.
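To make the error-margin point concrete, here's a rough sketch of how that kind of confidence interval is typically built from a test's reliability. The reliability of 0.95 and SD of 15 are illustrative assumptions, not figures from the WAIS-III manual, but they show how wide even a very reliable test's interval is:

```python
import math

def iq_confidence_interval(obtained_score, reliability=0.95, sd=15.0, z=1.96):
    """Rough 95% confidence interval around an obtained IQ score.

    Uses the textbook standard error of measurement:
        SEM = SD * sqrt(1 - reliability)
    The reliability and SD values are illustrative assumptions only.
    """
    sem = sd * math.sqrt(1.0 - reliability)
    return obtained_score - z * sem, obtained_score + z * sem

# Even with reliability of 0.95, the interval spans over 13 points.
low, high = iq_confidence_interval(100)
print(f"SEM-based 95% CI: {low:.1f} to {high:.1f}")   # roughly 93.4 to 106.6
```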
Edited: Forgot to add that there is, in fact, a documented 3-point difference in IQ scores between Americans as a whole and Canadians as a whole, discovered during the original norming process. The average Canadian IQ is 103, not 100. Given that the two populations have similar genetic makeup and diversity, share a language, and mostly share a culture, I do wonder how the authors of the article would explain this one away. In practice, the difference is simply ignored, because the usual error margins more than cover it anyway; the only time it matters is when a score sits on the borderline of a cutoff that determines what kinds of services someone receives.
You're ignoring the principle of aggregation and how that affects margins of error.
Sure, study one person's IQ and it might vary by 3 or 5 or 10 points across time, tests, and testers.
Study thousands of people classified on some dimension (e.g., race) and this becomes far less of a problem. Whether one is studying IQ or aggression, a test measuring either construct is validated using lots of people, not one or two.
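A back-of-the-envelope sketch of that aggregation point, assuming (purely for illustration) a 5-point standard error for any one person's score: the standard error of a group mean shrinks with the square root of the sample size.

```python
import math

per_person_se = 5.0   # assumed measurement error (SD) for one person's score

for n in (1, 100, 1000, 10000):
    se_of_group_mean = per_person_se / math.sqrt(n)
    print(f"n = {n:>5}: standard error of the group mean = {se_of_group_mean:.2f} points")

# n =     1: 5.00 points
# n =   100: 0.50 points
# n =  1000: 0.16 points
# n = 10000: 0.05 points
```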
Even the classical model of reliability (invented by Spearman, coincidentally!) assumes this: a person's observed score on a test = his or her true score, plus error.
Errors can be positive (artificially inflating your score) or negative (artificially reducing your score). For any single person, the error might be very large, but test large groups of people and the errors, which are normally distributed with a mean of zero, wash out in the group average!
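A quick simulation of that classical model (observed = true + error), with a made-up error SD of 5 points: any one person's error can be sizable, but the average error across a few thousand test-takers sits essentially at zero.

```python
import random

random.seed(0)
n = 5000
true_score = 100.0
error_sd = 5.0          # assumed error SD, purely illustrative

# Classical test theory: observed score = true score + random error
errors = [random.gauss(0, error_sd) for _ in range(n)]
observed = [true_score + e for e in errors]

print("largest single-person error:", round(max(abs(e) for e in errors), 1))
print("mean error across the group:", round(sum(errors) / n, 2))
print("mean observed score:        ", round(sum(observed) / n, 2))
```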
In fact, one can use the idea of error to make a fairly ingenious (imo) prediction: if the black-white gap is real, then regression to the mean should occur differentially for black and white parents with respect to their kids' IQs. That is, people selected on the basis of an extreme IQ score should have kids whose IQs regress back toward their racial group's mean IQ.
The original study referenced in the OP (Rushton and Jensen) reviews lots of data on this issue.
Match pairs of couples (one white, one black) on SES, income, education, etc., and IQ. The key is to match on extreme IQs and then observe how the kids' IQs regress to the mean.
So, some matched couples will have IQs of 120; others, IQs of 70.
For the 120-IQ parent group, the black kids average about 100 for IQ, whereas the white kids average 110. So regression to the mean is twice as strong for the black kids, because they are regressing to their group mean of 85, a longer distance away, whereas the white kids are regressing to their group mean of 100, a shorter distance away (and hence less regression).
Just the opposite happens for matched parents with IQs of 70: regression up to the mean is twice as strong for the kids with white parents!
The black kids' IQ is now 78 (half the difference up to their group mean) and the white kids' IQ is now 85, more regression for the whites because the distance to their group mean (100) is now farther.
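For what it's worth, the numbers in that example follow from the standard regression-to-the-mean arithmetic, with the kids' expected IQ pulled roughly halfway back from the parents' IQ toward the group mean. The 0.5 regression coefficient and the 85/100 group means are the assumptions the example runs on, and this sketch reproduces the figures above to within a couple of points:

```python
def expected_child_iq(parent_iq, group_mean, regression=0.5):
    """Expected child IQ under simple regression toward the group mean.

    The 0.5 regression coefficient is the assumption implicit in the
    numbers above, not an estimate from any particular dataset.
    """
    return group_mean + regression * (parent_iq - group_mean)

for parent_iq in (120, 70):
    for label, group_mean in (("black", 85), ("white", 100)):
        print(f"parents at {parent_iq}, {label} group mean {group_mean}: "
              f"kids expected around {expected_child_iq(parent_iq, group_mean):.0f}")

# parents at 120: black ~102-103, white ~110
# parents at  70: black ~78,      white ~85
```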
Any Gouldists out there who can suggest an environmental or cultural explanation that accounts for this pattern of regression to the mean?