The data are ephemeral in the sense that native speakers are often unsure which utterance sounds correct to them. Many factors enter into grammaticality judgements made by informants who are unfamiliar with the problems of descriptive grammar. Some obstacles are sociological or anthropological, while others are purely logical.
Unlike data in physics, these are not hard measurements or the product of calibrated instruments.
Here are some interesting nuggets to ponder:
John "Haj" Ross:
Just one simple example from the above,
sloppy identity:
1) John scratched his arm and Bob did too.
How do we predict ambiguity?
2) John scratched his arm, and Carol did too.
3) John scratched her arm, and Carol did too.
Can informants easily and reliably tell us which sentences are acceptable for them, and analyze the ambiguities?
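To make the prediction question concrete, here is a rough sketch that enumerates candidate readings for sentences of the form "X scratched his/her arm, and Y did too". It assumes the textbook bound-variable account of sloppy identity and is not taken from Ross. The `Person` class, the `readings` function and the `gender_must_match_under_ellipsis` switch are invented for illustration. Readings where the pronoun points at some third party are left out when the pronoun can refer to the first subject.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    gender: str  # "m" or "f"


PRONOUN_GENDER = {"his": "m", "her": "f"}


def readings(subject, pronoun, elided_subject,
             gender_must_match_under_ellipsis=False):
    """List candidate readings of the elided clause in
    '<subject> scratched <pronoun> arm, and <elided_subject> did too'.

    Assumption: a sloppy reading exists only if the pronoun can be
    bound by the first subject (here: their genders agree).
    """
    out = []
    if PRONOUN_GENDER[pronoun] == subject.gender:
        # Strict: the pronoun keeps referring to the first subject.
        out.append(("strict",
                    f"{elided_subject.name} scratched {subject.name}'s arm"))
        # Sloppy: the pronoun is a bound variable, re-bound in the
        # elided clause. Whether gender has to agree there is left as
        # a switch, since it is a matter of speaker judgement.
        agrees = PRONOUN_GENDER[pronoun] == elided_subject.gender
        if agrees or not gender_must_match_under_ellipsis:
            out.append(("sloppy",
                        f"{elided_subject.name} scratched "
                        f"{elided_subject.name}'s own arm"))
    else:
        # The pronoun cannot refer to the first subject, so it picks out
        # some other fixed individual and the elided clause reuses it.
        out.append(("free",
                    f"{elided_subject.name} scratched that same "
                    f"person's arm"))
    return out


john = Person("John", "m")
bob = Person("Bob", "m")
carol = Person("Carol", "f")

for args in [(john, "his", bob), (john, "his", carol), (john, "her", carol)]:
    print([label for label, _ in readings(*args)])
```

The switch is the interesting part: the sketch cannot decide whether sentence 2 has a sloppy reading, and that is exactly the kind of judgement we would have to get from informants.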