Because it's too complicated, and because it's unnecessary.
Why are tests that confirm or discount a theory "unnecessary"? Why are they not very necessary in this, of all cases? Hell, even if it cost $100M to perform these tests, wouldn't you want to see the results? Especially if they demonstrated the official theory to be right?
The tests were not run to failure because that would destroy the test cell. The furnaces are not designed to have multi-ton steel structures collapse upon them. Cleanup after such a test is hazardous. Remember, this is not a tiny benchtop experiment we're talking about -- each of the short-span full-scale truss tests involved roasting a structure bigger than my house.
Yet NIST cites the failure of these trusses as "imminent." As if that word were some kind of satisfactory criterion of the truth of their theory. What is "imminent" calibrated against? Does it mean the truss will fail in 5 minutes under identical conditions? 40 minutes? We do not know, because they never define "imminent." Again, we must take their word for the results in the report.
There's no need to run the tests further, anyway. Elementary structural mechanics will tell you what will happen next to a high degree of accuracy. The difference between the stopping point in the UL furnace tests and what happened in the WTC Towers is whether or not the diagonal elements buckled. Up until that point, the NIST models can be verified against the truss tests in terms of displacement as a function of steel temperature, and they were, and the fit was excellent.
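To make "elementary structural mechanics" concrete, here is a back-of-envelope sketch (my toy model, not NIST's) of how restrained thermal expansion turns into mid-span sag. The span, expansion coefficient, and temperature rises are illustrative assumptions, not values from the report:

    import math

    ALPHA = 1.4e-5   # thermal expansion of steel, 1/degC (typical handbook value)
    SPAN_M = 18.3    # assumed long-span truss length, m (~60 ft; illustrative)

    def midspan_sag(delta_t_c, span_m=SPAN_M):
        # An axially restrained member absorbs its thermal expansion by
        # bowing into a shallow parabola. Parabolic arc length:
        #   s ~ L * (1 + 8*d**2 / (3*L**2))
        # Setting s - L = alpha*dT*L and solving for the deflection d:
        strain = ALPHA * delta_t_c
        return span_m * math.sqrt(3.0 * strain / 8.0)

    for dt in (200, 400, 600):
        print("dT = %4d C -> sag ~ %.2f m" % (dt, midspan_sag(dt)))

The only point of the toy model is that sag grows smoothly and predictably with steel temperature (the ~600 °C case lands near a meter, the same order as the 42 in. deflection NIST cites below), which is why displacement versus temperature is a sensible quantity for verifying the models up to the onset of buckling.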
Except that in the NIST lab tests the trusses did not fail, and in the simulations they did. That is the one difference that actually matters, and it points to definite problems with the interpretation of the data here.
All of the computer models created were verified (that means "tested against known cases") against simple test cases, such as the NCSTAR1-6C tests. They were also tested for sensitivity and validated (that means "tested to show the correct behavior within a bounding envelope of conditions") independently, by varying each of the input conditions and verifying the effect of each, giving us a way to estimate the error for any given test case. The structural models, in total, are simply not all that sensitive to the kinds of errors you suppose. Minor variations in heating or even in displacement are not significant to the overall structure.
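For anyone unfamiliar with the procedure, a one-at-a-time sensitivity sweep looks roughly like this. The model, parameters, and ranges below are placeholders of my own, not anything from NCSTAR1-6C:

    # Hypothetical one-at-a-time (OAT) sensitivity sweep. run_model stands
    # in for a structural simulation; its response formula is made up so
    # that the script actually runs.

    def run_model(heat_flux, insulation, load):
        return 0.04 * heat_flux * load / max(insulation, 0.1)  # fabricated

    BASELINE = {"heat_flux": 100.0, "insulation": 1.0, "load": 10.0}

    def oat_sensitivity(baseline, perturbation=0.10):
        # Vary each input +/- 10% about the baseline, one at a time,
        # and report the resulting swing in the output.
        base_out = run_model(**baseline)
        for name, value in baseline.items():
            low = run_model(**dict(baseline, **{name: value * (1 - perturbation)}))
            high = run_model(**dict(baseline, **{name: value * (1 + perturbation)}))
            swing = high - low
            print("%-10s output swing %+.2f (%+.1f%% of baseline)"
                  % (name, swing, 100.0 * swing / base_out))

    oat_sensitivity(BASELINE)

A parameter whose 10 percent swing barely moves the output is one the conclusions do not hinge on; that is the sense in which the structural models are claimed to be insensitive to minor heating or displacement errors.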
But when we examine the actual model they used to demonstrate the failure of the truss systems, we see that it depends on heating the entire structure to 700 °C, a premise for which there is absolutely no corroborative evidence.
“A floor section was modeled to investigate failure modes and sequences of failures under combined gravity and thermal loads. The floor section was heated to 700 °C (300 °C at the top surface of the slab) over a period of 30 min…. Figure 6–11 shows that the diagonals at the core (right) end of the truss buckled and caused an increase in the floor system deflection, ultimately reaching approximately 42 in.”(96)
Not only is there no corroborative evidence of this, but in another statement, they clarify that
“The use of an ‘average’ gas temperature was not a satisfactory means of assessing the thermal environment on floors this large and would also have led to large errors in the subsequent thermal and structural analyses. The heat transferred to the structural components was largely by means of thermal radiation, whose intensity is proportional to the fourth power of the gas temperature. At any given location, the duration of temperatures near 1,000 °C was about 15 min to 20 min. The rest of the time, the calculated temperatures were near 500 °C or below.”(127)
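The fourth-power dependence they mention is just the Stefan-Boltzmann law, and the numbers are worth seeing. This is standard physics; nothing here comes from the report except the two temperatures:

    SIGMA = 5.67e-8  # Stefan-Boltzmann constant, W/(m^2 K^4)

    def radiant_flux(t_celsius):
        # Blackbody emissive power at a given gas temperature.
        t_k = t_celsius + 273.15
        return SIGMA * t_k ** 4

    hot, cool = radiant_flux(1000.0), radiant_flux(500.0)
    print("1000 C: %6.1f kW/m^2" % (hot / 1000.0))   # ~149 kW/m^2
    print(" 500 C: %6.1f kW/m^2" % (cool / 1000.0))  # ~20 kW/m^2
    print("ratio : %.1fx" % (hot / cool))            # ~7.3x

A gas at 1,000 °C delivers roughly seven times the radiant flux of one at 500 °C, which is why an average temperature badly misrepresents the heating, and why the short duration of the 1,000 °C spikes matters so much to what follows.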
So, given 1,000 °C for only 15 to 20 minutes, there is no way the floor systems could have heated entirely to 700 °C over 30 minutes. Yet they combine these data as if there were no contradiction between them, in order to produce the linchpin of their position: that the floor trusses sagged from the heat and initiated the "global collapse."
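Anyone who wants to pressure-test that arithmetic can do so with a minimal lumped-capacitance sketch. The emissivity, section factor, and bare-steel assumption below are mine for illustration, not NIST's:

    SIGMA = 5.67e-8      # Stefan-Boltzmann constant, W/(m^2 K^4)
    EMISS = 0.7          # assumed effective emissivity (illustrative)
    RHO_C = 7850 * 600   # steel density * specific heat, J/(m^3 K)
    HP_A = 200.0         # assumed section factor A/V for light truss steel, 1/m

    def steel_temp(gas_c, minutes, dt_s=1.0):
        # Integrate rho*c*dT/dt = eps*sigma*(A/V)*(Tg^4 - Ts^4):
        # bare steel, radiative heating only -- a deliberately crude bound.
        tg, ts = gas_c + 273.15, 20.0 + 273.15
        for _ in range(int(minutes * 60 / dt_s)):
            ts += EMISS * SIGMA * HP_A * (tg ** 4 - ts ** 4) / RHO_C * dt_s
        return ts - 273.15

    for gas_c, minutes in ((500, 30), (1000, 20)):
        print("%d min in %d C gas -> bare steel ~ %.0f C"
              % (minutes, gas_c, steel_temp(gas_c, minutes)))

Whatever parameters you choose, radiatively heated steel can approach but never exceed the local gas temperature. So during the long stretches the report puts "near 500 °C or below," the steel is capped near 500 °C, and a floor sitting uniformly at 700 °C for 30 minutes would require gas well above 700 °C everywhere for that entire period.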
Does this sound like a consistent integration of data?
Regarding the fireproofing: that's a separate issue. It is difficult to be certain of the fireproofing damage, which is why NIST used conservative estimates of the damage and ran two fire cases for each tower, verifying the results against many observations, including the extent of "hanging objects," the appearance and progression of inward bowing of the exterior walls, and the measured lean of the tower superstructures. What the results show is that some minimum of fireproofing damage is expected, but that within a wide envelope of performance, the fine details make little difference.
Sure, that sounds fair. But anyone who asks me to take their results as truth, yet cannot produce a test to back them up because "it would be too costly or difficult," is, in fact, asking me to take those results as gospel.
It's certainly possible that I'm at fault for misunderstanding the data. Stranger things have happened, but so far, it is the structure of the NIST argument to which I take objection. (And that is my background.)