W.D.Clinger
NIST's 3 parameters vs femr2's 11 parameters: preliminary report, part 2
Three days ago, I used femr2's data for the NW corner of WTC 7 to compare the accuracy of my reverse-engineered version of his Poly(10) model with NIST's nonlinear, nonpolynomial model. Now that I have the correct coefficients for Poly(10), it's time to repeat that calculation. While I'm at it, I'll report on three more versions of NIST's model.
As explained in the first part of this report, I'm using an objective measure for goodness of fit between model and empirical data: the residual sum of squares.
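For readers who want that definition pinned down: the residual sum of squares is the sum, over every measurement in the comparison interval, of the squared difference between the measured value and the model's prediction. A minimal sketch (the names model, params, t, and y are placeholders of my own, not anyone's actual variables):

[code]
import numpy as np

def residual_sum_of_squares(model, params, t, y):
    """Sum of squared differences between data and model predictions.

    model  -- function mapping (t, *params) to predicted displacement
    params -- tuple of fitted parameter values
    t, y   -- arrays of measurement times and measured displacements
    """
    residuals = y - model(t, *params)
    return np.sum(residuals ** 2)
[/code]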
[size=+1]Translation between NIST's and femr2's time scales[/size]
NIST and femr2 use different time scales. NIST chose its time scale so t=0 would correspond to a certain event that NIST interpreted as close enough (for government work) to the beginning of collapse initiation. femr2 chose his time scale so t=0 would arrive 11.8785 seconds before the moment that femr2 interprets as collapse initiation.
To compare NIST's models with femr2's data, we need to know how to translate between NIST's time scale and femr2's. That translation is expressed by a signed number: the offset that must be added to NIST's time scale to obtain the corresponding moment on femr2's time scale.
That offset is needed only to compare NIST's models with femr2's data. For femr2's models we know there is no offset, simply because femr2's model and data are known to use exactly the same time scale.
In what follows, I will refer to the time offset as t_o (with the subscript "o" abbreviating "offset"). Although the value of t_o is likely to be close to the value of femr2's T0=11.8785s, those two numbers should not be confused with each other: t_o describes the translation between NIST's time scale and femr2's, whereas T0 marks the left endpoint of the training set for femr2's models.
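In code, the translation is a one-liner in each direction (a trivial sketch; t_o is the offset just described, whatever its value turns out to be):

[code]
def nist_to_femr2(t_nist, t_o):
    # Add the offset to a moment on NIST's time scale
    # to obtain the corresponding moment on femr2's.
    return t_nist + t_o

def femr2_to_nist(t_femr2, t_o):
    # Inverse translation.
    return t_femr2 - t_o
[/code]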
[size=+1]The models.[/size]
Poly(10)
femr2 has chosen to model the vertical displacement (position) of WTC 7's north wall by a polynomial. His Poly(10) model uses a polynomial of degree 10. That model has 11 parameters (not the 13 parameters I had assumed before femr2 explained that his polynomial of degree 10 describes the vertical displacement, not the acceleration).
All 11 of the Poly(10) parameters were tuned to describe femr2's measurements for the NW corner of WTC 7, as extracted from the Dan Rather video, during an interval of time that runs from approximately 12 to 17 seconds on femr2's chosen time scale. I will refer to those data as the training set for the model.
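For concreteness, here is roughly what fitting such a model looks like. This is a sketch only: the arrays below are synthetic stand-ins for femr2's measurements, and femr2's actual procedure may differ. I use numpy's Polynomial.fit because it rescales the domain internally, which keeps a degree-10 least-squares fit numerically well behaved:

[code]
import numpy as np
from numpy.polynomial import Polynomial

# Synthetic stand-in for femr2's NW-corner measurements
# (hypothetical; the real training set is not reproduced here).
t = np.linspace(12.0, 17.0, 200)
y = 0.5 * 9.81 * (t - 12.0) ** 2      # placeholder displacement curve

# Degree 10 means 11 coefficients: the model's 11 parameters.
poly10 = Polynomial.fit(t, y, deg=10)

# Evaluate the fitted polynomial at the measurement times.
y_hat = poly10(t)
[/code]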
NIST original
As I have already explained, NIST's nonlinear model has three parameters. As stated in NCSTAR 1-9 section 12.5.3, NIST's values for those parameters are
A = 379.62
λ = 1/0.18562
k = 3.5126
To obtain those values, NIST used its own measurements of features on the north wall.
In addition to those three parameters, we need to know the value of t_o. Estimating by eye, I had guessed that t_o was about 10.9 seconds, but a more careful calculation suggests that t_o=10.85 seconds produces better results when used with the parameter values shown above.
NIST alternative 1
If we use femr2's data as the training set, we get different values for those parameters. In the first part of this ongoing report, I calculated such values mostly by hand. Now that I have written a little computer program to help with that task, I'm getting different numbers. Using the same training set that femr2 used for his Poly(10) model, I get
A = 422.14
λ = 1/0.17874
k = 3.645
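The fitting itself is ordinary nonlinear least squares. Here is the general shape of such a calculation, using scipy.optimize.curve_fit. To keep the sketch runnable I've used a stand-in three-parameter curve; it is not NIST's actual functional form (for that, see NCSTAR 1-9 section 12.5.3), and the data are synthetic placeholders for femr2's measurements:

[code]
import numpy as np
from scipy.optimize import curve_fit

T_O = 10.85   # assumed offset from NIST's time scale to femr2's

def model(t, A, lam, k):
    # Stand-in three-parameter curve so this sketch runs;
    # NIST's actual form is given in NCSTAR 1-9 section 12.5.3.
    return A * (1.0 - np.exp(-lam * t)) ** k

# Synthetic placeholder for femr2's training set (femr2's time scale).
t_femr2 = np.linspace(11.8785, 17.1171, 200)
y = model(t_femr2 - T_O, 422.0, 1.0 / 0.179, 3.6)

# Shift onto NIST's time scale, then fit the three parameters.
popt, _ = curve_fit(model, t_femr2 - T_O, y,
                    p0=(400.0, 5.0, 3.5), bounds=(0.0, np.inf))
A, lam, k = popt
[/code]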
NIST alternative 2
The parameters of NIST alternative 1 assume t_o=10.85s. Although t_o is not a parameter of NIST's model (because its definition involves femr2's time scale), we might get better results by tuning our estimate of t_o along with the three parameters A, λ, and k. Doing so, I get
A = 434.41
λ = 1/0.17978
k = 3.50956
t_o = 10.96s
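Continuing the sketch above, fitting t_o along with the other three just means moving the offset inside the function handed to the optimizer:

[code]
def model_with_offset(t, A, lam, k, t_o):
    # Times arrive on femr2's scale; the offset is now fitted too.
    return model(t - t_o, A, lam, k)

popt4, _ = curve_fit(model_with_offset, t_femr2, y,
                     p0=(400.0, 5.0, 3.5, 10.85), bounds=(0.0, np.inf))
A, lam, k, t_o = popt4
[/code]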
NIST alternative 3
The choice of training set can have a profound effect. Although models that have many parameters tend to be more sensitive to the training set, its influence is easy to see even with NIST's relatively spare 3-parameter model. To demonstrate that influence, I used the interval from 11 to 13 seconds to estimate the following values:
A = 1626.4
λ = 1/0.18412
k = 3.73668
t_o = 11.69s
Note that the above value for t_o is very close to the value femr2 selected for his T0.
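In terms of the sketches above, restricting the training set to such a subinterval is nothing more than a boolean mask applied before fitting:

[code]
mask = (t_femr2 >= 11.0) & (t_femr2 <= 13.0)
popt_alt3, _ = curve_fit(model_with_offset, t_femr2[mask], y[mask],
                         p0=(400.0, 5.0, 3.5, 10.85),
                         bounds=(0.0, np.inf))
[/code]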
[size=+1]And the winner is...[/size]
It turns out that which model is most accurate depends upon the interval of time used for the comparison.
NIST's models are considerably more accurate than femr2's near the beginning of the collapse (from 11 to 13 seconds on femr2's time scale), but are a great deal less accurate near the end of femr2's data (at 17.2 seconds on femr2's time scale).
Here are some examples. (Because the residual sum of squares is a measure of error, lower scores are more accurate.)
[code]
Model              | 11 to 17 s | 11 to 13 s | 15 to 17 s
-------------------+------------+------------+-----------
Poly(10)           |       2008 |       1978 |         12
NIST original      |      10858 |       1315 |       7372
NIST alternative 1 |       1946 |        743 |        452
NIST alternative 2 |       1760 |        703 |        346
NIST alternative 3 |   18269958 |         52 |   18034838
[/code]
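Each cell of that table is just the residual sum of squares from earlier, restricted to one of the three intervals; in terms of the sketches above:

[code]
import numpy as np

def rss_on_interval(model, params, t, y, t_lo, t_hi):
    # Residual sum of squares, restricted to t_lo <= t <= t_hi.
    m = (t >= t_lo) & (t <= t_hi)
    r = y[m] - model(t[m], *params)
    return np.sum(r ** 2)
[/code]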
If you're more interested in learning what happened near the beginning of the collapse than in what happened several seconds later, then the NIST-style models are more accurate for that purpose. If you're more interested in learning what happened during and after the period of acceleration at approximately 1g, then femr2's Poly(10) model is more accurate for that purpose.
[size=+1]Explanation.[/size]
The choice of training set makes a big difference. femr2 tuned his Poly(10) model on the NW corner data beginning at 11.8785s and continuing to 17.1171s.
Note also that we're evaluating the accuracy of femr2's Poly(10) model on data that are almost (but not quite!) identical to its training data.
As Myriad explained so well:

[quote=Myriad]It is unusual, but the reason for it is easily understood.

A polynomial model of a short time series with eleven coefficients is essentially a lossy compression of the data itself. The curve fitting procedure acts as the compression algorithm.[/quote]
When we evaluated femr2's Poly(10) model on the interval running from 15 to 17s, the evaluation data lay entirely within the data on which Poly(10) was tuned. Of course it's going to do well.
When we evaluated femr2's Poly(10) model on the interval running from 11 to 17s, more than 80% of the evaluation data overlapped with the training data. You'd expect it to do extremely well, but it didn't score quite as well as NIST alternatives 1 or 2, both of which were trained on almost exactly the same data as Poly(10).
When we evaluated femr2's Poly(10) model on the interval running from 11 to 13s, we found that it is somewhat less accurate than NIST's nonlinear model as described in NCSTAR 1-9 section 12.5.3, and is considerably less accurate than versions of that model that have been trained on the same data as Poly(10).
Why? Part (but not all!) of the explanation is that femr2's Poly(10) model has too many parameters, which makes it overly sensitive to its training set. The Poly(10) model is spectacularly accurate on the evaluation data that overlap with its training set, but is so inaccurate between 11 and 12 seconds that it loses out to NIST's 3-parameter model (when that model and Poly(10) use essentially the same training set).
NIST alternative 3 demonstrates the same principle. That model is very accurate for the interval on which it was trained, but is spectacularly inaccurate for the collapse as a whole.
[size=+1]Future work.[/size]
NIST was attempting to model the movement of the entire north wall, not one specific NW corner of that wall. The accuracy of its approximations when applied to other features on the north wall remains to be determined, as does the accuracy of femr2's Poly(10) model.
