Reply to Diogenes vis-a-vis a Wellfed definition
Wellfed's ATTEMPT to define "differences" --
>
3. The improvements are; higher resolution, mildly improved dynamics, improved image separation, lower level events in the mix become more perceptable, lyric intelligibility is improved, the room acoustic become more prominent. All things I would equate to an improved Signal to Noise ratio.
--------------------------------------------------------------------------------
This is hardly a " vaguely worded, imprecise, and incomplete description "..
<
It is not only all of the above, but also factually incorrect and internally illogical, and in direct contradiction to the actual engineering concept of "Signal to Noise Ratio".
There is especially no such engineering or scientific term as "lyric intelligibility". We can only guess if Wellfed means "lyrics of vocals, sung by a singer" or, in the musical definition, "lyricism related to the flow of melody."
Even here, his terms start to break down into possibilities that must be narrowed.
That is the PROBLEM.
In audio engineering, and in digital audio encoding and reproduction, we have hierarchies of differences that have been very thoroughly defined, though of course language, ultimately reduced, is Platonically imprecise.
A general working construct = an accepted term.
So, we can define "differences" by instrumental means, irrespective of human judgment, using metric analyzers.
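As a minimal sketch of what "instrumental means" could look like for a pair of CDs, assume the audio from each disc has already been extracted and time-aligned into two equal-length lists of integer PCM samples. The function name `residual_stats` and the sample values are hypothetical, chosen only for illustration; a real analyzer would test many more variables than this.

```python
import math

def residual_stats(samples_a, samples_b):
    """Compare two aligned PCM sample lists, sample by sample.

    Assumption: both lists are already extracted and time-aligned;
    this sketch does no alignment or resampling of its own.
    """
    if len(samples_a) != len(samples_b):
        raise ValueError("sample lists must have the same length")
    diffs = [a - b for a, b in zip(samples_a, samples_b)]
    peak = max((abs(d) for d in diffs), default=0)
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs)) if diffs else 0.0
    return {"identical": peak == 0, "peak": peak, "rms": rms}

# Hypothetical sample data, for illustration only.
untreated = [0, 1200, -340, 15, 9000]
treated = [0, 1200, -340, 15, 9000]
print(residual_stats(untreated, treated))
```

If the residual is exactly zero, the two data streams are the same by this one metric, whatever a listener reports; a nonzero residual says only that something differs, not that it is audible.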
Subjectivists reject this; but due to the various uncertainties I just explained, only the carefully controlled measurements and collections of data can satisfy an engineer. Aesthetic hearing analogies and allusions are VERY INTERESTING, and often are sought for by the very same engineers as they refine audio equipment and techniques -- as I said, long ago in this odyssey, DBT'ing is not used at every step of audio development. Nor is ultimate semantical reductionism.
The AB test generically, or specific ABX methodology, is a synthesis of both the aural, and the instrumental, measurement process.
It replaces 'metric analyzer' with 'hearing'. Because of the problems I elucidated about human cognition, the examples must be HIGHLY CONTROLLED and VERY NARROW IN CONTEXT.
It is indeed possible for trained listeners to hear fine distinctions of tone that are DIFFICULT to measure and quantify by simplistic metric processes using what I might call one-dimensional data point acquisition.
For the metric analysis to be as richly complex and sensitive as human hearing, it has to test for MANY variables of sound, not "individual cycles of pressure".
Recent developments in audio metrics include three-dimensional spectral plotting, FFT analysis, false-color data baseline dispersion, and so forth. These multi-dimensional tests and displays of data points are much more flexible and informative than most of the practical ways I used, say, in 1976: looking at the meter on a Sound Tech analyzer while checking only (say) THD, IMD, or frequency response.
Some responsible and advanced engineers and investigators do claim that, at present, metrics cannot match the ultimate cognitive ability of certain hypertrophically sensitive listeners but make up for that in repeatability!
If we decide to go another route and just make a very simple declarative definition of what a difference is, eliminating any of the things that Wellfed proposed, we could instead summarize a difference as being "what Wellfed says, if he reports that two CDs do not sound identical." This puts us at the precipice of another slippery slope, though. But it does avoid a debate on "engineering definitions" versus "literary allusions" versus "poetic analogies", versus "what one, or many, musicians say", versus "what Harry Pearson says in ABSOLUTE SOUND", and so forth.
We get closer to "the unknown black box" this way, merely acknowledging that Wellfed must be able to spot his privately held, nonverbal mental conception of a difference.
Let us propose that Wellfed agrees to report the state of a test of a pair of CDs this way:
If he hears a difference, qualities of which he needn't divulge to us, in alternating a pair, he yells "DIFFERENCE!" And he then tells us which CD has been treated. He must, in recognizing the difference, also mentally conclude "one entity is better than the other" and infer that "the better one has been treated": he says, for example, "CD A - treated".
We compare with the secret score and see if there is a correspondence.
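The comparison against the secret score can be sketched as follows. This is only an illustration under assumptions of mine: each trial's key and each call is recorded as "A" or "B" (the disc that was treated, or that Wellfed calls treated), and a trial where he reports no difference is recorded as None and counts as a miss. The name `score_trials` and the example data are hypothetical.

```python
def score_trials(secret_key, calls):
    """Count how many calls match the secret key, trial by trial.

    secret_key: list of "A"/"B", the disc actually treated in each trial.
    calls:      list of "A"/"B"/None, the disc the listener called treated
                (None = no difference reported; counted as a miss here).
    """
    if len(secret_key) != len(calls):
        raise ValueError("key and calls must cover the same trials")
    hits = sum(1 for key, call in zip(secret_key, calls) if call == key)
    return hits, len(secret_key)

# Hypothetical ten-trial session, for illustration only.
key = ["A", "B", "B", "A", "A", "B", "A", "B", "B", "A"]
calls = ["A", "B", "A", "A", "A", "B", "A", "B", "B", "A"]
hits, total = score_trials(key, calls)
print(f"{hits} of {total} calls match the secret score")
```

Whether a given number of hits counts as a pass is a separate decision that the protocol has to state in advance.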
THIS is the kind of test that I think you, Gr8wight, and others want to conduct.
The problem now becomes the steps of the protocol testing methodologies with respect to "hearing". It is a basic requirement of the test that Wellfed has to be able to hear the pairs; so whether anybody likes it or NOT, we add in another component: human cognitive faculties and variabilities, and tendency to experience systematic and random errors. Sorry!
In ABX, Wellfed stands a *chance* of somewhat systematically recognizing both "DIFFERENCE" and "better or worse, inferring which one was treated and which one was not".
In his PROPOSED protocol, he stands no chance of recognizing those differences, systematically and repeatably, unless the test CD pairs have differences per Beleth's speculative alteration: i. e., unless the differences are really, ultimately, non-subtle and therefore what I would describe as being "tautologically unsuitable for the test" since ALL can hear them, every time. If GSIC is non-subtle, it is non-controversial, and Randi wouldn't have mentioned it and poked fun at it in his Commentary, and Wellfed would not have wanted to challenge him about it.
So, it follows that what has brought us all together here is this: THE DIFFERENCES ARE SUBTLE, and "non Belethlike" per her example.
Believe me, I've been to subjectivists' homes and experienced their reveries at high end audio stores. I have to keep my opinions to myself, and be sociable. I see self-delusional convictions constantly being played out.
And, true, SOMETIMES I seem also, in these social non-blinded tests, to be able to detect interesting fine distinctions and nuances that I suspect might disappear in critical testing: I experience my own aural delusions!
As we progress on an infinitely-variable sliding scale of differences, from "Ultimately Subtle, Platonically True but Below Any Detection", to "Non-Subtle, and Obvious to All", we increase our confidence in recognizing effects repeatably. And at some point we end up at the Belethlike state of difference: "obvious to all potential test subjects". Of course, we also have to define the numerical limits of the test subject set; do we mean "obvious to every single person living on the planet?" or "obvious to everyone immediately involved in the test?" etc. -- we have to agree on limits here to make the test practical.
As I see it, we could test by having (a) Wellfed use his actual comparative test process, sloppy though it is, but only after all parties agree on an EXTENSIVE and completely vetted set of criteria for "differences" (which he has to keep in mind during the test, and not get confused and inconsistent about); or (b) we have to use AB or ABX type testing and suppress Wellfed's definition of difference, and allow it to be a nonverbal concept in his mind; or we could permit variations in a and b to exist, with adjustments made for weighting the significance of whether or not he can identify and quantify the differences, which only help him to make his FINAL binary choice anyway!
Wellfed may prefer to increase his chances of success by reifying discrete differences and concentrate ONLY on one factor.
He could try "holistically", ignoring individual differences and only trying to think "what's better, what's worse"; but without a control, I don't see how he can succeed in making that distinction. He has to have a "control" -- a neutral sample. He has to have a treated disk and an untreated disk, and hear ALL THREE SIGNALS: control, treated, and untreated. He actually knows which is the control in one way of doing this. He matches the two unknowns against the control, and then decides "better than the control, or worse than the control, or equal to the control."
Oops! This turns out not to be precisely applicable to a dichotomous test of alleged GSIC effect, due to the intrusion of the control. How does the control differ from the non-treated disk? Well, we could ALLOW for the control to BE the non-treated disk. It is then played twice among the three samples heard for each pair of disks: once openly as "the control", then once more, secretly and in random position, as one of the two alternated samples of sound that Wellfed must try to compare. Then, Wellfed tries to match up the two unknowns and see which differs from the control.
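One way to picture this arrangement is a randomized trial schedule, sketched here under assumptions of mine: the control is always the untreated disk, the two unknowns are labeled "X" and "Y", and their order is shuffled independently for each trial. The names `make_schedule` and `answer_key` are hypothetical, and a real protocol would also need someone other than the listener to hold the key.

```python
import random

def make_schedule(n_trials, seed=None):
    """Build n_trials trials: a known control plus two unknowns in random order."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_trials):
        unknowns = ["treated", "untreated"]
        rng.shuffle(unknowns)  # hide which unknown is the treated disk
        schedule.append({"control": "untreated", "X": unknowns[0], "Y": unknowns[1]})
    return schedule

def answer_key(schedule):
    """For each trial, the unknown that actually differs from the control."""
    return ["X" if trial["X"] == "treated" else "Y" for trial in schedule]

schedule = make_schedule(10)
print(answer_key(schedule))  # kept secret from the listener
```

The listener's task in each trial is then to name X or Y as the one that differs from the control, and his calls are checked against the key afterward.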
For a protocol to succeed, "ten correct positives out of ten tries" cannot be the rule to measure pass/fail, using hearing and a human test subject over time. [Note: this is PianoTeacher's practical assertion, which may be falsified. Go study neurophysical testing and read results of a lot of audio tests and start to integrate that information into a process to test the assertion...]
No matter what Wellfed tries to do, and no matter how good a listener he is, even under ABX'ing with "non-Beleth-like" CD pairs (i. e., ones with small magnitude differences in total information content) he can't achieve 10 out of 10. With Beleth-disk pairs, he might -- depends on Beleth's lossy copying scheme for the altered disk.
And his chances of hearing subtleties and eliminating false positives increase if he changes from a dichotomous test, to a test that compares two unknowns against a control.
I think that in (a) above, the preponderance of accumulated knowledge of audio testing asserts today that there is no chance that Wellfed can succeed, no matter how exquisitely his definition of difference has been refined in our mutually-agreeable process: because the temporal controls are weak or nonexistent, Wellfed loses concentration and begins to make errors. He admits that he is looking for a sort of global agglomeration of nuances all together, that add up to a perceived difference: a state of sounding better. We *could*, of course, help him get closer to being able to pass, by reducing the magnitude of the challenge: for instance, require him only to report ONE small narrow phenomenal difference (i. e., background hiss.) Even then, we could more easily and repeatably measure that with metrics. His error-prone nature still continues to exist, revealing human fallibility.
In (b) above, with high repeatability factors and greater controls, Wellfed's real perceptions are enhanced. This tends to eliminate loss of focus and confusion over very vague generalities, and to reduce the tendency for musical emotions to come into play. His chance of passing the test improves, since he need merely identify a difference and infer, from his (internal nonverbal) criterion of "better", which disk was allegedly treated. But, even so, he is still likely to make at least one mistake and generate one false positive; whether he does or not will vary, likely according to chance. Under potentially possible test situations, he could take the 10 out of 10 test ONCE and fail...and take it again, and PASS. It would vary all over the place of course, since he won't always achieve a fixed error rate of only 1 false positive. We may likely experience a very random sequence of passes and failures, ALMOST NEVER achieving "perfect 10 out of 10"; sometimes "one error"; or maybe, rarely, "no errors at all", given the uncertainties of human response to audio testing.
In Wellfed's protocol, he FAILS when any one of the above circumstances occurs, except the almost predeterministically impossible perfect score. I claim, then, that the test is moot.
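To put rough numbers on how hard a perfect score is, here is a minimal sketch under a simplifying assumption of mine: every trial is independent and the listener has the same chance of being right each time. Real listeners tire and drift, so the per-trial accuracy figures below are hypothetical and the results are only indicative.

```python
def p_perfect(p_correct, n_trials=10):
    """Chance of a flawless run, assuming independent trials of equal accuracy."""
    return p_correct ** n_trials

# Hypothetical per-trial accuracies, for illustration only.
for p in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"per-trial accuracy {p:.2f}: P(10 of 10) = {p_perfect(p):.4f}")
```

Under these assumptions a pure guesser (0.50) gets ten of ten about once in 1024 attempts, while a listener who is right nine times in ten would still clear a ten-of-ten rule in only about a third of his attempts. That is the sense in which the same listener could fail the test once and pass it the next time.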
(Are you seeing here my inference that the test is simply unfit to be judged via a dichotomous Randi Challenge?! We have to throw away the Wellfed protocol; I need assistance -- and time -- in crafting a better one, let alone the CORRECT one! I think it is unfit; the gestalt of the test probably cannot be reduced to resolve any action or phenomenon whatsoever, using one human test subject and hearing. It must be done, instead, with more subjects, acquiring an extensive data set or sets, and statistically analyzing and properly weighting them by methods defined in the protocol. A dichotomous, deterministic result using one prescribed test, and one subject, demanding a perfect score will [probably, or even certainly per PianoTeacher] fail.)
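One common way to analyze a larger data set is a one-sided binomial tail: the chance that pure guessing would produce at least the observed number of correct calls. The sketch below assumes independent two-choice trials with a 50% guessing rate; the name `tail_probability` is hypothetical, and pooling several subjects this way ignores differences between listeners, which a real protocol would have to weight or model.

```python
from math import comb

def tail_probability(hits, trials, p_guess=0.5):
    """P(at least `hits` correct out of `trials`) for a listener who only guesses."""
    return sum(
        comb(trials, k) * p_guess ** k * (1 - p_guess) ** (trials - k)
        for k in range(hits, trials + 1)
    )

print(tail_probability(10, 10))  # perfect score by luck: 1 in 1024
print(tail_probability(9, 10))   # nine or more by luck: 11 in 1024
print(tail_probability(60, 100)) # 60 or more of 100 pooled trials by luck
```

A pass criterion stated as "tail probability below some agreed threshold" tolerates an occasional human error while still making a lucky pass unlikely, and more trials let the threshold be met at a lower hit rate.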
As I said, though ABX *helps* Wellfed to have a greater chance of succeeding, it is obvious from his own resistance, reported to the forum, that it is outside of his experience and he won't undertake to do it. Maybe he will sometimes try it, away from here, and will get acquainted with the powerful abilities he can gain via enhanced focus leading to lower error rates and higher confidence.
Now I am running out of time here, and I admit that though I've tried to refine the explanation above, I may not have done it perfectly and may be overlooking other possibilities, conflating things, or suggesting non-practical distinctions that somebody can think through. However, these ideas came to me as I crafted speculative examples of alternative test methods. Some of these I'd done with at least SOME similarities to the examples I've written here; some I've not done but have only read about in scientific papers or audio journals.
Diogenes, I am sure -- as I said before -- that it is likely for you to find my answers obscurantist. I infer this from the way you have parsed almost every explanation I've given, for the ones that you have deigned to critique.
I tend to conclude that a debate between Diogenes and myself cannot be resolved and considered PROVED by either Diogenes or myself -- at least, I never quite become convinced that he actually corrects me, and since he systematically finds fault with my logic, it either isn't "logic" but rather junk; or it IS logic and Diogenes' analysis is incomplete or even wrong. (I do hope that this is fairly neutral and non-egocentric.)
My operative hypothesis is, then, that we don't communicate efficiently and with sufficient commonality (and that maybe somewhere I am making errors but cannot perceive them.) The fault is certainly just as much MINE as I might allege it is his; he might see it as ALL being my failure; he might also allow for the responsibility of failure to communicate being on a continuum with each of us sharing a part of the blame; and so on: we can now proceed to argue WHERE we want to adjust the slider... but I'd rather pass. I mean to say this with all good collegial respect for Diogenes!
PianoTeacher