DESIGNER CONTINGENCY TABLE ANALYSIS.

G. William Moore, MD, PhD.
http://www.netautopsy.org/desgnrct.htm


Send comments and correspondence to: George.Moore4@med.va.gov
See also: http://www.medparse.com/gwmcv .............

INTRODUCTION.



In its simplest form, a CONTINGENCY TABLE, also known as a MISCLASSIFICATION MATRIX or CONFUSION MATRIX, is a 2×2 table or matrix (2 rows, 2 columns), with NUMBERS OF PATIENTS listed in each row-column box, or CELL, of the table (Cios, 2006). For example, one thousand patients might be distributed as follows:

                  Heuristic:
Gold Standard ↓   No     Yes
No                650    150
Yes               150     50


This is a 2×2 contingency table (2×2CT) with 650 patients in the upper-left cell, 150 patients in the upper-right cell, 150 patients in the lower-left cell, and 50 patients in the lower-right cell, a total of 1000 patients. By convention, we denote each cell count as <i,j>, where i denotes the row and j denotes the column. Then <upper,left> = 650; <upper,right> = 150; <lower,left> = 150; and <lower,right> = 50.

It is convenient to speak of the rows as the GOLD STANDARD, or NATURE, or TRUTH (our best knowledge of truth), and of the columns as the HEURISTIC TEST. Then, in the example, 650 patients have gold-standard-no, heuristic-no; 150 patients have gold-standard-no, heuristic-yes; 150 patients have gold-standard-yes, heuristic-no; and 50 patients have gold-standard-yes, heuristic-yes.

Typically, it is more expensive and time-consuming to obtain the gold-standard than the heuristic. Therefore, the purpose of performing statistical analysis on a contingency table may be to determine whether the heuristic might serve as a reasonable substitute for the gold standard.

For example, the gold standard may be a complete medical workup for a disease, and the heuristic may be the concentration of a particular serum chemical. Although the gold standard may be determined in a medical research project, it may be completely impractical in the care of a particular patient, as, for example, when it requires major surgery or autopsy. Thus the heuristic may serve as a substitute for an ethically or economically impractical gold standard. With this interpretation, the upper-left and lower-right cells represent agreement between the gold standard and the heuristic; 150 patients represent false-positives (gold-standard-no, heuristic-yes); and 150 patients represent false-negatives (gold-standard-yes, heuristic-no).

Other important features of a 2×2CT are MARGINAL TOTALS (i.e., ROW TOTALS and COLUMN TOTALS) and the GRAND TOTAL. In the example:

                  Heuristic:
Gold Standard ↓   No     Yes    Total
No                650    150    800
Yes               150     50    200
Total             800    200    1000


where the upper-row total is 800; the lower-row total is 200; the left-column total is 800; the right-column total is 200; and the grand total is 1000.

We can label the cell counts with letters at the beginning of the alphabet, the marginal totals with letters near the end of the alphabet, and the grand total as z, where a+b=v; c+d=w; a+c=x; b+d=y; and v+w=x+y=z.

                  Heuristic:
Gold Standard ↓   No     Yes    Total
No                a      b      v
Yes               c      d      w
Total             x      y      z


In the example, the upper-left cell is a=650; the upper-right cell is b=150; the lower-left cell is c=150; the lower-right cell is d=50; the upper-row total is v=800; the lower-row total is w=200; the left-column total is x=800; the right-column total is y=200; and the grand total is z=1000.
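The letter relations above are easy to check mechanically. A minimal Python sketch, assuming the cell counts are supplied in the order a, b, c, d; the names `Margins` and `margins` are illustrative and not part of the original text:

```python
from typing import NamedTuple


class Margins(NamedTuple):
    v: int  # upper-row total, a + b
    w: int  # lower-row total, c + d
    x: int  # left-column total, a + c
    y: int  # right-column total, b + d
    z: int  # grand total, v + w (which also equals x + y)


def margins(a: int, b: int, c: int, d: int) -> Margins:
    """Marginal totals and grand total of the 2x2 table [[a, b], [c, d]]."""
    v, w = a + b, c + d   # row totals
    x, y = a + c, b + d   # column totals
    return Margins(v, w, x, y, v + w)


print(margins(650, 150, 150, 50))
# Margins(v=800, w=200, x=800, y=200, z=1000)
```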

In some investigations, it may be clearer to regard each cell as a CONTAINER, or SET, containing PATIENTS, drawn as 人 ("ren", the Chinese ideogram for person), or TOKENS corresponding to patients. In the following smaller example, there is one token for every ten patients in the main example:

.NoYesTotal
No 人人人人人
人人人人人
人人人人人
人人人人人
人人人人人
人人人人人
人人人人人
人人人人人
人人人人人
人人人人人
人人人人人
人人人人人
人人人人人
人人人人人
人人人人人
人人人人人
80
Yes 人人人人人
人人人人人
人人人人人
人人
人人人
20
Total8020100


The picture above carries the same information as the numbers in the cells, displayed differently. The statistical analysis may then consist of moving tokens between the cells (the misclassification paradigm).
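A hypothetical Python sketch of the token picture and of one way to move tokens. It scales the main example to one token per ten patients and shifts tokens along the diagonal (from b into a and from c into d). In a 2×2 table this is the only kind of move that leaves every row and column total unchanged. This illustrates the token idea only; it is not the token swap test of Moore et al., and all names are assumptions:

```python
TOKEN = "人"  # one token stands for a group of patients


def to_tokens(counts, patients_per_token=10):
    """Scale cell counts (a, b, c, d) down to token counts."""
    return tuple(n // patients_per_token for n in counts)


def margins(table):
    """Row totals (v, w) and column totals (x, y) of a 2x2 table (a, b, c, d)."""
    a, b, c, d = table
    return (a + b, c + d, a + c, b + d)


def move_tokens(table, k):
    """Move k tokens from b into a and k tokens from c into d (negative k moves them back).

    Every row and column total is unchanged; a move that would make
    any cell negative is rejected."""
    a, b, c, d = table
    moved = (a + k, b - k, c - k, d + k)
    if min(moved) < 0:
        raise ValueError(f"cannot move {k} tokens: a cell would become negative")
    return moved


def show(table):
    """Print each cell as a row of tokens, followed by its count."""
    for name, n in zip("abcd", table):
        print(f"{name}: {TOKEN * n} ({n})")


tokens = to_tokens((650, 150, 150, 50))     # (65, 15, 15, 5)
show(tokens)
shifted = move_tokens(tokens, 5)            # (70, 10, 10, 10)
show(shifted)
print(margins(tokens) == margins(shifted))  # True
```

Because the four marginal totals are fixed, the single number k (equivalently, the new value of a) determines the whole rearranged table.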

PARADIGMS OF CLASSICAL STATISTICS.



In classical statistics (the chisquare (χ²) contingency test or the Fisher exact test), the question typically asked by the statistical analysis is whether the gold standard influences the heuristic. That is, the NULL HYPOTHESIS asserts that the heuristic is completely independent of the gold standard. Under this null hypothesis, the EXPECTED CELL TOTAL for each cell is the product of its row total and its column total, divided by the grand total. If we denote A as the expected value for a, B as the expected value for b, and so on, then A = (v×x)/z; B = (v×y)/z; C = (w×x)/z; and D = (w×y)/z.
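Applied to the example (v = 800, w = 200, x = 800, y = 200, z = 1000), these formulas give the following expected counts. The arithmetic is shown only as a worked check; note that the expected table keeps the same marginal totals as the observed one.

```latex
\begin{aligned}
A &= \frac{v\,x}{z} = \frac{800 \times 800}{1000} = 640, &\quad B &= \frac{v\,y}{z} = \frac{800 \times 200}{1000} = 160,\\
C &= \frac{w\,x}{z} = \frac{200 \times 800}{1000} = 160, &\quad D &= \frac{w\,y}{z} = \frac{200 \times 200}{1000} = 40.
\end{aligned}
```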

In the chisquare contingency test, the calculated value of chisquare, denoted χ²calc, is the sum of (observed - expected)²/expected over the four cells:

$$\chi^2_{calc} = \frac{(a-A)^2}{A} + \frac{(b-B)^2}{B} + \frac{(c-C)^2}{C} + \frac{(d-D)^2}{D}$$

The calculated chisquare, χ²calc, is compared against an idealized chisquare distribution, χ²ideal, at a particular level of significance, p, say p = 0.05 (the traditional cutoff in statistics). If χ²calc > χ²ideal at that level of significance, then we REJECT THE NULL HYPOTHESIS. A 2×2CT has one degree of freedom, and the value of χ²ideal for p = 0.05 is 3.841. In the example, χ²calc = (650-640)²/640 + (150-160)²/160 + (150-160)²/160 + (50-40)²/40 = 0.15625 + 0.625 + 0.625 + 2.5 = 3.90625. Since χ²calc = 3.90625 > 3.841 = χ²ideal, we reject the null hypothesis at the p = 0.05 level of significance.
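A minimal Python sketch of the same calculation, assuming the Pearson statistic without continuity correction (the form given above) and one degree of freedom. The function names are hypothetical. Only the standard library is used: with one degree of freedom, chisquare is the square of a standard normal variable, so the critical value is the square of the 97.5th normal percentile and the upper-tail probability is erfc(√(χ²/2)).

```python
import math
from statistics import NormalDist


def expected_counts(a, b, c, d):
    """Expected cells (A, B, C, D) under independence: row total * column total / grand total."""
    v, w = a + b, c + d   # row totals
    x, y = a + c, b + d   # column totals
    z = v + w             # grand total
    return (v * x / z, v * y / z, w * x / z, w * y / z)


def chi_square(a, b, c, d):
    """Pearson chisquare for a 2x2 table, no continuity correction."""
    observed = (a, b, c, d)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected_counts(a, b, c, d)))


def chi_square_p_value_1df(stat):
    """Upper-tail probability P(X > stat) for chisquare with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Critical value at p = 0.05, 1 degree of freedom (about 3.841).
critical = NormalDist().inv_cdf(0.975) ** 2

stat = chi_square(650, 150, 150, 50)  # 3.90625 for the example table
print(f"chisquare = {stat:.5f}, critical = {critical:.3f}, p = {chi_square_p_value_1df(stat):.4f}")
print("reject the null hypothesis" if stat > critical else "do not reject the null hypothesis")
```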

In the Fisher exact test, we calculate an F-probability for every value of a from its minimum to its maximum, holding the marginal totals fixed. In the example, the minimum is a = 600 and the maximum is a = 800, because all of a, b, c, and d must remain non-negative. For example, if a < 600, then in order for the marginal totals to remain unchanged, d < 0, which is meaningless. Similarly, if a > 800, then b < 0 and c < 0, which is likewise meaningless.
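The bounds follow directly from the marginal totals. Writing every other cell in terms of a (the general form in the last line is the standard result, stated here for convenience):

```latex
\begin{aligned}
b &= v - a = 800 - a, \qquad c = x - a = 800 - a, \qquad d = w - c = a - 600,\\
b \ge 0,\; c \ge 0 &\;\Rightarrow\; a \le 800, \qquad d \ge 0 \;\Rightarrow\; a \ge 600,\\
\text{in general:}\quad &\max(0,\; x - w) \le a \le \min(v,\; x).
\end{aligned}
```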

The F-probability for each table (that is, for each value of a) is given by the formula:
F = ...............
The probabilities of the observed table and of all tables more extreme than it are added to obtain the probability used to decide whether to reject the null hypothesis. For the example, ......................
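The formula is left blank in the text above. As a hedged illustration only, the Fisher exact test is conventionally based on the hypergeometric probability of a table with fixed margins, F = (v! w! x! y!)/(z! a! b! c! d!). The Python sketch below assumes that formula; the function names are hypothetical. It uses exact fractions, a one-sided p-value (tables whose a is at least the observed a, i.e. stronger agreement) and one common two-sided convention (all tables no more probable than the observed one):

```python
from fractions import Fraction
from math import comb


def table_probability(a, v, w, x):
    """Hypergeometric probability of the table with upper-left cell a,
    given row totals v, w and left-column total x (margins held fixed).
    C(v, a) * C(w, x - a) / C(z, x) equals v! w! x! y! / (z! a! b! c! d!)."""
    return Fraction(comb(v, a) * comb(w, x - a), comb(v + w, x))


def fisher_exact_2x2(a, b, c, d):
    """Return (probabilities keyed by a, one-sided p, two-sided p)."""
    v, w, x = a + b, c + d, a + c
    a_min, a_max = max(0, x - w), min(v, x)  # keeps b, c, d non-negative
    probs = {k: table_probability(k, v, w, x) for k in range(a_min, a_max + 1)}
    p_obs = probs[a]
    one_sided = sum(p for k, p in probs.items() if k >= a)
    two_sided = sum(p for p in probs.values() if p <= p_obs)
    return probs, one_sided, two_sided


probs, one_sided, two_sided = fisher_exact_2x2(650, 150, 150, 50)
print(f"a ranges over {min(probs)}..{max(probs)}")  # a ranges over 600..800
print(f"one-sided p = {float(one_sided):.4f}, two-sided p = {float(two_sided):.4f}")
```

Exact fractions avoid the floating-point tolerance that some implementations need when comparing table probabilities in the two-sided sum. If SciPy is available, `scipy.stats.fisher_exact` offers a ready-made version with an `alternative` parameter.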

But what if the heuristic predicts the gold standard only a little better than chance? Is rejecting independence really the right statistical question for the medical issue at hand?

DESIGNER NULL HYPOTHESIS PARADIGM.



Perhaps a more reasonable question might be: how far do the observations deviate from what we would see if the heuristic were a perfect measure of the gold standard? Consider the perfect case:

                  Heuristic:
Gold Standard ↓   No     Yes    Total
No                800      0    800
Yes                 0    200    200
Total             800    200    1000


In this case, there are no false-positives and no false-negatives. If this perfect table is used as the expected table, the chisquare contingency test is completely useless, since the expected values B and C are both zero, so the terms (b-B)²/B and (c-C)²/C require division by zero, which is meaningless. For similar reasons, the Fisher exact test yields a degenerate result.
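To make the division-by-zero problem concrete, a hedged Python sketch, assuming a designer null hypothesis is expressed as an expected table (A, B, C, D). The function name is hypothetical; it refuses any cell whose expected count is zero, and with the perfect table as the expectation, cells B and C trip that check:

```python
def chi_square_vs_expected(observed, expected):
    """Pearson chisquare of observed cells (a, b, c, d) against any expected cells (A, B, C, D).

    Raises ValueError if an expected count is zero, since (o - e)**2 / e is then undefined."""
    zero = [name for name, e in zip("ABCD", expected) if e == 0]
    if zero:
        raise ValueError("expected count is zero in cell(s) " + ", ".join(zero))
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


perfect = (800, 0, 0, 200)  # designer expectation: heuristic agrees perfectly with the gold standard
try:
    chi_square_vs_expected((650, 150, 150, 50), perfect)
except ValueError as err:
    print(err)  # expected count is zero in cell(s) B, C
```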

Finally, consider the case in which the heuristic has a maximum acceptable false-positive rate, fp, and a maximum acceptable false-negative rate, fn, say fp = 10% and fn = 0.5%. The rationale is that false-positive patients can be retested by a somewhat more expensive and/or painful test, whereas a false-negative patient does not return to his/her physician until the next regular screening interval (annual physical examination, or some such), by which time the disease may have progressed and become less treatable. Consider this situation:

                  Heuristic:
Gold Standard ↓   No     Yes    Total
No                796      4    800
Yes                20    180    200
Total             816    184    1000


For a screening test, such as serum prostate-specific antigen or gynecologic cytology, if you want to be brutally monetary, you can assign a dollar value to a false-negative patient (cost of delayed therapy, embarrassment to the medical institution, cost of lawsuits, etc.) and a dollar value to a false-positive patient (cost of an expensive or painful retest), and balance the two in order to create a DESIGNER CONTINGENCY TABLE.
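A minimal Python sketch of the monetary balance, assuming the only costs are those of false positives (cell b) and false negatives (cell c). The dollar figures and the function name are purely hypothetical placeholders, not values from the text:

```python
def misclassification_cost(a, b, c, d, cost_fp, cost_fn):
    """Total dollar cost of errors in a 2x2 table (rows = gold standard, columns = heuristic).

    b = false positives (gold No, heuristic Yes); c = false negatives (gold Yes, heuristic No).
    Correct classifications a and d are assumed to cost nothing extra."""
    return b * cost_fp + c * cost_fn


COST_FP = 200    # hypothetical cost of a retest, in dollars
COST_FN = 5000   # hypothetical cost of a delayed diagnosis, in dollars

tables = {
    "first example": (650, 150, 150, 50),
    "screening example": (796, 4, 20, 180),
}
for name, cells in tables.items():
    print(f"{name}: ${misclassification_cost(*cells, COST_FP, COST_FN):,}")
# first example: $780,000
# screening example: $100,800
```

The ratio COST_FN/COST_FP is the number of false positives one would accept in order to avoid a single false negative; raising it favors tables with fewer false negatives, even at the price of more false positives.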

The classical chisquare contingency test and Fisher exact test do not fit comfortably into this designer model, but the more flexible model is perhaps more realistic for medical applications. A further issue is that medical institutions are generally reluctant to publish their failure rates. Crude attempts at estimating and publishing such failure rates, conducted by the U.S. Health Care Financing Administration under penalty of non-payment for Medicare patients, were so unfair and unrewarding that they were abandoned after a few years. Remember that we are asking for IDEALIZED values of the false-positive rate, fp, and the false-negative rate, fn. To save face, one could arbitrarily set fn = 0 by fiat, and solve for the dollar ratio of the expensive retest versus the initial heuristic, in order to estimate fp. Then: ....................

TOKEN SWAP PARADIGM.



TOKEN SWAP TEST OF SIGNIFICANCE. A 2×2 contingency table (2×2CT), also known as a misclassification matrix or confusion matrix, is a 2×2 rectangular table whose cells contain numbers of patients, or tokens. The two rows (no versus yes) correspond to a gold standard, or best possible knowledge with respect to a particular disease; the two columns (no versus yes) correspond to a heuristic test for that disease. In classical statistics, one employs either the chisquare (χ²) contingency test or the Fisher exact test. Both classical tests have a standard null hypothesis, namely, that the gold standard is completely independent of the heuristic values. In the token swap test of significance, there is no fixed null hypothesis, and the user may construct a designer null hypothesis to custom-fit a particular medical application.

This method was used to examine pain crisis in sickle cell disease (keywords: Sickle cell crisis; Token swap test; Neyman-Pearson condition; Designer Contingency Table).
        No     Yes    Total
No      640    160    800
Yes     160     40    200
Total   800    200    1000

.NoYesTotal
No 人 人 人 人
人 人 人 人
人 人 10
Yes 人 人 人 人 人 人
人 人 人
9
Total8002001000


REFERENCES.



Cios KJ.
Assessment of the Generated Data Model.
2006. In press.

Moore GW, Hutchins GM, Miller RE.
Token swap test of significance for serial medical data bases.
Am J Med. 1986 Feb;80(2):182-190.
PMID: 3511687; UI: 86127353.

Moore GW, Hutchins GM, Miller RE.
A new paradigm for hypothesis testing in medicine, with examination of the Neyman Pearson condition.
Theor Med. 1986 Oct;7(3):269-282.
PMID: 3798393; UI: 87094863.

Last updated: 9/17/2005, by G. William Moore, MD, PhD.