
EXCERPTS FROM US PREVENTIVE SERVICES TASK FORCE GUIDES TO
CLINICAL PREVENTIVE SERVICES.  READ AS
PREPARATION FOR STEVE MARTIN’S LECTURE ON SCREENING STATISTICS

 

Excerpt from the Methods section of the Guide to Clinical Preventive Services,
Second Edition

The methodologic issues involved in evaluating screening tests require
further elaboration. As mentioned above, a screening test must satisfy two
major requirements to be considered effective:

  • The test must be able to
    detect the target condition earlier than without screening and with
    sufficient accuracy to avoid producing large numbers of false-positive and
    false-negative results (accuracy of screening test).
  • Screening for and treating
    persons with early disease should improve the likelihood of favorable
    health outcomes (e.g., reduced disease-specific morbidity or mortality)
    compared to treating patients when they present with signs or symptoms of
    the disease (effectiveness of early detection).

These two requirements of screening
are essential and therefore appear as headings in each of the 53 screening
chapters in this report.

Accuracy of Screening Tests.

The term “accuracy of a screening
test” is used in this report to describe both accuracy and reliability.
Accuracy is measured in terms of two indices: sensitivity and specificity (Table 2). Sensitivity refers to the proportion of persons
with a condition who correctly test “positive” when screened. A test
with poor sensitivity will miss cases (persons with the condition) and will
produce a large proportion of false-negative results: true cases will be told
incorrectly that they are free of disease. Specificity refers to the proportion
of persons without the condition who correctly test “negative” when
screened. A test with poor specificity will result in healthy persons being
told that they have the condition (false positives). An accepted reference
standard (“gold standard”) is essential to the empirical
determination of sensitivity and specificity, because it defines whether the
disease is present and therefore provides the means for distinguishing between
“true” and “false” test results.
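These two indices can be computed directly from a 2×2 table of test results
against the reference standard. The sketch below uses illustrative counts (not
taken from Table 2) for a test with 90% sensitivity and 90% specificity:

```python
# Illustrative 2x2 counts (hypothetical, not from Table 2);
# disease status is defined by the gold standard.
true_positives = 90     # have the condition, test positive
false_negatives = 10    # have the condition, test negative (missed cases)
false_positives = 990   # condition-free, test positive
true_negatives = 8910   # condition-free, test negative

# Sensitivity: proportion of persons WITH the condition who test positive.
sensitivity = true_positives / (true_positives + false_negatives)
# Specificity: proportion of persons WITHOUT the condition who test negative.
specificity = true_negatives / (true_negatives + false_positives)

print(f"sensitivity = {sensitivity:.0%}")  # 90%
print(f"specificity = {specificity:.0%}")  # 90%
```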

The use of screening tests with
poor sensitivity and/or specificity is of special significance to the clinician
because of the potentially serious consequences of false-negative and
false-positive results. Persons who receive false-negative results may
experience important delays in diagnosis and treatment. Some might develop a
false sense of security, resulting in inadequate attention to risk-reducing
behaviors and delays in seeking medical care when warning symptoms become
present.

False-positive results can lead to
follow-up testing that may be uncomfortable, expensive, and, in some cases,
potentially harmful. If follow-up testing does not disclose the error, the
patient may even receive unnecessary treatment. There may also be psychological
consequences. Persons informed of an abnormal medical test that is falsely
positive may experience unnecessary anxiety until the error is corrected.
Labeling individuals with the results of screening tests may affect behavior; for
example, studies have shown that some persons with hypertension identified
through screening may experience altered behavior and decreased work
productivity.2,3

A proper evaluation of a screening
test result must therefore include a determination of the likelihood that the
patient has the condition. This is done by calculating the positive
predictive value
(PPV) of test results in the population to be screened (Table 2). The PPV is the proportion of positive test results
that are correct (true positives). For any given sensitivity and specificity,
the PPV increases and decreases in accordance with the prevalence of the target
condition in the screened population. If the target condition is sufficiently
rare in the screened population, even tests with excellent sensitivity and
specificity can have low PPV in these settings, generating more false-positive
than true-positive results. This mathematical relationship is best illustrated
by an example (see Table 3):

A population of 100,000 in which the prevalence of a
hypothetical cancer is 1% would have 1,000 persons with cancer and 99,000
without cancer. A screening test with 90% sensitivity and 90% specificity would
detect 900 of the 1,000 cases, but would also mislabel 9,900 healthy persons.
Thus, the PPV (the proportion of persons with positive test results who
actually had cancer) would be 900/10,800, or 8.3%. If the same test were
performed in a population with a cancer prevalence of 0.1%, the PPV would fall
to 0.9%, a ratio of 111 false positives for every true case of cancer detected.
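The arithmetic in this example is easy to verify. The helper below (a
hypothetical function written for this handout, not from the Guide) reproduces
the figures in Table 3:

```python
def ppv(prevalence, sensitivity, specificity, population=100_000):
    """Positive predictive value: true positives / all positive results."""
    diseased = population * prevalence
    healthy = population - diseased
    true_pos = diseased * sensitivity        # cases detected
    false_pos = healthy * (1 - specificity)  # healthy persons mislabeled
    return true_pos / (true_pos + false_pos)

# 1% prevalence: 900 true positives vs 9,900 false positives
print(f"{ppv(0.01, 0.9, 0.9):.1%}")   # 8.3%
# 0.1% prevalence: 90 true positives vs 9,990 false positives (111:1)
print(f"{ppv(0.001, 0.9, 0.9):.1%}")  # 0.9%
```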

Reliability (reproducibility), the ability of a test to obtain the same result
when repeated, is another important consideration in the evaluation of
screening tests measuring continuous variables (e.g., cholesterol level). A
test with poor reliability, whether due to differences in results obtained by
different individuals or laboratories (interobserver variation) or by the same
observer (intraobserver variation), may produce individual test results that
vary widely from the
correct value, even though the average of the results approximates the true
value.

Effectiveness of Early Detection.

Even if the test accurately detects
early-stage disease, one must also question whether there is any benefit to the
patient in having done so. Early detection should lead to the implementation of
clinical interventions that can prevent or delay progression of the disorder.
Detection of the disorder is of little clinical value if the condition is not
treatable. Thus, treatment efficacy is fundamental for an effective screening
test. Even with the availability of an efficacious form of treatment, early
detection must offer added benefit over conventional diagnosis and treatment if
screening is to improve outcome. The effectiveness of a screening test is
questionable if asymptomatic persons detected through screening have the same
health outcome as those who seek medical attention because of symptoms of the
disease. Studies of the effectiveness of cancer screening tests, for example,
can be influenced by lead-time and length biases.

Lead-Time and Length Bias.

It is often difficult to determine
with certainty whether early detection truly improves outcome, an especially common
problem when evaluating cancer screening tests. For most forms of cancer,
5-year survival is higher for persons identified with early-stage disease.
Such data are often interpreted as evidence that early detection of cancer is
effective, because death due to cancer appears to be delayed as a result of
screening and early treatment. Survival data do not constitute true proof of
benefit, however, because they are easily influenced by lead-time bias:
survival can appear to be lengthened when screening simply advances the time of
diagnosis, lengthening the period of time between diagnosis and death without
any true prolongation of life.4

Length bias can also result
in unduly optimistic estimates of the effectiveness of cancer screening. This
term refers to the tendency of screening to detect a disproportionate number of
cases of slowly progressive disease and to miss aggressive cases that, by
virtue of rapid progression, are present in the population only briefly. The
“window” between the time a cancer can be detected by screening and
the time it will be found because of symptoms is shorter for rapidly growing cancers,
so they are less likely to be found by screening. As a result, persons with
aggressive malignancies will be underrepresented in the cases detected by
screening, and the patients found by screening may do better than unscreened
patients even if the screening itself does not influence outcome. Due to this
bias, the calculated survival of persons detected through screening could
overestimate the actual effectiveness of screening.4

Assessing Population Benefits.

Although these considerations
provide necessary information about the clinical effectiveness of preventive
services, other factors must often be examined to obtain a broader picture of
the potential health impact on the population as a whole. Interventions of
only minor effectiveness in terms of relative risk may have significant impact
on the population in terms of attributable risk if the target condition is
common and associated with significant morbidity and mortality. Under these
circumstances, a highly effective intervention (in terms of relative risk) that
is applied to a small high-risk group may save fewer lives than one of only
modest clinical effectiveness applied to large numbers of affected persons (see Table 4). Failure to consider these epidemiologic
characteristics of the target condition can lead to misconceptions about
overall effectiveness.
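To make this concrete (Table 4 is not reproduced here, so the numbers below
are hypothetical): an intervention that halves mortality in a high-risk group
of 1,000 persons with 10% baseline mortality averts fewer deaths than one that
reduces mortality by only 10% across a million persons with 1% baseline
mortality:

```python
def lives_saved(group_size, baseline_mortality, relative_risk_reduction):
    """Deaths averted = expected deaths x relative risk reduction."""
    return group_size * baseline_mortality * relative_risk_reduction

# Highly effective intervention confined to a small high-risk group:
high_risk = lives_saved(1_000, 0.10, 0.50)
# Modestly effective intervention applied to a large population:
population_wide = lives_saved(1_000_000, 0.01, 0.10)

print(round(high_risk), round(population_wide))  # 50 1000
```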

Potential adverse effects of
interventions must also be considered in assessing overall health impact, but
often these effects receive inadequate attention when effectiveness is
evaluated. For example, the widely held belief that early detection of disease
is beneficial leads many to advocate screening even in the absence of
definitive evidence of benefit. Some may discount the clinical significance of
potential adverse effects. A critical examination will often reveal that many
kinds of testing, especially among ostensibly healthy persons, have potential
direct and indirect adverse effects. Direct physical complications from test
procedures (e.g., colonic perforation during sigmoidoscopy), labeling and
diagnostic errors based on test results (see above), and increased economic
costs are all potential consequences of screening tests. Resources devoted to
costly screening programs of uncertain effectiveness may consume time,
personnel, or money needed for other more effective health care services. The
USPSTF considers potential adverse effects clinically relevant and always
evaluates them along with potential benefits in determining whether a
preventive service should be recommended.

 

EXCERPTED FROM “CANCER SCREENING” ON-LINE MODULE

Cancer is the second leading cause of death in the United States. The major
intent of cancer screening is to reduce morbidity and mortality, typically by
detecting disease before it becomes clinically apparent. However, earlier
detection does not necessarily result in decreased morbidity and mortality. Because the test
detects cancer before it is clinically apparent, the cancer is usually detected
at an earlier stage, and a stage shift is said to occur. Although earlier
diagnosis is intuitively appealing, if no effective treatment is available,
detection at an earlier stage will be of no real benefit. Although 5-year survival
may appear to increase (referred to as lead-time bias, discussed below),
earlier detection will be of no real benefit if no effective treatment is
available or if treatment does not affect outcomes. This issue has been raised
with several screening tests, including mammography and prostate cancer
screening. Five-year survival rates will also appear to increase when there is
increased detection of indolent types of cancer, although there is no real
benefit to those patients who have the aggressive cancer form. The best standard
by which to judge a screening test is reduction in disease-specific mortality
rate. 

Because screening implies subjecting an apparently healthy population to
testing, rigorous criteria should be applied in assessment of any new screening
test. Criteria for screening include that the disease should be
medically important and should cause significant morbidity and/or mortality.
The disease should be detectable at a pre-clinical phase, and have a sufficient
window of time before becoming clinically apparent (to allow a time for
screening to occur). The test should be accessible and acceptable to the
general public and have a low complication rate, since it will be applied to
large numbers of patients.

 

Sensitivity, specificity, positive predictive value and negative
predictive value

The test characteristics are another crucial component in determining the
usefulness of a screening test. The sensitivity of a test, its ability to
correctly identify those who have the disease (i.e., of those with the disease,
what percentage tests positive), should be adequate to detect a reasonable
portion of disease. Sensitivity can be remembered by the mnemonic: sensitivity
= PID = positive in disease. Tests that screen for cancers with a short
preclinical phase require a higher sensitivity, while tests for cancers with a
long preclinical phase may have low sensitivity, since testing can be repeated
multiple times. The specificity of a test (i.e. the ability to correctly
identify those who do not have the disease), should be high for all screening
tests to reduce the number of false-positive results. Specificity can be
remembered by the mnemonic: specificity = NIH = negative in health.

Positive and negative predictive values refer to test results, and address
the likelihood that a person with a positive test result has disease, or a
person with a negative test result does not have disease. The positive
predictive value of a test refers to the proportion of those patients who test
positive that actually have the disease (i.e. true positives rather than false
positives). The positive predictive value depends heavily on the prevalence of
the disease in the population. When a disease is infrequent, even tests with
high specificity will have a poor positive predictive value.

The negative predictive value of a test refers to the proportion of patients
who test negative that truly do not have the disease (i.e. true negatives
rather than false negatives). Negative predictive value is best when disease
prevalence is low.
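Both predictive values follow from the same 2×2 arithmetic, so their opposite
dependence on prevalence is easy to demonstrate. A sketch, assuming a test
with 90% sensitivity and 90% specificity:

```python
def predictive_values(prevalence, sensitivity=0.90, specificity=0.90):
    """Return (PPV, NPV) for a screened population of unit size."""
    tp = prevalence * sensitivity              # true positives
    fn = prevalence * (1 - sensitivity)        # false negatives
    fp = (1 - prevalence) * (1 - specificity)  # false positives
    tn = (1 - prevalence) * specificity        # true negatives
    return tp / (tp + fp), tn / (tn + fn)

# As prevalence falls, PPV deteriorates while NPV improves.
for prev in (0.10, 0.01, 0.001):
    ppv, npv = predictive_values(prev)
    print(f"prevalence {prev:.1%}: PPV {ppv:.1%}, NPV {npv:.2%}")
```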

 

Lead-time bias and length-time bias

Two types of bias must be considered in evaluating screening tests:
lead-time bias and length-time bias. If a screening test detects a tumor at an
early stage, but the cancer remains incurable, it will appear as if screening
has increased survival time merely by finding the tumor earlier. For example,
suppose a new ultrasound screening test for pancreatic cancer was developed,
and a group of adults over the age of 50 underwent ultrasound screening every
three months. In the unscreened population, life expectancy for pancreatic
cancer is six months. In the screened population, tumors were found three
months earlier, but were already inoperable. In this screened group, life
expectancy will be nine months, and it will appear that screening increases
life expectancy by three months (or 50%), although there has been no benefit to
the screened population. This type of bias is known as lead-time bias.
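The example's arithmetic, made explicit: death occurs at the same time in both
groups, so measured survival grows by exactly the lead time.

```python
# Figures from the pancreatic-cancer example above (months).
unscreened_survival = 6  # from clinical diagnosis to death
lead_time = 3            # screening advances diagnosis by this much

# Diagnosis moves earlier but death does not, so only the measured
# interval from diagnosis to death is stretched.
screened_survival = unscreened_survival + lead_time
apparent_gain = screened_survival / unscreened_survival - 1

print(screened_survival, f"{apparent_gain:.0%}")  # 9 50%
```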

Another type of bias is caused by the heterogeneous nature of cancer. Consider
two different types of prostate cancer: an aggressive, rapidly-fatal disease
that progresses from no disease to symptomatic to death in three years; and a
relatively slow-growing type that takes seven years before becoming clinically
apparent, if at all. Suppose that a prostate ultrasound screening test is
implemented in a sample population, to be performed at five-year intervals. In
the screened population, many slow-growing prostate cancers will be found (as
survival is seven years), while many of those with aggressive prostate cancer
will have gone from no disease to death between the five year screening
intervals. Many of those indolent cases will go undetected in the unscreened
group, and survival from prostate cancer in the unscreened group will be
shorter, as those with aggressive disease will predominate. The bias introduced
by screening, in which slow-growing tumors are detected and appear to lengthen
the survival time, while aggressive tumors causing death between screening
intervals are missed, is called length-time bias.
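This selection effect can be simulated. The sketch below assumes, simplifying
the prostate example, that a tumor is screen-detected only if a screen falls
within its preclinical window, that the full 7- and 3-year courses are
detectable windows, and that tumor onset is uniform relative to the five-year
screening schedule:

```python
import random

random.seed(0)
SCREEN_INTERVAL = 5.0  # years between screens, as in the example

def detected_by_screening(preclinical_window):
    """With onset uniform over the screening cycle, a tumor is caught
    iff the next screen arrives before its window closes."""
    time_to_next_screen = random.uniform(0, SCREEN_INTERVAL)
    return time_to_next_screen < preclinical_window

N = 10_000
slow = sum(detected_by_screening(7.0) for _ in range(N))        # 7-year window
aggressive = sum(detected_by_screening(3.0) for _ in range(N))  # 3-year window

# Slow-growing tumors are always caught (window exceeds the interval);
# aggressive ones are caught only ~60% of the time, so screen-detected
# cases are skewed toward indolent disease.
print(slow / N, aggressive / N)
```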

Because of lead-time and
length-time bias, survival time should not be used as a surrogate outcome for
mortality reduction. One simple way to reduce the potential impact of lead-time
bias is to look at prospective studies with prolonged follow-up. The longer the
follow-up, the less likely lead-time will substantially affect observed
differences. However, this does not allow you to quantify the impact of any
lead-time bias; rather, it allows you to assume that the impact of the bias is
less. The only way to avoid these biases completely is to perform randomized
controlled trials and use mortality as the outcome.