NATIONAL CANCER INSTITUTE MONOGRAPH 67
May 1985

Selection, Follow-up, and Analysis in Prospective Studies: A Workshop

NIH Publication No. 85-2713

U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES
PUBLIC HEALTH SERVICE
NATIONAL INSTITUTES OF HEALTH
NATIONAL CANCER INSTITUTE, BETHESDA, MARYLAND 20205

NATIONAL CANCER INSTITUTE MONOGRAPHS
Vincent T. DeVita, Jr., Director, National Cancer Institute

The Editorial Board welcomes proposals for the publication of monographs. The subject matter must be relevant to cancer research, have long-term interest, appeal to a wide readership, and be of quality meeting the standards of the Journal of the National Cancer Institute. Most monographs report the proceedings of conferences. Proposals should be sent to the Editor in Chief as early as possible, preferably several months before a conference (for conference proceedings) or before preparation of a final draft (for other monographs). The Board of Editors does not review manuscripts for the Monograph Series. However, the Board may request scientific review of specific papers or sections in a proposed monograph, or it may seek advice on whether the proposed monograph meets the criteria mentioned above.

BOARD OF EDITORS
Peter Greenwald, Editor in Chief
Elizabeth K. Weisburger, Assistant Editor in Chief
Stuart A. Aaronson, Associate Editor
William J. Blot, Associate Editor
Michael J. Boyd, Associate Editor
Joseph W. Cullen, Associate Editor
Charles H. Evans, Associate Editor
Janet W. Hartley, Associate Editor
George S. Johnson, Associate Editor
Kurt W. Kohn, Associate Editor
Arthur S. Levine, Associate Editor
Lance A. Liotta, Associate Editor
John R. Ortaldo, Associate Editor
Jeffrey Schlom, Associate Editor
Richard M. Simon, Associate Editor
Jerome W. Yates, Associate Editor

INTERNATIONAL CANCER INFORMATION CENTER
Susan Molloy Hubbard, Director

PUBLICATIONS BRANCH
Jean Griffin Baum, Chief

EDITORIAL STAFF
Edwin A. Haugh, Managing Editor
Pamela T. Allen, Assistant Managing Editor
Florence I. Gregoric, Monograph Editor

All articles appearing in this monograph published by the Journal of the National Cancer Institute are in the public domain and may be reproduced or copied without requesting permission from the authors or the Editor in Chief. However, notice of intent to use material is appreciated.

For sale ONLY by the Superintendent of Documents, U.S. Government Printing Office, Washington, D.C. 20402.

Selection, Follow-up, and Analysis in Prospective Studies: A Workshop
Proceedings of a Conference held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983

Sponsored by: The American Cancer Society, Inc., National Office, New York, N.Y.
Scientific Editors: Lawrence Garfinkel, Oscar Ochs, Margaret Mushinski

TABLE OF CONTENTS

Dedication: In Appreciation of Dr. E. Cuyler Hammond (Lawrence Garfinkel)
Opening Remarks (William Haenszel)

Session I. Life Table Analysis
Underlying Theory of Actuarial Analyses (Bernard Benjamin)
Examples of Early Mortality Follow-up Studies (Richard B. Singer)
Early Studies of Tuberculosis (George W. Comstock)
Actuarial Contributions of Life Table Analysis (Edward A. Lew)
Discussion I

Session II. Chronic Disease Studies: Nonoccupational Cohorts
Chairman's Remarks (Abraham Lilienfeld)
Co-Chairman's Remarks (Ernst Wynder)
Selection, Follow-up, and Analysis in the American Cancer Society Prospective Studies (Lawrence Garfinkel)
Selection, Follow-up, and Analysis in the Atomic Bomb Casualty Commission Study (Seymour Jablon)
The Framingham Study: Sample Selection, Follow-up, and Methods of Analyses (Manning Feinleib)
Selection, Follow-up, and Analysis in the Health Insurance Plan Study: A Randomized Trial With Breast Cancer Screening (Sam Shapiro, Wanda Venet, Philip Strax, Louis Venet, and Ruth Roeser)
Discussion II

Session III. Chronic Disease Studies: Occupational Cohorts
Chairman's Remarks (Philip J. Landrigan)
Selection, Follow-up, and Analysis in the Birmingham Study (J. A. H. Waterhouse)
Selection, Follow-up, and Analysis in the Coke Oven Study (Howard E. Rockette and Carol K. Redmond)
Statistical and Practical Problems of Cohort Study Design: Occupational Hazards in the Health Care Industry (Jeanne M. Stellman)
Discussion III

Session IV. Methods of Follow-up and Classification Problems
Chairman's Remarks (Leonard Kurland)
Selection Factors in Cohort Studies (William J. Nicholson)
Use of Computerized Record Linkage in Follow-up Studies of Cancer Epidemiology in Canada (Geoffrey R. Howe)
Problems in Classification of Cancer for Epidemiologic Research (John W. Berg)
Rewards for Cancer Control With Use of Biologic Markers and End Points (Paul Kotin)
Co-Chairman's Remarks (Leon Gordis)
Discussion IV

Session V. Data Analysis in Cohort Studies
Chairman's Remarks (Steven D. Stellman)
Multivariate Cohort Analysis (Norman Breslow)
Matched Groups Analysis Method (E. Cuyler Hammond)
Strategies for Validation (Robert A. Lew)
Avoidance of Bias in Cohort Studies (Nathan Mantel)
Co-Chairman's Remarks (David Schottenfeld)
Discussion V

Session VI. Models of Chronic Disease
Chairman's Remarks (Nicholas Wald)
Cancer Risk Factors in Human Studies (John Higginson)
Biologic Banking in Cohort Studies, With Special Reference to Blood (Nicholas L. Petrakis)
Epidemiology and the Inference of Cancer Mechanisms (David G. Hoel)
Age at Exposure Versus Years of Exposure (Herbert Seidman)
Co-Chairman's Remarks (George Hutchison)
Discussion VI

Participants

Dedication: In Appreciation of Dr. E. Cuyler Hammond 1
Lawrence Garfinkel 2

Many of the world's most noted epidemiologists held a workshop with Dr. E. Cuyler Hammond in October 1983 to honor his major contributions and to present papers on and discuss the conduct of cohort studies. Dr.
Hammond's imagination and bold approach to science have been evident since the early days of his career with the American Cancer Society. With Dr. Daniel Horn, he was convinced that investigators could use volunteers to obtain base-line data for long-term epidemiologic studies and, furthermore, that those volunteers would be able to report the status of each person enrolled for several years. The first such study was launched in 1952, when 22,000 American Cancer Society volunteers in 10 states enrolled 187,000 men in a study of smoking habits and traced these men for 4 years. This landmark study, known as the "Hammond and Horn Study," was the first prospective study of smoking in the United States and provided important information about the influence of smoking habits on total death rates, as well as on death rates from coronary heart disease and lung cancer. As Sir Richard Doll has recently written in the Journal of the American Medical Association (251:2854-2857, 1984): "The success of this study opened up new vistas for epidemiologists and, although it is still possible to count on the fingers of one hand the studies that have subsequently involved larger numbers of subjects, no one now questions the practicability or value of conducting studies on this scale in appropriate circumstances." In the late 1950s, Dr. Hammond planned and organized a similar but much wider ranging epidemiologic study involving 68,000 volunteers in 25 states; over 1 million persons were enrolled in this study, which was named Cancer Prevention Study I. The subjects were traced over a 12-year period with a loss to follow-up of less than 29%. Many other types of data were collected besides smoking information, including family history, history of diseases, occupational exposures, diet, drinking habits, etc.
From this rich and varied data base, more than 80 papers have been published, including major studies on the dose-response effects of smoking, smoking and health in women, the effects on disease rates of cigarettes with reduced tar and nicotine, obesity and cancer, risk factors in breast and cervical cancer, factors related to heart disease and stroke, and many more. Cohort studies and chronic disease epidemiology gained wide acceptance because of the results of the Hammond-Horn Study and Cancer Prevention Study I. Nowadays, prospective studies of a similar though smaller scope are conducted routinely by investigators throughout the world. Many of the studies reported at this Workshop were modeled after Hammond's studies, and all were influenced by him in one way or another. Dr. Hammond realized early that public and scientific acceptance of the relationship of smoking to disease would be greatly increased if evidence from studies of tobacco use and histologic changes in human tissues supported the epidemiologic data. Thus, in 1955, he collaborated with pathologist Dr. Oscar Auerbach in designing and analyzing a series of studies of smoking and histologic changes in the lung, esophagus, larynx, and heart in specimens obtained at autopsy. These studies did indeed fully support the dose-response relationship with smoking found in human epidemiologic studies. Later, he joined Dr. Irving Selikoff in planning and executing pioneering occupational cohort studies. Investigations of lung cancer and other diseases among asbestos workers in relation to their exposure to asbestos were followed by studies of various occupational groups, such as roofers, cotton textile workers, and electrical workers. To all of his investigations, Dr. Hammond brought a breadth of vision, insight into possible confounders, and scrupulous integrity.
He was, and continues to be, concerned with the proper classification of variables, the smallest details of coding, the careful checking of results, the biologic and scientific merit of the findings, and the importance of the results from a public health point of view. At this Workshop, conducted October 3-5, 1983, in New York City, Dr. Hammond's contributions were noted. The papers presented described the design characteristics, selection factors, and problems in analysis of the major cohort studies undertaken in the United States in the last 30 years. Other speakers addressed issues relating to life insurance studies, problems of classification, methods of data analysis, and models of chronic disease. The wide range of papers presented by Dr. Hammond's peers and colleagues at this Workshop attests to the long-range soundness of his original study design, even as those papers exploit newer methods of statistical analysis that have evolved during the past three decades. Many of those statistical methods were developed precisely because of the expanding interest in prospective studies and the need for more effective analytic tools. These newer techniques have made it possible for us to account for more variables at one time, which has led to more sophisticated cause-effect interaction models of chronic disease; this is yet another example of the ripple effect of Dr. Hammond's seminal work.

1 Presented at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
2 Epidemiology and Statistics Department, American Cancer Society, 4 West 35th Street, New York, N.Y. 10001.

Opening Remarks

The proposal to hold a Workshop on Cohort Studies was made at a 1981 meeting of the American Cancer Society's Committee on Intramural Research.
This committee advises on studies conducted by the Society's epidemiology and statistics staff, several of which have been conducted in collaboration with Dr. Irving Selikoff (Mt. Sinai School of Medicine, New York City) and Dr. Oscar Auerbach (Veterans Administration Medical Center, East Orange, N.J.). The best known of these studies, and the centerpiece of work conducted over the past 20 years, is Cancer Prevention Study I, which involved the surveillance of a cohort of 1,000,000 men and women and was under the direction of Dr. E. Cuyler Hammond. At this same meeting, the committee also suggested that the success of this study had posed new questions for research that could best be answered by launching a new cohort study. The advice was accepted by the American Cancer Society staff, and Cancer Prevention Study II is now under way. No one can review the publications emanating from Cancer Prevention Study I without gaining a keen appreciation of the many contributions made by Dr. Hammond and his colleagues to the methodologic aspects of cohort studies and to the analysis of the resulting data. The proposal to hold this Workshop is timely because it provides us an opportunity to gain a fresh perspective on the contributions of the American Cancer Society staff to the subject of cohort studies. It is also opportune in view of the changing status of cohort studies relative to case-control studies. In the 1950s, epidemiologists used case-control techniques to investigate a variety of issues, the most important of which at the time was the relationship between tobacco use and lung cancer. Even though numerous case-control studies of smoking and lung cancer had been conducted, some skepticism emerged about their findings and interpretation. Many investigators still felt a distinct need to confirm, if you will, the findings in a cohort study.
This led to the initiation of 3 major cohort studies: Cancer Prevention Study I, the National Cancer Institute-Veterans Administration Study, and the British Physicians Study. As the methodology of case-control studies was developed and the points of congruence between the retrospective and prospective approaches were identified, the need to invoke cohort studies to replicate case-control findings diminished. Interestingly, the relationship between conjugated estrogens and endometrial cancer was detected and elaborated by a series of case-control studies conducted in rapid succession. I do not recall any serious proposals for conducting a prospective study to investigate this topic. The evidence from the case-control studies sufficed to reach a consensus that the observed association reflected a cause-effect relationship. Still, no one would claim that prospective studies are outmoded. Workshop participants may wish to consider whether the pendulum has swung too far in the direction of case-control studies and to define the circumstances in which cohort studies are indicated. This Workshop is also intended to serve another objective. A substantial literature exists bearing on the design, conduct, and analysis of cohort studies, but it is scattered throughout many journals specializing in contributions from actuaries, demographers, statisticians, and epidemiologists. No single source provides a comprehensive treatment. Part of this need should be met by a book prepared by Drs. Breslow and Day as a companion volume to their text on the analysis of case-control studies. The proceedings of this meeting may also serve as a valuable reference source. The presentations we will hear over the next 3 days are based on actual experience with earlier cohort studies and deal with the issues of selection, classification, and follow-up. These reports should provide useful guides to future investigators confronted with the design and conduct of a cohort study.
William Haenszel

SESSION I
Life Table Analysis
Chairman: Bernard Benjamin
Co-Chairman: Edward A. Lew

Underlying Theory of Actuarial Analyses 1
Bernard Benjamin 2

ABSTRACT: The developments in theory governing the calculation of mortality rates for use in survival measurements are reviewed, from the initial basic concept of exposure to risk through the later introduction of stochastic elements. I have indicated how actuaries and statisticians who work closely with those in the fields of medicine and biology have, by exchanging methodologic ideas, come to an identity of approach. Recent actuarial work and likely future developments in actuarial interests are reviewed. (Natl Cancer Inst Monogr 67: 7-14, 1985.)

THE ACTUARIAL ATTITUDE

The actuary owes his original professional existence to the introduction of life insurance and to the need for insurance underwriters to estimate both the risk that an event carrying financial liability to an insurance office will occur in a specific future interval and the likely length of time the insured will survive to continue paying premiums. The summations of these risks, combined with the monetary values involved and with discounting factors (which allow for the interest-earning capacity of those values), represent two sides of an equation that must be balanced if the insurance underwriter is to stay in business. Although the life table provides the obvious model for this calculation, the actuary has always loaded the premiums for profit and for the contingency of a poor outcome in mortality terms; consequently, he has been concerned with making conservative rather than precise estimates of the mortality rates from which the life table is constructed. Moreover, until comparatively recently, actuaries had the advantage of large bodies of data, either accumulated in their own offices over a period of several years or, later, provided by arrangements for pooling the data of a number of offices.
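The balance described above can be illustrated with a minimal sketch of the classical equation of value for a term insurance. Everything in it is an assumption for illustration: the death probabilities, the interest rate, and the benefit are hypothetical, premiums are taken as payable at the start of each year and the benefit at the end of the year of death, and no loading for profit or adverse mortality is included, so the result is a net rather than an office premium.

```python
def net_level_premium(q, interest, benefit):
    """Net level annual premium for a term insurance of len(q) years.

    q        -- one-year death probabilities q_x, q_{x+1}, ... (hypothetical)
    interest -- annual rate of interest used for discounting
    benefit  -- sum paid at the end of the year of death

    The premium P balances the two sides of the equation of value:
    P * EPV(1 per year paid in advance while alive) = EPV(death benefit).
    """
    v = 1.0 / (1.0 + interest)   # one-year discount factor
    alive = 1.0                  # probability of being alive at the start of year k
    annuity = 0.0                # expected present value of 1 paid at each year start
    benefits = 0.0               # expected present value of the death benefit
    for k, qk in enumerate(q):
        annuity += alive * v ** k
        benefits += alive * qk * benefit * v ** (k + 1)
        alive *= 1.0 - qk
    return benefits / annuity


# Hypothetical rates for five successive ages; not taken from any published table.
premium = net_level_premium([0.010, 0.011, 0.012, 0.013, 0.014], interest=0.04, benefit=1000.0)
print(f"net level premium: {premium:.2f}")
```

Loading this figure for profit and for the contingency of poor mortality, as the text describes, would raise it; the sketch deliberately stops at the unloaded balance.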
In the United Kingdom, the data for the Assured Lives Table for 1967-70 of the Continuous Mortality Investigation, maintained as a joint bureau by the actuarial bodies, provided information on over 1,000 deaths in every single-year age group except at the youngest and oldest ages. In such circumstances, stochastic elements of variation would not have been regarded as being of practical consequence, nor would departures from homogeneity, provided that observed rates of mortality progressed fairly smoothly with age within broad groupings of policyholders. The broad groups would, of course, distinguish between the 2 sexes, between those initially subject to medical examination and those accepted without examination, and between those insuring against early death and those with expectations of survival.

1 Presented at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
2 Tait Building, Room CM 514, Northampton Square, London EC1V 0HB, United Kingdom.

BASIC THEORY

However, the underlying theory was concerned with these sources of error. The theory was that, in a homogeneous population, i.e., one whose members are subject to the same risk factors to the same extent, the risk of death progresses smoothly with age, and that the overall risk of death over an interval of age can be estimated by observing such a group of lives over that interval. If $E_x$ lives are observed over the year of age $x$ to $x + 1$ (those who are lost to observation in the year, either by transfer to another homogeneous group or by leaving other than by death, being given fractional exposure proportional to their duration of observation) and $\theta_x$ deaths between ages $x$ and $x + 1$ are recorded, then $\theta_x / E_x$ is an estimate of $q_x$, the so-called probability of dying in the year of age $x$ to $x + 1$ [the concept $q_x$ (the probability of dying in a year) is not itself concerned with variation in mortality during the year].
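The estimate $\theta_x / E_x$ with fractional exposure can be sketched as follows. The record layout (entry age, exit age, whether the exit was a death) and the function name are hypothetical. Lives that leave observation other than by death contribute only the part of each year of age during which they were observed; a life that dies is credited with exposure to the end of its year of age of death, so that every counted death has a full year of exposure behind it and $\theta_x$ and $E_x$ correspond. Treating late entrants in the same way as withdrawals is an assumption of the sketch.

```python
import math
from collections import defaultdict


def crude_q_rates(records):
    """Crude q_x = theta_x / E_x by single year of age.

    records -- iterable of (entry_age, exit_age, died) tuples (hypothetical layout)
    """
    exposure = defaultdict(float)   # E_x, in years of exposure
    deaths = defaultdict(int)       # theta_x
    for entry, exit_age, died in records:
        end = exit_age
        if died:
            x_death = math.floor(exit_age)
            deaths[x_death] += 1
            end = x_death + 1       # credit exposure to the end of the year of death
        x = math.floor(entry)
        while x < end:
            # Portion of the year of age [x, x + 1) during which the life was observed.
            overlap = min(end, x + 1) - max(entry, x)
            if overlap > 0:
                exposure[x] += overlap
            x += 1
    return {x: deaths[x] / exposure[x] for x in sorted(exposure) if exposure[x] > 0}


records = [
    (60.0, 62.0, False),   # observed for two full years, alive at the end
    (60.5, 61.5, True),    # enters mid-year, dies at age 61.5
    (60.0, 60.25, False),  # withdraws after a quarter of a year
]
print(crude_q_rates(records))  # {60: 0.0, 61: 0.5}
```

Here $E_{60} = 1 + 0.5 + 0.25 = 1.75$ with no deaths, and $E_{61} = 2$ with one death. A computer makes this exact bookkeeping trivial, which is why the grouped exposed-to-risk approximations discussed below are now of less practical necessity.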
Other rates used by actuaries are the "force of mortality," $\mu_x$, which represents the rate of mortality over an infinitesimal interval of time (age) $dx$, so that it may be regarded as the intensity of mortality at the precise age $x$; and $m_x$, the central (average) rate of mortality over the interval of age $x$ to $x + 1$. If we put the life table function $l_x$ of survivors at exact age $x$ into continuous form, then

$$\mu_x = -\frac{d}{dx}\log_e l_x = -\frac{1}{l_x}\frac{dl_x}{dx}. \qquad (1)$$

On the basis of a smooth age progression of mortality, it may be assumed that $m_x = \mu_{x+\frac{1}{2}}$. Important principles deriving from this theory are that: 1) lives must go out of observation if for any reason they cease to be within the class observed, but not on death, the risk of which is being measured; and 2) "observation" means that, if dying, the subject would actually be counted as a death. There must be strict correspondence between $\theta_x$ and $E_x$: lives must be counted in $E_x$ for the proportion of the year of age $x$ to $x + 1$ during which, if they died, they would be included within the definition of $\theta_x$, and only for that proportion.

The Measurement of Exposure

Before the arrival of computers, it was impractical (with such large bodies of data) to count the precise exposure times within each rate interval attributable to individual life insurance policyholders, and it was necessary to resort to grouping and approximation. Actuaries had to develop so-called "exposed-to-risk formulas" to match the format of the data, and actuarial textbooks have hitherto devoted much space to this mental discipline, which, though now of less practical necessity, is still valuable in sharpening actuarial thinking (7). It was all part of the need to ensure, as nearly as possible, exact correspondence between the numerator and the denominator of the estimated death rate $q_x$.
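Returning to the definition $\mu_x = -\frac{d}{dx}\log_e l_x$ and the approximation $m_x = \mu_{x+\frac{1}{2}}$, both can be checked numerically. The sketch below assumes, purely for illustration, a Gompertz law $\mu_x = Bc^x$ with hypothetical parameters, builds $l_x$ from it, recovers $\mu_x$ by differentiating $\log_e l_x$, and computes $m_x = d_x / L_x$ with $L_x$, the years lived between ages $x$ and $x + 1$, obtained by the trapezoid rule.

```python
import math

# Hypothetical Gompertz parameters, chosen only for illustration: mu_x = B * C**x.
B, C = 5e-5, 1.1


def survivors(x, radix=100_000.0):
    """l_x under the assumed Gompertz law: l_x = radix * exp(-(B / ln C) * (C**x - 1))."""
    return radix * math.exp(-B / math.log(C) * (C ** x - 1.0))


def force_of_mortality(x, h=1e-4):
    """mu_x = -d/dx log_e l_x, approximated by a central difference of width 2h."""
    return -(math.log(survivors(x + h)) - math.log(survivors(x - h))) / (2 * h)


def central_rate(x, steps=1000):
    """m_x = d_x / L_x, where L_x (years lived between x and x + 1) uses the trapezoid rule."""
    d_x = survivors(x) - survivors(x + 1)
    width = 1.0 / steps
    L_x = sum(
        (survivors(x + i * width) + survivors(x + (i + 1) * width)) / 2 * width
        for i in range(steps)
    )
    return d_x / L_x


for age in (40, 60, 80):
    print(age, force_of_mortality(age), B * C ** age, central_rate(age), force_of_mortality(age + 0.5))
```

The first two numbers on each line agree, confirming the derivative form of $\mu_x$; the last two differ only slightly, which is the sense in which $m_x = \mu_{x+\frac{1}{2}}$ holds when mortality progresses smoothly with age.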
Now, of course, computers have made approximation unnecessary, but correct information must still be programmed into them, and the mental discipline remains important. Moreover, actuaries have become increasingly concerned with the investigation of smaller experiences, e.g., studies of the prognosis and insurability of persons with particular health impairments. In these circumstances, especially when there is much movement into and out of the experience, an actuary must take care to credit the exposure of each life within each interval over which a death rate, or its complement, the survival rate, is to be estimated.

Graduation

The underlying theory, as stated above, also leads to the assumption that (given that conditions of homogeneity have been satisfied) departures of the estimated values from a smooth progression with age may be regarded as sampling errors and that these errors may be eliminated by fitting a smooth curve through the observed values, a process termed "graduation." The actuarial interest in smoothness has a financial reason: it is undesirable for the derived insurance premiums to progress irregularly with age. Interest in graduation has been great since the earliest days of actuarial practice, and much textbook material is devoted to the suitability of different methods in varying situations (1). The 3 main methods of graduation used in actuarial practice are:

1) The graphic method, in which a hypothetical curve is drawn by inspection through the area bounded by the confidence intervals of the observed rates. Clearly, and perhaps rightly, an element of subjective judgment is present, because use is made of actuarial experience as to what a curve of deaths should look like.
The method is technically more difficult to apply than it may seem. It requires preliminary grouping of ages to eliminate large irregularities and thus gives an opportunity for prejudgment, and because a high degree of smoothing is not achieved, it is unlikely to improve tables based on extensive data that are already fairly smooth.

2) Summation or adjusted-average formulas, which depend on the principle that the standard error of the weighted mean of 2 or more independent (or imperfectly correlated) random errors is less than the sum of the correspondingly weighted individual standard errors. This is the well-known process of moving averages used extensively in time series analysis. These methods were introduced (as early as 1823 in the United Kingdom) at a time when the available calculating equipment was limited; even multiplications were basically performed as repeated additions (as in the early arithmometer), and consequently it was computationally highly advantageous to use more additions and fewer multiplications. The method is purely mechanical and, unlike the graphic method, does not require a highly skilled operator. However, it makes no allowance for individual judgment, so an irregularity that might be thought to be an essential feature of the experience cannot be retained. The method is satisfactory only when the ungraduated rates already progress fairly smoothly, i.e., for large experiences. A further disadvantage is that the rates at the ends of the age range covered have to be graduated by another method; recent work by Greville (2) dealt with this difficulty. The development of nonparametric methods, e.g., kernel smoothing, allows particular irregularities to be retained.

3) Curve-fitting methods, which are based on the assumption that the underlying values have a particular mathematical form, the parameters of which may be estimated from the observed values.
Graduations by the graphic or summation-formula methods must be tested both for smoothness and for adherence to data; a mathematical curve is smooth by construction and requires testing only for its error-reducing quality. The earliest example is that of Gompertz (3), who proposed $\mu_x = Bc^x$, though the purpose of this formula was not solely graduation. Makeham (4) added a constant to take account of the level of mortality from chance, as distinct from deterioration of the body, giving $\mu_x = A + Bc^x$. In recent times, the curves proposed have become more complicated, because most mortality experiences are not of pure cohorts but of mixed generations, and the age progression of rates reflects the historical (generation and secular) changes in social and economic conditions affecting mortality. In 1932 in the United Kingdom, Perks (5) produced some new formulas which at the time represented the most promising attempt to fit a single curve to the whole range of the table (there was a bulge at middle ages in the otherwise logistic rise in $\mu_x$). His principal formulas were:

$$\mu_x = \frac{A + Bc^x}{1 + Dc^x} \quad \text{and} \quad \mu_x = \frac{A + Bc^x}{Kc^{-x} + 1 + Dc^x}. \qquad (2)$$

For English Life Table No. 11 (1951), the then Government Actuary used a combination of a logistic and a symmetrical normal curve involving the estimation of 7 parameters. Heligman and Pollard (6) have covered the entire life-span with:

$$\frac{q_x}{p_x} = A^{(x+B)^C} + D\exp\left[-E(\log_e x - \log_e F)^2\right] + GH^x. \qquad (3)$$

They attempted to cover 3 distinct features: the mortality of a child adapting to his or her new environment, the mortality associated with the aging of the body, and the superimposed accident mortality. It is suggested that $A$ is almost the same as $q_1$ and that $C$ measures the rate of decline in mortality in early life; $G$ indicates the level of senescent mortality and $H$ the rate of increase in that mortality; $D$ represents the intensity of the accident hump, whereas $F$ indicates the location of the hump and $E$ its spread.
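The contrast between the summation and curve-fitting approaches can be sketched in a few lines. The first function is a symmetric weighted moving average; the weights 1, 2, 3, 2, 1 are an arbitrary illustrative choice, not a published summation formula, and, as the text notes for this method, the rates at both ends of the age range are left ungraduated. The second fits Gompertz's $\mu_x = Bc^x$ by ordinary least squares on $\log_e$ of the crude rates, treating each crude rate as an estimate of the force of mortality. That unweighted log-linear fit is a simplifying assumption (a practical graduation would weight by exposure), and Makeham's constant $A$ could not be estimated this way.

```python
import math


def moving_average_graduation(rates, weights=(1, 2, 3, 2, 1)):
    """Symmetric weighted moving average; the end values are left as None."""
    k = len(weights) // 2
    total = sum(weights)
    graduated = [None] * len(rates)
    for i in range(k, len(rates) - k):
        window = rates[i - k:i + k + 1]
        graduated[i] = sum(w * r for w, r in zip(weights, window)) / total
    return graduated


def fit_gompertz(ages, rates):
    """Least-squares fit of log(rate) = log(B) + x * log(c); returns (B, c)."""
    logs = [math.log(r) for r in rates]
    n = len(ages)
    x_bar = sum(ages) / n
    y_bar = sum(logs) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, logs))
    sxx = sum((x - x_bar) ** 2 for x in ages)
    slope = sxy / sxx
    return math.exp(y_bar - slope * x_bar), math.exp(slope)


# Hypothetical crude rates for ages 50-58, with deliberate irregularities.
ages = list(range(50, 59))
crude = [0.0070, 0.0081, 0.0083, 0.0099, 0.0104, 0.0121, 0.0127, 0.0145, 0.0156]
print(moving_average_graduation(crude))
B, c = fit_gompertz(ages, crude)
print([round(B * c ** x, 5) for x in ages])
```

The moving average needs no assumption about the shape of the curve but loses the two rates at each end; the fitted curve covers every age and is smooth by construction, but only as good as the assumed mathematical form.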
More recently, spline functions (1) have become popular. McCutcheon and Eilbeck (7) used this method to graduate English Life Table No. 13.

LATER THEORETICAL DEVELOPMENT

This activity has had a profound effect on the developing theory underlying actuarial analysis. Firstly, it has forced actuaries to pay much more attention to stochastic elements in mortality experience. Secondly, it has made actuaries 1) think more about the nature of the age progression in mortality and about the biologic, social, economic, and even psychological factors affecting mortality; 2) question their earlier simple concept of homogeneity of risk within the observed population; 3) examine the statistical methodologic implications of that question; and, for a minority of us represented at this meeting, 4) listen to what physicians have to say about disease, the fatality of disease, and the aging process. Finally, because of their increasing involvement with small experiences, actuaries are recognizing the need to achieve a meeting point with statisticians working in biology and medicine. The biostatisticians may in turn gain something from the practical rigor and, occasionally if rarely, the imagination of the actuary. Theoretical development has also led inevitably to a switch away from rate-interval exposure as such toward total length of exposure and the concept of the length of life, beginning in the United Kingdom with Phillips in 1935 and continued by Clarke (8) and Redington (9) and later by me (10). The resulting cooperative efforts of various disciplines with biology have become known as survival analysis.
The basic concept here is still actuarial, and that is why the Kaplan and Meier (11) product-limit estimate, a generalization and limiting case of the actuarial estimate, has had such appeal to actuaries, especially to those concerned with small experiences with relatively large in-and-out movements [see also Greville (2) and Brillinger (12)]. The Kaplan and Meier generalization may be explained briefly. Suppose that we have a random sample of $N$ subjects. Then the product-limit estimate of $P(t)$, the probability of surviving $t$ years, is defined as follows: list and label the $N$ observed lifetimes (whether ended by death or by loss) in order of increasing magnitude, $0 \le t_1 \le t_2 \le \cdots \le t_N$. Then

$$\hat{P}(t) = \prod_r \frac{N - r}{N - r + 1},$$

where $r$ assumes those values for which $t_r \le t$ and for which $t_r$ measures the time to death. This estimate is the distribution that maximizes the likelihood of the observations.

Multiple Decrements: Homogeneity

This movement also received impetus from the development of the theoretical basis of multiple decrement tables, which actuaries had to begin to use early in the history of their profession: first in life insurance, with the competing risks of death and lapse, and later in pension fund management, with competing risks not only of death but also of retirement of two kinds, voluntary and on grounds of ill health. In relation to mortality itself, this movement led to closer examination of the causation of death and of the competing risks of different diseases. Furthermore, it led to consideration of the health statuses within populations of those who had passed from disease-free to other statuses of curable and incurable disease. The whole question of the homogeneity of populations in relation to mortality risk was laid wide open. As a particular example, one may cite the work of Beard (13), who explored the relationship between the trends in the incidence of smoking and lung cancer mortality.
Beard represented the age- and sex-specific mortality for cancer of the lung by the formula:

$$\mu_x^T = \phi(T - x)\,\chi(T)\,f(x), \qquad (4)$$

where $T$ represents the calendar year of experience, $x$ the attained age, and the functions $\phi$, $\chi$, and $f$ are defined by their numerical values (initially avoiding the introduction of hypotheses). The broad rationale of the assumed form for $\mu_x^T$ is that mortality is a function of the year of birth (perhaps representative of genetic factors), a function of the year of experience (representing environmental conditions), and a function of age (representing the constitutional "resistance" of the individual). Beard next assumed that $\phi$ could be regarded as measuring the proportion of people born who will ultimately become "susceptible," and then equated the proportion "susceptible" with the proportion of cigarette smokers. Then $\chi$, being the relative weight for the calendar year of experience, could be regarded as a measure of the average consumption of cigarettes per smoker in the various years. This provides

$$\sum_x P_x^T\,\phi(T - x) \qquad (5)$$

as a measure of the number of smokers in population $P$ at time $T$, and

$$\sum_x P_x^T\,\phi(T - x)\,\chi(T) \qquad (6)$$

as a measure of the relative consumption of cigarettes at time $T$. If mortality was in fact related to the cigarettes smoked some time previously, then equation 6 would be related to the relative consumption of cigarettes at this earlier time. After experimentation, it was hypothesized that the lag was 10-15 years. The eventual fit of the model to the national rates suggested that mortality from lung cancer was independent of sex and directly dependent on the quantity of cigarettes smoked over a period of 15 years, with a pronounced increase in risk with age. This immediately provided an explanation of the discrepancy in trends of cancer mortality between males and females. Osmond et al. (14) have now developed a formal procedure for fitting age-period-cohort models.
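Beard's multiplicative formula and the two summary measures derived from it (equations 4 to 6) amount to simple bookkeeping once the three functions are tabulated. In the sketch below, the tables for $\phi$ (keyed by year of birth), $\chi$ (keyed by calendar year), $f$ (keyed by age), and the population $P_x^T$ are hypothetical placeholders; in Beard's work these values came from fitting the model to national rates, and the function names are illustrative only.

```python
def beard_rate(T, x, phi, chi, f):
    """Equation 4: mu_x^T = phi(T - x) * chi(T) * f(x)."""
    return phi[T - x] * chi[T] * f[x]


def smokers(T, pop, phi):
    """Equation 5: sum over ages x of P_x^T * phi(T - x), a measure of the number of smokers."""
    return sum(P * phi[T - x] for x, P in pop[T].items())


def relative_consumption(T, pop, phi, chi):
    """Equation 6: sum over ages x of P_x^T * phi(T - x) * chi(T); chi(T) factors out of the sum."""
    return smokers(T, pop, phi) * chi[T]


# Hypothetical tables: phi by birth year, chi by calendar year, f by age,
# and population counts pop[T][x] for calendar year T and age x.
phi = {1910: 0.6, 1920: 0.7}
chi = {1960: 1.0, 1970: 1.2}
f = {50: 0.0008, 60: 0.0020}
pop = {1970: {50: 1000, 60: 800}}

T = 1970
print(beard_rate(T, 60, phi, chi, f))
print(smokers(T, pop, phi), relative_consumption(T, pop, phi, chi))
```

To test a lag of the kind Beard hypothesized, one would compare mortality at time $T$ with `relative_consumption` evaluated at an earlier year, which requires population and $\chi$ tables for that earlier year as well.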
Beard (13) went further and examined the possibility that f(x) might be described by some form of stochastic process. He assumed that the survivors $l_t$ of a cohort born x years earlier can be stratified according to a function n, such that there is a probability ρn dt that a unit of n is lost in the interval dt. On that assumption he developed a pure death process:

$$\frac{dl_t^{\,n}}{dt} = -\rho n\, l_t^{\,n} + \rho (n + 1)\, l_t^{\,n+1}, \qquad l_0^{\,r} = 1. \tag{7}$$

If the initial value of n is r, and death is assumed to occur when n = a, it can be shown that the deaths at time t are

$$d_t = \rho a \binom{r}{a} e^{-\rho a t} \left(1 - e^{-\rho t}\right)^{r - a}, \tag{8}$$

and, writing $\mu_t$ for the corresponding force of mortality,

$$\frac{1}{\mu_t}\frac{d\mu_t}{dt} - \mu_t = \frac{\rho (r - a)}{e^{\rho t} - 1} - \rho a. \tag{9}$$

Because $(1/\mu)\,d\mu/dt - \mu$ can be found from f(x), this last relationship provides a convenient way to judge whether the process is reasonable for the data. The relationship shows that the function is infinite at t = 0 and decreases steadily thereafter. The actual values derived from f(x) increased from the earliest age (22) to a maximum at x = 35 and then declined steadily, so that at first sight the process appeared unsuitable. However, if a number of deaths were not associated with smoking, this could be the expected pattern. Beard refers to work suggesting that the adenoma type of cancer is not associated with smoking. The model did support the suggestion that smoking acts as an agent that persistently damages the tissues. On this view, once the damage reaches a certain critical stage, a cancer can develop rapidly.

Stochastic Implications: Sample Variances

As we have seen, the rationale of graduation has been the elimination of random observational error. What are these sample variances? Suppose $D_x$ (the deaths in the age interval x to x + 1) is a binomial random variable. Then $\hat q_x$ (the estimate of $q_x$) is a binomial proportion, and the sample variance of $\hat q_x$ (which is also the variance of $\hat p_x = 1 - \hat q_x$) is

$$S^2_{\hat q_x} = \frac{m_x (1 - a_x m_x)}{P_x \left[1 + (1 - a_x) m_x\right]^3}, \tag{10}$$

where $P_x$ is the population on which $m_x$ is calculated, i.e., $m_x = D_x/P_x$, and $a_x$ is the average proportion of the year lived in the year of death.
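Equation (10) translates directly into code. This is a minimal sketch under two assumptions. It uses the usual relation $q_x = m_x/[1 + (1 - a_x)m_x]$ between the central death rate and the probability of death. It also defaults $a_x$ to 0.5, which corresponds to deaths spread evenly over the year. Both the default and the example figures are assumptions, not values from the text.

```python
import math


def q_from_m(m: float, a: float = 0.5) -> float:
    """Probability of death from the central death rate m_x = D_x / P_x (a = 0.5 assumed)."""
    return m / (1.0 + (1.0 - a) * m)


def var_q(deaths: float, population: float, a: float = 0.5) -> float:
    """Eq (10): sample variance of q_x, which equals that of p_x = 1 - q_x."""
    m = deaths / population
    return m * (1.0 - a * m) / (population * (1.0 + (1.0 - a) * m) ** 3)


# Hypothetical age interval: 10 deaths in a population of 1,000.
q = q_from_m(10 / 1000)
se = math.sqrt(var_q(10, 1000))
print(f"q = {q:.5f}, standard error = {se:.5f}")
```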
It can also be shown that the variance of the proportion surviving from age x to age x + t is

$$S^2_{{}_t\hat p_x} = {}_t\hat p_x^{\,2} \sum_{r=0}^{t-1} \hat p_{x+r}^{-2}\, S^2_{\hat q_{x+r}}, \tag{11}$$

and the variance of the expectation of life $\hat e_x$ is, as Chiang (15) indicates,

$$S^2_{\hat e_x} = \sum_{r=0}^{\omega - x} {}_r\hat p_x^{\,2} \left[\hat e_{x+r+1} + 1 - a_{x+r}\right]^2 S^2_{\hat q_{x+r}}, \tag{12}$$

where ω is the last age in the table. Most mortality investigations that form the basis of actuarial calculations are large, and the sampling variances are of little practical consequence. However, some mortality tables are prepared for groups with particular medical impairments and may be based on much smaller numbers. For these, more serious attention may be required not only to the sampling variance but also to the stochastic aspects of the underlying model of transitional states.

FOLLOW-UP STUDIES

A particular type of experience, usually involving small numbers, arises in the following setting. A group of persons suffering from a particular disease and treated in a particular way is observed for a period so that the effect of treatment on survival can be evaluated. This is the classical approach to assessing the efficacy of treatment for diseases, such as cancer or tuberculosis, that, if untreated, normally take several months to lead to death. It is not used for acute diseases that kill rapidly in the absence of treatment. The statistic usually calculated is ${}_tp_x$, where t is commonly 2, 5, 8, or 10 years, measured either from diagnosis or from treatment. It is calculated for groups homogeneous with regard to type and stage of disease, type of treatment (kind of drug, surgical procedure, etc.), sex, and age (when numbers are small, broad age groups may be necessary). The figures presented in table 1 relate to a typical small series of patients treated by surgery for cancer of the thyroid. In this series, the numbers are so small that males and females of all ages have been combined. The figures are as calculated from observed deaths, without graduation or other correction. It is obvious that the sampling errors here are large.
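Equation (11) chains the single-year variances into the variance of a t-year survival proportion. The sketch below is illustrative. It takes the single-year variances as already computed (for example, from equation (10)), and the input values are hypothetical.

```python
from typing import List, Tuple


def var_t_p_x(q: List[float], var_q: List[float]) -> Tuple[float, float]:
    """Eq (11): return (tp_x, sample variance of tp_x).

    q     -- q_{x+r} for r = 0, ..., t-1
    var_q -- the matching sample variances S^2 of q_{x+r}
    """
    tp = 1.0
    for qr in q:
        tp *= 1.0 - qr  # tp_x is the product of the single-year p_{x+r}
    # Sum of S^2_{q_{x+r}} / p_{x+r}^2 over the t single years
    total = sum(v / (1.0 - qr) ** 2 for qr, v in zip(q, var_q))
    return tp, tp * tp * total


# Hypothetical 2-year example: q = 0.1 in each year, variance 0.1 * 0.9 / 100 each year.
tp, v = var_t_p_x([0.1, 0.1], [0.0009, 0.0009])
print(tp, v ** 0.5)
```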
What do actuaries do, and what should they do? Bartholomew (16) suggests that actuaries do not deal with the competing risks of mortality from causes other than the disease whose treatment is being assessed. That is not true, nor is loss from observation from causes other than death ignored. Even nonactuarial medical workers normally make rough corrections for both these factors. It is true that actuaries do not normally see this situation as calling for a stochastic model in continuous time, i.e., a Markov process. The theoretical basis of this particular application has been developed by a number of statisticians, but it does not seem to have occupied the serious attention of actuaries. In other types of insurance, such models are commonly used by actuaries. In the model commonly developed, the population under examination is distributed among a number of states, e.g., suffering from cancer, death from cancer, recovery, death from causes other than cancer, and lost from observation. A certain number of transitions are allowed: $r_{ij}$, where the movement is from state i to state j; e.g., $r_{12}$ is allowable. Other transitions clearly would not be allowed, e.g., $r_{21}$. We denote by $p_{ij}(t)$ (i, j = 1, 2, ..., k) the probability that an individual in state i at time zero is in state j at time t, where t > 0, with

$$p_{ij}(0) = \begin{cases} 0, & i \ne j, \\ 1, & i = j. \end{cases} \tag{13}$$

We distinguish and denote by $r_{ij}(t)\,dt$ the infinitesimal transition probability (in actuarial terminology, this is the force of mortality if the transition is simply from life to death).

TABLE 1. Survival of patients with cancer of the thyroid who were treated with surgery, by cell type

                                  Proportion surviving to end of yr:
Cell type     No. of patients     3       5       8       10      15
Anaplastic    30                  0.23    0.23    0.23    0.23    0.19
Papillary     33                  0.94    0.91    0.88    0.82    0.69
Follicular    32                  0.94    0.81    0.81    0.75    0.75
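A rough binomial standard error, $\sqrt{p(1-p)/n}$, attached to the last proportion tabulated for each cell type gives a sense of how large the sampling errors in table 1 are. This is a simplifying assumption, not the source's method. It treats every patient as followed for the full period and ignores losses from observation, so it understates the true uncertainty.

```python
import math

# Last tabulated proportion surviving and number of patients, from table 1.
TABLE_1_LAST = {
    "Anaplastic": (0.19, 30),
    "Papillary": (0.69, 33),
    "Follicular": (0.75, 32),
}


def binomial_interval(p: float, n: int, z: float = 1.96):
    """Approximate standard error and 95% interval for a proportion p out of n."""
    se = math.sqrt(p * (1.0 - p) / n)
    return se, max(0.0, p - z * se), min(1.0, p + z * se)


for cell_type, (p, n) in TABLE_1_LAST.items():
    se, lo, hi = binomial_interval(p, n)
    print(f"{cell_type:<10} p = {p:.2f}  SE = {se:.3f}  95% interval {lo:.2f} to {hi:.2f}")
```

Even under this optimistic assumption, each interval is about 0.3 wide on the probability scale.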
The general equation for the system is then

$$p_{ij}(t + \delta t) = p_{ij}(t)\Big[1 - \delta t \sum_{h \ne j} r_{jh}(t)\Big] + \sum_{h \ne j} p_{ih}(t)\, r_{hj}(t)\, \delta t. \tag{14}$$

Subtracting $p_{ij}(t)$ from both sides, dividing by δt, and letting δt tend to zero leads to

$$\frac{dp_{ij}(t)}{dt} = -p_{ij}(t) \sum_{h \ne j} r_{jh}(t) + \sum_{h \ne j} p_{ih}(t)\, r_{hj}(t), \tag{15}$$

where i, j = 1, 2, ..., k, and the summations run from h = 1 to h = k, h ≠ j. The solution giving $n_i(t)$, the expected number in state i at time t, is lengthy but straightforward if matrix representation is used. The model has the advantage that, if death from other causes is eliminated, $n_i(t)$, as a proportion, does approach unity as t tends to infinity. Moreover, compared with values provided by the model, the normal actuarial method tends to overstate $n_i(t)$. The amounts are small, however, compared with the errors involved in estimating the underlying transition rates from restricted data. Most comparisons are unfair to the normal actuarial method, which merely deals with the competing risk $q_x'$ of death from other causes. It divides the probability of death from cancer, $q_x$, by $(1 - \tfrac{1}{2} q_x')$ to produce an independent cancer death rate, without eliminating death from other causes. The stochastic model eliminates the transition to "death from other causes" altogether by setting that rate to zero. The other advantage of the model is that it permits computer simulation of the consequences of different assumptions about transition probabilities. On the other hand, it is complicated and adds no information. One has to bear in mind that decisions about treatment are not likely to be made on small changes in $n_i(t)$. The exception, of course, is substituting a treatment that, although no more efficacious, is less uncomfortable in any sense of that word. Clinicians usually, and wisely, take the view that there is little point in abandoning a well-practiced regimen for a marginal improvement in the expectation of survival. This holds especially if, as sometimes happens, the new regimen is less, not more, comfortable for the patient.
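Equation (15) can be integrated numerically without any matrix machinery. The following sketch illustrates this; it is not the method used in the source. It applies Euler's method with constant transition intensities, and the state labels and all rate values are hypothetical. It also shows the normal actuarial adjustment $q_x/(1 - \tfrac12 q_x')$ for comparison.

```python
from typing import Dict, List, Tuple

# Hypothetical state labels, numbered from 0 here rather than from 1.
STATES = ["cancer", "cancer death", "recovered", "other death", "lost"]
Rates = Dict[Tuple[int, int], float]  # (from h, to j) -> constant intensity r_hj


def forward_probs(k: int, start: int, rates: Rates, t: float,
                  steps: int = 10_000) -> List[float]:
    """Euler integration of eq (15) for p_{start,j}(t), j = 0..k-1."""
    p = [0.0] * k
    p[start] = 1.0  # initial condition, eq (13)
    # Total intensity of leaving each state j: sum over h != j of r_jh
    out = [sum(r for (h, _), r in rates.items() if h == j) for j in range(k)]
    dt = t / steps
    for _ in range(steps):
        dp = [-p[j] * out[j] for j in range(k)]
        for (h, j), r in rates.items():
            dp[j] += p[h] * r  # inflow term p_ih * r_hj
        p = [pj + dpj * dt for pj, dpj in zip(p, dp)]
    return p


def independent_rate(q_cancer: float, q_other: float) -> float:
    """Normal actuarial adjustment q_x / (1 - q'_x / 2)."""
    return q_cancer / (1.0 - 0.5 * q_other)


# Hypothetical annual intensities out of the "cancer" state.
rates = {(0, 1): 0.15, (0, 2): 0.05, (0, 3): 0.02, (0, 4): 0.03}
p = forward_probs(len(STATES), 0, rates, t=5.0)
for name, prob in zip(STATES, p):
    print(f"{name:<13} {prob:.4f}")

# Eliminating death from other causes: set that intensity to zero.
no_other = dict(rates)
no_other[(0, 3)] = 0.0
print(forward_probs(len(STATES), 0, no_other, t=5.0)[1])
```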
The important advances need little probability theory to commend them. An example is the introduction of streptomycin for the treatment of tuberculous meningitis, which increased the chance of survival from almost 0 to almost 100%. If statistical theory needs to be extended to demonstrate an improvement in treatment, that improvement is unimportant.

SURVIVAL ANALYSIS

Another stochastic aspect must be considered. In the usual application of actuarial methods to the assessment of the treatment of a slowly progressing disease, a cohort of treated patients may necessarily be kept under observation for several years before a firm evaluation of survival prospects can be made. Similarly, a stochastic model of the type described in this paper requires observational data over an extended period to establish the transitional probabilities. Obviously, it would be desirable to obtain reliable estimates of the likely outcome (i.e., with relatively narrow confidence limits) without waiting for several years to elapse. The problem, explored by Boag (17), is as follows. Suppose we are given a cohort of treated patients (homogeneous with respect to known differentials such as sex, age, type of disease, and treatment) and the distribution of deaths over a short initial period. Can the whole distribution be projected? Another way of posing the problem is to ask for a system of recording results that would minimize the period after which the outcome could be assessed at a given level of confidence. This takes us into the field of sequential analysis, which is outside the scope of the present discussion. It is a different field from both the statistical and the clinical standpoint. Boag (17) showed that for most types of cancer, the distribution of deaths over time since treatment was log normal.
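Suppose deaths after treatment follow a log normal distribution. Then two early observations of the cumulative proportion dead are enough, in principle, to fix its two parameters and project the rest of the curve. The sketch below uses a crude two-point quantile fit, assumed here purely for illustration. It is not Boag's own estimation procedure, it ignores sampling error, and it uses hypothetical figures.

```python
import math
from statistics import NormalDist


def fit_lognormal(t1: float, F1: float, t2: float, F2: float):
    """Solve ln t_i = mu + sigma * z_i, where z_i = Phi^{-1}(F_i)."""
    z1, z2 = NormalDist().inv_cdf(F1), NormalDist().inv_cdf(F2)
    sigma = (math.log(t2) - math.log(t1)) / (z2 - z1)
    mu = math.log(t1) - sigma * z1
    return mu, sigma


def proportion_dead(t: float, mu: float, sigma: float) -> float:
    """Log normal distribution function: Phi((ln t - mu) / sigma)."""
    return NormalDist(mu, sigma).cdf(math.log(t))


# Hypothetical early experience: 20% dead by year 1, 35% dead by year 2.
mu, sigma = fit_lognormal(1.0, 0.20, 2.0, 0.35)
for year in (5, 10):
    print(year, round(proportion_dead(year, mu, sigma), 3))
```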
[A large number of durational risks do generate log normal distributions, or exponential distributions, which are close approximations to a log normal distribution. Examples are wastage from employment in particular establishments and failure of components in machinery (reliability curves).] This being so, one can estimate the parameters of the distribution at a fairly early stage.

Armitage (18), following Littell (19) but with sequential analysis particularly in mind, examined the relative efficiencies of various methods of comparing survival time distributions. He assumed that these distributions are exponential, that individuals enter the study at a uniform rate 2n/T during the interval (0, T), and that the analysis takes place at time T. He considered the asymptotic efficiency as n → ∞. If the survival time distributions are exponential, then the probabilities of death before time t are

$$F_1 = 1 - e^{-\lambda t}, \qquad F_2 = 1 - e^{-\lambda' t}, \tag{16}$$

where the subscripts refer to the 2 different treatment groups A and B, and $\lambda' - \lambda = \delta\lambda$. Armitage (18) considers the asymptotic relative efficiency of the different methods for testing the difference between λ and λ′ as δλ → 0 and n → ∞. Each test yields, asymptotically, an expected normal deviate x, which can be expressed in the form $x = (\delta\lambda/\lambda)\, n^{1/2} [\psi(\lambda T)]^{1/2}$. He then calculates the efficiency index

$$\psi(\lambda T) = x^2 (\lambda/\delta\lambda)^2 n^{-1}. \tag{17}$$

The asymptotic relative efficiency is the ratio of the 2 values of ψ(λT) and is itself a function of λT. Armitage considers 4 methods of estimation:

1) The maximum likelihood solution. If d deaths have occurred at periods $t_i$, the n − d survivors at time T have been observed for periods $T_j$ (where i = 1, 2, ..., d and j = 1, 2, ..., n − d).
The log likelihood is

$$L = -\lambda \sum_i t_i + d \log \lambda - \lambda \sum_j T_j, \tag{18}$$

so that

$$\frac{\partial L}{\partial \lambda} = \frac{d}{\lambda} - \sum_i t_i - \sum_j T_j,$$

and the maximum likelihood estimate of λ is given by

$$\frac{1}{\hat\lambda} = \frac{\sum_i t_i + \sum_j T_j}{d}.$$

2) The sign method is based on the nonparametric "sign test" of Dixon and Mood (20). Patients enter the trial in pairs at a rate of n/T pairs per unit time, and the members of each pair are allocated at random to the two treatment groups. For each pair, we observe which of the 2 survival times (t) is greater. With A and B the 2 treatments, $t_A > t_B$ is an A preference, $t_A < t_B$ is a B preference, and $t_A = t_B$ is "no preference." Suppose $t_A$ and $t_B$ have differentiable distribution functions

$$F_A(t) = P(t_A \le t), \quad F_B(t) = P(t_B \le t), \quad f_A(t) = F_A'(t), \quad f_B(t) = F_B'(t). \tag{19}$$

This implies $P(t_A = t_B) = 0$. The probability that a preference occurring at time t is an A preference is

$$\theta = \frac{f_B(t)\,[1 - F_A(t)]}{f_B(t)\,[1 - F_A(t)] + f_A(t)\,[1 - F_B(t)]}.$$

For the preferences to form a random binomial sequence with constant probability parameter, θ must be independent of t. A necessary and sufficient condition for this is that

$$\frac{f_A(t)\,[1 - F_B(t)]}{f_B(t)\,[1 - F_A(t)]} = K, \text{ a constant}, \tag{20}$$

that is, $1 - F_A(t) = [1 - F_B(t)]^K$. One survival curve must be a constant power of the other.

3) Direct comparison of proportions of survivors involves the simple comparison of the proportions of survivors at time x after treatment. Only those patients who entered in the interval (0, T − x) can be used. These number m = n(1 − x/T) on each treatment. The efficiency index is

$$\psi = \frac{(1 - x/T)(\lambda x)^2 e^{-\lambda x}}{2\,(1 - e^{-\lambda x})}. \tag{21}$$

4) In the actuarial method, i.e., the approach of constructing a life table for each treatment group, every patient under observation is included and contributes appropriately to the total exposure to risk.
Armitage used a measure called the product-limit estimate of the proportion of survivors at time x, defined as follows (11): Arrange the n observations for any 1 treatment in order of increasing exposure (to death or termination of observation) and denote these by 0 Syr >4yr >15yr 22 3.3 3.3 0.7 7.3 48 1.1 32 4.2 3.7 0.8 8.4 49 1.2 42 4.9 4.7 1.4 9.8 6.2 27 52 7.9 9.7 2.6 144 125 75 62 17.1 223 5.9 28.5 29.1 19.7 72 - = 657 723 47.5 a Dashes indicate absence of published data.

TABLE 2. Limited extract of "Exposures and Deaths" table of SMI(a)

Age at        Insurance yr 1         Insurance yr 2         Insurance yr 3
entry, yr     Exposures   Deaths     Exposures   Deaths     Exposures   Deaths
39            463         4          347         5          254         3
40            461         6          346         2          263         3
41            403         4          298         1          229         5
42            416         3          323                    239         1
29-42         6,917       44         5,185       22         4,528       26
43            376         4          281         2          210         4
15-70         13,467      100        10,015      60         8,673       73

(a) Class 46 ("Have Had Asthma") was selected as an example.

into the asthma category. The large volume of the experience in the SMI is evidenced by a total exposure of 87,162 policy-years for the asthma category alone. The next large table in the SMI presents observed and "table" deaths (standard expected deaths) by 30 individual years of duration. Aside from the total, only 4 age groups are used: 15-28, 29-42, 43-56, and 57-70 years. In the extract shown in table 3, the expected deaths in the first 5 years have not been adjusted with the select rate factors. The final section of tables in the SMI consists of summations of observed and table deaths for each of the age groups mentioned above and all ages combined, with data for 25 of the impairment categories on a single page. One set of these tables presents total deaths for durations 1-30 years combined, with expected deaths in the first 5 years calculated from select mortality rates. The second set of tables restricts the totals to durations 6-30 years, thus excluding the select period.

TABLE 3. Limited extract of "Actual and Table Deaths" table of SMI(a)

               Age at entry 43-56 yr      Age at entry 15-70 yr
Insurance yr   Actual     Table(b)        Actual     Table(b)
1              40         39.8            100        132.3
2              19         31.0            60         101.6
3              30         28.1            73         90.7
4              29         25.0            78         79.8
5              23         23.5            49         73.4
6              30         22.1            63         67.0

(a) See footnote a, table 2.
(b) Table deaths not yet adjusted for select rates, yr 1-5. Summations of actual and expected deaths, all yr 1-30 and 6-30, are given in separate tables.

No mortality ratios were included in the detailed data of the SMI. I have summarized the asthma experience in table 4, a summary that also includes mortality ratios, numbers of policies, and exposures. The overall mortality ratio of 102%³ is probably underestimated because accurate mortality rates reflecting the standard experience from 1870 to 1899 had not been derived. The mortality ratio exceeded 100% in only 1 age group, 43-56 years, and the duration pattern in this age group (upper part of table 4) should be noted.

³ In the life insurance literature, mortality ratio is always given as a percentage.

ASTHMA MORTALITY IN LATER INSURANCE STUDIES

To provide a comparison with the asthma mortality found in this early SMI study, I have summarized overall mortality, all ages and durations combined, in the MAMI and 4 later studies [(6-8); New York Life Insurance Company: unpublished report]. These results are given in table 5. Substandard and standard experience are shown separately because this separation has been the practice in all intercompany studies subsequent to the MAMI. In the standard experience the mortality ratio has ranged from 102 to 125%. The most recent New York Life study shows an even more favorable mortality ratio of 89%, although it has the smallest exposure. As seen in the right-hand column of the table, the excess death rate ranged from -0.3 to 2.0 extra deaths per 1,000 per year. Although a small degree of excess mortality is observed in the standard experience, this has been within the limits considered acceptable for standard issue by most insurance companies.
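Table 5 uses two summary measures. The mortality ratio is observed/expected deaths × 100, and the excess death rate is (D − D′)/E × 1,000. Both are simple to compute. The sketch below uses hypothetical helper names, not anything from the source, to reproduce two of the standard-issue rows from their published deaths and exposures.

```python
def mortality_ratio(observed: float, expected: float) -> float:
    """Observed deaths / expected deaths x 100, in percent."""
    return 100.0 * observed / expected


def excess_death_rate(observed: float, expected: float, exposure: float) -> float:
    """(D - D') / E x 1,000: extra deaths per 1,000 per policy-year of exposure."""
    return 1000.0 * (observed - expected) / exposure


# (observed D, expected D', exposure E in policy-years), standard issue, table 5
rows = {
    "SMI": (1028, 1012.6, 87162),
    "MAMI": (176, 140.3, 18094),
}
for study, (d, d_exp, e) in rows.items():
    print(f"{study:<5} ratio {mortality_ratio(d, d_exp):.0f}%  "
          f"excess {excess_death_rate(d, d_exp, e):+.1f} per 1,000")
```

This prints a ratio of 102% and an excess of +0.2 for the SMI, and 125% and +2.0 for the MAMI, matching the published rows.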
The substandard experience, on the other hand, has shown a significant excess mortality. In the 1929 and 1938 studies the mortality ratio was above 200% and the excess death rate above 6 extra deaths per 1,000 per year. The 2 most recent studies show lower but still abnormal values. The underwriting classification and the charging of an extra premium have, in general, been justified.

TABLE 4. SMI Study: Summary of asthma mortality

Entry     Duration of      No. of      Exposure,    No. of deaths          Mortality
age, yr   insurance, yr    policies    policy-yr    Observed   Expected    ratio, %
43-56     1-5              3,214       10,980       141        114.9       123
43-56     6-15             1,395       8,270        179        167.7       107
43-56     16-30            332         1,995        120        88.9        135
15-28     1-30             2,962       17,717       87         124.7       70
29-42     1-30             6,917       45,782       412        424.9       97
43-56     1-30             3,214       21,245       440        371.5       118
57-70     1-30             374         2,418        89         91.5        97
15-70     1-30             13,467      87,162       1,028      1,012.6     102

TABLE 5. Overall asthma mortality in various insurance mortality studies, published 1903-75

                                                                                                      Excess mortality(a)
                                                Observation   Duration,   Exposure,   No. of deaths         Ratio,   Difference per
Policy type / Study                             period        yr          policy-yr   Observed   Expected   %        1,000, (D-D')/E
Standard issue
  SMI                                           1870-90       1-30        87,162      1,028      1,012.6    102      +0.2
  MAMI                                          1885-1908     1-24        18,094      176        140.3      125      +2.0
  Impairment Study, 1929                        1909-27       1-19        35,383      212        189.0      112      +0.6
  Impairment Study, 1938                        1925-35       1-12        24,158      147        117.4      125      +1.2
  Impairment Study, 1951                        1935-49       1-25        53,714      135        126.9      106      +0.2
  New York Life Insurance Co. (unpublished)     1954-70       1-28        5,327       10         11.4       89       -0.3
Substandard issue (extra premium)
  Impairment Study, 1929                        1909-27       1-19        20,295      238        102.3      233      +6.7
  Impairment Study, 1938                        1925-35       1-12        15,740      181        77.8       233      +6.6
  Impairment Study, 1951                        1935-49       1-25        46,853      152        99.1       153      +1.2
  New York Life Ins. Co.                        1954-70       1-28        12,560      49         24.9       197      +1.9

(a) Mortality ratio is calculated as No. of deaths observed/No. of deaths expected × 100. The difference is calculated as [No. of deaths observed (D) - No. of deaths expected (D')]/exposure to risk in policy-years (E) × 1,000.

MORTALITY EXPERIENCE OF OVERWEIGHT PERSONS

A remarkable study of the mortality experience of a single company is that of the complete records of the New England Mutual Life Insurance Company from 1844, its first full year of operation, to 1905. The study was initiated in 1903 by the Medical Director, Dr. Edwin W. Dwight. The company report consists of 642 charts of observed mortality rates by 5-year age groups. Each chart gives the experience for a single category of all policies issued during 60 years of operation, coded for the characteristic under study (e.g., build, occupation, family and medical histories, residence, and type of application). The complete results, or a digest of them, were never published. In 1914, however, Dr. Dwight read a paper at the annual meeting of the Association of Life Insurance Medical Directors of America, using the overweight mortality charts to illustrate its thesis (9). I have selected 1 of the charts, for the class of policyholders 40% overweight, to show how the results were presented (fig. 1). The vertical scale gives mortality rates per 1,000, all policy durations combined, by 5-year entry age groups on the horizontal scale. The irregular line represents the mortality rates calculated for the class under study. The smooth solid line represents mortality by the American Experience Table, which insurance companies widely used for calculating reserves. That table was not representative of the actual mortality experienced, even before 1900. No definition is given for the smooth broken-line curve, either in the article or in the volumes containing the original charts. However, it can be assumed that the actuary working with Dr. Dwight used it to show the aggregate company mortality experience over the entire period from 1844 to 1905.
Virtually all of the issues were on a standard basis, but it is not known whether any of the coded classes were excluded. Some of the codes were for normal characteristics. For example, the class "Normal Weight" contained 19,096 cards, with an exposure of 136,957 policy-years and a mortality ratio of 76.0%, and its observed mortality curve was close to the "expected" curve of company experience. Because of the close correspondence of the 2 curves, one can deduce that the standard level of the New England Mutual's overall mortality was about 76% of the American Experience Table. Here the rates are weighted according to the attained age distribution of the total exposure. In contrast, for the 40% overweight class the mortality rates are distributed irregularly and randomly about the smooth curves up to age 40-44 years. From age 45 upward, excess mortality is observed in all age groups. The summary information on the chart shows that this was a small class, with only 356 cards, 2,762 policy-years of exposure, and a mortality ratio of 147%. Unfortunately, numbers of deaths are not given on the charts, which otherwise provide a most useful summary of comparative mortality by class and attained age.

[Figure 1: mortality rate per 1,000 (vertical axis) by age (horizontal axis). Chart annotations: Average Mortality, 147.3%; No. of Cards, 356; No. of Exposures, 2,762.]
FIGURE 1. Mortality by attained age (yr), policyholders 40% overweight. New England Mutual Life Insurance Co. experience, 1844-1905.

Dr. Harold S. Diehl, long a staff member of the American Cancer Society, wrote to me when he was writing his book "Tobacco and Your Health." From Dr. Dwight's charts (10), I was able to derive results showing that, in relation to those who never used tobacco, mortality was 25% higher in those who never used tobacco, 45% in temperate users, and 61% in moderate users.
The results quoted in (10) may constitute one of the earliest studies of mortality in relation to the use of tobacco.

NORTHWESTERN MUTUAL STUDY OF MORTALITY DUE TO HYPERTENSION

At the Twenty-second Annual Meeting of the Association of Life Insurance Medical Directors, Dr. J. W. Fisher presented what must be the earliest mortality results on subjects with hypertension (11). In 1907, Dr. Fisher, as Medical Director of the Northwestern Mutual, began to require blood pressure readings as part of the insurance examination at the home office in Milwaukee and in several other large cities. Average systolic readings are presented by age, and mortality of applicants examined in 1907-11 was followed to 1912. Even in such a brief follow-up, excess mortality was observed with systolic pressures of 150 mm or higher, once allowance is made for the lower select mortality that prevailed in this study. Expected rates are described only as being derived from the "Actuary's Table." From the text and several tables in the article, I have assembled the mortality data in table 6. It is clear that rates in the Actuary's Table must be extremely conservative, as mortality ratios are not high even with a select adjustment factor of 0.5. However, a special follow-up was successfully conducted on the rejected applicants with systolic pressures of 170 mm or more; their adjusted mortality ratio was over 300%. This is the early forerunner of several exhaustive intercompany studies of mortality in relation to blood pressure (6, 12-14). Dr. Fisher clearly recognized the importance of the sphygmomanometer not only for the insurance examiner but also for all medical practitioners.

TABLE 6. Mortality experience, applicants with hypertension(a)

Systolic pressure,   No. of    Estimated    No. of deaths              Adjusted mortality
mm Hg(b)             lives     exposure     Actual    Expected(c)      ratio, %
140-149              2,668     5,576        31        81.9 (41.0)      76
150 and up           525       1,097        12        22.2 (11.1)      108
Average: 171         722       1,400        32        20.6 (10.3)      311

(a) Study was conducted by the Northwestern Mutual Life Insurance Co., 1907-11.
(b) Applicants with systolic pressures of 140-149 and 150 mm and up were accepted; those with an average of 171 mm were rejected.
(c) Expected deaths in parentheses have been adjusted for effects of selection.

POST-SANATORIUM MORTALITY IN PATIENTS WITH TUBERCULOSIS

In 1908, a remarkable article was published by Brown and Pope (15). It describes in detail a follow-up study of patients with pulmonary tuberculosis who had been discharged from the Adirondack Cottage Sanitarium (later renamed the Trudeau Sanitarium) over the first 20 years from its opening in 1885. Despite such an early date, the authors included all the essential elements of a modern follow-up study:

- clearly defined categories of subjects;
- life table arrangement of data;
- calculations of annual mortality rates and cumulative survival rates;
- comparisons of these rates with those of a chosen standard.

The presentation of data is unusually complete; both tabular and graphic methods are used. A sample of the basic life table data has been extracted and reproduced (with allowance for monograph format) in table 7. The captions for the column headings have been copied as closely as possible to those given in the authors' table (15). Data have been omitted for follow-up years 5 through 21. They have also been omitted for the third severity category: patients whose tuberculosis was considered to be arrested at the time of discharge. Note that 114 deaths were determined among those with active disease without complete information as to the time of death. After due consideration, the authors (15) assigned these to follow-up years in proportion to the deaths of known date in each follow-up year.
I have illustrated these adjustment calculations in table 8. For this I used the authors' explanatory text and a less complete table from the published article, one that gives only columns 3, 4, and 5. The known deaths in column 1 have been taken from the first table, and the deaths in column 2 have been derived in accordance with the sample calculation in the text. An adjustment process of this sort is necessary to account for all the deaths and to provide a reasonable estimate of the mortality and survival rates. The adjusted number exposed to risk, as used by the authors, is given in column 6 for patients with active disease, up to 5 full years of duration. Note that the full number withdrawn alive is subtracted from the adjusted number alive at the beginning of each year.

TABLE 7. Follow-up of patients with tuberculosis at Adirondack Cottage Sanitarium, 1885-1905

                 Condition at discharge(a)
                 Apparently cured                            Active
Time after       Known living   Not heard     Died           Known living   Not heard     Died
discharge, yr    at beginning   from sub-     during         at beginning   from sub-     during
                 of yr          sequently     yr             of yr          sequently     yr
0-1              519            57            8              835            137           219
1-2              445            43            4              365            24            97
2-3              398            36            7              244            16            40
3-4              355            44            6              188            28            28
4-5              305            49            7              132            23            12
Unknown dates                                 9                                           114
Total                           460           59                            290           545

(a) Data on patients whose tuberculosis was arrested were omitted, as were those of follow-up years 5 to 21.

TABLE 8. Life table data: Patients with active pulmonary tuberculosis discharged from sanitarium, 1885-1905(a)

              No. of deaths                          Adjusted No.     Withdrawn       Adjusted No.
Time after    Date       Date not     Adjusted       alive at         alive during    exposed to
discharge,    known      known        total          beginning of     interval        risk
yr            (1)        (2)          (3)=(1)+(2)    interval (4)     (5)             (6)=(4)-(5)
0-1           219        57.9         276.9          835              137             698
1-2           97         25.7         122.7          421.1            24              397.1
2-3           40         10.6         50.6           274.4            16              258.4
3-4           28         7.4          35.4           207.8            28              179.8
4-5           12         3.2          15.2           144.4            23              121.4

(a) Numbers in parentheses refer to column number; see text.
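The adjustment behind table 8 can be reproduced from table 7. This sketch rests on one assumption not spelled out numerically in the text. It takes the proportional factor to be the 114 unknown-date deaths divided by the 545 − 114 = 431 deaths of known date over all follow-up years for active cases. The function name and the `half` switch are illustrative. Setting `half=True` applies the usual actuarial convention of subtracting only one-half of the withdrawals.

```python
from typing import List, Tuple

# Active cases, follow-up years 0-1 to 4-5, from table 7
KNOWN_DEATHS = [219, 97, 40, 28, 12]  # column (1)
WITHDRAWN = [137, 24, 16, 28, 23]     # column (5), "not heard from subsequently"
ALIVE_AT_START = 835                  # known living at discharge
UNKNOWN_DATE = 114
KNOWN_DATE_TOTAL = 545 - 114          # assumed: all dated deaths, all follow-up years


def life_table(half: bool = False) -> List[Tuple[float, float, float, float]]:
    """Rows of (alive at start, adjusted deaths, exposed to risk, q)."""
    factor = UNKNOWN_DATE / KNOWN_DATE_TOTAL
    alive = float(ALIVE_AT_START)
    rows = []
    for d, w in zip(KNOWN_DEATHS, WITHDRAWN):
        deaths = d * (1.0 + factor)                # column (3) = (1) + (2)
        exposed = alive - (w / 2 if half else w)   # column (6); authors subtract all of w
        rows.append((alive, deaths, exposed, deaths / exposed))
        alive -= w + deaths                        # adjusted No. alive next year
    return rows


for row, row_half in zip(life_table(), life_table(half=True)):
    a, d, e, q = row
    print(f"alive {a:6.1f}  deaths {d:6.1f}  exposed {e:6.1f}  "
          f"q {q:.3f}  (half withdrawals: q {row_half[3]:.3f})")
```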
The actuarial convention generally used is to subtract only one-half of these withdrawals, on the assumption that, on average, they contribute to this extent to the exposure. The more conservative calculation used by the authors produces a slightly higher estimate of the annual mortality rate and slightly lower survival rates.

Although Brown and Pope (15) also give tables of annual mortality and cumulative survival rates, I have chosen to show only the graphs. Figure 2 gives the survival curves for the 3 main groups of patients: the apparently cured (II), the apparently arrested (III), and those discharged with active disease (IV). The comparative curve (I) is for a cohort matched by age and sex with the patients at the time of discharge, with rates taken from Farr's English Life Table No. 3. These survival curves demonstrate the wide differences in mortality among the 3 groups of patients with tuberculosis.

[Figure 2: numbers surviving at the end of each year (vertical axis, 0 to 1,000) by years after discharge (horizontal axis, 1 to 20); curves I to IV.]
FIGURE 2. Survival curves of 3 categories of patients with pulmonary tuberculosis who were discharged from the Adirondack Cottage Sanitarium, 1885-1905. I = general population; II = discharged apparently cured; III = discharged "arrested"; IV = discharged with active disease.

Even the best group (II), those considered "cured" clinically at the time of discharge, showed a survival curve definitely lower than that estimated for the general population, although the 2 curves tended to converge after about 10 years. Mortality differences among the 3 groups are even more dramatically shown in the curves of annual mortality rates for the first 10 years of duration. I have redrawn these from the graphs in the article and reproduced them in figure 3. An extremely high rate is seen in the first year in the active cases.
Although this falls steeply in the next few years, the curve has a complex form. The authors (15) tried to fit curves to the points of the observed rates, using parabolic equations of the fourth order. The rate for arrested cases started at about 40 deaths/1,000 in the first year, rose to a maximum of over 120/1,000 at 3 years, and then decreased through the rest of the 10-year period shown. Mortality in the apparently cured patients started under 20/1,000 in the first 2 years after discharge and rose slowly to a rate of the order of 40/1,000 at the eighth year. Judging from the path of the survival curve after 10 years, it then decreased to a level below that in Farr's Table. It would be hard to imagine 3 more distinctive patterns of annual mortality variation with duration than the ones in this extremely interesting figure taken from the article by Brown and Pope.

[Figure 3: annual mortality rate per 1,000 (vertical axis, 0 to 400) by years after discharge (horizontal axis, 1 to 10); curves for active cases, arrested cases, apparently cured cases, and general population.]
FIGURE 3. Mortality rates of 3 categories of patients with pulmonary tuberculosis who were discharged from the Adirondack Cottage Sanitarium, 1889-1905. Fitted parabolic curves are of the fourth order.

DISCUSSION

The basic principles of actuarial methodology have changed little since publication of the "Specialized Mortality Investigation" in 1903 (4). The format of life tables is similar. There is the same emphasis on the necessity of a standard of expected mortality against which observed mortality can be compared, generally as a mortality ratio of observed-to-expected deaths. The selection process was recognized as resulting in lower mortality in the select period, that is, in the earlier as compared with the later years of policyholder experience.
However, actuaries have since developed more detailed and accurate tables of select and ultimate mortality. Methods of processing the data have changed from a completely manual process in the SMI to the introduction of punch cards in the MAMI, in which Dr. Hollerith served as a consultant, to the handling of millions of punch cards (described as enough to fill a box car) in the 1959 Build and Blood Pressure Study, to the electronic data processing of today. Despite the limitations of medical knowledge almost a century ago, actuaries, medical directors, and underwriters were able to demonstrate mortality differences in medical conditions, such as asthma and overweight, and thus to develop the art of underwriting.

It is extremely difficult to find examples of follow-up studies in the medical literature before 1920. The highly skillful study of tuberculous patients by Brown and Pope is one that I found in the reference list of a 1954 article by Bosworth and Alling (16). The senior author, Lawrason Brown (1871-1937), was a notable physician, scientist, and medical figure of his day (17).

REFERENCES

(1) SINGER RB, LEVINSON L (eds): Medical Risks: Patterns of Mortality and Survival. Lexington, Mass.: Lexington Books, 1976
(2) FLINT A: A Treatise on the Principles and Practice of Medicine (1st ed). Philadelphia: Henry C. Lea, 1866
(3) OSLER W: The Principles and Practice of Medicine (8th reprinting). New York: D. Appleton, 1918
(4) Actuarial Society of America: Specialized Mortality Investigation. New York: Actuar Soc Am, 1903
(5) Ad Hoc Committee: Medico-Actuarial Mortality Investigation, vol I-V. New York: Actuar Soc Am, Assoc Life Insur Med Directors Am, 1912-1914
(6) Joint Committee: Medical Impairment Study (1929). New York: Actuar Soc Am, Assoc Life Insur Med Directors Am, 1931
(7) Joint Committee on Mortality: Impairment Study (1938). New York: Actuar Soc Am, Assoc Life Insur Med Directors Am, 1939
(8) Society of Actuaries: Impairment Study, 1951. New York: Soc Actuar, 1954
(9) DWIGHT EC: The value of small classes. Proc Assoc Life Insur Med Directors Am (1913-1915), pp 210-218
(10) DIEHL HS: Tobacco and Your Health: The Smoking Controversy. New York: McGraw-Hill, 1969, pp 38-39
(11) FISHER JW: The diagnostic value of the use of the sphygmomanometer in examinations for life insurance. Proc Assoc Life Insur Med Directors Am (1908-1912), pp 393-406
(12) Joint Committee: Blood Pressure Study (1939). New York: Actuar Soc Am, Assoc Life Insur Med Directors Am, 1940
(13) Society of Actuaries: Build and Blood Pressure Study (1959), vol I, II. Chicago: Soc Actuar, 1960
(14) Ad Hoc Committee on the New Build and Blood Pressure Study: Blood Pressure Study (1979). Chicago: Soc Actuar, Assoc Life Insur Med Directors Am, 1980
(15) BROWN L, POPE EG: The ultimate test of the sanatorium treatment of pulmonary tuberculosis and its application to the results obtained in the Adirondack Cottage Sanitarium. Z Tuberk 12:206-215, 1908
(16) BOSWORTH EB, ALLING DW: The after-history of pulmonary tuberculosis. I. Methods of evaluation. Am Rev Tuberc 69:37-49, 1954
(17) Obituary, Lawrason Brown 1871-1937. Am Rev Tuberc 37:361-366, 1938

Early Studies of Tuberculosis¹
George W. Comstock²

ABSTRACT. Cohort studies have been of great importance in establishing what is known about the epidemiology of tuberculosis. The individuals who conducted these studies provided useful models for the application of life table, person-time, and cohort analyses to the study of diseases. These tuberculosis workers have shown not only that follow-up of large cohorts can be virtually complete but also, even more importantly for future cohort studies, how cohort investigations can be done at minimal expense. Natl Cancer Inst Monogr 67: 23-27, 1985.
The application of life table analysis and its alter ego, person-time analysis, to the study of tuberculosis appears to have taken place shortly after the start of this century. That these analytic methods were applied to tuberculosis relatively early is not surprising. The chronicity of tuberculosis and the fact that it spared neither young nor old made it necessary for those studying the fate of tuberculosis patients to account in some way for variable periods of observation at all ages. The difficulties they had in obtaining appropriate control groups made it necessary for them to rely on standard life tables to obtain expected numbers of deaths and survivors. Koch's demonstration of the tubercle bacillus in 1882 must also have influenced the timing. Not only did Koch's discovery finally unite apparently unrelated and disparate conditions under a single rubric, tuberculosis, but by 1900 bacteriologic examinations had been available long enough to make it possible to identify cohorts of tuberculosis patients who had been accurately diagnosed and observed for a decade or more.

ABBREVIATION: BCG = bacillus Calmette-Guérin.
¹ Presented at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
² Training Center for Public Health Research, Washington County Health Department, Box 2067, Hagerstown, Maryland 21742.

Another possible explanation for the early application of actuarial methods to tuberculosis is the paper published by Brown and Pope (1) in 1904. Although they did not apply life table methodology directly, they did use Farr's English Life Table No. 3 to derive expected survivorship among discharged tuberculosis patients according to their sex, age, and length of observation. It seems likely that Pope, an actuary who was then a patient at the Sanatorium, was the one who called Brown's attention to the use of standard life tables. Brown and Pope's findings are summarized in table 1, which shows survivorship among discharged patients as a percentage of expected survivorship based on the English life tables. As one might expect, patients in the best condition on discharge fared the best subsequently. The findings also show the unremitting tendency of patients with tuberculosis to relapse in the days before antibiotics. Only among the apparently cured is there even a suggestion that the added risk of death imposed by this disease might diminish with time.

Two additional comments seem warranted. Brown and Pope (2) were unable to trace nearly a quarter of their 1,542 discharged patients. After investigating 16 previously untraced patients who were apparently cured on discharge, they concluded "that to assume that all the untraced were dead, would be very far from the truth. The most reasonable conclusion is, therefore, that the untraced are living and dead in practically the same proportions as the traced of the same age, sex, and class." Brown and Pope also recognized that, for comparative purposes, it did not matter much which life table was used for calculating expected survivorship, and they implied that, as long as the life table was not based on a highly selected population, it would give reasonable estimates of absolute survivorship.

In 1910, 2 actuaries, Elderton and Perry, working in Pearson's Laboratory of Applied Mathematics at the University of London, again used life table data to assess the benefits of sanatorium treatment (3). Like Brown and Pope, whom they called pioneers in this field, they compared observed numbers of deaths among discharged patients with the numbers expected on the basis of English Life Table No. 6 and found a similar gradient of excess deaths among the patients according to the status of their disease on discharge (table 2).
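The expected-deaths calculation on which these comparisons rest can be sketched as follows, under simplifying assumptions: each patient is followed from an exact age at discharge, one-year death probabilities come from a standard life table, and follow-up is complete (no withdrawals). The life table values and patient numbers are hypothetical placeholders, not entries from Farr's or the later English Life Tables, and the function names are illustrative.

```python
from typing import Dict, List, Tuple


def expected_deaths(entrants: List[Tuple[int, int]], qx: Dict[int, float], years: int) -> float:
    """Expected deaths within `years` of discharge, from a standard life table.

    entrants: (age at discharge, number of patients) pairs.
    qx: probability of dying within one year at each exact age.
    Assumes complete follow-up (no withdrawals).
    """
    total = 0.0
    for age, n in entrants:
        alive = float(n)
        for k in range(years):
            dying = alive * qx[age + k]  # expected deaths in year k + 1
            total += dying
            alive -= dying
    return total


def observed_to_expected(observed: int, expected: float) -> float:
    """Mortality ratio of observed-to-expected deaths."""
    return observed / expected


# Hypothetical inputs: a flat 1% annual death probability at ages 30-36
# and 100 patients discharged at age 30, followed for 7 years.
hypothetical_qx = {age: 0.01 for age in range(30, 37)}
expected = expected_deaths([(30, 100)], hypothetical_qx, years=7)
print(f"expected deaths: {expected:.2f}")
print(f"O/E ratio if 20 deaths were observed: {observed_to_expected(20, expected):.2f}")
```

A ratio well above 1, as in every column of table 2, means the discharged patients died faster than persons of the same age in the standard population.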
In attempting to draw comparisons with 2 other series of discharged tuberculosis patients from earlier years, Elderton and Perry detected a fatal flaw. In both of these early series, the observation period started at the onset of symptoms, but information was gathered only from patients who survived from onset to sanatorium admission or discharge; thus all deaths during that interval were missed. Pearson noted 2 purposes in publishing the study: to stress the importance of collecting data on tuberculosis suitable for actuarial analysis, especially as it related to the evaluation of treatment as administered in a sanatorium, and to moderate the extreme dogmatism in some quarters regarding the value of such treatment.

Despite its auspicious introduction by the prominent phthisiologist Lawrason Brown, the life table method appears to have excited little if any interest among his fellow tuberculosis workers in America (4). Its reintroduction to the study of tuberculosis came through a side door. Around 1930, Wade Hampton Frost, then Professor of Epidemiology at The Johns Hopkins University School of Hygiene and Public Health, became concerned with the problems of collection and analysis of data on chronic, recurrent diseases and selected tuberculosis as an appropriate example to bridge the apparent gap between chronic and acute disease (5, 6). The population of his pilot study consisted of 132 black families in Kingsport, Tennessee; data were collected by his collaborators in the Tennessee State Department of Health (7).

TABLE 1. Percent expected survivorship to 1903 of discharged sanatorium patients, by discharge status(a)
Yr of admission | Apparently cured | Disease arrested | Disease active
1899-1901 | 100 | 80 | 39
1896-98 | 89 | 67 | 28
1893-95 | 100 | 52 | 14
1890-92 | 79 | 24 | 7
(a) See (2).
To avoid the long period of observation needed to study the future experience of a currently identified cohort, Frost hit upon the nonconcurrent prospective, or historical cohort, approach. In essence, this involved interviewing a family informant and recording 3 sets of data: the date of establishment of each household, a list of all persons currently in each household with pertinent study information, and a list of former members of each household with pertinent study information. After reconstructing the cohort backward in time, Frost wisely decided to check its mortality experience by a person-years analysis, comparing age-specific mortality rates with those of the black population of Tennessee. His study cohort was found to have unexpectedly low death rates in the age group 20-49 years. Frost eventually found that the "joker," as Maxcy (5) calls it, was the fact that there had to be a living informant to provide information about a household, whereas this was not true for the general population. When an adult member, randomly selected from each household, was excluded from the denominators, the adult mortality rates in the study population closely approximated those for the state (5).

Although Frost (7) gives credit to Elderton and Perry (3) and to Weinberg (8), another early user of the life table method, his slightly different presentation of data and his failure to recognize that his joker was analogous to the flaw in the earlier series discussed by Elderton and Perry (in each case, inclusion in the data required that someone survive to provide it) make one wonder whether Frost developed the person-time method independently and only later realized that the life table methods of Elderton and Perry and of Weinberg were essentially the same as his. This time the life table spark caught fire, almost entirely because of its use in various cohort studies by Frost's students.
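Frost's check and the informant adjustment can be sketched as a person-years rate computed twice: once over all household members, and once after removing from the denominator the person-years of one randomly selected adult per household, as described above, with the numerator unchanged. This is a simplified illustration under stated assumptions: it computes a single overall rate rather than the age-specific rates Frost compared with the state, and the `Member` record and household data are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Member:
    household: int
    is_adult: bool
    person_years: float
    died: bool


def crude_rate_per_1000(members: List[Member]) -> float:
    """Deaths per 1,000 person-years of observation."""
    deaths = sum(m.died for m in members)
    person_years = sum(m.person_years for m in members)
    return 1000.0 * deaths / person_years


def rate_excluding_one_adult_per_household(members: List[Member], seed: int = 0) -> float:
    """Same rate after removing one randomly chosen adult per household
    from the denominator (the adjustment for the surviving informant)."""
    rng = random.Random(seed)
    adults: Dict[int, List[Member]] = {}
    for m in members:
        if m.is_adult:
            adults.setdefault(m.household, []).append(m)
    excluded = sum(rng.choice(group).person_years for group in adults.values())
    deaths = sum(m.died for m in members)
    person_years = sum(m.person_years for m in members) - excluded
    return 1000.0 * deaths / person_years


# Hypothetical example: 2 households, each with 1 adult and 1 child.
hypothetical = [
    Member(1, True, 10.0, False), Member(1, False, 10.0, True),
    Member(2, True, 10.0, False), Member(2, False, 10.0, False),
]
print(crude_rate_per_1000(hypothetical))                     # 25.0
print(rate_excluding_one_adult_per_household(hypothetical))  # 50.0
```

Because the informant's survival is guaranteed by the way the data were collected, that person's years inflate the denominator without any chance of contributing a death; removing them raises the rate toward the population value.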
Notable among Frost's students were Ruth Puffer of the Williamson County, Tennessee, Tuberculosis Study (9); Persis Putnam at the Henry Phipps Institute in Philadelphia (10); and Miriam Brailey at the Harriet Lane Tuberculosis Clinic of The Johns Hopkins Hospital (11). Their work, in turn, stimulated still further applications of the life table and person-time methods to follow-up studies of tuberculosis (12-17).

As the program for this Conference indicates and the title assigned to me implies, there is more to early cohort studies of tuberculosis than the analytic methods designed to take account of variable periods of observation. Tuberculosis workers have also been among the pioneers in many of these other types of cohort studies. One is the simulation of a cohort study with the use of age-specific mortality tables spanning a considerable period, i.e., cohort analysis. A pioneer in this area was a Norwegian, K. F. Andvord, who in 1930 published tuberculosis mortality data for the Scandinavian countries and for England that were rearranged to reflect the experience of individuals as they passed through successive periods of age (18). His cohort curves for Norway are shown in figure 1. Andvord was impressed by the similarity in the shapes of the cohort curves and believed that this characteristic, observed in each of the countries he studied, could be used to make predictions of future mortality more accurate.

Again, it appears to have been Frost (19) who brought cohort analysis to the attention of epidemiologists. Figure 2, taken from Frost's paper in 1939, shows tuberculosis death rates among Massachusetts males. Note that the cross-sectional curves for 1880 and 1930 both suggest a relatively constant death rate after age 30.
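The rearrangement used by Andvord and Frost amounts to reading a table of period death rates along its diagonals: a person born in year b is aged a in calendar year b + a. A minimal sketch, assuming rates are tabulated on a 10-year grid of calendar years and ages so that each diagonal falls exactly on tabulated cells; the rates are arbitrary placeholders, not Andvord's or Frost's data, and the function names are illustrative.

```python
from typing import Dict, List, Tuple

# calendar year -> {age (start of age group): death rate}
PeriodTable = Dict[int, Dict[int, float]]


def cross_section(table: PeriodTable, year: int) -> List[Tuple[int, float]]:
    """Age-specific rates in one calendar year (the conventional curve)."""
    return sorted(table[year].items())


def birth_cohort(table: PeriodTable, birth_year: int) -> List[Tuple[int, float]]:
    """Rates for one birth cohort, read along the diagonal year = birth_year + age."""
    series = []
    for year in sorted(table):
        age = year - birth_year
        if age in table[year]:
            series.append((age, table[year][age]))
    return series


# Hypothetical rates on a 10-year grid (not Frost's or Andvord's data).
hypothetical = {
    1880: {0: 1.0, 10: 2.0, 20: 3.0},
    1890: {0: 4.0, 10: 5.0, 20: 6.0},
    1900: {0: 7.0, 10: 8.0, 20: 9.0},
}
print(cross_section(hypothetical, 1900))  # [(0, 7.0), (10, 8.0), (20, 9.0)]
print(birth_cohort(hypothetical, 1880))   # [(0, 1.0), (10, 5.0), (20, 9.0)]
```

When rates change with calendar time, the 2 readings of the same table can trace quite different curves, which is the contrast Frost drew between the 1880 and 1930 cross sections and the cohort of 1880.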
When the data were rearranged by birth cohorts to simulate the experience of individuals as they passed through time and life, each cohort showed a pattern similar in shape to that shown here for the birth cohort of 1880, i.e., a peak in infancy and another in young adult life, with the rates diminishing thereafter with increasing age. Frost concluded that: 1) the consistency of the curves suggested changes in resistance related to age; 2) decreased infection rates in early life did not, as some feared, lead to more serious disease later in life; and 3) cross-sectional mortality data did not necessarily reflect the experience of persons as they passed through time and life.

Andvord and Frost were lucky in selecting tuberculosis as the disease for their cohort analyses. Few other diseases show the cohort effect so impressively, for few diseases show changes in mortality with both calendar time and age that combine to produce such dramatic differences between cohort and cross-sectional mortality. For example, had Andvord and Frost selected mortality from cancer of the breast between 1940 and 1970, they would have found essentially no differences between the 2 approaches.

TABLE 2. Ratio of observed-to-expected deaths in 7 years after discharge from 2 sanatoria, by discharge status
Investigator | Sex | Apparently cured | Arrested | Active | Reference
Brown and Pope | Both | 3.3 | 15.8 | 45.4 | (2)
Elderton and Perry | Males | 2.1 | 4.3 | 29.2 | (3)
 | Females | 1.7 | 5.2 | 25.5 |

FIGURE 1. Average annual tuberculosis deaths per 10,000 for specified birth cohorts and for the period 1921-25, by age, Norway (18).

Cohort studies have often provided the epidemiologic bases for tuberculosis control.
Notable among these studies are 3 from widely scattered areas of the world. Basic data for the developing countries were furnished by a longitudinal study of tuberculosis in the Tumkur district of South India (20). In addition to illustrating the relationships of the prevalence and incidence of tuberculosis infection and disease in an area with little in the way of tuberculosis control, the investigators reported the surprising findings that the risk of becoming infected was greater for persons over the age of 35 than for younger persons and that, despite a high rate of new infections (1.6%/yr), only 30% of the new cases developed among newly infected persons. The Danish Tuberculosis Index, based on an initial survey of children and young adults in all of Denmark except Copenhagen, yielded information on the groups in whom high risks might be expected in most of the industrialized nations where BCG vaccination is routinely used (21). In the United States, the Muscogee County Tuberculosis Study in Georgia reported on persons identified in a private census and 2 community tuberculosis surveys (22, 23). The finding that uninfected persons had an extremely low risk of subsequently becoming infected and then developing tuberculosis had a major influence on tuberculosis control policies in this country, particularly because this finding indicated little need for vaccination and potentially greater benefits from treating infected persons.

An investigation of tuberculosis among nurses who had been examined as students in 1943-49 led to the development of many of the modern methods of tracing persons in follow-up studies (24). Among 25,752 nurses in the study, only 9 could not be located in 1953-54, and only 40 failed to answer the mailed questionnaire, a response record that will be difficult to surpass.
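As a quick check on how complete this follow-up was, the counts quoted above can be expressed as percentages of the 25,752 nurses. This is a trivial arithmetic sketch; the constant names are illustrative.

```python
def percent(part: int, whole: int) -> float:
    """part as a percentage of whole."""
    return 100.0 * part / whole


NURSES_IN_STUDY = 25_752
NOT_LOCATED = 9          # could not be located in 1953-54
NO_QUESTIONNAIRE = 40    # failed to answer the mailed questionnaire

print(f"not located: {percent(NOT_LOCATED, NURSES_IN_STUDY):.3f}%")             # 0.035%
print(f"no questionnaire: {percent(NO_QUESTIONNAIRE, NURSES_IN_STUDY):.3f}%")   # 0.155%
```

Both losses are well under 1 percent of the cohort.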
Cohort studies have also done far more than case-control studies to identify risk factors for tuberculosis, a marked contrast to the situation with most other chronic diseases. Weinberg, whose major work was published as early as 1913, was the first of many to document the increased risk associated with the presence of a tuberculous member in the family (8). Using family registers and death certificates in Stuttgart, Germany, as his sources of information, he found that children in families in which a parent had died of tuberculosis had twice the expected risk of dying of tuberculosis themselves. This risk was further increased if the deceased parent was the mother or if the family did not belong to the upper classes.

The first major cohort study initiated by Frost and his students was the Williamson County Tuberculosis Study in Tennessee. In 1930, after reviewing the pilot study in Kingsport and the tuberculosis experience in the rest of Tennessee with Dr. E. L. Bishop (the State Commissioner of Public Health and one of his former students), Frost (6) stated that "The basic material for the study should be an unselected series of cases of tuberculosis which have come to the attention of the State Department of Health through official morbidity reports or through examinations made in the chest clinic." Observations of tuberculosis patients and their families in Williamson County began in 1931 and continued until 1955. Some of the major findings are summarized in table 3, in which the original data have been adjusted by means of multiple binary regression so that the effects of other variables in the table are removed (25). Among persons who had the risk factor of living in households with an infectious case of tuberculosis, the findings suggested that being black, female, or poor conferred some additional risk.
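The standardized risk ratios in table 3 appear to be each adjusted attack rate expressed as a percentage of the overall adjusted rate of 6.0/1,000 person-yr. The following sketch checks that reading against the published ratios; it does not reproduce the multiple binary regression adjustment itself, which would require the original data. The dictionary keys are illustrative labels for the rows of table 3.

```python
OVERALL_RATE = 6.0  # adjusted attack rate per 1,000 person-yr, all contacts (table 3)

ADJUSTED_RATES = {
    "age 0-4": 9.1, "age 5-14": 5.3, "age 15-24": 8.0, "age 25-34": 7.0, "age 35+": 4.8,
    "white": 5.1, "black": 7.8,
    "male": 5.1, "female": 6.9,
    "high/mid SES": 4.9, "low SES": 6.8,
}


def standardized_risk_ratio(rate: float, overall: float = OVERALL_RATE) -> int:
    """Rate expressed as a percentage of the overall rate, rounded to an integer."""
    return round(100 * rate / overall)


for group, rate in ADJUSTED_RATES.items():
    print(f"{group:>14}: {standardized_risk_ratio(rate)}")
```

Every computed value matches the ratio column of table 3, so a ratio above 100 simply marks a subgroup whose adjusted rate exceeds that of all contacts combined.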
Peak risks were indicated for persons under the age of 5 years and during adolescence and early adult life, findings similar to those of several other researchers (8, 13, 15, 18, 19). However, each of these other studies suffered from various limitations: no observations were made of patients more than 20 years old (8, 13); data were limited to general populations whose tuberculosis exposure and infection status were unknown (18, 19); or only small numbers of subjects were studied (11, 15). The Williamson County study (8) had its limitations as well, particularly the use of household exposure to indicate tuberculous infection and, despite long-term follow-up, numbers too small to allow the differences listed in table 3 to achieve the usual levels of statistical significance. Nevertheless, the consistent suggestions from several studies of peak risks in infancy and around 20 years of age, buttressed by similar clinical observations (26), supported the reality of the phenomenon.

FIGURE 2. Massachusetts death rates from tuberculosis (all forms) by age in the years 1880 and 1930 and for the cohort of 1880 (19). Figure is reprinted with the permission of the publisher (5).

Stronger support came from a large-scale cohort study in Puerto Rico (27). Established in the course of a controlled trial of BCG vaccination, this cohort of 82,269 tuberculin reactors aged 1 to 18 years was followed for nearly 19 years. The results shown in figure 3 clearly confirm what Frost had found in his cohort analyses, i.e., that persons infected with tubercle bacilli experience 2 periods of peak risk, 1 in infancy and the other in late adolescence and early adult life.
The advantages of the Puerto Rican cohort were that the numbers were sufficient to give statistical stability to the rates and that a purified tuberculin was used to establish with reasonable certainty that the children had been infected with tubercle bacilli.

TABLE 3. Attack rates among contacts of sputum-positive tuberculous patients(a)
Characteristic | Person-yr | Adjusted rate/1,000 person-yr | Standardized risk ratio
Age, yr
0-4 | 446 | 9.1 | 152
5-14 | 2,112 | 5.3 | 88
15-24 | 2,325 | 8.0 | 133
25-34 | 1,419 | 7.0 | 117
35+ | 4,854 | 4.8 | 80
Total | 11,156 | 6.0 | 100
Race
White | 7,376 | 5.1 | 85
Black | 3,780 | 7.8 | 130
Sex
Male | 5,579 | 5.1 | 85
Female | 5,577 | 6.9 | 115
Socioeconomic status
High, mid | 4,503 | 4.9 | 82
Low | 6,653 | 6.8 | 113
(a) Study was conducted in Williamson County, Tennessee.

In today's economic and political climate, economies in research become increasingly desirable. A number of large, long-term tuberculosis studies have been completed at minimal cost by considerable or total reliance on routinely collected data. The pioneers in this field were Sergent and associates (28) in Algiers. Subjects for a controlled trial of oral BCG vaccination were recruited from births routinely registered in Algiers from 1935 to 1947. Follow-up consisted simply of matching reported deaths of persons in the appropriate age groups to the list of registered subjects. The reduction in total mortality associated with vaccination was consistent with a marked reduction in tuberculosis. More relevant to modern research was the controlled trial of BCG vaccination in Puerto Rico by Palmer et al. (23). They found that recording the tuberculosis experience of nearly 200,000 children during nearly 2 decades was possible with only a small clerical and statistical staff by matching routinely collected tuberculosis reports to the list of study participants. Their deliberate avoidance of any individual follow-up had the added advantage of virtually
Data source is the Population Association of America and E. Kitagawa. Table is reproduced with their permission.

TABLE 2. Mortality differentials by income in the United States, 1960(a)
Mortality ratios in percent, standardized for age and educational attainment
Income | Men, 25-64 yr | Women, 25-64 yr
Under $2,000 | 140 | 116
$2,000-3,999 | 113 | 106
$4,000-5,999 | | 98
$6,000-7,999 | 89 | 99
$8,000-9,999 | 96 | 96
≥$10,000 | 90 | 93
Total | 100 | 100
(a) See table 1, footnote a.

TABLE 4. Mortality differentials by employment and health status in males 15-64 yr old, United Kingdom, 1971-75
Status | Standardized mortality ratios, %
Employed | 82
Not working because of illness | 309
Seeking work |
Retired | 130
Permanently sick | 382

occupations which were excluded from the experience under standard ordinary policies. At attained ages under 40, the difference in female death rates was minimal. However, at attained ages 40 years and older, the women with group life insurance experienced significantly lower mortality than women under standard ordinary policies, apparently because the working women were a more select group.

The experience data presented in this section illustrate the marked effects on mortality of socioeconomic status, employment status, physical screening, and other factors affecting health, such as adverse habits. The term "class selection" refers to the permanent attributes exemplified by occupation, life-style, and educational attainment. The term "temporary initial selection" refers to the effects of various screening measures that exclude those who do not meet physical, moral, or financial standards (3). The subjects selected for epidemiologic investigations meet criteria similar to those used in life insurance studies. For example, class selection yields a sample with well-defined socioeconomic characteristics. Itinerants and those unable to function in the normal social environment do not make satisfactory subjects for epidemiologic study. Neither do the obviously ill and those unable to answer questions or complete questionnaires. For these reasons, data based on mortality experience among insured persons are valuable indicators of mortality experience in the middle class population at large (10).

TABLE 3. Mortality differentials by line of life insurance of white policyholders(a)
Type of policy | Mortality ratio, all ages combined, %
Large ordinary | 89
Small ordinary I | 80
Small ordinary II | 100
Industrial I | 123
Industrial II | 149
(a) Study was conducted by the Life Insurance Company of Georgia, 1964-73.

It is essential that one keep in mind that a group of persons selected for a mortality study may be either unrepresentative of its universe or significantly affected by confounding factors. In my judgment, there is inadequate awareness of the magnitude of these biases in the selection process in epidemiologic studies. We should regard selection as a means of defining a population for study. By deliberately selecting a population in ostensibly good health except for the characteristic under study, an investigator may eliminate the more common confounding factors. Those conducting the Cancer Prevention Study found it feasible to select ostensibly healthy persons at least through age 90 (10).

DURATIONAL EFFECTS

In interpreting the findings of cohort studies, one must consider the incidence of mortality by duration, because the length of time covered by a study may introduce a serious bias. For example, if mortality falls sharply with duration, as it does in moderately and markedly underweight persons, a short-term study will overstate the death rates. Conversely, when mortality increases significantly over time, as in markedly overweight persons, an investigation limited to a brief period will understate the death rates. All the major medico-actuarial mortality investigations made since the turn of the century have included analyses of the experience by age at entry and duration since issue of the insurance (11).

TABLE 5. Comparison of the mortality experienced, 1975-80(a)
Death rates/1,000
Attained age, yr | Males: standard ordinary policies | Males: group insurance | Females: standard ordinary policies | Females: group insurance
25-29 | 1.3 | 1.0 | .5 | .4
30-34 | 1.1 | 1.0 | .6 | .6
35-39 | 1.3 | 1.4 | .9 | .8
40-44 | 1.9 | 2.2 | 1.6 | 1.0
45-49 | 3.3 | 3.7 | 2.6 | 1.9
50-54 | 5.6 | 6.2 | 3.7 | 2.8
55-59 | 9.0 | 9.9 | 5.6 | 4.0
60-64 | 15.0 | 15.6 | 9.0 | 6.5
(a) Comparisons were made of standard ordinary policies issued 16 or more yr ago and group life insurance, all industries combined.

Mortality tables developed by actuaries for the study of death rates or other purposes have been in select form, with select periods of varying duration. Table 6 presents the so-called "1965-70 Basic Tables," which show select death rates by age group at entry for 15 years and ultimate death rates for the sixteenth and later policy years by attained age groups (12). The effects of selection in the 1965-70 Basic Tables are small at entry ages under 35 years but increase with advancing age at entry and become appreciable at entry ages of 50 years and older. These selection effects are concentrated largely in the first 3 years following issue of insurance but persist for many years afterward. The persistence of temporary initial selection is attributed mainly to medical examinations and other screening at the time a policy is issued. Other factors, such as the gradual withdrawal of healthy persons and secular decreases in mortality, complicate the durational effects observed (13). Class selection usually persists for a long time; it is frequently difficult to separate the effects of class selection from those of temporary initial selection.
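To make the selection effect concrete, the sketch below compares 2 groups of men at roughly the same attained ages (about 55-59) in table 6: those just insured at issue ages 55-59 (policy year 1) and those insured 5 years earlier at issue ages 50-54 (policy year 6). The rates are as read from the 2 corresponding male rows of table 6; the function and variable names are illustrative.

```python
# Male select death rates per 1,000 for policy years 1-15, as read from table 6.
SELECT_MALE = {
    "50-54": [2.61, 4.03, 5.31, 6.29, 7.01, 8.58, 10.15, 11.26, 12.29, 13.74,
              15.18, 17.05, 19.54, 22.08, 24.17],
    "55-59": [3.65, 5.48, 7.67, 9.28, 10.22, 11.80, 13.87, 15.30, 16.81, 19.16,
              21.74, 24.91, 28.23, 32.27, 35.91],
}


def select_rate(issue_ages: str, policy_year: int) -> float:
    """Select death rate per 1,000 for an issue-age group and policy year (1-15)."""
    if not 1 <= policy_year <= 15:
        raise ValueError("select rates cover policy years 1-15 only")
    return SELECT_MALE[issue_ages][policy_year - 1]


# Both groups are at roughly the same attained ages (about 55-59):
just_selected = select_rate("55-59", 1)     # first policy year
selected_earlier = select_rate("50-54", 6)  # 5 years after issue
print(f"ratio: {just_selected / selected_earlier:.2f}")  # 0.43
```

At about the same attained age, the newly examined men die at well under half the rate of men examined 5 years earlier, which is the temporary initial selection the text attributes mainly to screening at issue.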
The process of selecting men for military service, for instance, may continue to affect mortality rates for as long as 20 years. Select life tables used in the calculation of premiums and for other practical purposes have arbitrarily limited the select period to 2 or 3 years. It has been observed that the select period on policies issued with a medical examination lasts longer than that on nonmedically issued policies (8). In special investigations of mortality that focus on the extra mortality associated with physical impairments or occupations, it has generally been assumed that the select period may extend for 20 years or longer. In such investigations, select life tables are customarily constructed to calculate expected deaths from contemporaneous death rates among persons insured at standard premium rates. This approach is designed to measure the different selection effects in the experience under study (14).

Analyses of mortality by duration in the comprehensive intercompany investigations of physical impairments reveal patterns in the incidence of extra mortality characteristic of the more important diseases and abnormalities. Broadly speaking, acute diseases and conditions treated by surgery, notably cancer or heart surgery, show an incidence of extra mortality that decreases with duration. Chronic conditions, such as overweight and diabetes, are characterized by an incidence of mortality increasing with duration (15-20).

TABLE 6. Basic tables, 1965-70: death rates/1,000 by issue age and policy year
Issue age, yr | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
Males
0 s80 | 133] 8a] es| s3| a8] a2] 39 3s] 32] 3] mu] | a] 5
1] 133] 84] 65| 53| a8| a2] 39) 35| 320 | mm 33| a2 s2| 67
24 | 65| s3| 48] 42| 39| 35| 32 31| 31] 33| 4] s2| 67] 81] 95
sol 30 3s5| 32| 31] 31| 33| 42| s2| 67] 81 95| 107] 116] 119] 1.16
10-1 33] 42| 52| 67 81 95) 107] 16] 119] tae] 13 | a1] 110] 108] 107
15-19 | 92 | 99] 1.03] 107 101 | 1.10] 109] 1.08] 107] 106] 104 | 109] 1.13] 118] 1.23
202 | 6 | 74| 81| s84| 83| 82| 83] 86| 90] 95) tor | rar] 122] 136 150
2520 | 57) 63] 75| 77] s0| 4] 91] 100] 112] 124] 141] 163] 184] 208| 237
30-34 | .75 | .87 | .98 | 1.00 | 1.09 | 1.33 | 1.46 | 1.68 | 1.92 | 2.17 | 2.47 | 2.75 | 3.06 | 3.46 | 3.96
35-39 | .86 | 1.11 | 1.41 | 1.66 | 1.90 | 2.16 | 2.46 | 2.77 | 3.14 | 3.58 | 4.05 | 4.53 | 4.99 | 5.64 | 6.54
40-44 | 1.36 | 1.89 | 2.35 | 2.80 | 3.10 | 3.67 | 4.13 | 4.64 | 5.20 | 5.79 | 6.51 | 7.21 | 8.15 | 9.23 | 10.41
45-49 | 1.94 | 2.70 | 3.54 | 4.23 | 4.85 | 5.57 | 6.40 | 7.13 | 7.80 | 8.66 | 9.77 | 11.05 | 12.76 | 14.55 | 16.43
50-54 | 2.61 | 4.03 | 5.31 | 6.29 | 7.01 | 8.58 | 10.15 | 11.26 | 12.29 | 13.74 | 15.18 | 17.05 | 19.54 | 22.08 | 24.17
55-59 | 3.65 | 5.48 | 7.67 | 9.28 | 10.22 | 11.80 | 13.87 | 15.30 | 16.81 | 19.16 | 21.74 | 24.91 | 28.23 | 32.27 | 35.91
60-64 | 5.89 | 8.53 | 11.92 | 15.04 | 16.35 | 17.39 | 19.05 | 21.37 | 24.34 | 28.08 | 33.30 | 39.82 | 44.31 | 47.57 | 50.36
65-69 | 9.74 | 13.68 | 17.51 | 20.69 | 23.88 | 25.94 | 28.42 | 30.77 | 35.38 | 41.98 | 50.09 | 56.90 | 61.79 | 66.73 | 72.64
≥70 | 10.38 | 14.37 | 18.56 | 23.07 | 30.10 | 38.15 | 46.91 | 56.67 | 67.32 | 79.21 | 90.55 | 103.80 | 115.22 | 123.75 | 133.86
Females
0| 48 | 122] 72| 55) a8| 42| 37 33| 200 27] 25] 20 27] 290] 33
1] 122] 72) ss| a8| 42 37] 33] 200 27] 25) 20| 27 29 33] 36
24 | 55| 48| 42| 37] 33| 20 27| 25| 26] 27] 29) 33| 36| a1] 47
59| 33) 200 27] 25) 20] 27| 29| 33| 36| a1| 47] 3] 57| S59 58
10-14 | 27] 29| 33| 36| 41| 47 53| 57| 59] 8) s6| ss| s4| 53] sa
1519 | 4s| 49| s0| 51| 53| 56| 55| 54| 53] sa| s7| e0| 63] 66] 67
2024 | s50| s54| 52| 53| sa] 57] 60] 63] 66] 67] 70 76| 83] 92| 1.02
252 | s4| so| 63| e6| 67] 70| 76| 83| 92] 102] 116] 131 | 145] 160] 176
30-34 | .70 | .76 | .83 | .92 | 1.02 | 1.16 | 1.31 | 1.45 | 1.60 | 1.76 | 1.91 | 2.07 | 2.26 | 2.48 | 2.70
35-39 | .77 | .98 | 1.21 | 1.44 | 1.57 | 1.78 | 1.99 | 2.19 | 2.41 | 2.62 | 2.83 | 3.08 | 3.36 | 3.66 | 4.00
40-44 | .87 | 1.20 | 1.55 | 1.89 | 2.12 | 2.44 | 2.73 | 2.97 | 3.22 | 3.48 | 3.71 | 3.95 | 4.58 | 5.14 | 5.63
45-49 | 1.02 | 1.51 | 2.17 | 2.49 | 2.80 | 3.18 | 3.61 | 3.96 | 4.35 | 4.75 | 5.19 | 5.62 | 6.44 | 7.10 | 7.84
50-54 | 1.88 | 2.62 | 3.51 | 4.15 | 4.60 | 5.03 | 5.59 | 6.05 | 6.51 | 6.73 | 7.77 | 8.59 | 9.80 | 11.00 | 12.50
55-59 | 1.98 | 2.78 | 3.68 | 4.26 | 4.70 | 5.03 | 5.62 | 6.21 | 6.92 | 8.16 | 10.04 | 11.84 | 14.16 | 15.66 | 17.16
60-64 | 2.86 | 4.43 | 6.21 | 6.92 | 8.16 | 10.04 | 11.56 | 12.48 | 13.40 | 14.40 | 15.91 | 18.03 | 20.38 | 22.48 | 24.83
65-69 | 4.08 | 6.38 | 8.19 | 10.93 | 13.03 | 14.89 | 16.81 | 18.69 | 20.56 | 22.70 | 24.15 | 25.81 | 28.34 | 30.58 | 32.40
≥70 | 6.15 | 9.00 | 12.11 | 15.65 | 21.14 | 27.68 | 35.11 | 43.63 | 53.23 | 63.28 | 75.29 | 85.05 | 96.23 | 105.84 | 114.33

More specifically, the following conclusions about the incidence of mortality have been drawn from the various studies:

1) The impairments characterized by relative mortality decreasing with duration include:
Coronary disease
Heart surgery
Malignant tumors
Underweight
Asthma
Psychoneurosis
Gastric and duodenal ulcers
Infected gall bladder
Fractured skull
Hysterectomy

2) The impairments characterized by relative mortality increasing with duration include:
Overweight
Moderate elevations in blood pressure
Diabetes
Rheumatoid arthritis
Family history of cardiovascular disease

In mortality investigations covering longer periods, consideration of durational effects in relation to underlying secular mortality trends is essential. From 1950 to 1965, the underlying death rates in the United States remained virtually at the same level, so that selection effects could readily be inferred from durational effects. However, the sharp downtrend in death rates that began in the late 1960s makes it difficult to separate selection effects from underlying mortality. One can approach this problem by developing cohort life tables, i.e., life tables for each of several calendar years of birth.
Such tables are available for the general population and can readily be compiled for insured persons, given the year of issue of the insurance and their ages at the time of issue. The relationship between changes in cohort and period life tables can be investigated by displaying the death rates during a particular period by attained age group and relating them to the death rates of the same cohort 10 years earlier (21). If ${}_{10}m_x^{t}$ represents the death rate for the age group $x$ to $x+9$ during the period $t$, then the ratio of this death rate to that of the same cohort 10 years earlier can be expressed as:

$$\frac{{}_{10}m_x^{t}}{{}_{10}m_{x-10}^{t-10}} = \frac{{}_{10}m_x^{t}}{{}_{10}m_{x-10}^{t}} \times \frac{{}_{10}m_{x-10}^{t}}{{}_{10}m_{x-10}^{t-10}}$$

In other words, you can obtain the ratio of the cohort death rates for successive age groups ($x-10$ and $x$) by multiplying the ratio of the period death rates for these two age groups during the later period ($t$) by the ratio of the death rate in the younger age group ($x-10$) during the later period ($t$) to the death rate for the same younger age group during the earlier period ($t-10$). Under conditions of declining mortality, cohort death rates would generally rise less rapidly with advancing age than period life table death rates. Table 7 presents such a truncated cohort mortality analysis for death rates from respiratory cancer among white males between 1945 and 1979.

TABLE 7.— Truncated cohort mortality analysis of U.S. white males: Death rates per 100,000 and derived ratios for respiratory cancer, 1945-79

Ages, yr | 25-34 | 35-44 | 45-54 | 55-64 | 65-74 | 75-84

Death rates, by period
1975-79 | 1.2 | 12.4 | 74.0 | 208.5 | 399.8 | 501.1
1965-69 | 1.5 | 13.3 | 61.1 | 176.3 | 304.8 | 283.5
1955-59 | 1.5 | 9.6 | 49.1 | 137.0 | 193.4 | 165.0
1945-49 | 1.5 | 7.6 | 33.7 | 80.8 | 92.5 | 84.2

Ratios*
Same age
(i) | .8 | .9 | 1.2 | 1.2 | 1.3 | 1.8
(ii) | 1.0 | 1.4 | 1.2 | 1.3 | 1.6 | 1.7
(iii) | 1.0 | 1.3 | 1.5 | 1.7 | 2.1 | 2.0
Same period
(iv) 1975-79 | 12.0 | 10.3 | 6.0 | 2.8 | 1.9 | 1.3
(v) 1965-69 | 7.0 | 9.5 | 4.6 | 2.9 | 1.7 | .9
(vi) 1955-59 | 5.0 | 6.4 | 5.1 | 2.8 | 1.4 | .9
Cohort
(vii) | 6.0 | 8.3 | 5.6 | 3.4 | 2.3 | 1.6
(viii) | 4.7 | 8.9 | 6.4 | 3.6 | 2.2 | 1.5
(ix) | 2.5 | 6.6 | 6.5 | 4.1 | 2.4 | 1.8

* (i) Ratio of ${}_{10}m_x$, 1975-79, to ${}_{10}m_x$, 1965-69. (ii) Ratio of ${}_{10}m_x$, 1965-69, to ${}_{10}m_x$, 1955-59. (iii) Ratio of ${}_{10}m_x$, 1955-59, to ${}_{10}m_x$, 1945-49. (iv) Ratio of ${}_{10}m_x$, 1975-79, to ${}_{10}m_{x-10}$, 1975-79. (v) Ratio of ${}_{10}m_x$, 1965-69, to ${}_{10}m_{x-10}$, 1965-69. (vi) Ratio of ${}_{10}m_x$, 1955-59, to ${}_{10}m_{x-10}$, 1955-59. (vii) Ratio of ${}_{10}m_x$, 1975-79, to ${}_{10}m_{x-10}$, 1965-69. (viii) Ratio of ${}_{10}m_x$, 1965-69, to ${}_{10}m_{x-10}$, 1955-59. (ix) Ratio of ${}_{10}m_x$, 1955-59, to ${}_{10}m_{x-10}$, 1945-49.

ASSESSMENT OF RISKS

Few problems have precipitated so much controversy as the assessment of risks from mortality data. Actuaries have handled this kind of problem in a limited context for over 150 years. The actuarial approach has been concerned with the assessment of various kinds of risks that increase death rates among insured persons. The objective has been to evaluate the major risk factors as accurately as possible without resort to the kind of supplementary inquiries that might be helpful in tracing causal relationships (14). The methodology used in medico-actuarial mortality investigations has focused on cohort studies in select form, with emphasis on analysis of mortality by age at entry, by duration since entry, and by cause of death (22). The special actuarial contributions to the methodology of cohort studies lie in the careful selection of the subjects for particular investigations, in the use of appropriate standard cohorts as controls, in the emphasis on durational effects, and in the interpretation of the findings by reference to the characteristics of the subjects (22).

The goal of actuaries is to obtain a population whose experience would represent only the mortality associated with the factor under study. To isolate the effects of a specific factor, medico-actuarial investigators of life insurance records exclude all individuals who present other elements of risk, such as a medical impairment, hazardous occupation, or questionable habits.
Thus the population under study represents an ostensibly healthy population except for the presence of the factor under study (14). Life insurance records are usually ample enough to afford such exclusions without resort to multivariate analyses; in life insurance studies, multivariate analyses have been used when the number of subjects has been small. Compiling life tables appropriate for the calculation of expected deaths in studies based on life insurance records has been easy: current death rates are determined among standard life insurance risks in select form. In studies based on clinical or other medical records, life tables were compiled to represent contemporaneous death rates among ostensibly healthy persons drawn from a population with similar socioeconomic characteristics. The experience among actively employed persons covered by group life insurance offers a much more appropriate standard of expected mortality for persons 20 to 64 years old than do death rates in the general population (23). At ages 65 and older, the mortality among ostensibly healthy subjects in the Cancer Prevention Study provided a more suitable basis for calculating expected deaths than the death rates in the general population (10). Table 8 compares the mortality rates among actively employed persons covered under group life insurance with the corresponding death rates in the general population in the 20- to 64-year age range. It also compares the mortality rates among the ostensibly healthy subjects aged 65 and older in the Cancer Prevention Study with the corresponding death rates in the general population. The death rates among persons covered by group life insurance represent the experience among actively employed men and women aged 25-64 years.
The ostensibly healthy persons 65 and older who participated in the Cancer Prevention Study represent that portion of the total experience pertaining to persons who, at the time of enrollment, were not ill; had no history of heart disease, stroke, or cancer; and were not markedly overweight. The 200,000 such persons followed for up to 20 years provide the largest known body of data on mortality in a middle class population aged 65 years and older (10). The death rates of the actively employed men under 65 were only about two-thirds of those for white males in the general population; the death rates of the actively employed women 40-64 years old were less than 60% of those for white women in the general population. The death rates of ostensibly healthy middle class men 65-80 years old were less than two-thirds of those of white males, whereas the corresponding death rates of ostensibly healthy middle class women were only about one-half those of white women in the general population. Clearly, general population death rates do not provide reasonable standards of expected mortality when the objective is to estimate the departures from normal mortality produced by physical impairments or special hazards to health in an otherwise healthy middle class population (23). The death rates among those under 65 who are covered by group life insurance and the mortality rates among the ostensibly healthy 65 and older in the Cancer Prevention Study are far better yardsticks of extra mortality in such circumstances. Using population death rates as the standard of expected mortality in studies of occupational hazards may underestimate the extra mortality involved by as much as 50%. A similar underestimate may occur in mortality studies designed to determine the excess mortality of medically impaired persons over that of healthy persons.
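How the choice of standard shrinks the measured extra mortality can be illustrated with the white-male rates for ages 45-49 in Table 8. The sketch below is illustrative only: the occupational group and its annual death rate of .00800 are hypothetical, and the helper names are invented for this example.

```python
# Annual death rates for white males aged 45-49, taken from Table 8.
POPULATION_RATE = 0.00565  # U.S. white population, 1977
EMPLOYED_RATE = 0.00370    # actively employed, group life insurance, 1974-79


def extra_mortality(observed_rate, standard_rate):
    """Extra deaths per person per year relative to a chosen standard."""
    return observed_rate - standard_rate


def understatement(observed_rate):
    """Share of the extra mortality (measured against the healthy employed
    standard) that is missed when general population rates are used instead."""
    true_extra = extra_mortality(observed_rate, EMPLOYED_RATE)
    apparent_extra = extra_mortality(observed_rate, POPULATION_RATE)
    return 1 - apparent_extra / true_extra


# Hypothetical occupational group; .00800 is an illustrative rate only.
observed = 0.00800
print(f"extra vs employed standard:   {1000 * extra_mortality(observed, EMPLOYED_RATE):.2f} per 1,000")
print(f"extra vs population standard: {1000 * extra_mortality(observed, POPULATION_RATE):.2f} per 1,000")
print(f"understatement: {understatement(observed):.0%}")
```

With these assumed figures the extra mortality falls from 4.30 to 2.35 deaths per 1,000 per year, an understatement of about 45%. The closer the group's true rate lies to the population rate, the larger the share of its excess that disappears.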
In studies with internal controls, great care must be taken that the matched controls are not themselves subject to mortality distinctly higher than normal for healthy persons. This is why studies that use hospital patients as controls are automatically suspect. Berkson has described other biases in matched controls.

I want to emphasize that the subject population may be at risk with respect to confounding factors that influence mortality to a degree that significantly distorts the effects of the specific factors under investigation. The most serious of these confounding factors is the inclusion of individuals with significant health hazards, other than the specific factors under investigation, whose extra mortality cannot easily be allowed for. Whenever possible, one should follow the actuarial practice of excluding from study individuals with health hazards other than those under investigation. Smoking is a common confounding factor of this kind (24). Until recently, medico-actuarial studies were based on data that did not exclude smokers. Virtually all studies of occupational hazards, particularly those focusing on cancer, are defective if they do not classify the findings by smoking category. Investigators who conducted some of the life insurance studies have suggested that in certain occupations, notably in the entertainment field, the abuse of alcohol and drugs may be an even more serious confounding factor than smoking.

TABLE 8.— Comparison of death rates in the U.S. white population with persons covered by group life insurance and ostensibly healthy persons in the Cancer Prevention Study*

Attained ages, yr | U.S. white population, 1977: Males | Females | Persons covered by group life insurance, 1974-79: Males | Females
25-29 | .00167 | .00061 | .00100 (.60) | .00040 (.66)
30-34 | .00164 | .00078 | .00100 (.61) | .00060 (.77)
35-39 | .00219 | .00116 | .00140 (.64) | .00080 (.69)
40-44 | .00340 | .00192 | .00220 (.65) | .00100 (.52)
45-49 | .00565 | .00310 | .00370 (.65) | .00190 (.61)
50-54 | .00925 | .00480 | .00620 (.67) | .00280 (.58)
55-59 | .01440 | .00726 | .00990 (.69) | .00400 (.55)
60-64 | .02338 | .01144 | .01560 (.67) | .00650 (.57)

Attained ages, yr | U.S. white population, 1969-71: Males | Females | Ostensibly healthy persons, 1960-78: Males | Females
65-69 | .03977 | .01883 | .02240 (.56) | .00920 (.49)
70-74 | .05655 | .03048 | .03559 (.63) | .01510 (.50)
75-79 | .08472 | .05264 | .05445 (.64) | .02810 (.53)
80-84 | .12127 | .08702 | .08790 (.72) | .05380 (.62)
85-89 | .17688 | .13944 | .14420 (.82) | .09600 (.69)
90-94 | .24152 | .20617 | .20895 (.87) | .15890 (.77)

* Numbers in parentheses are ratios to death rates in the general population.

EXAMPLES OF ACTUARIAL LIFE TABLE ANALYSES

Medico-actuarial studies began in the United States in the 1890s as part of a concerted effort by investigators to improve the underwriting of life insurance risks. Mortality investigations were accordingly undertaken, and the effects of specific risk factors, such as medical impairments, build, occupational hazards, family history, and habits, were measured. The underlying hypothesis was that each of the factors, or certain combinations of factors, influencing mortality could be regarded as an independent variable. The total mortality risk was treated as a linear combination of its independent elements. The broad lines for such studies were laid down in the Medico-Actuarial Investigation of 1912, sponsored jointly by the Actuarial Society of America and the Association of Life Insurance Company Medical Directors (25). The instructions called for tracing cohorts of policyholders underwritten under similar rules over long periods and for analyzing the results by age group at entry, by duration since issue of insurance, and by cause of death.
Attention focused on the mortality in the years immediately following issue of insurance so that the effects on mortality of the companies' underwriting rules, especially the kinds of medical examinations and other screening used, could be evaluated. Comparisons of the patterns of mortality over longer periods have shown the extra mortality to be temporary, decreasing over the years, relatively level over time, or increasing with the passage of time. Analyses of mortality by cause have shed light on the diseases mainly responsible for the excess mortality and on whether mortality from some causes could be controlled to a degree through modified underwriting rules. The 1912 Study, which included about 500,000 policies, counted policies in each cohort rather than persons. The results were reported as ratios of the number of policies actually terminated by death in the cohort to the number expected to terminate by death on the basis of the experience among insured individuals accepted at standard premium rates. The select life table used in calculating expected deaths assumed a 4-year select period.

In 1926, the Actuarial Society of America and the Association of Life Insurance Medical Directors completed a major study of occupational hazards, now referred to as the Joint Occupation Study 1926 (26). It covered about 1,500,000 policies that were issued at standard and substandard premium rates for occupational hazards in 430 job groups; the experience for 1915-26 and that for 1920-26 were tabulated separately so that the effects of the influenza epidemic could be evaluated.
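The actual-to-expected comparison described for the 1912 Study can be sketched as follows. This is a minimal illustration, not the procedure of the original investigators: all rates, exposures, and the death count are hypothetical, the 4-year select period follows the text, and exposures are counted in policies rather than persons, as in that study.

```python
# Hypothetical select-and-ultimate table with a 4-year select period.
# Rates are invented annual probabilities of a policy terminating by death.
SELECT_PERIOD = 4
SELECT_RATES = {  # (issue_age, policy_year) -> rate, policy years 1-4
    (35, 1): 0.0030, (35, 2): 0.0042, (35, 3): 0.0050, (35, 4): 0.0056,
}
ULTIMATE_RATES = {39: 0.0064, 40: 0.0068, 41: 0.0073}  # by attained age


def expected_rate(issue_age: int, policy_year: int) -> float:
    """Select rate during the select period, ultimate rate afterwards."""
    if policy_year <= SELECT_PERIOD:
        return SELECT_RATES[(issue_age, policy_year)]
    attained_age = issue_age + policy_year - 1  # age at start of policy year
    return ULTIMATE_RATES[attained_age]


def actual_to_expected(exposures: dict, actual_deaths: int):
    """Return (expected deaths, A/E ratio).

    exposures maps (issue_age, policy_year) -> number of policies exposed.
    """
    expected = sum(n * expected_rate(age, year) for (age, year), n in exposures.items())
    return expected, actual_deaths / expected


# Hypothetical cohort of policies issued at age 35, followed for 7 policy years.
policies = {(35, y): n for y, n in enumerate([10000, 9800, 9600, 9400, 9200, 9000, 8800], start=1)}
expected, ratio = actual_to_expected(policies, actual_deaths=534)
print(f"expected deaths: {expected:.2f}, mortality ratio: {ratio:.0%}")
```

Here 534 policies terminated by death against 356.12 expected, a mortality ratio of about 150%. Switching from select to ultimate rates after year 4 is what lets the expected basis reflect the wearing off of initial selection.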
Inasmuch as life insurance records classify the policyholder's job at the time of issue, those conducting the next (1937) study reviewed death claim papers to determine the proportions of policyholders who were in the same occupation at death as at issue (27). In both of these occupational mortality investigations, the tables used in calculating expected deaths were in select form for the entire period of the experience. Another study of occupational hazards followed in 1967; by extending the follow-up to a maximum of 15 years, it developed data on deferred occupational hazards (28).

Comprehensive mortality investigations of medical impairments were conducted in 1929, 1936, 1938, and 1951 (15, 16, 18, 29, 30) on the pattern of the Medico-Actuarial Mortality Investigation. Special studies of body build were done in 1931 (16) and 1959 (19) and of blood pressure in 1938 (17, 19) and 1959 (26, 28); these incorporated a number of methodological advances (14). The mortality ratios were accompanied by their probable deviations or confidence limits. The experience was developed separately for persons with no significant impairments, for those with minor ones, and for those with common combinations of impairments. Considerable effort was devoted to delineating the special characteristics of the populations under study, to ongoing changes in such populations, and to differences in the classification of heart and other diseases. Two new comprehensive studies of build and blood pressure, the 1979 Build Study and the 1979 Blood Pressure Study, were completed in 1979 (31, 32); about 4,250,000 policies were traced over the period 1954-73 for each.
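The text does not say how the probable deviations or confidence limits were computed. One common approach, shown here only as an assumed sketch, treats the actual number of deaths as a Poisson count and the expected number as fixed; "probable deviation" is taken in its classical sense of 0.6745 standard errors. The 150 actual and 100 expected deaths are hypothetical.

```python
import math


def mortality_ratio_limits(actual: int, expected: float, z: float = 1.96):
    """A/E ratio with approximate limits.

    Assumes actual deaths are Poisson and expected deaths are known exactly,
    and applies a normal approximation on the log scale:
    log(A/E) +/- z / sqrt(A).
    """
    ratio = actual / expected
    half_width = z / math.sqrt(actual)
    return ratio, ratio * math.exp(-half_width), ratio * math.exp(half_width)


def probable_deviation(actual: int, expected: float) -> float:
    """0.6745 times the approximate standard error of A/E, (A/E) / sqrt(A),
    under the same Poisson assumption."""
    return 0.6745 * (actual / expected) / math.sqrt(actual)


# Hypothetical impairment class: 150 actual vs 100 expected deaths.
ratio, lower, upper = mortality_ratio_limits(150, 100.0)
print(f"ratio {ratio:.0%}, approx. 95% limits {lower:.0%}-{upper:.0%}")
print(f"probable deviation: +/-{probable_deviation(150, 100.0):.0%}")
```

The limits widen as the number of actual deaths falls, which is one reason small impairment classes yield only rough mortality ratios. Exact Poisson limits could be substituted when deaths are few.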
Investigators explored the mortality associated with underweight to estimate the probable ranges of optimal weights and to differentiate between underweight as a symptom of underlying disease and as a normal characteristic (32), whereas in the investigation of blood pressure, the focus was on the evidence that treatment of hypertension had generally been highly effective. Both studies showed that one must begin with healthy populations to reach meaningful conclusions about the mortality associated with characteristics found mainly among ostensibly healthy people (33).

Pending completion at this time is another comprehensive medico-actuarial investigation of the experience on various physical impairments, including abuse of alcohol, based on 2,400,000 policies that were traced for the period from 1952 to 1976. Minor impairments and early stages of disease are being examined separately from the more serious impairments or later stages of disease. Special attention is being given to the effects of the decline in mortality during the study period (20).

With the sharp rise in the cost of medico-actuarial studies based on life insurance records, actuaries have turned to other sources of information bearing on mortality associated with various diseases and health hazards. Singer and Levinson (34) converted the quantitative findings of many clinical and other studies into life table analyses by age at entry and duration. Greater recourse to quantitative data from all sources is believed to hold much promise for better information. To this end, a sequel to that volume is currently in preparation and will draw primarily on medical and actuarial literature published since 1976.

REFERENCES

(1) BENJAMIN B: Actuarial methods of mortality analysis: Adaptation to changes in the age and cause pattern. Proc R Soc Lond [Biol] 159:38-65, 1963
(2) ELSTON JS: Sources and characteristics of the principal mortality tables. In Actuarial Studies No. 1 (2d ed). Philadelphia: Actuar Soc Am, 1932
(3) BENJAMIN B, POLLARD JH: The Analysis of Mortality and Other Actuarial Statistics. London: Heinemann, 1980
(4) BATTEN RW: Mortality Table Construction. Englewood Cliffs, N.J.: Prentice-Hall, 1978
(5) Office of Population Censuses and Surveys, Registrar General for England and Wales: Occupational Mortality 1970-72, Ser DS, No. 1. London: Govt Stat Serv, 1978
(6) BRAGG JM: Mortality differences and trends in the United States, taking account of color, sex and socioeconomic status. Trans 20th Int Cong Actuar 2:391-401, 1976
(7) FOX J, GOLDBLATT P: Socio-demographic differences in mortality. Popul Trends 27:8-13, 1982
(8) Committee on Ordinary Insurance and Annuities: Mortality under standard ordinary insurance issues between 1979 and 1980 anniversaries, Report No. 1. In 1981 Reports of Mortality and Morbidity Experience. Soc Actuar Trans 1:1-52, 1982, 1983
(9) Committee on Ordinary Insurance and Annuities: Group life insurance mortality. In 1980 Reports of Mortality and Morbidity Experience. Soc Actuar Trans 2:45-115, 1981, 1982
(10) LEW EA, GARFINKEL L: Mortality of ages 65 and older in a middle class population. Trans Soc Actuar. In press
(11) LEW EA: Insurance mortality investigation on physical impairments. Am J Public Health 44:641-654, 1954
(12) Committee on Ordinary Insurance and Annuities: 1965-1970 basic tables, Report No. 4. In 1973 Reports of Mortality and Morbidity Experience. Trans Soc Actuar 3:199-224, 1974
(13) SEAL HL: A statistical review of the evidence for the existence of temporary selection. J Inst Actuar 85:165-207, 1959
(14) LEW EA: Some observations on mortality studies. J Inst Actuar 104:221-225, 1977
(15) Joint Committee: Medical Impairment Study (1929). New York: Actuar Soc Am, Assoc Life Insur Med Directors Am, 1931
(16) Joint Committee: Supplement to Medical Impairment Study (1929). New York: Actuar Soc Am, Assoc Life Insur Med Directors Am, 1932
(17) Joint Committee: Blood Pressure Study (1939). New York: Actuar Soc Am, Assoc Life Insur Med Directors Am, 1940
(18) Committee on Mortality of the Society of Actuaries: Impairment Study, 1951. New York: Soc Actuar, 1954
(19) Society of Actuaries: Build and Blood Pressure Study (1959), vol I, II. Chicago: Soc Actuar, 1960
(20) Ad Hoc Committee: Impairment Study (1983). New York: Soc Actuar, Assoc Life Insur Med Directors Am, 1985
(21) LEW EA, SELZER F: Uses of life tables in public health. Milbank Mem Fund Q 7:15-36, 1970
(22) TENENBEIN A, VANDERHOOF IT: New mathematical laws of select and ultimate mortality. Trans Soc Actuar 32:119-184, 1980
(23) FOX AJ, GOLDBLATT PO, ADELSTEIN AM: Selection and mortality differentials. J Epidemiol Community Health 36:69-79, 1982
(24) GARRISON RJ, FEINLEIB M, CASTELLI WP, et al.: Cigarette smoking as a confounder of the relationship between relative weight and long-term mortality in the Framingham Study. JAMA 249:2199-2203, 1983
(25) Joint Committee: Medico-Actuarial Mortality Investigation, vol I-V. New York: Actuar Soc Am, Assoc Life Insur Med Directors Am, 1912-1914
(26) Joint Committee: Occupation Study (1926). New York: Actuar Soc Am, Assoc Life Insur Med Directors Am, 1926
(27) Joint Committee: Occupation Study (1937). New York: Actuar Soc Am, Assoc Life Insur Med Directors Am, 1937
(28) Committee on Mortality Under Ordinary Insurance and Annuities: Occupational Study (1967). Chicago: Soc Actuar, 1968
(29) Joint Committee: Impairment Study (1936). New York: Actuar Soc Am, Assoc Life Insur Med Directors Am, 1937
(30) Joint Committee: Impairment Study (1938). New York: Actuar Soc Am, Assoc Life Insur Med Directors Am, 1939
(31) Ad Hoc Committee on the New Build and Blood Pressure Study: Build Study (1979). Chicago: Soc Actuar, Assoc Life Insur Med Directors Am, 1980
(32) Ad Hoc Committee on the New Build and Blood Pressure Study: Blood Pressure Study (1979). Chicago: Soc Actuar, Assoc Life Insur Med Directors Am, 1980
(33) LEW EA, WILBER J: Supplementary observations on 1979 Build and Blood Pressure Studies. Rec Soc Actuar 8:1157-1163, 1982
(34) SINGER RB, LEVINSON L: Medical Risks: Patterns of Mortality and Survival. Lexington, Mass.: Lexington Books, 1976

Discussion I

A. Lilienfeld: In view of the fact that the previous presentations provided some historical background, I think I would be negligent not to mention that William Farr, the first vital statistician in England, played an edifying role not only in epidemiology but also in the development of life tables. In 1837, Farr used life tables to study daily survivorship for both smallpox and cholera over a period of weeks after the onset of disease. He also used the person-year concept in studying the mortality experience of mentally ill patients. He compared mortality rates of the mentally ill who were in asylums with those of patients cared for in what today we would call foster homes. Throughout the 19th century, various investigators used this concept to analyze mortality experience among the mentally ill in various mental institutions both in England and the United States. I think that during the 19th century, mortality experience expressed in the form of life tables was a continuously developing practice, which gradually became transformed into prospective or cohort studies. Farr suggested in 1837 that “tables of sickness for the entire population would be formed by taking 100,000 persons, of given ages, indiscriminately, and observing them for one, two, three, etc., years . . .” Thus the concept of a follow-up study was enunciated about 150 years ago. From the viewpoint of seeing relationships among actuarians, statisticians, and epidemiologists, we have ample historical precedents.
It was common in England for the medical and other journals of professional societies to publish the names of individuals who attended the annual banquets given by these societies. Membership lists of these societies were also published. For example, the Royal Statistical Society had an annual banquet, and the list of all those who attended was published. Other groups included the Institute of Actuaries in London, founded in 1837, at about the same time as the Statistical Society, and the London Epidemiological Society, founded in 1850. In the 1850s and 1860s, memberships in these professional organizations overlapped. William Farr attended and presented papers at meetings of the Statistical Society and the Institute of Actuaries, and also attended meetings of the London Epidemiological Society. A core of individuals with mutual interests was affiliated with all these societies. They had this type of relationship because the different professional groups were essentially dealing with similar concepts. Here today we have epidemiologists, actuarians, and statisticians discussing cohort studies in the same tradition established in England 130 years ago.

1 Conducted at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
2 Address reprint requests to Lawrence Garfinkel, Epidemiology and Statistics Department, American Cancer Society, 4 West 35th Street, New York, N.Y. 10001.

J. Higginson: I would like to ask a couple of questions that were raised in previous presentations. One is always interested in life tables and actuarial experiences, but the accuracy of the basic data is always troublesome. Someone mentioned that a variation of 25% in younger ages may mean nothing in relation to SMR.

R. B. Singer: A mortality ratio?

Higginson: Yes. Secondly, we have the statement that two-thirds of the average population in such studies is middle class.
In other words, we have 2 populations being used as a reference, i.e., the middle class life insurance population and the total general population. Furthermore, the experience of women has not been discussed, only that of men. The third point I wanted to make was that in those days you eliminated heart disease, active tuberculosis, etc., when you insured a person. Thus you increased population selection. What would happen if you did the same thing today, i.e., excluded smokers and persons with other habits, factors, or markers that affect insured life? Probably high blood pressure is the most important marker nowadays, because rheumatic fever and tuberculosis are disappearing. Or should we classify the reference population into smokers and nonsmokers and forget about other illnesses? Would smoking eliminate most social class and related changes for practical purposes?

E. A. Lew: You raised several questions. Let me begin with your last question. We now have available several studies of insured persons according to smoking habits. We also have a large body of data on this subject from the first Cancer Prevention Study, subdivided between nonsmokers and various classifications of smokers. Therefore, we can investigate in considerable detail how the mortality of smokers and nonsmokers is affected by various physical impairments and other factors. The Cancer Prevention Study I and several life insurance studies indicated that overweight, underweight, hypertension, and some of the indicators of heart disease have different effects on the mortality of smokers and nonsmokers. This is true in large part because the class of smokers usually includes more persons with other physical impairments than does the class of nonsmokers. Social class affects mortality independently of smoking habits. Both the Cancer Prevention Study I and the studies of insured people reflect the experience in middle class populations.
Therefore, it is not surprising that the death rates derived from these studies are similar. The biases in the studies of the insured arising from the selection of subjects for study are similar to those in the selection of subjects for the Cancer Prevention Study I.

Higginson: I think that insured individuals and the American Cancer Society's population are a selected group and not representative of the total population.

E. A. Lew: I agree. In the past, only those insured under the so-called “industrial policies” were reasonably representative of the general population. In recent years, industrial policyholders have been drawn largely from the lowest socioeconomic segments of the general population and hence could not be regarded as representative of the total population. Perhaps the persons covered by group life insurance in all industries combined belong to the most representative category of the insured. Sometimes, information on the experience of persons covered by group life insurance can be obtained by amount of insurance, which is a function of the insured person's income. From the experience by amount of income, we can estimate the mortality differentials between executive or administrative personnel and the lowest paid wage earners. In a typical insured group, the experience among the top administrative personnel might run 70% of the average mortality in the group, whereas that of the lowest paid wage earners might run 115% of the average. I believe that such mortality differentials would be even wider in the general population than among persons insured under group life insurance policies.

E. C. Hammond: In our experience with the Cancer Prevention Study, our study population was certainly weighted toward the middle class. It was by no means exclusively middle class, but only weighted in that direction. The criteria you use determine the weighting. If you take occupations in which an exposure to dust, fumes, chemicals, etc.
is involved, which by and large are those in the lower class groups, you would also include some chemists and radiologists. Another example is education. In this latter classification, we found that educational weighting was not as strong as it might be in reference to social class. A considerable weighting, however, was found in the variable “married” versus “single”; the weighting was in the direction of the married, which makes for a significant difference in death rates. The mortality rates in Cancer Prevention Study I and among insured persons, a comparison Mr. Edward A. Lew and I made many years ago, showed that the Cancer Prevention Study I population was close to ordinary life insurance policyholders. Mortality in early years was similar and increased in about the same way. As I recall it, the Cancer Prevention Study I showed higher death rates than those in the general population. However, I think the general population death rates are suspect, considering the estimates made by the New York Times of illegal immigrants not counted in the population. Generally speaking, such immigrants try to avoid the census taker, and for good reasons. They apparently have high death rates, and when they die, they appear only in the death counts. I cannot take the total population death rate, which is reported as being reasonably accurate, to be all that exact, because the numerators and denominators do not correspond. I think the difference between death rates in the Cancer Prevention Study I and those in the general population is not of the same order as the small difference between the Cancer Prevention Study I subjects and standard ordinary life insurance policyholders, although it is in that direction.

E. A. Lew: To elaborate on the points made by Dr.
Hammond, I would add that in effect we found the death rates in the Cancer Prevention Study were almost identical to those of standard ordinary life insurance policyholders whose policies had been issued more than 15 years earlier. This was not so, however, for policyholders whose policies had been issued within 5 years. Their mortality rates were lower than those of Cancer Prevention Study subjects in the first 5 years following enrollment. This last observation reflects the less stringent physical screening in the Cancer Prevention Study compared with the physical screening for life insurance. We excluded from the general body of cancer study subjects those individuals who reported a history of heart disease, cancer, stroke, or marked overweight, and the remainder were designated as “ostensibly healthy.” The death rates of these ostensibly healthy persons in the Cancer Prevention Study were almost identical to those of the insured under corresponding circumstances.

Hammond: Yes, they were almost identical.

I. Selikoff: Another approach to the question that Dr. Higginson raises is the one that Dr. Hammond and Mr. Seidman used in Cancer Prevention Study I. For analytic purposes, they established a subgroup of white males. These men were not farmers, were exposed to dust, fumes, vapors, radiation, and chemicals, had no more than a high school education, and had known smoking habits. Using the death rates of these men, Hammond and Seidman made smoking-specific comparisons. That is, mortality rates of these men were compared with those of various observed groups, with smoking categories taken into account. Before I ask a question, may I express appreciation for finally learning what the word “conservative” means, as it has been used today? Dr. Benjamin pointed out that conservative means data that would allow insurance companies to stay in business and make money.
I am not sure that this definition of conservative is an appropriate one from a public health point of view. I would like to ask Mr. Edward Lew about the importance of using group life insurance death rates rather than national mortality rates. Group insurance rates, as we know, tend to eliminate to a considerable extent the “healthy worker effect.” How far have we been going wrong in using national rates instead of group insurance or similar rates? Is this part of the reason for some complaints about the insufficient sensitivity of epidemiologic studies? Have we been wrong in the last 10 or 15 years? You make a substantial point, and I wonder whether we have been inadequate in our use of general population rates.

E. A. Lew: We have been wrong in the last 15 years by not using the group life insurance experience as a standard of expected mortality in occupational studies. We underestimated the extra mortality in occupational studies when general population death rates were used as the yardstick of normal mortality for gainfully employed persons. However, death rates among actively employed persons, such as the group life insurance experience, became available only about 15 years ago. About 30 years ago, the population covered by group life insurance was not large enough to be representative of the death rates among those gainfully employed. Beginning in the 1970s, life tables have been calculated for persons covered by group life insurance in all industries combined. These tables provide a far more adequate yardstick of the mortality to be expected among the gainfully employed. They exclude the experience of gainfully employed persons not covered by group life insurance. To this extent, they may underestimate the mortality among all gainfully employed persons, including those not covered by group life insurance.
The key issue is that, with advancing age, the general population includes an increasing proportion of persons who are unable or unwilling to find employment because of illness or inability to adapt to our industrial society. Such persons are demonstrably subject to much higher death rates. Hence, when general population death rates are used as the standard, especially past midlife, they seriously underestimate the excess death rates in occupations with significant hazards. The understatement may not make a great deal of difference at ages under 30, but when you get into the 50s it makes a sizable difference.

S. Jablon: I have been troubled by this recent discussion. I can understand why those who are interested in insurance are concerned only with whether somebody is alive or dead, because that determines the financial aspect. Those of us in epidemiology are, of course, concerned with whether somebody is alive or dead, but much more importantly, if he dies, we want to know the cause. We are, I think, aware of the importance of selective factors. Those of us who have studied them (and I guess that is all of us) are also aware that selective factors vary remarkably with respect to different causes of death. I know that a physical examination can affect the mortality rate for cardiovascular disease for 25 years thereafter, whereas for the cancers the effect of that screen lasts a much shorter time. You recommended the group life experience as a kind of universal solvent or universal control group, which would be nice if we could rely on it. I have two questions about that. Are the data adequate with respect to cause of death? Can the data be subdivided by geography? For example, we know that cancer death rates in Utah are considerably lower than they are in Louisiana. If we could use group life experience as a standardizing device, would we be able to learn to what extent socioeconomic circumstances are responsible for that kind of geographic difference?
These are examples of the type of information I think we would want and need.

E. A. Lew: First of all, I thoroughly agree with your general remarks. In the paper I presented, I quoted your finding, from the study of army personnel, that the effects of selection may extend over a long period. In a paper with Mr. Garfinkel on mortality in the Cancer Prevention Study, we make the point that selection or medical screening has little effect on cancer death rates, whether among the ostensibly healthy or among those in impaired health. On the other hand, selection produces radically different death rates from heart disease, initially much lower ones. No broad compilations of mortality by cause of death among those covered by group life insurance policies have been made. Of course, many large corporations whose personnel are covered by group life insurance do analyze their experience by cause of death, especially for selected plants where occupational hazards are believed to exist. In group life insurance, the experience for people insured for less than $25,000, for $25,000 to $50,000, and so on can frequently be analyzed separately. Thus the socioeconomic gradient in mortality may easily be pinpointed. Enough geographic data are available to confirm what you said about low death rates in Utah, Idaho, and the Northwest central states. I do not push the proposition that the group life insurance experience is a source of new information. I would put it forward, however, as a valuable yardstick of expected mortality for gainfully employed populations.

Higginson: I want to make one point regarding the concept of socioeconomic status being related to money only.
For example, in this country until recently, General Motors workers and airline hostesses had high incomes, whereas many schoolteachers received lower incomes than other groups regarded as being of similar socioeconomic status and life-style. Thus income alone may be an inadequate expression of socioeconomic status. The large low-smoking industries, such as refineries and petrochemical plants, have some of the best health experiences. They are not representative of other workers from the same socioeconomic background.

G. Comstock: With respect to the effect of socioeconomic status and smoking, we studied a large cohort in Washington County, Maryland, using education as a surrogate for socioeconomic status. Even after adjustment for smoking, the association of socioeconomic status with mortality persisted. Consequently, adjustment for smoking would not remove the necessity of investigating socioeconomic status in some way.

E. A. Lew: I can confirm the last comment on the basis of the mortality experience of the policyholders of Metropolitan Life Insurance Company, which was analyzed by both income and smoking habits. Although a definite interrelationship existed between these factors, each must be viewed primarily as an independent element of risk. One of the reasons why lower socioeconomic segments experience higher mortality is that more individuals at these socioeconomic levels smoke.

Higginson: I must protest. Thirty years ago, when I worked in Africa, stomach cancer was rarely seen, although the population was of low socioeconomic class. Conversely, lung cancer was common in the higher economic group because people in that group could afford to smoke; the poor could not. This is the opposite of the present experience in the United States. Thus comparing the socioeconomic status and life-style of an African population with those of a North American population is pointless.
Morris’ work in the United Kingdom, which shows marked differences in diet and many other factors, is worth remembering as reflecting life-style habits. However, in many studies, the economic factor is used as a parameter or a marker for other social factors.

E. A. Lew: I agree with you that income is not always a good indicator of socioeconomic status. I used it to illustrate a point when data on income were available. In my paper, I stated that the best indicator of socioeconomic status in the United States is education. Dr. Kitagawa demonstrated this clearly at the University of Chicago when she found that education was a more predictive variable than income.

Lilienfeld: I fail to understand why Dr. Higginson is worried about what kind of indicator is used for determining socioeconomic status. Everything depends on the specific question being asked in the study. Sometimes you want a surrogate that may be crude, and sometimes you want one that is refined. In fact, some of the studies done in this country have found a high degree of correlation, about 95%, among characteristics such as median rental, occupation, and education as indexes of socioeconomic status. Other studies have also found a high correlation. Obviously, in other countries you probably find different kinds of problems, depending on the various conditions encountered.

N. Mantel: I am happy to recognize how far ahead of the statisticians the actuarial scientists have been in this. If one were to read the statistical literature, one would think that life table methods began with Kaplan and Meier. All that Kaplan and Meier did was to replace the sensible discrete intervals that the actuaries used with continuous time, which was an obvious extension. Many of us thought of it before but did not bother with it. However, in classroom work, I have attempted to explain the life table method.
I find that I can do this without making use of the symbolism that ordinarily goes with actuarial computations. One has only to postulate the concept of the truncated distribution. Then one says that, for an individual lost to observation, the lifetime belongs to the truncated distribution to the right of the time of loss. Whatever was observed to the right of that loss time then represents a sample of what might have happened. Therefore, with the simple concept of the truncated distribution, one could explain life table methods. One could also use the truncated distribution to explain Goodman’s concept of quasi-independence. One could even use it to explain Bayesian analysis. Actually, the simple notion of truncated distributions is the unifying idea behind diverse statistical concepts.

Jablon: Gentlemen, I have to comment on that. Kaplan and Meier did not pretend to invent the life table. What they did do was produce the first sound formula for the variance of the life table estimate. Greenwood had produced a similar formula, but he made an error in his derivation. That was the fundamental importance of Kaplan and Meier.

Mantel: It is not that they claimed to have done anything new. Rather, statisticians treat the subject as though it began with Kaplan and Meier.

R. Peto: Earlier it was mentioned that the published death rates by age would be substantially in error because illegal aliens and certain other unregistered groups are not included in the population counts. Obviously, in theory one could take a random sample of deaths and try to find out whether they had been listed in the census, but I do not know if that would be practical. Is there an estimate of the extent to which such an error is a problem? Is one talking about 1%, 10%, or what? Is it an order of magnitude, or does it vary? Is it perhaps as high as 20%?
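Mantel contrasts the actuaries' discrete intervals with Kaplan and Meier's continuous time. That contrast can be made concrete with a minimal Python sketch of both estimators. The sketch makes several assumptions that are illustrative only and not the method of any study discussed here:

- Each subject has a follow-up time and a flag that is True for death and False for loss to observation.
- The function names are hypothetical.
- Subjects lost during an interval count as exposed for half of it.
- The toy data are invented.

In the product-limit function, a subject lost at time t stays in the risk set through t and contributes nothing afterward. This is Mantel's truncated-distribution idea: later survival is estimated from those still observed.

```python
"""Sketch: actuarial (interval) life table vs. product-limit estimate.

Illustrative assumptions: each subject has a follow-up time and a flag
that is True for death and False for loss to observation.
"""


def actuarial_life_table(times, died, width):
    """Cumulative survival at the end of each interval of length `width`.

    Subjects lost during an interval are counted as exposed for half of it
    (the conventional actuarial adjustment, assumed here).
    """
    n_intervals = int(max(times) // width) + 1
    at_risk = len(times)
    surv = 1.0
    table = []
    for k in range(n_intervals):
        lo, hi = k * width, (k + 1) * width
        deaths = sum(1 for t, d in zip(times, died) if lo <= t < hi and d)
        lost = sum(1 for t, d in zip(times, died) if lo <= t < hi and not d)
        exposed = at_risk - lost / 2
        q = deaths / exposed if exposed > 0 else 0.0  # interval death probability
        surv *= 1 - q
        table.append((hi, surv))
        at_risk -= deaths + lost
    return table


def product_limit(times, died):
    """Product-limit (Kaplan-Meier) estimate: one factor per distinct death time.

    A subject lost at time t is still counted at risk for deaths at t,
    then removed, so later factors use only subjects still under observation.
    """
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, d in zip(times, died) if ti == t and d)
        lost = sum(1 for ti, d in zip(times, died) if ti == t and not d)
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= deaths + lost
    return curve


if __name__ == "__main__":
    # Hypothetical toy cohort: six subjects, two lost to follow-up.
    times = [1, 2, 2, 3, 4, 5]
    died = [True, False, True, True, False, True]
    print(actuarial_life_table(times, died, width=2))
    print(product_limit(times, died))
```

On this toy cohort both methods give 5/6 after the first death. After the deaths at times 2 and 3, however, they diverge:

- The interval method gives 25/54 (about 0.46) at time 4.
- The product-limit estimate gives 4/9 (about 0.44).

The gap comes from the half-interval treatment of the subject lost at time 2 and from grouping two deaths into one interval.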
Selikoff: Let us take Newark, New Jersey, as an example; one estimate would put the error in rates there as high as 20%. Death data are not reported by census tract in New York City, which means errors in census tract comparisons could be high. Dr. Hammond, I know, believes that 13% is by no means out of line.

Peto: Is this true exclusively for nonwhites, or do the errors also pertain to other segments of the population?

Hammond: Of course, it is difficult for anyone to count the number of illegal immigrants, precisely because they are illegal. That is the major reason for such a wide range here. Some people estimate the underreporting to be as low as 5 or 6%, while others say it could be as high as 30-50%. As a matter of fact, several years ago a New York Times investigative reporter thought that more than one-half of the total population of Newark was not being counted. The underreporting is largely, but not entirely, restricted to immigrants from the Caribbean and South America, i.e., persons referred to as either blacks or Hispanics. I grant you it would be interesting to know the amount of underreporting for the country as a whole, but I have my doubts about how this could be accomplished. Many studies have been made in particular areas of the country. In these situations, it is important that allowance be made for the underreporting in that area. I think it would be impossible to estimate the proportion accurately for the entire country. Many studies have been conducted in New Jersey because of the purportedly high occupational risks there. When we studied various issues in that state, we tried to use rates for Newark or for northern New Jersey as control rates for our cases. This obviously presented a real problem, particularly because we did not know the magnitude of the underreporting. Should we adjust the rates by 5 or 50%?

Peto: What is a census undercount?
This is only a small percentage of whites, but in blacks, as you say, it can be of the order of a 20% difference in particular age groups. I do not understand the American population estimates because of the confusion between the estimates corrected for census undercounts and the uncorrected estimates. Within one census, the correction for census undercounts varies grossly from one 5-year age group to the next. Then if you take the next census, the correction also varies grossly from one age group to the next, but not in the same way. What is the best estimate of what is sensible here for the nonwhites? Would you say 10% or 20%?

Lilienfeld: I do not know the details, but the estimate of the undercount comes from a postcensus enumeration survey, which the Bureau of the Census conducts on a sample of the population. On the basis of this enumeration survey, they first estimate the undercount in the regular census and then make the necessary corrections.

Peto: Is there any way that the fact of death could be linked back to a census population registry? I do not know what kind of linkage might be possible. Suppose you could say for each death whether it was the death of a person enumerated in the present census. Then you could get some idea of the proportion of deaths among those not enumerated. This answer would not be complete, but you would have some idea of the magnitude of the problem.

Lilienfeld: I think that solution would not be possible because the detailed census information is confidential. However, it was done in the preparation of the American Public Health Association Monograph Series, edited by Mortimer Spiegelman. Kitagawa selected deaths occurring during 3 months, related them back individually to the census returns, and analyzed the relationship of selected socioeconomic factors to mortality.
Later, a system was developed for linking death certificates to census returns. Whether this could be done again is problematic.

S. Stellman: I would like to amplify a question that Dr. Selikoff raised. During the past 10 years or so, we have seen a burgeoning of cohort studies by investigators whose analyses were done using a small number of readily available packaged computer programs. These programs have been developed largely at Government expense. Using them spares epidemiologists the time and expense of development and programming, and the documentation is reasonably good. Their built-in features permit a variety of the exploratory analyses epidemiologists are likely to want. Well-documented statistical tests can be made, and above all, they are easy to use. Thus the epidemiologist can spend more time thinking about data and is less entangled in the problems of programming and debugging. On the other hand, all such programs need a set of standard population mortality or incidence rates. Most use a subset of United States vital statistics, commonly stratified by age, sex, race, and calendar year. Such a set is built into the Monson program and others. These packaged programs usually permit the investigator to substitute some other set of rates. However, few authors have made use of this option, because the expense and effort required to construct an alternative set and incorporate it into existing programs written by others are not small. Furthermore, alternative standard rates suitable for an individual study are not readily located.
Thus we are seeing an entire generation of graduate epidemiologists brought up to believe that use of these packaged programs, with their built-in standards, is perfectly acceptable, particularly for occupational cohort studies. Yet in many if not most studies, death rates for the United States population at large are completely inappropriate. These rates are becoming the de facto standard, despite both a powerful healthy worker effect and regional differences in incidence and death rates.

Hammond: I fail to see any proper comparison or control group. Any group you could name has its inherent disadvantages. I think the only answer and the best approach is to use a variety of control groups to get some idea of the range of effects. One could use standard ordinary life rates, group insurance rates, total country rates, state rates, etc., depending on the area being studied and the population at risk. Once you determine the differences, you can make adjustments accordingly. You cannot get anything much better than that. As for obtaining such rates on computer tape, many such sets of data are already available on tape; Mr. Garfinkel has some in his office, as a matter of fact. Do you not see that all these questions and potential solutions depend on the nature of your study? I think there is no one perfect answer, because each study has its own set of problems. Each requires a little bit of thinking, which is harder than having access to a lot of data on tape.

Selikoff: I have a question for Dr. Comstock. In your discussion of life tables and their use in cohort studies, I did not hear that it was necessary to establish what happens to individuals in the cohort after enrollment; that may be an important factor. For example, if study participants stop smoking cigarettes, what happens to them afterward is considerably different from what happens to nonsmokers.
In all the life tables that we discussed, we have not mentioned the necessity of tracing individuals and making repeated observations to see what happens after enrollment. Secondly, Dr. Comstock, you have correctly shown that the social changes reflected in the use of year-of-birth cohorts can be important. In occupational studies, however, we use year-of-birth cohorts frequently. Do you consider it useful or important to use a year-of-birth cohort in studies of that type?

Comstock: I do not have a general answer. I think it depends on the particular question you are asking.

Peto: Can I give a partial answer? What happens in the first 5 years of any industrial cohort study is of no interest. Those first years are not informative because, even if a carcinogen or some other danger is present, its effects would not yet be measurable. Thus you could discount, or forget about, the first 5 years of employment.

SESSION II
Chronic Disease Studies: Nonoccupational Cohorts
Chairman: Abraham Lilienfeld
Co-Chairman: Ernst Wynder

Chairman’s Remarks¹
Abraham Lilienfeld²,³

I want to express my personal appreciation to the committee for organizing this Workshop in honor of Dr. E. Cuyler Hammond and for inviting me to participate in it. I was pleased to do so because my personal relationship with him has extended over a period of 20 or 30 years. His contributions to epidemiology have been extensive, as you all know. I became acquainted with Dr. Hammond when I was at Roswell Park Memorial Institute, and we were both involved in the cigarette smoking and lung cancer debate. Then I became involved in various committees of the American Cancer Society. In fact, Harold Dorn, Mort Levin, and I were advisors to the first Cancer Prevention Study, which will be the subject of one of the presentations in this Workshop. Dr. Hammond's contributions to epidemiologic methodology are not limited to the individual studies he has done; he has also had a stimulating influence on a great many people in this country.
I am indeed pleased that we are honoring him in this way.

We shall discuss chronic disease and nonoccupational cohort studies. Back in the 1830s or so, Pierre Charles Alexandre Louis proposed that one should do a follow-up study of a group of individuals to evaluate a hypothesis. Louis popularized what is known as “la méthode numérique,” or the “numerical method.” He emphasized the need for quantification in clinical medicine as well as in studying disease in populations. He essentially suggested a prospective study as a means of determining whether tuberculosis is inherited. He stated:

“The tenth part of the subjects who fell under my observation were born of parents, either father or mother, who according to all appearances, had died of phthisis. But, as this disease might have been transmitted in these cases, or have been developed independently of such influence, and as I knew nothing of the manner of death of the brothers and sisters of these patients, it follows in reality that I have observed nothing decisive in favor of the hereditary character of phthisis. I may remark that the proportion of phthisical patients born of parents who died of tuberculosis, is probably below the truth in my notes; inasmuch as it is far from being always possible to ascertain from hospital patients the nature of the affliction to which their parents fell victim. But it is obvious that in order to substantiate the exact amount of hereditary influence, it would be necessary to draw up tables of mortality, by means of which we should have the power of comparing an equal number of subjects born of parents who were phthisical and who were not so.”

However, he did not conduct such a study. A longer period had to elapse before such studies were done.

Earlier we heard about the actuarial influence on epidemiology. Before epidemiologists became involved in prospective studies, sociologists were conducting such research. Back in the early 1920s, F. Stuart Chapin, professor of sociology at the University of Minnesota, and other sociologists were conducting what they called “projective studies,” in which they followed groups of individuals. These studies dealt with juvenile delinquency and similar sociologic problems. How does this relate to epidemiology? I believe this kind of thinking probably entered the disease area because several individuals with sociologic backgrounds became interested in epidemiology, including Edgar Sydenstricker, Harold Dorn, and others.

We will now hear reviews of 4 long-term studies that stand as models upon which other studies have been based. I understand that these presentations are not meant to report results, which many of us know. Instead, they are to emphasize the methodologic aspects, such as the methods of selection, follow-up, and analysis, and the problems associated with these methods.

¹ Presented at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
² Department of Epidemiology, School of Hygiene and Public Health, The Johns Hopkins University, Baltimore, Maryland 21205.
³ Dr. Lilienfeld died in Baltimore on August 6, 1984.

Co-Chairman’s Remarks¹
Ernst Wynder²

Longitudinal cohort studies have played an important role in identifying the noninfectious factors that contribute to human disease. Most of the clues that led to these cohort studies were generated by retrospective case-control studies.
Thus it is well recognized that the prospective studies by Hammond and Horn, and those of Doll and Hill, would never have been conducted if retrospective studies had not already shown a high correlation between cigarette smoking and lung cancer, other types of cancer, and cardiovascular disease.

We may ask what additional information these longitudinal studies have contributed beyond providing absolute, rather than relative, risks. Of course, establishing a relationship through both of these different approaches provides even firmer evidence of an association. In certain cases, risk factors can most appropriately be determined by longitudinal studies. Lipid and blood pressure levels, for example, were established as risk factors for coronary artery disease in this way, inasmuch as both values could be affected by a heart attack and could not be measured in the event of sudden death. Cohort trials would also provide the best means of examining the possible benefit of mammography in improving breast cancer survival. Where indicated, recognizing the relative cost-effectiveness and limitations of each approach, we need to conduct both case-control and cohort studies.

Pasteur stated that a scientist needs not only to discover but also to become involved in the application of the discovery. Today's epidemiologists should become increasingly involved in the application of their studies. After all, the basic goals of epidemiology and prevention are the reduction of risk factors and ultimately the decline of disease. Those who have participated in the smoking and health issue recognize that the application of findings may have proved to be more difficult than the discovery itself.

To accomplish such application, we suggest establishing more centers for disease prevention. The American Cancer Society’s support of Special Centers for Cancer Cause and Prevention is certainly a step in the right direction. It is to be hoped that the National Cancer Institute will also support the establishment of such centers. To be fully effective, Cancer Prevention Centers require a critical mass of personnel: epidemiologists, chemists, biologists, psychologists, health educators, and experts in health promotion. We can take satisfaction in the significant decline of cigarette smoking that has occurred, particularly in special population groups, a decline followed by a reduction in tobacco-related disease. We can also take credit for a reduction of average serum cholesterol levels in the American population; undoubtedly, this is one factor that contributed to the reduction of death from coronary artery disease in our country. It is now well established that risk factors, such as serum cholesterol and blood pressure, tend to be set early in life. Greater stress must therefore be placed on the prevention of risk in our children. It is also well known that the smoking habit and obesity have their inception in childhood. The “Know Your Body” School Health Education Program of the American Health Foundation is designed to reduce these risk factors in our young people.

We have labeled one phase of epidemiology “applied epidemiology,” and others have labeled it “operational epidemiology.” This phase works to reduce risk factors, to determine what is effective, and, through retrospective and prospective studies, to bring about the decline of disease. This decline is obviously the basic goal of epidemiology in preventive medicine and, for that matter, in medicine itself.

¹ Presented at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
² American Health Foundation, 320 East 43d Street, New York, N.Y. 10017.

Selection, Follow-up, and Analysis in the American Cancer Society Prospective Studies¹
Lawrence Garfinkel²
ABSTRACT: The organization and selection characteristics of the American Cancer Society’s prospective studies are reviewed, and problems connected with the follow-up procedures are discussed. Also included are descriptions of some of the features of analysis in cohort studies. Natl Cancer Inst Monogr 67: 49-52, 1985.

The American Cancer Society prospective epidemiologic study, CPS I, was begun in October 1959 and continued through October 1972. It was designed and directed by Dr. E. Cuyler Hammond. A unique feature of this study was the use of American Cancer Society volunteers to obtain the questionnaire data. This was the same design used in the Society’s study of smoking habits conducted in 1952-55. The 4-page confidential questionnaire contained data on numerous factors suspected of being related to cancer. These included family history; surgical operations; habits, such as smoking, drinking, and diet; use of drugs and medicines; occupations; and occupational exposures.

The volunteers were organized in the pyramid structure used in fundraising activities. A volunteer chairman was appointed in each county unit and, in turn, recruited group leaders (usually from a club or an organization). Each group leader was responsible for enlisting 5-10 volunteer researchers, who were given the task of distributing and collecting the questionnaires. The volunteer researchers were asked to contact friends, relatives, and neighbors (people they knew well) and others with whom they would expect to be in contact over the course of the study. Each year, for 6 years, the researchers were asked:

- to report the vital status of the persons contacted (alive or dead);
- to record the date and place of death of those who died; and
- to note changes of address or changes of name through marriage or divorce.

Every other year (in 1961, 1963, and 1965), they again contacted the subjects to ask them to complete a short questionnaire.
These questionnaires served 2 purposes: 1) they obtained additional information, including changes in smoking habits, and 2) they provided an independent check on the accuracy of the volunteers’ reports on status.

ABBREVIATION: CPS I (II)=Cancer Prevention Study I (II).
¹ Presented at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
² Epidemiology and Statistics Department, American Cancer Society, 4 West 35th Street, New York, N.Y. 10001.
³ These were Arizona, California, Florida, Georgia, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missouri, New York, North Carolina, Ohio, Oregon, Pennsylvania, South Carolina, Tennessee, Texas, and Virginia.

We then secured death certificates from state departments of health for those reported dead. In addition, for the first 6 years of the study, the physicians who certified the deaths were contacted for all cancer deaths and asked to supply additional information about the bases of the diagnoses and to certify the primary sites of the cancers.

SELECTION OF SUBJECTS

Volunteers were recruited from all social classes so that we would get a broad spectrum of replies. The researchers tended to recruit subjects in the same socioeconomic class as themselves. We did not recruit illiterates, institutionalized populations, itinerant workers, or illegal aliens. We asked researchers to avoid recruiting people who tend to move often (military personnel, construction workers, oil field workers, etc.) because they might be difficult to trace. Enrollment in the study was conducted through divisions of the Society (a division usually corresponds to a state).
The divisions selected to participate in the study had to meet several criteria: 1) be large enough to yield a large number of replies; 2) have a well-established volunteer organization; and 3) have a state health department that would cooperate with us and furnish death certificates. The divisions of the Society have autonomous boards of directors, which had to vote approval for the division’s participation in the study. Each division was given a quota based on population and an assessment of the strength of its volunteer organization. The division leaders based their decisions on which county units would participate on such factors as degree of urbanization, extent of volunteer organization, and availability of volunteer leadership. Some decided that all of their organized units would participate, others restricted participation to the more urbanized units, and some believed they could recruit volunteers more readily in small towns and rural areas.

DESCRIPTION OF POPULATION

Twenty-five states (29 American Cancer Society divisions) were included in the study population,³ with only a few from the Southwest and the Rocky Mountain States. Recruitment was fairly evenly divided among large cities, small cities and suburbs, small towns, and rural areas. Overall, we recruited about 3% of the population over the age of 45 in the 1,121 counties in which subjects were enrolled. However, recruitment was less successful in large cities; many subjects living in the inner-city areas of our larger cities were not enrolled. Generally, the educational level was much higher than that of the country as a whole. We found that 37.6% of the males and 36.1% of the females had some college education or were college graduates. Because enrollment was by family groups, 78% of the enrollees were married, 6% single, 14% widowed, and 2% divorced or separated. More than 97% were white, 2.2% black, and less than 1% Oriental.
The rule for enrollment was that at least 1 member of the household had to be over the age of 45; every member of such a household over the age of 30 was asked to complete a questionnaire. This resulted in an age distribution in which the largest class was 45-49 years old (21% of subjects), after which the percentage in each group declined rapidly. Only 14% of the subjects were under 45.

FOLLOW-UP METHODS

In the general plan of the study, the volunteer researchers were to report annually on the status (alive or dead) of the subjects they enrolled. Because the instructions were to enroll people they knew well, this would seem to have been an easy task, and indeed it was for most researchers. However, some researchers moved away or died in subsequent years, many of the subjects moved (an estimated 20% moved to a new address in the first 2 yr of the study), and some researchers did not follow instructions and enrolled some subjects they did not know well. Therefore, at the start of the study each researcher was requested to provide a substitute researcher who knew the same people and could report on them if the researcher became unavailable. In some instances, the group leader was the substitute.

As support for the volunteer organization, the Society staff in local units and branches had the responsibility of ascertaining that follow-up forms were delivered and returned promptly, asking a substitute researcher to participate if the researcher was not available, and assisting the group leader in finding a new volunteer if neither researcher was available. If the volunteer could not trace some of the persons on the list, staff personnel were responsible for the tracing. After follow-up forms were sent to the national office, a reminder to trace all subjects checked “unknown” was sent to the units. Unit volunteers were urged to make several attempts to trace the missing subjects before a report of “not traced” was accepted.
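The CPS I enrollment rule (a household qualifies if at least 1 member is over 45, and every member of a qualifying household over 30 is asked to complete a questionnaire) can be written as a small filter. This is a minimal sketch: the function name is hypothetical, and reading "over 45" as 45 or older follows the wording later used for CPS II and is an assumption.

```python
def members_to_enroll(ages, index_age=45, min_age=30):
    """Return ages of household members who would be asked to fill in a
    questionnaire under the enrollment rule (illustrative sketch)."""
    # A household qualifies only if at least 1 member has reached index_age
    # (assumption: "over 45" read as 45 or older, as in the CPS II wording).
    if not any(age >= index_age for age in ages):
        return []
    # In a qualifying household, every member over min_age is asked.
    return [age for age in ages if age > min_age]


print(members_to_enroll([50, 47, 32, 28, 12]))  # [50, 47, 32]
print(members_to_enroll([40, 35]))              # []
```

Keeping the two thresholds as parameters makes it easy to express either wording of the rule.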
In a few instances, the administrative structure in a county unit broke down because a unit chairman or a unit staff executive was not available. In these instances, teams of national office specialists were sent to the unit to recruit new volunteer leaders, call researchers, and sometimes do the tracing of subjects.

The unit staffs prepared a card for each person reported to be dead, complete with identification numbers, name, address, and date and place of death. For those reported to have died in other states or out of the country, they sent the cards to the respective state health departments to obtain the death certificates. Although most of these cards were filled out accurately, for many subjects the reported date of death differed from the actual date by several days, months, or even years. Such reports were returned to county units for verification. If no further information could be found, a considerable amount of searching at the bureau of vital statistics had to be done, much of it by national staff field personnel. In some instances, names on the questionnaires were not the same as those on the death certificates, which required considerable investigation. When the person who died had a common name, the date and place of birth, residence, informant’s name, and other vital data on the death certificate had to be double-checked against the data on the original questionnaire to verify that the death certificate was correct. These discrepancies and incomplete reports were numerous enough to make a computer check an unlikely method of obtaining all the death certificates. Some persons were reported to have died within the state in which they were enrolled but actually died in a neighboring state. Some of the ages were inaccurate by as many as 20 years in comparison to the ages indicated on the questionnaire.
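The double-checking of certificates for subjects with common names amounts to requiring several identifying items to agree before a certificate is accepted. The sketch below illustrates that idea only; the class, the field names, and the "name plus 2 other agreeing items" threshold are hypothetical. The staff did this work by hand, and because discrepancies were frequent, a rule like this could at most sort candidates for manual review.

```python
from dataclasses import dataclass


@dataclass
class IdentifyingData:
    """Identifying items available on a questionnaire or death certificate."""
    name: str
    birth_year: int
    birth_place: str
    residence: str


def certificate_matches(questionnaire, certificate, min_agreements=2):
    """Accept a candidate death certificate only if the names agree and at
    least `min_agreements` other identifying items also agree; anything
    else is left for manual investigation (illustrative sketch)."""
    if questionnaire.name.strip().lower() != certificate.name.strip().lower():
        return False  # name discrepancies need manual investigation
    agreements = sum([
        questionnaire.birth_year == certificate.birth_year,
        questionnaire.birth_place.strip().lower() == certificate.birth_place.strip().lower(),
        questionnaire.residence.strip().lower() == certificate.residence.strip().lower(),
    ])
    return agreements >= min_agreements


subject = IdentifyingData("John Smith", 1900, "Ohio", "Dayton")
print(certificate_matches(subject, IdentifyingData("John Smith", 1900, "Ohio", "Columbus")))  # True
print(certificate_matches(subject, IdentifyingData("John Smith", 1880, "Texas", "Dayton")))   # False
```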
In a sample of about 20,000 death certificates, 92.3% of the participants died within the state of enrollment, 7.5% in another state, and 0.2% out of the country.

In addition to the annual tracing by the researchers, every other year (1961, 1963, and 1965) a supplemental questionnaire was prepared for each participant and sent to the researchers for distribution. This questionnaire (printed on an International Business Machines Corp. card) asked about hospitalizations since the last questioning, diagnosis of cancer, changes in residence, and current smoking habits. Besides providing relevant and pertinent data, the questionnaires served as an independent check on the researchers. We found some instances in which a subject was checked “alive” or “unknown” on the follow-up form, but a member of the subject’s family returned the questionnaire card with a note stating that the subject was dead. More than 95% of survivors returned completed questionnaire cards. In addition to this independent check, we matched our entire name file in 1 state against all the deaths recorded there over a period of 4 years and found no deaths previously unknown to us in the follow-up.

FOLLOW-UP IN 1971-72

The subjects were followed annually for 6 years, and more than 98% were successfully traced. According to the original design of the study, follow-up stopped after 1965. In 1971, we decided to renew the follow-up for 2 major reasons: 1) Interest in low-tar, low-nicotine cigarettes was great, and we needed to determine the effect that smoking such cigarettes had on the mortality rate; and 2) too few deaths were recorded from some of the less common sites of cancer during the first 6 years of follow-up, and additional follow-up was needed if we were to relate these sites to etiologic factors. In the spring of 1971, a feasibility study began.
National staff members, assigned to trace 2,000 persons still alive in 1 county in California, found that they were able to trace 95% of the surviving subjects with little difficulty. This success was encouraging, and we asked participating divisions if they would cooperate in additional follow-up. Three of the original 29 divisions in the study decided not to participate in the new follow-up, but the others agreed to continue. The follow-up began in the fall of 1971, and we repeated this renewed contact with an additional questionnaire in the fall of 1972. The follow-up proved successful, with 98.4% traced through September 1971 and 92.8% through September 1972. Our requesting the volunteers to distribute the questionnaire in 1972 slowed down the follow-up process. Because some units had administrative problems, we sent teams of national office follow-up workers to complete the job. Toward the final stages, the staff did not attempt to get questionnaires completed but tried only to ascertain whether the subjects were alive or dead. As a result, only 75% of the survivors returned the 1972 questionnaire. The follow-up was terminated because tracing became increasingly difficult owing to the death or relocation of researchers and substitutes.

FOLLOW-UP OF SELECT ELDERLY PERSONS

In 1976, we decided to continue to trace a small segment of the population so that we could analyze mortality rates for cancer and other diseases in persons who have lived a long time. We selected those born in 1887 and earlier. Of 51,438 such persons at the start of the study, 14,000 were still reported alive at the completion of the last follow-up, conducted in 1972. Follow-up by national office field personnel began in the fall of 1976 and was repeated at 2-year intervals in 1978, 1980, and 1982.
By now (October 1983) only 1,422 subjects are still alive, and only 69 have not been traced. Death certificates have been obtained for all subjects reported dead. (Those still alive when traced were at least 94 years old if male and 96 years old if female.) We plan to continue tracing these subjects biennially.

ANALYSIS PROBLEMS

A cohort study diminishes some of the analytic problems inherent in case-control studies, one of which is the choice of an appropriate control group. Selection of appropriate hospital controls or matched neighborhood controls is not required in a prospective study. Another problem is recall of the frequency of habits; in case-control studies, subjects are asked to recall their habits (smoking, dietary, etc.). Some of the same recall problems exist in prospective studies, but exposure to risk is usually dated from the start of the study, i.e., the time the questionnaire is completed. In a cohort study, one can deal with potential confounding factors directly by examining their effects and making adjustments, if necessary. Also, one can examine the consistency of findings by age and sex groups and need not adjust for age as a confounder, as is done in some case-control studies. Moreover, synergistic effects of 2 or more variables can be investigated directly with large numbers.

One of the useful elements of a prospective study is that the population can be defined in various ways depending on the hypothesis being tested. For a study of coronary heart disease, persons with a history of heart and vascular diseases may be excluded, but they may be included in a study of cancer death rates.
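The two points above, that exposure is dated from the completion of the questionnaire and that the cohort can be restricted to suit the hypothesis, both show up in the basic cohort calculation of a death rate per person-years of follow-up. The sketch below is illustrative only: the record layout (`entry`, `death`, `heart_disease`) and the dates are hypothetical, it computes crude (not age-adjusted) rates, and it ignores losses to follow-up.

```python
from datetime import date


def death_rate_per_100k(subjects, end_of_follow_up, exclude=None):
    """Crude death rate per 100,000 person-years in a cohort.

    Each subject is a dict with 'entry' (date the questionnaire was
    completed), 'death' (date of death or None), and any baseline items
    used by `exclude`. Follow-up runs from entry to death or to
    end_of_follow_up, whichever comes first.
    """
    person_years = 0.0
    deaths = 0
    for s in subjects:
        if exclude is not None and exclude(s):
            continue  # restrict the cohort to suit the hypothesis being tested
        died = s["death"] is not None and s["death"] <= end_of_follow_up
        exit_date = s["death"] if died else end_of_follow_up
        person_years += (exit_date - s["entry"]).days / 365.25
        if died:
            deaths += 1
    return 100_000 * deaths / person_years


cohort = [  # hypothetical subjects
    {"entry": date(1960, 1, 1), "death": None, "heart_disease": False},
    {"entry": date(1960, 1, 1), "death": date(1962, 1, 1), "heart_disease": False},
    {"entry": date(1960, 1, 1), "death": date(1961, 1, 1), "heart_disease": True},
]
end = date(1966, 1, 1)
all_rate = death_rate_per_100k(cohort, end)
# e.g., for a coronary heart disease analysis, exclude prior heart disease
restricted_rate = death_rate_per_100k(cohort, end, exclude=lambda s: s["heart_disease"])
print(f"{all_rate:.0f} vs {restricted_rate:.0f} per 100,000 person-years")
```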
We found that people who said on their questionnaire that they were “sick at present” or had a history of serious disease, such as heart disease, stroke, or high blood pressure, had death rates three times as high as persons who reported none of these, so we controlled for these conditions in a number of different analyses.

One of the most satisfying methods of controlling confounders is matched-groups analysis. Only in a study as large as CPS I is it possible to control for 10 or 15 factors at the same time.

We are sometimes asked how one can draw conclusions from a population that is not a random sample of the United States population. It is true that CPS I and II are weighted toward more highly educated groups compared with the population at large and have more native-born persons, fewer blacks, and fewer Hispanics than the United States population. However, in analyses of mortality rates in relation to various life-style and exposure factors, these selective factors carry little weight because comparisons are made internally. Rates for various segments of the study population can be adjusted to the total United States population.

CANCER PREVENTION STUDY II

In September 1982, the American Cancer Society launched a new prospective study, CPS II, prompted by the new factors in our life-styles and environment that have come to our attention since the first study began. A questionnaire was designed with the help of about 50 consultants and pretested by 200 volunteer researchers in 10 areas of the country in trials that included 3,000 questionnaires and interviews with volunteer participants. After changes in wording, it was tested again in 3 other areas with 600 subjects.

The study was conducted in all 50 states (58 divisions of the American Cancer Society). The plan and the organizational structure were the same as in CPS I.
Starting in September 1982, 75,000 volunteer researchers enrolled 1.2 million persons and asked them to complete a confidential 4-page questionnaire on their life-styles and exposures. Most of the questionnaires were completed in September and October, and virtually all were completed by the end of November. This contrasted with CPS I, in which enrollment took place over a 6-month period.

These data are currently being processed, and a file of the first 157,900 questionnaires received, or about 13% of the total, has been prepared. Questionnaires from 48 of the 58 divisions are included. The instructions for enrollment were the same as in CPS I: Enroll families in which at least 1 person was 45 or older, ask each member over age 30 to complete the questionnaire, and try to enroll older people, if possible. The researchers apparently followed instructions. Figure 1 compares the age distribution of men and women in the sample population in CPS I and II. The percent of men and women aged 40-49 dropped by about 11%, but the percent of those aged 60 and over increased by 8%. The subjects are much more highly educated than the total population: 32% of the men and 23% of the women have had a college education, whereas 19% of men and 15% of women have not completed high school. A special attempt was made to enroll minority groups; in this sample, 5% were black and less than 1% Hispanic.

[FIGURE 1.—Age distribution of males and females in CPS II (1982) vs. CPS I (1959). Age groups: <40, 40-49, 50-59, 60-69, 70+. Number of subjects: males, 440,558 (CPS I) and 68,290 (CPS II); females, 562,671 (CPS I) and 89,646 (CPS II).]

Follow-up will be conducted every 2 years to save costs, the first to begin in October 1984.

SUMMARY

The American Cancer Society prospective studies, designed and initiated by Dr. E. Cuyler Hammond, have made important contributions to our knowledge of epidemiology as related to chronic disease. More than 80 papers have been published based on the data collected in CPS I. They include some of the most important contributions to the epidemiologic literature on smoking and health. The data base has also been used for constructing control groups for studies of occupational exposure of asbestos workers, Seventh-Day Adventists, and other groups.

Selection, Follow-up, and Analysis in the Atomic Bomb Casualty Commission Study¹,²
Seymour Jablon³

ABSTRACT—More is known about ionizing radiation as a cause of human cancer than about any other carcinogen. Most of this knowledge is derived from the studies conducted by the Atomic Bomb Casualty Commission and Radiation Effects Research Foundation on about 100,000 Japanese survivors of the atomic bombing in 1945. The importance of these studies is based on the large size of the exposed population and the fact that individual estimates of radiation dose were possible. These factors and the combined excellence of the centralized vital statistics reporting and population registration systems in Japan have made feasible the continuing longitudinal studies of cancer mortality by site in relation to radiation dose over a span of more than 30 years. Excellent voluntary cooperation by the survivors has enabled the continuation of a biennial physical examination program, which has made possible the acquisition of blood for studies of radiation-induced chromosomal aberrations and mutations at the level of specific genes. Similarly, with the cooperation of local universities, hospitals, and physicians, tumor and tissue registries necessary for the study of cancer incidence have been developed. An autopsy pathology program has enabled study of the accuracy of cause-of-death certification.—Natl Cancer Inst Monogr 67: 53-58, 1985.

This subject is especially appropriate for a workshop that honors the contributions of Dr. Cuyler Hammond.
The first medical studies of Japanese victims of the bombings of Hiroshima and Nagasaki were done in late 1945 by the Joint Commission for the Investigation of the Atomic Bomb in Japan (1). Members of the Commission brought back to the United States an enormous quantity of data concerning casualties, which were analyzed under the direction of Major E. Cuyler Hammond, then in the Air Corps. Dr. Hammond is, if not the father, at least a grandfather of the epidemiologic and biostatistical studies of the Japanese who were exposed to the atomic bombs.

ABBREVIATIONS: ABCC=Atomic Bomb Casualty Commission; RERF=Radiation Effects Research Foundation; LSS=Life-Span Study; T-65=tentative 1965 dosimetry system; AHS=Adult Health Study.
¹ Presented at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
² Supported in part by the Department of Energy under contract DE-AC01-76EV03161.
³ Commission on Life Sciences, National Research Council, 2101 Constitution Avenue, Washington, D.C. 20418.

As much is known about ionizing radiation as a cause of cancer in man as about any other carcinogen, and perhaps more. Much has been learned about cigarette smoking and cancer, due in no small part to Dr. Hammond’s studies. However, for ionizing radiation, we have better measures of the dose of the carcinogen, and, because of an enormous amount of experimental work, ideas have been proposed concerning the mechanisms of radiation carcinogenesis that, if not proved conclusively, are at least plausible. Explicit estimates of excess cancer incidence and mortality as functions of radiation dose are available. Much is known about latent periods and about the ways in which radiation interacts with other risk factors in the causation of cancer. We also have enough information at least to nourish controversy about the exact shape of the dose-response curve at the low-dose end of the scale.
At the foundation of this knowledge are the data and analyses concerning the Japanese A-bomb survivors, data that continue to come from the ABCC and its successor organization, the RERF. The United Nations Scientific Committee on the Effects of Atomic Radiation estimates that a 1-rad dose to a normal population of 1 million persons will induce 100 to 200 extra fatal cancers in that population’s remaining lifetime (2); this represents an increase of less than one-tenth of 1% in cancer mortality and, as a practical matter, would be statistically unmeasurable. Evidently, to obtain reasonably good estimates of radiation cancer risks, one needs a population that has sustained many millions of person-rad and that can be followed for several decades. This requirement explains the central importance of the studies in Japan, in which a survivor population of just over 82,000 persons, with a total population dose of about 2.2 million person-rad, has been followed for more than 30 years. The average dose to the survivors was about 27 rad. The experience was unique, and we hope it will remain so.

When the ABCC began operation in 1948, little was known about the effects of radiation. Muller (3) and others had shown that radiation caused mutations, so a study of mutations that might result in increased rates of congenital abnormalities in the offspring of the survivors was an obvious target. Many thought that somatic abnormalities pathognomonic of radiation injury might be found among the survivors; such expectations were not realized. However, the first essential requirement of the program was an enumeration and listing of the survivors in the 2 cities.

The ABCC conducted various censuses in 1949 and 1950, and persons so enumerated were included in some early studies.
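The dose figures quoted above for the survivor population can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: applying the per-rad estimate to collective dose assumes that excess risk is proportional to person-rad (a linear assumption, not a claim made in the text), and the baseline of about 1 cancer death in 5 persons is an assumed round figure used only to illustrate why the relative increase per rad is tiny.

```python
# Figures quoted in the text
person_rad = 2.2e6          # total population dose, person-rad
survivors = 82_000          # survivor population "just over 82,000"
mean_dose = person_rad / survivors
print(f"mean dose = {mean_dose:.1f} rad")   # 26.8, i.e., about 27 rad

# UNSCEAR: 100-200 extra fatal cancers per million persons per rad.
# Assumption (not from the text): excess risk is proportional to
# collective dose, so the same figure applies per million person-rad.
excess_low = 100 / 1e6 * person_rad
excess_high = 200 / 1e6 * person_rad
print(f"expected excess fatal cancers: {excess_low:.0f}-{excess_high:.0f}")  # 220-440

# Assumption: about 1 in 5 people would die of cancer in any case.
baseline_deaths_per_million = 200_000
print(f"relative increase per rad: at most {200 / baseline_deaths_per_million:.2%}")  # 0.10%
```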
Two large cohorts serve the needs of the most important of the continuing studies:

1) Neel and Schull, who conducted the genetics study, took advantage of the fact that, because food was rationed in Japan during the first few years after the end of the war, women in the fifth month of pregnancy could register for extra food rations. Registration for the genetics program began in mid-1948 and continued until early 1954. During that time, just over 71,000 pregnancies were listed. Outcome variables included stillbirth, neonatal death, birth weight, malformation, and sex. No significant relationship between the exposure category of either or both parents and any of the outcome events was demonstrated (4). In recent years, the problem of radiation-induced mutation at the gene level has been addressed with biochemical techniques. The rapid development of methods for examining DNA gives us hope that, through the enormously increased sensitivity of the newer laboratory procedures, we may be able to measure the specific gene mutation rate per rad.

2) The Government of Japan undertook a national census as of 1 October 1950 and, in response to a request from the ABCC, collected a supplemental questionnaire that asked: “Was any member of this household in Hiroshima City or Nagasaki City at the time of the A-bomb?” Affirmative respondents to this question constitute the sources for practically all ABCC cohort studies of the survivors, except those on genetics. In all Japan, 284,000 persons replied affirmatively to the question, of whom 195,000 were residents of the 2 cities (5). The ABCC field staff had to interview each person (or sometimes a surrogate informant) to determine whether he or she had been present in either city and, if so, exactly where. The interviews showed that about 5-10% were not in the cities at the time of the bombings.
Other exposed persons throughout Japan and in the 2 cities undoubtedly denied exposure, but their number cannot be estimated closely.

At the time the sampling plan for the LSS was determined (5), no accurate information was available concerning radiation doses at various distances from the hypocenters. From earlier studies, especially those of the Joint Commission (1), it was known that few persons in either city who were exposed more than 2,500 m from the hypocenters showed such evidence of radiation injury as epilation or purpura. The sampling plan was devised in 1955, 10 years before the so-called T-65 system of dosimetry (6) came into being. On the basis of such fragmentary data as were available concerning what the radiation doses might have been, and in consideration of logistic constraints, the ABCC leaders decided to include all of those who were exposed at distances less than 2,500 m and to use 2 comparison groups. These groups were matched by age and sex to the 28,000 who were exposed within 2,000 m and thus had the largest radiation doses among all who were exposed. One comparison group comprised about 20% of the 130,000 persons in the cities who were exposed more than 2,500 m away. A second, nonexposed group was also selected (table 1).

The LSS is a study of mortality. The task of tracing mortality on a continuing basis for a population of 100,000 persons is a daunting one. Accomplishment of the task within even liberal budgetary constraints depended on the possibility of taking advantage of Japanese administrative reporting systems, which seem almost designed for the purpose. In Japan, every citizen is enrolled in what is called a “koseki,” or family register.

TABLE 1.—LSS sample, Hiroshima and Nagasaki

Distance from hypocenter, m    No. of persons
<2,000                         28,142
2,000-2,499                    16,663
≥2,500                         28,010
Not in city                    26,574
Total                          99,389
The koseki are maintained by the mayor or village headman who has authority over the place of permanent family residence, or “honseki.” Under Japanese law, every birth and death must be entered into the koseki. When a couple marries, their names are removed from the existing koseki and a new one is established for the newly formed family. If one can learn initially where the koseki for any person is maintained (the honseki), that person can be traced until death. Furthermore, the notice of death in the koseki contains enough data for the acquisition of a transcript of the death certificate, which is filed at the local health center that has jurisdiction over the place of death. The cooperation of the relevant ministries of the Japanese Government has made these resources available for the studies.

Because mortality was to be ascertained by use of the koseki records, all survivors who were not Japanese citizens, and therefore had no koseki, were excluded from the sample. Many Koreans who were brought to Japan during the war as laborers were in this category. The small number of sample members who emigrated from Japan after selection (fewer than 100), and for whom the koseki are not maintained, have also been deleted from the sample as of the time of emigration.

In an effort to improve the matching of survivors to nonexposed immigrants, the cohort as originally selected was restricted to persons whose honseki was in Hiroshima or Nagasaki (table 1). Any gain that might be achieved by improved matching, however, was more than offset by losses from the already small number of persons who had survived large radiation doses. This consideration acquired additional force because those who conducted analyses of radiation effects relied principally on internal analyses within the survivor group in relation to radiation dose; the nonexposed component of the sample had a distinctly subsidiary role.
Therefore, the cohort was extended by the addition of the survivors who were within 2,500 m and who had been excluded originally because their honseki was not in either city. This extension added more than 9,000 survivors to the cohort (table 2).

Table 2 clearly shows the abnormal sex and age distribution of the survivor population caused by the absence from the cities of males who were elsewhere (for the most part in military service). No less than 71.8% of persons 20-34 years old were females. One must assume that, especially in this age group, adverse health selection among males was strong, because by the end of the war the Japanese Army was not overly discriminating concerning the physical fitness of desperately needed replacement troops.

TABLE 2.—LSS sample (extended) by age and sex at the time of bombing

Age at time of   No. of    Percent   Deaths, 1950-78
bombing, yr      persons   female    No.       Percent
0-9              20,578    50.8      495       2.4
10-19            23,921    56.2      1,275     5.3
20-34            22,112    71.8      2,188     9.9
35-49            25,265    57.5      7,993     31.6
50+              16,884    53.0      11,551    68.4
Total            108,760   58.2      23,502    21.6

From the data in table 2, one can surmise that the experience of the oldest survivors, i.e., those over 50 years old in 1945, is approaching completion. By 1978, more than two-thirds of them had died, and the youngest survivor was 83 years old.

Certificates, obtained for all deaths, are coded according to current versions of the International Classification of Diseases and have been put on magnetic tape for analysis since the early 1960s. A copy of every death certificate filed in Hiroshima and Nagasaki is routinely sent to the RERF; approximately 80% of the certificates needed are obtained in this way, because only about 20% of the cohort has migrated from the cities, chiefly young men seeking employment elsewhere and young women leaving at the time of marriage.
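The death percentages in table 2, and the statement that more than two-thirds of the oldest survivors had died by 1978, follow directly from the counts. A short check using only the figures in the table:

```python
# Table 2: age at time of bombing -> (no. of persons, deaths 1950-78)
table2 = {
    "0-9": (20_578, 495),
    "10-19": (23_921, 1_275),
    "20-34": (22_112, 2_188),
    "35-49": (25_265, 7_993),
    "50+": (16_884, 11_551),
}

for age, (persons, deaths) in table2.items():
    print(f"{age:>5}: {100 * deaths / persons:.1f}% died")

total_persons = sum(p for p, _ in table2.values())
total_deaths = sum(d for _, d in table2.values())
print(total_persons, total_deaths, f"{100 * total_deaths / total_persons:.1f}%")
# 108760 23502 21.6%

# "More than two-thirds" of those aged 50+ in 1945 had died by 1978
print(11_551 / 16_884 > 2 / 3)  # True
```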
One essential variable in the analyses is the radiation dose received by each person at the time of the bombing. An ambitious shielding program required detailed interviews with tens of thousands of survivors to verify exact places of exposure and details of shielding configurations. Because the 2 explosions were high air bursts, i.e., about 600 m high, little or no local fallout occurred in the range within 2,000 m of the hypocenters, where the initial radiation was important. Most of the survivors were in houses of typical (light wood) Japanese construction, but shielding characteristics varied with number of stories, orientation of the house, presence or absence of adjoining houses, etc. Persons who conducted the shielding interviews were armed with pre-strike aerial photographs, which showed in remarkable detail every structure in the cities. The survivor, or sometimes another informant, could pick out the house in which the survivor was exposed. Detailed drawings in 3 views had to be made for each survivor in a house so that attenuation of gamma rays and of neutrons due to the structure and the position of the survivor within it could be calculated (figs. 1-3).

Determination of the radiation fields from the 2 detonations has been a most difficult task, one that is presently being redone. The fields consisted of a mixture of neutrons and gamma rays that derived from several processes: so-called prompt radiation emitted by the bombs in the process of exploding, and delayed radiation, emitted from the ascending fireballs, that resulted from radionuclides produced in the bomb materials by the detonations.

[FIGURE 1.—Shielding history area. Sample shielding-history form (Master File Number 274-114; location at time of bomb: Ote-machi; scale 1:1,200) showing single-story and 2-story Japanese-type houses around the survivor's location.]
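The shielding histories serve to convert the radiation at a survivor's location into a dose inside the house. A minimal sketch of that idea, assuming (as a simplification, not a description of the actual T-65 procedures) that the gamma and neutron components are each reduced by a single transmission factor for the structure; the function name and all numbers are hypothetical.

```python
def shielded_dose(gamma_free_air, neutron_free_air, gamma_tf, neutron_tf):
    """Dose (rad) inside a structure, taking the free-in-air gamma and
    neutron doses at the survivor's location and multiplying each by a
    transmission factor between 0 and 1 for the shielding."""
    for tf in (gamma_tf, neutron_tf):
        if not 0.0 <= tf <= 1.0:
            raise ValueError("transmission factors must lie between 0 and 1")
    return gamma_free_air * gamma_tf + neutron_free_air * neutron_tf


# Hypothetical values for a survivor inside a light wooden house
print(shielded_dose(gamma_free_air=100.0, neutron_free_air=20.0,
                    gamma_tf=0.5, neutron_tf=0.25))  # 55.0
```

Treating the 2 components separately reflects the text's point that gamma rays and neutrons are attenuated differently by the structure; in practice the factors would depend on the house and the survivor's position within it.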
These radiations interact with building materials in complicated ways. The T-65 dose-estimation system was devised by the Health Physics group at the Oak Ridge National Laboratory in the early 1960s. In the 20 years since this system was created, computer programs have become increasingly sophisticated, and knowledge of neutron cross-sections has been refined. It now appears that a major revision of the T-65 is required. In all probability, this revision will reduce the estimated doses on the average and thus increase the estimates of cancers or other health effects induced per unit of exposure.

Most of the 79,856 exposed survivors in the extended cohort whose doses could be estimated in the T-65 system had doses of less than 10 rad (table 3). Less than 13% had doses of 50 rad or more, and only 3.6% had doses of 200 rad or more. However, 49 of the 180 leukemia deaths, or 27%, have occurred among the 3.6% with the largest doses. The pattern is similar but far less extreme for deaths from cancers other than leukemia: 5% of such deaths occurred among the 3.6% with doses of 200 rad or more. The percentage of persons in the cohort who died from leukemia or other cancer during 1950-78 increased regularly with increasing dose (table 3).

TABLE 3.—Number of survivors and deaths from leukemia or other cancer by T-65 dose estimate among exposed survivors in the LSS extended cohort(a)

T-65 total radiation dose, rad | No. of survivors | Leukemia deaths: No. | Leukemia deaths: Percent | Other cancer deaths: No. | Other cancer deaths: Percent
0-9 | 54,654 | 70 | 0.13 | 2,994 | 5.5
10-49 | 14,942 | 33 | 0.22 | 891 | 6.0
50-99 | 4,225 | 11 | 0.26 | 259 | 6.1
100-199 | 3,128 | 17 | 0.54 | 205 | 6.6
≥200 | 2,907 | 49 | 1.69 | 227 | 7.8
Total | 79,856(b) | 180 | 0.23 | 4,576 | 5.7

(a) Source of data is (10).
(b) This total excludes 2,386 survivors with unknown dose.

FIGURE 2.—Shielding history floor plan.

An exhaustive account of all of the data collection mechanisms that have been used over the past 30 years would be unrewarding. However, some of the more important ones are:
1) Information concerning such epidemiologically important factors as occupation, residential history, and smoking habits was obtained by interviews and questionnaires; women have been queried about histories of pregnancies.
2) Tumor and tissue registries have been in operation in both cities for 20 years. Personnel in these registries have supported studies of cancer incidence and have enabled analysis of radiation-induced cancers not only by specific site but also by cell type. The registries are maintained in collaboration with local physicians' associations, hospitals, and university medical schools.
3) The AHS, a biennial examination program, is aimed at the diagnosis of disease and at obtaining specimens of blood for chromosomal analysis and sera for hormone and immunoglobulin assays.

The principal late health effect of exposure to ionizing radiation is cancer. Whereas fatal cancers are detected by means of the mortality ascertainment program of the LSS, some cancers, including such radiation-inducible ones as those of the breast and thyroid, are not always fatal. Thyroid cancer, for example, is seldom fatal and may never be diagnosed unless a physician searches for it by palpation. Information concerning the radiation risks for these cancers derives chiefly from the AHS examination program and the tumor registries.

The original LSS cohort was too large for inclusion in a biennial examination program. Therefore, it was sampled to produce a cohort of 20,000 persons, as of the census date in 1950 (table 4). The core group of 5,000 consists of all the original LSS members who were exposed within 2,000 m of the hypocenters and who also reported symptoms of acute
radiation injury. Three other groups were selected and were age- and sex-matched to the core group.

FIGURE 3.—Shielding history cross-sectional.

TABLE 4.—AHS sample in relation to LSS sample

Distance from hypocenter, m | Symptoms of acute radiation injury | No. of persons | No. in LSS sample from which drawn
<2,000 | Present | 4,993 | 4,993
<2,000 | Absent | 4,987 | 23,149
2,000-2,499 | Any | none | 16,663
≥2,500 | Any | 4,990 | 28,010
Not in city | Any | 4,992 | 26,574
Total | | 19,962 | 99,389

In 1957, when the AHS sample was defined, dose estimates for individual survivors were still not available. The samples had to be defined in terms of such crude indicators as distance from the hypocenters and the presence or absence of symptoms of radiation injury. After 1965, when survivors could be assigned individual estimates of dose, the grouping by distance and symptoms lost its importance. Since then, analyses of the data have used estimated radiation dose as the primary study variable. Just under 19% of the sample had doses estimated at 100 rad or more, and another 19% had doses between 10 and 100 rad. The remainder of the sample had doses smaller than 10 rad or were unexposed immigrants to the cities after the bombings (7).

The most important product of the LSS is the information concerning the relationship of mortality by cause to radiation dose. Recognizing that the data are unique, the ABCC and RERF staffs have published mortality data in great detail by sex, city of exposure, and age at exposure for 7 successive 4-year periods and for 8 dose classes, from 0 rad to 400 rad and over. The supplementary tables to Report 9 run to 355 pages (8).
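The shares quoted for table 3 can be recomputed directly from the table: under 13% of survivors at 50 rad or more, 3.6% at 200 rad or more, and 27% of leukemia deaths and about 5% of other cancer deaths in that top group. The same arithmetic yields the sampling fractions implied by table 4, which the text does not state. The core group was taken in full, whereas roughly one-fifth of each comparison stratum was drawn. The Python below is a minimal sketch with illustrative names. It omits the 2,000-2,499 m stratum because no AHS sample was drawn from it.

```python
# Table 3: T-65 dose group (rad) -> (survivors, leukemia deaths, other cancer deaths)
TABLE_3 = {
    "0-9": (54_654, 70, 2_994),
    "10-49": (14_942, 33, 891),
    "50-99": (4_225, 11, 259),
    "100-199": (3_128, 17, 205),
    ">=200": (2_907, 49, 227),
}

# Table 4: AHS stratum -> (persons sampled, LSS members in the stratum).
# The 2,000-2,499 m stratum is omitted because no AHS sample was drawn from it.
TABLE_4 = {
    "<2,000 m, symptoms present": (4_993, 4_993),
    "<2,000 m, symptoms absent": (4_987, 23_149),
    ">=2,500 m": (4_990, 28_010),
    "not in city": (4_992, 26_574),
}


def pct(part, whole):
    """Percentage rounded to 1 decimal place."""
    return round(100 * part / whole, 1)


def dose_shares(table):
    """Share of survivors, and of deaths, falling in the high-dose groups."""
    survivors = sum(n for n, _, _ in table.values())
    leukemia = sum(lk for _, lk, _ in table.values())
    other = sum(oc for _, _, oc in table.values())
    top_n, top_leukemia, top_other = table[">=200"]
    n_50_plus = sum(table[g][0] for g in ("50-99", "100-199", ">=200"))
    return {
        "survivors_50_plus": pct(n_50_plus, survivors),
        "survivors_200_plus": pct(top_n, survivors),
        "leukemia_deaths_200_plus": pct(top_leukemia, leukemia),
        "other_cancer_deaths_200_plus": pct(top_other, other),
    }


def sampling_fractions(table):
    """Percent of each LSS stratum drawn into the AHS sample."""
    return {stratum: pct(sampled, frame) for stratum, (sampled, frame) in table.items()}


if __name__ == "__main__":
    print(dose_shares(TABLE_3))
    print(sampling_fractions(TABLE_4))
```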
The analytic methods that have been used to fit models to the data include proportional hazards models, log-linear regression, and regression of mortality rates on competing models based on radiobiologic experimental data.

Analyses of mortality depend on death certificate transcripts, as do most mortality studies. The question must therefore be asked: How well can death certificate diagnoses support analyses for specific tumor types? A large autopsy pathology program at ABCC has explored just this question. Beginning in 1960, investigators made a vigorous effort to procure autopsies of every decedent in the LSS and succeeded in obtaining autopsies for over 40% of all deaths in the cohort. This percentage does not sound especially impressive when compared with autopsy rates in some teaching hospitals, but less than 40% of the deaths occurred in hospitals. An intensive effort was essential if every death in the 2 cities was to be reported within hours after it occurred and permission for autopsy secured. In 1962, a rate of just over 40% was achieved for the 60% of deaths that occurred at home (9). In Japan, cremation is customary soon after death, and certificates are filed before autopsies are performed. The antemortem clinical diagnosis is thus essentially independent of the autopsy diagnosis.

The results of a comparison were revealing. Nearly 95% of deaths ascribed to malignant neoplasms on certificates were confirmed at autopsy, but only 77% of deaths due to cancer as determined by the autopsy were so noted on the death certificates (table 5). This analysis was based on over 3,700 autopsies. When interest lies not in malignant neoplasms as a class but in specific cancers, the results are poorer. In particular, only 55% of deaths due to lung cancer were diagnosed as such on the certificates.
Confirmation rates were generally larger than detection rates. Although the autopsies identified 1,256 deaths from cancer, only 1,022 were noted on the certificates. The certificate-based cancer death rate was therefore only 81% of what it would have been had every death in the cohort been classified on the basis of an autopsy. These are Japanese data for the decade 1961-70, and it would be rash to generalize from them widely. They are, however, cautionary: in the absence of high rates of pathology confirmation, death certificate-based mortality rates for particular sites, especially such sites as the pancreas and liver, may be highly erroneous.

What has been learned from these arduous and expensive studies? Principally, we know that ionizing radiation does cause cancer and that radiation increases the incidence of different cancers unequally. The most sensitive are leukemia; cancers of the esophagus, stomach, colon, lung, breast, and urinary tract; and multiple myeloma (10). The induced leukemias began to occur about 2 years after irradiation, rose to a peak at 5-7 years, and then decreased in number until about 30 years after the bombings. In contrast, the induced solid tumors had minimum latent periods of 10-15 years, after which their number increased; 33 years after the bombings, they appear to be increasing still. Despite the clear evidence of cancer induction, the absolute or relative number of cancers caused by the radiation from the bombs is not large. Kato and Schull (10) estimate that among all 283,000 A-bomb survivors in Japan through 1978, not just those in the LSS cohort, about 191 excess leukemia deaths had occurred (an increase of about 50%), together with about 336 extra deaths from other cancers (an increase of 3.2%). These increases resulted from an average radiation dose of about 16.1 rad (T-65).
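The 81% figure follows from the two rates in table 5. Suppose C deaths are certified as due to a cause and A deaths are assigned to it at autopsy. The deaths on which both sources agree can then be written either as (confirmation rate) × C or as (detection rate) × A, so C/A equals the detection rate divided by the confirmation rate. The Python sketch below applies this identity to table 5. It assumes, as the text implies, that both rates refer to the same set of autopsied deaths. A ratio below 1 means that certificates undercount the cause, and a ratio above 1 means that they overcount it.

```python
# Table 5 (1961-70): cause -> (percent of certifications confirmed by autopsy,
#                              percent of autopsy diagnoses detected by certificate)
TABLE_5 = {
    "Malignant neoplasm": (94.1, 76.6),
    "Stomach cancer": (84.1, 75.2),
    "Lung cancer": (84.8, 55.4),
    "Breast cancer": (96.9, 79.5),
    "Leukemia": (84.4, 90.0),
}


def certificate_to_autopsy_ratio(confirmed_pct, detected_pct):
    """Ratio of certificate-based to autopsy-based counts for one cause.

    Agreement count = confirmed * C = detected * A, hence C / A = detected / confirmed.
    """
    return detected_pct / confirmed_pct


if __name__ == "__main__":
    # Direct check with the counts given in the text: 1,022 certified vs 1,256 at autopsy.
    print("From counts:", round(1022 / 1256, 3))
    for cause, (confirmed, detected) in TABLE_5.items():
        ratio = certificate_to_autopsy_ratio(confirmed, detected)
        print(f"{cause:<20} {ratio:.2f}")
```

For malignant neoplasms as a class, the ratio is about 0.81, which agrees with the count-based figure. Leukemia is the only row of table 5 in which the ratio exceeds 1, i.e., the certificates named the cause more often than the autopsies did.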
In addition to cancer induction, effects of radiation that have been identified in the studies include the induction of lenticular opacities (11), adverse growth patterns of young children manifested by diminished stature as adults (12), decreased head size and mental development of fetuses exposed to high doses, especially in the first trimester of gestation (13), and the induction of chromosomal aberrations (14).

TABLE 5.—Comparison of death certificate and autopsy statements as to underlying cause of death, 1961-70

Underlying cause of death | Percent of certifications confirmed by autopsy | Percent of autopsy diagnoses detected by death certificate
Malignant neoplasm | 94.1 | 76.6
Stomach cancer | 84.1 | 75.2
Lung cancer | 84.8 | 55.4
Breast cancer | 96.9 | 79.5
Leukemia | 84.4 | 90.0

In summary, the ABCC-RERF program has been fortunate in that a cohort of 100,000 persons could be traced successfully over a period of more than 30 years. This was possible largely because advantage could be taken of the excellent, centralized Japanese vital statistics and registration systems, and because the program has had the support of the United States and Japanese governments. Most of what is known about the long-term health effects of radiation on man is derived from this tragic and (we hope) unique experience.

REFERENCES

(1) OUGHTERSON AW, WARREN S: Medical Effects of the Atomic Bomb. New York: McGraw-Hill, 1956
(2) United Nations Scientific Committee on the Effects of Atomic Radiation: Sources and Effects of Ionizing Radiation. New York: United Nations, 1977
(3) MULLER HJ: Artificial transmutation of the gene. Science 66:84-87, 1927
(4) NEEL JV, SCHULL WJ: The Effect of Exposure to the Atomic Bombs on Pregnancy Termination in Hiroshima and Nagasaki. NAS-NRC Publ. No. 461. Washington, D.C.: Natl Acad Sci-Natl Res Council, 1956
(5) ISHIDA M, BEEBE GW: Research plan for JNIH-ABCC study of life span of A-bomb survivors. Atomic Bomb Casualty Commission Tech Rep No. 4-59. Hiroshima: ABCC, 1959
(6) AUXIER JA: Ichiban: Radiation Dosimetry for the Survivors of the Bombings of Hiroshima and Nagasaki. ERDA Publ. No. TID-27080. Washington, D.C.: U.S. Department of Commerce, 1977
(7) BELSKY JL, TACHIKAWA K, JABLON S: The health of atomic bomb survivors: A decade of examinations in a fixed population. Yale J Biol Med 46:284-296, 1973
(8) KATO H, SCHULL WJ: Supplementary tables for Technical Reports 12-80 and 5-81, Radiation Effects Research Foundation Life-Span Study Rep 9. Hiroshima: RERF, 1980
(9) STEER A, LAND CE, MORIYAMA IM, et al: Accuracy of diagnosis of cancer in the JNIH-ABCC Life-Span Study Sample. Radiation Effects Research Foundation Tech Rep 1-75. Hiroshima: RERF, 1976
(10) KATO H, SCHULL WJ: Studies of mortality of A-bomb survivors. 7. Mortality, 1950-1978. Part 1. Cancer mortality. Radiat Res 90:395-432, 1982
(11) MILLER RJ, FUJINO T, NEFZGER MD: Eye findings in atomic bomb survivors, Hiroshima-Nagasaki, 1963-64. Am J Epidemiol 89:129-138, 1969
(12) BELSKY JL, BLOT WJ: Stature of adults exposed in childhood to the atomic bombs, Hiroshima-Nagasaki. Am J Public Health 65:489-494, 1975
(13) MILLER RW, BLOT WJ: Small head size following in utero exposure to atomic radiation, Hiroshima and Nagasaki. Lancet 2:784-787, 1972
(14) AWA AA, SOFUNI T, HONDA T, et al: Relationship between dose and chromosome aberrations in atomic bomb survivors, Hiroshima and Nagasaki. J Radiat Res 19:126-140, 1978

The Framingham Study: Sample Selection, Follow-up, and Methods of Analyses¹
Manning Feinleib²

ABSTRACT—The Framingham Heart Study, begun in 1948, had a cohort of 5,209 individuals who have been followed for 35 years. The selection of this population and the success in following it through biennial clinical examinations and indirect surveillance for deaths and hospitalizations are described. The major techniques used in analysis of the Framingham data are identified. Natl Cancer Inst Monogr 67: 59-64, 1985.
October 1983 marked the 35th anniversary of the launching of the Framingham Heart Study. After more than a year of planning, the first examinations of what was to become the Framingham Heart Study Cohort took place on September 29, 1948. The Framingham Heart Study Clinic was formally dedicated on October 11 at the Framingham Union Hospital. The first 2,500 patients who came into this Clinic were actually volunteers participating in a demonstration program designed to develop case-finding procedures for heart disease. This was one of several projects initiated by Dr. Joseph W. Mountin, Assistant Surgeon General. His collaborators were Dr. Vlado A. Getting, Health Commissioner of the Commonwealth of Massachusetts; Dr. David D. Rutstein, Professor of Preventive Medicine at the Harvard Medical School; and Dr. Bert R. Boone, Chief of the Heart Disease Demonstration Section of the United States Public Health Service.

In 1949, the Framingham Heart Study was transferred to the newly created National Heart Institute. At this point, the Director of the study, Dr. Gilcin F. Meadors, and the Head of the Biometry Unit at the National Heart Institute, Felix E. Moore, Jr., undertook the development of a formal study protocol and a formal sampling scheme. Their aim was to reorient the Framingham program from a demonstration project to a prospective epidemiologic investigation. In April 1950, Dr. Thomas R. Dawber became the first Director of the Framingham Study as it is recognized today, serving until 1966, when he was succeeded by Dr. William B. Kannel. Dr. Kannel served as Director until 1978 and was succeeded by Dr. William P. Castelli. The supervision of the administrative and epidemiologic aspects of the program in Bethesda was the responsibility of Dr. William J. Zukel from 1963 to 1971 and my responsibility from 1971 to 1982. The statistical activities in Bethesda were directed successively by Felix E. Moore, Harold Kahn, Tavia Gordon, and Robert J. Garrison. In Framingham, Patricia M. McNamara supervised statistical operations for many years.

Until 1970, the examinations at Framingham were conducted by full-time employees of the Public Health Service. In 1970, because of a restriction in funding and a reduction in force of personnel, there was a brief hiatus in the examination cycle. It was resolved about 9 months later through non-Federal funding under the auspices of Boston University, with Dr. Dawber as Co-Director. In 1975, the National Heart, Lung, and Blood Institute resumed financial support of the research project through the research contract mechanism, while continuing to maintain a core of physicians and support staff. The statistical and data processing support for the Study in Bethesda was maintained without interruption throughout the 35-year period.

The history of the Framingham Heart Study has been described in numerous papers (1-10). The first 6 years were a critical period. The project shifted from a demonstration program to an epidemiologic investigation, and its research activities shifted from a free-standing clinical operation to a more formally supervised and orchestrated epidemiologic and statistical investigation. This transition brought geographic separation of the data collection and analytic activities and the superposition of increasingly complex scientific and bureaucratic reviews and controls. In this presentation, we can review only briefly some of the more important decisions necessary during the initiation and conduct of the study.

ABBREVIATION: CHD=coronary heart disease.
¹ Presented at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
² National Center for Health Statistics, Center Building, Room 2-19, 3700 East-West Highway, Hyattsville, Maryland 20782.
SELECTION OF THE COHORT

It was recognized from the outset that the town of Framingham, Massachusetts, could be considered neither a random nor a completely representative sample of the United States. The town did, however, have certain characteristics that made it eminently suitable for a long-term epidemiologic study (1, 4, 6):
1) The town was of adequate size to provide enough individuals for the Study.
2) It was compact enough that the Study population could be observed conveniently.
3) It contained a variety of socioeconomic and ethnic subgroups to provide contrasting groups for analysis.
4) The population was stable enough to enable adequate follow-up for a long time. This was partly due to a fairly stable economy supported by a diversity of employment opportunities.
5) The town was located near a medical center that could provide consultations and the opportunity for educational development of the staff.
6) The physicians and other medical professionals in the town were highly supportive of the Study and cooperated fully with its objectives.
7) Framingham contained 2 general hospitals at the beginning, but 1 closed shortly after the Study began, so that a major portion of the medical care was provided by a single hospital.
8) Framingham, like most towns in Massachusetts, maintains an annual list of its residents.
9) The staff of a well-organized health department helped to provide death certificate information and other vital statistics.
10) Nearly 30 years earlier, Framingham had been the site of a community study of tuberculosis in which the townspeople had participated successfully. It was believed that this spirit of cooperation was still present in 1948.

As Dawber and Moore succinctly put it, “In short, it was a place where such a study could be done, and it was not grossly atypical in any respect that appeared relevant” (2).
The initial organization of the Study was an ambitious undertaking that required the coordination of various recruitment and supportive activities. These included the organization of several lay and professional advisory groups, endorsement by the Massachusetts Medical Society, and the organization of a cadre of volunteers to solicit participation in the project. From 1948 through the first part of 1950, participation in the program was completely voluntary. Any adult in the town between the ages of 20 and 70 who wished a cardiovascular examination was admitted to the Clinic. During this period, 2,941 people were examined (2). At this point, the change from a demonstration program to a long-term epidemiologic investigation was implemented, and a random sampling scheme was devised. Approximately 6,000 people could be examined during the proposed 2-year examination cycle. The town contained approximately 10,000 residents in the target population aged 30 to 59 years, and the initial call for volunteers for the demonstration project had met an enthusiastic response. On this basis, those conducting the Study decided to apply a two-thirds sampling ratio to yield a respondent group of 6,000 participants (table 1). They also estimated that 5,000 of these 6,000 participants would be found to be free of CHD at the base-line examination. These 5,000 disease-free individuals would form an adequately sized cohort for follow-up of heart disease during the succeeding 20 years.

The official sampling frame was the population aged 30-59 as of January 1, 1950, according to the town census. A separate list was drawn up for each of the 8 precincts in the town of Framingham. Within each precinct, the lists were arranged by family size and then in serial order by address. Two of every 3 families were then selected for the sample. In each family, all residents in the eligible age range were invited to have an examination.
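The selection rule just described is a systematic sample: one list per precinct, ordered by family size and then by address, keeping 2 of every 3 families. The Python sketch below illustrates it under several stated assumptions, because the source does not specify these details. The record layout is hypothetical, "family size" is taken to mean the number of members aged 30-59, addresses are sorted as plain strings, and the third family in each run of three is the one skipped.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Family:
    """Hypothetical census-list record for one family."""
    precinct: int
    address: str           # assumed sortable in "serial order" as a string
    eligible_members: int  # assumed meaning of "family size": residents aged 30-59


def select_two_of_three(families):
    """Systematic 2-of-3 sample of families, drawn separately within each precinct."""
    selected = []
    by_precinct = sorted(families, key=lambda f: f.precinct)
    for _, group in groupby(by_precinct, key=lambda f: f.precinct):
        # Order the precinct list by family size, then by address.
        listing = sorted(group, key=lambda f: (f.eligible_members, f.address))
        # Keep positions 0 and 1 of every run of three; skip position 2 (assumption).
        selected.extend(f for i, f in enumerate(listing) if i % 3 != 2)
    return selected


def persons_invited(selected):
    """All eligible residents of each selected family were invited."""
    return sum(f.eligible_members for f in selected)


if __name__ == "__main__":
    sizes = [1, 2, 1, 3, 2, 1, 2, 1, 2]
    demo = [Family(1, f"{n:03d} Main St", s) for n, s in enumerate(sizes)]
    demo += [Family(2, f"{n:03d} Elm St", 2) for n in range(3)]
    chosen = select_two_of_three(demo)
    print(len(chosen), "of", len(demo), "families;", persons_invited(chosen), "persons invited")
```

Sorting by family size before taking every third family spreads the sample evenly across household sizes within each precinct. That balance is presumably why the lists were arranged this way, although the source does not say so.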
The recruitment effort was a highly organized affair, with 6 committees set up to organize its logistical and publicity aspects. In particular, a neighborhood organization committee was established to contact all selected individuals personally and to urge their participation in the study.

TABLE 1.—Initial sampling estimates for the Framingham Heart Study

Variable | Estimate
Population aged 30-59 yr | 10,000
Sampling ratio of 2:3 | 6,600
Less 10% refusal, outmigration | 6,000
Respondents free of CHD at base line | 5,000

By 1952, physicians had examined 4,469 eligible persons, and the investigators decided to end the recruitment period and reconsider the structure of the cohort (table 2). To achieve the target population of 5,000 disease-free individuals, they identified from the town list of residents 888 volunteers who had been examined, had not been included in the drawn sample, and were 30-59 years old in 1950. These residents were invited to the Clinic for an examination. Of these, 740 (83.3%) returned for a second examination and were added to the study cohort. Thus 5,209 people were taken into the cohort. Because the prevalence of CHD was lower than expected (1.7% in the examined sample and 0.8% among the volunteers), the cohort included 5,127 individuals free of the disease, as determined from the base-line examination and medical history.

Many of the problems encountered in defining the cohort have been discussed in previous publications (4, 7, 8-10). These include the reasons for nonresponse, potential biases between the examined and nonexamined groups, and differences between the volunteers and the sample groups. At this late date, it is impossible for me to add any significant new insights into this process. However, a few points may be noted. First of all, the attained cohort seemed to be considerably healthier than would have been expected in the general population.
Study personnel expected approximately 1 in 6 of the respondents to show some evidence of CHD; the observed prevalence rate was less than one-tenth of this. If the presence of CHD was an important factor in nonresponse, then the population at risk actually attained may not have been as seriously underrepresented as the crude response rate of 68.7% might indicate. Second, a complete report of the response rate was not finished until about 1959. This delay was due primarily to the mechanical difficulty of sorting through over 6,000 records so that duplications and subjects outside the prescribed age range could be eliminated. With modern computerized methods for handling records, a complete count of all eligible subjects and respondents would have been available early in the course of a study. In the 1950s, however, only simple sorting and tabulating equipment was available, which made detailed examination of the records a time-consuming and error-prone operation (11).

FOLLOW-UP

After the initial base-line examination of the 4,469 sample respondents and the second examination of the 740 volunteers, follow-up during the next 35 years was achieved by 2 methods: 1) direct, by means of medical examination at the Clinic scheduled on a biennial cycle; and 2) indirect, through secondary sources of information including death records, obituary notices, hospital records, and reports from physicians and family.

TABLE 2.—Derivation of the Framingham Heart Study cohort

Sample segment | No. of persons | Percentage
Drawn sample (1) | 6,587 |
Exclusions: duplicate names, outside age range (2) | 80 | (2)/(1) = 1.2
Eligible sample (3) | 6,507 |
Examined (4) | 4,469 | (4)/(3) = 68.7
Sample free of CHD (5) | 4,393 | (5)/(4) = 98.3
Volunteers (6) | 740 | (6)/(8) = 14.2
Volunteers free of CHD (7) | 734 | (7)/(6) = 99.2
Total cohort (8) | 5,209 |
Total cohort free of CHD (9) | 5,127 | (9)/(8) = 98.4
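The planning chain of table 1 and the derivation in table 2 can be compared side by side. The Python sketch below uses exact fractions for the plan: two-thirds sampled, 10% lost to refusal or out-migration, and the expected 1-in-6 prevalence of CHD mentioned above. It then recomputes the percentages of table 2 from the tabulated counts. The exact chain gives 6,667 persons before losses, where table 1 lists 6,600. The names are illustrative.

```python
from fractions import Fraction


def planned_cohort(population=10_000):
    """Planning chain of table 1, using exact fractions to avoid rounding noise."""
    sampled = population * Fraction(2, 3)        # two-thirds sampling ratio
    respondents = sampled * Fraction(9, 10)      # less 10% refusal, out-migration
    free_of_chd = respondents * Fraction(5, 6)   # about 1 in 6 expected to have CHD
    return sampled, respondents, free_of_chd


# Counts from table 2.
COUNTS = {
    "drawn": 6_587, "excluded": 80, "examined": 4_469,
    "sample_free": 4_393, "volunteers": 740, "volunteers_free": 734,
}


def derive_cohort(c):
    """Recompute the derived rows and percentages of table 2."""
    def pct(a, b):
        return round(100 * a / b, 1)

    eligible = c["drawn"] - c["excluded"]
    total = c["examined"] + c["volunteers"]
    total_free = c["sample_free"] + c["volunteers_free"]
    return {
        "eligible": eligible,
        "total_cohort": total,
        "total_free": total_free,
        "excluded_pct": pct(c["excluded"], c["drawn"]),
        "response_pct": pct(c["examined"], eligible),
        "sample_free_pct": pct(c["sample_free"], c["examined"]),
        "volunteer_share_pct": pct(c["volunteers"], total),
        "volunteers_free_pct": pct(c["volunteers_free"], c["volunteers"]),
        "cohort_free_pct": pct(total_free, total),
    }


if __name__ == "__main__":
    sampled, respondents, free = planned_cohort()
    print("Planned:", round(sampled), int(respondents), int(free))
    print("Attained:", derive_cohort(COUNTS))
```

The plan called for 5,000 disease-free participants, and the cohort actually attained 5,127. This was achieved despite a response rate of only 68.7%, because CHD prevalence was far below the planning assumption.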
Detailed analyses of the follow-up have been given previously (4, 6, 8, 9), and the quality and utility of the various types of follow-up information have been formally evaluated (7). By far the most important source of follow-up information was the biennial examination at the Clinic. The basis for achieving successful response from the participants was carefully spelled out by Dawber et al. (6): “Each of the subjects was advised at the initial interview that it was intended to re-examine him at two-year intervals, and that he would be approached directly at the appropriate time. The names of a relative, a friend, and the family physician were all recorded so that the subject could be traced in case he moved during the interval. An abstract of the initial examination was sent to the family physician and the subject was advised by letter as to whether the physician should be consulted or not. The objective of this procedure was to provide some tangible benefit to the subject other than the knowledge of his contribution to medical science. At the same time, care was taken not to become involved in the medical management of the subject and to avoid interfering in any way with the relationship between the subject and his physician. This helped to maintain rapport, not only with the subjects themselves, but with the medical community as well.”

The success of this approach is attested to by the high proportion of participants who returned for examination at each biennial cycle (table 3). The greatest loss due to dropouts occurred between the first and second examinations. Considerable evidence indicates that those who came in most reluctantly for the initial examination (i.e., came in toward the end of the recruitment period) had the highest dropout rate during the next 30 years (4, 8).
As the population ages, as more and more people become incapacitated, and as increasing numbers of participants move from the Framingham area, the dropout rate has tended to increase slightly over the years. Yet, at the time of the ninth examination (16-yr follow-up), 84.2% of the surviving cohort appeared for the examination. Examinations were discontinued for about 9 months during 1970-71, covering a portion of cycles 10 and 11, and information from approximately 800 examinations was lost as a result. Although approximately 80% of the surviving participants continued to return for examination, the decline from the pre-1970 rates was noticeable. Even those who have moved away from Framingham tend to schedule their examinations during the times they return to visit friends and family in the neighborhood (9). Among those who are alive at a particular examination, the likelihood of reexamination is about the same for men and women and for each age group up to about age 70 (8). After age 70, the rate of nonreturn increases gradually.

Indirect follow-up through secondary sources of information has been pursued vigorously and continuously throughout the history of the Framingham Study. A member of the Study staff monitors all admissions to the Framingham Union Hospital, the major source of hospital care for the community, so that admissions of Study participants can be identified promptly. This monitoring is particularly important because it allows a consultant neurologist to examine stroke cases in a standardized way while symptoms of the disease are still present. Mortality follow-up is virtually complete, with less than 2% of the cohort having unknown vital status.
The criteria for diagnosis of cardiovascular disease and other end points investigated in the Framingham Study have been precisely defined (10). How useful the various sources of information are in providing diagnostic information according to the Study criteria has also been investigated (7). Friedman et al. have estimated that the biennial examinations at the Clinic provided the diagnostic information for three-fourths of all cardiovascular events and that one-half of the cases would not have been adequately documented without these examinations. The remaining one-fourth was identified from information on death certificates or hospital records. Reports from clinicians, relatives, and friends did not contribute appreciably to the identification of cases. They were, however, useful in determining the cause of death for persons who did not die in the hospital.

TABLE 3.—Proportion of surviving cohort receiving examinations

Examination No. | No. alive | No. examined | Percent examined
1 | 5,209 | 5,209 | 100.0
2 | 5,180 | 4,792 | 92.5
3 | 5,129 | 4,416 | 86.1(a)
4 | 5,074 | 4,545 | 89.6
5 | 4,990 | 4,422 | 88.6
6 | 4,897 | 4,259 | 87.0
7 | 4,804 | 4,191 | 87.2
8 | 4,676 | 4,030 | 86.2
9 | 4,552 | 3,833 | 84.2
10 | 4,406 | 3,595 | 81.6(b)
11 | 4,249 | 2,955 | 69.5(b)
12 | 4,081 | 3,261 | 79.9
13 | 3,883 | 3,133 | 80.7
14 | 3,690 | 2,871 | 77.8

(a) The 237 volunteers were recalled so late in cycle 2 that they were not called in for examination No. 3. If they were counted as examined at examination No. 3, the examination rate would be 90.7%.
(b) There was a hiatus in examinations during 1970-71 (see text).

One question raised by Friedman and his co-workers was: “How useful was it to have biennial examinations?” They answered this question thus: “Since, at most, a sixth of all cases were lost when all followup examinations except the last were ignored, it appeared that less frequent examinations might have resulted in very little underestimation of disease incidence.
It is suggested, though, that frequent examinations aided in maintaining rapport and contact with the subjects and the town physicians and without them other steps would have to be taken to insure good followup” (7).

METHODS OF ANALYSIS

Over the years, more than 250 research reports authored by members of the Framingham staff or by collaborating investigators have emanated from the Framingham Study. Investigators unaffiliated with the Study have written numerous other articles citing Framingham data, although no definitive record of them has been kept. The Framingham files contain a rich data base, and the questions these data could address are important. Together, these have led to the development and use of a wide variety of analytic procedures for studying the complex relationships between various risk factors and the occurrence of cardiovascular diseases. This effort was fostered by a group of capable, imaginative, and devoted statisticians and epidemiologists who have worked effectively with the clinical and laboratory staff. It was also enhanced by stable and continuous support of the biostatistical operations at the Institute in Bethesda and by the availability of computer facilities.

For perhaps the first 10 years of the Framingham Study, analysts primarily related the cumulative incidence of disease to the level of the risk factors as measured at the base-line examination, grouped into 3 or 4 strata (3, 5). In the early 1960s, Kahn (12) and Kahn and Dawber (13) explored several other ways of characterizing the independent variables before the occurrence of the event. Besides the “average value to date,” these included 1) the most recent value, 2) the annual slope of the values to date, 3) the maximum value before the occurrence of the event, and 4) the standard deviation of the measured values to date.
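The candidate summaries of a risk factor's history can be computed as follows for one participant's examinations before an event. The Python sketch is illustrative only. The source does not say how the annual slope or the standard deviation was computed, so an ordinary least-squares slope per year and the population standard deviation are assumptions, as are the function name and the example values.

```python
from statistics import mean, pstdev


def kahn_summaries(times_yr, values):
    """Summaries of one risk factor measured at successive examinations before an event.

    times_yr: examination times in years since base line (e.g., 0, 2, 4, ...)
    values:   measurements at those examinations (e.g., serum cholesterol)
    """
    if not values or len(times_yr) != len(values):
        raise ValueError("need one measurement per examination time")
    t_bar, v_bar = mean(times_yr), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times_yr)
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times_yr, values))
    return {
        "average_to_date": v_bar,
        "most_recent": values[-1],
        # Least-squares change per year; set to 0 when only one examination exists (assumption).
        "annual_slope": sxy / sxx if sxx > 0 else 0.0,
        "maximum": max(values),
        "sd": pstdev(values),  # population SD of the values to date (assumption)
    }


if __name__ == "__main__":
    # Hypothetical cholesterol values at base line and 3 biennial follow-ups.
    print(kahn_summaries([0, 2, 4, 6], [200, 210, 230, 240]))
```

For the example, the sketch gives an average of 220, a most recent value of 240, a slope of 7 units per year, a maximum of 240, and a standard deviation of about 15.8.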
Rather than cumulative incidence to date, Kahn (12) used as the dependent variable the risk of disease during the 2-year period following classification according to one of the above measures. For serum cholesterol values, he found that the values at the latest examination tended to give the highest morbidity ratios, although the choice among the other methods of classifying the independent variable appeared to make little difference. Thereafter, the usual method of analyzing the Framingham data was to reclassify each subject at each biennial examination and calculate the risk of disease during the next 2-year period. Each examination was considered to be independent of the other examinations. Although this feature concerned the statisticians involved with the Study, they found no tractable way to handle it and believed that any effect of nonindependence would not affect the results materially.

In the late 1950s and early 1960s, Cornfield and his associates (14, 15) turned their attention to the analyses of the bivariate and multivariate interactions of the risk factors for the development of cardiovascular disease. Their efforts resulted in one of the most significant developments in the analysis of the Framingham data; their techniques have been used in a wide variety of other applications (14-16). Cornfield approached the problem of predicting who would get CHD during a fixed period as an instance of linear discriminant analysis. He assumed that there are 2 populations, one destined to get heart disease and the other to remain free of heart disease. He also assumed that the distributions of base-line variables in these 2 populations were of the same form, i.e., multivariate normal with equal variance-covariance matrices, but with different vectors of mean values.
For any given combination of risk factors, one could then estimate the probability that an individual with those risk factors would belong to the population destined to get heart disease or to the population that would remain free of disease during the follow-up period. This probability was then related to the risk of disease for that combination of risk factors. The resulting equation describing this risk function is the now-familiar multivariate logistic function:

$$P(\mathbf{x}) = \frac{1}{1 + e^{-(a + \mathbf{b}'\mathbf{x})}}$$

where P is the probability of developing heart disease, x is the vector of observed values, and a and b are the coefficients to be estimated by the linear discriminant method from the total body of data.

Statisticians soon realized that the assumptions underlying this model did not hold in either theory or practice. However, once the form of the risk function was specified, i.e., multivariate logistic, a maximum likelihood solution could be obtained. This was done by Walker and Duncan (17), who found that the solution could be obtained only by an iterative procedure. Although the probability of disease is a nonlinear function of the vector of observed values, the logarithm of the odds of becoming diseased, $\ln[P/(1-P)] = a + \mathbf{b}'\mathbf{x}$, is a linear function of the independent variables, so that when the disease rates are relatively low, straightforward linear regression of incidence upon the independent variables often provides a satisfactory prediction equation.

Although it took several years for this model to become widely accepted by the cardiovascular epidemiologic community, it finally became the model of choice for many investigations. It solved a number of problems that had plagued the previous analyses. 1) The model provided a risk function which gave values in the range of 0 to 1, unlike some of the regression models that had been used previously. 2) It avoided the need for classification of the independent variables and could handle continuous traits at their observed levels.
3) The model was sufficiently robust that discontinuous independent variables could also be used in the prediction equation. 4) It enabled the statistician to summarize the effects of several different variables in a single equation and provided statistical tests of significance for the estimated effects of each of the variables in the final prediction equation. Although the application of the logistic function involves several difficulties (18), Walker and Duncan’s model has proved extremely valuable in supplying a succinct and often graphic representation of the relationships among different variables in predicting heart disease (19).

Designers of newer approaches to the analysis of Framingham data have tried to incorporate the longitudinal nature of the data collection system (20-22). Most of the variables studied in the Framingham Heart Study tend to “track over time,” i.e., the values for an individual from one examination to another follow an orderly trajectory, and the correlations from 1 examination to the next are usually high. Thus the ranking of an individual within the Framingham population remains stable over time. Another aspect explored in the past few years is the validity of the assumption that one need consider only the 2-year interval of risk. In particular, when one uses obesity or measures of overweight to predict either cardiovascular outcome or total mortality, one obtains different results from the 2-year data than from the cumulative experience since the base-line measurements (23-25). Finally, statisticians have recently explored the possibility that risk of illness need not be monotonic in any particular risk factor. In particular, the possibility of U-shaped risk factor functions with regard to various end points has been analyzed (26, 27).
In the Framingham data, evidence indicates that values at either extreme of the distribution for certain variables may increase one’s risk of disease. Further analyses of these complex relations may serve to uncover optimal ranges of the risk factors for promoting longevity and good health.

CONCLUSIONS

I have been able to highlight only briefly a few of the many aspects of the Framingham Study. The study is still vigorous; not only is continued follow-up of the original cohort planned, but 10 years of information have already been obtained on the offspring generation (28-32). The Framingham cohort will likely continue to yield many important results involving not only cardiovascular diseases but various other conditions, particularly those important among the elderly, that might be related to the wide array of measurements made in this population during the last 3½ decades. The offspring offer a resource for study of the development of and changes in risk factors among younger adults. Framingham will long be remembered not only as a pioneering investigation that resolved many of the practical problems that beset longitudinal studies but also as a demonstration that long-term follow-up of a general population by a highly motivated and expert group can yield invaluable information that cannot be obtained any other way.

REFERENCES

(1) DAWBER TR, MEADORS GF, MOORE FE: Epidemiological approaches to heart disease: The Framingham Study. Am J Public Health 41:279-286, 1951
(2) DAWBER TR, MOORE FE: Longitudinal study of heart disease in Framingham, Massachusetts: An interim report. In Research in Public Health, Papers Presented at 1951 Annual Conference of the Milbank Memorial Health Fund. New York: Milbank Memorial Fund, 1952, pp 241-247
(3) DAWBER TR, MOORE FE, MANN GV: Coronary heart disease in the Framingham Study.
Am J Public Health 47(Suppl):4-24, 1957
(4) GORDON T, MOORE FE, SHURTLEFF D, et al: Some methodologic problems in the long-term study of cardiovascular disease: Observations on the Framingham Study. J Chronic Dis 10:186-206, 1959
(5) KANNEL WB, DAWBER TR, KAGAN A, et al: Factors of risk in the development of coronary heart disease: Six-year follow-up experience: The Framingham Study. Ann Intern Med 55:33-50, 1961
(6) DAWBER TR, KANNEL WB, LYELL L: An approach to longitudinal studies in a community: The Framingham Study. Ann NY Acad Sci 107:539-556, 1963
(7) FRIEDMAN GD, KANNEL WB, DAWBER TR, et al: An evaluation of follow-up methods in the Framingham Heart Study. Am J Public Health 57:1015-1024, 1967
(8) GORDON T, KANNEL WB: The Framingham, Massachusetts Study twenty years later. In The Community as an Epidemiologic Laboratory: A Casebook of Community Studies (Kessler II, Levin ML, eds). Baltimore: Johns Hopkins Univ Press, 1970, pp 123-146
(9) GORDON T, KANNEL WB: The prospective study of cardiovascular disease. In Trends in Epidemiology (Stewart GT, ed). Springfield, Ill.: Charles C Thomas, 1972, pp 189-211
(10) DAWBER TR: The Framingham Study: The Epidemiology of Atherosclerotic Disease. Boston: Harvard Univ Press, 1980
(11) DAWBER TR, KANNEL WB, FRIEDMAN GD: The use of computers in cardiovascular epidemiology. Prog Cardiovasc Dis 5:406-417, 1963
(12) KAHN HA: A method for analyzing longitudinal observations on individuals in the Framingham Heart Study. In Proceedings of the Social Statistics Section, 1961 (Goldfield ED, ed). Washington, D.C.: Am Stat Assoc, 1962, pp 156-160
(13) KAHN HA, DAWBER TR: The development of coronary heart disease in relation to sequential biennial measures of cholesterol in the Framingham study. J Chronic Dis 19:611-620, 1966
(14) CORNFIELD J, GORDON T, SMITH WW: Quantal response curves for experimentally uncontrolled variables.
Bull Int Stat Inst 38:97-115, 1961
(15) CORNFIELD J: Joint dependence of risk of coronary heart disease on serum cholesterol and systolic blood pressure: A discriminant function analysis. Fed Proc 21:58-61, 1962
(16) TRUETT J, CORNFIELD J, KANNEL WB: A multivariate analysis of the risk of coronary heart disease in Framingham. J Chronic Dis 20:511-524, 1967
(17) WALKER SH, DUNCAN DB: Estimation of the probability of an event as a function of several independent variables. Biometrika 54:167-179, 1967
(18) GORDON T: Hazards in the use of the logistic function with special reference to data from prospective cardiovascular studies. J Chronic Dis 27:97-102, 1974
(19) GORDON T, KANNEL WB: Predisposition to atherosclerosis in the head, heart, and legs: The Framingham Study. JAMA 221:661-666, 1972
(20) TRUETT J, SORLIE P: Changes in successive measurements and the development of disease: The Framingham Study. J Chronic Dis 24:349-361, 1971
(21) WU M, WARE JH, FEINLEIB M: On the relation between blood pressure change and initial value. J Chronic Dis 33:637-644, 1980
(22) HOFMAN A, FEINLEIB M, GARRISON RJ, et al: Does change in blood pressure predict heart disease? Br Med J 287:267-269, 1983
(23) SORLIE P, GORDON T, KANNEL WB: Body build and mortality: The Framingham Study. JAMA 243:1828-1831, 1980
(24) GARRISON RJ, FEINLEIB M, CASTELLI WP, et al: Cigarette smoking as a confounder of the relationship between relative weight and long-term mortality. The Framingham Heart Study. JAMA 249:2199-2203, 1983
(25) HUBERT HB, FEINLEIB M, MCNAMARA PM, et al: Obesity as an independent risk factor for cardiovascular disease: A 26-year follow-up of participants in the Framingham Heart Study. Circulation 67:968-977, 1983
(26) SORLIE PD, FEINLEIB M: The serum cholesterol-cancer relationship: An analysis of time trends in the Framingham Study.
JNCI 69:989-996, 1982
(27) FEINLEIB M: Review of the epidemiological evidence for a possible relationship between hypocholesterolemia and cancer. Cancer Res 43:2503s-2507s, 1983
(28) FEINLEIB M, KANNEL WB, GARRISON RJ, et al: The Framingham Offspring Study. Design and preliminary data. Prev Med 4:518-525, 1975
(29) KANNEL WB, FEINLEIB M, MCNAMARA PM, et al: An investigation of coronary heart disease in families: The Framingham Offspring Study. Am J Epidemiol 110:281-290, 1979
(30) FEINLEIB M, GARRISON RJ, STALLONES L, et al: A comparison of blood pressure, total cholesterol and cigarette smoking in parents in 1950 and their children in 1970. Am J Epidemiol 110:291-302, 1979
(31) HAVLIK RJ, GARRISON RJ, FEINLEIB M, et al: Blood pressure aggregation in families. Am J Epidemiol 110:304-312, 1979
(32) GARRISON RJ, CASTELLI WP, FEINLEIB M, et al: The association of total cholesterol, triglycerides and plasma lipoprotein cholesterol levels in first-degree relatives and spouse pairs. Am J Epidemiol 110:313-321, 1979

Selection, Follow-up, and Analysis in the Health Insurance Plan Study: A Randomized Trial With Breast Cancer Screening1,2

Sam Shapiro,3 Wanda Venet,4 Philip Strax,5 Louis Venet,6 and Ruth Roeser4,7

ABSTRACT: Critical decisions made 20 years ago by those who planned the randomized trial at the Health Insurance Plan (HIP) of Greater New York to determine the efficacy of periodic screening for breast cancer are detailed. These decisions affected the age group to be screened, screening modalities, frequency of screening, sample size, primary measures for testing efficacy, and period of follow-up (long term). Results of follow-up 16 years after entry indicate that mortality due to breast cancer continues to be lower among study women than among controls. Numerically, the differential has been stable; relatively, it has decreased.
It is estimated that the study group would have experienced about a 30% reduction in breast cancer mortality if screening had been maintained. Relative case survival rates over a 14-year period after diagnosis show changes in the contours of trend lines that result from screening. The study group’s trend is slightly concave, in contrast to the usual convex curve for the controls. The contour of the curve is more decidedly concave among subjects detected through mammography alone than for other subgroups detected through screening, although the relative survival rate remains highest in the mammography-only group. Uncertainty persists about the effects of screening in the HIP study on breast cancer mortality among women aged 40-49 years at entry. (Natl Cancer Inst Monogr 67:65-74, 1985.)

ABBREVIATIONS: HIP = Health Insurance Plan of Greater New York; CSR = case survival rates.

1 Presented at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
2 Supported in part by Public Health Service contracts PH43-63 and NIH69-88 with the National Institutes of Health, and N01-CP43278 with the Division of Cancer Etiology, National Cancer Institute.
3 Health Services Research and Development Center, School of Hygiene and Public Health, The Johns Hopkins University, 624 North Broadway, Baltimore, Maryland 21205.
4 Health Insurance Plan of Greater New York, 220 West 58th Street, New York, N.Y. 10019.
5 Department of Community Medicine, New York Medical College, Valhalla, New York 10595.
6 Department of Surgery, Beth Israel Medical Center, 10 Nathan B. Perlman Place, New York, N.Y. 10003.
7 We thank Dr. George B. Hutchison, Department of Epidemiology, Harvard University School of Public Health, for his valuable suggestions when reviewing the draft of this paper.

BACKGROUND

December 1983 marked the twentieth anniversary of the randomized trial conducted at the HIP to determine the efficacy of periodic screening with mammography and clinical examination of the breast for reducing breast cancer mortality in the female population. This makes it the longest and largest scale cohort study designed as an experimental trial in the United States. From a researcher’s standpoint, 20 years is a long time, because changes independent of the intervention may occur and obscure experimental effects.

Unfortunately, a current assessment of the problem involving breast cancer mirrors statements made a generation ago. The fact is that, despite changes over the past 40 years in social and economic conditions, nutritional status, health care, fertility patterns, and other life circumstances, the summary indicator of the seriousness of the breast cancer problem, i.e., mortality from this disease, has remained at a remarkably constant level for the country as a whole (fig. 1). The degree to which this unvarying picture masks counteracting forces remains largely unexplored; such forces include increases in the incidence of breast cancer together with more favorable prognoses for the cases diagnosed, or changes taking place in subgroups of the female population, as suggested by the rise in breast cancer mortality among nonwhites shown in figure 1. However, the important point is that breast cancer still accounts for about one-fourth of all cancers diagnosed each year among women. About 1 in 11 women develops clinically detectable breast cancer during her lifetime, and at least 1 in 4 women is likely to have a breast biopsy. Now, as was true years ago, screening ranks high among the approaches to changing this situation.
The attention that screening has attracted as a secondary preventive measure for breast cancer derives from clinical experience indicating that women whose cancer has been detected at an early stage of development have far better prognoses than those whose breast cancer is detected at a later stage. Interest in screening has centered on the proposition that it would shift diagnosis to an earlier stage of the disease, and this, in turn, would lead to reduced mortality from breast cancer.

In the early and mid-1960s, reports appeared from many periodic examination programs on the detection of breast cancer through palpation. Uniformly, they indicated that a higher proportion of breast cancer patients were diagnosed while the disease was in a localized stage than was experienced in the general population, and some of the programs showed an increase in the survival rate among patients detected with cancer of the breast (1-3). However, considerable doubt persisted about the contribution such examinations could make toward the reduction of breast cancer mortality in the population at large. The reason for hesitation in generalizing the results of past studies is simple. In the main, the programs were based on persons who volunteered for the examinations or on patients who appeared for other medical care at a clinic, and the selectivity factors associated with these groups were not known.

FIGURE 1. Age-adjusted death rate for cancer of the breast in the United States, 1940-80. Rates are adjusted to the age distribution of the female population in 1940. The trend line is not shown separately for white females; rates are almost identical for all females and white females. Data are from the National Center for Health Statistics, Department of Health and Human Services.
In short, suitable comparison groups could not be established for the women studied, and efficacy remained uncertain. Actually, despite the increased attention that had been given to early breast cancer detection and the technical progress that appeared to be under way in treatment, the outlook for a reduction in mortality from breast cancer was pessimistic (4).

The catalyst for reappraisal of the possible role of periodic screening in lowering breast cancer mortality was the emergence of mammography. In a periodic examination survey, Gershon-Cohen et al. (5) showed that nonpalpable carcinomas of the breast were being detected by means of mammography, and, independently, Egan (6) demonstrated the value of mammography for differential diagnostic purposes and in locating occult cancers. These developments proved sufficiently impressive for the National Cancer Institute to contemplate a long-term study of the effectiveness of periodic screening with mammography and clinical examination of the breast in reducing breast cancer mortality (7). Because researchers had had difficulty drawing conclusions from programs that lacked appropriate comparison groups, the study was to be designed as a randomized clinical trial.

Under the leadership of Dr. Michael Shimkin, then head of the Institute’s Biometry Branch, sites were sought for the conduct of the trial. Concurrently, Dr. Philip Strax, a radiologist at HIP, was exploring applications for mammography. Other favorable circumstances that led to the selection of HIP as the study’s site were the size and coverage of the Plan (about 700,000 members with prepaid comprehensive medical care benefits) and the presence of an experienced research department. Members came from broad spectra of socioeconomic, ethnic, and religious groups in New York City and Long Island.
An important factor in initiation of the project was the evidence of a high degree of reproducibility of Egan’s mammography technique accumulated in a study conducted by the Public Health Service and the M. D. Anderson Hospital and Tumor Institute (8). This evidence was encouraging, although those involved recognized that the HIP screening study would be conducted under different conditions. In this investigation, unlike the reproducibility study, a substantial proportion of the biopsy recommendations was expected to be made on radiologic evidence alone, involving small, nonpalpable lesions.

KEY DECISIONS

Critical decisions in planning the investigation concerned the age group to be included, whether the test was to be of the combined effect of mammography and clinical examination of the breast, the number and periodicity of screening examinations, and sample size. Resolution of these issues was a joint effort involving William Haenszel, Sidney Cutler, Nathan Mantel, and Marvin Schneiderman, all of whom were in the Biometry Branch, and George Hutchison, who served as a consultant to the investigators.

On the question of what ages should be included, an important consideration was that concern about secondary prevention extends over a wide age range. Figure 2 illustrates the point using more recent data than were available at the time. The 3 cumulative distributions in this chart relate to breast cancer in women aged 20-24 years through 75-79 years; older age groups are excluded because of the high death rates from all causes at these ages. The lowest curve cumulates by age the number of deaths due to breast cancer; the denominator is the total number of women who die from breast cancer in the age range of 20-79 years. The contour of the curve is affected not only by the age-specific death rates but by the age distribution of women in the population.
Because substantially more women are at the younger ages than at the older ages, about 20% of all women whose deaths are attributed to breast cancer are in the age group of 35-49 years, despite low rates. When incident cases are considered (next higher curve on the chart), this proportion increases moderately. However, when we direct attention to the cumulative number of years of life lost due to breast cancer, the third curve, ages under 50 assume a new importance. About two-fifths (41%) of the years of life lost from breast cancer detected below 80 years of age are associated with the cases diagnosed at ages 35-49 years.

FIGURE 2. Cumulative percent distributions of deaths from breast cancer, incidence, and person-years of life lost. Curves: breast cancer deaths; breast cancer incidence; person-years of life lost among breast cancer cases. Vertical axis, cumulative percentages; horizontal axis, age (20-85 years).

The special position of the age span selected for the HIP study, 40-64 years, is reflected by the fact that it includes almost three-fourths of the years of life lost from this disease, i.e., 34% at ages 40-49 and 38% at ages 50-64. A similar situation existed in the early 1960s.

The issue of what modalities should be included was resolved in favor of having women examined with both mammography and palpation of the breast. Ideally, to evaluate the effect of mammography on breast cancer detection and on the lowering of breast cancer mortality, the investigators would have had to design a study involving 2 experimental groups: 1 receiving both mammography and palpation and the other only 1 modality. They expected that with 1 modality false negatives would be appreciable, and ethical issues were raised about screening 2 groups under conditions known to differ in cancer detection effectiveness.
The latter consideration might have been assessed differently if present-day concerns had been fully anticipated, although increased costs and operational difficulties might still have ruled out a design with 2 experimental groups. The climate within which the HIP study started was one of general pessimism about whether the natural history of breast cancer could be altered through early detection in a screening program. A major objective of the study organizers was to test whether the natural history of this disease could indeed be changed through a maximum effort. Accordingly, the study proceeded with both modalities.

The study was designed to include an initial examination followed by 2 rescreenings at annual intervals; this decision was based on estimates of costs, case detection, and breast cancer mortality. Soon after the start of the project, it became clear that an additional annual rescreening would be needed to assure adequate exposure of the experimental group to screening.

Calculations of type I and type II errors indicated that samples of 30,000 for the study and control groups were required to detect a 20% or greater reduction in breast cancer mortality at an alpha level of 0.05 (one-tailed test) with a power of 50%. The risks of engaging in a costly investigation with samples of such low power were appreciated, but it was decided that the study should proceed on the rationale that the 20% decrease in breast cancer mortality was probably a conservative estimate of actual findings.

OBJECTIVES AND STUDY DESIGN

The primary objective of the project leaders in HIP has remained constant, i.e., to determine whether periodic breast cancer screening utilizing mammography and clinical examinations holds substantial promise for a long-term reduction in mortality from breast cancer in the female population.
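The sample-size reasoning described under KEY DECISIONS can be illustrated with the textbook normal-approximation formula for comparing two proportions with a one-sided test. This is a sketch under stated assumptions, not the planners' calculation: the source does not report the control-group breast cancer mortality they assumed, and the baseline of 4 deaths per 1,000 women over follow-up used below is purely hypothetical. With power set at 50%, the type II error term z(power) is 0, so only the alpha term contributes.

```python
import math
from statistics import NormalDist


def n_per_group(p_control, reduction, alpha=0.05, power=0.50):
    """Approximate women needed per group for a one-sided two-proportion test.

    p_control -- cumulative breast cancer mortality expected among controls
    reduction -- relative reduction to detect in the study group (0.20 = 20%)
    Uses n = (z_alpha + z_power)^2 * [p0(1-p0) + p1(1-p1)] / (p0 - p1)^2.
    """
    p1 = p_control * (1.0 - reduction)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # one-tailed critical value
    z_power = NormalDist().inv_cdf(power)        # 0.0 when power = 0.50
    variance = p_control * (1 - p_control) + p1 * (1 - p1)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p_control - p1) ** 2)


if __name__ == "__main__":
    # Hypothetical baseline: 4 breast cancer deaths per 1,000 controls over follow-up.
    print(n_per_group(0.004, 0.20))              # power 50%, as in the HIP plan
    print(n_per_group(0.004, 0.20, power=0.80))  # for comparison
```

Under these assumptions the first call gives roughly 30,000 women per group, the same order as the HIP samples, and raising power to 80% would require more than twice as many women. Both figures depend entirely on the hypothetical baseline and the simplified formula chosen here.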
Mortality from breast cancer rather than case survival (or case fatality) was designated as the primary end result measure because of the confounding effect on survival rates of the lead time gained in case detection through screening. Furthermore, the investigators’ concern about the issue and the opportunity to address it resulted in the addition of a new objective, i.e., to develop methods for measuring lead time and to derive estimates for application to the data acquired.

Unrelated to the efficacy issue was the potential of the study for investigation of the relationships of a wide range of parameters to the development of breast cancer. Relationships involving demographic variables (age, race, marital status, country of birth, religion, and educational attainment), past and present breast conditions, and familial history of breast cancer were of particular interest.

Twenty-three of the 31 medical groups in HIP participated in the study. Within each group, 2 systematic random samples of women 40 to 64 years old with membership in HIP for at least 1 year were selected. The first step was stratification of a file identifying these women by age, size of insured family, and employment group through which the family joined HIP. Every nth woman was placed in the study group, and the paired (n + 1)th woman in the control group. The total number of women in each sample was about 31,000, with each medical group contributing a share proportionate to its size. The pairs of subjects and controls in a medical group were randomized, and study women were drawn in sequence from the list in the scheduling of screening examinations. The scheduled date for the initial screening examination became the woman’s entry date, and all observations start from this date. Each control was assigned the same entry date as the corresponding study group woman.
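The paired systematic selection described above can be sketched as follows. This is a hypothetical illustration, not the HIP procedure as actually carried out: the record fields (`age`, `family_size`, `employer_group`), the random start, the sampling interval, the first screening date, and the daily examination capacity are all assumptions. Members are sorted by the stratification variables; every nth woman from a random start goes to the study group and the next woman on the list to the control group; the pairs are then shuffled to fix the order in which study women are scheduled, and each control shares her partner's entry date.

```python
import random
from datetime import date, timedelta


def select_pairs(members, interval, seed=None):
    """Return (study, control) pairs from a stratified list of eligible women.

    members  -- list of dicts with hypothetical keys "age", "family_size",
                and "employer_group" (the stratification variables)
    interval -- sampling interval n (n >= 2): every nth woman enters the study
                group and the next woman on the list becomes her paired control
    """
    if interval < 2:
        raise ValueError("interval must be at least 2 so pairs do not overlap")
    ordered = sorted(members, key=lambda m: (m["age"], m["family_size"], m["employer_group"]))
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random start within the first interval
    pairs = [(ordered[i], ordered[i + 1]) for i in range(start, len(ordered) - 1, interval)]
    rng.shuffle(pairs)               # randomize the order of scheduling
    return pairs


def assign_entry_dates(pairs, first_day, exams_per_day):
    """Give each study woman a scheduled screening date; her control shares it."""
    schedule = []
    for k, (study, control) in enumerate(pairs):
        entry = first_day + timedelta(days=k // exams_per_day)
        schedule.append({"study": study, "control": control, "entry_date": entry})
    return schedule


if __name__ == "__main__":
    rng = random.Random(1)
    members = [
        {"id": i, "age": rng.randint(40, 64), "family_size": rng.randint(1, 5),
         "employer_group": rng.choice("ABC")}
        for i in range(200)
    ]
    pairs = select_pairs(members, interval=4, seed=7)
    schedule = assign_entry_dates(pairs, date(1963, 12, 1), exams_per_day=5)
    print(len(schedule), schedule[0]["entry_date"])
```

Because each control is the next woman on the stratified list, a pair tends to be closely matched on age, family size, and employment group, which is what stratifying the file before sampling is meant to achieve.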
Study group women were offered screening examinations in their medical group centers. Pregnant women were excluded; if a woman with a prior mastectomy appeared, she was examined but the findings were not included in the study. As women in the “refused” screening subgroup and in the control group were identified through other sources as having had a mastectomy before their entry dates, they were dropped from the investigation. This status was ascertained most completely for the screened women; as a result, the study cohort is slightly smaller than the control cohort (30,239 vs. 30,756 at entry).

Every woman who had an initial examination was asked to appear for 3 annual follow-up examinations even if she was no longer a member of HIP. The only exceptions were women who, on their screening examination, were found to have conditions that required earlier follow-up. Women in the control group followed their usual practices in receiving medical care. They were neither encouraged nor discouraged from having general physical examinations, which were part of their benefits in HIP. The extent to which women in the study and control groups had general physical examinations is not known. However, mammography for early breast cancer detection among asymptomatic women was not a covered benefit in HIP; its use was restricted to differential diagnosis.

Each examination consisted of a clinical examination of the breast and mammography as well as an interview for relevant demographic information and a health history. Emphasis was placed on variables implicated in the epidemiology of breast cancer. Usually, clinicians were surgeons, but in a few instances they were internists. The clinician conducted the examination without knowledge of the radiologic findings and recorded his/her observations and recommendations for follow-up medical care on specially designed study forms.
Cephalo-caudal and lateral radiographs of each breast were taken for each woman by specially trained technicians. Studies of radiation exposure indicated that the skin dose for an examination was 7.7 rad; the dose at the midline of the breast, 3 cm in depth, was estimated to be 2 rad. Clinical, radiologic, and lay interview reports from the examination sessions and the mammograms were sent to a central location where a medical chart was established for the patient. The mammograms were separated from the rest of the chart for independent readings by 2 staff radiologists. Differences of opinion were resolved by the project’s chief radiologist, Dr. Philip Strax, and recommendations were made without knowledge of the clinical findings. Later, the clinical information derived from the screening examination and the radiologic findings were reviewed by the chief clinician on the project staff, Dr. Louis Venet, for a final recommendation, i.e., routine examination 1 year later or early recall because of suspicious findings, biopsy, or aspiration. The woman was informed by mail if there were no positive findings and by telephone if follow-up was required. Her designated physician was advised of the screening results, and special procedures were applied in positive cases to assure linkage between the woman and her physician.

BREAST CANCER AND MORTALITY ASCERTAINMENT

Women in the study and control groups who had breast biopsies have been identified by several overlapping sources of information. These include the patient’s record in HIP and notices of hospital claims paid by insurance. Surgical and pathologic findings in patients with breast cancer are obtained from hospital charts. The project’s coordinating pathologist reviews slides and conducts special studies of tissue blocks when they are available.
Clinicians investigate each case of microscopically confirmed carcinoma of the breast to establish whether a mastectomy had been performed before the woman entered the study (if so, she is excluded), the type of surgery performed, and the histologic type, nodal involvement, and size of the lesion. Deaths have been identified through intensive follow-up of all confirmed breast cancer cases. This has included matching death records on file in various health departments (New York City, upstate New York, New Jersey, Connecticut, and Florida) against the total file of study women (including those who failed to have a screening examination) and the file of controls. Copies of death certificates, including the sections on cause of death, are obtained.

The effectiveness of specific sources of information decreases over time. The utility of HIP medical records is reduced as subscribers leave the Plan because of changes in employment, move out of the service area, or transfer to other types of coverage. By 10 years after entry, about 45% were in this category, and by 15 years the proportion no longer in HIP was well over one-half (precise figures are still to be determined). Hospital insurance becomes a less certain source of information for similar reasons. Furthermore, as women become eligible for Medicare, access to hospital claims data is reduced. Death record matching also decreases in effectiveness because of changes in names and moves to areas not included in the search (Social Security numbers are not available in this study). For these reasons, additional methods for ascertaining newly diagnosed breast cancer cases and deaths were mandatory. Several months after the fifth and tenth anniversaries of each woman's entry date, the staff conducted mail surveys to determine survival status and history of breast surgery for all study and control women other than those already known to be dead or on the breast cancer registry.
A similar survey 15 years after entry started late because of a delay in the decision to continue follow-up. To deal with name and address changes, study staff are using the services of a tracer organization (all direct communications with the women or their families are made by the study staff). As these activities proceed, information about the status of women in the study and control cohorts 5 and 10 years after entry improves; their breast cancer and mortality status is now known for the following percentages:

Yr post entry    Study group, %    Controls, %
5                87.6              84.5
10               82.0              80.2

It is too early to estimate the corresponding proportions at 15 years post entry, but they may be in the 70-75% range. Ordinarily, follow-up is plagued by the possibility that many of those who cannot be located are deceased. This is not believed to be true among the "not locateds" in the study and control groups. Although HIP medical and enrollment records, hospital claims, and death record tapes are subject to the problems previously mentioned, they are still being used, and deaths and cases of breast cancer are being ascertained through these sources. Finally, arrangements have been made with the New York State Department of Health for access to its cancer registry.

A critical requirement in follow-up has been identification of breast cancer cases and deaths with similar degrees of success in the study and control groups. The results of the mail surveys and other searches are reassuring that this objective is being met. As indicated above, almost identical ascertainment rates were obtained for the 2 groups, and the numbers of cases identified through the various procedures have been consistent. Follow-up of women with breast cancer diagnosed after entry involves communication with medical care providers and the women or their surviving relatives. This phase is 100% complete, i.e., the survival status is known for all identified study and control women.
SCREENING PARTICIPATION AND COMPARABILITY OF STUDY AND CONTROL GROUPS

Between December 1963 and June 1966, about 20,200 women, or 67% of the study group, appeared for their initial screening examinations. Close to 80% of these women participated in the first annual examination, 75% in the second, and 69% in the third. Information obtained through surveys of subsamples of the total study and control women indicates comparability between these 2 groups, as shown in table 1 (9). However, study women who refused screening differ from those examined in several respects. They are slightly older, have lower educational attainment, are less likely to be Jewish, and are less likely to have been married or to be multiparous or premenopausal. A lower proportion report ever having had a lump in the breast. They also differ markedly from the others in basic attitudes toward preventive health examinations. They were more apt to avoid physical examinations in general, to believe that people should wait until they have symptoms before seeing a physician, and to believe that their physicians "know all my health conditions without . . . more special tests." Rates of reexamination were influenced negligibly by age, educational attainment, race, and menopausal status (10).

Further evidence on comparability between the study and control groups, and on self-selection bias in the decision to be screened or not, comes from comparisons of breast cancers detected and of mortality from all causes other than breast cancer. During the first 5 years following entry, rates of detection were 2.04 and 1.95 per 1,000 person-years in the study and control groups, respectively (table 2).

TABLE 1.— Percent distribution of study and control groups of women entering study during 1964 by selected characteristics

                                Study group(b)                        Control
Characteristic(a)               Total     Examined    Not examined    group(c)
Total                           100.0     100.0       100.0           100.0
Age, yr
  40-44                         24.2      25.3        22.3            24.5
  45-49                         23.7      24.1        22.9            23.6
  50-54                         22.5      22.4        22.7            21.9
  55-59                         18.4      17.8        19.3            18.7
  60-64                         11.2      10.4        12.8            11.3
Religion
  Protestant                    29.1      28.0        31.1            29.2
  Catholic                      38.1      36.3        41.4            37.9
  Jewish                        32.8      35.7        27.3            32.9
Education
  Elementary school             22.6      19.5        28.3            22.1
  High school                   46.5      46.8        45.9            45.0
  College                       30.9      33.7        25.8            32.9
Marital status
  Never married                 8.7       7.5         10.9            9.3
  Ever married                  91.3      92.5        89.1            90.7
Prior pregnancies
  Never pregnant                20.3      19.4        21.9            23.0
  1-3                           61.9      61.5        62.7            58.6
  4 or more                     17.8      19.1        15.4            18.4
Had or now having menopause
  No                            29.1      33.4        21.2            25.9
  Yes                           70.9      66.6        78.8            74.1
Ever had lump in breast
  No                            90.5      89.1        93.0            88.2
  Yes                           9.5       10.9        7.0             11.8

(a) "Not stated" categories, ranging from less than 1% to a maximum of 4% of total, are distributed in the same manner as "knowns."
(b) Data for age are based on total counts. For all other characteristics, data are based on a 10% sample of the examined group and a 20% sample of the nonexamined group.
(c) Data are based on a 20% sample of the control group.

TABLE 2.— Mortality from all causes excluding breast cancer and breast cancer detection rates: 10-yr follow-up after entry

                                  Intervals from entry
Rates                             10 yr     1-5 yr    6-10 yr
Deaths/10,000 person-yr
  Total study                     66.2      55.1      77.8
  Screened                        55.4      42.4      69.0
  Refused screening               88.2      81.0      95.8
  Control                         65.5      56.4      75.0
Breast cancers/1,000 person-yr
  Total study                     2.07      2.04      2.10
  Screened                        2.20      2.26      2.14
  Refused screening               1.80      1.59      2.02
  Control                         2.03      1.95      2.12
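Because the total study group consists of the screened women plus the refusers, each total-study rate in Table 2 should be a person-year-weighted average of the corresponding screened and refused-screening rates. The Python sketch below is an illustrative check, not an analysis the investigators report: it inverts that relation to recover the implied share of study-group person-years contributed by screened women. The function name `screened_share` is hypothetical, and the inputs are the 10-yr rates from Table 2.

```python
def screened_share(total_rate: float, screened_rate: float, refused_rate: float) -> float:
    """Solve total = w * screened + (1 - w) * refused for w.

    Assumes the total-study rate is exactly a person-year-weighted mix of the
    screened and refused-screening rates; w is then the screened women's share
    of study-group person-years.
    """
    return (refused_rate - total_rate) / (refused_rate - screened_rate)


# 10-yr rates from Table 2.
mortality_share = screened_share(66.2, 55.4, 88.2)  # deaths/10,000 person-yr, excl. breast cancer
cancer_share = screened_share(2.07, 2.20, 1.80)     # breast cancers/1,000 person-yr

print(f"screened share implied by general mortality:   {mortality_share:.3f}")
print(f"screened share implied by breast cancer rates: {cancer_share:.3f}")
```

Both measures imply that roughly two-thirds of study-group person-years came from screened women, consistent with the 67% of study women who appeared for an initial examination. Under this reading, the low general mortality of the screened subgroup and the high mortality of the refusers average out to a total-study rate close to the control rate, which is why comparisons with controls are made on the total study group.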
In the succeeding years, the magnitude and direction of the differential fluctuated, but for the total 10-year period the average annual rates were almost identical, 2.07 and 2.03 per 1,000 person-years. The other measure, general mortality excluding breast cancer, provides additional support for concluding that the 2 groups are highly comparable. From the early years of the study, the rates have been exceedingly close. Further examination (results not shown here) indicates that differences between study and control women in age-specific and cause-specific death rates are well within the limits of chance variation. Table 2 also shows clearly that, although the total study and control groups are similar, study women who participated in screening had higher breast cancer rates and lower general mortality rates than those who refused screening. The differentials decrease as the interval from entry increases, but the general mortality differential remains large. These observations and the evidence on biases in personal characteristics emphasize the importance of comparisons based on the total study group (screened and refusers combined).

The question might be raised whether the study population is atypical, inasmuch as it consists of women living in the New York City area who were covered by comprehensive health insurance at the start of the study. However, this does not appear to be an important factor. The study population's breast cancer incidence rate differs by only about 10% from the age-adjusted rate in the Third National Cancer Survey, and the case survival rates among the controls are close to the experience of the Surveillance, Epidemiology, and End Results Program.

BREAST CANCER SELECTION

From the beginning of the study, long-term follow-up was recognized as essential if the efficacy of breast cancer screening was to be determined.
The interval has been variously defined: in the early stages as 10-15 years, and more recently as a minimum of 15 years. From the current efforts toward ascertainment of breast cancer cases and deaths from causes other than breast cancer, it would appear that data for the first 10 years after entry are close to final; measures for subsequent periods are subject to greater change, but this is not likely to affect seriously the relationships to be discussed. The last publication of comprehensive findings in the study, in August 1982, covered breast cancer mortality over a 14-year period (11). Two additional years of observations are included in the discussion that follows. Before proceeding, however, it is worth reviewing the frequently reported results of case detection during the first 5 years, a period that ends about 1½ years after the last screening cycle. As indicated previously, the gap between the study and control groups in rates of breast cancer is small. Women who had at least 1 screening examination had a higher breast cancer rate than study group women who refused screening (2.26 vs. 1.58/1,000, respectively). Detection rates among screened women attributable to screening and to diagnosis in the course of regular medical care were as follows:

Detection in women screened       Rate
Due to initial examination        2.72/1,000 women examined
Due to annual reexamination       1.49/1,000 person-years
Not due to screening              0.92/1,000 person-years

TABLE 3.— Cumulative numbers of deaths due to breast cancer by selected time intervals from date of entry(a)

Interval to breast     Deaths with breast cancer as underlying cause
cancer diagnosis, yr   through following yr after entry(b)
                       5        7           10          16
Within 5
  Study                39       71          95          121
  Control              63       106         133         155
  Difference, No.      24       35          38          34
  Difference, %        38.1     33.0(d)     28.6        21.9(c)
Within 7
  Study                39       81          123         171
  Control              63       124         174         222
  Difference, No.      24       43          51          51
  Difference, %        38.1     34.7(d)     29.3(d)     23.0(c)
Within 10
  Study                39       81          147         236
  Control              63       124         192         281
  Difference, No.      24       43          45          45
  Difference, %        38.1     34.7(d)     23.4(c)     16.0(c)

Clinical and mammography examinations each contributed cases not detected by the other; the relative contribution of mammography (in the absence of clinical findings) was lower among women under 50 years of age at diagnosis than among those 50 and over (14.4 vs. 37.6%). The proportion with no histologic evidence of axillary nodal involvement was higher among study women than among controls (56.4 and 46.3%, respectively). Breast cancers detected through screening had an especially high proportion with no nodal involvement (70.5%).

BREAST CANCER MORTALITY

Table 3 gives cumulative numbers of deaths with breast cancer as the underlying cause by interval from date of entry. Three sets of comparisons are made. The first is restricted to breast cancers diagnosed within 5 years after entry (histologically, or at the time of death when there was no histologically confirmed diagnosis before death). This interval includes an average of 3½ years of screening and 1½ years after screening. The second relates to breast cancer mortality among cases diagnosed within 7 years after entry, which is close to the time when the cumulative numbers of cancers in the study and control groups became equal. The third covers experience among cases diagnosed within 10 years after entry, which includes breast cancers detected after the study group would appear to have recovered to its prescreening status. In the first set, the numerical difference between the study and control groups reaches a plateau at about 7 years of follow-up; in the second set, the difference increases up to 9 or 10 years of follow-up, after which it remains stable; differences in the third set are at a slightly lower level than those in the second set.
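The Difference rows of Table 3 are simple arithmetic on the Study and Control rows: the number is control minus study deaths, and the percent expresses that deficit relative to control deaths. The short Python sketch below is offered only as a check on the tabulated values (the names `TABLE3` and `differences` are illustrative); it reproduces both Difference rows for each interval to diagnosis.

```python
# Cumulative breast cancer deaths from Table 3 (study, control), by interval to
# diagnosis; each tuple holds deaths through yr 5, 7, 10, and 16 after entry.
YEARS_AFTER_ENTRY = (5, 7, 10, 16)
TABLE3 = {
    "within 5": {"study": (39, 71, 95, 121), "control": (63, 106, 133, 155)},
    "within 7": {"study": (39, 81, 123, 171), "control": (63, 124, 174, 222)},
    "within 10": {"study": (39, 81, 147, 236), "control": (63, 124, 192, 281)},
}


def differences(study: tuple[int, ...], control: tuple[int, ...]) -> list[tuple[int, float]]:
    """Return (control - study, deficit as a percent of control deaths) per follow-up year."""
    return [(c - s, round(100.0 * (c - s) / c, 1)) for s, c in zip(study, control)]


for interval, groups in TABLE3.items():
    rows = differences(groups["study"], groups["control"])
    print(f"diagnosed {interval} yr:", dict(zip(YEARS_AFTER_ENTRY, rows)))
```

Reading across each row shows the pattern described above: the numerical difference stops growing after about 7 yr in the first set and 10 yr in the second, while the percent difference keeps falling because control deaths continue to accumulate.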
One interpretation is that the reduction in breast cancer mortality among study

(a) Follow-up was through December 31, 1982.
(b) Data include deaths due to breast cancer among cases histologically confirmed within a specified interval after entry and deaths among women with breast cancer as the underlying cause but with no histologically confirmed diagnosis before death.
(c) 0.01

FIGURE 6. Cumulative relative survival rates by modality of detection on screening. Breast cancer diagnoses were made within 5 yr after entry into study. [Curves: all cases detected on screening (132), mammography only (44), clinical only (59), clinical and mammography (29); x-axis: years from date of diagnosis, 1-14.]

histologic evidence of axillary nodal involvement. Prognoses of women with negative and positive nodes are more favorable among study women, but as follow-up time increases, the further annual decline in survival rates in the study and control groups becomes similar. Another noteworthy but puzzling observation is that from the fifth year after diagnosis, the margin between the 2 groups is substantially greater among patients with positive nodes than is the corresponding relative difference for those with negative nodes. We have no clear interpretation of what this means. However, we might speculate that as a result of screening, the patients with positive nodes in the study group have less aggressive cancers than do the corresponding patients in the control group, i.e., screening has shifted some of them into the negative node category, where they are combined with another group that has long lead time or is subject to considerable length-biased sampling, or both. Whatever the explanation, this experience illustrates the nonequivalency not only between negative node cases derived from usual clinical experience and those observed in a screening study but also between the corresponding positive node cases.
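Lead time and length-biased sampling, both invoked above, can be illustrated with a small simulation. The Python sketch below rests entirely on hypothetical assumptions chosen for illustration, not on HIP data or on the investigators' models: half of the tumors are slow-growing (mean preclinical duration 4 yr) and half fast-growing (mean 0.5 yr), durations are exponential, onsets are uniform over a 20-yr window, and a single screen at year 10 detects every tumor then in its preclinical phase. Under these assumptions, screen-detected tumors have much longer preclinical durations than tumors in general (length bias), and each gains a lead time equal to the preclinical time remaining at the screen.

```python
import random


def simulate(n_tumors: int = 100_000, seed: int = 1) -> tuple[float, float, float, int]:
    """Return (mean duration, all tumors; mean duration, screen-detected;
    mean lead time, screen-detected; number screen-detected)."""
    rng = random.Random(seed)
    screen_time = 10.0
    durations, detected, lead_times = [], [], []
    for _ in range(n_tumors):
        mean = 4.0 if rng.random() < 0.5 else 0.5      # slow vs. fast tumor (hypothetical)
        duration = rng.expovariate(1.0 / mean)         # preclinical, screen-detectable phase
        onset = rng.uniform(0.0, 20.0)
        durations.append(duration)
        clinical_time = onset + duration               # when symptoms would bring diagnosis
        if onset <= screen_time < clinical_time:       # in preclinical phase at the screen
            detected.append(duration)
            lead_times.append(clinical_time - screen_time)
    return (
        sum(durations) / len(durations),
        sum(detected) / len(detected),
        sum(lead_times) / len(lead_times),
        len(detected),
    )


mean_all, mean_detected, mean_lead, n_detected = simulate()
print(f"mean preclinical duration, all tumors:      {mean_all:.2f} yr")
print(f"mean preclinical duration, screen-detected: {mean_detected:.2f} yr (n = {n_detected})")
print(f"mean lead time, screen-detected:            {mean_lead:.2f} yr")
```

Because a slow tumor spends more time in the detectable phase, it is proportionally more likely to be present at any given screen; survival measured from diagnosis then favors screen-detected cases even if treatment changed nothing, which is one reason survival comparisons under screening are hard to interpret.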
Figure 6 gives further evidence of the complex changes that occur in cumulative CSR under screening conditions. The trend line for all cases detected on screening is slightly concave. Data plotted for subgroups of these cases are based on small numbers and are subject to large sampling variability. Nevertheless, it would appear that the trend in relative CSR for cases detected on mammography only is decidedly concave. For clinical only cases, the trend line is less concave, and after 10 years of follow-up the rate is approximately the same as for cases detected by both modalities at the time of screening. As figure 6 shows, shorter term follow-up would have led to the conclusion that the prognosis of clinical only cases was far more favorable than that of cases detected by both modalities. Changes are taking place in the relative differentials among survival rates for cases detected through different modalities, but the rates continue to be higher in the mammography only group than in the other 2 categories.

MORTALITY BY AGE

The question whether the lowered mortality from breast cancer among study women is related to age at entry is dealt with here because of its relevance to an issue that is frequently encountered in randomized trials. The HIP study was designed to determine the efficacy of screening women aged 40-64 years at entry, and the decision on sample size was based on this objective. However, this does not preclude exploration of relationships involving the variables controlled in the stratified random sampling procedure. Age was one of those variables, and as mortality results started to become available, age was introduced into the analysis, with attention drawn to the possibility that small numbers may be a factor in our failure to find a statistically significant difference for one or another age group, e.g., in this study, ages 40-49 years.
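The role of small numbers can be made concrete with an exact conditional test on the age-specific death counts in Table 5. The Python sketch below is an approximation for illustration, not the analysis reported by the investigators. It assumes that deaths in each group are Poisson counts and that person-years in every age group split between the study and control groups in the same ratio as the randomized cohorts at entry (30,239 vs. 30,756), so that, given the total deaths in a cell, the study-group count is binomial under the null hypothesis of no screening effect. The names `P_STUDY`, `binom_two_sided_p`, and `conditional_p` are hypothetical.

```python
from math import comb

# Assumed null share of deaths in the study group: the ratio of the cohorts at entry.
P_STUDY = 30_239 / (30_239 + 30_756)


def binom_two_sided_p(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial p-value: total probability of outcomes
    no more likely than the observed count k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))


def conditional_p(study_deaths: int, control_deaths: int) -> float:
    """Test whether the study group's share of deaths departs from P_STUDY."""
    n = study_deaths + control_deaths
    return binom_two_sided_p(study_deaths, n, P_STUDY)


# Breast cancer deaths by age at entry, (study, control), from Table 5.
cells = {
    "40-49 yr at entry, 5 yr": (19, 20),
    "50-59 yr at entry, 5 yr": (15, 33),
    "40-49 yr at entry, 16 yr": (49, 61),
    "50-59 yr at entry, 16 yr": (53, 70),
}
for label, (study, control) in cells.items():
    print(f"{label}: {study} vs. {control}, two-sided P = {conditional_p(study, control):.3f}")
```

Under these assumptions, the 5-yr deficit among women aged 50-59 at entry (15 vs. 33 deaths) falls below the conventional 0.05 level, whereas the 40-49 cell (19 vs. 20) shows essentially nothing. At 16 yr, neither age-specific cell reaches that level on its own, illustrating how thinly the deaths are spread once they are subdivided by age.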
Short-term follow-up showed such close correspondence between study and control groups in mortality from breast cancer among women 40-49 years old at entry that investigators generally accepted the report of no effect of screening for this age group, accompanied by firm statements about an effect in women 50-59 years old. With longer follow-up, the differential between study and control groups for the 40-49 year olds increased (table 5), and there appeared to be less reason for differentiating results among the age groups (15, 16).

Technically, age at diagnosis, which could not be a stratifying variable at entry, is not a consideration in assessment of the effects of screening, the experimental variable. However, we cannot ignore the fact that, in a repetitive screening program, women will age into older groups as successive rounds of screening are conducted. In the current study, aging brought increasing numbers of women over 50 years, the ages at which a benefit was demonstrated; the relevant group of women was 45-49 years old at entry. The difference in the number of deaths due to breast cancer in this age group 16 years after entry (31 for the study group vs. 35 for the control group) is smaller than in the last report (11) on this study, but now, as then, the differential can be attributed to the reduced number of deaths (table 6) among patients with breast cancers diagnosed after the women advanced to 50-54 years of age (13 vs. 23). If the deaths are aggregated by age at diagnosis, there is no difference between the study and control groups at 45-49 years (11 + 18, study group vs. 16 + 12, control group). The numbers at 40-44 years of age at diagnosis are too small for meaningful assessment.

TABLE 4.— Cumulative CSR (per 100) among confirmed breast cancer cases diagnosed within 5 yr of entry(a)

                                   Confirmed   Yr after diagnosis
                                   cases       5       7       10      14      SE(b)
Total study                        303         73.9    67.0    54.8    47.9    2.9
  Detected through screening       132         87.1    78.8    64.4    55.2    4.3
  Screened but not detected
    through screening              93          62.4    57.0    46.2    42.2    5.3
  Refused screening                78          65.4    59.0    48.7    42.3    5.6
Control                            205         59.7    53.9    46.4    40.3    2.9
Total study (adjusted)(c)          303         71.3    64.4    53.8    47.1    2.9

(a) See footnote a, table 3.
(b) SE due to sampling variability for 14-yr CSR.
(c) Lead time of 1 yr is included for cases detected through screening.

The uncertainty introduced by the age-at-diagnosis data about effects related to ages 40-49 years at entry remains. It will require a study with substantially larger sample sizes to deal with the issue effectively. Until this occurs, the position that the HIP study, conducted under screening conditions available in the 1960s, has not demonstrated an effect at these ages seems plausible.

OTHER RESEARCH ISSUES

Determination of the long-term effect of periodic screening for breast cancer on mortality from this condition is the overriding concern in this project. A constraint is imposed by the fact that screening ended after the third annual rescreening examination, a circumstance that becomes increasingly significant because, in time, the study group recovers to a prescreening state with respect to newly diagnosed breast cancer. This problem calls for estimation procedures in measuring longer term effects of screening on breast cancer mortality. One approach we have taken results in an estimate of a 30% decrease over a 10-year period among the study women. Application of statistical models to the data, which received considerable attention several years ago (17), might well be reexamined now that information covering a longer period is available.
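The hazard function, one of the analytic methods the investigators mention for these data, gives for each interval the rate of death among those still under observation at its start, and smoothing damps the interval-to-interval noise that small case counts produce. The Python sketch below is an illustration only, not the investigators' procedure: it uses the standard actuarial life-table estimate for 1-yr intervals, h = d / (n' - d/2) with effective number at risk n' = n - w/2 (withdrawals assumed to occur at mid-interval), followed by a centered 3-point moving average. The function names are hypothetical, and the counts in the usage lines are placeholders, not HIP data.

```python
def actuarial_hazard(at_risk: int, deaths: list[int], withdrawals: list[int]) -> list[float]:
    """Annual hazard for consecutive 1-yr life-table intervals.

    at_risk: number alive and under observation at the start of interval 1.
    deaths, withdrawals: counts in each interval (withdrawals = lost or censored).
    """
    hazards = []
    n = at_risk
    for d, w in zip(deaths, withdrawals):
        effective = n - w / 2                 # withdrawals exposed for half the interval
        hazards.append(d / (effective - d / 2))
        n -= d + w                            # number entering the next interval
    return hazards


def moving_average(values: list[float], window: int = 3) -> list[float]:
    """Centered moving average; the window is truncated at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical cohort of 200 cases followed for 7 yr.
raw = actuarial_hazard(200, deaths=[10, 14, 9, 12, 7, 8, 5], withdrawals=[2, 3, 3, 4, 5, 5, 6])
for year, (h, s) in enumerate(zip(raw, moving_average(raw)), start=1):
    print(f"yr {year}: raw hazard {h:.4f}, smoothed {s:.4f}")
```

A moving average is only one choice of smoother; kernel and spline smoothers are common alternatives, and the window length trades bias against variance.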
Follow-up of cases from date of diagnosis has the potential for clarifying whether screening acts not only to delay mortality but also to increase the cure rate for breast cancer, i.e., to assess the characteristics of changes in the natural history of breast cancer. Relative survival rates make it clear that such changes are marked and that we are dealing with a phenomenon for which longer follow-up is needed. Other analytic methods are being applied to the data, including use of the hazard function and smoothing techniques. Also, the closely related issue of lead time is being reexamined, based on the experience acquired in the first 10 years after entry, but this time with the additional objective of estimating lead time by age at diagnosis.

TABLE 5.— Deaths from breast cancer by age at entry, 5 and 16 yr after entry(a)

Age at         No. of deaths through following yr after entry(b)
entry, yr      5                      16
               Study     Control      Study     Control
Total          39        63           121       155
40-49          19        20           49        61
  40-44        9         11           18        26
  45-49        10        9            31        35
50-59          15        33           53        70
  50-54        8         23           29        37
  55-59        7         10           24        33
≥60            5         10           19        24

(a) See footnote a, table 3.
(b) Deaths with breast cancer cited as an underlying cause are included with those diagnosed during the first 5 yr after entry.

TABLE 6.— Breast cancer deaths, age at entry 40-49 yr, by age at diagnosis, 5 and 16 yr from entry(a)

Age at             No. of deaths through following yr after entry
diagnosis, yr      5                      16
                   Study     Control      Study     Control
40-49(b)           9         11           18        26
  40-44            5         5            7         10
  45-49            4         6            11        16
45-54(c)           10        9            31        35
  45-49            7         3            18        12
  50-54            3         6            13        23

(a) See table 5; also footnote a, table 3.
(b) Age at entry was 40-44 yr.
(c) Age at entry was 45-49 yr.

From the early years of the study, the policy of making data from the study available to other investigators in the United States and elsewhere has been followed.
The HIP data set is unique because of the randomized design and the long period of follow-up of study group women, screened and not screened, and of the control group. Interesting ideas have emerged about modeling and analytic approaches to the data which, in some instances, have led to estimates of lead time and to conclusions about the effect of starting screening at ages 40-49 years that differ from those of the study's investigators. In short, the additional analyses have contributed importantly to our understanding of the certainties and uncertainties of the HIP study results. The policy of data availability will continue.

There are limitations in what can be learned from the study. Because of the design, it is impossible to measure 1) the separate effects of mammography and clinical examination of the breast on the reduction in breast cancer mortality achieved through periodic screening, or 2) the effect of different periodicities of screening. Although the HIP study indicates that screening does have an appreciable impact on breast cancer mortality, the magnitude of this effect derived from the study needs to be interpreted in the context of the era in which the study was conducted. Over the past 10-15 years, changes have occurred in detection procedures; mammography has improved, and nurse-practitioners have assumed responsibilities for breast palpation in screening programs. Judging from the findings of the Breast Cancer Detection Demonstration Projects (18), breast cancer, under properly controlled conditions, can now be diagnosed earlier and with lower radiation exposure than that used in the HIP study. This improvement raises the possibility of increased benefits from screening. The absence of a suitable comparison group in the Demonstration Projects makes it necessary for us to turn to studies in the United Kingdom, Sweden, and The Netherlands for new information on the subject.
The randomized trial in Canada has a special potential for answering questions about the relative importance of each modality of detection and the efficacy of screening women 40-49 years old (19). The outlook, then, is for a large body of additional findings on the value of screening for breast cancer in the not too distant future.

REFERENCES

(1) HOLLEB AI, VENET L, DAY E, et al: Breast cancer detected by routine physical examinations. NY State J Med 60:823-827, 1960
(2) DAY E, VENET L: Periodic cancer detection examinations as a cancer control measure. Proc Natl Cancer Conf 4:705-707, 1961
(3) GILBERTSEN VA: Survival of asymptomatic breast cancer patients. Surg Gynecol Obstet 122:81-83, 1966
(4) SHIMKIN MB: Cancer of the breast. JAMA 183:358-361, 1963
(5) GERSHON-COHEN J, HERMEL MB, BERGER SM: Detection of breast cancer by periodic X-ray examination. JAMA 176:1114-1116, 1961
(6) EGAN RL: Mammography, an aid to diagnosis of breast carcinoma. JAMA 182:839-843, 1962
(7) SHIMKIN MB: In the middle: 1954-63—Historical note. JNCI 62:1295-1317, 1979
(8) CLARK RL, COPELAND MM, EGAN RL, et al: Reproducibility of the technic of mammography (Egan) for cancer of the breast. Am J Surg 109:127-133, 1965
(9) FINK R, SHAPIRO S, LEWISON J: The reluctant participant in a breast cancer screening program. Public Health Rep 83:479-490, 1968
(10) FINK R, SHAPIRO S, ROESER R: Impact of efforts to increase participation in repetitive screenings for early breast cancer detection. Am J Public Health 62:328-336, 1972
(11) SHAPIRO S, VENET W, STRAX P, et al: Ten- to fourteen-year effect of screening on breast cancer mortality. JNCI 69:349-355, 1982
(12) ZELEN M: Theory of early detection of breast cancer in the general population. In Breast Cancer: Trends in Research and Treatment (Heuson JC, Mattheiem WH, Rozencweig M, eds). New York: Raven Press, 1976, pp 287-300
(13) SHAPIRO S, GOLDBERG J, HUTCHISON G: Lead time in breast cancer detection and implication of periodicity of screening. Am J Epidemiol 100:357-366, 1974
(14) PROROK PC: The theory of periodic screening. I. Lead time and proportion detected. Adv Appl Prob 8:127-143, 1976
(15) DUBIN N: Benefits of screening for breast cancer: Application of a probabilistic model to a breast cancer detection project. J Chronic Dis 32:145-151, 1979
(16) PROROK PC, HANKEY BF, BUNDY BN: Concepts and problems in the evaluation of screening programs. J Chronic Dis 34:159-171, 1981
(17) BEAHRS OH, SHAPIRO S, SMART C, et al: Summary report of the working group to review the National Cancer Institute-American Cancer Society breast cancer detection demonstration projects. I. General issues related to breast cancer screening. JNCI 62:655-662, 1979
(18) BAKER LH: Breast cancer detection demonstration project: Five-year summary report. CA 32:194-225, 1982
(19) MILLER AB, HOWE GR, WALL C: The national study of breast screening. Clin Invest Med 4:227-258, 1981

Discussion II¹,²

¹ Conducted at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
² Address reprint requests to Lawrence Garfinkel, Epidemiology and Statistics Department, American Cancer Society, 4 West 35th Street, New York, N.Y. 10001.

R. Peto: Two questions, Mr. Shapiro. You mentioned just in passing the need to identify people whose disease had been diagnosed before entry into the study so that they could be excluded from it. I wonder if you would give us details of exactly what was done. The second thing is that, in looking at your data, I would have expected more cases of breast cancer to have been found in the treated group. One could believe that the breast cancers that were found were to some extent treated more effectively because they were picked up earlier. On the other hand, mere chance might have put into the treated group fewer women who were going to develop breast cancer. To some extent, screening might not only pick up breast cancers earlier; it might actually detect some cancers that would not have been discovered at all. It seemed to me as if some effect of the play of chance was in the treated group, because the numbers of cases diagnosed in the treated and control groups were similar.

S. Shapiro: I am not sure I understand the second question fully, but let me try to deal with what I think you are asking. We aimed to determine the efficacy of screening only among women who had not previously had a breast cancer diagnosis. For the women screened, identifying who was to be excluded from the study because breast cancer had been diagnosed before the start of the study was straightforward, i.e., it was determined by the examination. The situation was different for women who refused the screening invitation and for the control group. When our follow-up of the total cohorts indicated that a woman had been hospitalized for breast cancer, we queried hospitals and physicians to determine whether this was an incident case or whether breast cancer had been diagnosed before the woman's entry into the study; if so, she was excluded. Of course, some prior breast cancer cases that should be excluded are still unknown to us among the study group women not screened and the controls. However, these cases are believed to be few in number and are unlikely to affect our comparisons. Does that answer your question?

Peto: Yes, basically, if the women had had prior surgery. As for my second question, I would have thought that screening would have detected a lot more cases in the screened group than in the controls.

Shapiro: Well, as you know, in the initial screening we find the prevalence cases, and more cases of breast cancer were found during the first year in the total study group than in the control group. The accrual of breast cancer cases during the active period of rescreening should be similar in the total study and control groups, unless screening detects cancers among the screened women that would otherwise never become clinically known, i.e., the accrual relates to incidence, which should be the same in the 2 groups. As the interval from last screening increases, the number of new cases detected among women in the screening program decreases, reflecting the lead time gained through screening, which had advanced the date of detection. In our project, equalization occurred between study and control groups in the number of breast cancer cases in year 6 or 7, and the equivalency has been maintained subsequent to year 7, except for minor variations from year to year.

Peto: You were bound to catch up in the end. There is bound to be some divergence, and then you get the subsequent parallelism with equalization of detection rates between cases and controls. What surprises me is that this equalization was complete; I would not expect it to be as complete as that. I wondered if that suggested chance weighted things slightly in favor of the treated group.

Shapiro: Just to show how fortuitous things can be, at 10 years following entry, exactly the same number of histologically confirmed cancer cases had been detected in the study and control groups. From that point on, it varied: several more in the study group, several more in the control group. So we have had some variation.

Peto: Yes, but it still surprises me that the detection rates were so complete.

E. Wynder: Mr. Garfinkel, the key questions that most of us have studied are related to smoking; we think we have both answers and data. Is there a relationship between tar yield and lung cancer risk? On the issue of passive inhalation, what do you expect to find on this subject, which you know is of crucial significance?
Do you expect to make a contribution to the dietary fat data relating to breast cancer and cancer of the colon with your new Cancer Prevention Study? L. Garfinkel: One of the major goals of Cancer Prevention Study II is to study the tar yield of cigarettes in relation to lung cancer. In a preliminary analysis of the smoking habits of our study population, we found that among smokers about 26% of the males and 29% of the females smoked cigarettes with less than 10 mg tar. At the other end of the scale, about 20% of the males smoked cigarettes with 16-19 mg tar, and 12% smoked cigarettes with 20 mg or more of tar. Among women, 22% of the smokers used cigarettes with 16-19 mg tar and 1.7% with 20 mg or more. With respect to passive smoking, I think we have a wide enough distribution to determine after a while whether there is a relationship with mortality rates. We asked the people whether they were exposed directly to the cigarette smoke of others, and the nonsmokers' responses were divided roughly into thirds: those who reported they were not exposed at all, those who were exposed 1-3 hours a day, and those exposed 4 or more hours a day. Your third question was about dietary fat. I do not know at this time whether we will be able to analyze dietary fats in relation to cancer rates. As you know, it is difficult to ask questions about diet and get answers that you are confident about and can classify well for epidemiologic studies. We asked the questions that we thought were pertinent, with the advice of a number of consultants. Whether the answers will actually be related to mortality from cancer and other diseases remains to be seen. G. Howe: This is a question for Mr. Garfinkel. In the first study, you mentioned that linking records to state death records gave an underascertainment of deaths of approximately 8%, due primarily to interstate migration.
Do you plan to use the now available National Death Index in the second study to overcome the problem of interstate migration? Garfinkel: The reports I have heard about the National Death Index thus far are not good. Perhaps Mr. Jablon or Mr. Haenszel has more information than I do. In our study, we recognized that there are many obstacles to data linkage because reported dates of death may be in error by months, even years; dates of birth on death certificates, supplied by next of kin, may also be incorrect. Many characteristics will not match up with the data on a death record. One really has to scrutinize a death record to see whether a certificate is acceptable despite many discrepancies. In the first study, we compared the age at death calculated from the original questionnaire with that on the death certificate and found that 95% of all death certificates were within 5 years of the age calculated from the questionnaire. All the others were discrepant by more than 5 years, both older and younger. We found peaks at 5 years and 10 years, particularly 10 years, because people make errors when they subtract their year of birth from the current year, and this creates some difficulties when we are matching histories. In our second study, we asked for the Social Security number on the questionnaire, and most, but not all, participants recorded it. In our pretest, the question participants found most objectionable was the one asking for the Social Security number. Therefore, we decided to leave it on the questionnaire but with the word “optional” next to it, and about 90% of our subjects recorded it. A. Lilienfeld: Mr. Garfinkel, I would like to ask a general question about the use of volunteers. This issue came up when the study was started. Someone suggested that certain counties be included in the study through a probability sample of their populations and that these be added to the data collected throughout the United States by the volunteers.
This would mean that the findings of the total study would consist of 2 subsets, 1 based on a probability sample and 1 on volunteers. Comparisons could then be made between the findings of these 2 groups. This was initially suggested in answer to certain questions raised by Berkson with regard to the original Hammond-Horn study. Garfinkel: I think this is an issue worth discussing. Mr. Edward Lew mentioned earlier that the death rate in the study population does not reach the national rate; it is more like the rate of insured persons. However, the main point is that we make internal comparisons within the study. You would have to postulate that, by taking this select group and comparing smokers and nonsmokers, or people who eat one kind of food versus people who do not, the results would differ from those based on a random sample of the population of a particular county. I think this is unlikely. Dr. Hammond, would you like to comment? E. C. Hammond: Certainly, a 100% sample is at least as good as, if not better than, a probability sample. For example, in some rural areas in the first Cancer Prevention Study, we obtained close to 100% of the eligible population. In a few other places, we got a representative sample. In Nashville, Tennessee, for instance, we divided the city into census tracts and recruited our volunteers in proportion to the number of people in each census tract. These areas were divided by economic status, and thus we had enrollees almost in direct proportion to the socioeconomic makeup of the city. It was not a strict probability sample, but it came close to being one; the slight variation could not have made a difference. I believe that it is entirely unnecessary either to get 100% of the population, as Dr. Harold Dorn used to argue, or to get a probability sample. As you may know, I reexamined his data at his request. I put the 2 studies on the same tape and got similar results with respect to smoking.
I could not see how obtaining what was purported to be 100% follow-up of the population would help much. G. Comstock: I would like to ask the first 2 speakers what criteria they set up for matching. We all tend to speak glibly about matching records, but rarely do we define which elements must agree for an acceptable match. I am referring to record linkage. When you have a death certificate and a population record, how do you decide whether they match? Usually, agreement or disagreement of the matching elements is obvious, but sometimes it is uncertain, and without clear-cut rules biases can creep in. Garfinkel: We have all the demographic information in our files on all persons, i.e., the name and address, date of birth, spouse’s name, place of birth, and sometimes the name of a daughter or son, who could be the informant on the death certificate. The volunteer furnishes a date and place of death. If the name, date, and place of death match the death certificate perfectly, we accept it, subject to later review. Sometimes we accept the certificate and, upon later review of age and other variables, it becomes apparent that the record is that of the father of the person on our records. In our experience, for at least 90% of the deaths reported, we were able to match our record with the death certificate with little or no problem. Sometimes the reporting volunteer did not give a complete date of death, and we would have to call back and ask her to check it. Sometimes the name was misspelled, and we had to check again with the volunteer. When we had the best information available, the problems of matching were minimal. In other cases, the information reported was verified as true, but if we still had some doubts after reviewing all the available information, we rejected it.
The subjects reported dead for whom we could not locate an acceptable death certificate were called “not traced.” When the first study was completed, we had to review carefully about 250-300 reported deaths out of 190,000 and decide whether to accept or reject the death certificate. Hammond: I remember we received considerable help from the members of the Undertakers Association; they helped us a great deal in making sure we had the correct person’s death certificate. When the wife’s name was checked on the certificate, they helped us get information from the wife as well as from the appropriate undertakers. They also gave us clues to additional sources for verifying the identity and the death. Sometimes we could not be absolutely sure that we had the correct information. This happened in only about 200 of nearly 200,000 deaths, however. So I agree it is a difficult problem, but it is not an impossible one. Garfinkel: Dr. Comstock, on one occasion we could not find the death certificate after verifying the date and place of death through several sources. There was no doubt that a certain George Jones died on a certain date and in a specific place, but that certificate could not be found in the health department. Either he was buried without a death certificate or it was lost in filing. Those things happen. There is one other point. In Cancer Prevention Study I, our annual follow-up began in October. Someone may have died in August or September, and either the death was not reported by our volunteer or, if it was reported, for one reason or another its entry into the state health department’s computer was delayed, so that it was not available to us until the following January or even later. We failed to locate the certificate then, only to pick it up in the next year’s follow-up. About 5% of the deaths that should have been located in a given year were found in the following year.
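Comstock's request for explicit matching rules, and Howe's call for probabilistic techniques, can be illustrated with a scoring rule. The sketch below uses Fellegi-Sunter agreement weights, a standard probabilistic record-linkage method that the speakers do not themselves describe. The field names, the m- and u-probabilities, and the accept/reject thresholds are all hypothetical; only the m-value for age agreeing within 5 years loosely echoes Garfinkel's observation that 95% of certificates fell within that range.

```python
import math

# Hypothetical comparison fields with (m, u) probabilities:
#   m = P(field agrees | record and certificate refer to the same person)
#   u = P(field agrees | they refer to different people)
# These values are illustrative assumptions, not estimates from any study.
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "given_name": (0.90, 0.05),
    "age_within_5_years": (0.95, 0.20),
    "place_of_death": (0.85, 0.10),
}


def field_weight(m: float, u: float, agrees: bool) -> float:
    """Log2 likelihood-ratio weight for one field (Fellegi-Sunter)."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_weight(agreement: dict[str, bool]) -> float:
    """Total weight for a record pair: the sum of the per-field weights."""
    return sum(field_weight(*FIELD_PROBS[name], agrees)
               for name, agrees in agreement.items())


def classify(weight: float, upper: float = 8.0, lower: float = 0.0) -> str:
    """Accept at or above the upper threshold, reject at or below the lower
    one, and send everything in between to clerical review."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "review"


all_agree = {name: True for name in FIELD_PROBS}
mixed = {"surname": True, "given_name": False,
         "age_within_5_years": False, "place_of_death": True}
none_agree = {name: False for name in FIELD_PROBS}

for label, pair in [("all agree", all_agree), ("mixed", mixed),
                    ("none agree", none_agree)]:
    w = match_weight(pair)
    print(f"{label}: weight={w:.2f} -> {classify(w)}")
```

The middle "review" zone corresponds to the doubtful cases the speakers resolved by writing back to volunteers, hospitals, or undertakers; fixing the thresholds in advance is one way to make that judgment explicit and reproducible.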
Howe: The question of matching records is a critical one in many studies, and I believe the procedure should be quantified as far as possible by means of appropriate probabilistic techniques. It is surprising how much error can creep into a comparison of records when one relies on subjective, unquantified judgments. J. Higginson: The American Cancer Society study presumably will be the last big volunteer study ever to be done, and the Framingham Study will never be set up again. Do you see such studies ever being done again, unless they are conducted under government or state auspices, possibly in relation to Social Security, or in a nonvoluntary situation? Hammond: I have spent the better part of the last 3½ years designing a study to be done in China. At this point, I do not know whether it will be conducted, but it likely will be. This study, however, will be based on 5,000,000 rather than 1,000,000 persons and will include a medical examination at the start. The Chinese are convinced that the study will have to be on a voluntary basis to a large extent. Many will quarrel with their definition of volunteer, however. Barefoot doctors, nurses, and school teachers will be asked to volunteer to do it. The officials assure us that the response rate will be high. When the people are asked to volunteer, they volunteer. N. Breslow: I would like to make a general comment inspired by Dr. Feinleib’s earlier presentation. One of the features of this Workshop that I appreciate most is the historical perspective it gives on methodologic techniques in current use. Some of the most important statistical procedures were developed originally by practitioners on an empirical or intuitive basis. A few years later, theoreticians derived the same procedures in the context of a specific mathematical model and thus gained further insight into their performance.
We saw some examples of the contributions of life table methodology to medical follow-up studies. One example cited by Dr. Feinleib comes from the Framingham Study. Statisticians working on this project in the early 1960s treated the results of the periodic examinations conducted on each participant as statistically independent and related the subsequent 2-year mortality to the risk factor status at the time of each examination. They worried that the assumption of independence might not be strictly valid but nevertheless went ahead and analyzed the data in what seemed a reasonable manner. Today we would recognize that this procedure is fully justified as an application of the Cox proportional hazards model with time-dependent covariables. J. Stellman: I have a question about the control group Mr. Jablon used to represent the remainder of Japan, which was not directly affected by the atomic bomb. Did you take into account the severely deprived conditions of postwar Japanese society? Was it possible, or would it be possible now, to compare prewar and postwar outcomes in Japanese mothers? How would birth outcomes among such mothers compare with those in the United States? S. Jablon: I cannot answer the second question as well as I would like. We have not really done much in the way of comparing prewar and postwar data. The fertility of postwar women has been examined in relation to their histories of prewar pregnancies. The difficulty, however, is that these women were aging all the time, so it is difficult to compare their status in the late 1940s with that in the late 1930s. Essentially, the kind of comparison you are asking about was not done. Concerning the nature of controls more generally, I should say that there are no really satisfactory controls. The nonexposed comparison group was composed of many different kinds of people. Hiroshima before the war was an important source of emigrants.
In fact, many of the Japanese migrants to Hawaii and California came from Hiroshima, and persons from Hiroshima could be found in China, southeast Asia, the Philippines, Korea, and elsewhere. After the war, these people were all repatriated and made up a large fraction of the so-called “nonexposed” population of the city; they were obviously different from those who had remained in the city. Those who were in the city at the time of the bombing, particularly young men between the ages of 20 and 30, were men who had not been inducted into the Japanese Imperial forces during the war; the men who had been inducted were elsewhere. Thus there was strong adverse medical selection, because most of the healthy young men were gone. For that reason, little reliance has been placed on comparisons of exposed and nonexposed persons when investigators searched for radiation effects on health. Inasmuch as we have been able to make dose estimates for survivors, we sought effects by regressing health effects on dose. Fortunately, the dose was a sharp function of distance from the hypocenter and varied considerably. J. Stellman: I did not realize that. Where did the fallout go? Did it go to other places in Japan or around the world? Jablon: No, there was some of what is called “rainout” in Nagasaki, in an area east of the city called the Nishiyama district, which, as it happens, was a sparsely inhabited, mountainous place. Over the western part of Hiroshima, in a suburb called Takasu, about 2,500 m from the hypocenter, some fallout occurred, with doses of 5-10 rad to the residents. The division between people who might have received some radiation from fallout and those who received doses from direct radiation from the bombs was sharp. H. Seidman: The Health Insurance Plan study was a wonderful accomplishment, but in its sheer wonder, it becomes a mixed blessing. We get impaled by its results because such randomized trials are so difficult, expensive, and time-consuming.
For instance, considerable improvement has been reported in the efficacy of mammography in younger women; in dense breasts, mammography was difficult for radiologists and technicians to perform successfully during the 1960s, when the examinations were conducted. Despite the great advances in technology and the considerable documentation of these advances, they are usually discounted in the face of the results for younger women, which are based on what amounts to obsolete technology. Randomized trials are marvelous; they are the best we can do to get the most definitive results, but they are not our only source of worthwhile information. My second point is that age at diagnosis presents many problems in analysis. The whole point of screening is to advance the age at diagnosis. Thus some women whose breast cancers are detected under screening conditions before age 50 would otherwise have been diagnosed at age 50 and over. Just what proportion they represent depends on the effectiveness of screening. The lead time may average 1 year, but its distribution is such that the proportions of women found at ages under and over 50 may vary from one group to another. In any case, more women in the study group than in the control group will be found at ages under 50. For example, in the Health Insurance Plan study, 68 breast cancers were diagnosed during the first 5 years after entry among both study and control group women who were 45-49 years old at entry. Whereas 40 of those diagnosed under 50 were study group women, only 30 of those so diagnosed were controls. This certainly biases the numbers of deaths subsequently found among women diagnosed in particular age groups. Thus the usual unqualified statements are made that, in classifications by age at diagnosis, deaths from breast cancer were unchanged for women under 50 (36 vs.
38 deaths in the 10- to 14-year follow-up of study and control women aged 40-49, respectively), and that the reduction was really to be seen after age 50 (10 study compared with 23 control deaths). If you allow for the facts that 1) about 10 control group patients were diagnosed after age 50, whereas their study group counterparts were diagnosed at ages 45-49, and 2) about 6 deaths would have occurred among those 10, then the adjusted comparisons of breast cancer deaths would be 36 compared with 44 for women under age 50 and 10 compared with 17 after age 50; these figures convey quite a different message from the unqualified results. They also highlight the truism that analyses by age at entry are much more in accord with the motivations underlying the design of randomized clinical trials. A third point concerns length bias. It is true that the more indolent cancers are found under screening conditions. However, at the first examination one tends to find not only prevalence cases but also cancers that may already be past the point in the natural course of the disease at which screening can be beneficial. These cases lead to an underestimate of what clinicians could accomplish by starting screening at earlier ages. The fourth point is that, for the most reliable results, one compares the total study group with the total control group. In the randomized Health Insurance Plan trial, two-thirds of the total study group were women screened at least once and one-third were women never screened. Presumably, the benefits are concentrated in the screened women, and one would think some attempt might be made to estimate the benefits in this self-selected group of women compared with their counterparts in the control group. Perhaps this can only be done roughly, but it seems to me that if one does it, one must find a greater improvement in them than in the total study group.
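Seidman's two adjustments are simple arithmetic, and the sketch below restates them under his stated assumptions. The first moves the roughly 6 deaths among the roughly 10 control women diagnosed after age 50, whose study group counterparts were diagnosed at 45-49, back into the under-50 stratum. The second is a common way of expressing his fourth point rather than a method he names: if study group women who were never screened gain nothing (his stated premise) and if attenders and nonattenders share the same baseline risk (a strong assumption for self-selected women), then the relative reduction among the screened equals the overall reduction divided by the screened fraction. The 30% overall figure is the one Shapiro cites in his reply; the resulting 45% is only what these assumptions imply, not a reported result. The function names are hypothetical.

```python
from fractions import Fraction


def reclassify_control_deaths(under_50: int, over_50: int,
                              shifted: int) -> tuple[int, int]:
    """Move deaths among control women diagnosed at 50+ whose study group
    counterparts were diagnosed at 45-49 into the under-50 stratum."""
    return under_50 + shifted, over_50 - shifted


def reduction_among_screened(overall: Fraction,
                             screened_fraction: Fraction) -> Fraction:
    """Relative mortality reduction among attenders, assuming nonattenders
    receive no benefit and attenders and nonattenders share the same
    baseline risk. Under these assumptions,
    overall = screened_fraction * reduction_among_screened."""
    return overall / screened_fraction


# Control deaths by age at diagnosis (under 50, 50+), shifting about 6 deaths.
controls = reclassify_control_deaths(38, 23, 6)
print("Study (36, 10) vs. adjusted controls", controls)

# Two-thirds of the study group screened at least once; 30% overall reduction.
effect = reduction_among_screened(Fraction(3, 10), Fraction(2, 3))
print("Implied reduction among screened women:", float(effect))
```

Exact fractions are used so that the two-thirds share introduces no rounding; the first call reproduces the adjusted counts of 44 and 17 quoted above.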
This follows because we have no reason to believe that the study group women who were not screened should show any benefit compared with their control counterparts. Rather than dismissing such data because of the difficulties inherent in interpreting results for self-selected women, investigators should be delighted with them, because if any group is going to show a life-prolonging effect, if the effect is present, these are the women who will do it. W. Haenszel: Mr. Shapiro, I realize that in your presentation you dealt with the total category of breast cancer cases without any subdivision by histologic type. I realize, too, that you had to be selective in your presentation. Could you comment on whether your findings are reinforced by considerations of histologic type, or does this make a difference? Shapiro: I will deal with the question on histology first. When we addressed this issue some time ago, we found that histologic type differentials were small, except that the study group had more intraductal cancer than the controls (39 vs. 24). However, our analysis taking this into account did not yield an explanation for the lowered mortality from breast cancer in the study group. Recently, an epidemiologist approached us with the proposal that the slides be reexamined to grade the tumors. This is an interesting idea, and the study is now in an exploratory stage. Incidentally, one of the points I did not make in the paper is that from the early years of this study, we have made data available to outside investigators who wished to take other analytic approaches to our material. In a number of instances, they did arrive at different conclusions, e.g., on the significance of the breast cancer mortality difference between study and control groups among women aged 40-49 years at entry (an issue that I covered in the presentation) and also in the estimates of lead time. Mr.
Seidman’s comments raise a number of interesting questions which I believe we cannot answer, given the complexity of breast cancer and the uncertainty about how screening changes the mix of cases at detection. In part, that was the point behind some of the slides in my presentation. For example, it is far from clear why those with positive nodes in the study group show a substantially higher relative survival rate than the corresponding group among the controls, and why this advantage seems to be greater than when we make similar comparisons among those with negative nodes. Perhaps the explanation will be found in other characteristics of the tumor, such as size or number of nodes involved, or perhaps small numbers are affecting the picture. On the other hand, these relationships may reflect fundamental changes in case mix within each stage-of-disease subgroup, e.g., through lead time and length-biased sampling, which are not identifiable for classification or measurement purposes. Furthermore, the nature of these changes may vary from one screening program to another. Even more problematic is the attempt by investigators to partition the credit for reducing breast cancer mortality between screening modalities in a study like that of the Health Insurance Plan, in which both mammography and clinical examination of the breast were applied. To a large extent, the motivation has been to establish the utility of each modality under screening conditions. In time, the relative survival rates may tell us whether cases detected on mammography alone have a different cure rate than the clinical-only cases, or whether the difference in the shapes of the survival curves is due to lead time and length-biased sampling. This is useful to know, but it cannot tell us what would have happened to case mix if only 1 of these modalities had been used in a repetitive screening program.
To settle the issue, we will need to wait for information from the randomized trial in Canada, which provides for an appropriate control group and for determining the relative effectiveness of each modality on the basis of breast cancer mortality. With respect to the specific issue of how to interpret the Health Insurance Plan data for women 40-49 years old at entry, we have two opposing views. Mr. Seidman concludes that women in this age group do benefit from screening. Perhaps with improved mammography this is true, but the results of the Plan’s study are equivocal because the reduction occurred only among women whose cancers were detected at age 50 or older. To advance a public policy of screening at ages 40-49 years, we must be assured that the evidence of gain is more certain. A national program of screening women at these ages would require huge resources for the examinations and major promotional efforts. Actually, we have not made significant progress in screening large segments of women 50-59 years old, for whom benefits could be at the 30% level. Mr. Seidman has also advanced the idea that we are underestimating the effect of screening because the data for the total study group combine breast cancer mortality among participants in the screening examinations with that among women who never attended. This is so, but I am not sure how useful such an estimate would be, aside from all the qualifications. For public policy purposes, it is not going to make much difference if the 30% benefit already shown is increased to 40%.
SESSION III
Chronic Disease Studies: Occupational Cohorts
Chairman: Philip J. Landrigan
Co-Chairman: Joseph F. Fraumeni, Jr.
Chairman’s Remarks¹
Philip J. Landrigan²
Workers constitute the subset of the American population most heavily exposed to chemical and physical toxins.
As a consequence of these frequently heavy and prolonged exposures, workers tend to develop illnesses of toxic etiology more frequently than does the general public, sooner after the introduction of new chemical compounds, and in more severe forms. Examples of toxic illnesses first recognized in industrial populations include angiosarcoma of the liver in workers exposed to vinyl chloride monomer (1), reduced fertility in men exposed to dibromochloropropane (2), and neuropathy in workers exposed to n-hexane (3). In addition to their heavy exposures and consequently heightened risk of disease, employed populations have other attributes that make them particularly suitable for study by the cohort method: 1) Occupational populations are more carefully documented than is the general public. For example, inclusive dates of employment and information on job categories are frequently available; workers often can be traced because they belong to retirement programs; and their Social Security numbers are known. 2) The exposures of occupational populations have frequently been characterized through industrial hygiene evaluations, or at least can be estimated closely. This information makes possible the estimation of cumulative exposures to toxic agents (4) and may serve ultimately as a basis for quantitative risk assessment (5). 3) Occupational populations have frequently been subjected to periodic medical examination and to biologic monitoring. Data from those examinations may provide a basis for further evaluation of toxic exposures and for assessment of the temporal progression of disease. Of course, some problems may impede the study of occupational cohorts. These include the frequently small sizes of the populations available, difficulties with access, multiplicity of toxic exposures, and the lack of appropriate comparison populations.
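Point 2) above, estimating cumulative exposure from employment records, is often done by multiplying the time spent in each job category by an estimated exposure intensity for that category and summing over the work history. The sketch below assumes a hypothetical job-exposure table with illustrative intensities and units; it is a plain intensity-times-duration sum, not the serially additive expected dose model of reference (4).

```python
from dataclasses import dataclass


@dataclass
class JobSpell:
    job: str      # job category recorded in employment files
    years: float  # duration of the spell, from inclusive employment dates


# Hypothetical mean exposure intensity by job category (illustrative units,
# e.g., ppm). Real values would come from industrial hygiene evaluations.
JOB_EXPOSURE = {"operator": 5.0, "maintenance": 2.0, "office": 0.0}


def cumulative_exposure(history: list[JobSpell],
                        intensity: dict[str, float] = JOB_EXPOSURE) -> float:
    """Sum of intensity x duration over a worker's spells (e.g., ppm-years)."""
    return sum(intensity[spell.job] * spell.years for spell in history)


worker = [JobSpell("operator", 3), JobSpell("maintenance", 4.5),
          JobSpell("office", 10)]
print(cumulative_exposure(worker))  # 5*3 + 2*4.5 + 0*10 = 24.0
```

Keeping the intensity table separate from the work histories means the same cohort can be re-scored if hygiene estimates for a job category are revised.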
Additionally, investigators wishing to undertake future cohort studies of worker populations will need to consider closely the ethical issues pertaining to notification (6). Despite these potential difficulties, the advantages of conducting cohort studies in occupational populations are immense. Diseases are more likely to be detected and related to specific exposures by such studies than by studies of almost any other segment of the population. In addition, studies in worker populations provide data of unique importance for the prevention of illness not only in workers but also in the general public. Cohort studies of workers exposed to such agents as asbestos (7), radon gas (8), benzene (9), and arsenic (10) provided information on the hazards of those materials years before the hazards were recognized to extend to the general population. Cohort studies of occupational populations should continue to be pursued vigorously.
¹ Presented at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
² National Institute for Occupational Safety and Health, 4676 Columbia Parkway, Cincinnati, Ohio 45226.
REFERENCES
(1) CREECH JL JR, JOHNSON MN: Angiosarcoma of the liver in the manufacture of polyvinyl chloride. J Occup Med 16:150-151, 1974
(2) WHORTON D, MILBY TH, KRAUSS R, et al: Testicular function in DBCP exposed pesticide workers. J Occup Med 21:161-166, 1979
(3) HERSKOWITZ A, ISHII N, SCHAUMBURG H: N-hexane neuropathy: A syndrome occurring as the result of industrial exposure. N Engl J Med 285:82-85, 1971
(4) SMITH AH, WAXWEILER RJ, TYROLER HA: Epidemiologic investigation of occupational carcinogenesis using a serially additive expected dose model. Am J Epidemiol 112:787-797, 1980
(5) International Agency for Research on Cancer Working Group: Some Aspects of Quantitative Risk Estimation. IARC Monograph: Evaluation of the Carcinogenic Risk of Chemicals to Humans, vol 29. Lyon: IARC, 1982
(6) SCHULTE PA, RINGEN K, ALTEKRUSE EB, et al: Notification of a cohort of workers at risk of bladder cancer. J Occup Med 27:19-28, 1985
(7) SELIKOFF IJ, CHURG J, HAMMOND EC: Asbestos exposure and neoplasia. JAMA 188:22-26, 1964
(8) ARCHER VE, WAGONER JK, LUNDIN FE JR: Uranium mining and cigarette smoking effects on man. J Occup Med 15:204-211, 1973
(9) RINSKY RA, YOUNG RJ, SMITH AB: Leukemia in benzene workers. Am J Ind Med 2:217-245, 1981
(10) LEE AM, FRAUMENI JF JR: Arsenic and respiratory cancer in man: An occupational study. JNCI 42:1045-1052, 1969
Selection, Follow-up, and Analysis in the Birmingham Study¹
J. A. H. Waterhouse²
ABSTRACT: The Birmingham, England, Cancer Registry is so organized that every case of cancer in its territory of 5,200,000 persons is included. This coverage allows the staff to detail every epidemiologic aspect of the cancer experience of a whole population. For example, this registry system made it possible for us not only to demonstrate that the Birmingham region had four times the incidence of scrotal cancer of another region but also to identify the locations and the specific workplace practices responsible for the excess. The result was the successful adoption of protective measures. Other instances are presented of the inestimable value of a population-based registry to cancer epidemiology. Natl Cancer Inst Monogr 67: 85-88, 1985.
I must begin by describing some of the peculiarities of the Birmingham Cancer Registry, i.e., those idiosyncrasies that constitute its character and its relevance to our present field of discussion. Toward the end of 1935, a clerical officer was appointed at the General Hospital in Birmingham to complete the forms requested by the Radium Commission, which had been established by the British Government in 1929 to purchase and distribute radium to the major hospitals of the country to aid in the treatment of cancer.
A year or two after the staff of the Commission had begun work, they decided to request some information about the patients treated with the radium provided and also about other patients not treated with radium. In Birmingham, we were fortunate in the appointment to this work of the redoubtable Miss Levi, a woman of great drive and energy, who continued until her retirement less than 10 years ago. Although her diligence led to the inclusion in the files of every case of cancer, whether treated or not, the Registry was what we would nowadays call hospital based. With the advent of the British National Health Service, we could envisage the collection of similar data on a regional scale, the regions being the units of administration, each with a population of several million. Birmingham was (and still is) the largest, with a current population of 5.2 million. My affiliation with the Cancer Registry began about this time. Miss Levi’s records were kept in large loose-leaf ledger books; I decided that they should be transferred to punch cards, and a standard form, more extensive than that required by the original Radium Commission, was needed to list the items to be recorded. In addition to the basic identification and social data about the patient and the names of the clinicians and hospitals involved, a full clinical description of the tumor, both macroscopic and microscopic, and of the treatment given was essential. Follow-up should be added at regular intervals, shorter initially but thereafter annually until eventual death, and it should include details of any extension of the growth (recurrences or metastases) or the development of fresh primaries, with descriptions of the treatment given.
¹ Presented at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
² Regional Cancer Registry, Queen Elizabeth Medical Center, Birmingham B15 2TH, United Kingdom.
At this time, the International Classification of Diseases codes for cancer were inadequate, so we were forced to design our own system of hierarchic and contingent codes, largely for reasons of economy of space on the punch card, even though the system we used effectively permitted 160 columns by doubling the use of each column through what was known as "interstage punching." By 1960, we considered that the Registry could be regarded as population based because it had reached a registration efficiency of 95%, which within a few more years increased to about 98%. It was now much more valuable for epidemiologic purposes, because it could be used to describe the pattern of cancer in the whole of a substantial and representative region, typical of the country as a whole and containing close to one-tenth of the population of England and Wales. One consequence, seldom exploited because it is unwelcome, is the Registry's ability to publish, in almost as much detail of presentation, treatment, and survival, the results for a single site, such as breast or colon, for the whole population of an area rather than for a selected hospital series. The contrast is both sharp and salutary. I have used the Registry's data, accumulated over 10 years, as the basis of my handbook (1), which provides in reference form the incidence and survival by age and sex for each major system of the body and for each site. We are now planning a second, somewhat expanded edition of this book; it will include some histologic data and more information on time trends, drawing on the decade or so of data collected since 1974. The passion of our legislators for changing the boundaries of administrative units and subunits has affected many internal subdivisions but (fortunately, and uniquely in the United Kingdom) has left our outer boundaries unchanged.
Thus we can study and compare the changing patterns of cancer in a single sizeable area over a period of about a quarter century.

OCCUPATIONAL STUDIES

Let us now turn to the use of the Registry's data in occupational studies. I think one of the first applications was both simple and effective: simple because the time available was extremely short, effective because it made its point. The occasion was the celebrated Stokes case, a test case brought by the trade union on behalf of the widow of a toolsetter who had died of scrotal cancer. At extremely short notice, I was asked for any evidence of an epidemiologic kind that might be put forward. The evidence I presented is shown in table 1 and consists essentially of a simple comparison, set out in stages, of our data with those of the South Metropolitan Cancer Registry, which was formed from 2 of the registries around London. The populations relate to the mid-1960s. The message, as it should be in a court of law, is clear: each comparative figure for the South Metropolitan Cancer Registry is close to twice that for the Birmingham Registry, except for cancer of the scrotum, for which the relationship is reversed. It demonstrates in unsophisticated form that we had in the Birmingham region approximately four times the incidence of scrotal cancer observed in the South Metropolitan region.

Relieved of the pressure of time for the court case, we were then able to examine the subject much more fully, using only our Registry's data. Here, however, we could use the records back to their start in 1936 and demonstrate the increase in the number of scrotal cancers with time. Clearly, because the coverage of the Registry had increased over the same period, an apparent rise in cases might merely reflect that change.
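The court comparison of table 1 can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes that all case counts cover the same 3-year period. Every non-scrotal row gives a South Metropolitan/Birmingham ratio between about 1.65 and 1.93, while the scrotum row gives about 0.49. Scaling by male population puts the Birmingham excess at about 3.4-fold; scaling by all male malignancies puts it at about 3.9-fold, in line with "approximately four times."

```python
# Ratios behind the table 1 comparison (case counts over 3 years; populations in millions).
# Assumption: every case count refers to the same 3-year period.

# (Birmingham, South Metropolitan) for each row of table 1
TABLE_1 = {
    "population_total": (4.76, 8.22),
    "population_male": (2.35, 3.88),
    "malignant_total": (40_069, 77_393),
    "malignant_male": (20_625, 38_697),
    "testis": (152, 292),
    "penis": (77, 139),
    "scrotum": (113, 55),
}


def south_met_over_birmingham(row: str) -> float:
    """Ratio of the South Metropolitan figure to the Birmingham figure for one row."""
    birmingham, south_met = TABLE_1[row]
    return south_met / birmingham


def scrotal_rate_ratio() -> float:
    """Birmingham / South Metropolitan ratio of scrotal cases per million men."""
    b_cases, s_cases = TABLE_1["scrotum"]
    b_men, s_men = TABLE_1["population_male"]
    return (b_cases / b_men) / (s_cases / s_men)


def scrotal_proportional_ratio() -> float:
    """Same ratio, using all male malignancies as the denominator instead."""
    b_cases, s_cases = TABLE_1["scrotum"]
    b_all, s_all = TABLE_1["malignant_male"]
    return (b_cases / b_all) / (s_cases / s_all)


if __name__ == "__main__":
    for row in TABLE_1:
        print(f"{row:18s} South Met / Birmingham = {south_met_over_birmingham(row):.2f}")
    print(f"scrotal ratio per man:             {scrotal_rate_ratio():.2f}")
    print(f"scrotal ratio per male malignancy: {scrotal_proportional_ratio():.2f}")
```

Which denominator is more appropriate is a judgment call; neither crude ratio allows for differences in age structure between the two regions.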
For various reasons, we could show that there had been a real increase in the number of cases, which paralleled the increasing use of machines of the bar automatic type in the small engineering factories of the Midlands. There was, of course, a time lag, reflecting the latent period between the carcinogenic stimulus and the development of the neoplasm; there were also skin cancers at other sites, as well as premalignant conditions. Cruickshank and Squire (2) predicted just such a development in their early review of Birmingham data in 1950, as I recalled in my epidemiologic study of 1971 (3). Scrotal cancer is not a common condition; even in our peak year it attained a crude overall incidence of only 1/100,000 men. Nevertheless, we could draw upon nearly 300 cases in our series, which I believe to be the largest in the literature.

TABLE 1.—Cancer of the scrotum: Comparison of registries

                                      Birmingham Regional   South Metropolitan
                                      Cancer Registry       Cancer Registry
Population, millions
  Total                                      4.76                 8.22
  Male only                                  2.35                 3.88
No. of cases of malignant disease
  Total, 3 yr                              40,069               77,393
  Male only                                20,625               38,697
Total No. of cases of cancer
  Testis, 3 yr                                152                  292
  Penis, 3 yr                                  77                  139
  Scrotum, 3 yr                               113                   55

The British Institute of Petroleum sponsored a fuller study of our data in which we endeavored to contact every patient for a personal interview or, if he had died, a surviving relative. I can still remember well the sense of gratification we felt on seeing the changes wrought in the factories as a result of our investigations, even though these changes were probably much more the result of the test case and the rate of compensation now payable to the victims than of any sense of general philanthropy and benevolence. Exhaust ventilation abounded; individual filters were fitted to each machine, which now stood in something akin to a drip tray to collect any surplus oil instead of being surrounded by oil-soaked sawdust.
Perspex hoods over the working area prevented the scatter of oil from the rotating parts yet permitted inspection of their function; changing rooms, baths, and free laundering of work clothes were all provided. Finally, solvent-refined oils were substituted for the "neat" (i.e., undiluted) oils that had frequently been used in the past. I want to return to this topic a little later, but let me first introduce some other occupational studies. Our appetites had been whetted by the results of the work on scrotal cancer. Moreover, the Midlands was an industrial center with a variety of industries, and in the Registry we had both a data file against which we could trace diagnoses of cancer among nominal rolls of employees and a source of cancer incidence rates by site, sex, and age. We were therefore well placed to undertake such studies on a morbidity rather than a mortality basis. An enthusiastic and ever-helpful colleague in this field was the late Miles Kipling (then the senior medical inspector of factories for the region), to whom we were indebted for gaining entrée to many otherwise difficult places. However, rather than discuss a number of separate studies, I should like to describe one that I think conforms more closely to the pattern under review here: our study of the British rubber industry. The papers of Case and his colleagues (4-6), published in the 1950s but based on their investigations of the 1940s (shortly after the war), had clearly shown the relationship between bladder cancer and the use of beta-naphthylamine in the chemical manufacturing and rubber industries. It was a constituent of the principal antioxidants used in the making of tires for automobiles. Consequently, beta-naphthylamine was banned from use in the British rubber industry at the end of 1949. No doubt some small residues remained for a short time afterward, but in the main the ban was effective.
About 20 years later, therefore, we were asked to investigate whether the desired reduction in the incidence of bladder cancer had been achieved. We examined 13 major factories (all the principal ones) of the British tire industry; a few also made general rubber goods. The population of workers included in the study amounted to nearly 40,000. The plan we adopted was to include every employee who entered any one of these factories between the beginning of 1946 and the end of 1960 and remained for at least a year. These men (the study was limited to men because so few women were employed in these factories) were divided into 3 entry cohorts according to their first dates of employment in the industry: the 5 years 1946-50 constituted the first cohort, 1951-55 the second, and 1956-60 the third. We obtained and recorded a full job history for each man and could trace the fate of all but 1.5%. We subdivided the work areas of the factories into 11 groups and examined mortality from all causes, from cancers, from certain specific cancers, and from other diseases.

In Britain, we do not have a unique personal number as they do in the Scandinavian countries. Most Britons have 2 personal numbers: one for the National Health Service and one for the National Unemployment Insurance system. Again, unlike the Scandinavians, who are expected to know their numbers by heart, few Britons could even locate the records of their 2 numbers, far less remember them. Nevertheless, the National Health Service Central Register, which includes virtually everyone and contains well over 100,000,000 names, is remarkably efficient in tracing the vital status of each entry on the nominal rolls submitted to it by those permitted to make this use of the Register, even when the National Health Service number is not available to simplify the task.
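The entry rules of the rubber industry cohort (men, first employed in the industry between 1946 and 1960, at least a year of employment, grouped into three 5-year entry cohorts) can be written as a small classification function. This is a sketch under assumptions: the `Employee` record, the `as_of` date used to judge the one-year rule for men who had not left, and the leap-day handling are all hypothetical, not details from the study.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Entry cohorts as described in the text, keyed by cohort number.
COHORTS = {1: (1946, 1950), 2: (1951, 1955), 3: (1956, 1960)}


@dataclass
class Employee:
    sex: str                 # "M" or "F"
    first_entry: date        # first date of employment in the industry
    left: Optional[date]     # date of leaving, or None if still employed


def one_year_later(d: date) -> date:
    """Same calendar date a year on; 29 February maps to 1 March."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return date(d.year + 1, 3, 1)


def entry_cohort(emp: Employee, as_of: date = date(1961, 12, 31)) -> Optional[int]:
    """Return the entry cohort (1-3), or None if the man is not eligible.

    Eligibility: male, first entered between 1946 and 1960 inclusive, and
    employed for at least a year. `as_of` (a hypothetical default) is used as
    the end date for men who had not left.
    """
    if emp.sex != "M":
        return None
    end = emp.left if emp.left is not None else as_of
    if end < one_year_later(emp.first_entry):
        return None  # stayed less than a year
    for cohort, (first_year, last_year) in COHORTS.items():
        if first_year <= emp.first_entry.year <= last_year:
            return cohort
    return None  # entered outside 1946-60


if __name__ == "__main__":
    staff = [
        Employee("M", date(1948, 3, 1), date(1960, 1, 1)),
        Employee("M", date(1953, 6, 1), None),
        Employee("M", date(1958, 1, 15), date(1958, 9, 1)),  # under a year
        Employee("F", date(1950, 1, 1), None),
    ]
    print([entry_cohort(e) for e in staff])  # prints [1, 2, None, None]
```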
When we adjusted for latency by restricting the comparison of observed with expected deaths from bladder cancer to men who had survived a minimal time (the latent period adopted) since their entry into the work category under examination, we found a significant excess of deaths in the first entry cohort only. Because only the first cohort had been exposed to beta-naphthylamine (before its ban), our results neatly supported Case's findings and endorsed the effectiveness of the ban. However, our results also showed excess mortality from stomach and lung cancers in the more recent entry cohorts. Similar and other findings have emerged from studies of the rubber industry in the United States. We are currently extending our studies to investigate the possible etiology of these conditions in greater detail. Though I am not in a position to report further results, it is abundantly clear that the cohort design of the study, with its periodic updating, permits a full examination of the mortality pattern in industry and allows the study to exercise an important monitoring function.

INVESTIGATION OF MULTIPLE PRIMARIES

A further use we have made of the Registry's data has been in the field of multiple primary cancers. For this purpose, Registry data with a long period of full follow-up are the ideal, almost the essential, source. When such data are examined systematically, one usually sees an excess of second primaries (over expected) within a short period after the diagnosis of the first cancer. A number of writers on this topic deliberately exclude the first year to avoid this problem, but Dr. Pat Prior, who has worked on this subject for a long time in my department, has proposed an ingenious reason for including those cases, on the grounds that they represent "anticipative" diagnoses (7).
This is not the place to expand our discussion on such matters, nor indeed on her other and parallel studies of the malignant sequelae of various chronic disease conditions, such as Crohn's disease (8), adult celiac disease (9), ulcerative colitis (10), and rheumatoid arthritis (11). My chief objective in mentioning multiple primaries is to indicate that we already had the expertise within the department to investigate and evaluate their incidence among our patients with scrotal cancer when we did our detailed study of them. We were led to look at subsequent primaries among them by the observation of a dermatologist colleague, who remarked that a second primary had been found in the lung of one of these patients. This was the second time such an observation had been made. Our investigation showed a significant excess of subsequent tumors in 3 site groups: the skin, the respiratory system, and the upper parts of the alimentary tract. Among men known to have heavy exposure to oil, skin cancers were not unexpected; we had noted them already in the same factories. The other 2 site groups appeared strongly to incriminate oil mist, inhaled and ingested, as the carcinogen. When we published these findings, the Lancet hailed them in a leading article (12) as the first direct proof of the carcinogenicity of oil mist. If one looked down a long factory workshop at that time, a blue oil haze was most apparent; oil dripped from lattice roof girders and was sprayed from rotating machinery. Now, however, with all the improvements I mentioned earlier, the atmosphere in contrast is clear and bright. Because investigators conducting some later studies failed to demonstrate a similar effect, it may well be that oil mist is discernibly carcinogenic only at sufficient concentration, and perhaps only when undiluted.
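Both the latency restriction in the rubber industry analysis and the question of whether to count second primaries diagnosed in the first year after the first cancer come down to counting events and person-years only after a chosen interval. The sketch below assumes individual follow-up records and an age-specific table of reference rates; the `Subject` record, the function names, and the example rates are hypothetical. It uses an exact one-sided Poisson tail as one common way to judge an excess of observed over expected; the text here does not state which test was used. Setting `lag_years` to a latent period mimics the bladder cancer comparison, and setting it to 1 excludes the first year, the practice Dr. Prior argued against.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Sequence, Tuple

# (lower age, upper age, annual reference rate); values used below are hypothetical
RateTable = Sequence[Tuple[float, float, float]]


@dataclass
class Subject:
    age_at_entry: float        # age when follow-up starts (entry to job, or first cancer)
    years_followed: float      # years from entry to exit (death, event, or study end)
    event_at: Optional[float]  # years from entry to the event of interest, if it occurred


def banded_person_years(start_age: float, end_age: float,
                        rates: RateTable) -> Iterator[Tuple[float, float]]:
    """Yield (rate, person-years) for each age band overlapping [start_age, end_age)."""
    for lower, upper, rate in rates:
        overlap = min(end_age, upper) - max(start_age, lower)
        if overlap > 0:
            yield rate, overlap


def observed_expected(subjects: Iterable[Subject], rates: RateTable,
                      lag_years: float = 0.0) -> Tuple[int, float]:
    """Count events and expected events using only follow-up after `lag_years`."""
    observed, expected = 0, 0.0
    for s in subjects:
        if s.years_followed <= lag_years:
            continue  # left follow-up before the lag elapsed: contributes nothing
        start_age = s.age_at_entry + lag_years
        end_age = s.age_at_entry + s.years_followed
        expected += sum(rate * py for rate, py in banded_person_years(start_age, end_age, rates))
        if s.event_at is not None and lag_years <= s.event_at <= s.years_followed:
            observed += 1
    return observed, expected


def poisson_excess_p(observed: int, expected: float) -> float:
    """One-sided P(X >= observed) for X ~ Poisson(expected)."""
    below = sum(math.exp(-expected) * expected**k / math.factorial(k) for k in range(observed))
    return 1.0 - below


if __name__ == "__main__":
    rates = [(0, 50, 0.001), (50, 120, 0.004)]
    cohort = [Subject(40, 20, 20), Subject(45, 3, 3), Subject(55, 10, None)]
    for lag in (0, 5):
        o, e = observed_expected(cohort, rates, lag_years=lag)
        print(f"lag {lag}: O={o}, E={e:.3f}, O/E={o / e:.1f}, p={poisson_excess_p(o, e):.4f}")
```

In the toy cohort, the second subject leaves follow-up before a 5-year lag has elapsed, so both its event and its person-years drop out of the lagged comparison.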
An interesting parallel can be drawn from an investigation of cadmium oxide exposure in relation to an excess of prostate cancer (13), later shown also in an American study (14). When we re-examined the records of the same factory about 15 years later (15), the effect had virtually disappeared. Again, the degree of exposure had been much reduced. I find it interesting to speculate whether in some kinds of carcinogenicity there may be a threshold effect, which these two agents (oil mist and cadmium oxide) might exemplify.

A cancer registry, in Birmingham as elsewhere, is, if population based, by intention the opposite of selective, because it aims to include every case that occurs in its territory. In a sense, the function of a registry can be regarded as summarizing the cancer experience of the cohort that its population represents, especially if that population is reasonably stable; it is an experience epitomized in (1) and other publications. This same data base is also of inestimable value in tracing, in follow-up, and in the analysis of selected occupational and industrial studies, as I have attempted to describe here. Of these, the rubber industry survey typifies well the use of occupational cohorts defined by time and by exposure category, whereas the analysis of subsequent primaries, where possible, potentially offers much wider scope for detailed investigation of the etiology of specific industrial hazards.

REFERENCES

(1) WATERHOUSE JA: Cancer Handbook of Epidemiology and Prognosis. London: Churchill-Livingstone, 1974
(2) CRUICKSHANK CN, SQUIRE JR: Skin cancer in the engineering industry from the use of mineral oil. Br J Ind Med 7:1-11, 1950
(3) WATERHOUSE JA: Cutting oil and cancer. Ann Occup Hyg 14:161-170, 1971
(4) CASE RA: Incidence of death from tumours of the urinary bladder. Br J Prev Soc Med 7:14-19, 1953
(5) CASE RA, HOSKER ME: Tumour of the urinary bladder as an occupational disease in the rubber industry in England and Wales.
Br J Prev Soc Med 8:39-50, 1954
(6) CASE RA, HOSKER ME, DREVER B, et al: Tumours of the urinary bladder in workmen engaged in the manufacture and use of certain dyestuff intermediates in the British chemical industry. Br J Ind Med 11:75, 1954
(7) PRIOR P, WATERHOUSE JA: The incidence of bilateral breast cancer: II. A proposed model for the analysis of coincidental tumours. Br J Cancer 43:615-622, 1981
(8) GYDE SN, PRIOR P, MACARTNEY JC, et al: Malignancy in Crohn's disease. Gut 21:1024-1029, 1980
(9) HOLMES GK, STOKES PL, SORAHAN TM, et al: Coeliac disease: Gluten-free diet and malignancy. Gut 17:612-619, 1976
(10) PRIOR P, GYDE SN, MACARTNEY JC, et al: Cancer morbidity in ulcerative colitis. Gut 23:490-497, 1982
(11) PRIOR P, SYMMONS DP, HAWKINS CF, et al: Cancer morbidity in rheumatoid arthritis. Ann Rheum Dis 43:128-131, 1984
(12) ANONYMOUS: Hazard of mineral oil mist. Lancet 2:967-968, 1970
(13) KIPLING MD, WATERHOUSE JA: Cadmium and prostatic carcinoma. Lancet 1:730-731, 1967
(14) LEMEN RA, LEE JS, WAGONER JK, et al: Cancer mortality among cadmium production workers. Ann NY Acad Sci 271:273-279, 1976
(15) SORAHAN T, WATERHOUSE JA: Mortality study of nickel-cadmium battery workers by the method of regression models in life tables. Br J Ind Med 40:293-300, 1983

Selection, Follow-up, and Analysis in the Coke Oven Study¹
Howard E. Rockette and Carol K. Redmond²

ABSTRACT—The current standard for exposure to coke oven emissions sets a permissible exposure of 150 µg benzene-soluble fraction of total particulate matter/m³. The major epidemiologic study that formed the basis for this standard is reviewed, including the evidence for a dose-response relationship between exposure to coal tar pitch volatiles and lung cancer. Particular attention is given to the selection of the cohort, the follow-up procedures, and the evolution of the analysis.—Natl Cancer Inst Monogr 67: 89-94, 1985.
In 1962, the Department of Biostatistics initiated a study of the relationship between occupational exposure in steel plants and cause-specific mortality among workers, with particular reference to respiratory cancers. The cohort identified and followed for mortality consisted of approximately 59,000 men who were employed in 1953 by firms at 7 steel plants in Allegheny County and who represented approximately 62% of all men working in basic iron and steel production in the County. After forming the cohort, the investigators attempted to determine the vital status of all members as of December 31, 1961. To ensure efficient and accurate follow-up, they devised a systematic plan for controlling the flow of records to and from each of the agencies cooperating in the follow-up effort (fig. 1). As a result of this effort, only 97 of the original 59,072 employees in the cohort (0.2%) were lost to follow-up (fig. 2). The 4,716 deaths were confirmed by death certificates. The primary and contributory causes of death were determined and coded by a nosologist trained at the National Vital Statistics Division of the United States Public Health Service, according to the Seventh Revision of the "International Classification of Diseases."

Concurrently with the follow-up, employees' records were abstracted at the plants. Each job title was assigned a 4-digit code that identified the occupation as belonging to 1 of 76 subgroups within the steel industry. Approximately 14,000 distinct titles or abbreviations were noted during the assignment of job title numbers. When follow-up was completed, all the information was keypunched and prepared for analysis. Detailed analyses were performed of the cause-specific mortality for 57 work areas within the industry, and the mortality of the steelworker cohort was compared with that of the total United States male population, with Allegheny County rates, and with that of other work areas within the industry. One of the objectives of the initial steelworker study was to develop general methodology applicable to other occupational mortality studies. The procedures for determining vital status and processing job history information have served as a model for a large number of historical prospective studies in other industries. In addition, the investigators who analyzed the steelworker cohort were among the first to demonstrate the importance of an internal control group and to use the Mantel-Haenszel chi-square summary statistic to test whether the relative risk differed significantly from 1. The specific details of the design and results for several of these occupational groups have been published in a series of journal articles (1-10). In this paper, we focus on the results for the subgroup of workers employed in the coke plant.

ABBREVIATIONS: CTPV=coal tar pitch volatile; BSFTPM=benzene-soluble fraction of total particulate matter.

¹ Presented at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
² Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, 130 DeSoto Street, Pittsburgh, Pennsylvania 15261. Address reprint requests to Howard E. Rockette, Ph.D.

BY-PRODUCT COKE PLANT

The primary purpose of the by-product coke plant is the transformation of coal into metallurgical coke. A secondary function is the recovery of chemical by-products resulting from carbonization.
A by-product coke plant can be roughly divided into 3 areas: 1) the coal handling area, where the coal is stored and blended; 2) the coke ovens, where the coke is produced; and 3) the by-products plant, where gas and chemical products resulting from the carbonization of bituminous coal are recovered. The highest exposures occur in workers assigned to the coke ovens. Coal is charged through ports on top of the oven, and doors on both sides of the oven are removed to push the coke out into railroad quenching cars. The major exposures of workers in the coke oven area result from leakage around the lids or pipes at the top of the ovens, or from the oven doors because of incomplete sealing. In the coke oven study, workers within the coke plant were classified as oven or non-oven workers. Because job terminology is not standardized from plant to plant or over time, we had difficulty classifying some jobs. The coke oven group included all jobs that required some or all of the working day to be spent at the top or side of the ovens. The coke oven category was further subdivided into topside and side oven exposure. Topside work involves greater exposure to oven effluents.

[Figure 1: flow diagram. Determination of employee record status on 1/1/62 (59,072): still employed 32,263; retired prior to 1/1/62 7,842; left employment prior to 1/1/62 17,128; died during employment 1953-1961 1,839. Those not yet traced were passed successively through review of employer records, county death lists 1953-1963, R.L. Polk and telephone directories, Post Office mailing cards, the local Internal Revenue Service, the Social Security Administration, telephone contacts, the Board of Public Assistance, the Bureau of Employment Security, the Bureau of Missing Persons, and other sources; status remained unknown for 97.]
FIGURE 1.—Follow-up scheme for tracing steelworkers.

[Figure 2: flow diagram. Population employed in 1953: 59,072. Died while employed: 1,839. Retired: 7,842 (1,496 dead, 6,346 alive). Continued employment: 32,263 (alive). Left employment: 17,128 (15,650 alive, 1,381 dead, 97 unknown). Totals: 4,716 dead, 54,259 alive, 97 lost to follow-up.]
FIGURE 2.—Steelworkers employed in 1953, classified by employment status and vital status through December 31, 1961.

COKE OVEN STUDY

The first analysis of mortality patterns in the steelworker cohort by work area indicated an excess of respiratory cancer in the coke plant (4). Further comprehensive analyses (5) confirmed an excess risk of lung cancer among black men employed at the coke ovens, compared with the mortality experience of the total steelworker cohort. An excess risk of certain digestive cancers occurred in non-oven coke plant workers, but the number of deaths was too small for adequate analysis. A summary of these findings for the original cohort is given in table 1. On the basis of these results, the investigators decided to expand the study to evaluate further the extent to which the coke oven population, particularly its nonwhite segment, was at risk of developing lung cancer. In 1967, a new study was initiated at 10 additional plants in the United States and Canada.
TABLE 1.—Observed deaths and relative risks of death from selected causes by race, 1953-61, for men employed in the coke plant before 1953

                                          Coke ovens      Non-coke ovens     Total
Race       Cause of death                 Obs.   RR       Obs.   RR        Obs.   RR
White      Respiratory cancer               8    1.60       3    0.41        11    0.90
           Digestive cancer                 6    0.98      14    1.59        21    1.41
           Other malignant neoplasms        6    1.03       4    0.49        10    0.71
Nonwhite   Respiratory cancer              25    2.98*      1    —           26    2.71*
           Digestive cancer                 3    0.53       3    —            6    0.91
           Other malignant neoplasms       10    1.20       2    —           12    1.24
Both       Respiratory cancer              33    2.48*      4    0.47        37    1.70*
           Digestive cancer                 9    0.76      17    1.75*       27    1.26
           Other malignant neoplasms       16    1.13       6    0.62        22    0.93

Obs.=observed deaths; RR=relative risk.
* Value is significant (P=0.01 or P=0.05).

Methods of data collection, processing, and analysis in this study were made as compatible as possible with those of the original investigation. Because primary interest was focused on coke oven workers, the criteria for including members of this work area in the cohort were expanded: men at any of the study plants who had worked at the coke ovens at any time in the 5-year period 1951 through 1955 were added to the cohort. Men working only in the coal and coke handling or by-product areas of the coke plant were not included, because these men had previously been observed not to be at excess risk of respiratory cancer. At each of the study plants, a sample of other workers matched by race and starting date of employment was taken as a control. The 7 plants in Allegheny County that made up the original cohort, 2 of which contained coke ovens, were also included in the new investigation. The mortality observation period for all plants was extended through 1966. Table 2 summarizes some of the results from this expanded cohort.
The results confirmed an excess mortality from respiratory cancer among coke oven workers in other geographic areas and indicated that both white and nonwhite workers were at risk. The apparent difference between white and nonwhite oven workers in Allegheny County appeared to be attributable primarily to lower exposure among the whites studied. With the larger cohort, it also became apparent that genitourinary cancer showed a statistically significant excess in the coke oven cohort. Further investigation indicated that most of the excess in the category of cancers of the genitourinary system was due to cancer of the kidney. The relative risk of kidney cancer for men employed at the coke ovens at any time before 1951 was 7.49 (P<0.05), based on 8 observed deaths. This cohort was subsequently updated through 1970 and later through 1975. The results obtained from the cohort of coke oven workers through the 1970 update for Allegheny County formed a basis for the standard for exposure to coke oven emissions developed by the Occupational Safety and Health Administration in 1977. Unfortunately, because of limited resources, it was decided that the 1975 update would not include an update of the work histories for the period 1971 through 1975. The data base is presently being updated through 1982, including work histories as well as follow-up for the workers of the 12 coke oven plants and their controls.

DOSE-RESPONSE RELATIONSHIP

An important aspect of the analysis of coke oven workers has been the demonstration that the excess risk of respiratory cancer is related to the length and intensity of exposure. Table 3 illustrates the increase in risk of respiratory cancer for workers with a longer duration of exposure and for those who work topside. Among topside workers with 15 or more years' experience, 8 of 29 workers at risk (28%) died of respiratory cancer, a relative risk of almost 16.
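The Mantel-Haenszel summary chi-square, which the steelworker investigators used to test whether a relative risk differed significantly from 1, combines stratum-specific 2×2 tables (typically age groups) into a single 1-degree-of-freedom test. The sketch below is a generic textbook version, not the investigators' program: the `Stratum` layout, the optional continuity correction, and the example counts are assumptions. It also returns the Mantel-Haenszel summary odds ratio, which approximates a relative risk when deaths are rare.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Stratum:
    """One 2x2 table, e.g., one age group (hypothetical layout)."""
    exposed_cases: int
    exposed_noncases: int
    unexposed_cases: int
    unexposed_noncases: int


def mantel_haenszel(strata: Iterable[Stratum],
                    continuity: bool = True) -> Tuple[float, float, float]:
    """Return (chi-square with 1 df, summary odds ratio, two-sided p-value)."""
    sum_a = sum_expected = sum_var = 0.0
    or_num = or_den = 0.0
    for s in strata:
        a, b = s.exposed_cases, s.exposed_noncases
        c, d = s.unexposed_cases, s.unexposed_noncases
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        n_exposed, n_unexposed = a + b, c + d
        cases, noncases = a + c, b + d
        sum_a += a
        # expectation and hypergeometric variance of `a` under no association
        sum_expected += n_exposed * cases / n
        sum_var += n_exposed * n_unexposed * cases * noncases / (n * n * (n - 1))
        or_num += a * d / n
        or_den += b * c / n
    if sum_var == 0:
        raise ValueError("no informative strata")
    diff = abs(sum_a - sum_expected)
    if continuity:
        diff = max(diff - 0.5, 0.0)
    chi2 = diff * diff / sum_var
    odds_ratio = or_num / or_den if or_den > 0 else math.inf
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    return chi2, odds_ratio, p_value


if __name__ == "__main__":
    # Made-up counts for two age strata, not data from the study.
    strata = [Stratum(10, 90, 5, 95), Stratum(20, 180, 12, 188)]
    print(mantel_haenszel(strata))
```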
TABLE 2.—Observed deaths and relative risks of death for selected causes by race for coke oven workers employed during 1951-55 at 10 non-Allegheny County plants and for coke oven workers employed during 1953 at 2 Allegheny County steel plants

                                            Non-Allegheny     Allegheny
                                            County            County            All plants
Race       Cause of death                   Obs.   RR         Obs.   RR         Obs.   RR
White      All causes                        129   .88          56   .99         185   .92
           Malignant neoplasms of lung,       13   3.02*         4                17   2.05*
             bronchus, trachea
           Malignant neoplasms of              2   —             5   6.99*         7   3.49*
             genitourinary organs
           Other malignant neoplasms           9   .42           3   .43          12   .42*
Nonwhite   All causes                        259   1.01        145   1.03        404   1.02
           Malignant neoplasms of lung,       23   2.99*        29   3.77         52   3.35*
             bronchus, trachea
           Malignant neoplasms of             10   3.02*         4   —            14   1.60
             genitourinary organs
           Other malignant neoplasms          28   1.09         15                43   .94
Both       All causes                        388   .96         201   1.01        589
           Malignant neoplasms of lung,       36   3.00*        33   2.69*        69   2.85*
             bronchus, trachea
           Malignant neoplasms of             12   2.42*         9   1.76         21   2.05*
             genitourinary organs
           Other malignant neoplasms          37   .80          18   .61          55

Obs.=observed deaths; RR=relative risk.
* Value is significant (P=0.05 or P=0.01).

TABLE 3.—Observed deaths and relative risks of death from cancers of the respiratory system, 1953-70, for coke oven workers by work area and length of employment through 1953

                                 Years employed through 1953
                                 5+              10+             15+
Work area                        Obs.  RR        Obs.  RR        Obs.  RR
Coke oven                          54  3.02*       44  3.42*       33  4.14*
Oven topside full-time             25  9.19*       16  11.79*       8  15.72*
Oven topside part-time             12  2.29*       16  3.07*       18  4.72*
Oven side only                     17  1.79*       12  1.99*        7  2.00

Obs.=observed deaths; RR=relative risk.
* Value is significant (P<0.01 or P<0.05).

We used several more refined approaches to investigate a dose-response relationship. A survey of exposure to CTPV was conducted by the Pennsylvania Division of Occupational Health, independently of the steelworker study. We used its results to estimate average exposure levels for specific jobs at the ovens.
The surveyed jobs were categorized into 3 exposure groups with mean CTPV levels of 0.88, 1.99, and 3.15 mg/m³ (11). Detailed descriptions of the coke-making environment and the work performed, together with an evaluation by an industrial hygienist, provided the criteria for deciding the category in which each of the 106 coke oven jobs should be placed. Cumulative exposure was calculated for each worker in the study group through the end of 1966. Following a preliminary analysis, the exposure range was stratified into 4 intervals: less than 200, 200-499, 500-699, and 700 or more mg/m³ months. The analysis indicated that, for nonwhite workers, the association between level of exposure to CTPV and lung cancer mortality was strong (fig. 3). Judged by this analysis alone, exposure at levels close to the standard would appear to produce no excess risk, because an average exposure at such levels for 30 working years yields a cumulative exposure within the lowest exposure interval in figure 3 (less than 200 mg/m³ months). However, this method of evaluating dose-response presents several potential analytical problems. First, the follow-up period and the period over which the exposure index accumulates overlap. The bias resulting from this type of confounding can be avoided to some extent by considering workers to be at risk in different exposure groups across the observation period, thus recognizing the prospective nature of exposure. A further deficiency of this analysis is that no lag time has been incorporated, which raises the possibility of an opposite bias, i.e., overestimation of the amount of exposure-caused disease. Before the adoption of the standard, investigators used other methods to calculate dose-response relationships, with a view to estimating the minimal safe dose.
Using both linear and quadratic models and assuming exposure to constant CTPV concentrations, they used life table methods to estimate the lifetime excess risk of lung cancer mortality for a worker employed from age 20 to death or retirement (12). The model used incorporates lag periods of 0, 5, 10, and 15 years, with only partial weight given to exposures occurring during the observation period. With such a model, we determined that even exposure at a CTPV level of 150 µg BSFTPM/m³ did not carry a negligible risk. For example, if 150 µg BSFTPM/m³ is assumed to be the average exposure, the estimated relative risk under the linear model would range from 1.31 to 1.47. Nevertheless, in deciding upon 150 µg BSFTPM/m³ over an 8-hour period as the permissible limit, Occupational Safety and Health Administration personnel justified this as the lowest level that had been shown to be technologically feasible, acknowledging that such a level was not necessarily absolutely safe. After the adoption of the standard, the nature of the dose-response relationship of CTPV to respiratory cancer was investigated in additional analyses. Such analyses have been hampered by the fact that the exposure data for 10 of the plants had not been updated past 1966, and in the 2 Allegheny County coke plants, exposure had not been updated since 1970. Presently, the coke oven workers cohort is being updated for both exposure and follow-up through 1982, our primary objective being a more definitive determination of the dose-response relationship between CTPV and respiratory cancers.

[Figure 3: age-adjusted death rate per thousand (vertical axis) plotted against cumulative exposure (horizontal axis, 0 to 1000) for total mortality, cancer (all sites), and lung cancer.]
FIGURE 3.—Age-adjusted death rate, 1951-66, for specified causes among nonwhite coke oven workers by cumulative exposure (milligrams/cubic meter months) groups.
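The cumulative exposure index and its lagged variant can be sketched as below. The job-spell encoding, the function names, and the example worker are hypothetical; only the three group means (0.88, 1.99, and 3.15 mg/m³) and the four interval boundaries come from the text, and the study's actual index was built from its own job-to-group assignments. Evaluating the index at successive times lets a worker move between exposure groups during follow-up, the remedy described for the overlap bias, while the lag simply ignores the most recent exposure. The last lines show that 0.15 mg/m³ held for 360 months gives 54 mg/m³ months, inside the lowest interval; treating the standard's BSFTPM concentration as directly comparable to the CTPV group levels is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable

# Mean CTPV levels (mg/m^3) of the three surveyed exposure groups, from the text.
GROUP_LEVEL_MG_M3 = {1: 0.88, 2: 1.99, 3: 3.15}

# Cumulative-exposure intervals used in the analysis (mg/m^3 months).
INTERVALS = [(0.0, 200.0, "<200"), (200.0, 500.0, "200-499"),
             (500.0, 700.0, "500-699"), (700.0, float("inf"), "700+")]


@dataclass
class JobSpell:
    start_month: int      # months since a fixed origin (hypothetical encoding)
    end_month: int        # exclusive
    level_mg_m3: float    # average CTPV level assigned to the job


def cumulative_exposure(spells: Iterable[JobSpell], at_month: int,
                        lag_months: int = 0) -> float:
    """Exposure (mg/m^3 months) received before `at_month - lag_months`.

    With lag_months > 0, exposure in the most recent lag period is ignored,
    as in a lagged dose-response analysis.
    """
    cutoff = at_month - lag_months
    total = 0.0
    for spell in spells:
        months = min(spell.end_month, cutoff) - spell.start_month
        if months > 0:
            total += months * spell.level_mg_m3
    return total


def exposure_interval(cum_mg_m3_months: float) -> str:
    """Label of the cumulative-exposure interval containing the value."""
    for lower, upper, label in INTERVALS:
        if lower <= cum_mg_m3_months < upper:
            return label
    raise ValueError("negative exposure")


if __name__ == "__main__":
    # Hypothetical worker: 5 years in the highest group, then 10 years in the lowest.
    history = [JobSpell(0, 60, GROUP_LEVEL_MG_M3[3]), JobSpell(60, 180, GROUP_LEVEL_MG_M3[1])]
    for lag in (0, 60, 120):
        cum = cumulative_exposure(history, at_month=180, lag_months=lag)
        print(f"lag {lag:3d} months: {cum:6.1f} mg/m^3 months -> {exposure_interval(cum)}")

    # 150 ug/m^3 (0.15 mg/m^3) sustained for 30 working years (360 months)
    at_standard = cumulative_exposure([JobSpell(0, 360, 0.15)], at_month=360)
    print(f"30 years at 0.15 mg/m^3: {at_standard:.1f} -> {exposure_interval(at_standard)}")
```

In this example the same worker moves from the 200-499 interval to the lowest one once a 10-year lag is applied, which illustrates how much the choice of lag can shift a worker's classification.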
REFERENCES
(1) LLOYD JW, CIOCCO A: Long-term mortality study of steelworkers. I. Methodology. J Occup Med 11:299-310, 1969
(2) ROBINSON H: Long-term mortality study of steelworkers. II. Mortality by level of income in whites and nonwhites. J Occup Med 11:411-416, 1969
(3) REDMOND CK, SMITH EM, LLOYD JW, et al: Long-term mortality study of steelworkers. III. Followup. J Occup Med 11:513-521, 1969
(4) LLOYD JW, LUNDIN FE, REDMOND CK, et al: Long-term mortality study of steelworkers. IV. Mortality by work area. J Occup Med 12:151-157, 1970
(5) LLOYD JW: Long-term mortality study of steelworkers. V. Respiratory cancer in coke plant workers. J Occup Med 13:53-68, 1971
(6) REDMOND CK, CIOCCO A, LLOYD JW, et al: Long-term mortality study of steelworkers. VI. Mortality from malignant neoplasms among coke oven workers. J Occup Med 14:621-629, 1972
(7) LERER TJ, REDMOND CK, BRESLIN PP, et al: Long-term mortality study of steelworkers. VII. Mortality patterns among crane operators. J Occup Med 16:608-614, 1974
(8) REDMOND CK, GUSTIN J, KAMON E: Long-term mortality experience of steelworkers. VIII. Mortality patterns of open hearth steelworkers (a preliminary report). J Occup Med 17:40-43, 1975
(9) MAZUMDAR S, LERER T, REDMOND CK: Long-term mortality study of steelworkers. IX. Mortality patterns among sheet and tin mill workers. J Occup Med 17:751-755, 1975
(10) ROCKETTE HE, REDMOND CK: Long-term mortality study of steelworkers. X. Mortality patterns among masons. J Occup Med 18:541-545, 1976
(11) MAZUMDAR S, REDMOND CK, SOLLECITO W, et al: An epidemiological study of exposure to coal tar pitch volatiles among coke oven workers. J Air Pollut Control Assoc 25:382-389, 1975
(12) MAZUMDAR S, REDMOND CK: Evaluating dose response relationships using epidemiological data on occupational subgroups. In Proceedings of the SIAM Institute for Mathematics and Society (Breslow N, Whittemore A, eds). Philadelphia: SIAM, 1979
Statistical and Practical Problems of Cohort Study Design: Occupational Hazards in the Health Care Industry 1, 2
Jeanne M. Stellman 3
ABSTRACT —Many populations, particularly workers in the health care industry, are exposed to health hazards. Yet practical obstacles can make it impossible or infeasible for investigators to meet the technical requirements of the cohort method. One such experience, involving hospital workers who were members of a large health care workers’ union, is described in detail. Given that these workers are exposed to known or suspected hazards, two strategies are urged: 1) work toward the adoption of rules regarding organizational settings that would make cohort investigation practical when necessary, and 2) development of alternative means by which work can be assessed when cohort analysis cannot realistically be conducted.—Natl Cancer Inst Monogr 67: 95-100, 1985.
The success of a cohort study in answering research questions in occupational health will depend on whether sufficient data are available on the population-at-risk, its health outcomes, and its history of occupational exposure. It will also depend on whether the study, or “exposed,” population is sufficiently large. The availability of data on population, outcome, and exposure is subject to a wide variety of demographic, social, economic, political, and scientific influences, many of which are usually outside the investigator’s control. In this paper, I will examine some of the variables that bear on data availability for an industrial cohort study. Many examples will be drawn from ongoing epidemiologic, educational, and industrial hygiene work by our research group on the health and well-being of health care workers and on the effects of exposure to ethylene oxide.
In addition to serving as a convenient model, the health care industry is relevant here because of the large number of workers involved and the potentially serious hazards they face. More than 7 million people were employed in health care facilities in the United States in 1981, approximately 8% of total U.S. employment (1). Evidence continues to accumulate from investigations of hospital workers and working conditions that many hospital jobs are potentially hazardous.
1 Presented at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
2 Supported in part by Public Health Service Academic Award 5-K07-CA00730 from the National Cancer Institute and grant R808429 through the Environmental Protection Agency Cooperative Agreement with Columbia University.
3 Division of Health Administration, School of Public Health, Columbia University, 21 Audubon Avenue, New York, N.Y. 10032.
Among the potential health hazards in hospitals are several recognized human carcinogens, including ionizing radiation and asbestos. Highly mutagenic and reactive substances that have been found to be carcinogenic in laboratory animals are also present; ethylene oxide, a gas used for room-temperature sterilization, is one example. Conditions and substances toxic to human reproduction, such as anesthetic gases and some infectious agents, can be found in most hospitals. Some of the occupational health hazards in hospitals are listed in table 1.
Epidemiologic and clinical data support the hypothesis that some types of hospital work can be hazardous. Radiologists were among the first occupational groups found to be at risk for occupational cancer: excess leukemia was observed among them and attributed to their exposure to ionizing radiation (3).
Cohort studies of nurse anesthetists and anesthesiologists found elevated rates of spontaneous abortion, congenital abnormalities in offspring, and hepatic and renal disease. Some investigators have suggested that an excess of leukemia and lymphoma has been detected in this occupational group, whereas others found no elevated risks for cancer (4-6). Halogenated ether anesthetics are mutagenic in in vitro assays (7). Increased mutagenic activity has been documented in the urine of nurses who administer anticancer chemotherapeutic agents, which indicates possible inadvertent occupational exposure to these potent drugs, albeit at low doses compared with those given to patients (8). Industrial hygiene work by our group determined that chemotherapeutic agents can be spread to work and body surfaces during administration (9). Recently, Dr. Theresa Schnorr and I found that cancer was the only proportionately elevated cause of death among a cohort of hospital workers that included many nonprofessional employees. These workers were members of a large health care workers’ union and belonged to its joint labor-management benefit fund.
These findings led us to consider whether follow-up using the cohort study method would be a practical way to explore these hypotheses further. The significance of the health care industries, in terms of both the large number of employees and the potential hazards, strengthens the case for exploring the design of an occupational cohort study of health care workers.
REQUISITES FOR A COHORT STUDY
The basic requisites for a successful occupational cohort study are: 1) identification of a population-at-risk and articulation of a hypothesis about its exposures and risks; 2) sufficient numbers of individuals to study; 3) identification and quantification of the occupational exposures of the cohort members; and 4) quantification of the health experience of the cohort for the outcome under study with an adequate follow-up mechanism.
TABLE 1.— Examples of chemical hazards in health care (a)
Hazard and examples | Some possible locations
Substances
  Anesthetic gases | Operating room, recovery room
  Cancer chemotherapeutic drugs | Patient care areas
  Ethylene oxide | Central supply, operating room
  Formaldehyde | Clinical laboratories, pathology, necropsy
  Methyl methacrylate | Operating room (orthopedic surgery), clinical laboratories
  Radioisotopes | Patient care areas, laboratories
  Cleaning solutions | Throughout facility
  Others |
Infections (tuberculosis, hepatitis B, chickenpox, influenza, herpes, mumps, rubella)
Injuries (back injuries, puncture wounds, assaults by patients, electrical accidents)
Dermatitis (contact with drugs, food products, and chemicals, exacerbated by frequent hand-wetting)
Stress (patient contact, long working hours, rotating shifts, hierarchical structure, insufficient staff)
(a) See (2).
COHORT IDENTIFICATION AND DATA AVAILABILITY
Several potential sources of data are available for cohort definition and follow-up. Among the major job categories in health care given in table 2, cohorts for the professional job categories could be defined through the membership rosters of professional associations, state and local licensing boards, alumni rosters of professional schools, and individual hospital employers. Cohorts of nonprofessional employees may be assembled through employment records in individual institutions or through membership in a union or other employee association.
Once a cohort is defined, data on its members may be obtained by several methods. One method that has been used is the survey questionnaire. Several studies have succeeded in initially assembling cohorts through professional organizations; however, investigators’ efforts to achieve high levels of participation have not been uniformly successful. In our recent work on the health effects of exposure to nonionizing radiation among male physical therapists who are members of the American Physical Therapy Association, a 70% response rate was achieved with a cross-sectional mailed survey and 2 follow-up mailings (10). A cross-sectional survey of health workers exposed to anesthetic gases was done in cooperation with the American Society of Anesthesiologists, the Association of Nurse Anesthetists, and the Association of Operating Room Nurses and Technicians. The response rates for the three organizations were 67, 76, and 65% for males, and 59, 54, and 55% for females, respectively. The control cohorts were contacted through the cooperation of the American Academy of Pediatrics and the American Nurses Association. The response rates for males and females were 41 and 72%, and 44 and 42%, respectively (4). The ability to elicit the cooperation and participation of professional association members will be a limiting factor for anyone conducting a cohort study designed to use this data base.
Another method of gathering data on the outcome variable under study is to use sources other than the cohort members. The Social Security Administration would be a useful source of data on vital status if employer records or other rosters were available to identify the initial cohort. However, data on end points other than mortality, and on cause of death, would not be available.
In assembling a cohort of nonprofessional health care workers, one must usually rely on data gathered directly from their employers, except possibly for workers who belong to a labor organization or similar association. In our study of nonprofessional health care employees, conducted in collaboration with Dr. Theresa Schnorr, a large union representing approximately 150,000 members employed in hospitals and other health-related facilities was a unique source of data. Most of the union members’ health and life insurance plans and pension and death benefits derive from a self-insurance program managed by a Health and Welfare Benefit Fund. Most hospitals and health care institutions whose employees are represented by the union contribute to the Fund for all health, pension, and death benefits. Members who leave the employment of a contributing institution may elect to continue to be insured by the Fund, and members whose employers do not contribute may belong to the Fund individually. The Benefit Fund is a separate entity from the union and is governed by a board composed of labor and management representatives. It maintains records on all members, past and present; some of these records are computerized, and all are stored in a warehouse.
TABLE 2.— People employed in health care in 1981 (a)
Occupational categories | No.
Physicians, dentists, and related practitioners | 828,000
Registered nurses | 1,339,000
Therapists | 251,000
Health technologists and technicians | 643,000
Nursing aides, orderlies, and attendants | 1,131,000
Practical nurses | 403,000
Other health aides | 317,000
Dental assistants | 143,000
Health administrators | 219,000
Maintenance and manual service workers | ?
(a) See (1). Total number employed in health care industry amounted to 7,661,000, which is 8% of total workforce.
Data contained in the Benefit Fund include records of all claims made, including diagnoses, hospitalizations, births, deaths, and disability. Each diagnosis is coded under a modification of the Eighth Revision of the International Classification of Diseases, Adapted; however, the coding clerks are not extensively trained in disease coding. Release of death benefits is contingent upon presentation of a death certificate to the Fund, and the certificate is also kept in the member file. The Fund also keeps a separate index card file of all deceased members. These sources can be searched either manually or through the modified diagnosis codes, all of which are computerized. Thus, with a good deal of effort, one can identify members with a particular diagnosis or members who died while still associated with the Benefit Fund. However, one cannot define a cohort from these records, because they include only members who filed a claim during a given period and not all eligible members.
Recently, we obtained a computerized membership file from the union. To date, this is the most complete computerized archive of all members of the union, past and present, and it appears to be the only centralized source of data available to us for defining the cohort for a follow-up study. We did a computer analysis to determine whether the records of each of the 2,565 members who had died during the period from 1973 to 1979, and whose deaths were ascertained from the Benefit Fund’s index card file of deaths, were also present in the membership file. A concordance of only 50% was found between the union’s membership file and the Benefit Fund’s mortality file. Because the missing records showed no systematic pattern, we believe that the union did not handle the records of deceased members systematically.
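The concordance check described above amounts to a record-linkage comparison between two files. A minimal sketch follows, assuming (hypothetically) that both files share a member identifier field named `member_id` and that each death record carries a `death_year`; real linkage between such files may instead require matching on names, birth dates, or Social Security numbers. Tabulating the match rate by year of death is one simple way to look for the kind of systematic gaps the authors checked for.

```python
"""Sketch: how many deaths in a death index also appear in a membership file."""

from collections import defaultdict


def concordance(deaths, member_ids):
    """Fraction of death records whose member_id is in member_ids."""
    if not deaths:
        return 0.0
    matched = sum(1 for d in deaths if d["member_id"] in member_ids)
    return matched / len(deaths)


def match_rate_by_year(deaths, member_ids):
    """Match rate within each year of death, to look for systematic gaps."""
    totals = defaultdict(int)
    matched = defaultdict(int)
    for d in deaths:
        year = d["death_year"]
        totals[year] += 1
        if d["member_id"] in member_ids:
            matched[year] += 1
    return {year: matched[year] / totals[year] for year in sorted(totals)}


if __name__ == "__main__":
    # Toy data (hypothetical): four deaths, two of which are in the roster.
    deaths = [
        {"member_id": 101, "death_year": 1973},
        {"member_id": 102, "death_year": 1973},
        {"member_id": 103, "death_year": 1979},
        {"member_id": 104, "death_year": 1979},
    ]
    roster = {101, 103, 999}
    print(concordance(deaths, roster))         # 0.5
    print(match_rate_by_year(deaths, roster))  # {1973: 0.5, 1979: 0.5}
```

A roughly uniform match rate across years, as in this toy example, is what one would expect if records were lost haphazardly rather than during one particular period.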
That the union did not keep systematic records of deceased members is not surprising: unlike the Benefit Fund, which has a legal obligation to provide benefits to dependents and to maintain the addresses of retirees for pension purposes, the union provides services only to the living. Thus, even in a central membership record, enough data had been lost that all records would have to be searched manually to assemble a cohort of past employees, and it is not clear whether sufficient records have been maintained to accomplish that task. Furthermore, no current addresses would be available, nor would it be known whether a member was living or dead if the death was not recorded in the Benefit Fund file. In addition to the major clerical effort required, a substantial loss of cohort members would have to be anticipated.
Despite the limitations described here, centralized records appear to be superior to other available data sources. However, it is estimated (Research Department, National Union of Hospital and Health Care Employees: Personal communication) that only 15% of the nonprofessional workforce belongs to a labor union, which is typical of other areas of work (less than 20% of the workforce in the United States belongs to a labor organization). For the vast majority of nonprofessional employees, who are not union members, records would have to be assembled with the cooperation of the individual employer.
We can make some estimates of the likelihood of success of a study based on data obtained from employers. During the course of our ongoing study, 46 hospitals and nursing homes were approached for information on the history of use of ethylene oxide sterilizers in their facilities. An explanatory letter, background information on the study, subjects’ approval, and approval by the joint labor-management board of the Benefit Fund were all presented to each institution.
A second letter and a phone call followed. Among the 46 facilities, the following responses were obtained:
Responses | No. | Percent
Ethylene oxide units in use (of the 26 facilities returning questionnaires)
  Now | 16 | 62
  Formerly | 1 | 4
  Never | 9 | 35
Questionnaires (of all 46 facilities)
  Returned | 26 | 57
  Not returned | 12 | 26
  Refused | 8 | 17
Given this lack of responsiveness by the employers, it is reasonable to assume that a researcher seeking cooperation from individual employers in the health care industry may encounter significant difficulties, perhaps severe enough to make a cohort study impossible. This assumption is strengthened when we note that the employers in our study were already party to a collective bargaining agreement with the union and were already providing extensive information to a “central” source; they were therefore sensitized to similar requests. In addition, the research team came from a prestigious university and medical center that maintained close contacts with many of the employing institutions. Despite these factors, which would ostensibly maximize the likelihood of cooperation, cooperation was difficult to elicit. Legal issues about liability and culpability under federal health and safety laws, along with other concerns, were the stated reasons for refusal. Difficulties in employer-employee relations are an important aspect of most studies of occupational health hazards. Researchers must take into account the highly politicized nature of the problem, which exacerbates all the other practical problems of managing a cohort study described here.
Once a cohort is successfully defined, the investigating team’s ability to gather the necessary data on cohort members will be affected by the recordkeeping practices of the hospital, association, or state agency holding the roster.
If records of terminated, retired, or deceased employees are not retained or kept up-to-date, the tasks of cohort identification, data gathering, and future follow-up will be greatly complicated, if not made impossible.
ESTIMATES OF EXPOSURE
An essential element of any occupational health cohort study is the linking of health outcomes with working conditions. It should be ascertained that cohort members are indeed exposed to the conditions under study, and the nature, level, and duration of exposure are also needed so that appropriate dose-response estimates can be made. The potential health hazards in the hospital are many and varied (table 1), as is typical of many working environments. Some complicating features include: 1) Members of the study cohort who hold the same or similar job titles could potentially be exposed to several different hazards. 2) Environmental hazards, if present simultaneously, may act synergistically. 3) The nature and level of exposure to any one hazard may vary from day to day, month to month, and year to year. 4) Most exposures will be poorly defined, and the chemical identity of many compounds will be unknown. 5) Few or no records will have been kept on the exposures. 6) Few, if any, linkages will be readily available between particular exposures and the workers who perform the tasks.
Ionizing radiation may be an exception, because employees with known exposure must be provided with a film badge or other personal exposure monitor at all times. Unfortunately, despite the potential utility of this data source for further elucidating the effects, if any, of occupational exposure to ionizing radiation in health care settings, no systematic cohort studies have been done, nor has any central registry been maintained for even a segment of the health care population assigned film badges.
It is significant that many workers, such as laundry workers, who are not directly assigned to tasks handling radioactive substances or patients and have not been provided with radiation monitors, may nonetheless be occupationally exposed to radiation from laundry contaminated by patients who received radioactive implants, or from other wastes. Such “bystanders” to occupational exposures are common throughout industry.
Ethylene oxide, which illustrates the generalizations made above particularly well, is a widely used gas sterilant for nondisposable heat- or water-sensitive equipment. Ethylene oxide sterilizers may be found in the central supply facilities of many institutions. They may also be located in areas such as the operating room; cardiac catheterization laboratories; ear, nose, and throat clinics; dental clinics; urology clinics; inhalation therapy laboratories; tissue banks; intensive care units; and laboratories where routine tests are performed. Health care workers in many job categories may therefore be bystanders to exposure to ethylene oxide. Conversely, workers chiefly responsible for gas sterilization may simultaneously be exposed to other toxic agents, such as highly reactive glutaraldehyde, formaldehyde, and other sterilants, or to UV sterilization processes. A further complication for a cohort study is that the use of ethylene oxide sterilizers varies from institution to institution; thus in any cohort drawn from many institutions, each member’s exposure to this agent must be carefully corroborated.
Finally, and again typical of many industrial situations, the conditions encountered in the workplace in past years may no longer apply when measurements are made by the research team and, if the study is prospective, may not hold in the future.
This is likely to be particularly true when there is current evidence of adverse effects. For ethylene oxide, for example, extensive sterilizer modification programs and other controls are mandated by the Occupational Safety and Health Administration in regulations published in 1984 (11). It should be assumed that exposures to ethylene oxide will be at or below 1 part per million (ppm) once this standard is in place. The primary exposure occurs during the transfer of sterilized materials from the sterilizers to the aerators, where they undergo an air purge for several hours, and during changes in the sterilizing cycle when the ethylene oxide is drained. Operator exposures are nonetheless variable (tables 3, 4). Also noteworthy from an industrial hygiene standpoint is that for several years the sterilization process did not include aeration; it is therefore likely that some workers were heavily exposed to ethylene oxide gas in the past from handling unaerated items.
TABLE 3.— Summary of sterilizer conditions and results: Survey No. 1 (a)
Upper probe: mean level, ppm | Upper probe: peak time, sec | Lower probe: mean level, ppm | Lower probe: peak time, sec
1 | 1 | 1 | 1
10 | 150 | 1 | 1
1 | 1 | 1 | 1
12 | 23 | 1 | 1
4 | 23 | 1 | 1
6 | 28 | 1 | 1
(a) See (12).
The exposures to ethylene oxide in the hospital can be contrasted with those in industrial cohort studies that have succeeded in establishing a firm link between a particular exposure and an outcome (13). Asbestos exposure among insulation workers, ionizing radiation among radiologists, vinyl chloride among vinyl chloride tank cleaners, and radon among uranium miners are some examples. In each instance, an intense and unique exposure was identifiable: workers could clearly be identified as working primarily and consistently with a defined exposure, at levels higher than those likely to be encountered in virtually any service industry setting, such as health care.
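The contrast between brief, high readings and a 1 ppm limit can be made concrete with a time-weighted average (TWA). The sketch below assumes, as is usual for OSHA permissible exposure limits, that the limit is expressed as an 8-hour TWA; the episode durations, the number of door openings, and the background level are hypothetical, with only the 150 ppm and 0.1 ppm readings taken from table 4.

```python
"""Sketch: 8-hour time-weighted average (TWA) exposure from short episodes."""

SHIFT_MINUTES = 8 * 60


def twa_8h(episodes, background_ppm=0.0):
    """Return the 8-hour TWA concentration in ppm.

    episodes: iterable of (concentration_ppm, duration_minutes) pairs.
    Minutes of the shift not covered by an episode are assigned
    background_ppm.
    """
    episodes = list(episodes)
    exposed_minutes = sum(minutes for _, minutes in episodes)
    if exposed_minutes > SHIFT_MINUTES:
        raise ValueError("episodes last longer than an 8-hour shift")
    dose = sum(ppm * minutes for ppm, minutes in episodes)
    dose += background_ppm * (SHIFT_MINUTES - exposed_minutes)
    return dose / SHIFT_MINUTES


if __name__ == "__main__":
    # Hypothetical shift: four 1-minute sterilizer door openings at 150 ppm,
    # with room air at 0.1 ppm for the rest of the shift.
    twa = twa_8h([(150, 1)] * 4, background_ppm=0.1)
    print(f"8-hour TWA: {twa:.2f} ppm")
```

Under these made-up durations, four brief openings bring the TWA to about 1.35 ppm, above 1 ppm, even though the ambient room air is far below the limit; this is one way to see why operator exposures depend so heavily on work practices.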
TABLE 4.— Representative ethylene oxide levels from gas sterilizers (a)
Sterilizer size, ft³ | Location of monitor | Ethylene oxide level, ppm
8.8 | In front of open door | 50
24 | In front of open door | 150
24 | In front of open door | 40
24 | Room ambient air | 0.1
(a) All measurements were taken immediately upon completion of the gas sterilization cycle after normal purging of ethylene oxide from the sterilizer (Stellman JM, Aufiero BM: Unpublished survey results).
OUTCOME MEASURES
The practical problems associated with accurate ascertainment of outcomes in any cohort study will apply to a cohort study designed around occupational hazards. However, an additional problem is associated with the choice of an appropriate comparison group. The “healthy worker effect” is a term commonly applied to one aspect of the problem of control groups. This effect has already been widely discussed (14); it refers to the fact that the working population tends to be healthier, as measured by most health outcomes, than the general population, which includes those not healthy enough to be employed. The current working population is a “survivor” population, because only those healthy enough to hold a job will be assembled into such a cohort. It will usually take many years of observation before the health outcomes of an industrial population begin to approach those of the general population.
One option often used to overcome the healthy worker effect is to compare one working population with another. Using a second industrial population for comparison, however, always carries the danger that the comparison group is also an exposed population, albeit exposed to different substances or conditions. An example discussed previously is the extensive research on coke oven workers that established their high levels of risk. In those studies, non-coke oven steelworkers served as the controls.
In an analysis of the lung cancer risk of the steelworker comparison group, Kabat and I (15) found the steelworkers’ lung cancer risk to be equivalent to that of a 2-pack/day smoker. Thus the extremely elevated risks observed for the coke oven workers may underestimate the true risk.
STATISTICAL LIKELIHOOD OF OBSERVING AN EFFECT
Ultimately, the utility of even a well-defined and well-followed industrial cohort with specific known exposures will depend on the investigator having numbers large enough to provide the statistical power to detect an effect. In the Stellman and Schnorr study (unpublished), a disproportionately elevated rate of leukemia and lymphoma was observed. To test whether the rates of leukemia and lymphoma that can be attributed to occupational exposures are truly elevated, one would require approximately 100,000 person-years of exposure in the study group to detect a relative risk of 2.5, with α = .05 and β = .2 (assuming an age-adjusted rate in the unexposed population of about 17/100,000). In 1970, approximately 52,000 radiologic technicians were working in the United States, according to the Commerce Department (1). Because 100,000 person-years divided among 52,000 technicians is about 2 years, a cohort study of the risk of excess leukemia and lymphoma in this group would require a minimum of 2 years even if every technician were successfully enrolled in the cohort. This estimate of 2 years of observation is a “best-case” analysis for several reasons: 1) Many of the technicians will have been employed for less than 5 years and thus could not have been occupationally exposed long enough for the disease to have developed. The exposed population at any time will include new employees and will lose older, already exposed members for reasons related and unrelated to health. 2) Many of the technicians will have only minimal or no exposure to radiation. 3) The age-adjusted rate is not equivalent to the age-specific rate.
Because most occupational cohorts, particularly in the technical professions, are young on average, the rate in the unexposed population will be lower than in the approximations calculated here. 4) It is improbable that all the technicians could successfully be enrolled in the study.
If we make similar approximations to estimate the study period needed to explore the possible relationship between nasal cancer and formaldehyde, or between hepatitis B and liver cancer, among the approximately 118,000 clinical laboratory technicians, the years of observation required to complete a cohort study become even greater. Assuming an age-adjusted rate of about 2/100,000 for each of these rare cancer sites, we can calculate that it would require almost 5 years of observation of all the technicians in the United States to detect an excess risk of about 2.5. That calculation assumes that the entire group is exposed and that, at the outset of the study, each member of the cohort had been exposed long enough for any cancers observed to be attributed to the occupational environment. Neither of these presumptions is justified, and at least a 10-year study appears to be a more realistic estimate.
The implications of the need for a 10-year study are profound, even if an agency could be found to sponsor such a long study, which is itself unlikely. It is almost certain that the Occupational Safety and Health Administration’s new standards will drastically limit exposures before a 10-year study could be completed (17). It also appears that a hepatitis B vaccine will soon be produced at low enough cost to be available to most health technicians, who would be among the highest-priority recipients. Should these 2 events take place before sufficient person-years of observation had accrued, the study could never be brought to a successful conclusion.
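The paper does not state the formula behind its person-year figures. One common textbook approximation, sketched below as an assumption rather than the author's method, compares two Poisson rates with equal person-years in the exposed and unexposed groups, using a two-sided α of .05 and power of 80% (β = .2). With a base rate of 17/100,000 and a rate ratio of 2.5, it gives roughly 72,000 person-years per group, somewhat fewer than the approximately 100,000 cited for the leukemia and lymphoma example; with 2/100,000, it gives about 610,000 person-years, or a little over 5 years for 118,000 technicians. The `fraction_exposed` parameter is a hypothetical device for showing how follow-up stretches when not everyone is exposed.

```python
"""Sketch: person-years needed to detect a rate ratio, and the implied
length of follow-up for a cohort of a given size."""

from statistics import NormalDist


def person_years_per_group(base_rate, rate_ratio, alpha=0.05, power=0.80):
    """Person-years needed in each of two equal-sized groups.

    Normal approximation for the difference of two Poisson counts,
    two-sided test:
        T = (z_{1-alpha/2} + z_{power})^2 * (r0 + r1) / (r1 - r0)^2
    where r0 is the unexposed rate and r1 = rate_ratio * r0.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    r0 = base_rate
    r1 = base_rate * rate_ratio
    return (z_alpha + z_beta) ** 2 * (r0 + r1) / (r1 - r0) ** 2


def years_of_follow_up(person_years, cohort_size, fraction_exposed=1.0):
    """Years needed if cohort_size * fraction_exposed people are followed."""
    return person_years / (cohort_size * fraction_exposed)


if __name__ == "__main__":
    # Leukemia and lymphoma: base rate 17/100,000, radiologic technicians.
    py_leuk = person_years_per_group(17 / 100_000, 2.5)
    print(f"leukemia/lymphoma: {py_leuk:,.0f} person-years per group")
    # The paper's own figure: 100,000 person-years over 52,000 technicians.
    print(f"paper's figure: {years_of_follow_up(100_000, 52_000):.1f} years")

    # Rare sites: base rate 2/100,000, clinical laboratory technicians.
    py_rare = person_years_per_group(2 / 100_000, 2.5)
    for frac in (1.0, 0.5):  # 0.5 is a hypothetical exposed fraction
        yrs = years_of_follow_up(py_rare, 118_000, frac)
        print(f"rare sites, {frac:.0%} exposed: {yrs:.1f} years")
```

Halving the exposed fraction doubles the required follow-up, to roughly 10 years in this approximation, which is consistent with the author's view that a study of at least 10 years is more realistic than the best case.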
Interference with a study design from social factors of this kind, such as new exposure standards or vaccines, is to be expected in most research that appears to be establishing a human risk.
The likelihood of detecting an effect increases greatly when an outcome other than cancer, or other chronic disease with similar statistical characteristics, is considered. Adverse reproductive outcomes are one example: adverse outcomes of pregnancy are much more frequent than cancer. The average couple can be expected to produce approximately 2 births per lifetime, and birth defects occur with a frequency of about 2/100 births. Neural tube defects, which are much rarer, occur with a frequency of approximately 0.01-1.0/100 births. By contrast, the rate of all cancers is about 120/100,000, and that of individual sites is much lower. In addition, the latency for a reproductive event can be as brief as 9 months. Thus one’s ability to detect an effect is orders of magnitude greater for reproductive outcomes than for chronic diseases such as cancer, because both the observation period and the numbers needed for observation are so much smaller. The smaller study populations and observation periods for such outcomes also permit the researcher to be more selective in choosing appropriate comparison groups.
A major drawback is that studies of outcomes such as reproductive events entail far fewer years of exposure, so that the effects of low doses over long periods may not be apparent or may not even be occurring. Thus reproductive outcomes may not provide a model for the scientific questions relating to long-term, low-level exposure to toxic conditions. In addition, some agents may produce no toxic reproductive effects at all and yet represent a health hazard to the adult worker through different toxicologic mechanisms.
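Why the required numbers shrink so sharply can be seen from a standard sample-size approximation, taken here as an assumption since the paper gives no formula. For two groups of equal size, a baseline frequency $p_0$ (per birth or per person-year), and a relative risk $RR$, treating events as rare Poisson counts (only approximate for birth defects at 2/100), the size needed per group is roughly as follows, so for fixed α, β, and RR it scales as $1/p_0$:

```latex
n \approx \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 (1 + RR)}{(RR - 1)^2\, p_0},
\qquad
\frac{n_{\text{cancer site}}}{n_{\text{birth defects}}}
  = \frac{p_{\text{birth defects}}}{p_{\text{cancer site}}}
  = \frac{2/100}{2/100{,}000} = 1{,}000 .
```

With α = .05 (two-sided), β = .2, and RR = 2.5, this gives about 610 births per group for birth defects at 2/100, against about 610,000 person-years per group for a cancer site at 2/100,000.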
Other end points for occupational cohort studies, including biologic markers such as chromosomal aberrations, body fluids, adducts to proteins, and DNA, have been widely discussed and would also be expected to yield greater statistical power. However, the essential links between the presence of the markers and the actual development of disease, disability, or death remain to be established for virtually all these end points, which limits their utility at present.
CONCLUSIONS
Cohort studies have clearly played a major role in demonstrating that exposure to certain occupational environments can and does result in the development of occupational diseases such as cancer. However, the practicality of the cohort approach to the study of chronic diseases in many occupational settings is limited. Drawing upon research experiences in the health care environment, I have described the various pitfalls that can arise during the design, implementation, and interpretation of cohort studies. These include the inability of investigators to define and follow a cohort accurately; poor or nonexistent data on exposure; inconsistent or mixed exposure histories; numbers of exposed workers too small to provide adequate statistical power; and limitations in assembling an appropriate comparison group. It is essential that the researcher consider each of these potential pitfalls before undertaking a costly and time-consuming cohort study. One cannot avoid the conclusion that cohort studies will frequently be unsuccessful unless they involve massive record linkage and nationwide cooperation. One can also conclude that it is imperative, for the public’s health and for the prevention of occupational diseases in the future, that a national plan be developed to permit such record linkage to be accomplished outside the private employment sector.
Social Security records are one potential source for such linkage, because its records of employers and their employees allow a cohort to be both defined and located. Researchers who identify certain processes as hazards to be studied could link them to employer identification and to the identification of all past and present employees, who could then be contacted through the Social Security Administration's records on individuals. One can also reasonably conclude that sometimes, no matter how complete the records, epidemiologists using the classic cohort study design will not be able to provide the evidence needed for disease attribution. In such instances, animal studies, correlations of chemical structure with substances of known toxic properties, and other similar evidence, as well as case-control studies, may be useful for estimating risk and developing appropriate preventive responses. I emphasize that these methods also have limitations. The ultimate social utility of an industrial cohort study is to provide knowledge about potential risk factors, so that further exposure is prevented and clinicians can diagnose as occupational diseases those illnesses that may already have occurred. The great majority of occupational exposures identified as risk factors can be eliminated through appropriate technology. When an authority determines that cohort studies cannot be performed, the inability to design a study successfully can itself be incorporated into the prevention strategy. That is, many current strategies require or propose that human evidence of ill effects be established before standards limiting exposure levels are promulgated. Knowledge of the limits of a cohort study in providing such evidence could modify this requirement and lead to the acceptance of alternative methods of risk assessment and policy development.
REFERENCES
(1) Bureau of the Census: Census of Population, 1970. Detailed Characteristics.
Final Report PC(1)-D1. Washington, D.C.: U.S. Govt Print Off, 1973
(2) STELLMAN JM: Women’s Work, Women’s Health: Myths and Realities. New York: Pantheon, 1983
(3) MARCH HC: Leukemia in radiologists. Radiology 43:275-278, 1944
(4) COHEN EN, BROWN BW, BRUCE DL, et al: Occupational diseases among operating room personnel: A national study. Anesthesiology 41:321-340, 1974
(5) EDLING C: Anesthetic gases as an occupational hazard: A review. Scand J Work Environ Health 6:85-93, 1980
(6) LOW S: Mortality experience among anesthesiologists. Anesthesiology 51:195-199, 1979
(7) BADEN JM, KELLY M, WHARTON RS, et al: Mutagenicity of halogenated ether anesthetics. Anesthesiology 46:346-350, 1977
(8) FALCK K, GROHN P, SORSA M, et al: Mutagenicity in urine of nurses handling cytostatic drugs. Lancet 1:1250-1251, 1979
(9) AUFIERO BM, STELLMAN JM, TAUB RN: A novel approach in air sampling cancer chemotherapeutic agents. In Progress in Cancer Control. II. A Regional Approach (Mettlin C, Murphy G, eds). New York: Alan R. Liss, 1983
(10) STELLMAN JM, STELLMAN SD: Health effects of radiofrequency radiation in a cohort of physical therapists. Am J Epidemiol 112:442, 1980
(11) BARKO N (ed): OSHA proposes new EtO standard. Women’s Occup Health Resource Center News 5:1, 1983
(12) KORPELA DB, MCJILTON CE, HAWKINSON T: Ethylene oxide dispersion from gas sterilizers. Am Ind Hyg Assoc J 44:589-591, 1983
(13) SCHOTTENFELD D, HAAS J: Carcinogens in the workplace. CA 29:144-168, 1979
(14) FOX AJ, COLLIER PF: A survey of occupational cancer in the rubber and cablemaking industries: Analysis of deaths occurring in 1972-1974. Br J Ind Med 33:249-264, 1976
(15) STELLMAN JM, KABAT G: An assessment of the health effects of coke oven emissions germane to low-level exposures. Washington, D.C.: Environmental Protection Agency, 1978
Discussion III¹,²
D. Schottenfeld: A methodologic issue suggested by Dr.
Jeanne Stellman’s presentation is the relationship of a standardized PMR analysis to the planning and conduct of a cohort study. Two commonly used measures of differential mortality in epidemiologic studies are the PMR and the SMR. As in the standardization procedure used to obtain an SMR, one can calculate a standardized PMR using the age-, sex-, race-, and cause-specific mortality data of a standard population. The SMR is the statistic of choice when the characteristics of the population-at-risk are known. The PMR, on the other hand, is useful for generating hypotheses about cause-specific risks when the available data consist only of deaths, without knowledge of the characteristics of the population from which they were derived. The cause-specific PMR is the ratio of the proportion of deaths from a specific cause in an exposed group to the corresponding proportion in an unexposed group, adjusted for age, sex, race, and other confounding variables. As noted by Kupper et al. (1), the cause-specific PMR may be used to estimate a cause-specific SMR under the assumption that the overall death rate in each age group is equal in the study and comparison groups. If this assumption is untenable, an elevated PMR may reflect a real increase in the risk of the cause-specific mortality, a decrease in risk for one or more other causes of death, or both. The PMR analysis may be particularly useful when true differences in mortality exist for only 1 or 2 causes of death and when the mortality pattern for all remaining causes is similar in the study and comparison groups. Wong and Decoufle (2) explored the applicability of the PMR as a measure of risk in occupational studies. The cause-specific PMR equals the cause-specific SMR when each age-specific, all-cause SMR of the study group is equal to 1 (100%).
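The relation between the standardized PMR and the SMR described in this remark can be checked numerically. The following is a minimal Python sketch, not a method used by the workshop participants. `AgeStratum`, `smr`, and `standardized_pmr` are hypothetical names, and the two age bands use made-up counts and rates. The PMR here is indirectly standardized by age only. Expected cause-specific deaths are computed from the study group's own all-cause deaths and the standard population's proportion of deaths due to the cause. In the example every age-specific all-cause SMR is 0.8, which is the condition under which the cause-specific PMR equals the cause-specific SMR divided by the all-cause SMR exactly.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AgeStratum:
    """One age band: study-group counts plus standard-population rates (hypothetical layout)."""
    person_years: float     # study-group person-years at risk
    deaths_all: int         # study-group deaths, all causes
    deaths_cause: int       # study-group deaths from the cause of interest
    std_rate_all: float     # standard all-cause death rate per person-year
    std_rate_cause: float   # standard cause-specific death rate per person-year


def smr(strata: list[AgeStratum], all_causes: bool = False) -> float:
    """Indirectly standardized mortality ratio: observed / expected deaths."""
    if all_causes:
        observed = sum(s.deaths_all for s in strata)
        expected = sum(s.person_years * s.std_rate_all for s in strata)
    else:
        observed = sum(s.deaths_cause for s in strata)
        expected = sum(s.person_years * s.std_rate_cause for s in strata)
    return observed / expected


def standardized_pmr(strata: list[AgeStratum]) -> float:
    """Age-standardized PMR, computed from study-group deaths only.

    Expected cause deaths in each age band = study-group deaths from all causes
    x the standard population's proportion of deaths due to the cause.
    """
    observed = sum(s.deaths_cause for s in strata)
    expected = sum(s.deaths_all * s.std_rate_cause / s.std_rate_all for s in strata)
    return observed / expected


if __name__ == "__main__":
    # Hypothetical data: each age-specific all-cause SMR is 0.8 (16/20 and 40/50).
    strata = [
        AgeStratum(10_000, 16, 6, 0.002, 0.0005),
        AgeStratum(5_000, 40, 15, 0.010, 0.002),
    ]
    smr_cause = smr(strata)
    smr_all = smr(strata, all_causes=True)
    print(f"cause-specific SMR = {smr_cause:.2f}")                  # 1.40
    print(f"all-cause SMR      = {smr_all:.2f}")                    # 0.80
    print(f"standardized PMR   = {standardized_pmr(strata):.2f}")   # 1.75
    print(f"SMR ratio          = {smr_cause / smr_all:.2f}")        # 1.75
```

With these numbers the standardized PMR of 1.75 overstates the cause-specific SMR of 1.40 because the group's overall mortality is low. This is the healthy worker pattern that Jablon raises later in the discussion. Note also that `standardized_pmr` never reads `person_years`, which is why a PMR can be computed from deaths alone.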
As a generalization, the cause-specific PMR may be converted into the cause-specific SMR as follows:
Cause-specific PMR = cause-specific SMR / all-cause SMR.
The PMR will overstate risks when the study group’s overall mortality is lower than that of the comparison group and, conversely, will understate risks when the study group’s overall mortality is higher than that of the comparison group (3). The utility of the PMR in providing early warning signals in an industrial surveillance program will depend on the configuration of the age-specific mortality rates for other causes in the study and comparison groups, the magnitude of the true difference in cause-specific mortality risk, the valid ascertainment of all deaths, and the program's capability to relate deaths to measures of exposure.
ABBREVIATIONS: PMR = proportionate mortality ratio(s); SMR = standardized mortality ratio(s).
¹ Conducted at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
² Address reprint requests to Lawrence Garfinkel, Epidemiology and Statistics Department, American Cancer Society, 4 West 35th Street, New York, N.Y. 10001.
J. Stellman: We received the entire membership file from the union, and it is a tremendous act of trust for a union to turn over its entire computerized file. We must remember that these files are maintained not by scientists but by the benefit fund and the union for their own specific purposes. In our analysis, we could find only a 50% match between the deaths that we located in the union mortality file (a manual file of index cards for every death certificate submitted for death benefit payment) and the membership file. Each of the deaths is supposedly “archived” onto the membership file, which would make that file especially useful for a mortality study of such a transient group.
P.
Landrigan: In regard to proportionate mortality studies, I think none of us would pretend that PMR studies provide the definitive answer to anything. However, they do have a place in the hierarchy of studies that can be done to evaluate etiology. That place lies somewhere between the negligible value of an individual case report and the gold standard of a cohort mortality study followed for enough years to provide good information on latency. The great advantage I see in using proportionate mortality studies as a first tool to lever into a data set is that they can be done quickly. They provide a means for rapid response to a perceived crisis or for the quick assessment of a suggested hypothesis. We find that, if we obtain a proper set of data, we can do a PMR study in a matter of weeks with concentrated effort. Even with all the resources we have available, an SMR study takes anywhere from 2 to 3 years.
S. Jablon: May I ask whether the fact that you can do it in a hurry is necessarily an advantage, especially for occupational studies? We all know about the healthy worker effect, which is much stronger for diabetes and cardiovascular diseases than it is for cancer. If you do a proportionate mortality analysis on an employed population, you are almost guaranteed a high PMR for cancer mortality. The great danger, then, is that you are going to scare a lot of people for no good reason; that is just a fact.
Landrigan: I think the only way to answer that is to apply wisdom in the interpretation of your data. If you come up with a ratio of 1.5, or even 2, I think it well behooves one to be cautious in its interpretation. If you come up with a higher ratio, you probably have found something.
Jablon: If you can persuade newspaper reporters that a PMR of 2 is not worth reporting in big headlines, that is fine, but even investigators are not always as cautious as you recommend. Dr.
Selikoff made a point about the accuracy of death certification that is worthy of comment. It seems to me that the accuracy of death certification for individual cancer sites is really the Achilles’ heel of the cohort study. What are we going to do about it? We are not going to get autopsies on a large fraction of any cohort, and we do know, or think we know, that cancer is not a single disease but a collection of diseases. Furthermore, we want to know about the individual diseases, not just about cancer as a whole, which is perhaps fairly well diagnosed on death certificates. I wonder if it is conceivable to mount a decent study of the accuracy of death certificates with respect to cause of death. There have been several studies, but all those that I know of suffer from the same defect. They are based on particular autopsy series, and every autopsy series is a selected one. Such series include many cases that gave the attending physician a problem or represented diseases of particular interest. Is it conceivable that one could obtain a reasonably large, unselected autopsy series that would permit an evaluation of the accuracy of death certification for the particular kinds of cancer that interest us, so that we would know which sites can be analyzed and which cannot? I see we have in the audience representatives of the Government and of other agencies who are in possession of vast sums of money. I wonder if this is something that can be accomplished.
H. Rockette: I would like to make a point that is often overlooked in regard to the desirability of autopsies to confirm causes of death. From a scientific standpoint, a firmer diagnosis is desirable, but I sometimes believe people overemphasize the resulting bias in many epidemiologic studies.
Clearly, death certificates are insufficient for determining the incidence or prevalence of a disease, but in most occupational mortality studies we compare cause-specific mortality in the study group with that in a control group. If the percentage of incorrect ascertainment is the same in the study and control groups, no bias results. For example, in the steelworkers cohort, it is not clear that the diagnoses in the original study would have differed between men employed in the coke ovens and other steelworkers. If the association of disease and exposure has been highly publicized before the study was conducted, and particularly if development of a disease may result in financial remuneration, then a comparison may be biased. However, my point is that a certain percentage of misdiagnosis does not by itself invalidate comparisons of mortality patterns in 2 groups.
Jablon: May I point out a particular case? Consider cancer of the pancreas. In recent years, we have heard a lot of discussion about various factors in relation to cancer of the pancreas; my impression is that you just cannot really study this cancer on the basis of death certificates.
Rockette: I agree with you up to a certain point. Better diagnosis is certainly desirable. However, I still believe the number of times cancer of the pancreas appears on the death certificate is related to the actual occurrence of the disease. If a misdiagnosis occurs a certain percentage of the time, the conclusion about an excess or deficit in the study group will still be valid provided misdiagnosis is comparable in the study and control groups. In fact, better diagnosis in the study group (by epidemiologists requiring autopsies) might well introduce a bias if no autopsies are required in the control group. I would also like to comment on the previous discussion of the PMR.
One problem that I think was overlooked by some of the speakers defending PMR studies is that you obtain only those deaths known to the employers. Thus you have not only the potential bias of the healthy worker effect but also a potential bias in which deaths are known to the employer. This bias can lead to overascertainment or underascertainment of specific causes. For example, in the study of steelworkers, a larger proportion of cancer deaths than of cardiovascular deaths was unknown to the employer, so a PMR study would have underascertained cancers. However, we have done this same investigation for other cohorts, and a general statement is not possible, because sometimes the cancers are underrepresented and sometimes they are overrepresented. The primary point is that deaths known to the employer, which are what most PMR studies use, may not be representative of the overall mortality experience. For this reason, I think it is difficult to justify the PMR as a screening device. I am not sure we are better off doing many “quick and dirty” studies instead of fewer well-designed studies.
J. Higginson: The kind of study that you mentioned has been conducted in Sweden, where all deaths in a specified period in Lund were autopsied. I think we should stick to the word “occupation” rather than the old-fashioned term “worker.” Pathologists have had a wide range of potential chemical exposures, etc., and yet they are not legally workers. Now I suspect that in the long run the real problem in a cohort study will be the absence of a health effect. If so, how will you convince members of the study group that they are not at an unusual risk of cancer? Developing appropriate cohort studies in such groups will be most difficult, although, as seen in the occupational mortality study done in the United Kingdom, the differences in health experiences were enormous.
In regard to overall death, a doubling was noted in social class 5 compared with social class 1, which implies an enormous burden of disease. I think we should be most conscious of this fact. Indeed, since the time of Farr, unemployed workers have shown far worse health experiences than the employed. Thus, from a practical point of view, more attention should be paid to the importance of negative findings and to whether concentrating on studies likely to be negative may divert research resources from the study of greater cancer problems. It is a priority decision. We do not have the organizational structure to tackle large socioeconomic and occupational studies at the same time, and perhaps we are concentrating on the wrong type of cohort, on big industry rather than small.
N. Mantel: I have been involved in 2 studies in which we used age-adjusted PMR. Proportionate mortality is used in instances when you just cannot help yourself. One of them was the work on cancer in chemists with Drs. Frederick Li, Joseph Fraumeni, and Robert Miller. The other was a study of breast cancer in relation to the use of antipsychotic drugs in a mental institution. I can say one thing in favor of proportionate mortality. I am always concerned with age adjustment, especially in cancer. Ordinarily the age intervals should be comparatively fine, but fine age adjustment is less important with the PMR, because total death rates rise sharply with age just as cancer rates do. There is a self-adjusting feature in that. In fact, in the work on the major antipsychotic drugs and breast cancer, we did not use information on just who received the major antipsychotic drugs. Our information on the makeup of the populations was not good, so we had to differentiate between breast cancer occurrence before and after the advent of the major antipsychotic drugs.
I think we would not have encountered healthy worker effects in studies like this, but PMRs have some value and feasibility, as does the use of age adjustment. You do not need fine age groupings.
L. Kurland: I would like to address the point raised by Mr. Jablon on the accuracy of death certification. If a high percentage of deaths also have autopsy confirmation, as in Malmö, Sweden, or Olmsted County, Minnesota, the death certificates prepared by pathologists will have a different order of accuracy than those prepared by clinicians. Accuracy of death certification also varies with the place of death and the age of the decedent. Deaths occurring in hospitals are more likely than those occurring elsewhere to have the clinical details and supportive data from which an accurate certification is formulated. Age at death often influences place of death; the elderly die in nursing homes or at home more often than younger persons do. Consider brain tumors. Suppose a patient with a histologically diagnosed tumor dies at home or in a nursing home and is seen by a general physician who may not know of the earlier studies. The cause of death will then more likely be recorded as "brain tumor," without specifying whether the lesion is benign or malignant, primary or metastatic. Thus if the patient is elderly, death is more likely to occur in a nursing home, and the cause of death will be less specific and probably less accurately reported. High autopsy rates help assure a high level of accuracy of death certificate diagnoses for obvious reasons.
E. Lew: I would first take us back to the reports of the Registrar General on occupational mortality that have been made since 1851. For some years now, the authors of these reports have emphasized that the mode of life associated with a particular occupation may be far more important than its specific occupational hazards.
The crucial problem frequently becomes one of differentiating between the effects of socioeconomic status and associated life-styles and the hazards of the workplace. In recent years, we have been confronted with large numbers of new chemicals. This may have distorted our perspective, leading us to concentrate on new chemicals at the expense of the conditions of life characteristic of persons in specific occupations. This may be important, e.g., for the hospital workers mentioned in Dr. Jeanne Stellman’s presentation. A compounding problem is the accuracy of death certificates for persons at lower socioeconomic levels. In many circumstances, such as the asbestos workers in Canada and migrant workers in the United States, the certifications have been of dubious value.
J. Stellman: In women the situation is far worse. When we reviewed death certificates, I would say that about 40% of the women whose doctors were being paid from the union benefit fund were still classified by those doctors as housewives on their death certificates. This is not a misclassification; there is no job classification as a worker for a housewife.
Garfinkel: Even when you have autopsy or histologic material, you may still have a problem with diagnosis. I was amazed to learn, when I heard a talk by Dr. Jacob Churg, that a signal tumor like mesothelioma can easily be misdiagnosed and that it takes an expert to diagnose it correctly. I wonder if Dr. John Berg, as a pathologist, would give us some information about the problem of diagnosing mesothelioma.
J. Berg: Mesotheliomas are particularly hard to diagnose because, by definition, they are in cavities where you see the spread of other cancers. For example, you do not have the nice transition from normal to atypical cells to cancer that establishes a primary bowel cancer. It is not that simple with mesotheliomas.
Besides, mesotheliomas are the great mimickers of both sarcomas and carcinomas; they even secrete mucin, which only an epithelial cell should do. Making a diagnosis is hard. In our surgical pathology service, we often get just a little piece of tissue, because the tumor has spread so far that the patient cannot undergo an extensive operation, so the amount of tissue obtainable is limited. Often our diagnosis ends up as just an educated guess about the nature of the cancer and how likely it is to be mesothelioma. I think that even an autopsy is not always the answer. I was going to do a study on cancer of the pancreas at the University of Colorado Medical Center in Denver, and I selected 10 cases initially. Among the 10 were 2 primary lung cancers whose primary sites had been diagnosed incorrectly by academic pathologists. These pathologists had no special training in determining where cancers originated, nor any training in industrial diseases. Hence, even in an autopsy series, error rates may be high for diagnoses that require special experience. In many studies, error rates on death certificates run as high as 20%.
Kurland: We should be reading Clinical Pathologist Conferences with heightened skepticism.
I. Selikoff: I would like to thank Dr. Waterhouse for restoring some of my confidence in registry operations. A vast amount of work is done in registries, and huge amounts of data are collected. Yet basically little has come out of this effort. What you presented restores my good feeling to some extent, but I think that is partly because of you and not necessarily because it is a registry. I do not know many registries that are used as a base for epidemiologic studies; yours is, and that makes it unique. I suppose that the Birmingham Registry was established to do some good.
You have done it, so I congratulate you, not merely for keeping a good registry of a large population, i.e., 10% of the British population (we cover 10% of the American population with our Surveillance, Epidemiology, and End Results Program), but for your imaginative use of that Registry for epidemiologic purposes and for monitoring the changes that occur. Otherwise, registries have been disappointing.
L. Gordis: I would like to comment on the registry issue that Dr. Selikoff mentioned and on Dr. Waterhouse’s paper. Generally, registries have not been effective in alerting us to new environmental hazards or in identifying clusters, whether of cancers or of congenital malformations. I think the observations of the astute clinician have probably played a greater role in identifying new hazards than routine monitoring through registries. Indeed, the clinician's role in identifying new hazards makes a strong case for more involvement of astute and caring clinicians in cohort studies or in epidemiologic studies of any kind. I also disagree with Dr. Selikoff's statement that registries in the United States have not been used for epidemiologic purposes. For example, let us look at the Los Angeles Registry. Drs. Mack and Henderson have published extensively on a variety of tumors. Using the New York State Registry, Dr. Greenwald has published extensively with other investigators. The Connecticut Tumor Registry has been a valuable resource. I would interpret Dr. Selikoff's remarks to mean that registries have been used primarily for case-control studies. Therefore, I would like to raise the following question: Above and beyond the different kinds of valuable occupational studies that Dr. Waterhouse has described, what is the value of registries for cohort studies?
In other words, should we be trying to convince communities that do not have cancer registries of their potential value for cohort studies, or should we say that registries are to be used primarily for case-control studies? I think some discussion of that would be helpful.
J. Waterhouse: What Dr. Selikoff and others have said underlines the importance of getting an occupational history. What goes on the death certificate is frequently a terminal occupation, unrelated both to the occupation relevant to any exposure the person may have experienced and to the cause of death. This type of study is expensive, but, as Dr. Berg said, “You get what you pay for.” To return to the registry and its place, I think it can genuinely be used for cohort studies and for morbidity as well as mortality studies. We happen to be fortunate or unfortunate, as you wish, in having something like one-half of the British rubber companies located within our region, and so we can do a sizeable morbidity study (which we are engaged in now). As I understand it, one of the difficulties for researchers in the United States is finding registries for the areas where the factories using the materials of interest are located. Naturally, I am inclined to say that your registries should be extended and better funded. I can tell you that it has been an uphill struggle for us in England. We had to get money for all our studies from outside sources. Nobody came forward with money or a request to do a study. If you put more money into registries and extend their coverage, I think you will find they can be of tremendous value.
N. Breslow: To reinforce what Dr. Waterhouse has said, one may look to the Scandinavian registries for excellent examples of registry contributions to cohort studies. Studies of nickel workers in Norway and of brewery workers at the Carlsberg plant in Denmark are 2 that come immediately to mind.
J. Stellman: I would like to go back to what Dr. Kurland said about registries.
He indicated that he strongly favored population-based registries over hospital registries. I readily understand that reasoning when one considers only epidemiologic purposes, but I think pragmatic issues are relevant as well. Getting the hospital to submit accurate data to the state for inclusion in the population-based registry is a long and complicated, nitty-gritty process. The presence of a hospital-based registry gives us much greater assurance that the data submitted to the state are of better quality than they would otherwise be. Hospital administration needs that extra step along the way. Also, if one wants to do a case-control study from the registry, obtain interview data, or examine the patient while the patient is still alive, the hospital registry is essential. In many states, we must wait 2 years until we know who the patient is, or was, and by then it is often infeasible to do any of these studies. Here in New York State we have worked together in planning a study on patients while they are in the hospital. Later, in collaboration with the state, the total denominator will be obtained. I am sure you will agree that the benefit will trickle over to the state in the next step. On another issue, I have a problem with some of the statements that have been made. I would like to turn around what Dr. Gordis has said and ask what the place of the registry is in cohort studies, and what the place of the cohort study is for “weak associations.” Its place is weak, particularly when we consider the public health impact. What is the impact of a mere 20% elevation in lung cancer mortality among nonsmoking workers, indicated by an SMR of 1.2, which is certainly a weak association? If we translate what we call weak associations from an epidemiologic point of view into a public health perspective, the social costs can be tremendous. We may want to reevaluate our social response to weak associations.
W.
Haenszel: I think one of the messages that comes through clearly in the presentations is that there is a large element of serendipity in the conception and conduct of the studies. A lot depends on the initiative and imagination of the investigator, the kind of resources he has, and the nature of the effects that appear. I would like to illustrate this serendipity with some of the earlier history of the coke oven studies. Actually, as I told Dr. Rockette earlier, the sequence of events that triggered all this was a mass chest x-ray survey in Pittsburgh in the late 1940s. Dr. Gilliam, then head of epidemiology at the National Cancer Institute, assigned a Public Health Service commissioned officer, Dr. McClure, to Pittsburgh to conduct studies exploiting the survey data. When these were completed, Dr. Ciocco recognized that the people who had been x-rayed could form a potential study cohort. Many x-ray stations were at the entrances of several of the steel plants in Pittsburgh. This raised the idea that, if this cohort were to be used, more information on these workers could be secured from the plant records. With this thought in mind, Dr. Ciocco examined the possibilities and decided that he would go for all the information available. Why stop with the workers who had been x-rayed? He would attempt to go back, reconstruct a cohort of all employees as of “x” years earlier, and follow them. His advantage was that many of the medical directors of the steel plants were alumni of the University of Pittsburgh Graduate School of Public Health. Good rapport with the medical departments existed, and he was able to secure financial support from the National Cancer Institute. Staff of both the school and the Institute were interested in getting a study of an employee population with well-defined occupational exposures underway.
As many of you know, in those days the Institute was persona non grata with many employers. They did not trust the Institute, and the Institute had to support studies of occupational carcinogenesis indirectly through such agencies as the University of Pittsburgh Graduate School of Public Health.
REFERENCES
(1) KUPPER LL, MCMICHAEL AJ, SYMONS MJ, et al: On the utility of proportional mortality analysis. J Chronic Dis 31:15-22, 1978
(2) WONG O, DECOUFLE P: Methodological issues involving the standardized mortality ratio and proportionate mortality ratio in occupational studies. J Occup Med 24:299-304, 1982
(3) DECOUFLE P, THOMAS TL, PICKLE LW: Comparison of the proportionate mortality ratio and standardized mortality ratio risk measures. Am J Epidemiol 111:263-269, 1980
SESSION IV
Methods of Follow-up and Classification Problems
Chairman: Leonard Kurland
Co-Chairman: Leon Gordis
Chairman’s Remarks¹
Leonard Kurland²
I am particularly pleased to have been invited to participate in this Workshop to honor Dr. Cuyler Hammond. Most of you are aware that he and my predecessor, Dr. Joseph Berkson, were not fully in agreement on the smoking and lung cancer issue. However, I do believe those early controversies stimulated higher quality studies. Those studies at least established for all of us, including Dr. Berkson, the relevance and importance of Dr. Hammond’s cohort studies. I am also sure that Dr. Berkson, on behalf of the Mayo Staff, would much have preferred to be here today; because that cannot be, it is my great pleasure to be here in his place. This segment of the Workshop will deal with cohort studies, classification of cancer, cancer control features, and records linkage. Linkage of information is a key feature in the assembly and follow-up of cohorts and in the identification of end points for outcome and disease classification. For that reason, I would like to emphasize the value of records linkage.
We shall hear about the linkage of computerized listings in Canada that combines information on persons-at-risk and end results such as cancer. This most valuable and efficient system for potential early detection of risk factors on a grand scale is denied the people of the United States by congressional rules generated by misdirected fear that if computers can be misused to threaten freedom, they will be so used by government. Although independent access to the numerous population lists containing demographic and mortality data is available, information on occupation and other environmental items, and the essential follow-up capability from Social Security and Internal Revenue files, are denied to epidemiologists. Congress could consolidate the files as in Statistics Canada and revise the rules so that safeguards on confidentiality and review systems could be maintained to prevent misuse and still permit the assembly of data for such constructive purposes. The current inefficiencies of follow-up greatly increase the cost of this research in the United States and reduce the enthusiasm for many desired epidemiologic studies that could answer serious questions on environmental exposure. Answers based on hard data rather than on speculations in the public media could reduce the fear and uncertainty of many in the population who believe that they have been injured through toxic or radiation exposures, even though these may be regarded by expert analysts as insignificant in quantity or duration. Hard data from linkage-directed studies could provide valid answers to many serious liability questions.

¹ Presented at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
² Department of Medical Statistics and Epidemiology, Mayo Clinic, 200 First Street, S.W., Rochester, Minnesota 55905.
The evolution of records linkage systems is usually traced to the initial suggestion of William Farr (1807-83) concerning their potential for cross-sectional and longitudinal research. Records linkage, as an efficient means of generating cohorts and providing follow-up, was first defined in 1946 and thereby predates the (modern) computer, although it reaches its ultimate in efficiency thanks to the computer systems developed by Newcombe in Canada. Credit for the use of the term “records linkage,” however, is given to Dr. Halbert Dunn, who was Director of the Bureau of the Census (United States). In 1946, he wrote: “Each person in the world creates a book of life. This book starts with birth and ends with death. Its pages are made up of the records of the principal events in life. Record linkage is the name given the process of assembling the pages of this book into a volume” (1). The application of this concept extends from the records in the family practice of medicine to the examples, to be described, on a national scale in Canada. The system at Mayo Clinic follows the decision of the founders to organize a medical dossier on each of their patients to help assure the best medical care of the patient, and it is, I believe, a major factor in the success of the Mayo Clinic. This unit-linked record includes clinical, demographic, socioeconomic, and follow-up information collected in the course of medical care, whether that care was provided in the hospital or outpatient setting, laboratory, or autopsy room, or through house calls or nursing home visits. It also includes correspondence with the patient or the patient's family. Epidemiology at Mayo Clinic, as related to the delineated population of Rochester and Olmsted County, Minnesota, has benefited serendipitously from this records linkage development.
In recent decades, that remarkably complete record system has facilitated descriptive, case-control, and cohort studies on the local population. These studies frequently provide unique information on incidence, risk factors, and outcome of disease in what may be one of the most comprehensive and efficient efforts in the country (2). The long-term goals of mass records linkage are applicable to the comments of several of the speakers at this session. We can thus return to Dunn (1), who also recognized that “. . . establishment of a nation-wide system of records linkage for all persons in the country will become an invaluable adjunct to the administration of health and welfare organizations and at the same time produce coordinated statistical knowledge of great value.”

Dr. William J. Nicholson, who opens this session, will emphasize the importance of the role of age and duration of exposure to asbestos, based on some of the classical cohort studies done at Mt. Sinai Hospital. Dr. Geoffrey Howe will describe the nearly complete national records linkage available through Statistics Canada and give examples of its use in following cohorts with various diagnostic, therapeutic, and occupational risk factors. Dr. John Berg will review the nosology and classification of cancer and will emphasize the need for the epidemiologist to work closely with the pathologist. We will hear a discussion on biologic markers in cancer control and a description of genetic and acquired factors by Dr. Paul Kotin. He will review many of the forms of cancer for which specific agents have been cited and will identify the items in data collection and analysis that provide for improved population surveillance and cancer prevention.

REFERENCES
(1) DUNN HL: Record linkage. Am J Public Health 36:1412-1416, 1946
(2) KURLAND LT, MOLGAARD CA: The patient record in epidemiology. Sci Am 245:54-63, 1981

Selection Factors in Cohort Studies¹
William J. Nicholson²

ABSTRACT—Cohort studies play an important role in the quantitation of cancer risk among occupationally exposed individuals. Properly conducted cohort studies can develop important data on the age, time, and exposure dependence of cancer risk. Such information allows identification of possible selection effects which may be present and allows generalization of risk estimates to other exposure circumstances.—Natl Cancer Inst Monogr 67: 111-115, 1985.

My first assignment at Mount Sinai School of Medicine was as a biophysicist working on analytical problems associated with the identification and measurement of asbestos in air, water, and tissue samples. However, Dr. Hammond's fine work sparked my interest in epidemiology, and I began to work on some cohort studies with him and with Dr. Selikoff. In these efforts, Dr. Hammond provided superb guidance on the philosophy of epidemiologic studies and on their methodology and pitfalls.

We have several methods with which we can identify a possible hazard in human populations. With cohort and case-control studies, we can identify risks at various levels of significance in populations exposed to carcinogenic agents. Such risks can often be identified more efficiently by case-control studies because of the smaller populations involved. Although more time-consuming, cohort studies have the great advantage of helping us to: 1) account for the effects of varying exposure circumstances; 2) identify effects of interactions among agents; and 3) treat confounding variables in a more definitive way than can be done by case-control studies. Of particular importance is the ease with which time-related effects can be elucidated. However, the sensitivity of cohort studies to time effects is a two-edged sword. Improper consideration of age, latency, calendar year, and time-related healthy worker effects can lead to seriously flawed studies.
The proper treatment of the above time effects has been discussed extensively in the literature. In this presentation, I focus on other time-, age-, or exposure-related effects of particular importance in quantitative risk assessments. Here our interest goes beyond the identification of a hazard associated with exposure to some agent in a specific work circumstance or workplace. We seek information that will specify the parameters of an appropriate exposure-response model, including the time and age dependence of any agent-associated risk during and following exposure. With such knowledge, we can generalize the results from specific studies to other exposure circumstances, make better comparisons between cohort studies, project risk among previously exposed populations, and compare existing models of carcinogenesis. Indeed, such detailed information is required for regulatory action by the Occupational Safety and Health Administration, the Environmental Protection Agency, and other regulatory agencies. Further, it will aid in the identification of high-risk groups that might be followed in beneficial surveillance or intervention programs. Finally, legislative bodies require quantitative information to develop appropriate social programs, such as compensation for work-related illness.

¹ Presented at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
² Environmental Sciences Laboratory, Mount Sinai School of Medicine, The City University of New York, New York, N.Y. 10029.

RADIATION-RELATED LEUKEMIA

At this time, few studies exist that provide such general information. Figure 1 contains data from one that does, the Atomic Bomb Casualty Commission study (1, 2). The absolute risk of different types of leukemia for different age groups is shown for the 35 years subsequent to the atomic bomb explosions.
The data demonstrate that after a short latency the absolute risk increases rapidly, reaches a peak in 5 to 20 years, depending on age, and declines thereafter. Had the relative risks been displayed, the increases would have been less dramatic, but the subsequent decreases would have been much more rapid. The data clearly show that we need long-term studies to demonstrate the full time course of risk, including its age dependence. They also show that different cancers must be examined separately because their risks follow different time courses. With such information and detailed exposure-response data, one can model leukemia risk, extrapolate to other radiation circumstances, and compare the induction of leukemia by radiation with its induction by other agents.

FIGURE 1.—Schematic model of influence of age at time of bombing and calendar time on leukemogenic effect of radiation (heavily exposed survivors). Panels: all forms of leukemia, acute leukemia, and chronic granulocytic leukemia, with curves by age ATB (from <15 to 45+ yr); x-axis, years after exposure. Data were from Okada et al. (1). ATB=at time of bombing.

ASBESTOS-RELATED LUNG CANCER

Figure 2 shows information obtained in a study of asbestos factory workers on which Dr. Hammond worked with Seidman and Selikoff (3). It depicts the relative risk (observed:expected deaths) of lung cancer in a group of workers who were employed for short periods in a factory where thermal insulation was produced (average employment time was 1.5 yr). Figure 2 is the asbestos analog of figure 1. In each, the short-term exposure to a carcinogen resulted in a significantly elevated relative and absolute risk within 10 years from exposure. A remarkable feature of figure 2 is that the relative risk had not decreased over the 35 years of follow-up, even though the absolute risk of lung cancer rose about 100-fold over the observation period. In multistage models of carcinogenesis, asbestos appears to act like a late-stage carcinogen that multiplies the underlying risk of cancer in the absence of exposure, but one that continues to do so for long periods after exposure has ceased. This continuing effect can be accounted for partly by the fact that many inhaled asbestos fibers remain in the lungs for long periods. These resident fibers can act in concert with cancer initiators, some of which could be inhaled years after the asbestos exposure.

FIGURE 2.—Ratios of observed:expected deaths from lung cancer among short-term asbestos insulation production employees, according to time from onset of employment (x-axis, years from onset of exposure, 0-40; y-axis, relative risk). Data were supplied by H. Seidman and I. J. Selikoff.

The same data, according to time from onset of exposure and age at the start of 10-year observation periods, are presented in table 1. For the group employed longer than 9 months, the risk is approximately constant according to years since onset of exposure in each age-of-observation category. This is the same constancy that was seen in figure 2. If the average relative risks according to age decades of observation (the category labeled “All”) are compared, one sees a decrease in the age decade from 50 to 59. This decrease may be attributed to the fact that older workers were given jobs entailing less exposure to asbestos dust. It may also be an age-related effect separate from job classification, because it is noted proportionally in the longer and shorter exposure categories and thus appears not to be strongly related to total exposure.

TABLE 1.—Relative risk of lung cancer during 10-yr intervals at different times from onset of exposureᵃ

                                  Age at start of period, yr
Length of      Yr from onset    30-39           40-49           50-59
exposure, mo   of exposure      RR      Cases   RR      Cases   RR      Cases
<9              5               0.00ᵇ    0       3.75    2      0.00ᶜ    0
               15               6.85     1       4.27    3       2.91    4
               25               -        -       2.73    2       4.03    6
               All              3.71     1       3.52    7       2.58   10
>9              5               0.00ᵈ    0      11.94    4       9.93    8
               15              19.07     2      11.45    5       5.62    5
               25               -        -      13.13    6       7.41    8
               All             11.12     2      12.32   15       7.48   21

ᵃ Data were from (3). Dashes indicate no data were available. RR=relative risk; Cases=No. of cases.
ᵇ No cases were seen; 0.35 cases were expected on the basis of average relative risk in the overall exposure category.
ᶜ No cases were seen; 3.04 cases were expected as in b.
ᵈ No cases were seen; 0.66 cases were expected as in c.

Similar data on the relative risk for lung cancer in insulation workers were determined by Selikoff et al. (4). In figure 3, the relative risk is plotted according to age for those who were between 15 and 24 years and between 25 and 34 years at the onset of exposure. The data for each group show a roughly linear rise with age and, consequently, with years from onset of exposure. The separate curves are parallel, reach the same maximum point, and then fall, again in parallel. The 2 curves are separated by the 10-year difference in the ages of onset. These parallel curves clearly indicate that the relative risk is independent of the age of onset of exposure and that the important parameter is the time since onset. Had the attributable risk been plotted, the slope and maximum for the older age group would have been two to four times greater than for those exposed at younger ages. The similarity of the relative risk curves for the 2 age groups further supports the concept that asbestos acts like a late-stage carcinogen. If we combine the curves of figure 3 and plot them according to time from onset of exposure, we obtain the data shown in figure 4.
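The contrast between relative and attributable risk can be made concrete with a short calculation. The sketch below illustrates the mechanism under stated assumptions and is not the author's computation. It assumes a multiplicative model in which the excess death rate equals (relative risk - 1) times the background rate. The background rates and relative risks (`BACKGROUND_RATE`, `RR_BY_YEARS_SINCE_ONSET`) are hypothetical values chosen only to show how the model behaves. Because the relative risk is taken to depend only on time since onset, the group first exposed 10 years later reaches each time point at an older age. At that age the background rate is higher, and so is the excess.

```python
# Hypothetical background lung cancer death rates per 100,000 person-years,
# by attained age (illustrative values, not data from the cited studies).
BACKGROUND_RATE = {40: 20.0, 50: 60.0, 60: 150.0}

# Hypothetical relative risk by years since onset of exposure, assumed to be
# the same for every age at onset (the pattern described for figure 3).
RR_BY_YEARS_SINCE_ONSET = {20: 3.0, 30: 4.0}


def excess_rate(relative_risk: float, background: float) -> float:
    """Excess (attributable) deaths per 100,000 person-years under a
    multiplicative model: (RR - 1) x background rate."""
    return (relative_risk - 1.0) * background


def excess_by_onset_age(onset_age: int) -> dict[int, float]:
    """Excess rate at each tabulated time since onset for one age at onset."""
    return {
        years: excess_rate(rr, BACKGROUND_RATE[onset_age + years])
        for years, rr in RR_BY_YEARS_SINCE_ONSET.items()
    }


younger = excess_by_onset_age(20)  # e.g., onset at ages 15-24
older = excess_by_onset_age(30)    # e.g., onset at ages 25-34
for years in RR_BY_YEARS_SINCE_ONSET:
    ratio = older[years] / younger[years]
    print(f"{years} yr since onset: excess {younger[years]:.0f} vs "
          f"{older[years]:.0f} per 100,000 (ratio {ratio:.1f})")
```

With these invented rates, the older-onset group's excess is 2.5 to 3 times that of the younger group at the same time since onset. The size of the ratio reflects the assumed background rates, not the insulation workers' data.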
A linear increase with time from onset of exposure is seen for 30 years (to about the time when most insulation workers retire). After 40 years, the relative risk falls significantly. It might instead have been expected to remain constant after cessation of exposure, given the linear increase during continued exposure. The decrease may be due partly to the preferential elimination of smokers from the population under observation, through death from lung cancer and cardiovascular disease. However, this is not the full explanation, as a similar fall occurs for those individuals who were smokers in 1967. In calculating the relative risk of lung cancer in smokers, we used the smoking-specific data from Hammond's American Cancer Society study of 1 million people (5). The similarity of the 2 curves in figure 3 suggests that age effects cannot fully account for the decrease. Selection processes, such as differing exposure patterns or differing individual biologic susceptibilities, may play a role, but the exact explanation for the effect is not understood.

FIGURE 3.—Ratios of observed:expected deaths from lung cancer among insulation workmen according to age and age at onset of employment (x-axis, age, 30-90 yr; y-axis, relative risk; curves for age at onset 15-24 and 25-34 yr). Data were supplied by H. Seidman and I. J. Selikoff.

FIGURE 4.—Ratios of observed:expected deaths from lung cancer according to time from onset of employment in insulation work (x-axis, years from onset of exposure, 0-60; y-axis, relative risk; curves for all workers and for cigarette smokers). Data were supplied by H. Seidman and I. J. Selikoff.

A decreased risk at older ages appears to be a general phenomenon seen in many mortality studies. It occurred among workers in a factory where asbestos products were made (6), whom we categorized by the intensity of their exposure.
The effect was important in all exposure groups; thus the decrease was not the result of preferential elimination of those with the highest exposure. The phenomenon is also seen among chrysotile miners (7), but the peak occurred at later times from onset of exposure. This delay may have occurred because the miners began employment 5-10 years younger than the workers in the 2 previous groups. Before 1930, miners typically began employment at age 15 or even at age 12; most insulation workers began between ages 20 and 25. Thus the peak appears at about the same age as for those who produced insulating materials, rather than at the same time from onset of exposure, but the data on this point are limited.

To appreciate the effect of this time and age dependence of lung cancer risk on the results of an epidemiologic study, I calculated the excess risk of lung cancer over different observation periods for a hypothetical group. The group was exposed at age 20 for 25 years to an air concentration that will lead, at its peak, to a doubling of lung cancer risk. The time course of the risk will be proportional to that of figure 4. Table 2 lists the percentage of the maximum potential excess risk of death from lung cancer for follow-up periods of 10 years, 20 years, and a lifetime, beginning at different ages. The hypothetical exposure will lead to a 100% increase in lung cancer mortality at age 55. Accordingly, the greatest increase (97%) is seen in the 10-year follow-up beginning at age 50, that is, 30 years from onset of exposure.

TABLE 2.—Estimates of the percentage of the maximum expressed excess risk of death from lung cancerᵃ

Age at start of      Follow-up                  Yr from onset
observation, yr      10 yr   20 yr   Lifetime   of exposure
20                   02      32      55          0
30                   34      65      55         10
40                   69      91      56         20
50                   97      81      55         30
60                   73      55      46         40
65                   55      41      38         45
70                   37      29      29         50

ᵃ The maximum expressed risk is that manifest 7½ yr after the conclusion of the 25-yr exposure that began at age 20.
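A short numerical sketch shows why a fixed follow-up window captures only part of a time-varying excess, as in table 2. This is not the calculation behind table 2. The piecewise-linear shape is a hypothetical stand-in for the figure 4 time course. It has a hypothetical 10-yr latency and peaks 32.5 yr after onset, that is, 7½ yr after a 25-yr exposure ends. It then falls linearly to zero at a hypothetical 60 yr. The sketch also assumes that the quantity expressed as a percentage of the peak is the average excess over the window. The lifetime column would also need survival probabilities and is not modeled.

```python
LATENCY, PEAK, END = 10.0, 32.5, 60.0  # hypothetical shape, yr since onset


def excess_shape(t: float) -> float:
    """Hypothetical excess risk t yr after onset, as a fraction of its peak:
    zero during latency, linear rise to the peak, linear fall to zero at END."""
    if t <= LATENCY or t >= END:
        return 0.0
    if t <= PEAK:
        return (t - LATENCY) / (PEAK - LATENCY)
    return (END - t) / (END - PEAK)


def percent_of_maximum(start: float, length: float, steps: int = 10_000) -> float:
    """Average excess over the window [start, start + length] yr after onset,
    as a percentage of the peak excess (midpoint-rule integration)."""
    width = length / steps
    total = sum(excess_shape(start + (i + 0.5) * width) for i in range(steps))
    return 100.0 * total / steps


# Ten-year follow-up windows beginning at various times after onset
# (for onset at age 20 these begin at ages 20, 30, ..., 70).
for start in (0.0, 10.0, 20.0, 30.0, 40.0, 45.0, 50.0):
    print(f"start {start:4.0f} yr after onset: "
          f"{percent_of_maximum(start, 10.0):5.1f}% of maximum")
```

With this assumed shape, a 10-yr window beginning 30 yr after onset captures about 88% of the peak excess. A window lying entirely within the latent period captures none. The exact percentages depend wholly on the assumed shape.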
Most other observation categories manifest significantly lower increases. Those begun earlier do not reflect the full 25 years of exposure. Those begun later reflect the decrease in risk we are considering. Most notable are follow-up periods beginning at age 65, for which the lifetime follow-up demonstrates only a 38% increase in respiratory cancer. Cohort studies are often conducted of retiree populations because individuals in a retirement program are easy to follow. To the extent that the experience of insulation workers is applicable to other exposed groups, a study of retirees can seriously underestimate effects of asbestos exposure at other ages.

The above data demonstrate that significant selection effects are manifest in populations exposed to asbestos at older ages and at long times from onset of exposure. These effects must be taken into account properly if comparisons are to be made between populations that have significantly different exposure and age distributions. I suggest that the decrease is the result of a combination of age, exposure, and other selection effects. To the extent that it is exposure related, one would not expect such a significant decrease in groups exposed to low concentrations of asbestos, as in environmental exposure circumstances. Thus extrapolations made with data from high-exposure groups could significantly underestimate risks at low exposures when a linear exposure-response relationship is used in the calculation. This applies to groups that include many individuals observed at long times from onset of exposure.

ASBESTOS-RELATED MESOTHELIOMA

In contrast to lung cancer, in which asbestos appears to act as a late-stage carcinogen, it appears to act as an early-stage one in mesothelioma. This deduction can be made from an analysis of figure 5, which shows the absolute risk of mesothelioma among those who first worked with insulating materials before and after they were 25 years old. The curves for absolute risk are parallel and separated by the approximate 10-year difference in initial employment times of the 2 groups. Instead of the relative risk being independent of the age of exposure, one finds that the absolute risk is independent of the age of exposure and depends only on time from onset. When the data are combined by age, the right side of figure 5 displays the absolute risk of developing mesothelioma among insulation workers according to time from onset of exposure. The risk increases until about 45 years, after which it decreases significantly. Some of the decrease could be due to improper diagnosis of mesothelioma in older age groups, but the decrease appears too large to be explained by such a cause. We may be seeing age- or exposure-related selection effects in the mesothelioma risk of asbestos similar to those seen in its action in lung cancer.

FIGURE 5.—Death rates for mesothelioma among insulation workmen according to age and age at onset of employment (left; x-axis, age; curves for onset before and after age 25) and to time since onset of employment (right; x-axis, years from onset of exposure); y-axis, death rate (log scale). Data were supplied by H. Seidman and I. J. Selikoff.

EXTRAPOLATION OF ASBESTOS RISKS

To make generalized estimates of asbestos exposure risk in different circumstances, we must clearly take into account the different time and age courses of lung cancer and mesothelioma. We must also consider the selection effects that appear to be manifest in occupational groups. The effect of these different time considerations can be seen in table 3. It shows the risks of lung cancer and mesothelioma for individuals of various ages from low exposures, similar to those seen in schools or office buildings. In developing the data of table 3, I used exposure-response data from 11 studies.
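The source attributes the decline in lifetime mesothelioma risk with age at first exposure to two factors. The risk depends only on time since onset, and the remaining life expectancy shrinks with age. A short life-table sketch can show this. For illustration only, it assumes a mesothelioma death rate proportional to the cube of time since onset beyond a 10-yr latency. That is a power-of-time form commonly used for mesothelioma, not a model fitted to these data. It also assumes a hypothetical Gompertz all-cause mortality curve. The constants `LATENCY`, `K`, `GOMPERTZ_A`, and `GOMPERTZ_B` are invented, so only the ordering of the results bears on table 3, not their magnitude.

```python
import math

LATENCY = 10.0      # hypothetical latency, yr
K = 1e-9            # hypothetical scale of the mesothelioma death rate
GOMPERTZ_A = 5e-5   # hypothetical all-cause death rate at age 0, per yr
GOMPERTZ_B = 0.09   # hypothetical yearly growth of log all-cause mortality


def meso_hazard(years_since_onset: float) -> float:
    """Annual mesothelioma death rate depending ONLY on time since onset of
    exposure, not on attained age; the cubic form is an assumption."""
    excess = years_since_onset - LATENCY
    return K * excess ** 3 if excess > 0 else 0.0


def all_cause_hazard(age: float) -> float:
    """Hypothetical Gompertz all-cause death rate at a given age."""
    return GOMPERTZ_A * math.exp(GOMPERTZ_B * age)


def lifetime_risk(onset_age: int, max_age: int = 110) -> float:
    """Lifetime mesothelioma death risk per 100,000 exposed at onset_age.

    Sums, year by year, P(alive at start of year) x mesothelioma rate,
    ignoring the (small) effect of mesothelioma itself on survival.
    """
    alive, risk = 1.0, 0.0
    for age in range(onset_age, max_age):
        risk += alive * meso_hazard(age + 0.5 - onset_age)
        alive *= math.exp(-all_cause_hazard(age + 0.5))
    return 100_000 * risk


for onset in (0, 10, 20, 30, 50):
    print(onset, round(lifetime_risk(onset), 1))
```

Later onset means every year since onset is lived at an older age, with lower survival, and fewer such years remain. The computed lifetime risk therefore falls steadily with age at onset, as the mesothelioma columns of table 3 do.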
The important feature of table 3 is the dramatic difference in the lifetime risks of lung cancer and mesothelioma according to age. A 10-year-old girl exposed for 5 years is three times as likely to develop mesothelioma as lung cancer, even if she later becomes a smoker. In contrast, for exposure beginning at age 30 or later, the lifetime risk of lung cancer for female cigarette smokers is greater than that for mesothelioma. Note that for lung cancer, it does not matter greatly at what age the asbestos exposure is experienced. On the other hand, the lifetime risk of mesothelioma depends on time from onset of exposure. It decreases significantly at older ages because the years of life expectancy over which the cancer can develop are limited.

TABLE 3.—Range of lifetime risks of death from mesothelioma and lung cancer/100,000 persons, according to age at first exposure and smokingᵃ

         Age at onset         Smokers                      Nonsmokers
Sex      of exposure, yr   Mesothelioma  Lung cancer    Mesothelioma  Lung cancer
Female    0                    15.2          3.2            16.2          0.3
         10                     9.6          3.2            10.3          0.3
         20                     5.6          3.2             6.1          0.3
         30                     2.9          3.2             3.2          0.3
         50                     0.5          2.1             0.5          0.3
Male      0                    11.5          5.0            13.6          0.4
         10                     7.0          5.0             8.4          0.4
         20                     3.9          5.1             4.9          0.4
         30                     1.9          5.1             2.5          0.4
         50                     0.3          3.9             0.4          0.3

ᵃ Exposure to 0.01 fibers/ml of asbestos was for 5 yr at 40 hr/wk.

REFERENCES
(1) OKADA S, HAMILTON HB, EGAMI N, et al (eds): A review of thirty years study of Hiroshima and Nagasaki atomic bomb survivors. J Radiat Res (Tokyo) 16(Suppl):1-64, 1975
(2) National Research Council Committee on the Biological Effects of Ionizing Radiation: The Effects on Populations of Exposure to Low Levels of Ionizing Radiation. Washington, D.C.: Natl Acad Press, 1980
(3) SEIDMAN H, SELIKOFF IJ, HAMMOND EC: Short-term work exposure and long-term observation. Ann NY Acad Sci 330:61-89, 1979
(4) SELIKOFF IJ, HAMMOND EC, SEIDMAN H: Mortality experience of insulation workers in the United States and Canada. Ann NY Acad Sci 330:91-116, 1979
(5) HAMMOND EC: Smoking in relation to death rates of one million men and women. Natl Cancer Inst Monogr 19:127-204, 1966
(6) NICHOLSON WJ: Case study 1: Asbestos—the TLV approach. Ann NY Acad Sci 271:152-169, 1976
(7) NICHOLSON WJ, SELIKOFF IJ, SEIDMAN H, et al: Long-term mortality experience of chrysotile miners and millers in Thetford Mines, Quebec. Ann NY Acad Sci 330:11-21, 1979

Use of Computerized Record Linkage in Follow-up Studies of Cancer Epidemiology in Canada¹
Geoffrey R. Howe²

ABSTRACT—Procedures involving the use of computerized record linkage and national mortality and cancer incidence files for follow-up purposes in cohort studies, which have been developed in Canada during the past decade, are described. The results of some specific studies are presented, as well as the advantages and limitations of such methods and the desirability of future research and development in this area.—Natl Cancer Inst Monogr 67: 117-121, 1985.

The use of record linkage in chronic disease epidemiology is a well-established technique (1). The term refers to the comparison of 2 records containing identifying information in order to ascertain whether those 2 records refer to the same individual. In follow-up studies, record linkage can be a useful tool for ascertaining the status of enrolled individuals at some time. Records from the cohort are compared with cancer incidence or mortality records. When large numbers of individuals are enrolled in cohort studies, such an approach to follow-up potentially can yield considerable savings in time and cost.
Large cohort studies are required to give sufficient statistical power to studies of diseases that are rare in the statistical sense (such as cancers of specific sites). The same holds for exposures that may lead to a relatively small increase in risk. Even for exposures associated with a small relative risk, if such exposures are widespread in the population, the attributable risk and resultant public health problem may be substantial. Alternatively, as is frequently true (e.g., in occupational studies), determination of exposure to a specific carcinogen may be difficult. Consequently, a group defined as “exposed” may contain only a small proportion of individuals actually exposed to that carcinogen. This dilution effect may lead to a small measurable relative risk, even though the true relative risk associated with that carcinogen exposure may be high. Thus interest is increasing in the use of large cohort studies in cancer epidemiology and consequently in the application of record linkage techniques as the primary tool for follow-up.

Here I describe developments in the theory and application of computerized record linkage techniques to cohort studies in cancer epidemiology that have been initiated and conducted in Canada during the past decade. More than 20 such studies are now in progress. The investigators come from national research institutes, universities, federal and provincial governments, industry, and various organizations outside Canada.

¹ Presented at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
² National Cancer Institute of Canada Epidemiology Unit, Faculty of Medicine, McMurrich Building, University of Toronto, Toronto, Ontario M5S 1A8, Canada.
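The dilution effect can be quantified under simple assumptions. In the sketch below, a fraction p of the person-time of the nominally exposed group is truly exposed, with true relative risk R. The remainder experiences the baseline rate, and the comparison group is entirely unexposed. The observed relative risk is then p*R + (1 - p). The function name is hypothetical.

```python
def diluted_relative_risk(true_rr: float, fraction_exposed: float) -> float:
    """Relative risk observed for a nominally 'exposed' group in which only
    fraction_exposed of the person-time is truly exposed.

    Assumes truly exposed person-time has rate true_rr x baseline, the rest
    has the baseline rate, and the comparison group is wholly unexposed.
    """
    if not 0.0 <= fraction_exposed <= 1.0:
        raise ValueError("fraction_exposed must lie between 0 and 1")
    return fraction_exposed * true_rr + (1.0 - fraction_exposed)


for p in (1.0, 0.5, 0.25, 0.1):
    print(f"truly exposed fraction {p:.2f}: "
          f"observed RR {diluted_relative_risk(5.0, p):.2f}")
```

Suppose the true relative risk is 5 but only 10% of the group is truly exposed. The observed relative risk is then about 1.4, an increase that only a large cohort would have the statistical power to detect.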
An outline of the methodology that has been developed will be given, together with some examples of its application to specific studies. A discussion follows of the problems and future potential of this approach to cohort studies in cancer epidemiology.

MATERIALS AND METHODS

The objective of a computerized record linkage system is to compare all records from 1 file (e.g., an occupational cohort) with all the records from another file (e.g., national mortality records). The aim is to determine which record from the first file forms a link with one from the second file, i.e., refers to the same individual. Such a system involves two components: 1) the development of the appropriate probabilistic theory to quantify the believability of any such links and 2) the development of an appropriate computerized system to implement routinely and practically the results of the probabilistic theory.

Probability theory.—The initial development of the theory of record linkage and its application to medical research studies was done by Newcombe and his coworkers (2). A detailed account of some of the early applications of this work in various areas of medical research has been published (3). Newcombe has continued to be a leading figure in further developments in Canada during the past decade. The generalization of the probability rules involved in record linkage has been given by Howe and Lindsay (4). In essence, when comparing 2 specific records, one compares each identifying item (such as surname, given name, and day, month, and year of birth) and observes an outcome.
This outcome may be defined as specifically as required. Examples are full agreement on all characters of surname with a particular value, e.g., “Smith,” or agreement on the initial letter of the given name but disagreement on the remaining characters, e.g., “John” versus “Jeremiah.” If the identifying items concerned are independent, the odds in favor of the 2 records being a true link may be expressed as a product of terms, one for each identifying item. In each such term, the numerator is the probability of obtaining that outcome in the set of all links, and the denominator is the probability of obtaining that outcome in the set of all nonlinks. The product also includes a further term, in which the numerator is the overall probability of obtaining a link and the denominator is the overall probability of obtaining a nonlink. It is customary to use a logarithmic transform and thus to define a weight that is the log2 of each of these product terms. The weight so computed is specific for each possible outcome for each identifying variable. Adding the weights for all identifying items in the comparison of 2 particular records gives a total weight. This total weight measures the relative odds that those 2 records are a true link.

In a computation of weights for the individual outcomes, 2 components are required. The frequency weight is a measure of the relative frequency with which the particular value of the identifying item occurs in the files being used; weights become larger as the value becomes rarer. The transmission weight estimates the relative frequency with which a particular item is misrecorded or changes over time. It becomes more negative as the corresponding level of agreement in the matched set becomes rarer. Specific formulas for these weights have been given (4).
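A minimal sketch can illustrate the weight calculation just described. It assumes a common simplified form for agreement on a value with relative frequency f and a misrecording (transmission) error probability e. In that form, P(outcome | link) is approximately f(1 - e) and P(outcome | nonlink) is approximately f². The agreement weight then splits into a frequency part, log2(1/f), and a transmission part, log2(1 - e). These expressions are illustrative and are not necessarily the specific formulas of reference 4. The function names and numbers are hypothetical.

```python
import math
from typing import Optional


def agreement_weight(value_freq: float, error_rate: float) -> float:
    """log2 odds weight when both records carry the same value.

    Assumes P(agree on this value | link) ~ f(1 - e) and
    P(agree on this value | nonlink) ~ f**2, giving a frequency part
    log2(1/f), larger for rarer values, plus a transmission part
    log2(1 - e), which is slightly negative.
    """
    return math.log2(1.0 / value_freq) + math.log2(1.0 - error_rate)


def disagreement_weight(error_rate: float, chance_agreement: float) -> float:
    """log2 odds weight when the item disagrees.

    P(disagree | link) ~ e; P(disagree | nonlink) = 1 - chance that two
    unrelated records agree on this item.
    """
    return math.log2(error_rate / (1.0 - chance_agreement))


def total_weight(item_weights: list[float],
                 prior_link_prob: Optional[float] = None) -> float:
    """Sum of item weights; optionally add the log2 prior odds of a link."""
    total = sum(item_weights)
    if prior_link_prob is not None:
        total += math.log2(prior_link_prob / (1.0 - prior_link_prob))
    return total


common = agreement_weight(0.01, 0.03)    # surname carried by 1% of records
rare = agreement_weight(0.0001, 0.03)    # surname carried by 0.01% of records
birth_day_mismatch = disagreement_weight(0.05, 1 / 30)  # hypothetical rates
print(round(common, 2), round(rare, 2),
      round(total_weight([rare, birth_day_mismatch]), 2))
```

With e = 0.03, agreement on a surname carried by 1% of records contributes about 6.6. Agreement on one carried by 0.01% contributes about 13.2. Rarer values therefore earn larger weights.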
The frequency weights may be obtained by a simple tabulation of the identifying items on the files being used for linkage. The transmission weights, however, can generally be estimated only once a preliminary linkage has been conducted. The system described in the next section uses an iterative approach. It starts with approximate values for these transmission weights, recalculates them from the links that are obtained, and repeats the process until self-consistency is achieved.

Computer system.—A generalized iterative record linkage system (4) has been developed jointly by members of the Epidemiology Unit of the National Cancer Institute of Canada and the Health Division of Statistics Canada. The 2 files being linked are first sorted with the use of a six-character code of surname. Only records that contain the same value of this code are subsequently compared, because a comparison of all records on both files would be impractical in view of the excessive time and cost involved. Thus records emanating from the same individual that have different codes will not be linked, though in practice such discrepancies appear to occur no more than 1-2% of the time. The next step is a comparison of all pairs of records with the same value of the surname code. Each such comparison generates a record containing the outcomes for all the identifying items and an approximate total weight, computed as described above. Outcome records for which the total weight is above a specific lower threshold are then kept for further processing. This file of potential links thus contains essentially all candidates for true links, and further processing may be done on this much reduced volume. A new set of transmission weights is calculated from these potential links as described previously, and the new values are used to re-weight the potential links. This process is iterated.
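The blocking and iterative re-weighting steps can be sketched on toy data. Everything here is an assumption for illustration. The blocking code (the first six letters of the uppercased surname) is a hypothetical stand-in for the system's six-character surname code, which is not specified above. Only the link probabilities m for given name, birth year, and birth day are re-estimated, and the nonlink probabilities u are held fixed. The lower threshold, initial values, and records are invented.

```python
import math
from collections import defaultdict

FIELDS = ("given", "year", "day")
# Hypothetical nonlink agreement probabilities u, held fixed in this sketch.
U = {"given": 0.02, "year": 0.03, "day": 0.035}


def block_code(surname: str) -> str:
    """Hypothetical blocking code: first six letters of the uppercased surname."""
    return "".join(ch for ch in surname.upper() if ch.isalpha())[:6]


def candidate_pairs(file_a: list[dict], file_b: list[dict]) -> list[tuple[dict, dict]]:
    """Pair records only when they share a blocking code."""
    blocks = defaultdict(list)
    for rec in file_b:
        blocks[block_code(rec["surname"])].append(rec)
    return [(a, b) for a in file_a for b in blocks.get(block_code(a["surname"]), [])]


def pair_weight(a: dict, b: dict, m: dict) -> float:
    """Sum of log2 agreement/disagreement weights over the compared fields."""
    total = 0.0
    for f in FIELDS:
        if a[f] == b[f]:
            total += math.log2(m[f] / U[f])
        else:
            total += math.log2((1.0 - m[f]) / (1.0 - U[f]))
    return total


def iterate_weights(file_a, file_b, threshold=0.0, max_passes=10, tol=1e-6):
    """Re-estimate m from potential links above a lower threshold until stable."""
    pairs = candidate_pairs(file_a, file_b)
    m = {f: 0.9 for f in FIELDS}  # initial approximate values
    kept: list[tuple[dict, dict]] = []
    n_pass = 0
    for n_pass in range(1, max_passes + 1):
        kept = [(a, b) for a, b in pairs if pair_weight(a, b, m) > threshold]
        if not kept:
            break
        # Agreement rate among potential links, kept away from 0 and 1
        # so that every log2 weight stays finite.
        new_m = {
            f: min(0.99, max(0.01, sum(a[f] == b[f] for a, b in kept) / len(kept)))
            for f in FIELDS
        }
        stable = all(abs(new_m[f] - m[f]) < tol for f in FIELDS)
        m = new_m
        if stable:
            break
    return m, n_pass, kept


cohort = [
    {"id": "c1", "surname": "Macdonald", "given": "JOHN", "year": 1921, "day": 4},
    {"id": "c2", "surname": "Smith", "given": "MARY", "year": 1930, "day": 17},
    {"id": "c3", "surname": "Smith", "given": "ANN", "year": 1925, "day": 2},
]
deaths = [
    {"id": "d1", "surname": "MacDonald", "given": "JOHN", "year": 1921, "day": 4},
    {"id": "d2", "surname": "Smith", "given": "MARY", "year": 1930, "day": 11},
    {"id": "d3", "surname": "Smythe", "given": "ANN", "year": 1925, "day": 2},
]
m, passes, links = iterate_weights(cohort, deaths)
print(passes, m, [(a["id"], b["id"]) for a, b in links])
```

On these records the estimates stabilize on the second pass. The Smith/Smythe pair is never compared, because the two surnames receive different blocking codes. This is the kind of loss the text attributes to blocking.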
When self-consistency has been achieved (usually requiring only 2 passes), the outcome file is sorted by total weight to produce a file that contains links in order of decreasing believability. The next step is the grouping of records that have formed links with each other, followed by the resolution of any conflicts. For example, when each record on the 2 component files refers to a unique individual, only a 1:1 link is possible.

A cutoff point must then be established for the total weight: links with a weight above that value are accepted, and links with a weight below it are rejected. This cutoff point generally has to be established empirically. Fellegi and Sunter (5), however, have developed a likelihood theory that may be used if the values of a number of parameters relating to error rates are known. Alternatively, further identifying data may be available that are not in machine-readable form. In that case, one could define an area of possible links lying between 2 threshold values and resolve these links manually with the further identifying data. In the absence of further data, however, such manual resolution will generally introduce more erroneous links than it establishes correct ones (6). In practice, further refinements can be added in any particular record linkage, depending on the resources available. The application of such refinements to one particular linkage has been described by Newcombe et al. (7).

APPLICATIONS

The use of the generalized iterative record linkage system (4) described above as a tool for follow-up of individuals in cohort studies in Canada has been made possible by the existence of the Canadian national mortality data base maintained by Statistics Canada. This data base contains, in machine-readable form, records of all deaths that have occurred in Canada since 1950. It also contains records of deaths of Canadian residents occurring in the United States.
The national cancer incidence reporting system contains data for all cases of cancer registered in Canada since 1969, with the exception of the province of Ontario. We hope that data for Ontario will soon be added to the national system. When this is done, the system should prove an invaluable end point, particularly for cancers with a low fatality rate. Four typical cohort studies that have been conducted with the generalized iterative record linkage system and the national mortality data base are described below.

Fluoroscopy Study

The primary objective of this study was a quantitative assessment of the carcinogenic risk of low doses of low linear energy transfer ionizing radiation, particularly as it affects the female breast. The cohort consists of all individuals first admitted to Canadian institutions for treatment of tuberculosis between 1930 and 1952 and for whom hospital records were still available in the 1970s. Information was collected from these records on identifying items and on treatment for tuberculosis. Of particular interest was collapse therapy, which involved substantial exposure to fluoroscopy. Dose computations (8) indicate that many of the women enrolled in the study received substantial doses of radiation to the breast.

The investigators ascertained mortality of the cohort between 1950 and 1980 by linking records from the cohort to the mortality data base to establish the fact, date, and cause of death. In addition, a death clearance of the cohort for 1940-49 was accomplished with data recently assembled by Statistics Canada for those years, though cause of death is not available for that period. A preliminary analysis of breast cancer mortality between 1950 and 1977 has been conducted for women known (from hospital records) to be alive in 1950. Its results have been compared with those from corresponding studies (9). Table 1 shows the relative mortality experience of these women by exposure status, compared with the Canadian population. It shows a highly significant excess of breast cancer mortality in the exposed group. A more detailed analysis of the entire female cohort is currently being conducted.

TABLE 1.—Results of several cohort studies with use of computerized record linkage to the Canadian mortality data base for follow-up

Study (yr of follow-up) | Reference | Treatment given study group | Reference group | Cause of death | Relative riskᵃ, males | Relative riskᵃ, females
Fluoroscopy (1950-77) | (9) | Fluoroscopy | Canadian population | Breast cancer | — | 1.59ᵇ
 | | No fluoroscopy | Canadian population | Breast cancer | — | 0.99
Isoniazid (1962-73) | (10) | Isoniazid (1952-60) | No isoniazid (1952-60) | All cancer | 1.01 | 0.93
 | | | | Lung cancer | 0.99 | 0.94
 | | | | Bladder cancer | 1.13 | ᶜ
Labor Force Survey (1965-73) | (12) | Ten percent sample of Canadian labor force | Canadian population | All causes | 0.83ᵈ |
 | | | | All cancers | 0.88ᵈ |
 | | | | Lung cancer | 0.95ᵈ |
Canadian National Railways (1965-77) | (13) | Possible occupational exposure to diesel fumes | No occupational exposure to diesel fumes | Lung cancer | 1.20 |
 | | Probable occupational exposure to diesel fumes | No occupational exposure to diesel fumes | Lung cancer | 1.35ᵉ |

ᵃ Age and year are standardized as appropriate.
ᵇ P < 0.0001.
ᶜ Numbers are insufficient for stable estimates.
ᵈ P < 0.05.
ᵉ P (trend) < 0.001.

Isoniazid Study

Isoniazid was introduced into Canada in 1952 for the treatment and prophylaxis of tuberculosis. It is still widely used in countries where the prevalence of tuberculosis is substantial. Reports that the drug is carcinogenic in animal species have raised the possibility of a carcinogenic effect in man. A cohort study was conducted with the use of records from the national tuberculosis register maintained by Statistics Canada. Information was extracted from all records for patients treated between 1952 and 1960 in all institutions, except those in Ontario, owing to the current lack of Ontario incidence data.
The cohort so formed contained a substantial proportion of individuals who had been treated with isoniazid. The drug was generally not used on an outpatient basis until after 1960. Those individuals not recorded as having received isoniazid may therefore be regarded as essentially nonexposed during that period. Mortality and cancer incidence were determined by computerized record linkage with mortality records from 1950 to 1973 and incidence records from 1969 to 1973. The results (10) do not indicate any excess cancer risk, either for all neoplasms combined or for cancers of the lung or bladder, the main sites of interest (table 1). However, we cannot exclude an effect that might manifest itself after a latent period of 15 years, so an update of the linkage is planned in the near future.

Labor Force Survey Study

Establishing general systems for monitoring adverse health effects arising from occupational exposure is difficult and costly. One of the few examples is the decennial supplement published by the Registrar General for England and Wales (11). A system for such monitoring has recently been established in Canada (12). The cohort consists of 700,000 individuals who were enrolled in a survey between 1965 and 1971, forming an approximately 10% sample of the Canadian labor force. The occupation and industry in which the members of the cohort were employed during those years are recorded, together with their social insurance numbers. Unfortunately, the social insurance number does not generally appear on a death certificate. The probabilistic techniques described in the previous section therefore had to be used for mortality follow-up of the cohort. The investigators used the master index of social insurance numbers to obtain identifying information, such as full name and date of birth, and used the record linkage system to determine mortality.
The analysis of the approximately 20,000 deaths occurring between 1965 and 1973 has been reported. The linkage has recently been updated to the end of 1979, and this analysis is still in progress. Table 1 shows the overall mortality, overall cancer mortality, and lung cancer mortality for the entire cohort relative to the Canadian population. We will use these data to generate and test hypotheses, and they should prove a valuable tool for monitoring new occupational hazards during the coming years.

TABLE 2.—Summary of other cohort studies with use of computerized record linkage to the Canadian mortality data base for follow-up

Source of cohort | Exposure of interest | Approximate No. of subjects
Morticians | Formaldehyde | 1,500
Province of British Columbia | Age at first birth | 300,000
International Nickel Company | Nickel | 62,000
Falconbridge Company | Nickel | 12,000
Ontario miners | Uranium | 16,000
Eldorado Nuclear Company | Uranium | 21,300
Atomic Energy of Canada | External radiation | 20,000
Alberta Cancer Registry | Survival | 175,000
Newfoundland fluorspar miners | Radon daughters | 2,000
Nutrition Canada Survey | Diet | 20,000

Canadian National Railway Study

This cohort consisted of approximately 44,000 individuals who retired from the Canadian National Railway Company. Their deaths between 1965 and 1977 were determined with the record linkage system and the Canadian national mortality data base (13). Each individual's occupation at the time of retirement was classified by its probable level of exposure to diesel fumes and coal dust. The results showed a highly significant dose-response relationship between lung cancer and possible exposure to diesel fumes and coal dust, though the increase in risk was small (table 1). This study illustrates the value of large cohorts in reducing sampling error for relatively common cancers. It also shows the consequent value of computerized record linkage, which enabled mortality to be determined at low cost.
It also illustrates the difficulty of interpreting occupational studies in which 1) specific levels of exposure to potential carcinogens are not known, and 2) the resulting misclassification may well introduce a substantial bias in the estimation of risk.

Other Studies

Table 2 summarizes a number of the other studies now being conducted with computerized record linkage to the mortality data base and the national cancer incidence reporting system. They illustrate the wide range of exposures that may readily be investigated with the system. When complete, these studies will make a most valuable contribution to cancer epidemiology in general.

DISCUSSION

The use of health records to supplement follow-up procedures in cohort studies is well recognized. The approach developed in Canada during the past decade differs in that it relies entirely upon such procedures to establish the status of members of the cohort at a specific time. Our use of probability theory to quantify such linkage procedures has been a major step forward for two reasons. It eliminates the subjective biases that may be introduced by human clerical procedures, and it gives an appropriate statistical interpretation of the believability of the links found. The other major improvement is the systematic computerization of the approach. This achieves considerable economy in time and money and thus makes large cohort studies feasible at relatively low cost.

However, the obvious limitations and drawbacks of this approach should be fully recognized. In the first place, it is essential that adequate identifying data be available for all cohort members, particularly full given name, surname, and full date of birth. These items alone may not be sufficient to distinguish all duplicates unless the given name and surname are rare. If they are supplemented with a few further items (e.g., place of birth, parents’ names), they should prove adequate.
Similarly, the comparison records must also have adequate identifying information. If they do, the believability of any observed links can be assessed with a high degree of reliability.

A second limitation is the need for a data base (for outcome status, such as mortality) that provides complete coverage of the geographic area in which the outcomes in the cohort occur. National mortality records essentially provide this sort of coverage, as should a national cancer incidence reporting system, though the problem of emigration from the country must be considered. According to the Canadian experience, this problem generally tends to be small. Single provincial or state registries, however, probably provide inadequate coverage unless the follow-up time is relatively short, because intracountry migration is much more common than intercountry migration.

These 2 problems, adequate identifying information and complete coverage, are the major limitations to the development of a system such as the Canadian one. However, the recent development of the National Death Index in the United States could facilitate the development of parallel systems there. One partial solution to a lack of identifying data could be to use 1) computerized record linkage to establish the definitive status of a large proportion of the cohort, and 2) standard manual techniques for individuals who show possible links to the outcome file. This type of procedure could substantially reduce the cost of follow-up.

The other limitations implicit in the type of cohort study described are common to most large cohort studies. Retrospective cohort studies, which can provide immediate answers to questions, depend on the existence of exposure data and preferably of information on possible confounders.
Such data sets do exist, particularly in the occupational context, though they may require substantial work before they are in a form appropriate for analysis. Because this work generally involves computerizing the records, they may then be in an appropriate form for computerized record linkage. The lack of appropriate data sets with identifying, exposure, and covariate information is probably the largest current limitation to extending the type of study described above. Efforts must therefore be made to incorporate such information, wherever possible, in records that are being established now, frequently for other purposes. Employment records in particular provide a rich potential source for future studies. Several Canadian companies have recognized this and have deliberately set up procedures for developing data bases that may later be incorporated into such cohort studies.

Some further work in methodology is indicated, particularly in three areas:

- the handling of nonindependent identifying information;
- refinement of the computation of weights, especially for items that are associated with status at follow-up (e.g., age);
- simplification of the computer systems involved.

Nevertheless, the state of the methodological art is certainly adequate for present purposes. What is needed, however, are empirical validation studies that compare mortality determined by computerized record linkage with mortality determined by standard conventional procedures. At present, only indirect evidence on the efficacy of these procedures is available (6).

Finally, a comment must be made about confidentiality.
A major potential concern about computerized record linkage is that records for an individual could be brought together from many files. The resulting composite record could then constitute a major breach of that person’s privacy. This, of course, is a general problem in medical research and is not unique to computerized record linkage. However, such linkages do raise the possibility of large-scale breaches of confidentiality. In Canada, all linkages that involve vital status records are handled physically within Statistics Canada under the auspices of the Statistics Act. No records containing identifying data are released to researchers after vital status information has been added. This procedure appears more than adequate to safeguard individual interests.

REFERENCES

(1) MACMAHON B, PUGH TF: Epidemiology: Principles and Methods. Boston: Little, Brown, 1970, p 95
(2) NEWCOMBE HB: The use of medical record linkage for population and genetic studies. Methods Inf Med 8:7-11, 1969
(3) ACHESON ED: Medical Record Linkage. London: Oxford Univ Press, 1967
(4) HOWE GR, LINDSAY J: A generalized iterative record linkage computer system for use in medical follow-up studies. Comput Biomed Res 14:327-340, 1981
(5) FELLEGI IP, SUNTER AB: A theory for record linkage. J Am Stat Assoc 64:1183, 1969
(6) NEWCOMBE HB, SMITH ME, HOWE GR, et al: Reliability of computerized versus manual death searches in a study of the health of Eldorado uranium workers. Comput Biol Med 13:157-169, 1983
(7) NEWCOMBE HB, SMITH ME, POLIQUIN C, et al: Final Computer Procedures for the Eldorado Mortality Searches (ENL-Link-2). Ottawa: Eldorado Nuclear, Statistics Canada, 1983
(8) SHERMAN GJ, HOWE GR, MILLER AB, et al: Organ dose per unit exposure resulting from fluoroscopy for artificial pneumothorax. Health Phys 35:259-269, 1978
(9) HOWE GR: The epidemiology of radiogenic breast cancer.
In Radiation Carcinogenesis: Epidemiology and Biological Significance (Boice JD Jr, Fraumeni JF Jr, eds). New York: Raven Press, 1984, pp 119-129
(10) HOWE GR, LINDSAY J, COPPOCK E, et al: Isoniazid exposure in relation to cancer incidence and mortality in a cohort of tuberculosis patients. Int J Epidemiol 8:305-312, 1979
(11) Registrar General for England and Wales: Occupational Mortality 1970-1972. Decennial Supplement. London: HM Stat Off, 1978
(12) HOWE GR, LINDSAY JP: A follow-up study of a ten-percent sample of the Canadian labor force. I. Cancer mortality in males, 1965-73. JNCI 70:37-44, 1983
(13) HOWE GR, FRASER D, LINDSAY J, et al: Cancer mortality (1965-77) in relation to diesel fume and coal exposure in a cohort of retired railway workers. JNCI 70:1015-1019, 1983

Problems in Classification of Cancer for Epidemiologic Research¹,²

John W. Berg³

ABSTRACT —In vital statistics and most epidemiologic studies, cancers have been classified mostly by site of origin alone. This continues to be true even though it is continually being demonstrated that, among cancers of a site, important subsets with different epidemiologies are almost always present. This paper explores the reasons why epidemiologists fail to use all the information contained in the standard cancer classification. It also explores problems that arise from the nature of the classification, from the nature of the cancers being classified, and even from patient characteristics that determine how much information on the cancer can be gathered. The problem of too little information is generally difficult to solve. However, pathologists can say more about the epidemiologic implications of their various diagnoses, and epidemiologists can learn to use these diagnoses in their cohort and other studies.— Natl Cancer Inst Monogr 67: 123-127, 1985.

Others at this Workshop have paid tribute to Dr. Cuyler Hammond’s professional achievements.
Their influence on me has been matched by his personal influence on my career. I have been unusually fortunate in the number of senior, talented individuals who have allowed me to work with them and who have been helpful and supportive in so many ways. Dr. Hammond was my first contact with epidemiology, and so he showed me the area in which I have found the most pleasure working these past years.

I find my topic particularly appropriate because the first time Dr. Hammond and I worked together was on the Committee on Nomenclature and Classification of Disease of the College of American Pathologists. The committee’s charge at that time was to devise a code for current diagnoses being made by pathologists on surgical, biochemical, and autopsy material. Most of the committee members were young practicing pathologists who were up-to-date regarding nomenclature in their special fields but who knew little about classification or codes. If Dr. Hammond had not also been a committee member, I doubt that they could have produced much of anything, much less the quite successful “Systematized Nomenclature of Pathology” (1). He taught us, among other things, that we must have a structure that reflected pathologists’ views of their material and a nomenclature that they actually were using and would continue to use for their diagnoses.

ABBREVIATIONS: SNOP=Systematized Nomenclature of Pathology; MOTNAC=Manual of Tumor Nomenclature and Coding; ICD=International Classification of Diseases; ICD-O=ICD for Oncology.
¹ Presented at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
² Supported in part by a departmental gift from the R. J. Reynolds Industries, Inc.
³ Departments of Pathology and Preventive Medicine and Biometrics, University of Colorado School of Medicine, Denver, Colorado 80262.
The two most important concepts in pathologic diagnoses are the part of the body affected and the lesion found in that location. When a code such as the “International Classification of Diseases” (2) mixes the two concepts, problems arise. For example, in the Malignant Neoplasm section, both “stomach” and “lymphoma” are major rubrics. The coding of the not infrequent primary lymphomas of the stomach therefore becomes arbitrary and controversial.

In the SNOP, we made every effort to separate different concepts. Different fields were created for topography, morphology, etiology, and function. Within each field, the contents were further structured. Different kinds of agents were separated into chapters in the etiology field, and different kinds of pathologic processes into chapters in the morphology and function fields. Because we based our assignments on the advice of the best authorities we could find, SNOP often proved to be a better educational tool than existing dictionaries. For example, by locating exactly where an unfamiliar lesion was in the SNOP, one would learn whether current thinking considered it a congenital anomaly, a disorder of growth, or a true neoplasm.

Another lesson we learned from Dr. Hammond about a good classification is implied in the previous paragraph: facts, not merely opinions, should determine the classification structure and resolve arguments. As a corollary, we should make the product a nomenclature as well as a code by using only the best names for a category. Less appropriate names, including eponyms and older, less well-defined terms, are better omitted. Yet another principle, also based on the supremacy of facts, was that a classification should be tested thoroughly before being introduced. The SNOP fell short of the ideal particularly in this regard. It was in fact tested quite thoroughly for completeness, for ease of coding of diagnoses, and for acceptability by pathologists.
It was not tested, as far as I know, for reproducibility of coding, for correctness of coding, or for the ease and completeness of retrieval. In this regard, the committee was realistic, if cynical: much material would be coded and little would ever be retrieved. A pathologist or clinician would usually use retrieval to find an illustrative case for teaching, not for a total, in-depth analysis. I raise these points partly because history is one of the subjects of this Workshop, but mostly because it is relevant to any discussion of today’s problems of classification.

The widespread acceptance of SNOP, which included translation into other languages, has had a special impact on cancer classifications. (In retrospect, this explains and justifies Dr. Hammond’s involvement in what superficially seemed to be a project remote from his American Cancer Society responsibilities.) There were classifications of cancer that separated site and histology long before SNOP. Not only did these differ from country to country, but usually each major center had its own classification. In this country, the classifications used at Memorial Hospital for Cancer and Allied Diseases in New York City and at M. D. Anderson Hospital and Tumor Institute in Houston both differed greatly from each other. Both also differed from the “Standardized Nomenclature of Diseases and Operations.” The Armed Forces Institute of Pathology used yet another scheme, and so on. I think most groups used only one histology code. I also know, however, that some used a site-specific code, in which squamous cell carcinoma or fibrosarcoma would have one histology code number at one site and another number at another site. The SNOP and its successors have changed all this, because SNOP came at a time when pathologists were reaching a consensus in many areas of tumor classification. It reflected and supported as much of a standard universal language as could be identified in the mid-1960s.
Because of this, the tumor morphology section of SNOP was used in the American Cancer Society’s 1968 “Manual of Tumor Nomenclature and Coding” (3). A widely useful site code was created by adapting the malignant neoplasm section of the then current Eighth Revision of the ICD. Both hospital-based and regional cancer registries saw the logic and utility of a pathologist-oriented morphology code and an ICD-oriented site code. Group after group adopted MOTNAC, and translations were made into other languages. The subcommittee responsible for the “Neoplasm” section of the Ninth Revision of ICD agreed that they had to supplement the site rubrics with a histology classification. Once this decision was made, the MOTNAC code was the only viable candidate. It was expanded, updated to a degree, and published by the World Health Organization as the “International Classification of Diseases for Oncology” (4). To complete the circle, the College of American Pathologists redid and expanded SNOP as the Systematized Nomenclature of Medicine, and they used the ICD-O histology classification for its Neoplasm section.

Thus one great problem of cancer classification is solved: We have a standard, internationally sponsored and used classification for cancer histology. In one sense, the most important problem now is that this classification is almost never used in epidemiology. Our central focus in this discussion is to explore why this is so.

I would agree, first of all, that if death certificates are to be the only follow-up on a cohort, there is no reason for any kind of detailed classification of cancer; the details are too likely to be wrong. Even general cancer site information is not to be trusted. In 1 study I observed in England, a diagnosis of bone cancer on a death certificate correctly meant primary cancer of bone less than one-half the time. In this country, the net misclassification for rectal cancer was 30% in 1 study I did.
Histologic information is even less trustworthy. However, we hope that those who conduct cohort studies will use more cancer data than those on death certificates. The certificate data are inaccurate, and they also represent a biased subset of cancer patients. At least one can expect more studies to work backward from the death certificate, as Selikoff does, and collect the “best information” on the patients. In more and more cohort studies, patients are either being followed with periodic medical examinations or are matched against cancer registry files. Cancer differs from most diseases in that verification of the diagnosis usually means microscopic examination of cells or tissues. Histologic information is therefore available almost routinely, and one must look further to find out why it is so rarely used.

Three general explanations seem plausible:

- histologic information might not be relevant to most epidemiologic investigations;
- it might not be understood;
- the classification might actually put barriers in the way of its use.

I think the last two are true, and I shall discuss the last in detail, even suggesting remedies. As for the first, I have long insisted that epidemiologic studies must include histologic information. For most cancer sites, it is true that one histologic type (perhaps with minor, irrelevant morphologic variations) dominates the picture. It is also true, however, that for almost every site there are other histologic types with known different causes, known different pathogenesis, or at least known different epidemiologies in the broad sense of that term.
Some will be important for study as entities. In any case, a mixture of all types means that information about the principal cancer type will be blurred at best, and at worst completely obscured, by the accompanying “noise.” As to the epidemiologic importance of rare tumors, each time I list those that have been responsible for the recognition of a new cause of human cancer, the list grows longer. Now, of course, we can add Kaposi’s sarcoma to mesothelioma, hepatic angiosarcoma, vaginal adenocarcinoma, etc. It still seems to be true that epidemiologists sit back and wait for alert pathologists to call these new disease complexes to their attention, rather than monitoring incoming data themselves. This is one reason I believe epidemiologists do not fully understand histologic information. Training should be strengthened in this area.

The problem posed by the classification itself is most obvious in a comparison of two lists. The first is a pathologist’s list of the few cancer types associated with a particular organ. The second is the much longer list of histologic types assigned to that site in any reasonably large cancer registry data set. The morphology code, even in its restricted MOTNAC form, provides a different rubric for every term that exists as an entity at some site. It therefore provides too many places to code cancers of any 1 site. Of the 43 sites I work with, 39 contain at least 2 histologic types of importance. The median number is 6 types, and only 9 sites give rise to more than 10 types worth keeping separate. Yet even the abbreviated site-type tables in the Third National Cancer Survey and the Surveillance, Epidemiology, and End Results monographs show how many more diagnoses are used for these cancers. In the basic data, most sites include more than 30 histologic diagnoses, and for some sites, such as lung, more than 50 are provided.
A few of the extra terms in the longer list are remnants of outdated or idiosyncratic classifications. Most, however, are fairly respectable synonyms of the preferred standard terms believed to describe different kinds of cancer at that site. The trouble is that terms that are synonyms at 1 site are importantly different terms at another site. (If the terms always were synonyms, they should not have received different code numbers.) As far as I can tell, “papillary carcinoma, 805__,” “papillary adenocarcinoma, 826__,” and “adenocarcinoma, 814__” are synonymous for epidemiologic and prognostic purposes when applied to bowel or ovarian cancers. In the breast, however, “papillary adenocarcinoma” is a special kind of cancer epidemiologically and prognostically, quite unlike “adenocarcinoma.” And, of course, “papillary carcinoma” in the breast means a papillary adenocarcinoma, whereas in the tongue, larynx, etc., it means a special type of squamous cell carcinoma.

There are also differences by location in how specific a term is. In the lung, “adenocarcinoma” means a particular type of lung cancer with a different epidemiology (to some extent) and prognosis than the more common types of lung cancer. In the prostate, “adenocarcinoma” is a more general term, but for practical purposes it has been as specific as histologic typing needed to be. In the stomach, “adenocarcinoma” is a nonspecific term, often used even when no histologic confirmation of the cancer diagnosis has been made. To be specific, one has to say what kind of adenocarcinoma it is. For example, for prognostic purposes, is it a medullary carcinoma or a signet-ring cell carcinoma? For epidemiologic purposes, is it a diffuse or an intestinal type?
Among breast cancer diagnoses, adenocarcinoma is for most people an extremely nonspecific term and even suggests that the pathologist is not sure whether the cancer is primary in the breast or metastatic from another site. With nothing in the classification to assist one in understanding these nuances, proper organization of retrieval is difficult indeed, even for most pathologists. Correct or not, this state of affairs was explicitly accepted when this classification was created and reaffirmed each time a new group adopted it. Coding was easy, and this certainly helped in the acceptance of the code, because a lot of coding would be done before serious efforts at retrieval were made. It also meant that a single, site-independent code number for each term was available. The alternative would have been creation of a highly structured classification for each site. Only then would a complete site-type code number have meaning. In such a scheme, retrieval of specific histologies across sites would not be at all simple. Coding would be harder because of all the site-type lists that would have to be consulted. In some instances, such an approach might have been justified, but it was rejected here for one overriding reason: This classification was going to have to serve multiple purposes. Pathologists word their diagnoses partly to communicate with each other and partly to communicate with clinicians. For the clinicians, pathologists try to indicate what the prognosis of the particular cancer is and, increasingly, to identify cancers that respond to particular treatments. It is reasonable that a classification with this orientation would take precedence over one important for epidemiology if the 2 classifications were different.
By providing for coding of descriptive detail beyond the immediate needs of the clinician, the current classification not only provides a way to accommodate clinical discriminations among cancers but also increases the chances that epidemiologically important separations will be recorded even if they are not clinically important at the moment. At this point in the discussion, it seems obvious that the remedy for the oversupply of histologic categories is simple: For each site, we should have a simple standard guide for grouping the diagnoses. Up to a point, such guides exist in almost every description or classification of cancers of a particular site. The most explicit structured classification schemes now are found in the World Health Organization series, the “International Histological Classification of Tumours.” However, these were created by pathologists for pathologists, without explicit reference to external criteria that would indicate the importance of any of the recommended separations or groupings. In this series, the degree of detail varies from site to site, as does the comprehensiveness of the lists of synonyms. Even the basic structure of the classifications differs widely without obvious reason. To gain widespread acceptance, the committees of authors were made up primarily of senior, politically important pathologists. With no objective external criteria required, it is not surprising that some of their categories and groupings appear to represent personal views instead of a widespread consensus or a data-dictated decision. What is needed, but in general does not exist, is a site-specific classification of cancers designed especially for epidemiologic purposes. As long as we are thinking of the desired ideal, site divisions and histologic classes should be chosen for epidemiologic relevance.
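One way to picture a site-specific grouping guide is as a lookup keyed by site and morphology code together, rather than by morphology code alone. The Python sketch below is purely illustrative. The site names, the group labels, and the use of the 4-digit ICD-O morphology codes 8050, 8260, and 8140 as stand-ins for the 805__, 826__, and 814__ rubrics are assumptions made for the example, not a published scheme. It shows only how one code can fall into different groups at different sites, and how pairs the guide does not cover can be flagged for review rather than merged silently.

```python
from collections import Counter

# Hypothetical site-specific grouping guide (illustrative assumptions only).
# Keys are (site, ICD-O morphology code without the behavior digit).
SITE_GROUPS = {
    # Bowel and ovary: the three terms treated as synonyms.
    ("colon", 8050): "adenocarcinoma",
    ("colon", 8260): "adenocarcinoma",
    ("colon", 8140): "adenocarcinoma",
    ("ovary", 8050): "adenocarcinoma",
    ("ovary", 8260): "adenocarcinoma",
    ("ovary", 8140): "adenocarcinoma",
    # Breast: papillary forms kept apart from nonspecific adenocarcinoma.
    ("breast", 8050): "papillary adenocarcinoma",
    ("breast", 8260): "papillary adenocarcinoma",
    ("breast", 8140): "adenocarcinoma, nonspecific",
    # Larynx: papillary carcinoma as a special squamous cell type.
    ("larynx", 8050): "papillary squamous cell carcinoma",
}


def epi_group(site: str, morphology: int) -> str:
    """Return the epidemiologic group for a site-morphology pair.

    Pairs the guide does not cover are flagged instead of being merged.
    """
    return SITE_GROUPS.get((site, morphology), "unlisted: review")


def tabulate(cases):
    """Count cases, given as (site, morphology) pairs, by site and group."""
    return Counter((site, epi_group(site, code)) for site, code in cases)


if __name__ == "__main__":
    cases = [("colon", 8050), ("colon", 8140), ("breast", 8050), ("breast", 8140)]
    for (site, group), n in sorted(tabulate(cases).items()):
        print(site, group, n)
```

In this toy example the two colon cases fall into one group while the two breast cases stay apart, which a single site-independent code list cannot express.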
Beyond this, the general criteria for any classification should be specifically addressed for a cancer classification, but they rarely are. Classification rubrics are named and illustrated by the archetypal example. Equally important is that the boundaries be well described and that rules be given that determine inclusion in or exclusion from the category. Two kinds of evidence should support the definition of the category: 1) evidence that the category is real, i.e., that its members differ significantly from members of other categories; 2) evidence that the criteria for classification are good enough and clear enough that they can be correctly applied by the great majority of individuals who will be classifying the day-to-day material. In this regard, quantitative rules are harder to apply reproducibly than are qualitative rules. The classification should not depend on tests that are not likely to be done in many instances (a point often overlooked in staging schemes that rely on expensive and invasive procedures). The goal should be optimal separation of entities: neither so coarse a separation that important specific entities are buried in general categories, nor so fine a separation that members of one natural group are assigned to many different categories. Mesotheliomas are just one example of an epidemiologically important cancer whose initial increase was missed in large part because the cases were coded to so many different places. Although the problem of reorganizing the current classification of cancer so it can be used in epidemiologic studies has not been solved, I believe we are in sight of a workable solution. We have some rare and some common cancers that we can associate with particular etiologic agents.
Although lung and stomach are the only 2 cancer sites I can think of where different types of epithelial cancer can be separated on the basis of pathogenesis, in all organs one can assume that cancers arising in different tissues follow different pathogenetic pathways. Thus, deductively, lymphomas and sarcomas of a site should be separated from the carcinomas of epithelial origin and from each other. When we do not know causes and have not described the pathogenesis of a cancer, can we still make a reasonable estimate of the likelihood that it has the same epidemiology as, or a different epidemiology from, another cancer type from the same site? Yes, because we have been doing this for some time with regard to cancers of different sites. We look at cancer data from different countries and conclude that if 1 cancer is relatively more frequent in one country and a second is relatively more frequent in the other, then the 2 cancers probably have different etiologies. Although international comparisons are beginning to be done for histologic types within a site, e.g., see (5), the demonstrations required each time that pathologists from different parts of the world are using, or can learn to use, histologic terms in exactly the same way are costly and delay progress. Cancer classification criteria are now more homogeneous within this country, and I suggest that we can now use case material from different United States populations when we look for histologic types of cancer that may be different epidemiologically. This extension is available now, and it not only could simplify histologic classifications but could also indicate which of the many site subdivisions found in ICD and ICD-O are likely to be epidemiologically important and which are not. It even offers a chance to introduce some objectivity into the classification process.
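The cross-population reasoning above can be reduced to a single comparison: the ratio of type A to type B cases in one population divided by the same ratio in another. A value far from 1 means the two types do not rise and fall together. The sketch below is a standard case-only calculation assumed here for illustration, not a method given in this paper. The counts in the usage line are invented, and the approximate standard error (Woolf's formula for a log odds ratio) requires every count to be positive.

```python
import math


def relative_frequency_contrast(a1: int, b1: int, a2: int, b2: int):
    """Compare the A:B case ratio between two populations.

    a1, b1: cases of histologic types A and B in population 1
    a2, b2: cases of types A and B in population 2
    Returns (ratio_of_ratios, z), where z is the log of the ratio of ratios
    divided by its approximate (Woolf) standard error.
    """
    if min(a1, b1, a2, b2) <= 0:
        raise ValueError("all counts must be positive")
    ratio_of_ratios = (a1 / b1) / (a2 / b2)
    se = math.sqrt(1 / a1 + 1 / b1 + 1 / a2 + 1 / b2)
    return ratio_of_ratios, math.log(ratio_of_ratios) / se


# Invented counts: type A outnumbers B 3:1 in population 1 but not in population 2.
ror, z = relative_frequency_contrast(120, 40, 60, 80)
print(f"ratio of ratios = {ror:.2f}, z = {z:.2f}")
```

Because only proportions within each case series are compared, no population denominators are needed. The comparison is, however, only as good as the assurance that pathologists in both populations use the histologic terms in the same way.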
We have one good precedent in the way sex ratios served, as Kreyberg demonstrated (6), to separate lung cancers of different etiologies. As an extension of this, I suggest formalizing the concepts of “demographic difference” and “demographic distance.” The first term refers to testing whether cancer types differ statistically with respect to age, sex, race, and possibly geographic occurrence. The second term, which corresponds to the “taxonomic distance” used for organization of data on biologic species, etc., refers not to the statistical significance of population differences between types of cancer but to how far apart cancer types are in a standard multidimensional space based on age, sex, race, and other usable measures. These concepts have been presented elsewhere (7) and will be developed in more detail as testing validates or changes the present methodology. Even in preliminary form, however, the concepts have proved useful in simplifying one important but neglected type of cohort study: monitoring large population groups for increases in relatively rare cancers that could signal that we have another new human carcinogen. In such a monitoring system, one does not want to repeat the mistake that caused mesotheliomas to be overlooked, i.e., the division of cases among so many sites, mentioned above. The ideal monitoring system must separate every type of cancer that should justly be considered a separate entity and coordinate all of the synonyms and terms for closely related cancers. Similar reasoning should apply to subsite separation and to additional categories proposed for site or histologic classification. Changes that improve discrimination among different kinds of cancer should be favored and promoted. Changes that do not help in such discrimination should be deplored and ignored. Some of the other problems of cancer classification are also immediately solvable in theory; others are not.
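The idea of “demographic distance” can be sketched as a distance between cancer types, each summarized by a vector of demographic proportions among its cases. The Python below is only a minimal sketch under stated assumptions. The three features (proportion male, proportion black, proportion diagnosed under age 50), the plain Euclidean metric, and the type names and values are all hypothetical, and the method actually proposed is the one described in (7), not this code. Numerical taxonomy commonly standardizes each character before computing distances. Because all three features here are proportions on the same 0-to-1 scale, the sketch skips that step.

```python
import math
from itertools import combinations

# Hypothetical feature set: proportions among the cases of each cancer type.
FEATURES = ("male", "black", "age_under_50")


def demographic_distance(p: dict, q: dict, features=FEATURES) -> float:
    """Euclidean distance between two demographic profiles."""
    return math.sqrt(sum((p[f] - q[f]) ** 2 for f in features))


def ranked_pairs(profiles: dict):
    """Return (distance, type_a, type_b) for every pair of types, closest first."""
    return sorted(
        (demographic_distance(profiles[a], profiles[b]), a, b)
        for a, b in combinations(sorted(profiles), 2)
    )


# Invented profiles for three unnamed histologic types at one site.
profiles = {
    "type_A": {"male": 0.8, "black": 0.1, "age_under_50": 0.1},
    "type_B": {"male": 0.5, "black": 0.1, "age_under_50": 0.1},
    "type_C": {"male": 0.8, "black": 0.1, "age_under_50": 0.5},
}
for d, a, b in ranked_pairs(profiles):
    print(f"{a} - {b}: {d:.2f}")
```

Types that sit close together are candidates for pooling, and types far apart are candidates for separate tabulation. Either decision still depends on a test of “demographic difference” and on adequate case numbers.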
The solvable problems are those concerned with the classification scheme itself; most are concerned with the site classification. As suggested above, more site distinctions are provided than we need. When there are 7 partly overlapping subsites for the tongue, we can be sure that many cancers will cross boundary lines by the time they are diagnosed. Many also will be described by clinicians who have forgotten or never bothered to learn the minutiae of the classifications and so use them incorrectly, if at all. This is unfortunate because at least one important distinction can be made: Cancers of the base of the tongue have a different epidemiology from the others as well as a different standard treatment and prognosis. Site distinctions that deserve to be maintained in epidemiologic studies also have problems with definitions. For example, clinicians and pathologists have particular problems in distinguishing cancers of the hypopharynx from those of the supraglottic part of the larynx and in properly locating cancers in the general region of the rectum, sigmoid, and rectosigmoid junction. These boundary lines are in areas of high cancer frequency, so that minor biases will shift relatively large numbers of cases from one category to another. Another problem is that some cancers are particularly susceptible to being coded to different sites on the basis of what appears to the clinician or pathologist to be a minor variation in phrasing. An example would be the assignment of a primary site to a fibrosarcoma involving the parotid gland and extending anteriorly from it into the cheek area. Some will call it a sarcoma of the parotid, others a sarcoma of the cheek.
In the latter case, the coder has the choice of coding it to the cheek mucosa, to the skin of the cheek (actually to “skin of other parts of the face”) if skin fixation is mentioned, to connective tissues of the head, face, and neck (my choice), or to the “wastebasket” category 195 in ICD and ICD-O, where “cheek, NOS” is listed. Careful attention to the rules should exclude the last option, but all of the other choices can be justified on the basis of only trivially different phrasing describing the site of the tumor. Finally, among the tractable problems are a few in the logical organization of the codes. Among the worst is that in ICD-O, retroperitoneal sarcomas are still classified with cancers of the digestive system. That might have made sense in the days when most abdominal masses were not biopsied. Today, most cancers assigned to the retroperitoneum as primary site are soft tissue sarcomas. In fact, after the leg, the retroperitoneum is the most common site for soft tissue sarcomas to arise, yet I know of no one except myself who groups them with the other soft tissue cancers for reporting and study. Mostly just a potential problem at this time is that of conflicting nomenclatures: one classification for clinical purposes and a second one, incompatible with the first, needed for epidemiology. The important example at the moment is the classification of stomach cancers. The “intestinal”/“diffuse” separation that is the key histologic one for epidemiologic purposes is almost never used by pathologists in diagnostic reports because it is in conflict with the classification used to convey prognostic information to clinicians. It is possible that similar difficulties could arise for cancers of other sites if epidemiologic study suggests a special classification.
The discouraging thing is not that different purposes may require different emphases in classification, but that in the 20 years that we have known about the difficulty with stomach cancer, no one to my knowledge has tried to see whether a new classification could be devised that would serve both needs. Unclassified material is the largest of the intractable problems. Diagnostic efforts are applied unevenly to different parts of the population, especially in that fewer and milder diagnostic procedures are used for older patients. Above 85 years of age, fewer than one-half of the cancers at most internal sites will be biopsied. Because the distribution of cancer types within a site, as well as the distribution of sites of origin of cancer, is age related, undiagnosed cases are not a random sample of all cancers. For some sites, such as lung, we know that even more biases exist. It is easier for a physician to biopsy squamous cell cancers of the lung and to obtain cytologic specimens from them than from other types of lung cancer, because more of them arise in large proximal bronchi. Hence the squamous cell type will preferentially be recognized as such. We tend to place other types of lung cancer in the unclassified category more often because classifiable material is less easy to obtain. It is also true that some specific diagnoses require more generous amounts of tissue than do others. For example, it is hard for me to imagine that a mixed adenosquamous cancer of the lung would be recognized as such on a small biopsy or on cytology. Usually, only the squamous or the glandular element would be seen and the cancer (mis)labeled accordingly. A more frequently encountered problem is that the more undifferentiated a cancer is, the more tissue must be examined before a confident assignment to a specific underlying type can be made.
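The age argument can be made concrete with a small worked calculation. Suppose, purely as an assumption for illustration, that a site's cases fall into 2 age strata that differ both in the true share of one histologic type and in the fraction of cases that receive a specific classification, and that within each stratum classification does not depend on type. The share of that type among classified cases is then a reweighted average that overweights the better-studied stratum. All numbers below are invented. The further bias described for lung, where squamous cell cancers are also easier to sample, is not modeled in this sketch.

```python
def classified_vs_true_share(strata):
    """Compare a type's true share with its share among classified cases.

    strata: list of (n_cases, true_share_of_type, fraction_classified),
    assuming classification is independent of type within each stratum.
    Returns (true_share, share_among_classified).
    """
    total = sum(n for n, _, _ in strata)
    true_share = sum(n * s for n, s, _ in strata) / total
    n_classified = sum(n * c for n, _, c in strata)
    classified_share = sum(n * s * c for n, s, c in strata) / n_classified
    return true_share, classified_share


# Invented strata: under 85 years and 85 years or older.
strata = [
    (1000, 0.40, 0.90),  # younger patients: most cases get a specific diagnosis
    (200, 0.60, 0.40),   # oldest patients: fewer than one-half classified
]
true_share, seen_share = classified_vs_true_share(strata)
print(f"true share {true_share:.3f}, share among classified cases {seen_share:.3f}")
```

When the type's share is the same in every stratum, the two figures coincide; the distortion comes entirely from the combination of age-related case mix and age-related diagnostic effort.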
Undifferentiated tumors tend to be extensive and inoperable; generally, they also are more frequently seen in the elderly (Berg JW: Unpublished observations). The bias is compounded because age itself decreases operability and so decreases the amount of tissue available for study. Another property of some differentiated cancers is that the longer they are present, the more likely it is that an undifferentiated clone will emerge and become the main type. Thus the same cancer could be typed if biopsied early but not typed if it was allowed to grow in a patient who delayed seeking help. Because the more I look, the more complexity I find, I call the problem of unclassified cancers intractable at the present time. A closely related and also fairly intractable problem is that of uncertainty. Even with a great deal of information on a case, there will be times when physicians and consultants will be uncertain about the site of origin of a cancer or about its histologic type. Skilled observers can reduce the number of cases for which this is true, but as the decisions become harder in a “gray” area, skill merges into dogmatism, particularly if the expert has no easy way to confirm his conclusions. I have left the problem of erroneous classification until last because I believe it is the least important at present. Probably the greatest source of incorrect pathologic diagnoses is the coder who, with no way to encode the word “probably,” makes an expression of uncertainty equivalent to a firm diagnosis in summary statistics. Beyond this, the most frequent errors would be found for newly introduced diagnoses and in instances when full discrimination has no clinical importance.
Because I believe that the more specific a diagnosis is, the more likely it is to be correct, I would use the diagnostic expertise of a consultant in a cohort study above all to review uncertain and nonspecific diagnoses, not to confirm the specific ones. In conclusion, I would say that the most important (because it is the most general) problem in the classification of cancers for epidemiologic studies is the distance that separates epidemiologists from the pathologists and clinicians who generate the histologic, topographic, and death certificate information that the epidemiologists have to use. Two derivative problems that concern me involve the epidemiologists’ failures to use classification information. In cohort studies, the problem is their failure to go back from death certificates to collect better information or their failure to separate histologic types when incident cancers are tabulated and studied. When population cohorts are being monitored by cancer registries, again they fail to separate different types of cancer and to watch for increases in rare cancers such as those that have heralded new environmental carcinogens in the recent past. Pathologists also could help narrow the distance between their diagnoses and epidemiologic relevance by describing the epidemiologic, or at least demographic, connotations of new diagnostic categories they introduce, but, even more importantly, by relating diagnostic categories to their precursor lesions. Are black-white and male-female differences in lung cancer types explainable by different propensities to squamous metaplasia of bronchial epithelium, and can this in turn be explained by sex, race, occupation, poverty, or specific nutritional defects?
Combined pathologic-clinical-epidemiologic attacks on these questions are needed, but many questions and the opportunities for response will not be recognized until much more use is made of the information contained in the full site-histology classification of cancer.

REFERENCES
(1) Committee on Nomenclature and Classification of Disease: Systematized Nomenclature of Pathology. Chicago: Coll Am Pathol, 1965
(2) World Health Organization: International Classification of Diseases, 9th ed. Geneva: WHO, 1977
(3) PERCY CL, BERG JW, THOMAS LB (eds): Manual of Tumor Nomenclature and Coding, 1968 ed. New York: Am Cancer Soc, 1968
(4) World Health Organization: International Classification of Diseases for Oncology, 1976. Geneva: WHO, 1976
(5) CORREA P, JOHNSON WD (eds): An International Survey of Distributions of Histologic Types of Breast Cancer. Geneva: UICC, 1978
(6) KREYBERG L: Histological Lung Cancer Types. Oslo: Norwegian Univ Press, 1962
(7) BERG JW: The epidemiologic meaning of histology in lung cancer. In Lung Cancer: Causes and Prevention (Mizell M, Correa P, eds). Deerfield Beach, Florida: Verlag-Chemie Int, 1984, pp 117-129

Rewards for Cancer Control With Use of Biologic Markers and End Points¹
Paul Kotin²

ABSTRACT: The increasing availability of biologic markers and end points offers significant potential for improvement in cancer control. These tools are underutilized today, but it is clear that they can be major assets to pathologists, clinicians, epidemiologists, and ultimately to programs designed for preventive or therapeutic intervention in populations and individuals at high cancer risk. (Natl Cancer Inst Monogr 67: 129-132, 1985)

Probably no aspect of the pathobiology of cancer has been as tragically neglected in the past as the topic assigned to me. I say this because there are great prizes from the use of biologic markers and end points that are ours to win if we will.
Let me list 5 of these rewards (table 1); I will elaborate on each of these items and conclude with methods for follow-up and classification. The true significance of individual susceptibility to cancer and its impact on the distribution of the disease has not emerged because of an ignorance that we hide behind the euphemism of “host factors.” We can be far more specific than that. In fact, we can relate variations in susceptibility to an array of specific factors, whether these are acquired or genetically determined, as shown in table 2. I believe it is important to distinguish the enhanced susceptibility to cancer in which environmental experiences play an etiologic role from the “predetermined” high risks seen in hereditary syndromes. For example, the combination of genetic susceptibility and an environmental agent (sunlight) is well known in skin cancer pathogenesis. Acquired susceptibility, however, involves two possible sets of mechanisms. In one, the lesions that affect susceptibility are actually part of the carcinogenic process, e.g., metaplasia with atypism in the bronchus or ulcerative colitis in the colon. In the second, although the acquired factors provide a milieu conducive to pathologic proliferation, the underlying abnormal tissue is not part of the neoplastic process. We find this in the severe scarring associated with alcoholic cirrhosis in liver cancer and in interstitial fibrosis from asbestos in lung cancer. Here we have the criteria for the pathologic entity known as scar cancer. We can be specific with regard to the unusual risk of cancer from genetically determined, hereditary syndromes (table 3). My use of “et ceteras” in table 3 signifies that the list of cancers associated with chromosomal disorders, chromosomal fragility, and single gene disorders is growing.
Of course, the crucial question is whether the increased cancer risk is due exclusively to the genetic defect or whether exogenous factors also play a role. The report of Miller and Todaro (1) on the enhanced transformation of cells from patients with Fanconi’s anemia by the oncogenic simian virus 40 directs attention to this possibility. The step from susceptibility factors to precancerous lesions is a precise one, although the two entities are often grouped as one. As shown in table 4, the special nature of precancerous lesions becomes apparent in the comprehensive definition presented by Koss (2). I must emphasize that progression from a precancerous lesion to a malignant neoplasm is not biologically inevitable. Some may quarrel with the classification of carcinoma in situ as a precancerous lesion, but this label reflects the jargon of pathology and refers to histologic appearance, not to natural history, which represents the view of the clinical oncologist. To put this another way, it is the pathologist who ultimately characterizes the precancerous lesion, but it is the clinician who addresses its treatment. In this difference of approach and concept, we come to an area that is as controversial as it is vexing: the correlation between etiologic influences or agents and the histopathology and natural history of cancer sites. Even so, these correlations are of great concern in epidemiology (table 5). Biologic variability certainly makes possible sites or histopathologic patterns, or both, other than those listed in table 5. Epidemiologic considerations would suggest that these rubrics are most useful in relation to classification and follow-up.

¹ Presented at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
² 24505 South Yosemite, #339, Denver, Colorado 80237.
In fact, biologic markers and end points would appear to be especially useful in the surveillance of both individuals and populations in relation to factors of unique susceptibility and to the pathogenesis and natural history of cancer. These tools would provide a mechanism for measuring the impact of preventive or therapeutic intervention. At this point, I turn to the fifth reward given in table 1: a method for surveillance of persons and populations and the measurement of the impact of preventive and therapeutic intervention. Let me begin this second part of my charge with the general observation that any investigation of a population chosen for epidemiologic study of cancer risk must judiciously take into account both the relative homogeneity of the exogenous factors to which the population is exposed and individual variation within that population. Table 6, which derives from this view, outlines some elements for classification and follow-up of both a study population and the individuals who constitute it. Medical specialists have had a deplorable tendency in the past to underestimate the importance of a complete clinical history, including emphasis on cancer aggregates, site-specific cancers, and other unusual aspects of high or low cancer incidence. Similarly, we have not paid sufficient attention to the need to probe high-risk habits or exposure to high-risk environments. The exception here, of course, has been the work of Dr. Hammond, whose pioneering contributions and leadership have been invaluable. As seen in table 6, an array of tools and procedures is available for the profiling or screening of populations. These not only provide information on individuals but can also yield serendipitous returns. For example, cytology is an established method for identifying abnormalities on epithelial surfaces. My personal experience with a sputum cytology screening program is a case in point. A special cytology program for workers exposed to asbestos was initiated at the Manville Corporation as a feasibility study to see whether it might have beneficial effects on the incidence and natural history of lung cancer. At about the same time, a no-smoking program was also initiated. After the no-smoking program had been in place a couple of years in the largest West Coast plant, where the occupational exposure was well below standards established by the Occupational Safety and Health Administration, Dr. Saccomanno reported the disappearance of all severe atypias observed in those workers in whom these had been detected in earlier specimens (Saccomanno G, Kotin P, Chase GR: Manuscript in preparation). The co-effects of cigarette smoking and asbestos exposure in increasing the risk of lung cancer are well known. The apportionment of risk to the 2 factors is a much debated matter. Yet the beneficial impact of cessation of smoking, and, as a corollary, the role of smoking in the etiology of lung cancer, was quantifiable from an epidemiologic viewpoint and scientifically compatible from a carcinogenic mechanism viewpoint. A second approach to surveillance of workers involves a combination of periodic clinical examinations, laboratory work-up, x-ray diagnosis, and special tests. In the general population, special high-risk factors also point to special diagnostic procedures, e.g., mammography in high-risk women. The utility of the data accumulated by application of the three alternatives will, in large measure, be determined by the quality of methods for data storage and retrieval as well as the adequacy of the follow-up procedures.

TABLE 1. Rewards from use of biologic markers and end points
1) Identification of susceptibility factors in the response to exposure to carcinogens
2) Identification of “hereditary syndromes” characterized by predetermined unusual risks of cancer
3) Characterization of precancerous lesions and states
4) Correlation of cancer site, morphogenesis, histopathology, and natural history with etiologic influences
5) A method for surveillance of individuals and populations and measurement of impact of preventive and therapeutic intervention
Note: Host factors, congenital and acquired, in combination with the pathobiologic characteristics of neoplasms, provide an opportunity for surveillance and scientific intervention.

TABLE 2. Susceptibility factors in cancer
Cancer site | Susceptibility factor
Genetic
Skin | Xeroderma pigmentosum
Skin | Fair skin
Skin | Pigmented nevus
Acquired
Liver | Cirrhosis; hepatitis B infection
Lung | Bronchial metaplasia with atypism; diffuse interstitial fibrosis
Colon | Ulcerative colitis
Leukemia/lymphoma | Aplastic anemia; immunosuppression
Skin | Burn scars
Cervix | Early age intercourse; multiple sex partners

TABLE 3. Hereditary syndromes associated with unusual risk of cancer
Syndrome | Cancer type
Down’s syndrome | Leukemia
IgM deficiency | Lymphoreticular, epithelial, and mesenchymal cancers
Ataxia-telangiectasia | Leukemia and lymphoreticular, epithelial, and nervous system cancers
Wiskott-Aldrich syndrome | Leukemia and lymphoreticular cancer
Etc. | Etc.
Note: Chromosomal and immunobiologic abnormalities can serve as bases for unusual risks.
Current technology, combined with rigorous scientific investigation, can assure maximum information in characterizing cancer risks and their epidemiologic patterns. The “whipped cream on the pie” (and I am delighted to witness its attainment) is the expanding scope of record-linkage mechanisms and a National Death Index. The approaches to prevention include a wide variety of mechanisms for reducing exposure to hazards in all compartments of the environment and life-style. Cessation of smoking or reduction or elimination of occupational carcinogen exposure, or both, have documented successes, e.g., beta-naphthylamine and urinary bladder cancer, and high-boiling-point petroleum fractions and scrotal cancer in wax pressers.

TABLE 4. Precancerous lesions
Anatomic site | Pathologic classification
Colon | Villous polyp
Skin | Arsenical keratosis
Breast | Intraductal papilloma
Urinary bladder | Papilloma
Stomach | Atrophic gastritis, intestinalization of mucosa
Skin, larynx, oropharynx, urinary bladder | Leukoplakia
Skin (malignant melanoma) | Pigmented nevus
All epithelial surfaces | Carcinoma in situ
Note: Skin, visceral organs, and the hematopoietic and lymphopoietic systems have distinct and specific susceptibility factors.
Note: All anatomic sites may have lesions at increased risk of cancer formation.

TABLE 5. Correlation of etiology with site and histopathology of cancer
Site | Cancer | Agent or influence
Sinus | Adenocarcinoma | Woodworking
Liver | Angiosarcoma | Thorotrast; vinyl chloride
Skin | Squamous cell | Arsenic
Lung | Undifferentiated small cell, central | Ionizing radiation
Lung | Bronchiolo-alveolar, peripheral | Asbestos
Skin | Keratoacanthoma | Oil field work
Marrow | Leukemia | Benzene
Mesothelium | Mesothelioma | Asbestos
Note: These correlations, in common with all biologic phenomena, are characterized by exceptions.
The potential benefit of chemoprevention will have to await confirmatory epidemiologic demonstration of reduction in risk. Biologic markers and end points can be identified in every component of the carcinogenic process, from the examination of cells for cytogenetic abnormalities through the successive stages of pathogenesis and morphogenesis of cancer to the histopathologic types and natural history of cancers in man.

TABLE 6. Surveillance of populations
1) Characterization of the study population
   Adequate clinical history, including familial experiences with cancer
   Exposure to high-risk environments: workplace, urban residence
   Presence of high-risk cultural traits or habits: smoking, alcohol ingestion, fat diet
   Existence of increased individual susceptibility factors, e.g., fair skin or precancerous lesions, bullous polyp (i.e., endogenous or exogenous factors, or both)
2) Screening of population
   Cytology: sputum, urinary, fecal
   Profiling (bioanalysis) for immunologic, enzymatic, hormonal, or cytogenetic abnormalities
   Clinical diagnostic procedures: x ray for interstitial fibrosis, occult blood in stool
   Mammography
   Periodic physical examination
3) Development and maintenance of data base with adequate follow-up and appropriate record linkages
   National Death Index
   Cancer registries
4) Measurement of impact of preventive or therapeutic intervention
   Preventive: removal from or correction of high-risk environment, chemoprevention
   Therapeutic: removal of high-risk or precancerous lesions
Note: Population monitoring and surveillance is a leading-edge concept in modern preventive medicine.
These markers can be concomitants of the carcinogenic process (fibrosis) or, in fact, critical to the evaluation of a cancer (metaplasia with atypia). Table 7 shows the variety of markers and the broad array of cancer types and cancer sites to which they are related. I believe it fair to conclude that the existing markers and end points are underused in current population studies, and that their expanded use can only contribute significantly to cancer control. In the past decade, we have witnessed an inevitable explosion in markers and end points, growing out of our increasing understanding of the carcinogenic processes at the cellular level.

TABLE 7. Biologic markers and end points
Markers | Determinants | Cancer
Cytogenetic abnormalities | Down’s syndrome, genetic | Leukemia
Metaplasia with atypism | Cigarette smoking, acquired | Lung, larynx, urinary bladder
Tumor-related specific antigens | Immunologic competency | Liver, colon
Immunologic abnormalities | Immunologic deficiency, IgM deficiency, genetic | Lymphoreticular, epithelial, mesenchymal cancers
Precancerous lesions | Anatomic abnormality, acquired: ulcerative colitis; genetic: villous polyp | Colon
Visceral organ scarring | Alcohol, cirrhosis, acquired | Liver
Enzymatic abnormalities | Xeroderma pigmentosum, genetic endonuclease deficiency | Skin
High-risk environments (acquired) | Workplace carcinogens: asbestos; halo ethers; polycyclic hydrocarbons | Mesothelium; lung; skin/lung
High-risk habits (acquired) | Cigarette smoking; ethanol ingestion | Lung, larynx; esophagus
Unusual (rare) cancers | Vinyl chloride; asbestos | Angiosarcoma, liver; mesothelioma, pleura, peritoneum
Cancer histology and site | Asbestos, scar cancer, acquired | Lung, peripheral bronchiolo-alveolar
Although such markers, e.g., DNA adducts, are still confined to the experimental laboratory, workers in the fields of cancer biology and control should be prepared to incorporate them into well-designed epidemiologic studies at the appropriate time.

REFERENCES
(1) Miller RW, Todaro GJ: Viral transformation of cells from persons at risk of cancer. Lancet 1:81-82, 1969
(2) Koss LG: Precancerous lesions. In Persons at High Risk of Cancer: An Approach to Cancer Etiology and Control (Fraumeni JF Jr, ed). New York: Academic Press, 1975, pp 85-102

Co-Chairman’s Remarks¹
Leon Gordis²

I never had the pleasure or privilege of working with Dr. Hammond as so many of you have. I know him through his papers. As a student of Dr. Lilienfeld’s, I frequently heard Dr. Hammond's papers held up as models of high-quality research.

Dr. Kotin discussed the issue of markers, which I think is a terribly important one. I believe that in the next few decades we will see tremendous advances in biologic markers of susceptibility. However, one problem is that biologic markers are often obtainable only with blood samples, skin biopsies, or some other invasive procedures. We often have marker data only on the patients. I would therefore like to raise a question: what can we do, practically, to get data on appropriate markers for a total defined population for purposes of cohort studies?

Dr. Kotin referred to susceptibility factors, including sexual intercourse at an early age for cancer of the cervix. To me, an important question is why some women who have sexual intercourse at an early age develop cancer of the cervix, whereas others do not. What are the factors that determine susceptibility in women with this kind of exposure? Can we use markers to identify biologically susceptible subgroups of the population, either now or in the future? Given a certain exposure, who would be susceptible? Dr.
Berg addressed a most important issue, the classification or nosology of disease, and I heartily agree with him. I would add, however, that we probably have to go even beyond the usual classification and use characteristics such as estrogen receptors. I think the nosologist has to go beyond the International Classification of Diseases and even its more refined subdivisions, and I hope we will make major advances in this area in the next decade.

I would like to raise three general questions for consideration and discussion.

¹ Presented at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
² Department of Epidemiology, School of Hygiene and Public Health, The Johns Hopkins University, 615 North Wolfe Street, Baltimore, Maryland 21205.

The comparison between cohort and case-control studies seems to me implicit in these discussions, even though it has not been directly addressed. I would like to ask a naive question. Suppose that we can do either a case-control or a cohort study in a certain situation, that the prevalence of exposure and the risk of outcome are adequate for either, and that money is not a problem. Does the expert group assembled here believe that truth is more likely to come from a cohort study than from a case-control study? Some have tended to downgrade case-control studies and believe that the cohort study is the answer to our prayers, but I wonder what opinions you may have.

The second question is: What is the future of prospective studies? Do we need another major prospective study, e.g., of coronary heart disease? Should there be another study like Framingham in view of the resources available and the specific questions that need to be answered?

Finally, we are often not dealing with either straightforward cohort or case-control studies. We are dealing with mixed study designs, hybrid studies.
For example, in Hagerstown, Maryland, Dr. Comstock presides over a valuable serum bank of about 26,000 specimens that were drawn in the mid-1970s during a mid-decennial census. These blood samples probably reflect the premorbid condition of the population. We can therefore follow this population and, as some of its members develop a rare disease, match them with controls, thaw their sera, and do whatever assays seem appropriate for studying premorbid risk factors for certain diseases. Essentially, this becomes a case-control study “piggybacked” on a cohort. The last example that I might give is investigation of a food-borne disease outbreak. Such an outbreak will often come to our attention because of cases in our community. We identify the persons afflicted and then the exposed persons without the disease. We then calculate the attack rates in people exposed to various foods. One might ask the semantic question: Is this a case-control or a cohort study? My point, therefore, is that for teaching and discussion purposes it is useful for us to distinguish the two. In real life, however, we often end up with various hybrids and mixed study designs, which is probably the way it should be.

Discussion IV¹,²

E. Lew: I would like to refer briefly to a slide which Dr. Nicholson showed of the estimate of the observed risk ratio. The figures for the lifetime ratio were high at the youngest ages and dropped with advance in age. I believe your highest age category experienced only 29% of the maximum excess risk. The slide focuses on relative risk. Inasmuch as death rates increase with age, the absolute extra mortality (i.e., extra deaths per 1,000) will increase with advance in age even though the relative risk decreases with age. This is another way of looking at risk, which is frequently illuminating. W.
Nicholson: For lung cancer in asbestos workers, I spoke of the excess mortality as relative risk because relative risk was independent of the age at exposure and depended only on time from onset (duration) of exposure. Indeed, if one looks at added or attributable risks, those first exposed at age 25 to 35 would have a threefold to fourfold greater increase (for the same exposure) compared with those first exposed before age 25. The slide (my table 2) of the hypothetical cohort was meant to illustrate the dramatically different relative risks one obtains for the same exposure circumstances when different observation periods are used. I also wanted to suggest that it offers a way of comparing what would otherwise be disparate observation circumstances.

N. Breslow: My question is related both to that slide and to the one in which you showed that the incidence of mesothelioma starts to decline after 60 years from initial exposure. To what extent are these results due to the high correlation between the time from onset of exposure and the calendar period in which that first exposure took place? You mentioned the possibility of differential doses in different periods, but I would like the point clarified.

Nicholson: It is unlikely that differing intensities of exposure over time can explain the decrease. Let me just consider mesothelioma, in which the absolute mortality appeared to fall rather dramatically in a short time. As can be seen from the right side of my figure 5, the risk of mesothelioma death increases about 2.5 times for each 5 years between 20 and 40 years from onset of exposure. The point for ages above 50 is about ten times lower than a continuation of the curve in figure 5 would suggest. Thus zero exposure for at least the first 10 years of employment would be required to produce such a low value. There is no indication that any significant exposure changes did in fact take place during the 1920s.
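Dr. Nicholson's argument rests on simple arithmetic. If mesothelioma risk rises about 2.5-fold for every 5 years from onset of exposure, continuing that trend predicts very high rates at long latencies. An observed point ten times below the curve therefore needs a large change in early exposure to explain it. The Python sketch below reproduces only that extrapolation. The reference time of 20 years, the reference rate of 1.0 (an arbitrary unit), and the function names are illustrative assumptions, and none of Dr. Nicholson's actual data are used.

```python
"""Geometric extrapolation of mesothelioma risk with time since first exposure.

Illustrative sketch only: assumes a constant 2.5-fold rise per 5 years, as
described for the 20- to 40-year range, and an arbitrary reference rate.
"""

FOLD_PER_5_YEARS = 2.5  # approximate rise per 5 years, as stated in the discussion
REFERENCE_YEARS = 20.0  # assumed start of the range over which the trend holds
REFERENCE_RATE = 1.0    # arbitrary unit; only ratios matter here


def extrapolated_rate(years_from_onset: float) -> float:
    """Rate predicted by continuing the 2.5-fold-per-5-year trend."""
    steps = (years_from_onset - REFERENCE_YEARS) / 5.0
    return REFERENCE_RATE * FOLD_PER_5_YEARS ** steps


def shortfall_factor(observed_rate: float, years_from_onset: float) -> float:
    """How many times lower an observed rate is than the extrapolated curve."""
    return extrapolated_rate(years_from_onset) / observed_rate


if __name__ == "__main__":
    for years in (20, 25, 30, 35, 40, 50):
        print(f"{years} years from onset: {extrapolated_rate(years):.1f} (relative units)")
```

Over the 20-year span from 20 to 40 years the assumed trend multiplies the rate by 2.5**4, about 39. An observed value one-tenth of the extrapolated one gives a `shortfall_factor` of 10.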
Other possible causes of a reduced risk, such as poorer diagnosis of mesothelioma in older men, also would not account for such a significant drop.

¹ Conducted at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
² Address reprint requests to Lawrence Garfinkel, Epidemiology and Statistics Department, American Cancer Society, 4 West 35th Street, New York, N.Y. 10001.

Breslow: Is it true that everyone who would have been exposed in 1925 and who was alive in 1967 is in your cohort?

Nicholson: No.

Breslow: Is there a possibility of differential selectivity?

Nicholson: No. They had to be members of the union in 1967, so some individuals who had terminated membership to work elsewhere could have been lost. However, the group is a stable one because the pay is high. Death before 1967 would be the most likely cause of exclusion. If one looks at the numbers, which are consecutive, few are missing from the group.

N. Mantel: I was wondering about those last data (table 3), where you described the risk associated with a 5-year exposure to 0.01 fiber/ml at various ages. Were those the ages at which they began the 5-year exposure? (For table 3, see page 115.)

Nicholson: Yes.

Mantel: I think what is interesting about this table is that mesothelioma, which is specific for asbestos, drops off sharply with age. Lung cancer, which is not so specific for asbestos, does not decline sharply.

Nicholson: Basically, it does not decline because the time course of lung cancer is determined by the course of lung cancer from cigarette smoking. Cigarette smoking is the driving factor there, so you will see lung cancers in men between the ages of 50 and 70 years. If an asbestos exposure occurs any time before that, it will multiply the cigarette smoking risk.
Contrastingly, the asbestos exposure is driving the mesothelioma, and that takes a long time to become manifest. Those exposed at older ages do not live to the period of high risk. The point is that you have to take these differences into account when doing any quantitative risk analysis.

Mantel: Presumably, you have people who would have had their single year of exposure at age 50. At that time, they hired people at pretty advanced ages.

Nicholson: The manifestation of that effect was seen in my table 1, shown earlier. The people who were exposed when 45 to 55 years old and observed from age 50 to 59, beginning 5 years after exposure, had a tenfold increase in lung cancer deaths immediately.

Mantel: On those other charts, in which you showed the risk ratio first going up and then going down with age, I believe the main reason for the ratio decreasing with age is that the background rates are rising.

Nicholson: No, that is not the point I was making. I think something beyond the rise in background rates is involved, and that is the point I was trying to make.

Mantel: Well, background rates do rise sharply with age. It is difficult to maintain a high and increasing risk ratio when background rates are rising.

Nicholson: After age 65, the rate of increase in lung cancer mortality rates is not that great. The increase was about 25% during the years of observation in Dr. Selikoff’s study. Even in birth cohorts, none showed as much as a twofold increase in risk after age 65; most increases were considerably less. In contrast, the fall in my figure 4 (from 6 to 2 in relative risk) is greater and is confined virtually to those aged 65 or older. I understand your point, but I believe much of the effect that we see is not due to an increase in lung cancer mortality rates.

I. Selikoff: I want to comment on Dr. Nicholson’s presentation from a practical public health point of view.
He showed that the situation was different for mesothelioma and for lung cancer. For lung cancer, asbestos multiplies the background risk. The background risk was different at each age, i.e., a 20-year-old had 20 years of background, a 30-year-old had 30 years, etc. Although this background risk usually included cigarette smoking, other risk factors were also in the background. At whatever point the asbestos comes into the picture, that background risk is multiplied. From a practical point of view, I am reminded of recommendations made by industry that 50-year-olds should be hired because they would not live long enough to develop cancer. It does not work that way. First of all, industry does not hire 50-year-olds. Secondly, when they do hire one, they take him or her as they find him or her, with a great deal of background risk. Therefore, adding asbestos exposure at age 50 results in an extraordinary risk 5 or 10 years later; these workers are going to get a lot of cancer at age 55 to 60. On the other hand, mesothelioma has virtually no background risk. By and large, we just do not see it in the absence of asbestos. For mesothelioma, the risk is proportionate to duration from onset, i.e., achieved age. Therefore, it is important that children at ages 5, 6, or 10 years be protected from exposure to asbestos in schools. From a practical point of view, we can apply this biostatistical information to how we handle the risk associated with asbestos. The winnowing out of the age factor that Dr. Nicholson has done translates into important public health approaches.

Dr. Gordis raised a question about the usefulness of prospective studies, and Drs. Kotin and Chase reviewed the feasibility of biologic markers with us. On both counts, I believe prospective studies are useful and obtaining biologic markers is feasible.
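Dr. Selikoff's point that asbestos multiplies an age-dependent background risk connects to Dr. Lew's earlier remark that absolute extra mortality can rise with age even while relative risk falls. Both can be illustrated with a few lines of Python. The background rates and relative risks below are hypothetical numbers chosen only to show the arithmetic, and they are not data from any study discussed here. The relative risks falling from 6 to 2 merely echo the range mentioned for Dr. Nicholson's figure 4.

```python
"""Relative versus absolute excess risk under a multiplicative model.

All rates and relative risks are hypothetical, chosen only for illustration.
"""

# Hypothetical background lung cancer death rates per 1,000 per year,
# rising steeply with age (for example, because of smoking).
BACKGROUND_PER_1000 = {"45-54": 0.5, "55-64": 1.5, "65-74": 5.0}

# Hypothetical relative risks for exposed workers, falling with age.
RELATIVE_RISK = {"45-54": 6.0, "55-64": 4.0, "65-74": 2.0}


def exposed_rate(background: float, relative_risk: float) -> float:
    """Multiplicative model: exposure scales the background rate."""
    return background * relative_risk


def excess_per_1000(background: float, relative_risk: float) -> float:
    """Absolute extra deaths per 1,000 attributable to exposure."""
    return exposed_rate(background, relative_risk) - background


if __name__ == "__main__":
    for band, background in BACKGROUND_PER_1000.items():
        rr = RELATIVE_RISK[band]
        extra = excess_per_1000(background, rr)
        print(f"{band}: RR = {rr:.1f}, extra deaths per 1,000 = {extra:.1f}")
```

With these illustrative numbers the relative risk falls from 6 to 2 while the extra deaths per 1,000 rise from 2.5 to 5.0, which is the pattern Dr. Lew described. Under the same model, a constant relative risk applied to a rising background would widen the absolute excess even more.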
In our current study, in which we are examining 3,000 men, we are doing not only the usual procedures but also immunologic studies: B-cells, T-cells, all the T-cell markers, immunoglobulins, etc., the whole range of in vivo and in vitro immunomodification. We have to find out why some of them develop cancer and others do not. Because of the hypothesis that their immunologic status plays a role, we are even defining their psychosocial stress status. On the theory that stress increases the risk of developing cancer, we are establishing what these people are like. We will follow them prospectively, holding everything else constant, including their immunologic status, to see if their psychosocial stress makes a difference. Incidentally, we have good information that such stress can affect immunologic status. As you know, the immunologic status of widows, after the first 6 months of their husbands’ deaths, is significantly depressed. We are also testing nucleosides in urine. We know that changes in urinary nucleosides are substantial in the presence of cancer. We also have data on asbestos workers who do not have cancer. Will this provide us with a marker?

I am not sure that a bank of 20,000 sera is necessarily the entire answer in this regard, because that only tells us what these people were like at one point in their lives. They may change 1, 3, or 5 years from now. Obviously, the ultimate outcome of cancer or other disease may not be related to the biochemical markers that were present at the time the blood was first drawn. Therefore, I suggest that these cohorts have to be followed prospectively. They have to be defined at entry, but they also have to be reexamined. We have to see serial results and determine whether secular changes are present. This means that epidemiologists will no longer be working alone. They are going to be working with clinicians, laboratory scientists, biochemists, etc.
To answer your question, I strongly suggest that prospective studies are not only useful; they are going to have to be done. However, they will have to be done in a much more sophisticated way than simply characterizing a group upon admission into the study.

L. Gordis: Dr. Comstock, do you want to comment on the practical side of drawing repeat samples in view of what it took to get the 26,000 sera?

G. Comstock: There is no single path to glory, even in epidemiology. Again, what has to be done depends on the question being asked. If you are interested in a characteristic that is relatively constant throughout life in comparison with other biologic characteristics, then the blood drawn initially may be sufficient. If the characteristic of interest changes over time, repeat specimens may be necessary. In our situation, we obtained nearly 26,000 sera from a general population sample in 1974 and have been following the group to see who develops cancer. We found obtaining postdiagnostic serum specimens in a nonteaching private hospital remarkably intricate. Each working day, the record room and laboratory notify us of patients in the study population who have been admitted with a diagnosis of cancer. Our nurse then contacts the physician to get permission to talk to the patient and obtain informed consent. She then notifies the technician, who arranges a convenient time to draw the blood. There can be delays at each step, and we sometimes have to wait until the patient is discharged and at home. Obtaining each specimen is an expensive process, but when changes associated with the development or treatment of cancer are of interest, it seems to be the only way we can proceed.

L. Garfinkel: How much does it cost to do the whole range of tests on 1 person?

Selikoff: The expensive part is getting the patient in front of you and getting him to put his arm out for us to take blood.
We draw 90 cc of blood; it is just as easy to take 90 as 10 cc. All the work is in convincing the patient to come in and in designing the cohort. Once he is there, it is not really expensive. The immunologist used to be able to do 2, 3, or 4 immunologic studies in a day. Now, with technology such as the automated cytofluorometer, they can do 50 to 60 a day. We have introduced immunologists to the potential of epidemiologic studies, and this cooperation is going to increase. All the automation that is now in the laboratories will be put to use for “biochemical epidemiology.” This was not true 5 years ago, but it is now. The laboratory cost will be small in comparison with epidemiologic costs.

N. Wald: I would like to make two comments. First, biochemical epidemiology will be discussed later in this Workshop. At this stage, however, it is important to say that the approach must be specific, so that a particular hypothesis is tested. The alternative is a “blunderbuss” approach, in which multiple tests are performed on biologic specimens that have been collected. The problem is that, in the absence of a hypothesis, it is impossible to assess the significance of differences in the concentration of a substance measured in specimens from cases and controls; such data simply form the basis for formulating a hypothesis. Multiple testing may also deplete valuable specimens. Secondly, I believe that the results from Dr. Howe's prospective study of diesel fume exposure are the first to have shown an association between such exposure and lung cancer. Dr. Howe, would you comment on this association and tell us whether you think it is causal?

G. Howe: I think the association between exposure to diesel fumes and lung cancer that we found in our study is an interesting one. Although it seems that it could not possibly be due to sampling error, two systematic errors have to be considered.
The first is the lack of smoking data for the cohort, although there are indications that smoking is unlikely to have produced the observed association (e.g., the specificity of the association). One cannot rule out altogether the possibility of a residual confounding effect. The other potential bias is misclassification of exposure to diesel fumes, because job titles were used as a surrogate for exposure. However, this bias would have underestimated any real effect. Evidence from animal studies attests to the carcinogenicity of some components of diesel fuel, and some other epidemiologic evidence does also, though it is weak. At this time, we do not have sufficient evidence to suggest the observed association is causal, but our study does raise the possibility, which obviously needs further investigation.

J. Higginson: Some assume that it will be possible to identify individual susceptibility easily for many common cancers. However, migrant studies indicate the overwhelming effect of environment. Moreover, it may well be that so many factors are involved in individual susceptibility that individually they will not be identifiable, and thus susceptibility will appear random. The problem of measuring dose, i.e., determining what exposures actually occurred over time, is the weakest aspect of cohort studies. Asbestos is one of the few substances we can measure, because the fibers in the lung can be detected after the person has died. Even this situation is far from satisfactory. Whenever you are talking about a retrospective cohort going back over 20 years, it becomes extraordinarily difficult to calculate or extrapolate past exposures. Another assumption frequently made is that measurement of DNA adducts is going to solve that question. A mass of somatic cells has to be examined, of which only 1 cell will become malignant. Presumably, there will also be a background of many other adducts to which an individual is normally exposed. Accordingly, unless a person is exposed to a high level of the suspected carcinogen, one cannot simply assume a correlation between finding a lot of adducts and the cancer; I am not optimistic. I suspect it is going to require more sophisticated approaches, e.g., DNA sequencing or a similar technique, through which the specific mutagenic change in the malignant cell, rather than adducts, is identified.

L. Kurland: In his written text, Dr. Berg suggested that difficulties may at times arise between the epidemiologist and the pathologist because epidemiologists tend to sit back and wait for alert pathologists to call new disease complexes to their attention, rather than monitoring incoming data themselves. I recognize that many epidemiologists do not fully understand histologic procedures or the distinctive features of histologic diagnoses. Some of us are making an effort, and I think that most of us do our best to work with those we believe are competent pathologists. Dr. Berg, because you have spent some time on lung cancer issues, I thought I might take a few moments to show you the results of a survey in Olmsted County (Minnesota) in which we studied the histologic materials (tables 1, 2). Perhaps these results may shed some light on the question of exogenous factors that might explain the trend in the histologic types of lung cancer.

TABLE 1. Mean annual incidence of bronchogenic carcinoma in men by cell type and time per 100,000 population (Olmsted County, Minnesota)ᵃ
Type of carcinoma | Total No. | 1935-54 No. | 1935-54 Rate | 1955-64 No. | 1955-64 Rate | 1965-74 No. | 1965-74 Rate
Adenocarcinoma | 50 | 8 | 2.2 | 15 | 6.1 | 27 | 9.2
Large cell | 44 | 5 | 1.4 | 14 | 5.8 | 25 | 8.4
Small cell | 40 | 8 | 2.1 | 15 | 6.1 | 17 | 6.0
Squamous cell | 96 | 13 | 3.5 | 25 | 10.3 | 58 | 19.4
Other and unknownᵇ | 10 | 3 | - | 6 | - | 1 | -
Total | 240 | 37 | 10.1ᶜ | 75 | 30.9ᶜ | 128 | 43.3ᶜ
ᵃ Data are modified from those in (1).
ᵇ Rates were not calculated.
ᶜ Rates were age-adjusted to the United States white population, 1960.

Firstly, Dr. Lewis B.
Woolner is a pathologist who reviewed all histologic materials on lung cancer collected in Olmsted County over the past 40 years. These data are actually being updated, so that the study will cover almost 50 years, and Dr. Woolner will also review these slides. The incidence rates for bronchogenic cancer in this county population increased steadily over a 40-year period until the most recent 5-year period just completed, in which a slight drop is apparent compared with the previous 10 years. The rate in females, which began to go up in the 1955-1964 period, is still rising. However, what I particularly wanted to show you is the extent of information we have by cell types. Table 1 provides the cell types for males. Squamous cell carcinoma shows a rapid and steady rise, but the rise is not limited to squamous cell carcinoma. The other types, such as large cell, small cell, and adenocarcinoma, parallel the rise of the squamous cell cancer. This suggests that the agent (or agents) responsible is similar for all 4 cell types. The same seems to apply to the females. As you can see in the last period covered, there is a rise not only in adenocarcinoma, which is the predominant form in females, but also in the other 3 cell types. I merely wanted to show this as an indication that we are sensitive to histologic needs in population-based studies.

TABLE 2. Mean annual incidence of bronchogenic carcinoma in women by age and time per 100,000 population (Olmsted County, Minnesota)ᵃ
Age, yr | 1935-64 No. | 1935-64 Rate | 1965-74 No. | 1965-74 Rate | 1935-74 No. | 1935-74 Rate
35-54 | 7 | 4 | 7 | 8 | 14 | 5
55-64 | 6 | 9 | 12 | 37 | 18 | 19
65-74 | 8 | 19 | 12 | 50 | 20 | 30
≥75 | 5 | 22 | 7 | 37 | 12 | 27
Total | 26 | 3 | 38 | 9 | 64 | 5
Total, age-adjusted | | 4ᵇ | | 10ᵇ | | 6ᵇ
ᵃ Data are modified from those in (1).
ᵇ Rates were age-adjusted to the United States white population, 1960.

N. Petrakis: I have a few comments in regard to biochemical markers, to which Dr. Comstock alluded.
Genetic markers tend to be permanent and, depending on their expression, we probably always have them. Other types of biochemical markers present some problems that I will discuss later. These problems relate to possible differences in results depending on whether the analyses were made on freshly obtained specimens or on specimens stored for prolonged periods. Freezing and prolonged storage can have significant effects on certain biochemical compounds. This is pertinent to case-control versus cohort studies of serum levels of retinol and cancer. In preparing for this Workshop, I found a number of case-control studies that demonstrated an association between low levels of vitamin A and cancer. Investigators made the analyses on freshly drawn serum. I was surprised that none of them ever considered that the low retinol levels might be due to secondary effects of infection associated with ulcerating neoplasms, rather than to a direct etiologic relationship. In contrast, long-term cohort studies in which blood was obtained years before the development of cancer should allow retinol levels to be evaluated long before any possible effects of the neoplasm. However, the problem here is whether serum retinol is affected by long-term storage in the deep freeze. There is also the question of the biologic meaning of the results on a single specimen of blood.

S. Stellman: As Dr. Comstock eloquently pointed out, many different routes lead to scientific truth. I have been involved in both case-control and cohort studies of smoking and cancer, to name an example. I always find it easier to accept an association reached by studies of different designs done among different populations by different investigators. However, sometimes results do not agree, e.g., on the relationship between lung cancer cell type and smoking dosage. In an American Health Foundation study published in 1977, Dr.
Wynder and I found a 4:1 ratio between the slope of the dose response for Kreyberg type I (epidermoid) lung cancer and that for Kreyberg type II (adenocarcinoma). In Cancer Prevention Study I, the dose responses were the same for the 2 types. The reason for the different results may lie in differences in sources of data, histologic confirmation, and perhaps even the period when the cancers occurred. On the question of doing prospective studies based on biochemical markers, we are already in a position to benefit from the most extensive and expensive such study done to date, i.e., the Multiple Risk Factor Intervention Trial. If our gathering here has any purpose at all, it is to review what we have learned from three decades or so of cohort studies and to apply these lessons to the prevention of future disease. The most direct way to measure the results of our accumulated wisdom is to perform intervention trials, of which the Multiple Risk Factor Intervention Trial is the prime example. Yet we have heard little about this particular study. I am not an expert in this area and would welcome comments from those who are.

W. Haenszel: I would like to indicate one area in which cohort studies will work better than case-control studies of smoking and lung cancer. I believe that the weakness in case-control studies lies in the attempt by those performing them to time the sequence of previous events. The respondents have difficulty in providing precise information. Case-control studies do well in coming up with an estimate of risk among people who have stopped smoking. When you want to study the experience of former smokers over time, as Doll and Peto did in 1976, I think you need the type of data you can acquire from cohort studies.

S.
Shapiro: No one doubts that case-control studies have contributed enormously in the study of rare conditions, where the size of the study group and the need for long periods of follow-up have posed serious practical problems. I do not doubt that case-control studies will continue to be important, not only for investigations of rare diseases but also for uncovering issues that should be researched through cohort studies. With respect to biomarkers, case-control studies can produce significant results. In some situations, however, biochemical changes may accompany the clinical manifestation of the disease, and this will cloud any inferences that might be made. A case in point is breast cancer. Case-control studies of endocrine function were common a number of years ago; now the major inquiries are based on cohort studies.

Earlier, someone commented on the need to be highly targeted when we consider biochemical investigations. We went through an experience in the breast cancer project that provides an object lesson in how important this can be. We were persuaded by biochemists interested in the etiology of breast cancer to draw blood specimens during 2 successive screening examinations. This was at the risk of losing women, because they had not agreed to have blood drawn when they started to participate in screening. We did it, and fortunately we did not lose women. Specimens have been in storage for about 12-14 years, and, up to this point, nobody has come up with a solid proposal for using the plasma. This was not for want of our trying to elicit ideas. A number of years ago, I met with a group of endocrinologists and immunologists in the hope of identifying a hypothesis that could be tested with the specimens that were drawn and stored. No consensus was reached on any idea, and nothing has happened since.
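Dr. Wald's warning against a “blunderbuss” approach, in which many assays are run on stored specimens without a prior hypothesis, has a simple quantitative side. Dr. Shapiro's account of the unused breast-screening plasma touches on the same point. The more comparisons of cases with controls one makes, the more likely it is that at least one will look “significant” by chance. The Python sketch below assumes independent tests, each at a conventional 0.05 level, with no true differences. Real assays on the same specimens are usually correlated, so this is only a rough illustration. The Bonferroni adjustment shown is one standard correction, not a method proposed in the discussion.

```python
"""Chance findings when many assays are compared between cases and controls.

Assumes independent tests and no true case-control differences.
"""


def prob_any_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n independent null tests is 'significant'."""
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_level(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test level that keeps the family-wise error rate at or below alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


if __name__ == "__main__":
    for n in (1, 5, 20, 50):
        print(
            f"{n:>2} assays: P(at least one chance finding) = "
            f"{prob_any_false_positive(n):.2f}; Bonferroni per-test level = "
            f"{bonferroni_level(n):.4f}"
        )
```

Under these assumptions, 20 independent assays give about a 0.64 chance of at least one spurious “significant” difference. This is the practical reason for stating a hypothesis before thawing scarce specimens.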
A lot of lessons can be learned from the Multiple Risk Factor Intervention Trial project; one of them is that randomized trials do have some value. Issues have been raised about the project that center on the nature of the analysis of results. Stallones' recent criticism bore down heavily on the statistical approaches taken and on interpretations of results that depart from the rationale behind randomization. In my view, this does not diminish the importance of randomized intervention trials. I think that on the question of whether another Framingham-type study should be initiated, we are dealing with competition for resources which, in a community laboratory study, would be large. If we had the resources, without cutting back in other areas, I think it would be worth starting another study. One reason is that risk factor relationships may be undergoing significant change. We do not know, at this point, whether the incidence of coronary heart disease has been changing and, if so, what the responsible factors are. Actually, although it is well established that mortality from coronary heart disease has decreased, we do not know to what extent this is due to changes in incidence and how much credit should be given to treatment. Knowledge about changes in tobacco consumption or cholesterol level does not lead us directly to an estimate of change in incidence. Other environmental and physiologic variables may be undergoing change, and the weight of their associated risks for coronary heart disease may be altered. G. Chase: I think your point on case-control and cohort investigations is an excellent one. There are times when it has been proposed, particularly in the observational setting, that a cohort study should be done first, with any significant leads then followed up by a case-control study.
I have also seen the reverse proposed, in which a case-control study is done first and then, if the findings are pertinent, a cohort analysis of the data is recommended. Situations arise that make a case for the logic of either approach. I raise a flag of caution: statistically or analytically, we want to avoid duplicating the same material when we are using the same cohort for the second or follow-up investigation. I do not have any blanket answers for that. We have heard about some of the innovative uses of the limited data that are available from past working cohorts. At least in the future, I think we are going to see the availability of much more complete data in the occupational setting due to the new technologies now available for capture and storage. We are now moving into a time when we will be capturing laboratory data in machine-readable form directly from national laboratories, and, in fact, we are into that mode already. I am referring to such techniques as capturing data directly from on-line lung function testing, much more complete and comparable job codes than formerly, more available industrial hygiene data, etc. All these will be or are currently linked in some areas and available for investigators. These, at least for the future, will present a lot more opportunities and also perhaps provide an innovative choice of controls. When different industries record comparable data, they provide an opportunity to trade employee information for purposes of collecting a control population for study. We are entering into a broader area here. Dr. Kurland, as the cell types were changing through time, what were the relative percentages? Was the relative percentage holding, or were squamous cells increasing in the same way as in the absolute rate?
Kurland: I cannot recall; it is my impression that the numbers are small in the 1935-1944 period. However, I believe the change over time is fairly uniform by cell type. Nicholson: I would like to comment on a remark Dr. Chase made about the availability of extensive health and exposure data that would be useful for evaluating future disease risks. Could you expand on that? One of the issues that often arises in a health hazard evaluation is the lack of access to some of the existing industrial data, so the collection and availability of such data in the future, as you suggest, would be of great benefit. What do you view as the appropriate role of industry? Chase: First of all, this is my personal view. I think that one of the advantages of those data being available is to free investigators from spending so much time in the “sleuthing mode” and to get them more into the “analytical mode.” They have always had the wherewithal to go into more depth if they were concerned about the completeness of their data, e.g., verification of the completeness of the cohort. All the tools that are present today would still be available, but the data would be there for more analytical work without investigators having to retrieve them. I see that aspect as helping the whole investigative mode. A. Lilienfeld: When comparing case-control and prospective studies, it is important to keep in mind that in case-control studies, one is dealing with survivors of those individuals who had been exposed or not exposed in the past, probably many years in the past. It may well be that the factor under study increased mortality in the exposed group and, therefore, in a case-control study of the survivors, the relative risks will have been underestimated. Whether one decides to conduct another Framingham-type study will depend on the level of knowledge of the disease with which one is dealing.
When the Framingham study was started, investigators were limited to testing those hypotheses that were based on the existing level of knowledge with regard to coronary heart disease. The only way this end point could be ascertained was by clinical examinations. We now know that the incriminated risk factors explain only about 35% of the disease. However, if anyone has any good ideas for explaining the remaining 60-65% of coronary heart disease, it may be worthwhile to initiate another Framingham study. In addition, it may be desirable that we try to determine the risk factors for arteriosclerosis; if noninvasive techniques become available to measure this end point, it may be worthwhile to conduct a Framingham-type study to ascertain the relationship of those risk factors to arteriosclerosis. However, we may be able to do this by case-control studies. Howe: I think the close relationship between the cohort and case-control designs is practically as well as theoretically important. One can regard the subjects in a case-control study as samples from an underlying cohort with various exposures; the data obtained from those subjects are used to estimate the prevalence of exposure in the cohort immediately before the development of the index case or cases. One should distinguish between situations in which that cohort can actually be identified and those in which it is a purely theoretical construct. In the former situation, the problem of selection bias, which is of course one of the major problems with the conventional case-control study, is essentially overcome. For example, we have used this approach in a cohort study of 30,000 female tuberculosis patients exposed to various levels of fluoroscopy. We have traced and interviewed patients with breast cancer in one particular Canadian province and have ascertained and interviewed controls from a random sample of the remainder of the cohort in that province.
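Howe's approach draws controls at random from the non-case members of an identified cohort. It can be sketched in Python as a minimal illustration. The cohort size echoes his example, but the identifiers and the number of cases are hypothetical, and so are the functions `sample_controls` and `selection_probability`. The sketch assumes simple random sampling without matching.

```python
import random


def sample_controls(cohort_ids, case_ids, n_controls, seed=0):
    """Draw a simple random sample of controls from the non-case members of a cohort.

    Because the cohort is enumerated, every non-case has the same known chance
    of selection, which is what helps guard against selection bias.
    """
    cases = set(case_ids)
    non_cases = [pid for pid in cohort_ids if pid not in cases]
    if n_controls > len(non_cases):
        raise ValueError("cannot sample more controls than there are non-cases")
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced and audited
    return rng.sample(non_cases, n_controls)


def selection_probability(cohort_size, n_cases, n_controls):
    """Known sampling fraction for each non-case member of the cohort."""
    return n_controls / (cohort_size - n_cases)


# Hypothetical example: a cohort of 30,000 identifiers, every 200th of which is a case.
cohort = list(range(30_000))
cases = list(range(0, 30_000, 200))
controls = sample_controls(cohort, cases, n_controls=600, seed=1983)
fraction = selection_probability(len(cohort), len(cases), len(controls))
```

The sampling fraction is known. Exposure prevalence among the sampled controls can therefore serve as an estimate of exposure prevalence among the cohort's non-cases, which is the link between the two designs that Howe describes.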
I am not sure that I altogether agree with Dr. Lilienfeld about the importance of the prevalence bias in regard to cancer. For those cancers with a reasonable survival rate, such as breast or bladder cancer, the study should be designed to reach all patients soon after diagnosis, and thus few should be lost due to death. With cancers with a high fatality rate, such as lung or stomach cancers, I think it unlikely that etiologic factors will have more than a marginal effect on survival; so again the prevalence bias is unlikely to be a serious problem. Lilienfeld: I think it is important to point out that in many prospective studies, investigators have started with people in the 25- to 35-year age group. However, it is quite possible that the factors of etiologic importance with respect to the disease of interest began operating much earlier. If one wanted to study hormonal factors, it might have been desirable to start with a group of 13-year-olds. Mantel: I think no conflict exists between prospective and case-control studies as to which should be conducted. It depends on the logic of the situation at the time. However, if you did do a prospective study, the proper method of analysis for it should take the passage of time into account. One of the conditions is who has survived through so many years from the beginning, i.e., the same issue of survivors that Dr. Lilienfeld raised. Also, if you were doing a case-control study, you would want to take the passage of time into account, and this can be done. You stratify on age, so that the question of age and of surviving with age is much the same for the groups being compared. Now I want to come to the question of looking for susceptible individuals as opposed to looking for causes of disease or cancer. If we look at our experience in the laboratory, we find that for many agents it is not the case that one animal is susceptible and another is not. It may be just a matter of tolerance thresholds.
I understand that mosquitoes will not bite some individuals. This gradation of sensitivity does exist. I think it would be a mistake to try to identify susceptible populations rather than causative agents, and I can see one great disadvantage to it. The tobacco industry could refuse responsibility for your lung cancer. You are just a susceptible individual. You should have known better than to smoke. Gordis: I would like to comment on susceptibility. We should look carefully at people who are heavy smokers but who do not develop lung cancer and try to identify the protective factors that are operating. In occupational settings, certain industries may not be able to lower toxic exposures to a safe level for the entire employed population. Therefore, we should somehow try to identify people who are susceptible and provide appropriate counseling and special protection. Higginson: You can reduce the dose of a carcinogen in inbred mice so that only 10% get cancer. If you cannot distinguish these animals from the noncancerous ones, even though inbred animals can be analyzed in depth, I doubt that it can be assumed that you will easily be able to do so in humans. I think we should have at least some biologic hypothesis for susceptibility before conducting surveys or analyses without specific goals in mind. The second point that worries me a little is the assumption, based on the success of retrospective and prospective cohort studies, that prospective “fishing-type” cohort studies will identify many new carcinogenic risks. I know of no historical evidence that prospective cohort studies have demonstrated new carcinogenic factors that had not been suggested by retrospective cohort and case history studies. I am open to correction. Wald: The question of susceptibility is an interesting one, but the term “susceptibility” is used by different people in different ways.
For example, some people refer to differences in the way people smoke (e.g., the depth of inhaling) or differences in their diet (possibly their consumption of vitamin A) as affecting their susceptibility to lung cancer, whereas others restrict its use to differences in predisposing genetic factors that affect the risk of lung cancer. I suggest that it is best to use the latter meaning. Such variables as method of inhaling would then simply be factors that influence the dose of carcinogen, and such factors as diet would be alternative environmental causes of lung cancer that may interact with smoking in a particular way. Used in this way, one might say that a person is susceptible if he is genetically predisposed to a disorder and liable to be at risk if exposed to a particular environmental agent. Clearly, such an interaction is involved in the etiology of coronary heart disease, in which tobacco smoking, diet, and hypertension interact with a genetic susceptibility. Another example of susceptibility is a disorder such as phenylketonuria, which might be regarded as a completely genetic condition. Yet if an affected individual were not exposed to phenylalanine (or only to the small quantities essential for life), the individual would never get the disease, so from this point of view the disorder can be regarded as having an environmental etiology. Clearly, phenylketonuria is a disorder due to exposure to an environmental agent in a susceptible individual. Once susceptibility has been defined precisely in this way, the search for groups who are particularly susceptible to developing a particular disorder is better pursued by using genetic markers than by alternative methods, such as seeking a bimodal distribution of disease risk factors in the population. (I would suggest that in this pursuit it amounts essentially to the identification of genetic subgroups.) J.
Stellman: The specificity and predictability of these tests are the basic question. Among coke oven workers, the strongest correlate of lung effects was years worked in coke ovens, not the genotype or phenotype of the workers studied. In the best of all possible worlds, one would of course want to be able to figure out who is best suited for which job and to assign jobs accordingly. We are far from the best of all possible worlds. When I hear a statement that someone has to accept the risk, I begin to wonder and worry about how we decide who that someone is, particularly when workers do not make these decisions with total freedom of choice. A worker’s decision to accept risk is made in the face of high background levels of unemployment and also without adequate knowledge of risk. This is particularly frustrating in light of all the insurance data and other data that exist and that researchers could use. I am sure several of us have been in situations in which corporate medical directors who want to do studies have come to us, but the industrial relations department sees to it that those studies are not funded even though the whole medical staff wants to do the study. I think that both the absence of good tests and the social realities of how decisions are made and how information is dispensed militate against the use of genetic or other selection processes. Gordis: I am saying that one approach to environmental hazards is trying to identify susceptibles. I think this approach does not exclude any other. You may find a situation in which you can reduce risk or exposure, or both, for 100% of the population. Then you may find a situation in which this is not possible, so you try to identify susceptibles. REFERENCE (1) ANNEGERS JF, CARR DT, WOOLNER LB, et al: Incidence, trend and outcome of bronchogenic carcinoma, Olmsted County, Minnesota 1935-74.
Mayo Clinic Proc 53:432-436, 1978 SESSION V Data Analysis in Cohort Studies Chairman: Steven D. Stellman Co-Chairman: David Schottenfeld Chairman’s Remarks¹ Steven D. Stellman² A point comes in every study when the problems of design and conduct have been tackled and solved and the data are nestled snugly in the computer. Epidemiology textbooks give some clues about what types of analyses can be performed, but what happens most often depends more on the creativity and ingenuity of the investigator, simply because studies differ so much from each other. Given the interrelationships between variables; known, suspected, and unknown sources of confounding; and the unique problems which arise in real life, the possibilities for analysis are endless. Even a small study (few cases, few variables) can lead an investigator into an endless exploration of relationships between variables. The problems and pitfalls of data analysis are the subject of this session. Most of you are familiar with CPS II, the new American Cancer Society prospective study which was launched in the fall of 1982. Few of us could name scientific studies done in the past that worked so well and proved so useful that we would do them in exactly the same way today if given the opportunity. Yet the most remarkable thing about CPS II is the care we have taken to conduct it in a manner nearly identical to the way in which Dr. Hammond ran the original study 20 years earlier. Of course, we have much more sophisticated electronic gadgetry with which to organize and sift the data, but the design and the use of volunteer researchers as the backbone of data collection remain practically unchanged. This new study is at once a tribute to Dr. Hammond’s masterful organization of the first study and a reminder that sound scientific ideas age well.
However, the changes in our concepts and the complexity and sheer quantity of new hypotheses require us to exploit to the greatest extent possible the latest analytical methods as well as computing machinery. Table 1 presents a sampler of multivariate methods that have been used in recent cohort studies and are likely to be used more often in the future. To begin, all studies require standardization of some kind, even if only on age. The Mantel-Haenszel method is a technique we have learned from our earliest schooling. Many of the other methods you see here are currently used. Toward the bottom I have listed some less frequently used methods, but even these are turning up more often in the literature. As we begin to accumulate data on CPS II, we are going to experiment more and more with these techniques, in the hope that we will get to understand our data better.

ABBREVIATION: CPS II = Cancer Prevention Study II.
¹ Presented at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
² Epidemiology and Statistics Department, American Cancer Society, 4 West 35th Street, New York, N.Y. 10001.

One advanced technique we have been considering is factor analysis. This is a fairly complex procedure which so far has not been used as much by epidemiologists as by psychologists. I think it will prove valuable in the future, particularly in dietary studies, in which many highly intercorrelated items are measured for the same individual. I do not like to use a mathematical method, even from a canned program like the Statistical Package for the Social Sciences, unless I understand the mathematics involved, so I started to read a standard textbook on principal components analysis and axis rotation, a common variant of factor analysis. The concept is summarized in the equation $\Sigma \mathbf{v} = \lambda \mathbf{v}$, where $\Sigma$ is a variance-covariance matrix, $\lambda$ is an eigenvalue, and $\mathbf{v}$ is its associated eigenvector.
I did not understand this in the context of biostatistics, but I knew I had seen this equation before. On further reflection, I realized I had seen it when I was working as a statistical thermodynamicist. Any physical chemist would recognize this as a form of the Schrödinger equation, $H\Psi = E\Psi$, where $H$ is the Hamiltonian operator for a system, $E$ is the system’s energy, and $\Psi$ is the wavefunction for the system. This equation, when it can be solved, yields the physical properties of any physical or chemical system. It happens to be an equation which I solved thousands of times (by computer) in my doctoral work. Suddenly, I found myself looking not at a strange biostatistical equation but at an old friend. This old friend of an equation has been helpful in analyzing our new study data. One of the main study hypotheses in CPS II concerns diet. We asked our subjects the number of times per week they ate 28 specific food items, such as beef, pork, chicken, green leafy vegetables, squash, etc. Diet is such a complex topic that 28 items probably are not nearly enough, but it is about the maximum one can ask about and reasonably expect reliable answers in a questionnaire the size of ours. Factor analysis seemed an ideally suited data reduction method for grouping these 28 items into a smaller number of dimensions for meaningful analysis in a way that might make physical sense. Our strategy was to do a principal component analysis on these 28 items to obtain a smaller number of dietary “factors” or dimensions, then to compute the distributions of the resulting factor scores, and to divide people into high, medium, and low exposure categories. The analysis presented in table 2 is based on a sample of about 5,000 men, which represents an extremely small portion of our data. Factor analysis of the dietary items for this group of men produced a list of which items are correlated with which others. These have been grouped to form 7 factors.
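The strategy just described has four steps: eigen-decomposition, rotation, factor scores, and division into thirds. It can be sketched with NumPy. This is a hedged illustration on simulated data, not the CPS II analysis or the SPSS procedure actually used. It assumes 9 hypothetical food items driven by 3 hypothetical latent dietary patterns. It solves the eigenvalue equation for the correlation matrix, which is the variance-covariance matrix of the standardized items. It then applies a standard Kaiser varimax rotation and uses regression-type component scores. Other rotation, scoring, and cut-point conventions exist.

```python
import numpy as np


def principal_components(data):
    """Solve Sigma v = lambda v for the correlation matrix of the items (columns)."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # reorder so the largest component comes first
    return eigvals[order], eigvecs[:, order]


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal (Kaiser) varimax rotation of a p x k loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = np.sum(rotated ** 2, axis=0)    # column sums of squared loadings
        target = loadings.T @ (rotated ** 3 - rotated * col_ss / p)
        u, s, vt = np.linalg.svd(target)
        rotation = u @ vt                        # nearest orthogonal matrix to the target
        new_criterion = np.sum(s)
        if criterion != 0 and new_criterion / criterion < 1 + tol:
            break
        criterion = new_criterion
    return loadings @ rotation


def tertile_labels(scores):
    """Label each subject low, medium, or high by the thirds of a factor score."""
    cut1, cut2 = np.quantile(scores, [1 / 3, 2 / 3])
    return np.where(scores <= cut1, "low", np.where(scores <= cut2, "medium", "high"))


# Simulated questionnaire (hypothetical): 600 subjects, 9 items, 3 latent patterns.
rng = np.random.default_rng(1983)
n_subjects, n_items, n_factors = 600, 9, 3
latent = rng.normal(size=(n_subjects, n_factors))
pattern = np.kron(np.eye(n_factors), np.ones((1, 3)))   # each pattern drives 3 items
items = latent @ pattern + 0.7 * rng.normal(size=(n_subjects, n_items))

eigvals, eigvecs = principal_components(items)
explained = 100 * eigvals / eigvals.sum()               # percent of variance per component
loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
rotated = varimax(loadings)

z = (items - items.mean(axis=0)) / items.std(axis=0, ddof=1)
weights = rotated @ np.linalg.inv(rotated.T @ rotated)   # regression-type score weights
scores = z @ weights                                    # uncorrelated, unit-variance scores
labels = tertile_labels(scores[:, 0])
```

The rotation leaves each item's communality (its row sum of squared loadings) unchanged and only redistributes loadings among the retained components. That is why the rotated components remain a valid summary of the same variance. With the 28 CPS II items, one would retain more components (7 in table 2). The choice of 3 here simply mirrors the simulation.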
TABLE 1.—Sampling of multivariate methods used in cohort studies
0 Simple stratification
1 Mantel-Haenszel method
2 Discriminant analysis and its offspring: multiple logistic risk model (Framingham Study); logistic regression; confounder score
3 Log-linear modeling; analysis of information (Kullback-Cornfield)
4 Less frequently used but promising: factor analysis; cluster analysis; canonical correlation; multiple analysis of variance

TABLE 3.—Relative odds for scoring “high” on 7 dietary factors*
Factor No. | Name | Current smoker | Ex-smoker | Pipe or cigar
1 | Vitamin-rich foods | 0.56 | 1.06 | 0.95
2 | High-fat meats and eggs | 1.60 | 0.95 | 0.95
3 | Dessert | 0.64 | 0.67 | 0.58
4 | Breakfast | 0.33 | 0.72 | 0.68
5 | High-protein meats | 0.95 | 1.07 | 1.08
6 | Carbohydrates | 1.55 | 1.12 | 1.07
7 | Spreads (margarine +, butter −) | 1.25 | 1.29 | 1.21
* Relative odds for a nonsmoker = 1.00.

In table 2, under the heading “Items included,” are the 7 groups of food items. Remember, these groupings were “found” entirely by computer algorithm and were not predicted in advance. Upon inspection, though, each group appeared to have a logic to it. I have given names to these 7 groups in the right-hand column. The first factor, which explains almost 20% of the variance in the data, consists of green leafy vegetables, raw vegetables, tomatoes, carrots, cabbage, etc. I looked at these and decided they were “vitamin-rich foods” because they are high in vitamin A and ascorbic acid, 2 vitamins commonly associated with cancer prevention. The second independent dimension included pork, hot dogs, sausages, eggs, ham, smoked meats, and beef. I thought it would be logical to call it “high fat meat and eggs.” The third dimension, which explained the next highest proportion of the variance, was chocolate and ice cream. I could think of no better name for this factor than “dessert.” The fourth was oatmeal, shredded wheat, cold cereals, bran, and corn muffins; I called it “breakfast.” This is not a facetious appellation. You may be aware that certain studies of longevity and life-style have implicated eating breakfast regularly as a factor in prolonged life. The fifth factor was fish, liver, chicken, and pasta. Except for the inclusion of pasta, which I do not understand, these comprise “high protein-low fat meats.” The sixth factor is white bread and rolls, brown rice, whole wheat, barley, and potatoes. These are clearly “carbohydrates.” The last factor consists of 2 food items that are highly but negatively correlated with each other: butter and margarine. I call it “spreads.” My analysis is not much deeper than this so far, and I am looking for ideas on how to proceed. One point that is evident is that each of these 7 dimensions is powerfully correlated with background variables. I have chosen just one to display: educational attainment. In figure 1, we ranked subjects according to their score on factor 1 (vitamin-rich foods) and compared the distribution of this score between groups of men with different educational levels. The first important observation is that, of the people with an eighth-grade education or less, over one-half scored low on the vitamin scale. People in the highest educational group tended to score in the highest third of vitamin-rich foods. This pattern is repeated over and over for different food factor scores, when correlated with different background variables. The importance of this observation cannot be emphasized too strongly. It colors our entire analysis and the way we think about our data. Although it is difficult to articulate, perhaps the best way I can describe the concept is to point out that nobody is exposed to only one thing. Life-style exposures tend to run in groups. One broad dimension is best described as general health behavior.
People who eat large quantities of vitamin-rich foods also tend to be nonsmokers; they exercise more than others, see their doctors more frequently for check-ups, and generally tend to be healthier. This is not a new or startling concept, but use of the multivariate analysis in this way emphasizes the interrelationship and interpenetration of these variables.

TABLE 2.—Principal component analysis (varimax rotation) of 28 dietary items
Factor No. | Percent of variance | Cumulative variance | Items included | Name
1 | 19.6 | 19.6 | Green leafy vegetables, raw vegetables, tomatoes, carrots, cabbage, citrus fruits and juices, cheese, squash | Vitamin-rich foods
2 | 8.4 | 28.0 | Pork, franks, sausage, eggs, ham, smoked meats, beef | High-fat meat and eggs
3 | 5.6 | 33.6 | Chocolate and ice cream | Dessert
4 | 4.7 | 38.3 | Oatmeal, shredded wheat, cold cereals, bran and corn muffins | Breakfast
5 | 4.6 | 42.9 | Fish, liver, chicken, pasta | High-protein-low fat meats
6 | 4.1 | 47.0 | White bread and rolls, brown rice, whole wheat, barley, potatoes | Carbohydrates
7 | 3.8 | 50.8 | Butter and margarine | Spreads

FIGURE 1.—Percent of males scoring high, medium, or low on vitamin-rich food scale according to their educational attainment (N = 6,254; educational groups: to 8th grade, 5.0%; some HS, 7.3%; HS grad, 19.3%; some college, 27.1%; college grad, 18.5%; graduate school, 22.9%). HS = high school; GRAD = graduate.

As a last example, table 3 shows the calculated relative odds for scoring high on each of the 7 dietary factors according to smoking categories. Current smokers had relative odds of 0.56 for vitamin-rich foods. That is, their odds of scoring high on this factor were only about one-half those of nonsmokers. Their odds of scoring high on high-fat meat and eggs were 60% greater than those of nonsmokers.
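A relative odds of 0.56 compares the odds of scoring high, not the amounts eaten. Background variables such as education can also confound the comparison. The following Python sketch illustrates both points with hypothetical counts that are not CPS II data. Within each of 2 education strata, smokers have odds of scoring high on the vitamin-rich factor that are one-half those of nonsmokers. Pooling the strata into a crude table nonetheless exaggerates the association. The Mantel-Haenszel estimator listed in table 1 recovers the within-stratum value. The Woolf interval is one common choice of confidence interval, assumed here for illustration.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a: exposed, scored high      b: exposed, not high
    c: unexposed, scored high    d: unexposed, not high
    """
    return (a * d) / (b * c)


def woolf_ci(a, b, c, d, z=1.96):
    """Approximate 95% confidence interval computed on the log-odds scale (Woolf's method)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio over a list of (a, b, c, d) strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Hypothetical counts (smokers = exposed; outcome = high vitamin-rich score).
low_education = (10, 90, 20, 90)     # stratum odds ratio = 0.5
high_education = (20, 30, 120, 90)   # stratum odds ratio = 0.5
strata = [low_education, high_education]

crude_table = tuple(sum(cells) for cells in zip(*strata))   # pool the strata cell by cell
crude_or = odds_ratio(*crude_table)
adjusted_or = mantel_haenszel_or(strata)
crude_interval = woolf_ci(*crude_table)
```

In these made-up counts, the crude odds ratio is about 0.32 while the Mantel-Haenszel estimate is 0.5. Smokers are concentrated in the low-education stratum, where high vitamin-rich scores are rarer. The same logic of stratifying, then summarizing, applies when smoking itself is the confounder to be adjusted for.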
Other food factors showed similar effects. If I could draw a single substantial conclusion from these data, it is this: One dare not attempt to study dietary factors in the causation of tobacco-related cancers (lung, larynx, etc.) without explicit adjustment for smoking, which is a powerful confounding factor. Multivariate Cohort Analysis¹,² Norman Breslow³,⁴ ABSTRACT—Modern methods of categorical and survival data analysis are usefully applied to the multivariate analysis of follow-up data that arise in epidemiologic cohort studies. They provide a formal basis for extending analyses based on the standardized mortality ratio into the multivariate domain so as to permit simultaneous consideration of such risk factors as age, duration, and intensity of exposure; age and calendar year of follow-up; and personal characteristics. Analogous methods are available that control for demographic variables internally, without reference to vital statistics or other standard rates. Various model structures allow for the effects of different variables to combine in an additive, multiplicative, or mixed (additive relative risks) fashion. Illustrative analyses are provided of the relationship between respiratory cancer mortality and arsenic exposure in a cohort of Montana smelter workers.—Natl Cancer Inst Monogr 67: 149-156, 1985. Standardization of rates is the epidemiologic technique traditionally used in analyses of mortality data from long-term follow-up studies of exposed populations. Calculation of the SMR relative to general population rates for each cause of death may identify 1 or more diseases as possibly related to the exposure. Comparison of 2 or more SMR or directly standardized rates for these diseases among different subgroups defined on the basis of exposure can then help in establishing a dose-response trend or give other evidence for a causal effect.
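The SMR calculation referred to above can be sketched in Python. Expected deaths are the sum, over strata such as age and calendar period, of the cohort's person-years multiplied by the corresponding reference death rates. The SMR is observed deaths divided by expected deaths. The strata, person-years, rates, and observed count below are hypothetical. The confidence limits use Byar's approximation to exact Poisson limits, a common choice that is not taken from this paper.

```python
import math


def expected_deaths(person_years, reference_rates):
    """Sum over strata of cohort person-years times the reference death rate."""
    if set(person_years) != set(reference_rates):
        raise ValueError("person-years and reference rates must cover the same strata")
    return sum(person_years[s] * reference_rates[s] for s in person_years)


def smr(observed, expected):
    """Standardized mortality ratio: observed deaths divided by expected deaths."""
    return observed / expected


def byar_interval(observed, expected, z=1.96):
    """Approximate 95% limits for the SMR using Byar's approximation to Poisson limits."""
    o = observed
    lower = o * (1 - 1 / (9 * o) - z / (3 * math.sqrt(o))) ** 3 if o > 0 else 0.0
    o1 = o + 1
    upper = o1 * (1 - 1 / (9 * o1) + z / (3 * math.sqrt(o1))) ** 3
    return lower / expected, upper / expected


# Hypothetical strata keyed by (age band, calendar period).
person_years = {("40-49", "1940-49"): 12_000.0, ("50-59", "1950-59"): 8_000.0}
reference_rates = {("40-49", "1940-49"): 0.0002, ("50-59", "1950-59"): 0.0006}
observed = 21

exp_deaths = expected_deaths(person_years, reference_rates)   # about 2.4 + 4.8 = 7.2
ratio = smr(observed, exp_deaths)                              # about 2.92
low, high = byar_interval(observed, exp_deaths)
```

Because the expected count pools all strata into a single number, the SMR is numerically stable. That same pooling is why its comparability across exposure subgroups becomes doubtful when the stratum mix differs between them.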
These traditional techniques are limited, however, in the extent to which they can disentangle the effects of a number of different risk factors in a multivariate framework. Directly standardized rates have large variances when the number of strata used for adjustment becomes so large that only a few deaths occur in each one. Although SMR tend to be more stable numerically, doubts may arise about their comparability when the demographic factors used in their calculation are confounded with the particular exposures being analyzed. My objective is to describe how biostatistical methods developed in recent years for the multivariate analysis of categorical and survival data may be applied to cohort investigations in epidemiology. Advantages of this methodology include the ability to summarize succinctly the joint effects on mortality of multiple risk variables and to adjust the effects of particular exposures for a multiplicity of confounding factors. The construction of explicit quantitative models facilitates extrapolation of the results into the low dose range, as is required for risk assessment and the setting of industrial standards. Testing the goodness-of-fit of such models enables one to select from among competing summary measures those that give the best description of the results or to identify circumstances in which any simple summary is likely to be misleading.

ABBREVIATIONS: SMR = standardized mortality ratio(s); exp = expected.
¹ Presented at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
² Supported in part by Public Health Service grant 5-K07-CAO00723 from the National Cancer Institute.
³ Department of Biostatistics SC-32, University of Washington, Seattle, Washington 98195.
⁴ The technical assistance of Mr. B. Langholz is gratefully acknowledged.
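One of the model structures mentioned in the abstract is the multiplicative (log-linear) model for death rates. It can be fitted to grouped cohort data by Poisson regression, with the logarithm of person-years as an offset. The sketch below is a minimal NumPy implementation by Newton-Raphson, written for illustration rather than taken from this paper. The 4 cells of person-years and deaths are hypothetical. They were constructed so that the age and exposure effects combine exactly multiplicatively, which lets the fitted rate ratios and the zero deviance be checked by hand.

```python
import numpy as np


def fit_poisson_rates(X, deaths, person_years, max_iter=100, tol=1e-10):
    """Fit log(rate) = X @ beta by Poisson maximum likelihood with a log(person-years) offset.

    Assumes column 0 of X is the intercept. Returns the coefficients and the
    deviance, a goodness-of-fit statistic for the model.
    """
    offset = np.log(person_years)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(deaths.sum() / person_years.sum())   # start at the crude rate
    for _ in range(max_iter):
        mu = np.exp(X @ beta + offset)                      # expected deaths per cell
        score = X.T @ (deaths - mu)
        information = X.T @ (X * mu[:, None])
        step = np.linalg.solve(information, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(X @ beta + offset)
    positive = deaths > 0                                   # 0 * log(0) terms contribute 0
    deviance = 2 * (np.sum(deaths[positive] * np.log(deaths[positive] / mu[positive]))
                    - np.sum(deaths - mu))
    return beta, deviance


# Columns: intercept, older age group (0/1), exposed (0/1). Hypothetical cells.
X = np.array([[1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]], dtype=float)
person_years = np.array([10_000.0, 5_000.0, 5_000.0, 2_500.0])
deaths = np.array([10.0, 10.0, 20.0, 20.0])

beta, deviance = fit_poisson_rates(X, deaths, person_years)
rate_ratios = np.exp(beta[1:])   # age and exposure effects, each adjusted for the other
```

With real data the deviance would not be zero. One way to check whether the multiplicative structure is adequate is to add an age-by-exposure interaction column and compare deviances. Another is to compare this fit against an additive alternative.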
Follow-up data for 8,014 male workers at a Montana copper smelter (1, 2) illustrate the discussion. All workers were employed for at least 1 year and were on the payroll sometime between 1938 and 1956; the follow-up period extended from 1938 through 1977. The largely descriptive analysis is oriented toward determining whether airborne arsenic exposure is related to mortality from respiratory cancer. Brown and Chu (3, 4), using much the same data, have attempted the more ambitious task of inferring which stages of a putative multistage carcinogenic process are affected by the arsenic exposure. SELECTING VARIABLES FOR ANALYSIS An understanding of how cause-specific mortality rates vary among different population subgroups and over time is essential for the design and analysis of epidemiologic studies. It helps to ensure that measurements are taken on relevant variables and that proper adjustment is made for their confounding effects. Quantitative study of the effects of such variables may yield important clues about the natural history of the disease process. We mention here a few critical variables for occupational carcinogenesis. Similar lists may be drawn up for other problems. Age at Follow-up Incidence rates for carcinomas at major anatomic sites increase in proportion to the fifth or sixth power of age. This may reflect the fact that age is a surrogate for cumulative nonspecific exposures rather than any change in host susceptibility due to the aging process itself. Calendar Year at Follow-up In the United States, strong secular trends are evident in rates of mortality from lung cancer (up), stomach cancer (down), and cardiovascular disease (down). Presumably, these are related to changes either in life-style (smoking, diet, exercise) or in the general environment.
When a cohort study is conducted over a long period during which exposures are changing, it is important to consider calendar year as well as age as a potentially confounding factor. This is the approach taken in our analysis of the Montana smelter data.

BRESLOW

Age at First Exposure

Variations in mortality rates according to age at exposure have implications for possible causal mechanisms. If the exposure under study affects an early stage of a multistage carcinogenic process, the excess incidence will be less dependent on the age at which exposure occurs than if later stages are involved (5). Hormonally dependent or developing tissues may be especially susceptible to malignant transformation at certain ages.

Calendar Period of First Exposure

Because of changes in the workplace over time, the calendar period of an industrial exposure may be strongly correlated with its intensity. Table 1 shows strikingly different SMR for respiratory cancer (relative to white males born in the United States) among Montana smelter workers first employed before and after 1925, the year in which the introduction of a selective flotation process supposedly reduced airborne arsenic concentrations (2).

Time Since First Exposure

Inasmuch as most chronic diseases require a latent period before the effect of exposure on risk becomes manifest, the first 5 or 10 years of employment are often excluded from the observation period in industrial cohort studies. In our example, SMR are not notably elevated until 30 years after first employment (table 1); however, in this instance the effect is probably secondary to that of period of hire. The excess incidence of leukemia following point exposures to radiation, notably among A-bomb survivors or patients treated for ankylosing spondylitis, increases for a few years after exposure and then declines.
On the other hand, relative risks for solid tumors at "radiosensitive" sites take longer to reach a maximum and may stay elevated indefinitely. Such patterns again point to activity at different stages of a multistage process (5).

Time Since Last Exposure

Excess disease rates may continue to rise, remain constant, or fall rapidly following cessation of exposure. The pattern has obvious implications for disease control measures as well as for causal mechanisms. However, interpretation of mortality data from industrial studies in relation to termination of employment may be particularly difficult because of the influence of health status on the decision to terminate.

Intensity of Exposure

Unfortunately, good quantitative measures of dose are rare in epidemiology. A smoker's recall of his smoking history, an A-bomb survivor's recollection of location at the time of the bomb, and dosimetry badges worn by radiation workers provide some of the best data, imperfect as they are. Measurement of exposure in industrial studies, often based on the history of work in different areas, is usually much cruder. Sample measurements may have been taken from each area long after the fact and can give only a rough indication of personal exposures. This is true of the data available to us for the Montana cohort, which consisted of the work history and a classification of each work station into "heavy," "moderate," and "light" arsenic exposure levels. Despite these limitations, workers who spent some time in moderate-to-heavy exposure areas appeared to have a substantially higher respiratory cancer risk than those who did not (table 1).

TABLE 1.— Variations in respiratory cancer SMR among Montana smelter workers

| Factor analyzed | Level | No. of deaths | SMR (×100)^a | Test of significance |
|---|---|---|---|---|
| Period of first employment | 1885-1924 | 115 | 362 | χ²(1) = 39.5 (P < 0.0001) |
| | 1925-55 | 161 | 164 | |
| Age at hire, yr | <24 | 69 | 255 | χ²(2) = 5.2 (P < 0.07) |
| | 25-34 | 116 | 222 | |
| | 35+ | 91 | 184 | |
| Birthplace | United States | 198 | 180 | χ²(1) = 28.5 (P < 0.0001) |
| | Foreign | 80 | 381 | |
| Yr since first employed^b | 1-14 | 101 | 165 | χ²(2) = 24.0 (P < 0.0001) |
| | 15-29 | 59 | 185 | |
| | 30+ | 116 | 315 | |
| Yr since last employed^b | None | 110 | 230 | χ²(2) = 3.2 (P = 0.20) |
| | >0-9 | 84 | 227 | |
| | 10+ | 82 | 181 | |
| Arsenic exposure^b | Light only | 153 | 160 | χ²(2) = 44.4 (P < 0.0001) |
| | Moderate | 91 | 339 | |
| | Heavy^c | 32 | 434 | |
| Age at follow-up, yr | 40-49 | 21 | 166 | χ²(3) = 2.5 (P < 0.48) |
| | 50-59 | 80 | 199 | |
| | 60-69 | 117 | 228 | |
| | 70-79 | 58 | 223 | |
| Yr at follow-up | 1938-49 | 34 | 403 | χ²(3) = 28.4 (P < 0.0001) |
| | 1950-59 | 65 | 294 | |
| | 1960-69 | 94 | 211 | |
| | 1970-77 | 83 | 151 | |

^a SMR was calculated with reference to United States mortality rates for white males by age and calendar year.
^b Time-dependent exposure variable lagged 2 yr (see text).
^c Employees worked in moderate or heavy arsenic exposure area for at least 1 yr.

NATIONAL CANCER INSTITUTE MONOGRAPH NO. 67 MULTIVARIATE COHORT ANALYSIS

Personal Characteristics

Data on genetic and behavioral factors may help us determine which individuals are at high risk from industrial exposures. Smoking histories are of particular importance for studies of lung cancer. Although they have been obtained for a sample of the Montana workers (6), these data were not available to us.
However, the fact that foreign-born workers had higher respiratory cancer rates than those born in the United States (table 1) could well be related to different smoking patterns.

Problems of Interpretation

The analysis of multiple risk factors may be seriously complicated by correlations among the variables, especially those that depend on time. For example, the persons still being followed during the later years of the Montana study were, on average, older, had been hired earlier, had been terminated for a longer period, and were more likely to have had a history of moderate-to-heavy arsenic exposure than those under observation during earlier years. Furthermore, because continuation of employment in 1938 was a condition of entry into the study even for those hired earlier, the correlation between period of hire (pre- or post-1925) and duration of time since first employment was particularly strong. Although joint consideration of such variables in a multivariate framework is essential if one is to disentangle their separate effects, strong correlations may mean that a completely unambiguous interpretation is not possible.

To illustrate how a multivariate analysis can change the picture given by a univariate one, table 2 presents an analysis of variance of SMR for the Montana cohort according to period of hire, duration of time since first employment, and level of arsenic exposure. The change in SMR with time since hire is largely explained by the effect of period of hire and the strong correlation between the 2 variables. However, period of hire and level of arsenic exposure have independent effects. Additional analyses identified birthplace and calendar year as also having independent effects on the SMR. Similar conclusions hold when the data are analyzed by other methods that make no reference to standard mortality rates.
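The chi-square values in table 2 form a sequential partition. For each pair of factors, the joint chi-square equals the chi-square for one factor alone plus the improvement from adding the other, and this holds in either order. The sketch below checks that arithmetic using the table 2 values as printed; the dictionary keys and helper names are our own illustrative choices. For 2 degrees of freedom the chi-square upper-tail probability has the closed form exp(-x/2). Applying it confirms that the 1.5 for duration after period is far from significant.

```python
import math

# Chi-square values from table 2 as printed: (chi-square, degrees of freedom)
period_duration = {
    "joint": (41.0, 3),
    "period_alone": (39.5, 1),
    "duration_after_period": (1.5, 2),
    "duration_alone": (24.0, 2),
    "period_after_duration": (17.0, 1),
}
period_arsenic = {
    "joint": (77.6, 3),
    "period_alone": (39.5, 1),
    "arsenic_after_period": (38.1, 2),
    "arsenic_alone": (44.4, 2),
    "period_after_arsenic": (33.2, 1),
}


def check_partition(rows, a_alone, b_after_a, b_alone, a_after_b):
    """True if both fitting orders add up to the joint chi-square and df."""
    joint_chi, joint_df = rows["joint"]
    for first, second in ((a_alone, b_after_a), (b_alone, a_after_b)):
        chi = rows[first][0] + rows[second][0]
        df = rows[first][1] + rows[second][1]
        if not (math.isclose(chi, joint_chi, abs_tol=1e-9) and df == joint_df):
            return False
    return True


def chi2_sf_2df(x):
    """Upper-tail probability of a chi-square variable with 2 df: exp(-x/2)."""
    return math.exp(-x / 2.0)


print(check_partition(period_duration, "period_alone", "duration_after_period",
                      "duration_alone", "period_after_duration"))  # True
print(check_partition(period_arsenic, "period_alone", "arsenic_after_period",
                      "arsenic_alone", "period_after_arsenic"))  # True
print(round(chi2_sf_2df(1.5), 2))  # 0.47: duration adds little once period is known
```

The same closed form gives exp(-1.6), or about 0.20, for the chi-square of 3.2 on 2 df reported in table 1 for years since last employment.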
TABLE 2.— Analysis of variance based on a multiplicative model for SMR: Respiratory cancer deaths in Montana smelter workers

| Source of variation^a | Degrees of freedom | Chi-square |
|---|---|---|
| Period of hire and duration of employment | 3 | 41.0 |
| Period alone | 1 | 39.5 |
| Duration after period | 2 | 1.5^b |
| Duration alone | 2 | 24.0 |
| Period after duration | 1 | 17.0 |
| Period and arsenic level | 3 | 77.6 |
| Period alone | 1 | 39.5 |
| Arsenic after period | 2 | 38.1 |
| Arsenic alone | 2 | 44.4 |
| Period after arsenic | 1 | 33.2 |

^a See table 1 for definition of factor levels.
^b Value is not statistically significant; all others have P < 0.0001.

Interpretation of cohort studies with mortality as an end point is also complicated by the "healthy worker" selection phenomenon. Because a person's health influences his/her acceptability for employment, the likelihood of a job transfer, and the decision to terminate, many of the time-dependent exposure variables mentioned above are correlated with an unmeasured variable, health status, that is clearly related to the mortality end point. During the first few years following entry into a new exposure category, a worker may be at lower risk of death than someone with less exposure simply because he/she was necessarily still employed at that time and therefore presumably healthy. Unfortunately, there is no completely satisfactory way to resolve the problem. For a rapidly fatal disease like lung cancer, one could try to compensate for the effect by lagging the time-dependent exposure variables by 1 or 2 years. Deaths and person-years are then classified in the category in which disease onset occurred, before the symptoms of illness mandated a change in job status (7). We have adopted a lag period of 2 years for time-dependent exposure variables in our analysis of the Montana data.
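Lagging can be made concrete with a helper that counts only exposure accrued at least `lag` years before the time at which a death or a person-year is being classified. This is a minimal sketch. It assumes a work history recorded as hypothetical (start, end, level) spells in calendar years, and the `Spell` class and `lagged_years` function are illustrative names, not part of the Montana analysis.

```python
from dataclasses import dataclass


@dataclass
class Spell:
    """One hypothetical job spell: calendar years [start, end) at an exposure level."""
    start: float
    end: float
    level: str  # e.g. "light", "moderate", "heavy"


def lagged_years(spells, level, t, lag=2.0):
    """Years spent at `level` before time t - lag; exposure in the last `lag` years is ignored."""
    cutoff = t - lag
    total = 0.0
    for s in spells:
        if s.level != level:
            continue
        overlap = min(s.end, cutoff) - s.start  # part of the spell before the cutoff
        if overlap > 0:
            total += overlap
    return total


history = [Spell(1940, 1945, "light"), Spell(1945, 1950, "heavy"), Spell(1950, 1960, "light")]
print(lagged_years(history, "heavy", 1946))  # 0.0: the 1945-46 exposure is still inside the lag
print(lagged_years(history, "heavy", 1951))  # 4.0: only exposure up to 1949 counts
print(lagged_years(history, "heavy", 1960))  # 5.0
```

A worker who dies in 1946, a year after moving into heavy exposure, is therefore still classified as having no heavy exposure.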
GENERALIZED LINEAR MODELS

The object of a multivariate analysis is a description of the joint effects of several different risk factors on disease incidence or mortality. This is easily accomplished with the generalized linear models described by McCullagh and Nelder (8). We denote by $z$ a $p$-dimensional (row) vector of regression variables. Some of these, including age, duration of employment, and cumulative exposure to a specified agent, depend on a time variable $t$. Statistical interactions between 2 or more risk factors are represented by cross-product terms. The defining characteristic of generalized linear models is that the effects of the regression variables are expressed through the linear predictor $z\beta$, where $\beta$ is a $p$-dimensional (column) vector of unknown regression coefficients. Statistical inferences regarding the effects of different risk factors are made by estimating the unknown $\beta$ parameters and testing whether some of them may equal zero. As illustrated in table 2, one typically fits a semihierarchical series of models of increasing complexity, comparing goodness-of-fit measures to determine which variables are most appropriately included in the final equation.

Additive Versus Multiplicative Effects

The next step in constructing the model is to specify how the linear predictor is related to the disease rate. Denoting by $\lambda = \lambda(t \mid z)$ the rate at time $t$ for someone with risk variables $z$, the model is defined by the equation $h(\lambda) = z\beta$. The function $h$ that relates the linear predictor to the rates of interest is known as the "link function." Its role is most easily appreciated if the regression variables are divided into 2 components, $z = (z_1, z_2)$. Here $z_1$ represents base-line factors (age and calendar year) related to the spontaneous cases, whereas $z_2$ represents the exposures of special interest. Likewise, the regression coefficients are partitioned into sets $\beta_1$ and $\beta_2$.
A third set of variables may sometimes be needed to represent the interactions between exposures and base-line factors, although this can often be avoided by selecting an appropriate link. Two common specifications for the link function are the identity, $h(\lambda) = \lambda$, and the log, $h(\lambda) = \log \lambda$. The first describes an additive model in which the effect of the specific exposures is to add an excess risk $z_2\beta_2$ to the background rate $z_1\beta_1$. Under the log-linear model, the background rates $\exp(z_1\beta_1)$ are multiplied by the relative risk $\exp(z_2\beta_2)$. Which of the two structures is more appropriate depends on assumptions made regarding the nature of the disease process. Rothman's (9) component-sufficient cause paradigm is one example. Under it, "independent" factors, which contribute to different disease pathways, have effects that combine in a nearly additive fashion, whereas "complementary" factors, which contribute different parts to the same pathway, have close to multiplicative joint effects (10). On the other hand, the multistage theory of carcinogenesis leads to additive structures if the 2 risk factors affect the same stage of the process and to multiplicative structures if distinct stages are affected (11).

Several authors have considered the problem of discriminating between additive and multiplicative models using epidemiologic data (12-14). A promising approach is to embed them in a more general family of link functions that includes both as special cases. Thus Aranda-Ordaz (15) suggests the family of power transformations $h(\lambda) = (\lambda^{\delta} - 1)/\delta$. This yields the identity link at $\delta = 1$ (up to an additive constant that is absorbed into the intercept) and the log link in the limit as $\delta \to 0$. However, the data are often insufficient to make a clear choice between additivity and multiplicativity.

Additive Relative Risk Functions

A mixed model with both additive and multiplicative components has been used by Thomas (13) and Berry (16), among others.
Here the disease rates are assumed to satisfy $\lambda(t \mid z) = \exp(z_1\beta_1)\{1 + z_2\beta_2\}$. Because 2 linear predictors, $z_1\beta_1$ and $z_2\beta_2$, are involved, this model has a composite link function (17). Its defining feature is that the relative risk is a linear function of the exposure variables. Composite and linear models may be considerably harder to fit in practice than the log-linear model because of irregularities in the likelihood surface. Moreover, the usual SE do not adequately measure the uncertainty in estimation of the $\beta$ coefficients, and tests of significance are therefore best performed with the likelihood ratio criterion (18).

A recent analysis (19) of lung cancer mortality in the British doctors' study (20) illustrates some of these points. Its authors found that the effects of age and cigarette smoking definitely combined multiplicatively rather than additively. When appropriate transformations of dose (No. of cigarettes/day) were used, both additive and multiplicative relative risk functions provided adequate fits to the data. However, the likelihood function for the additive relative risk model was highly skewed, and the SE estimated from this model gave a falsely negative impression of the statistical significance of the smoking effect.

Incorporation of External Standard Rates

So far in discussing model building, we have assumed that the effects on mortality of demographic variables (sex, race, age, calendar year) are determined from the cohort data themselves through estimates of the $\beta_1$ coefficients. However, if the study is small, one may doubt the wisdom of trying to estimate so many regression coefficients from a limited amount of data. One is tempted to assume that the background rates for cohort members follow the same pattern of demographic variation as those for the nation or region where the study is conducted. Only the exposure effects then need be explicitly considered.
Our comparison of SMR in different subgroups of the Montana cohort (tables 1 and 2) is one example of such an analysis. A potential advantage of using known background rates is an increase in the precision of the estimated regression coefficients for the exposure variables. Under the multiplicative model, the efficiency gain is theoretically greatest when the exposures are strongly associated with the demographic factors (21). However, the gain is often slight in practice. A major disadvantage of this approach is possible bias in the estimation of exposure effects when, in fact, the standard rates do not apply to cohort members in the manner assumed by the model. Yule (22) long ago pointed out the dangers inherent in comparing SMR between different occupational groups. The problem arises precisely in those circumstances when the SMR are themselves not good summary measures, i.e., when the ratios of cohort to standard rates depend strongly on age or the other demographic variables. In such cases the ratio of 2 SMR can lie outside the range of ratios of the age-specific rates for the subgroups being compared (23). Verification of goodness-of-fit is therefore particularly important when standard rates are incorporated into the model.

Operationally, external rates are introduced as an additional regression variable whose coefficient is fixed at unity rather than being estimated from the data. The additive model becomes $\lambda(t \mid z) = \lambda^*(t) + \alpha + z\beta$, where $\lambda^*(t)$ represents the known background rates. In the language of generalized linear models, $\lambda^*(t)$ is known as an "offset." For the multiplicative model, we have $\log \lambda(t \mid z) = \log \lambda^*(t) + \alpha + z\beta$, so that the log standard rates offset the linear predictor. In this model, $\alpha$ may be interpreted as the log SMR for someone with zero covariables ($z = 0$), and the $\beta$ coefficients represent changes in the log SMR associated with different exposures.
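In the simplest multiplicative model, $\log \lambda = \log \lambda^* + \alpha$, the standard rates enter with coefficient fixed at 1. The maximum likelihood estimate is then $\hat\alpha = \log(O/E)$, where $O$ is the observed number of deaths and $E = \sum(\text{person-years} \times \text{standard rate})$ is the expected number. That is, $\hat\alpha$ is the log SMR. The sketch below uses made-up strata, not the Montana data. It computes $E$ and the SMR, then checks numerically that the Poisson log likelihood in $\alpha$ is largest at the log SMR.

```python
import math

# Hypothetical strata: (person-years, standard rate per person-year, observed deaths)
strata = [
    (12000.0, 0.0004, 9),
    (8000.0, 0.0012, 20),
    (3000.0, 0.0030, 16),
]


def expected_deaths(strata):
    """E = sum over strata of person-years times the external (standard) rate."""
    return sum(py * rate for py, rate, _ in strata)


def poisson_loglik(alpha, strata):
    """Log likelihood (up to a constant) for log(lambda) = log(lambda*) + alpha."""
    ll = 0.0
    for py, rate, d in strata:
        mu = py * rate * math.exp(alpha)  # fitted deaths; log(py * rate) is the offset
        ll += d * math.log(mu) - mu
    return ll


observed = sum(d for _, _, d in strata)
expected = expected_deaths(strata)
smr = observed / expected
alpha_hat = math.log(smr)

print(observed, round(expected, 2), round(smr, 3))
# The log likelihood falls on either side of log(SMR), as it should at the MLE.
print(poisson_loglik(alpha_hat, strata) > poisson_loglik(alpha_hat + 0.01, strata))
print(poisson_loglik(alpha_hat, strata) > poisson_loglik(alpha_hat - 0.01, strata))
```

Adding exposure columns $z$ to the same offset model turns each coefficient into a change in log SMR, as described above.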
Construction of Exposure Functions

The range of possibilities for constructing regression variables that represent different aspects of exposure is virtually without limit. Suppose that $c(t)\,dt$ is the measured exposure received during the time interval from $t$ to $t + dt$. Then the integral $z(t) = \int_0^t c(u)\,w(t - u)\,du$ represents a type of time-weighted cumulative exposure, where $w(\cdot)$ expresses a variable latent interval between exposure and effect. This approach has been used by Lundin and associates (24) in their study of Rocky Mountain uranium miners and by Berry et al. (25) in their investigation of workers in an asbestos textile factory.

Sometimes it is possible to estimate characteristics of the latent interval by comparing the goodness-of-fit achieved with various specifications for $w$. As an illustration, table 3 shows log-likelihood statistics for the multiplicative model when separate heavy and moderate arsenic exposure variables are fitted to the Montana cohort. Each exposure is weighted by a log-normal density having a specified mode and coefficient of variation. The results seem to indicate that the effect of arsenic on lung cancer mortality is closely concentrated around a point 20 years after exposure (fig. 1). However, the fits are reasonably similar even for weight functions with dramatically different shapes. Many arbitrary elements enter into the specification of the model, and the fit may be heavily influenced by data for only a few individuals. The danger of overinterpretation is therefore substantial.

Checking Model Assumptions

Only a brief mention can be made here of methods for checking the model assumptions. This is an important area of current statistical research.
We have already mentioned the process of choosing between additive and multiplicative structures by considering a family of link functions that includes the identity and log as special cases. Once the basic structure has been decided, possible interactions among the exposures, and between exposures and stratification variables, should be tested by adding terms to the regression equation. Finally, a search should be undertaken for individuals whose data records have an undue influence on the estimated regression coefficients. These issues are discussed in more detail in (8) and the references cited therein.

FIGURE 1.—Three log-normal density functions with different coefficients of variation (CV). (Vertical axis: value of log-normal density; horizontal axis: years between exposure and effect, 0 to 100.)

TABLE 3.— Log-likelihood statistics^a

| Mode, yr | CV = 0.5 | CV = 0.1 | CV = 0.05 |
|---|---|---|---|
| 15 | -759.84 | -761.68 | -762.02 |
| 20 | -759.72 | -757.72 | -758.00 |
| 25 | -760.66 | -759.84 | -760.07 |

^a Coefficients are those obtained when various log-normal distributions are used as time-weighting factors for heavy and moderate arsenic exposure with the Montana cohort and based on case-control analyses with 20 controls/case.

FITTING THE MODEL TO THE DATA

Different computer programs and techniques are used to fit the regression models to cohort data. The choice depends on whether the data are grouped or continuous, on whether the model structure is additive or multiplicative, and on whether external standard rates are to be used. A number of these techniques have been described for the multiplicative model by Breslow et al. (21).

Poisson Regression Analysis of Grouped Data

The simplest and most easily interpretable analyses treat all factors as discrete with a small number of levels. The data may then be summarized in a multidimensional table of deaths and person-years denominators.
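For such a table, the log-linear model $\log \lambda = z\beta$ becomes a Poisson regression of the death counts, with log person-years as an offset. What follows is a minimal Newton-Raphson sketch in NumPy (Newton-Raphson is equivalent here to iteratively reweighted least squares), not the program used for the Montana analysis. The cell counts are invented, and the design uses only an intercept, a hypothetical pre-1925 hire indicator and a hypothetical heavy-exposure indicator. In practice a standard GLM routine would be used instead.

```python
import numpy as np


def fit_poisson_offset(X, deaths, person_years, max_iter=50, tol=1e-10):
    """Newton-Raphson fit of log E[deaths] = log(person_years) + X @ beta.

    The first column of X must be the intercept. Returns (beta, standard errors).
    """
    offset = np.log(person_years)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(deaths.sum() / person_years.sum())  # start at the crude overall rate
    for _ in range(max_iter):
        mu = np.exp(offset + X @ beta)       # fitted numbers of deaths
        score = X.T @ (deaths - mu)          # gradient of the Poisson log likelihood
        info = X.T @ (X * mu[:, None])       # Fisher information matrix
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(offset + X @ beta)
    se = np.sqrt(np.diag(np.linalg.inv(X.T @ (X * mu[:, None]))))
    return beta, se


# Hypothetical grouped table; columns: intercept, hired before 1925, heavy exposure
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
deaths = np.array([20.0, 15.0, 30.0, 40.0])
person_years = np.array([40000.0, 10000.0, 20000.0, 8000.0])

beta, se = fit_poisson_offset(X, deaths, person_years)
for name, b, s in zip(["baseline rate", "pre-1925 RR", "heavy RR"], beta, se):
    print(name, round(float(np.exp(b)), 5), "SE(log) =", round(float(s), 3))
```

At convergence, the fitted deaths match the observed deaths within every margin represented by a column of `X`. External standard rates can be handled in the same way by adding their logarithm to `offset`.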
On the basis of preliminary work, such as that shown in tables 1 and 2, we selected 4 "exposure" variables for further study: period of employment (2 levels); birthplace (2 levels); heavy arsenic (3 levels); and moderate arsenic (4 levels). The 3 levels of heavy arsenic refer to duration of work in an area with heavy exposure (…

…group of As and Bs, the number of deaths is counted. Next, the number of deaths in each group is adjusted by the ratio of the numbers of subjects in the denominators. If 5 subjects are in group A and 1 dies, and 10 subjects are in group B and 3 die, the adjusted numbers of deaths would be 1 and 1.5 (3 × 5/10), respectively. The adjusted deaths in the A groups and the adjusted deaths in the B groups are then summed over all combinations of matched groups. The variance is easily computed as shown in table 3.

I now want to review one of the characteristic problems of obtaining the matches themselves. Suppose that no one over age 80 or under age 20 is in a sample. You may want to standardize to the total population of the United States but cannot, for the simple reason that people over 80 and under 20 live here. This limitation is no different in matched groups or matched pairs than it is in age, sex, and race standardization. The interesting part is that the smaller a group, the more likely it is to be dropped. For example, in 1 of our study groups we might have had 10,000 white males who were smokers, but in another we might have had only 25 nonwhite males, smokers or not. Among the 10,000 white smokers, we could find enough subjects to produce a large number of matches with nonsmokers. However, when you have a small group of 25 nonwhites and proceed to match them on another 18 variables, there will be no matches for most of the combinations. We faced the same small-group problem when we matched those with a history of heart disease, high blood pressure, or cancer; each of these comprised a small group.
For example, at any given time in an age group, few subjects have a history of heart disease. The matching procedure therefore tended to drop them and to select what I will call the modal person, i.e., someone in the most common category of each of the characteristics included in the matching. We found that the death rate in the matched-pairs or matched-groups analysis was much lower than the death rate in the total study population. The matching excluded a considerable portion of the people who had a history of heart disease or cancer or who were nonwhite. In other words, the question we were answering concerned the number of deaths among modal …

[Figure: tar and nicotine content of cigarettes; High = 25.8, Low = 17.6]

FIGURE 3.—Five-year survival for 21 persons with lentigo maligna melanoma according to the thickness of their tumors. (Horizontal axis: tumor thickness to nearest mm.)

R. LEW

TABLE 2.— Status of patients with stage I melanoma after 5 yr follow-up

| Tumor thickness, mm | No. of patients dead | No. of patients alive |
|---|---|---|
| <2 | 0 | 13 |
| ≥2 | 3 | 5 |

…between 1 and 2 mm (17) that motivated us to produce table 2. Simplifying the data as in table 2 can blunt the precision of the analysis. For example, when we used table 2 to test whether the deceased had thicker lesions, Fisher's exact test barely achieved significance (P = 0.042).

High-Density Lipoproteins and Coronary Heart Disease

Hulley et al. (17) recommended against diets that lower triglyceride levels as a means of reducing the risk of CHD in asymptomatic American men and argued that HDL levels delineate risk more sharply. In particular, as a single factor, HDL had a higher correlation with CHD. In seeking the best combination of predictive factors, they found that HDL dominated prediction and that triglycerides had no role in the risk profile. Table 3, abstracted from Hulley et al.
(18), shows a series of models in which triglyceride declined as a significant predictor of risk. In Model 1, triglyceride alone was a significant predictor of elevated risk. In Model 3, however, cholesterol dominated and triglyceride was not significant. In Model 4, after the factors cholesterol, HDL, and body mass were introduced, triglyceride played virtually no role in prediction. The phrase "causal links" in the title of table 3 relates to a discussion of cause and association. They (18) observed: "The issue of whether triglyceride is a cause of coronary heart disease cannot be resolved on the basis of the multivariate findings. Even if triglyceride is not an independent risk factor it may still be an indirect cause of coronary heart disease, operating through one or more of the other risk variables included in the multivariate analysis." On the basis of other biologic and epidemiologic evidence, however, Hulley et al. concluded that serum triglyceride is not an indirect cause of CHD. They did not assert that low levels of HDL cause CHD. Watching the factors jockey for position in the alternative models merely clarified the hierarchy of risk factors for us. The physician should contrast this hierarchy with that suggested by the biology of the disease.

In 1952, Doll and Hill (6) concluded that smoking "produces" cancer, but in 1981, Doll and Peto (19) used different terms, saying that withdrawing the exposure of smoking "avoids" some deaths from lung cancer. This change removes a synonym for the word "cause." Ignorant of cause, we seek factors, such as smoking, that might be manipulated to reduce the number of bad results (disease incidence, relapse, death). Statistical methods select those factors that, if causal and controlled, promise the greatest benefit.

Prognosis for Patients With Melanoma Lesions 3.65 mm or Thicker

Day et al.
(20) found that among the 79 stage I melanoma patients in the Massachusetts General Hospital-New York University series with lesions 3.65 mm thick or more, 4 factors were associated with prolonged survival: negative lymph nodes, moderate or marked lymphocyte response, location of the lesion at a site other than the trunk, and histologic type SSM. The selected factors came from a list of 14. To validate the model, they first excluded "negative nodes" from the list of 14 and reran the analysis. Then they excluded the factor "lymphocyte response" from the list of 14 and again reran the analysis, after which "location" and then "histologic type" were excluded in turn. Thus the 4 models in the middle column of table 4 were generated. The factor "level III and IV versus level V" replaced "negative nodes versus positive nodes." This replacement suggests that the spread of disease into subcutaneous fat (level V) tends to coincide with spread into lymph nodes. The replacement of "moderate to marked lymphocyte response" by the factor "absence of microscopic satellites" suggests that lymphocyte response inhibits satellite formation. To the surprise of Day et al. (20), no factor replaced SSM. In previous analyses of the data set, this factor had never achieved significance. Was the finding an artifact or not? In a later paper, Day and his colleagues (21) subdivided lesions according to gross morphologic features.

TABLE 3.— Causal links inferred from logistic analysis of 7-yr incidence of CHD in the Western Collaborative Group Study^a

| Model | Independent variable | Approximate standardized relative risk | P | Inference |
|---|---|---|---|---|
| 1 | Triglyceride | 1.36 | <0.001 | Triglyceride → CHD |
| 2 | Cholesterol | 1.67 | <0.001 | Cholesterol → CHD |
| 3 | Triglyceride | 1.13 | NS | Triglyceride |
| | Cholesterol | 1.60 | <0.001 | Cholesterol → CHD |
| 4 | Triglyceride | 0.98 | NS | Triglyceride |
| | Cholesterol | 1.55 | <0.001 | Cholesterol → CHD |
| | HDL | 0.76 | 0.003 | HDL → CHD |
| | Body mass | 1.18 | 0.02 | Body mass → CHD |

^a Table is reproduced with permission of the authors and publishers (17). NS = not significant.

STRATEGIES FOR VALIDATION

TABLE 4.— Secondary Cox models for 70 patients with clinical stage I melanoma with lesions greater than 3.6 mm in thickness^a

| Variable excluded before rerun of Cox analysis | Best combination of remaining variables | P associated with overall model χ² |
|---|---|---|
| Negative nodes | SSM; location other than trunk; level III or IV, rather than V; moderate or marked lymphocyte response | 2 × 10^-7 |
| Lymphocyte response | SSM; location other than trunk; microscopic satellites absent; level III or IV, rather than V | 3 × 10^-7 |
| Location | SSM; moderate or marked lymphocyte response; negative nodes | 3.5 × 10^-6 |
| Histologic type | Negative nodes; moderate or marked lymphocyte response; location other than trunk | 1 × 10^? (exponent illegible) |

^a Table is reproduced with permission of the authors and publishers (19).

Many thick melanomas have a nodule protruding from the layer of plaque. Persons at low risk of metastases had lesions with small nodules free of ulceration or had lesions with nodules surrounded by plaque that were likely to be classified as SSM. Persons at high risk had ulcerated lesions or had lesions with a nodule abutting normal skin; such lesions were often classified as nodular melanoma. Thus the significance of SSM in the earlier work on thick tumors may have reflected the relationship between risk and the size and location of nodules.

Protection Against False Positives

Naive application of statistical methods frequently produces false conclusions.
A hematologist observing an abnormally high sedimentation rate knows that a small proportion of the time, perhaps 5%, samples from normal patients reach abnormal levels. In fact, with a 5% error rate, the odds favor at least 1 false positive test in a series of only 14 normal patients, because the chance of no false positive is 0.95^14, or about 0.49. The same holds true for a series of independent statistical hypotheses. Computers easily generate thousands of tabulations and can collect those results for which P < 0.05. To separate true from false positives, many persons adopt the Bonferroni rule, which depends on the number of statistical tests performed. Although we seldom know the exact truth about any individual test, we regard results with P-values less than 0.05/(number of tests performed) as true positives and the rest as false positives. The conservative Bonferroni rule fails to account for the redundancy of many of the tests. Other rules that do account for it draw the dividing line between P-values elsewhere (22). None of these rules accounts for the medical context. In the previous example, Day et al. (20) expressed surprise at finding the factor SSM in their model. In other words, before the analysis, the SSM hypothesis seemed far less likely than other hypotheses. Statisticians lack simple practical ways to incorporate prior beliefs into the calculation of P-values. Physicians must therefore interpret the results and comment on those that seem serendipitous.

A Hidden Source of False Positives

All the multivariate examples given here used stepwise regression. Stepwise algorithms test a series of models to arrive at a final best model (17). Statistical packages usually report a P-value for each factor in the final model as if it were the only model tested. In other words, the programs do not screen the resulting P-values for false positives. The physician should carefully scrutinize the P-values generated by stepwise procedures to separate statistical artifacts from genuine medical findings.
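The arithmetic behind the hematologist example and the Bonferroni rule is short enough to state directly. The sketch assumes independent tests, each with a 5% false-positive rate when the null hypothesis is true. The P-values in the last line are invented for illustration, and the function names are ours.

```python
def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """P(at least 1 false positive) among n independent tests of true null hypotheses."""
    return 1.0 - (1.0 - alpha) ** n_tests


def smallest_n_odds_favor(alpha=0.05):
    """Smallest series length for which a false positive is more likely than not."""
    n = 1
    while prob_at_least_one_false_positive(n, alpha) <= 0.5:
        n += 1
    return n


def bonferroni_positives(p_values, alpha=0.05):
    """Indices of results with P < alpha / (number of tests performed)."""
    threshold = alpha / len(p_values)
    return [i for i, p in enumerate(p_values) if p < threshold]


print(round(prob_at_least_one_false_positive(14), 3))  # 0.512
print(smallest_n_odds_favor())                         # 14
print(bonferroni_positives([0.001, 0.02, 0.04, 0.3, 0.008]))  # [0, 4]: only P < 0.05/5 survive
```

The independence assumption is what makes the Bonferroni rule conservative when many of the tests are redundant, as noted above.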
Results that lose significance after application of the Bonferroni rule or that resist interpretation require comment.

SPECIAL VALIDATION METHODS

Not every statistical estimate has a known formula for its SE. Thus Day et al. applied the bootstrap method (5) to obtain SE for their estimates of melanoma thickness cut points (17). In effect, the bootstrap method repeats the study many times, so that nearly every statistical statement in the primary analysis can be validated (5, 23). Many multivariate methods test the factors only as designated. Recursive partitioning, which automatically tests many interactions between factors (4), shares few assumptions with other methods, making it an attractive alternative model. In the example on prognosis for rectal cancer, results of both of these special methods were used in planning the primary analysis, thereby linking validation methods to those of exploratory data analysis (24). One can use both to explore the data before the primary analysis and afterward to check the results.

Cut Points

Castelli et al. (1) put forth simple indices for physicians to use in appraising the risk of CHD in relation to HDL. Contributors to Harrison's "Principles of Internal Medicine" advise the consideration of preventive measures when triglyceride levels exceed 150 mg/100 ml or when total cholesterol exceeds 260 mg/100 ml (2). These cut-point values mark the levels at which the risk of CHD rises abruptly. Cut-point rules (such as the 5-year follow-up period for cancer) appear to enter medical lore as convenient categories obtained by clinicians "eyeballing" the data. Gradually, by repetition, physicians adopt these values as benchmarks.

Before the recommendations made by Day et al. (17), many investigators divided the range of thickness measurements for melanoma lesions at arbitrary values such as 1.0, 2.0, and 3.0 mm. Day et al.
selected cut points at 0.85, 1.70, and 3.65 mm to mark the thicknesses at which the risk of death within 5 years rose most sharply for a series of 643 clinical stage I melanoma patients. With this choice of cut points, they attempted to preserve the relation between tumor thickness and death within 5 years by creating categories within which risk varied as little as possible. As estimates, these 3 cut-point values have SEs; i.e., we expect other series to yield different estimates for these cut points. However, unlike sample averages, cut-point estimates have no accepted formula for their SE. The bootstrap method uses subsets drawn from the whole original data set as if they were fresh data sets. Thus we can, in effect, repeat the study on the computer. Day et al. derived the initial 3 thickness cut points using the whole data set and then obtained an SE for each cut point. More precisely, they randomly selected 10 overlapping subsets of the data. Next they derived 3 cut points on each of these 10 subsets. For each cut point, the standard deviation of the 10 replicate values was used as the estimated SE for the initial value. To check that the values were not artifacts of analysis, they re-ran the entire analysis using a different method (recursive partitioning), which gave the same cut points and similar SEs.

Prognosis in Rectal Cancer

Rich and associates (25) evaluated 166 patients treated with surgery for rectal cancer. Of these, 24 had palliative surgery, whereas the remaining 142 had potentially curative surgery. A preliminary analysis with recursive partitioning confirmed their plan to split the group of 142 and to analyze separately the 98 persons with negative and the 44 with positive lymph nodes. The primary analysis determined the factors that significantly influenced the probability of local recurrence within 5 years. In the group of 44 with positive nodes, the 2 significant factors were blood vessel invasion and the percentage of positive lymph nodes.
Figure 4 gives a breakdown by these factors, excluding 1 patient in the group of 44 who had insufficient follow-up. The primary analysis only selected the 2 factors, whereas the tree diagram in figure 4 shows how they relate to one another. All patients with blood vessel invasion had a local recurrence. Among those without blood vessel invasion, more than 10% positive nodes increased the risk of local recurrence. The tree diagram displays cross-tabulations of disease status with the selected prognostic factors. However, with 4 or more factors, the many branches of the tree may obscure the meaning. Also, introducing factors into the tree in a different order can change the interpretation. Thus, for interpretation, we prefer a series of small trees rather than 1 massive diagram.

FIGURE 4.—Tree diagram relating 2 factors to the rate of local (pelvic) recurrence among 43 patients with rectal cancer. Rates appear as percentages. [All 43 patients: 48% rate. Blood vessel invasion (10 patients): 100% rate. No blood vessel invasion (33 patients): few positive nodes (9 patients), 11% rate; many positive nodes (24 patients), 42% rate.]

Recursive Partitioning

The method of recursive partitioning gives rise to tree diagrams like that shown in figure 4. Rich and associates (25) first made all predictive factors dichotomous; e.g., they chose a cut point for age and classified each person as “young” or “old.” Next they screened all the factors for the one that best split, or partitioned, the 166 cases. Roughly speaking, the best splitting factor maximized the difference in 5-year relapse rates between the 2 derived subgroups. The algorithm continued splitting each subgroup separately until it reached a minimum subgroup size. The initial recursive partitioning and 10 subsequent bootstrap replications all made the first split on nodal status.
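The splitting procedure described above can be sketched in Python. This is a simplified illustration. It is not the software Rich and associates used, nor the full CART method of Breiman et al. (4), which adds impurity measures, pruning, and cross-validation. The factor names, the minimum subgroup size of 10, and the simulated patients are all hypothetical. The last function mimics the check that bootstrap replications make the same first split.

```python
import random
from typing import Dict, List, Optional

# Each patient: dichotomized factors (0/1) plus the outcome "relapse" (0/1)
Patient = Dict[str, int]


def relapse_rate(group: List[Patient]) -> float:
    return sum(p["relapse"] for p in group) / len(group)


def best_split(group: List[Patient], factors: List[str], min_size: int) -> Optional[str]:
    """Factor whose 0/1 split gives the largest difference in relapse rates."""
    best, best_diff = None, -1.0
    for f in factors:
        no = [p for p in group if p[f] == 0]
        yes = [p for p in group if p[f] == 1]
        if len(no) < min_size or len(yes) < min_size:
            continue  # this split would create a subgroup below the minimum size
        diff = abs(relapse_rate(no) - relapse_rate(yes))
        if diff > best_diff:
            best, best_diff = f, diff
    return best


def partition(group: List[Patient], factors: List[str], min_size: int = 10) -> dict:
    """Split recursively until no factor yields 2 subgroups of at least min_size."""
    node = {"n": len(group), "rate": round(relapse_rate(group), 2)}
    f = best_split(group, factors, min_size)
    if f is None:
        return node
    rest = [g for g in factors if g != f]
    node["split_on"] = f
    node["no"] = partition([p for p in group if p[f] == 0], rest, min_size)
    node["yes"] = partition([p for p in group if p[f] == 1], rest, min_size)
    return node


def bootstrap_first_splits(group, factors, reps=10, min_size=10, seed=0):
    """Factor chosen for the first split in each bootstrap replication."""
    rng = random.Random(seed)
    return [best_split(rng.choices(group, k=len(group)), factors, min_size)
            for _ in range(reps)]


def simulate_patients(n=166, seed=1) -> List[Patient]:
    """Hypothetical data in which nodal status dominates the relapse risk."""
    rng = random.Random(seed)
    patients = []
    for _ in range(n):
        nodes = int(rng.random() < 0.3)
        vessels = int(rng.random() < 0.2)
        old = int(rng.random() < 0.5)
        risk = 0.05 + 0.7 * nodes + 0.05 * vessels
        patients.append({"positive_nodes": nodes, "vessel_invasion": vessels,
                         "old": old, "relapse": int(rng.random() < risk)})
    return patients


FACTORS = ["positive_nodes", "vessel_invasion", "old"]

if __name__ == "__main__":
    patients = simulate_patients()
    print(partition(patients, FACTORS))
    print(bootstrap_first_splits(patients, FACTORS))
```

With real data, disagreement among the replications about the first split would warn that the tree is unstable.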
Thereafter, the set of factors selected to split the negative-node group had little in common with the set selected to split the positive-node group. The corresponding tree diagrams confirmed the clinicians’ decision to analyze the 2 nodal groups separately.

Exploratory Data Analysis

Rich et al. (25) used a special method to plan the primary analysis and then used the tree diagrams to validate the results of the analysis, thereby linking validation to exploratory data analysis. This type of data analysis emphasizes the use of a simple picture of the data, which is of greatest value when “it forces us to notice what we never expected to see” (24), and it provides a rich source of techniques for displaying the results of complex analyses to clinicians. Two recent books, “Applications, Basics, and Computing of Exploratory Data Analysis” and “Understanding Robust and Exploratory Data Analysis,” serve as manuals for the subject and are highly readable introductions to it (26, 27).

INEVITABILITY OF MULTIVARIATE ANALYSIS

In his essay on observational studies, Cochran (7) discusses how to handle “disturbing variables.” To test a medication, we often measure variables such as age, sex, and stage of disease to account for variation in the effects of a given dosage. Epidemiologists refer to “confounding factors” and statisticians generalize to “nuisance parameters,” but all these terms reflect a reluctant acceptance that factors other than medication influence results. How did physicians deal with disturbing variables before the disturbing introduction of biostatistics and computers? In 1894, Halsted used extensive cross-tabulations to describe how radical mastectomy had reduced the local recurrence of breast cancer in a series of 50 cases (28). Apparently unfamiliar with life tables, he classified patients as failures or successes, despite the fact that follow-up varied from 1 to 46 months.
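The point about life tables can be illustrated with a short Python sketch. It contrasts a Halsted-style tally of failures with the product-limit (Kaplan-Meier) estimate, one form of life-table analysis that uses each patient's follow-up time. The 10 patients and their follow-up times are invented for illustration; they are not Halsted's data.

```python
from typing import List, Tuple

# (months followed, recurred?) for each patient; recurred=False means the
# patient was still free of recurrence when last seen (censored)
Record = Tuple[float, bool]

HYPOTHETICAL: List[Record] = [
    (2, True), (5, False), (8, True), (12, False), (15, False),
    (20, True), (26, False), (33, False), (40, False), (46, False),
]


def crude_failure_proportion(data: List[Record]) -> float:
    """Failures divided by all patients, ignoring how long each was followed."""
    return sum(1 for _, failed in data if failed) / len(data)


def product_limit(data: List[Record]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of recurrence-free survival at each failure time."""
    survival, curve = 1.0, []
    for t in sorted({m for m, failed in data if failed}):
        at_risk = sum(1 for m, _ in data if m >= t)
        failures = sum(1 for m, failed in data if m == t and failed)
        survival *= 1.0 - failures / at_risk
        curve.append((t, survival))
    return curve


if __name__ == "__main__":
    print(f"crude failure proportion: {crude_failure_proportion(HYPOTHETICAL):.2f}")
    for t, s in product_limit(HYPOTHETICAL):
        print(f"month {t}: estimated recurrence-free {s:.3f}")
```

Here 3 of 10 patients failed, a crude rate of 30%. The product-limit estimate of recurrence by month 20 is 1 - 0.9 × 7/8 × 4/5 = 37%. The estimate is higher because patients followed only briefly had not yet had the chance to fail.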
Before computers, when hand calculation prevailed, authors tended to divide their data into strata according to categories, such as male and female, and to analyze each stratum separately. Sometimes each stratum exhibited a common trend, such as an increased risk of lung cancer for smokers. However, standing alone, none of these trends achieved significance because of the small size of each stratum. This small size led to the problem of pooling results. Could the data be recombined to obtain an overall significant result for the common trend in the entire data set? Cochran in 1954 (29) and Mantel and Haenszel in 1959 (30) gave methods for pooling cross-tabulations over strata to test the common hypothesis. The pooling problem forced the statistician to abandon simple averages and proportions and to assign special weights to each stratum. Other multivariate methods rest on different principles, such as adjustment. For example, methods based on linear regression adjust the value of the outcome in proportion to the value of the disturbing variable. The computer has made feasible a wide variety of statistical weighting and adjustment schemes merely because it can perform the requisite arithmetic so rapidly. This evolution has not made cross-tabulations obsolete but has given them a new role: to help us interpret multivariate results and alert us to their limitations.

Whenever possible, validation ought to precede publication. Authors should review cross-tabulations, survival curves, tree diagrams, and scatter plots before arriving at final conclusions. To detect false positives, poorly fitting models, and highly correlated factors, investigators should vary the set of factors in the primary analysis and vary the method of analysis. Unexpected results should initiate a more thorough review. For better communication of results to clinicians, the summary of each analysis should contain simple supporting tabulations and figures.
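The pooling idea can be sketched in Python with the Mantel-Haenszel statistic, assuming each stratum is summarized as a 2×2 table of exposure by outcome. The continuity-corrected form is used here; Cochran's 1954 version differs in detail. The 2 strata below (for example, men and women) are hypothetical. Each stratum alone falls short of significance at the 5% level, whereas the pooled test reaches it.

```python
import math
from typing import Iterable, Tuple

# One 2x2 table per stratum: (a, b, c, d) =
# (exposed cases, exposed noncases, unexposed cases, unexposed noncases)
Table = Tuple[int, int, int, int]


def mantel_haenszel(strata: Iterable[Table]) -> Tuple[float, float, float]:
    """Pooled chi-square (1 df, continuity-corrected), its P-value, and the MH odds ratio."""
    obs = exp = var = 0.0
    or_num = or_den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        obs += a
        exp += (a + b) * (a + c) / n  # expected exposed cases, given the margins
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        or_num += a * d / n
        or_den += b * c / n
    chi2 = max(0.0, abs(obs - exp) - 0.5) ** 2 / var
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value, or_num / or_den


# Hypothetical strata, each showing an elevated odds ratio
MEN: Table = (16, 24, 8, 32)
WOMEN: Table = (14, 26, 6, 34)

if __name__ == "__main__":
    for name, table in (("men", MEN), ("women", WOMEN)):
        chi2, p, odds = mantel_haenszel([table])
        print(f"{name}: chi2={chi2:.2f}, P={p:.3f}, OR={odds:.2f}")
    chi2, p, odds = mantel_haenszel([MEN, WOMEN])
    print(f"pooled: chi2={chi2:.2f}, P={p:.3f}, OR={odds:.2f}")
```

The single strata give chi-square values of about 2.9 and 3.2, below the 3.84 needed for P < 0.05. The pooled value is about 7.0, with a Mantel-Haenszel odds ratio of 12.35/4.35, about 2.84.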
Statistical packages should contain options that will facilitate validation; e.g., the results of alternative analyses could appear on the same page, programs for the Cox model should provide life tables for each significant categorical factor, and an option should allow for bootstrap replications of a primary analysis. Packages might pair methods that test the same hypothesis but make different underlying assumptions, such as the z- and Wilcoxon tests. These suggestions require little special programming. All should fit easily into the brave new world of interactive statistical analysis. The computer has brought us to the point where we can handle disturbing variables and appropriately refine the calculation of P-values. Before surging ahead, we must fill in the gap between what we can do and what we can understand.

REFERENCES

(1) CASTELLI WP, ABBOTT RD, MCNAMARA PM: Summary estimates of cholesterol used to predict coronary heart disease. Circulation 67:730-734, 1983
(2) BIERMAN EL: Atherosclerosis and other forms of arteriosclerosis. In Harrison’s Principles of Internal Medicine (Petersdorf RG, Adams RD, Braunwald E, et al, eds), 10th ed. New York: McGraw-Hill, 1983, pp 1465-1475
(3) WOLINSKY H: Atherosclerosis. In Cecil Textbook of Medicine (Wyngaarden JB, Lloyd HS Jr, eds), 16th ed. Philadelphia: Saunders, 1982, pp 239-247
(4) BREIMAN L, FRIEDMAN JH, OLSHEN RA, et al: Classification and Regression Trees. Belmont, Calif.: Wadsworth, 1983
(5) EFRON B, GONG G: A leisurely look at the bootstrap, the jackknife, and cross-validation. Am Statistician 37:36-48, 1983
(6) DOLL R, HILL AB: A study of the aetiology of carcinoma of the lung. Br Med J 2:1271-1286, 1952
(7) COCHRAN WG: The planning of observational studies for human populations. J R Stat Soc [B] 27:234-265, 1965
(8) THURSTON JH, THURSTON DL, HIXON BB, et al: Prognosis in childhood epilepsy.
N Engl J Med 306:831-836, 1982
(9) KALBFLEISCH JD, PRENTICE RL: The Statistical Analysis of Failure Time Data. New York: Wiley, 1980
(10) INGELFINGER JA, MOSTELLER F, THIBODEAU LA, et al: Biostatistics in Clinical Medicine. New York: Macmillan, 1983, p 212
(11) LEW RA, DAY CL Jr, HARRIST TJ, et al: Multivariate analysis: Some guidelines for physicians. JAMA 249:641-643, 1983
(12) BRESLOW NE, DAY NE, HALVORSEN KT, et al: Estimation of multiple relative risk functions in matched case-control studies. Am J Epidemiol 108:299-307, 1978
(13) BRESLOW NE, LUBIN JH, MAREK P, et al: Multiplicative models and cohort analysis. J Am Stat Assoc 78:1-12, 1983
(14) KOH H, MICHALIK E, SOBER A, et al: Lentigo maligna melanoma has no better prognosis than other types of melanoma. J Clin Oncol 2:994-1001, 1984
(15) DAY CL Jr, MIHM MC, LEW RA, et al: Cutaneous malignant melanoma: Prognostic guidelines for physicians and patients. CA 32:113-122, 1982
(16) ARMITAGE P: Statistical Methods in Medical Research, 2d ed. New York: Wiley, 1975
(17) DAY CL Jr, LEW RA, MIHM MC, et al: The natural break points for primary-tumor thickness in clinical Stage I melanoma. N Engl J Med 305:1155, 1981
(18) HULLEY SB, ROSENMAN RH, BAWOL RD, et al: Epidemiology as a guide to clinical decisions: The association between triglyceride and coronary heart disease. N Engl J Med 302:1383-1389, 1980
(19) DOLL R, PETO R: The causes of cancer: Quantitative estimates of avoidable risks of cancer in the United States today. JNCI 66:1191-1308, 1981
(20) DAY CL Jr, MIHM MC, SOBER AJ, et al: A multivariate analysis of prognostic factors for melanoma patients with lesions ≥3.65 mm in thickness: The importance of revealing alternative Cox models. Ann Surg 195:44-49, 1982
(21) DAY CL Jr, MIHM MC, SOBER AJ, et al: Skin lesions suspected to be melanoma should be photographed. Gross morphological features of primary melanoma associated with metastases.
JAMA 248:1077-1081, 1982
(22) SCHWEDER T, SPJØTVOLL E: Plots of P-values to evaluate many tests simultaneously. Biometrika 69:493-502, 1982
(23) EFRON B: Estimating the error rate of a prediction rule: Improvement on cross-validation. J Am Stat Assoc 78:316-331, 1983
(24) TUKEY JW: Exploratory Data Analysis. Reading, Mass.: Addison-Wesley, 1977
(25) RICH T, GUNDERSON LL, LEW R, et al: Patterns of recurrence of rectal cancer after potentially curative surgery. Cancer 52:1317-1329, 1983
(26) VELLEMAN PF, HOAGLIN DC: Applications, Basics and Computing of Exploratory Data Analysis. Boston, Mass.: Duxbury Press, 1981
(27) HOAGLIN DC, MOSTELLER F, TUKEY JW: Understanding Robust and Exploratory Data Analysis. New York: Wiley, 1983
(28) HALSTED WS: The results of operations for the cure of cancer of the breast performed at The Johns Hopkins Hospital from June, 1889, to January, 1894. Ann Surg 20:497-555, 1894
(29) COCHRAN WG: Some methods of strengthening the common χ² tests. Biometrics 10:417-451, 1954
(30) MANTEL N, HAENSZEL W: Statistical aspects of the analysis of data from retrospective studies of disease. JNCI 22:719-748, 1959

Avoidance of Bias in Cohort Studies¹

Nathan Mantel²

ABSTRACT—Cohort studies have particular advantages in confirming results of retrospective or case-control studies in those situations in which case-control studies are no longer feasible. In some circumstances, the cohort study may involve randomization, thus reducing selection bias, but ordinarily individuals will have selected themselves into the groups in which they fall. Investigators should analyze data from a cohort study so as to take the passage of time into account. Variables anticipated to have effects should be accounted for by stratification, if feasible, or by mathematical modeling, if necessary.
Results should be interpreted with care, and qualifications should be made on any interpretations, including qualifications relating to the propriety of the mathematical model used. When long latencies are a factor, and particularly when exposure is initiated late in life, establishing a positive role for the exposure can be difficult. Case-control and other epidemiologic studies are biased toward identifying exposures that lead to outcomes of a unique nature but fail to identify more serious exposures with adverse outcomes that are more commonplace.—Natl Cancer Inst Monogr 67: 169-172, 1985.

Large cohort studies play an important role in resolving questions and confirming leads raised in smaller investigations, e.g., retrospective or case-control studies. Although the retrospective study may initially be fully as valid as the cohort study in establishing associations between exposures and outcomes, and at much reduced study sizes, it can be suspect as a tool for reconfirming associations. In light of the already established association between cigarette smoking and lung cancer, it is unlikely that a valid retrospective study of that association could be conducted today. There could even be a halo effect, so that retrospective studies of the association of cigarette smoking with other diseases would likely be tainted. However, an appropriately large cohort study could provide valid confirmations or leads, or both, of associations with cigarette smoking.

I am mindful of a study by investigators who sought associations between the use of anesthetic gases by dentists and the outcomes of their wives’ prior pregnancies, particularly spontaneous abortions. That study would have been tainted by prior publicity given to the possible effects of anesthetic gases. Elsewhere (1) I have suggested how this difficulty might have been avoided by camouflaging the purpose of the investigation.
¹ Presented at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
² Department of Mathematics, Statistics and Computer Science, The American University, Washington, D.C. 20016. Address reprint requests to Mr. Mantel at 4900 Auburn Avenue, Bethesda, Maryland 20814.

From my experience in reviewing the work of others, it is not so much study bias that confronts us but rather biased interpretations, misinterpretations, questionable statistical analyses, and lack of qualifications, or of consideration of possible qualifications, on the immediate interpretation of the data. Because neither the cohort nor the retrospective study involves random assignment to treatment group, we cannot dismiss out of hand the possibility that an association found might be a self-selection effect. Only after careful thought and consideration would we reject a constitutional hypothesis as the explanation for the association between cigarette smoking and lung cancer. When the exposed group is a select one, such an explanation would be reasonable. Thus an association between use of an anorectic agent and the occurrence of primary pulmonary hypertension could reasonably reflect the obesity of the patients taking the agent. Proper controls would be obese individuals using other weight-control measures. Use of such controls would be far preferable to the use of statistical covariate procedures for taking weight into account [see (1)].

The need to qualify apparent associations evident in the data arises from the nonexperimental nature of the typical cohort study, i.e., individuals have not been assigned to treatment or control by a random procedure. However, in certain situations, that randomization requirement is met, as I will discuss below. In any case, an important aspect of both the observational and the experimental cohort study is aging, or the passage of time.
For the experimental study, we should want to take into account any chance differences in longevity between treated and control individuals, and qualifications of interpretation could still remain. Treatment could itself have altered longevity or could otherwise have been influential when a morbidity became manifest. For the observational cohort study, we would have additional qualifications relating to noncomparability of exposed and control individuals on anticipated longevity. In (2), I indicated that in a large-scale cohort study of extended duration into the effects of passive smoking, the investigator may have failed completely to take into account the passage of time. Even if a comprehensive analysis stratifying on time and age would have been too onerous, it would have been relatively simple for him/her to make an analysis based on person-years at each year of age. I also indicated (2) an interesting qualification that might have to be made with regard to finding a passive smoking effect. It related to the possibility that the category of reportedly nonsmoking women might include women who actually smoked but felt constrained to say that they did not smoke. In another context, I was concerned that replies from distraught mothers of patients with Reye’s syndrome on the use of analgesics might not be comparable with the replies from mothers of children who did not have that untoward outcome. A statistically significant difference could indicate only that the replies were different rather than that analgesic use was different.

Randomized clinical trials can be thought of as experimental cohort studies, although they do not have an epidemiologic motivation; in some circumstances, they do become epidemiologic in nature. Suppose there is a suspicion that, apart from any therapeutic effect, treatment itself induces some ill effect, perhaps even some neoplasia.
The data from the clinical trial, as they relate to that ill effect, can then be analyzed as data from a cohort study, one experimental in nature by virtue of the initial randomization. Selection bias would no longer be a factor, yet all other cautions with respect to statistical analysis would remain. Certain problems with regard to timing of treatment, guarantee periods, etc., that bear on the clinical trial aspect of the investigation would not arise for the epidemiologic aspect.

Distinctions between clinical trials and cohort studies become blurred when the purpose of treatment is prevention of disease rather than therapy. Depending on how the trial was conducted or how the data being analyzed arose, we would have an experimental cohort study, an observational cohort study, or an experimental cohort study in which blocks of individuals rather than separate individuals were assigned to treatment. Though we may analyze the data as though we were in a straightforward experimental situation, we must always have reservations as to the true way in which the data arose.

Problems with covariates will always arise. When those covariates are limited in number and the study is large, an effective way of taking covariates into account is stratification. Each variable is subdivided into a limited number of ranges so that a stratum is defined by the set of ranges for each of the covariates (though I do not like to see age subdivided into extremely coarse intervals, particularly in regard to neoplasia). The relationship between exposure or treatment and response can then be sought with each of the covariates kept constant, within limits, while time or aging effects are also taken into account. For that matter, the effect of any single covariate, whether used in the stratification or not, can also be studied in this way with the others kept constant.
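Mantel's suggestion of an analysis based on person-years at each year of age, with comparisons made within age strata, can be sketched in Python. The sketch assumes each subject is recorded as (group, entry age, years followed, died). It credits a death to the age band in which follow-up ends. It then computes the deaths expected among the exposed from the nonexposed group's age-specific rates in the same cohort. The group labels and the 4 subjects are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# (group, age at entry, years followed, died at end of follow-up?)
Subject = Tuple[str, float, float, bool]


def person_years_by_age(subjects: List[Subject]) -> Dict[Tuple[str, int], list]:
    """Tabulate [person-years, deaths] by group and single year of age."""
    table = defaultdict(lambda: [0.0, 0])
    for group, entry_age, years, died in subjects:
        exit_age = entry_age + years
        for age in range(int(entry_age), math.ceil(exit_age)):
            overlap = min(exit_age, age + 1) - max(entry_age, age)
            if overlap > 0:
                table[(group, age)][0] += overlap
        if died:  # credit the death to the age band in which follow-up ended
            table[(group, max(int(entry_age), math.ceil(exit_age) - 1))][1] += 1
    return table


def observed_and_expected(table, exposed="exposed", reference="unexposed"):
    """Deaths among the exposed vs. deaths expected at the reference group's
    age-specific rates (ages without reference person-years are skipped)."""
    observed = expected = 0.0
    for (group, age), (py, deaths) in list(table.items()):
        ref = table.get((reference, age))
        if group != exposed or ref is None or ref[0] == 0:
            continue
        observed += deaths
        expected += py * ref[1] / ref[0]
    return observed, expected


EXAMPLE: List[Subject] = [
    ("exposed", 50.5, 2.0, True),
    ("exposed", 51.0, 1.5, False),
    ("unexposed", 50.0, 3.0, False),
    ("unexposed", 52.0, 1.0, True),
]

if __name__ == "__main__":
    obs, exp = observed_and_expected(person_years_by_age(EXAMPLE))
    print(f"observed {obs:.0f}, expected {exp:.2f}, ratio {obs / exp:.2f}")
    # observed 1, expected 0.50, ratio 2.00
```

Indirect standardization within the cohort is only one way to use such a table. The same person-year tabulation could feed a Mantel-Haenszel type comparison across age strata.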
Stratification as a way of dealing with covariates breaks down when the covariates are too numerous or the data too sparse. Clinical-trial-type studies tend to yield far less data than a purposeful cohort study. In their zeal to take everything into account, investigators will spell out a long list of covariates to be adjusted for. Perforce, they use statistical regression methods in which a mathematical model, generally of some linear nature, is used as a way of keeping the covariates constant. These mathematical models might be applied even in situations in which the number of covariates is limited. Blind faith in the suitability of mathematical models for analyzing data seems to be the rule, with investigators little aware of the strong assumptions implicit in the use of those models and of the possibility of obtaining misleading results if the assumptions are violated. My prescription would be to keep covariates limited in number, including only variables known, or suspected with good reason, to play a role in the disease. Do not throw in extra variables just because you can, but do study the effects of those extra variables if you wish. The use of mathematical models may remain essential even with few covariates when the data are sparse, but the investigator should keep constantly in mind the possibility of misleading effects due to the linear mathematical model used. From my experience, this is not just an academic concern.

Conceptually, the number of covariates is neither small nor large, but infinite. Every investigator is open to a charge that his/her study is biased because some variable can be found on which treated subjects and controls differed significantly at the outset; e.g., run through 1,000 unimportant variables, and by chance alone, differences would be significant at the 5% level for about 50 of them.
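A small Python simulation makes the 1,000-variable arithmetic concrete. Every variable here is pure noise, identically distributed in treated subjects and controls. The group size of 50, the known-variance z-test, and the random seed are assumptions chosen for simplicity.

```python
import math
import random


def z_test_p(x, y):
    """Two-sided P-value for a difference in means, treating the variance as known (1)."""
    diff = sum(x) / len(x) - sum(y) / len(y)
    z = diff / math.sqrt(1 / len(x) + 1 / len(y))
    return math.erfc(abs(z) / math.sqrt(2))


def count_chance_differences(n_variables=1000, n_per_group=50, alpha=0.05, seed=2024):
    """Number of unimportant variables on which treated and controls 'differ significantly'."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n_variables):
        treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
        controls = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if z_test_p(treated, controls) < alpha:
            count += 1
    return count


if __name__ == "__main__":
    mean = 1000 * 0.05
    sd = math.sqrt(1000 * 0.05 * 0.95)  # binomial standard deviation
    print(f"expected {mean:.0f} (SD {sd:.1f}); simulated {count_chance_differences()}")
```

The binomial standard deviation of about 7 means that anywhere from roughly 35 to 65 spurious "significant" imbalances would be unremarkable in a single run.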
Similarly, by screening enough variables, one will find some that have significant effects on the response of interest. Any faulting of this kind would be without justification, for the investigator will have taken into account in his/her analysis all variables that he/she thought might matter, so that the effect of all other variables would have been subsumed under error. A real fault would have been for the investigator to subsume under error, on the basis that randomized assignment was made, variables that could reasonably matter. If sex of patient really matters, allowance for sex should be made in the analysis despite the randomization used.

When is a study large, or at least large enough, to permit use of asymptotic statistical procedures? Surprisingly, asymptotic conditions can obtain even when the number of treated individuals averages far fewer than 1/stratum. The example I encountered was one in which 40 patients were treated with cryosurgery, with no controls. Past records of about 3,000 patients were available, and these might have been used, with caution, as controls. Now 40 versus 3,000 would provide a reasonably asymptotic situation, but important covariates had to be considered. With a minimum number of covariates and limited breakdown into ranges, 640 strata could be identified: about 5 controls/stratum and only one-sixteenth of a treated patient/stratum. The saving feature was that the treated patients could arise in at most 40 cells, or strata, which on average should contain about 5 controls each. In fact, a cell that contained a treated case was, in a sense, a viable cell, which would average in excess of 5 controls. One viewpoint here is that potentially we had 3,000/40 = 75 controls/treated patient, but the loss of information was only moderate in reducing this to about 5 controls/treated patient.
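The stratum arithmetic in the cryosurgery example can be reproduced exactly with Python's standard-library `fractions` module. The counts are taken from the text. How the covariates were cut into ranges is not reported, so no attempt is made to model it.

```python
from fractions import Fraction

N_CONTROLS, N_TREATED, N_STRATA = 3000, 40, 640

controls_per_stratum = Fraction(N_CONTROLS, N_STRATA)   # 75/16, about 4.7
treated_per_stratum = Fraction(N_TREATED, N_STRATA)     # 1/16
controls_per_treated = Fraction(N_CONTROLS, N_TREATED)  # 75

if __name__ == "__main__":
    print(float(controls_per_stratum), treated_per_stratum, controls_per_treated)
    # 4.6875 1/16 75
```

Suppose both groups were spread uniformly over the 640 strata. A stratum containing a treated patient would then still average only 75/16 controls. The figure of more than 5 in the text presumably reflects the tendency of treated patients, like controls, to fall in the more heavily populated strata.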
More substantial loss of information would have resulted had we devised an algorithm for selecting, for each of the 40 treated patients, the most closely matching control from among the 3,000. I have suggested devices like this in other situations.

Data on the results of treatment can arise without regard to whether a clinical trial has in fact been conducted. For convenience, let me call such a situation an open clinical trial, though it is easy to visualize epidemiologic investigations of a parallel nature. I will not go into details here on issues like time of onset of symptoms, time of diagnosis, and time of initiation of therapy, though these all had to be carefully taken into account for a proper analysis. In the instance I have in mind, patients with the disease had, for whatever reason, received a particular therapy, and reference data were available. Contradictory assertions could be made as to what kinds of patients were treated. Were they patients with particularly favorable prognoses who could still be salvaged? An alternative claim was that the particular therapy was sought only when a patient was in an urgent situation. Depending on the facts, the seeming effect of the treatment could be artificially enhanced or improperly obscured. In data analysis, one can remove the effects of bias in either direction by progressively discarding more and more of the early data, e.g., the data accumulated during the first 3 months, 6 months, and 12 months of the study. In this instance, the more of the early data that were excluded, the greater was the apparent benefit of therapy.

Cohort studies, as opposed to retrospective studies, mailed questionnaire investigations, etc., have a great advantage in being minimally subject to response bias. Whenever the response rate in such other studies is less than 100%, which is nearly always, potential for bias exists.
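A hypothetical Python illustration of how select nonresponders can bias a relative risk follows. The counts are invented. They assume that the 5% who do not respond are all exposed, and that either many or none of the exposed cases are among them.

```python
def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Ratio of the risk among exposed persons to the risk among unexposed persons."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)


# Hypothetical population of 2,000: true risks 4% (exposed) and 2% (unexposed)
EXPOSED_CASES, EXPOSED_N = 40, 1000
UNEXPOSED_CASES, UNEXPOSED_N = 20, 1000


def rr_among_respondents(missing_exposed, missing_exposed_cases):
    """Risk ratio seen when some exposed persons (and some of their cases) do not respond."""
    return risk_ratio(EXPOSED_CASES - missing_exposed_cases, EXPOSED_N - missing_exposed,
                      UNEXPOSED_CASES, UNEXPOSED_N)


if __name__ == "__main__":
    print(f"true RR: {risk_ratio(EXPOSED_CASES, EXPOSED_N, UNEXPOSED_CASES, UNEXPOSED_N):.2f}")
    # 100 nonrespondents = 5% of 2,000, all of them exposed
    print(f"nonrespondents include 20 cases: {rr_among_respondents(100, 20):.2f}")
    print(f"nonrespondents include no cases: {rr_among_respondents(100, 0):.2f}")
```

With a 95% response rate, the true relative risk of 2.0 appears as 10/9 (about 1.11) in the first scenario and 20/9 (about 2.22) in the second. The bias can therefore run in either direction.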
Follow-up of nonrespondents cannot completely resolve the issue, for the late nonrespondents could represent a hard core of the kinds of individuals who were causing the bias in the first place. When nonresponse rates are comparatively low, the possibility of bias seems to receive little attention. Response rates on the order of 90-95% might even be cited as giving particular reliability to the outcome of an investigation. However, even nonresponse rates as low as 5-10%, or lower, could give rise to highly biased estimates of relative risk if the nonresponders are of a select nature. In the cohort study, once we have the initial characteristics of the individuals, it is only a matter of ascertaining their eventual outcomes: Any selectivity in the way outcomes are ascertained would probably be unrelated to initial characteristics.

Related to the question of incomplete response rates or missing values is the issue of whether part of the data should be excluded from analysis. Once when I was contacted about analyzing a body of clinical data, the company suggested that it would first edit the data before sending them on for analysis. My demurrer was that any editing procedure could itself introduce biases and that, instead, any organization responsible for analyzing the data should also take on editing responsibilities (though, perhaps, with some guidance). In another instance, I learned after the completion of analyses that about 50% of the available data had been excluded. Most of these exclusions were made on the reasonable basis that the infections had proved to be caused by organisms other than those against which the medication was intended to be effective. However, I replied that the presenting symptoms were the same, so that in clinical use the same medication would be given anyway. Besides, I was much concerned that with so high an exclusion rate, the results of statistical analysis would be untrustworthy.
Happily, when the data were reanalyzed to include the patients whose infections were caused by the other organisms, the apparent benefits of the medication became even more pronounced.

Inasmuch as I am dealing here with a clinical trial, let me cite a situation that brings out the importance of taking time into account. Adverse effects of a particular nature were noted more often among patients receiving a certain medication than among those given a control one. Because controls got less relief, their dropout rate was higher, and so their chance of presenting with the adverse effect was lower. Moreover, the greater relief provided by the test medication could have encouraged activities that might bring on the adverse effect.

Latency of disease plays a particularly important role in cohort studies. Suppose we randomly assigned children at birth to be smokers or nonsmokers, though smoking would be delayed until they had attained a suitable age. We would have to wait 15 years or so for that age to be attained, plus another 20-30 years for certain ill effects of smoking to become manifest. However, if we start follow-up of self-selected smokers and nonsmokers later in life, the smokers among them will already have passed through much of the latent period. Thus the consequences of smoking can start showing up after a more moderate interval. However, a long latency may only reflect that the actual or effective exposure is so low that on average the time to an initiating event is prolonged. For high-exposure groups, e.g., occupational groups, latency could be shorter.

The success of epidemiologic methods in the study of smoking and lung cancer can lead others into studies that are biased, biased in the sense that they are foredoomed to failure.
Suppose I conduct a cohort study (or even a retrospective study) on the effects of oophorectomy, any carcinogenic effects of which might also be subject to long latent periods. I cannot expect the early rewards that a cohort study of smoking would have, because the latent period would only be starting, not half or more complete. The latent period of oophorectomy-related cancers might even substantially exceed the remaining life expectancies of the women involved, so that the chances of detecting an effect are further reduced. Almost completely dooming the chances of finding any effect is the fact that we will be working with women of such advanced age that spontaneous cancer rates are high, so that the relative increase in cancers due to oophorectomy could be moderate, even though the actual increase could be highly important. We have perhaps been spoiled by our awareness of tenfold increases in lung cancer associated with cigarette smoking. There the exposure would have started early in life, and risks could have been negligible in the absence of smoking.

In the foregoing are two important lessons, perhaps contradictory. One is that we should not stress too much those potentially carcinogenic exposures, perhaps medications, that start so late in life that the effect is unlikely to show up while the person is alive. The other lesson is that when we are concerned about such effects, we must mount a study large enough, and long enough, to detect important effects even if they are reflected in only moderate relative risk values.

Background rates of cancer occurrence have played a dominant role in epidemiologic investigations. Let an exposure produce an important increase, though a relatively small one, in a particular cancer in the general population, and the retrospective study, for all its virtues, will not pick it up (though it might in an occupational study).
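The contrast between relative and absolute increases can be put in numbers. The rates below are hypothetical and chosen only to illustrate the argument. One exposure multiplies the rate of a rare cancer elevenfold; the other raises the rate of a common cancer by 20%.

```python
def relative_and_excess(rate_unexposed, rate_exposed):
    """Relative risk and excess (absolute) rate attributable to an exposure."""
    return rate_exposed / rate_unexposed, rate_exposed - rate_unexposed


# Hypothetical annual rates per 100,000 persons: (unexposed, exposed)
RARE_UNIQUE = (1, 11)  # a rare, distinctive cancer
COMMON = (200, 240)    # a common cancer with a high background rate

if __name__ == "__main__":
    for label, (base, exposed) in (("rare cancer", RARE_UNIQUE), ("common cancer", COMMON)):
        rr, excess = relative_and_excess(base, exposed)
        print(f"{label}: relative risk {rr:.1f}, excess {excess} per 100,000 per year")
```

The second exposure produces 4 times as many excess cases (40 versus 10 per 100,000 per year). Yet its relative risk of 1.2 would be far harder to detect against the background than a relative risk of 11.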
If an exposure gives rise, even infrequently, to some unique cancer form, then given some reasonable number of cases of that form (collected nationwide, if necessary), the retrospective study will almost unerringly detect the culprit exposure if it is one of the suspect type. Stigma seems to attach not so much to causing cancer as to causing unique cancers. The worst offenders are often overlooked because they blend into the background. In cohort studies, such offenders will still have protective camouflage. Whether an agent that produces unique cancers will be detected in a cohort study will depend on the frequency with which the cancer is elicited as well as on the size and duration of the cohort study. Little is lost if a producer of highly infrequent unique tumors goes undetected. By and large, however, it is the exposure that produces unique effects that is likely to be detected in epidemiologic investigation, whereas more important causes of common effects can readily escape detection. A high odds ratio, albeit with low actual effects, obtains in the one case; a low odds ratio but high actual effects in the other.

Though emphasis has been put in the foregoing on the effects of exposures, certain other studies of a parallel nature may require special considerations. Thus we might study the influence of blood type, sex, or other characteristics of individuals on the risks of neoplasia, or even the interactive effects of those characteristics with particular exposures in producing neoplasms. Which kinds of individuals are at greatest or smallest risk due to smoking cigarettes? A study of factors, perhaps dietary habits, that reduce the risk of cancer may be noteworthy, but the absence of those factors or dietary habits may equally be thought of as playing a causative role.

Although the point has not been made above, it will almost necessarily be true that the population sampled in a cohort study will differ from the general population.
It would be no proper defense of an exposure if the exposed group had lower overall death rates than the general population, for the comparison should have been with death rates for the nonexposed group in the same study population. Any biases with respect to the study population would then cancel each other. Young cigarette smokers may have lower death rates than nonsmokers because the nonsmokers could have respiratory or other health conditions that would preclude their smoking. Even this bias in favor of smoking eventually disappears as the duration of smoking increases.

For further study by the interested reader, I would suggest the following. Haenszel and I (3) examined various aspects of retrospective studies, yet much of what is given there would also be true in cohort studies. Some generalizations of the statistical methodology therein are given in (4). Although the literature is now rich in methodology for analysis of survival time and time-to-response data, I extended (5) the methodology described in (3) to cover such data, the extension being the progenitor of what is now called the log-rank test. It is this kind of methodology which I have indicated as being appropriate for taking the passage of time into account in analyzing cohort study data. Do not go overboard in applying that methodology. If patients with coronary attacks are hospitalized, what matters is whether they died or survived their ordinarily short hospital stays, not whether any deaths were earlier or later in the stay. Still, timing could be important for some purposes. The time-to-response approach (6, 7) is applied to laboratory studies of carcinogenesis. Distinctions among various kinds of observational studies (quasi-experiments) and true experimental studies, as well as some history of early statistics at the National Cancer Institute, are given in (8). Although I cited (1, 2) at the start of this document for certain purposes, the reader might glean many other interesting points from them.
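The log-rank statistic mentioned above can be sketched as a Mantel-Haenszel-type combination of 2x2 tables, one per distinct response time: at each such time the responses in one group are compared with the number expected if both groups shared the same hazard, and the differences and hypergeometric variances are summed over time. This is a minimal sketch of the statistic as it is usually computed today, assuming right-censored data and two groups; it is not a transcription of the procedure in (5), and the function name and toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def logrank_test(times: Sequence[float], events: Sequence[int],
                 groups: Sequence[int]) -> Tuple[float, float, float, float, float]:
    """Two-group log-rank test (sketch, assuming right-censored data).

    times  -- follow-up time of each subject
    events -- 1 if the response (e.g. death) was observed, 0 if censored
    groups -- 1 for the group of interest, 0 for the comparison group
    Returns (observed, expected, variance, chi_square, p_value) for group 1.
    """
    if not (len(times) == len(events) == len(groups)):
        raise ValueError("times, events and groups must have equal length")
    subjects: List[Tuple[float, int, int]] = list(zip(times, events, groups))
    observed = expected = variance = 0.0
    # One 2x2 table (group x responded or not) per distinct response time.
    for t in sorted({ti for ti, ei, _ in subjects if ei}):
        at_risk = [(ei, gi) for ti, ei, gi in subjects if ti >= t]  # still under observation
        n = len(at_risk)
        n1 = sum(gi for _, gi in at_risk)
        d = sum(1 for ti, ei, _ in subjects if ti == t and ei)
        d1 = sum(1 for ti, ei, gi in subjects if ti == t and ei and gi == 1)
        observed += d1
        expected += d * n1 / n                   # expected group-1 responses under H0
        if n > 1:                                # hypergeometric variance of d1
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi_square = (observed - expected) ** 2 / variance if variance > 0 else 0.0
    p_value = math.erfc(math.sqrt(chi_square / 2))  # upper tail of chi-square, 1 df
    return observed, expected, variance, chi_square, p_value


# Toy data (hypothetical): group 1 responds at times 1-3, group 0 at times 4-6.
result = logrank_test(times=[1, 2, 3, 4, 5, 6], events=[1] * 6, groups=[1, 1, 1, 0, 0, 0])
print("O=%.2f E=%.2f V=%.4f chi2=%.3f p=%.4f" % result)
```

Swapping the group labels changes the sign of observed minus expected but leaves the chi-square unchanged, as it should for a two-group comparison.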
Particular issues relating to clinical trials are raised in (9).

REFERENCES
(1) MANTEL N: Cautions on the use of medical databases. Stat Med 2:355-362, 1983
(2) MANTEL N: Epidemiologic investigations—care in conduct, care in analysis, and care in reporting. J Cancer Res Clin Oncol 105:113-116, 1983
(3) MANTEL N, HAENSZEL W: Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst 22:719-748, 1959
(4) MANTEL N: Chi-square tests with one degree of freedom; extensions of the Mantel-Haenszel procedure. J Am Stat Assoc 58:690-700, 1963
(5) MANTEL N: Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemother Rep 50:163-170, 1966
(6) MANTEL N, BOHIDAR NR, CIMINERA JL: Mantel-Haenszel analyses of litter-matched time-to-response data, with modifications for recovery of interlitter information. Cancer Res 37:3863-3868, 1977
(7) MANTEL N, CIMINERA JL: Use of logrank scores in the analysis of litter-matched data on time to tumor appearance. Cancer Res 39:4308-4315, 1979
(8) MANTEL N: A personal perspective on statistical techniques for quasi-experiments. In On the History of Statistics and Probability (Owen DB, ed). New York: Marcel Dekker, 1976, pp 103-129
(9) MANTEL N: An uncontrolled clinical trial—treatment response or spontaneous improvement? Controlled Clin Trials 3:369-370, 1982

Co-Chairman’s Remarks¹
David Schottenfeld²

The issues raised in the previous papers relate to pitfalls in both the design and the analysis of cohort studies. I have summarized potential deficiencies in concept and methodology that may detract from validity (table 1). Dr. Lew described the need to validate different stages of data collection and analysis. Of particular concern to the epidemiologist is independent verification of the completeness with which the cohort has been identified at the onset and with which vital status has been determined at termination.
Professor Mantel emphasized the importance of not selectively excluding subgroups in the cohort during the conduct of follow-up. Earlier, Dr. Selikoff³ reviewed various approaches that he has used in studying cause-of-death data, in which the best interpretation of the underlying cause was based on a comprehensive review of different sources of clinical and pathologic information. This approach would introduce bias if the only reference population for cause-of-death data were the general United States population. The usefulness of an internal comparison group is that you can approach both cohorts with the common objective of going beyond the limitations of death certificate data.

TABLE 1.— Pitfalls in conduct of cohort studies
No verification of the complete cohort-at-risk
Selection of an improper control group
Lack of awareness of optimal latency
No verification of follow-up status through independent means
Correction of cause-of-death information for study group only
No relationship of findings to exposure levels
Inadequate sample size or estimations of limitations in statistical power
No control for confounding by other risk factors, such as age, sex, race, social class, relevant personal habits, etc.

¹ Presented at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
² Preventive Medicine Service, Epidemiology and Cancer Control, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, New York, N.Y. 10021.
³ This paper is not included in this monograph.

The standardized mortality ratio is a commonly used summary index of mortality in cohort studies. The computation involves indirect age adjustment and applies the age-, sex-, race-, and cause-specific mortality rates of the same standard population to one or more cohorts, so that the expected number of deaths can be calculated. The study cohorts, although each is standardized internally, are not generally mutually comparable. The standardized mortality ratio addresses the question of how the observed number of deaths compares with the number expected, where the latter number is based on the cumulation of person-years in relation to age, sex, race, and calendar period and on the null assumption that the study population had the same mortality rates as the standard population. The magnitude of this ratio may only approximate the level of relative risk and is affected by the age and person-years structure and by the observed number of deaths in each cohort. Furthermore, the standardized mortality ratio does not take into account the age at which each death occurred, which of course affects cohort survival. Therefore, populations with different life expectancies may have the same standardized mortality ratio and, conversely, populations with the same life expectancies may have different ones.

The stratification of the cohort by person-year subgroups for calculation of a standardized mortality ratio does not allow for summarization of relative risk in relation to time since first exposure (latency). In addition to latency and censoring, which are accommodated more readily through life table methods, there is the statistical problem of exposure levels that change over time (transient dose states). The allocation of person-years-at-risk should be in relation to categories of exposure level (if these are known), interval since onset of exposure, and current age. Dose-response relationships can then be tested for linear and nonlinear components.
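The point that cohorts standardized against the same standard population are not necessarily comparable with one another can be checked with a small calculation. In this minimal sketch, with invented rates, expected deaths are the person-years in each age stratum multiplied by the standard population's rate for that stratum, and the SMR is observed divided by expected. The two hypothetical cohorts have identical age-specific death rates and differ only in how their person-years are spread over age; the stratum labels and function names are illustrative assumptions.

```python
from typing import Mapping


def expected_deaths(person_years: Mapping[str, float],
                    rates: Mapping[str, float]) -> float:
    """Indirect standardization: sum over strata of person-years x rate."""
    return sum(py * rates[stratum] for stratum, py in person_years.items())


def smr(observed: float, person_years: Mapping[str, float],
        standard_rates: Mapping[str, float]) -> float:
    """Standardized mortality ratio, observed / expected."""
    return observed / expected_deaths(person_years, standard_rates)


# Hypothetical standard-population death rates (per person-year) by age stratum.
standard = {"young": 0.001, "old": 0.010}
# Both cohorts share identical age-specific rates: twice the standard rate
# at young ages, equal to it at old ages.
cohort_rates = {"young": 0.002, "old": 0.010}

# Only the distribution of person-years over age differs between the cohorts.
cohort_a = {"young": 10_000.0, "old": 1_000.0}
cohort_b = {"young": 1_000.0, "old": 10_000.0}

# "Observed" deaths are set to their expected values under cohort_rates
# so that the example is deterministic.
observed_a = expected_deaths(cohort_a, cohort_rates)
observed_b = expected_deaths(cohort_b, cohort_rates)

smr_a = smr(observed_a, cohort_a, standard)
smr_b = smr(observed_b, cohort_b, standard)
print(f"SMR A = {smr_a:.2f}, SMR B = {smr_b:.2f}")
```

Cohort A, weighted toward young ages where the rate ratio is 2, gets an SMR of 1.5; cohort B, weighted toward old ages where the ratio is 1, gets an SMR of about 1.01, even though the two cohorts have exactly the same mortality at every age.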
One can use current multivariate techniques, such as the Cox proportional hazards model, to quantify the effects of cumulative exposures on instantaneous hazards, after controlling simultaneously for various time-dependent confounding variables.

Discussion V¹,²

S. Jablon: I wish to comment on the remarks about absolute and relative risk models as they relate to radiation. Historically, at least in the studies of Japanese survivors, the first measures of risk calculated were what are called “absolute risk estimates.” That is, for a group of persons having a particular range of doses, one would calculate the number of excess cancers of a particular kind, divide that by the product of the person-years at risk times the mean dose, and call the result the absolute risk per million person-year rad. More recently, data were presented indicating that this absolute risk model is probably not the best way to project in time what happens to an irradiated group. That is to say, for a given cohort, defined by age at exposure, as one progressed through time, one found little excess risk for the solid tumors for the first 10 years or so; then the excess risk began to rise and kept increasing. However, if you examine the relative risk, i.e., if you divide the excess risk in any particular period by the spontaneous rate in that cohort in that period, the resulting numbers seem to be fairly constant in time once you have passed the initial latent period of 10 or 15 years. Different people use “relative risk” to mean different things, and what I am talking about here is what has been called a “relative risk projection model.” I am referring to relative risk as a function of time since exposure for a cohort of a particular age at the time of exposure. Now what is the importance of this question? Is this just some esoteric notion with which statisticians concern themselves? The answer is: “No.” It is important for 2 reasons which are quite different.
Firstly, implications for the biologic meaning of what is going on are important. We are concerned not only with what happens but why it happens, and when we are talking about radiation carcinogenesis, one of the first questions asked is whether radiation is acting on an early stage of the carcinogenic process (something you might call initiation) or acting as a promoter. Depending on how radiation acts, you would expect different kinds of projection models to hold. In a different dimension, some of you may be aware that in late 1982 the Congress of the United States passed the Orphan Drug Act, which had a rider introduced by Senator Hatch that directed the Secretary of Health and Human Services to prepare what the bill called “radio epidemiological tables.” These radio epidemiological tables are supposed to tell you the following: If a person has a cancer of a type that can be induced by radiation and a known history of radiation exposure, what is the probability that the cancer was the result of the radiation exposure? This obviously is designed to feed into the whole issue of compensation, etc. The NCRP has had a committee, chaired by Victor Bond of Brookhaven National Laboratory, working on precisely this problem for 14 years. In fact, a set of tables of the kind which the bill calls for has been produced. Those tables are based on the BEIR 3 model of the National Academy of Sciences committee.

ABBREVIATIONS: NCRP=National Council on Radiation Protection and Measurements; BEIR 3=Third Report of the Biological Effects of Ionizing Radiation Committee of the National Academy of Sciences.
¹ Conducted at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
² Address reprint requests to Lawrence Garfinkel, Epidemiology and Statistics Department, American Cancer Society, 4 West 35th Street, New York, N.Y. 10001.
The report on this model, published in 1980, presented estimates of radiation risks for particular cancers with the use of a particular model, more specifically an absolute risk projection model. In response to the Orphan Drug Act, the National Institutes of Health established a Working Group to prepare radio epidemiological tables. The Working Group, after careful consideration, decided to adopt the relative risk projection model, and the consequence of all this is that we may have, within the next year, 2 sets of tables based on different projection models. If the numbers vary considerably, it is likely that real problems will be created, because you can readily imagine what would happen in a court of law if you try to introduce a particular number and then counsel for the opposing side comes up with another number, also produced by a supposedly authoritative group. I am in the enviable position of being a member of both groups, so we are going to have to find a way of compromising the issue and coming up with the best estimates that we can make. Obviously, they will have to be accompanied by estimates of the uncertainty that applies to them.

D. Schottenfeld: Am I not correct in assuming that these projections are based primarily on the cohort studies of the Japanese in Hiroshima and Nagasaki and of patients treated with external radiation for ankylosing spondylitis? Will these be the basis for risk estimations by organ site?

Jablon: Yes and no. Both groups have decided that they are not in a position to repeat the BEIR 3 analysis; they are not in a position to derive their own risk estimates, and so they are going to rely on the BEIR 3 risk estimates. What you say is largely true of the BEIR 3 risk estimates, so, in a sense, you are right, but the groups themselves are going to use what the BEIR committee said, which means the NCRP committee had an easy task. You just take the absolute risk numbers for the BEIR 3 model as they appear.
The Working Group has a slightly more complicated task. They have to take the absolute risks out of BEIR 3 and somehow turn them into relative risks. This can be done, but it is yet to be done.

E. Lew: I would like to raise a broader issue in connection with Mr. Jablon’s comments. In conducting cohort studies, we are really interested not so much in what happened in the past as in the extent to which the past may be a guide to what is going to happen in the years to come. We have been most fortunate with death rates because mortality has been declining; in most situations, past experience is a conservative estimate of what the future holds, at least in the years immediately ahead. However, in the fields of morbidity and medical care we are confronted with an upward trend. We need to conduct cohort studies in these areas so as to bring out the nature of the trend and the probable causes for its direction. This may suggest component analyses so that more information is elicited from past experience that may bear on the trends ahead. In any event, the techniques of forecasting must be brought into the picture and the experience monitored carefully so that the assumptions being made about the future are checked periodically. A great deal of work on this kind of problem has been done in the Office of the Actuary of the Social Security Administration.

W. Nicholson: Mr. Jablon’s remarks are absolutely appropriate. I have 2 illustrations on that issue. Figure 1 shows the relative risk of death from all cancer except leukemia versus years since detonation of the atomic bomb. (Here, the relative risk is the ratio of the observed:expected deaths among those exposed to 10 or more rad to the observed:expected deaths among those exposed to less than 10 rad.) The data are from the Atomic Bomb Casualty Commission study (1). Except for 1 point which is low, the risk jumps sharply to about a 15% increase.
Thereafter, the percentage increase remains constant, which implies that a relative risk model is appropriate, i.e., the relative risk is independent of time since onset of exposure. In contrast to the constancy of relative risk, the absolute risk of death from these cancers rose more than twofold over the period of observation. Figure 2 shows the age-related effect that I spoke about previously. Again, for all cancer except leukemia the relative risk is similar for all ages from 15 to 50 years. However, it appears to fall for the oldest age group, as it did for asbestos workers. Radiation seems to be another example of an age-related selection effect. The decrease is not likely to reflect an inappropriateness of the relative risk model over the 30-year follow-up period, because that model was shown to be appropriate in figure 1. The age-related effect is unrelated to duration of follow-up. The dramatic difference that one would get had an absolute model been used raises the question of whether the biologic model is appropriate for radiation carcinogenesis.

[Figure 1: relative risk plotted against years from detonation, 0 to 40]
FIGURE 1.—Relative risk of death from cancer of all sites except leukemia in survivors of the atom bomb explosions in Hiroshima and Nagasaki according to time since detonation. Relative risk is the ratio of observed:expected deaths among those exposed to ≥10 rad to the observed:expected deaths among those exposed to <10 rad. Data are from (1).

[Figure 2: relative risk plotted against age at detonation, 0 to 60]
FIGURE 2. Relative risk of death from cancer of all sites except leukemia in survivors of the atom bomb explosions in Hiroshima and Nagasaki according to age at detonation. Relative risk is the ratio of observed:expected deaths among those exposed to ≥10 rad to the observed:expected deaths among those exposed to <10 rad. Data are from (1).
We would normally think that radiation acts as an early-stage carcinogen, whereas the appropriateness of the relative risk model for both duration from onset and age suggests an action at later stages, even 30 years after the event. The effect of radiation appears to be maintained for long, long periods after exposure, and radiation apparently acts in conjunction with other carcinogenic processes, even initiation-like events, which occur long after the radiation exposure.

Jablon: Did I hear you correctly? Did you say that radiation is acting at a late stage?

Nicholson: It appears to act late in the carcinogenic process.

Jablon: That is what you said, and this is not the time or place for a debate, but my opinion is exactly the opposite.

R. Peto: Surely both are excluded by the data. The hypothesis that radiation is acting at the first stage of the carcinogenic process of tumors is really not at all supported by the data. The processes of aging do not appear to have any important effects on tumorigenesis for the solid tumors. The approximate constancy of relative risk and the rapidly increasing absolute risk strongly suggest that radiation is not acting chiefly on the first stage of tumorigenesis for the solid tumors; for leukemia, it is uncertain. The fact that you are still getting tumors 20-30 years later suggests that radiation is not acting on the last stage of tumorigenesis. That is, I think the data are not compatible with either the first or the last stage. I believe that a relative risk model is the more appropriate. Certainly, it is thousands of times more convenient, and, scientifically, it seems to fit the data better, not for the leukemias but for the solid tumors. In fact, whether you say absolute or relative risk, it really is a matter of convenience; neither is going to be exactly right. If you have something that is acting on an early stage, then an absolute risk model might well fit the data best.
If you have something without substantial effect on the first stage of carcinogenesis, then a relative risk model may well fit the data best. The nicest example is asbestos because it causes 2 completely different types of tumors. On the one hand, it has a substantial effect on mesothelioma risk. On the other, it has a substantial effect on bronchial carcinoma risk. For the mesothelioma data, asbestos does appear to be acting on an early stage of the process, because it has a substantial effect at that point of the process. The absolute excess of mesothelioma seems to be related directly to asbestos exposure. For bronchial carcinoma, asbestos has a multiplicative effect with smoking and does not seem to be acting chiefly on an early stage of the process. Therefore, a fairly clear excess is attained rapidly, but then the relative risk remains moderately steady. This is an example of 1 agent acting in different ways on 2 types of tumors. It is just a question of which is the more convenient description. Neither will be exactly right. In regard to radiation and solid tumors, surely the relative risk model is vastly more convenient, both legislatively and scientifically.

Jablon: It is true that most people agree that the relative risk projection model is more appropriate than the absolute for radiation carcinogenesis for the solid tumors. The difficulty that the NCRP committee has faced from its inception is that members believed they had to rely on data provided by an authoritative body. The BEIR 3 report published risks on an absolute basis, and the NCRP felt compelled to adopt them. What the NCRP committee is going to do in the face of a contrary decision by the National Institutes of Health Working Group is an interesting question.

Nicholson: I want to comment on Mr. Peto’s argument regarding an intermediate stage.
I think the strong constancy of relative risk, particularly with age, is suggestive of a later-acting interaction. This is not true when cessation of exposure also means termination of biologic activity. I think that what may be happening when radiation is administered also happens with regard to asbestos. Just as asbestos fibers remain in the cells, so do the effects of radiation. The carcinogenic activity here will manifest itself in time. If so, the multistage model would not be applicable. Cigarette smoking, with its 2 stages, is an exception. Otherwise, for virtually every other carcinogen, I find little evidence at this time for discrete stages on which external agents can act.

W. Haenszel: I have a question for Mr. Jablon on the absolute versus the relative risk model. If you use a relative risk model, you are introducing a transformation. Is this some variation on the principle of attempting to introduce parallelism? For example, when you standardize by the direct method and use the absolute rates, the choice of the reference standard population makes a lot of difference if the plots of the 2 curves are not parallel. If you transform to log rates and get parallelism, the contrast of the standardized rates is independent of the choice of the standard population. Is this kind of issue involved in the choice of absolute versus relative risk models?

Jablon: I am not sure I really understand your question, but perhaps I can expand on the premise a little. Let us consider a disease like breast cancer: If you compare absolute risk in Japanese women versus Caucasian women, the absolute risks are almost the same. The relative risks, however, are different because the spontaneous breast cancer rate in Japanese women is only 1/5 or 1/6 of what it is in Caucasian populations.
That means that if you want a relative risk estimate for radiation on the basis of Japanese experience, you cannot simply translate that to the United States. What you can do is take the absolute risk measured in Japanese women at a particular time and translate that into a relative risk for Caucasian women by dividing the excess in Japanese women by the spontaneous rate in Caucasians. I do not know if that answers your question, which, I must confess, I did not really follow exactly.

N. Mantel: Dr. Hammond discussed these issues and he wisely suggested forming matched sets. However, no careful distinction was made between the different situations in which the matched pair might arise, one from a case-control study and the other from a cohort study. Let us imagine having a matched pair in a cohort study in which first one member responds and then, 5 years later, the other member responds; I think that is the kind of analysis Dr. Hammond has shown. The two responses would just about cancel each other. The proper method of analysis for this would be to look only at which member of the pair responded first. However, that leaves you with the difficulty that whatever happens to the second member of the pair would make no difference in the analysis. We have a method for using these leftover members of the pair that we have been using in laboratory studies of carcinogenic agents, one in which we form a group of leftover animals. I am not too sure how well it would apply here, but the problem of loss of information from leftover individuals can be avoided with matched sets of individuals rather than matched pairs. I believe that if Dr. Hammond went over his matched pairs analysis in the cohort-type study, he could look to see which member responded first for pairs in which both members responded. In any case, the distinction between the two situations should be kept in mind.

Peto: A neat statistical way avoids that problem entirely.
Suppose you have taken 100,000 blood samples and you are looking for cancer cases of some kind; you do not really want to have as a matched pair individuals who develop cancer later on. Therefore, instead of starting at the time you take your blood samples to choose your pairs, you start at the far end of the follow-up. Suppose you took them in 1980 and you are following them up to the year 2000; you start with the year 2000 and work backward, and the first time you find a cancer occurring, let us say in 1999, you then choose a matched pair, i.e., persons who are alive at that time and who are matched on whatever factors you want. If you start at the back end and work to the front, the problem never arises. You always get a patient matched with somebody who did not develop the disease subsequently. You not only avoid the problem but obtain full statistical power without the anomaly of having matched an earlier patient with somebody who developed cancer later on.

E. C. Hammond: If I understood what you said properly, I think I have tried what you are talking about. As a matter of fact, I did it in many different ways. You notice I did take 2 periods and showed them independently, and I reduced that period, year by year, during that 12-year period. You can look at it in a lot of ways. By doing it in various ways, I did not get enough difference between the methods to make it worth considering from a practical standpoint. Yes, it was different numerically, but not to a degree that was meaningfully significant. I am not talking about statistical significance.

N. Breslow: I would like to go back to the discussion of additive versus relative risk models. Two issues come to mind. The first is: Which model fits the data best in the observable range? We are certainly not limited to a choice between these two alone, because there is a whole range of different transformations of which additive and multiplicative are just 2 points on the scale.
Sometimes we can find other transformations that give us even better fits to the observable data and use those for interpolation. The second issue concerns the use that we make of these models to extrapolate beyond the range of the observable data, either into low dose ranges or into the future. It is not at all clear to me that the question of which model fits best in the observable range is relevant to such extrapolation. Perhaps we ought to be choosing our models on the basis of our understanding of the nature of the disease process rather than on relatively minor differences in goodness of fit to the particular data points. Here, of course, the multistage model is particularly important because it is one of the few that has been put forward that appears to have both theoretical and empirical support. For that reason I get upset when I hear Mr. Peto stating that the data show radiation is not acting at an early stage of the process. This points out a problem with the multistage model, which is that we have not been able to give any clear biologic identity to the stages. If one thinks of what radiation ought to be doing, it should be acting at a first stage. Can you explain the discrepancy to me? Previously, while discussing the asbestos situation, we saw the power relationship between incidence and time since initial exposure break down toward the end of the time scale. This is a disturbing observation for the universality of the multistage theory, and particularly for its use in making projections into the future. We need to be absolutely sure that it is a correct observation and try to understand it.

Peto: I believe that there is no good reason to say that radiation looks as though it initiates rather than acts in the later stages of the process. Experiments in cell cultures show that radiation does lots of things that really look much more as though it should affect later rather than early stages in totally unaltered cells.
I think that we know so little about which effects of radiation really are important in carcinogenesis. You said you would like your model to determine how you describe the data. I would like it to be the other way around. I would like the pattern of the data to determine what your model is.

G. Howe: It may be impossible for us to distinguish among statistical models on the basis of empirical epidemiologic data, even for radiation studies, in which the effects are better quantified than for any other carcinogen. For example, the excess incidence of breast cancer induced by radiation appears constant for both the Japanese and North American populations, which implies an additive model. In contrast, the interaction between radiation and age at risk in the induction of breast cancer appears to be described better by a multiplicative model. However, even if we start with biologic data, such as a multistage model of carcinogenesis, various statistical models will obtain depending on the specific postulated mechanism, and these will often be intermediate between the additive and multiplicative models. In this situation, it would appear appropriate to apply some of the techniques that Dr. Breslow just described, although again it is unlikely that empirical data will discriminate definitively among the various postulated models.

Jablon: Could I comment on a couple of points that Dr. Breslow raised? He is right in stating that the two kinds of projections he mentioned are important. Now the committees, fortunately, are not required to forecast the future. They are not under the impression that the tables they will produce will be the last word. The tables will apply only to the present data and presumably will be redone as additional data and longer follow-up times become available. Unfortunately, we are not in a position to avoid the problem of extrapolation or interpolation, whichever you choose, to low doses and low dose rates.
Clearly, serious disagreements within the committees will arise on just that score.
Peto: If you are going to extrapolate, some generalizations apply no matter how many early or late stages you have. In the range of dose levels at which the excess risk does not greatly exceed the background risk, it is likely that you obtain approximate linearity. Some general arguments for this idea were first summarized by Dr. Hoel in the mid-1970s. I have not seen anything that goes against this, and I have seen many sets of data that illustrate it nicely, including the various animal experiments we recently completed, as well as considerable human epidemiology. It is when the excess greatly exceeds the background that we have no theoretical model to suggest the shape of the relationship; it could be almost anything. If you are in the former situation, where the excess does not greatly exceed the background, I think linear extrapolation is likely to be approximately right for all intents and purposes.
Jablon: I am pleased to hear you say that. If the excess rate exceeds the background rate, you are in an area where the probability of causation is more than 50%, and that percentage is a dividing line under tort law, which eliminates any questions: you do not have to calculate a probability, and the cancer is a compensable cancer. We are in trouble at much lower levels of probability of causation, where perhaps the increase is on the order of 10, 15, or 20%. If we can demonstrate that linear extrapolation is appropriate, that would be satisfying.
Schottenfeld: Dr. Breslow, do you believe that the proportional hazards model satisfactorily addresses one of the concerns raised earlier, that of changing exposure levels over time within a cohort?
Breslow: The possibility of having time-dependent exposure variables certainly gets around the obvious problem pointed out by a researcher some time ago, i.e., that of comparing groups of workers on the basis of their total accumulated exposure at the end of the study. Such a comparison is clearly wrong.
On the other hand, it does not get us around the problem I alluded to in my paper, which is the continuing selection bias for health that goes with continuing employment. When someone moves into a higher exposure category, one would think the risk would be increased by the additional exposure. However, that person's mortality experience is being compared with that of people who may already have terminated employment because they were sick. As long as we are using mortality rather than onset or diagnosis of disease as an end point, I think we are going to have problems.
I want to ask Mr. Peto another question. My understanding of the linearity argument just made is that it is based on the hypothesis of dose additivity. I want to know whether you believe that good empirical evidence exists for that assumption and whether an implicit assumption underlies this statement.
Peto: Such evidence is not necessary; approximate linearity can occur under a wide range of assumptions. You really do need a lot of peculiar assumptions to avoid it.

REFERENCE
(1) BEEBE GW, KATO H, LAND CE: Studies of the mortality of A-bomb survivors. 6. Mortality and radiation dose, 1950-1974. Radiat Res 75:138-201, 1978

SESSION VI
Models of Chronic Disease
Chairman: Nicholas Wald
Co-Chairman: George Hutchison

Chairman's Remarks¹
Nicholas J. Wald²

There is no shortage of scientific speculation, but one rarely finds an investigator who can conduct studies of the type and size that can resolve the speculation and who will therefore influence future research and public health policy. Dr. E. Cuyler Hammond is one of those rare investigators who has done just this, and it is gratifying to see his colleagues at the American Cancer Society continuing the productive research he was instrumental in initiating.
I would like to focus on the field of biochemical epidemiology, one of the topics covered in this session of the Workshop. The use of laboratory methods in epidemiology is not new, and one might ask why interest in this area has recently been renewed. Perhaps three reasons account for the current interest. First, the value and economy of using stored serum samples collected as part of prospective studies have been demonstrated. Samples are retrieved from storage some time after collection, when disorders of medical interest have developed in some members of the population being studied. Sera from these patients and from an appropriate group of matched controls can then be analyzed and compared. This combines the scientific advantage of the prospective approach with the economy of avoiding the many thousands of biochemical investigations that would be needed if all the samples were tested when people were first recruited into the study. The approach also minimizes assay error, because the study can be based on a limited number of samples, e.g., 200, which can be measured in 1 analytical batch rather than over many years.

This approach was used to determine whether a possible biochemical marker of spina bifida, AFP, could be detected in the blood of pregnant women early in pregnancy. In 1974, a prospective study of pregnant women in Oxford was under way; antenatal blood samples had been collected from over 5,000 women, and the sera had been frozen and stored (1). Samples were retrieved from every pregnant woman bearing a fetus with spina bifida or anencephaly, and each was matched with 2 controls. At the time we were ready to perform the biochemical tests, we had 7 patients and 14 controls. All the patients had higher AFP levels than each of their 2 controls.
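Why 7 matched sets could be convincing can be illustrated with a simple permutation-style calculation that is not part of the original remarks. Assuming AFP values are continuous (no ties), unrelated to outcome, and independent across matched sets, each patient is equally likely to hold any of the 3 ranks in her set, so the chance that all 7 patients exceed both of their controls is (1/3)^7. A minimal Python sketch; the function name `prob_all_cases_highest` is hypothetical:

```python
from fractions import Fraction


def prob_all_cases_highest(n_sets: int, controls_per_case: int) -> Fraction:
    """Chance that, in every matched set, the case has the highest value.

    Assumes no association between marker and outcome, continuous values
    (no ties), and independence between matched sets, so within its own set
    the case is equally likely to occupy any of the controls_per_case + 1 ranks.
    """
    per_set = Fraction(1, controls_per_case + 1)
    return per_set ** n_sets


# Oxford data as described: 7 patients, each matched with 2 controls.
p = prob_all_cases_highest(n_sets=7, controls_per_case=2)
print(p, float(p))
```

Under these assumptions the probability is 1/2,187, roughly 0.0005, which is why a perfectly consistent pattern in so few matched sets could be taken seriously.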
This finding, which identified the relationship between maternal serum AFP and spina bifida early in pregnancy, might well have been missed had the close matching for gestational age not been done or had the assay error not been carefully controlled. Both were readily achieved because of the method and study design used. The finding was published 10 years ago, and in Britain antenatal screening by maternal serum AFP measurement is now part of routine obstetric care. The number of infants born with central nervous system malformations has fallen by almost two-thirds, largely because of this screening, although a concurrent decline in incidence was also partly responsible.

ABBREVIATION: AFP = alpha-fetoprotein.
¹ Presented at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
² Department of Environmental and Preventive Medicine, St. Bartholomew's Hospital Medical College, London EC1M 6BQ, United Kingdom.

The second reason for the current interest in biochemical epidemiology is scientists' appreciation of the importance of defining the distribution of predictive biochemical variables. This appreciation has led to greater awareness of what one might call the "concept of overlapping distributions" and, from this, to a more probabilistic and quantitative approach to the interpretation of results. Tests are not simply performed and reported as positive or negative; numerical values are obtained and expressed more precisely as the risk of having or developing the disorder the test is intended to identify. Again, an example can be taken from work on the antenatal detection of spina bifida. Here epidemiologists have been able to characterize the distribution of AFP in maternal sera from women whose fetuses had spina bifida and from those whose fetuses were unaffected.
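Expressing a screening result as odds, as table 1 does, can be understood through the odds form of Bayes' theorem: the odds of an affected fetus given an AFP level at or above the cutoff equal the prior odds set by birth prevalence multiplied by a likelihood ratio (detection rate divided by false-positive rate at that cutoff). The sketch below assumes that relation; the likelihood ratio is not reported in the source but is back-calculated from the rounded odds in the 2.0 MoM, all-neural-tube-defect row of table 1, so agreement can only be expected to within rounding. All names are illustrative.

```python
# Table 1, cutoff 2.0 MoM, all neural-tube defects.
# Key: birth prevalence per 1,000 births; value: n in the tabulated odds "1:n".
ROW_2_0_MOM = {2: 41, 4: 21, 6: 14, 8: 10, 10: 8}


def prior_odds(prevalence_per_1000: float) -> float:
    """Odds of an affected birth before screening."""
    return prevalence_per_1000 / (1000 - prevalence_per_1000)


def implied_likelihood_ratio(prevalence_per_1000: float, n: float) -> float:
    """Likelihood ratio implied by a tabulated posterior odds of 1:n."""
    return (1 / n) / prior_odds(prevalence_per_1000)


def posterior_odds(prevalence_per_1000: float, likelihood_ratio: float) -> float:
    """Odds form of Bayes' theorem: posterior odds = prior odds x LR."""
    return prior_odds(prevalence_per_1000) * likelihood_ratio


# Each cell implies a likelihood ratio; if the relation holds they should agree.
implied = {p: implied_likelihood_ratio(p, n) for p, n in ROW_2_0_MOM.items()}
mean_lr = sum(implied.values()) / len(implied)

for p, n in ROW_2_0_MOM.items():
    predicted_n = 1 / posterior_odds(p, mean_lr)
    print(f"{p}/1,000: implied LR {implied[p]:.1f}, "
          f"predicted 1:{predicted_n:.0f}, table 1:{n}")
```

Every cell in this row implies a likelihood ratio of roughly 12, which is why, at a fixed cutoff, the tabulated odds improve almost in proportion to birth prevalence.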
Figure 1 shows these distributions, and table 1 shows the odds of having an affected fetus (any neural-tube defect, or open spina bifida alone) according to the screening AFP cutoff level and the birth prevalence.

[Figure 1: distributions of maternal serum AFP for unaffected pregnancies and for pregnancies with anencephaly or open spina bifida; horizontal axis, serum AFP level (MoM), 0.5 to 10.0.]
FIGURE 1. Distribution of maternal serum AFP levels at 16-18 wk of gestation in single pregnancies. Vertical axis shows the proportion of pregnancies in each distribution. MoM = multiple of the median. Data are adapted from (2).

TABLE 1. Odds of women with serum AFP levels equal to or greater than specified cutoff levels at 16-18 wk of gestation having a fetus with a neural-tube defect or open spina bifidaᵃ

Cutoff level,   All neural-tube defects,          Open spina bifida,
multiple of     birth prevalence/1,000 births     birth prevalence/1,000 birthsᵇ
the median      2     4     6     8     10        2     4     6     8     10
2.0             1:41  1:21  1:14  1:10  1:8       1:79  1:40  1:26  1:20  1:16
2.5             1:21  1:10  1:7   1:5   1:4       1:42  1:21  1:14  1:10  1:8
3.0             1:10  1:5   1:3   1:2   1:2       1:20  1:10  1:7   1:5   1:4
3.5             1:4   1:2   2:3   1:1   1:1       1:9   1:5   1:3   1:2   1:2
4.0             1:3   1:1   1:1   3:2   2:1       1:7   1:3   1:2   2:3   1:1

ᵃ Multiple pregnancy was excluded by ultrasonography. Data are adapted from (2).
ᵇ Odds were determined with the use of birth prevalences applicable in the absence of antenatal diagnosis and selective abortion.

The third reason for the recent interest in biochemical epidemiology has arisen from investigators' attempts to obtain precise measures of exposure to a toxic agent that would not be achievable by simple inquiry. For example, assessing the extent to which a person is exposed to the tobacco smoke of others is exceedingly difficult if only questionnaire information is available. However, if we measure a constituent of tobacco smoke, or a metabolite of such a constituent, that is absorbed into the blood of a person exposed to the smoke, a reliable quantitative measure of exposure may be available. This appears to be so with cotinine, the principal metabolite of nicotine; table 2 shows the relationship between urinary cotinine and reported exposure to other people's smoke among a group of nonsmokers. The subjects are classified into quintiles of reported exposure to other people's smoke, in hours of exposure during the previous 7 days. The difference in urinary cotinine from the lowest quintile of exposure to the highest is approximately tenfold. Therefore, it appears that urinary cotinine may be a more precise measure of "passive smoking" than can be obtained from simple questioning, and it is also likely to be highly specific, because cotinine is derived only from nicotine and nicotine is derived only from tobacco.³

³ Recently, nicotine has been added to chewing gum, the chewing of which has been proposed as a method of assisting people to give up smoking. This will lead to cotinine formation in the body and therefore the presence of cotinine in urine.

The great potential of this approach is that cohort studies in which urine samples have been collected will subsequently permit investigation of the hazards of breathing other people's smoke. Such an approach will avoid many of the problems of interpretation that apply to current studies on this subject.

Urinary cotinine levels among cigarette smokers are about 300 times greater than those found in nonsmokers (table 3) and therefore readily identify a person who is a smoker. Nonsmokers passively exposed to other people's smoke have levels about 1% of the level found in active cigarette smokers.

Such quantitative data provide an approximate indication of the extent of risk of disease that may be due to breathing other people's tobacco smoke. If we ignore the fact that, on average, the number of years of exposure to other people's smoke exceeds the number of years a person actively smokes, and assume that the risk of lung cancer is directly proportional to exposure, we can expect the risk from passive smoking to be about 1% of that in active smokers. This is equivalent to an absolute excess risk of about 10 deaths from lung cancer per million per year if the excess risk of lung cancer in smokers were 1,000 deaths per million per year. If the risk of death from lung cancer in nonsmokers who are not exposed to other people's smoke were about 40 per million per year, the relative risk would be (40 + 10)/40 = 1.25, or about 1.2. These estimates are similar to the aggregate estimate from epidemiologic studies in which exposure to passive smoking was estimated by inquiry about the smoking habits of the spouses of nonsmokers.

Assessment of smoking by biochemical means also has useful applications in occupational cohort studies in which the confounding effect of tobacco smoking must be allowed for. In such studies, one could collect urine samples from populations of workers and, when the relationship between exposure to a chemical and the risk of lung cancer is studied, simply allow for smoking by stratifying the data according to urinary cotinine concentration rather than depend on smoking histories alone. The collection of urine samples in cohort studies will, of course, also allow direct measurement of exposure to the chemical being studied.

For these three reasons biochemical epidemiology has received new vigor, which should encourage the epidemiologist of the 1980s to become as much a laboratory-based investigator as one concerned with the analysis of statistical data.

TABLE 2. Urinary cotinine in nonsmokers according to number of reported hours of exposure to other people's tobacco smoke within the past 7 daysᵃ

Duration of exposure        No. of     Urinary cotinine, ng/ml,
Quintile   Limits, hr       subjects   mean ± SDᵇ
1st        0.0-             43          2.8 ± 3.0
2d         1.5-             47          3.4 ± 2.7
3d         4.5-             43          5.3 ± 4.3
4th        8.6-             43         14.7 ± 19.5
5th        20.0-80.0        45         29.6 ± 73.7
All        0.0-80.0         221        11.2 ± 35.6

ᵃ Day of urine sample collection is included (3).
ᵇ Trend with increasing exposure was statistically significant (P<0.001).

TABLE 3. Distribution of urinary cotinine levels in nonsmokers and smokers according to extent of exposure to other people's tobacco smoke (nonsmokers) and type of tobacco smoked (smokers)ᵃ

Urinary       Exposure to other people's tobacco smoke    Type of tobacco smoked
cotinine,     0 hr/wk      <7 hr/wk     >7 hr/wk          Cigarettes only   Cigars only   Pipes only
ng/ml         No.   %      No.   %      No.   %           No.   %           No.   %       No.   %
<1.0            5   23      14   14       1    1            0    0            0    0        0    0
1.0-            4   18       8    8       5    5            0    0            0    0        0    0
2.0-            9   41      38   37      11   11            0    0            0    0        0    0
4.0-            3   14      24   23      26   27            1    1            0    0        0    0
8.0-            1    4      18   18      28   29            0    0            1    2        0    0
16.0-           0    0       0    0      14   15            1    1            2    4        0    0
32.0-           0    0       0    0       6    6            1    1            4    7        0    0
64.0-           0    0       0    0       4    4            0    0            5    9        0    0
128.0-          0    0       0    0       0    0            4    3           15   27        0    0
256.0-          0    0       0    0       2    2            4    3            4    7        2    5
512.0-          0    0       0    0       0    0           17   13            8   14        2    5
1,024.0-        0    0       0    0       0    0           58   44           10   18       17   42
2,048.0-        0    0       0    0       0    0           40   30            7   12       13   32
≥4,096.0        0    0       0    0       0    0            6    5            0    0        6   15
All            22  100     102  100      97  100          132  100           56  100       40  100

ᵃ See (3).

REFERENCES
(1) WALD NJ, BROCK DJ, BONNAR J: Prenatal diagnosis of spina bifida and anencephaly by maternal serum-alpha-fetoprotein measurement. A controlled study. Lancet 1:765-767, 1974
(2) WALD NJ, CUCKLE H: Maternal serum-alpha-fetoprotein measurement in antenatal screening for anencephaly and spina bifida in early pregnancy. Report of U.K. Collaborative Study on Alpha-fetoprotein in Relation to Neural-Tube Defects. Lancet 1:1323-1332, 1977
(3) WALD NJ, BOREHAM J, BAILEY A, et al: Urinary cotinine as marker of breathing other people's tobacco smoke. Lancet 1:230-231, 1984

Cancer Risk Factors in Human Studies¹
John Higginson²

ABSTRACT: The historical developments leading to the acceptance of the influence of dietary and behavioral aspects of our life-style on cancer are reviewed.
However, present information is usually insufficient to permit description of the complex mechanisms involved, which are unlikely to yield to classical epidemiologic approaches alone. Better-integrated laboratory and epidemiologic studies that use more advanced nonintervention techniques are required. Progress may be slow in the identification of such factors in view of the many parameters involved, the absence of a single predominant or avoidable cause in many cancers, and the lack of adequately developed laboratory techniques for epidemiologic application. Natl Cancer Inst Monogr 67: 187-192, 1985.

Dr. Hammond's contributions to the field of cancer causation and prevention are widely recognized, especially the unique American Cancer Society cohort study designed to evaluate the causal role of cigarettes and other life-style factors in neoplastic disease, the results of which are still being appraised. I will attempt to provide a brief overview of the importance and historical background of carcinogenic risk factors in life-style-related cancers and the direction of future research, with special reference to the complexities of objectively defining life-style parameters.

Most educated individuals understand the concept of a carcinogen as a defined substance or factor that directly or indirectly induces cancer. Such factors can often be expressed and measured in physicochemical terms, e.g., a specific chemical, as can such defined factors as cigarette smoking or alcoholic beverages. In contrast, various factors are associated with an increased cancer risk, such as dietary fiber deficiency, age at first pregnancy, obesity, etc. These obviously cannot be defined as "carcinogens" in the sense described above but are usually termed "carcinogenic risk factors." The nature of the metabolic and biochemical mechanisms underlying such risk factors remains to be determined.
Thus their investigation in humans offers a major intellectual challenge to future researchers in cancer and other chronic diseases. As understanding of pivotal basic molecular events and associated markers increases, cohort studies may prove helpful in the epidemiologic investigation of such risk factors, whose effects may be subtle and not suitable for case history analysis.

¹ Presented at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
² Universities Associated for Research and Education in Pathology, Inc., 9650 Rockville Pike, Bethesda, Maryland 20814.

BACKGROUND

Although the first studies on human cancer causation were largely related to occupational hazards, risk factors were recognized early in the nineteenth century, when Stern [cited in (1)] described the high frequency of breast cancer and the low frequency of uterine (cervical) cancer in nuns. For the first part of the present century, research on cancer causes concentrated on discrete chemical, physical, and biologic carcinogens, especially following the recognition that experimental research in animals could identify potential human carcinogens. The demonstration that ovariectomy could modify the clinical course of breast carcinoma (2) and later studies on the carcinogenic effects of estrogens in rabbits (3) laid the base for much modern research. In 1947, Willis (4) emphasized Lane-Claypon's view (5) that certain reproductive habits were causally related to cancers of the breast and other endocrine-dependent organs. At the same time, Tannenbaum and Silverstone (6) initiated the modern study of diet by examining the modifying role of fat, calories, etc. in carcinogenesis in animals. These and other studies complemented Berenblum's early work (7, 8) on the concept of two-stage carcinogenesis and emphasized the study of agents acting on the later stages of carcinogenesis (9).
The first modern symposium on the causes of human cancer was held in Oxford in 1950; participants reviewed the geographic pathology of cancer and its significance for future research (10). Most accepted the view that customs and life-style factors, as well as discrete carcinogens, were involved in human carcinogenesis, and, for the first time, cigarette smoking received considerable attention. At a time when most workers considered the carcinogenic effect of diet only in regard to ingested carcinogens, Kennaway (11) emphasized the nonspecific influence of diet at all stages of carcinogenesis. No new research proposals indicated how multistage mechanisms and risk factors were to be studied in humans, which showed the large gap between the recognition of a hypothesis and its confirmation.

With the Oxford symposium as a basis, Oettlé and I (12) attempted to investigate the factors influencing cancer patterns in the South African Bantu. We assumed that by examining the cancer patterns in a community converting rapidly from a nonindustrial to an industrial society, and by making comparisons with industrial North America and Europe, we might identify carcinogenic chemicals and other factors related to industrialization. No new discoveries were made, but cancer was considerably rarer and patterns were clearly different from those in Europe and North America, which demonstrated the validity of studies of communities living in different environments and cultures (geographic pathology) in the search for etiologic clues. It was confirmed that Bantu reproductive habits, including early childbearing, were consistent with the incidence of cancers of the breast, ovary, and uterus, but, interestingly, endometrial cancer was exceptionally rare, and a cervix:body ratio of over 100:1 was observed. In addition, evidence was obtained that the high-roughage diet was negatively associated with cancers of the large intestine (which occurred infrequently).
Childhood malnutrition (kwashiorkor) was widespread in the Bantu community at that time. Because this was before the discovery of aflatoxin, we assumed that malnutrition and viral hepatitis were involved as co-carcinogens in cancer of the liver. Important animal studies that suggested malnutrition had an inhibitory effect on most cancers were ignored. In retrospect, and on the basis of earlier animal studies (6), one could argue that the late maturity of Bantu females due to infantile malnutrition was one factor in their low incidence of certain endocrine-dependent cancers, e.g., breast, because the individual was programmed to respond differently to stimuli in adult life. From these soft data, we concluded that certain classical carcinogenic stimuli as identified in Europe and North America were responsible for a limited number of cancers in the Bantu, but that a high proportion of all cancers, especially in females, depended on the Bantus' "way of life," i.e., tribal customs, diet, and reproductive habits. Furthermore, the term "socioeconomic level" had meaning only within a specific defined community and not between countries. At that time, one could neither analyze the specific risk factors involved nor determine how such life-style factors could best be identified in the absence of satisfactory hypotheses for testing. Other studies showing similar patterns were conducted by Davies (13) in East Africa and by others (14) in West Africa. In contrast, studies in the Indian subcontinent (15) emphasized the role of more definable cultural factors, such as betel chewing and cigarette smoking, especially in cancers of the mouth and upper digestive system. In many developing countries, most of the cancers occurring in females seemed dependent on their way of living: the incidence of breast and endometrial cancers was low and that of cervical cancer extremely high.
The identification of populations at low risk for many cancers in Africa and Asia coincided with the studies of Wynder and associates (16, 17) on low-risk populations within the United States, such as the Seventh-Day Adventists, which demonstrated the impact of life-style factors. Moreover, Haenszel and Kurihara (18), through their studies of migrants, showed the important influence of the early environment in childhood and adolescence on certain cancers. Additional support for the role of behavioral habits was provided by the well-designed international case-control study of MacMahon et al. (19) on the effect of age at first pregnancy on breast cancer risk. By the end of the 1960s, a considerable body of information had been collected on the role of carcinogenic risk factors related to life-style. Such data, though demonstrating the role of environment in carcinogenesis, emphasized that dietary, cultural, and behavioral patterns were part of this environment (concepts that go back to Hippocrates). These views, accepted by many epidemiologists, did not receive wide attention because environmental carcinogenesis was still largely perceived as caused by direct carcinogens (chemical or biologic), and the possible application of modern biochemistry to the analysis of life-style factors tended to lag. However, by the late 1970s, there was a swing away from concentration only on direct chemical initiators toward the study of promoters, enhancers, inhibitors, diet, and host factors in carcinogenesis (17, 20-27) and their effects on an individual's metabolism and biochemistry. These changes were further stimulated by the recognition that diet and other aspects of one's living habits were major determinants of human cardiovascular disease.

PRESENT SITUATION

The most important contributions of epidemiology to cancer control in the past have been in the identification of strong carcinogenic risks, e.g., cultural habits, occupational exposures, chemicals, etc.
Today we know with reasonable certainty the cause of approximately 50% of cancer in males and 10-20% in females in the United States and Europe (16, 25, 28). In the risk-habit-exposure relationship, cause implies identification of a factor in whose absence a measurable portion of a specific human cancer would not occur. This concept of avoidable or major cause does not consider differences in mechanism between promoters and genotoxic carcinogens, provided that the suspected factor can be identified and measured, nor does it consider the impact of other environmental and host factors. However, consideration of mechanisms may significantly modify an epidemiologist's interpretation of dose responses as well as his approach to preventive strategies.

Nonetheless, with few exceptions, such as liver cancer and its relation to aflatoxin and hepatitis B virus, our overall knowledge of new cancer causes with resultant practical public health possibilities for control has advanced little since 1960. Apart from such cultural habits as tobacco and alcohol (chemical mixtures), few carcinogenic risk factors related to life-style or host susceptibility (e.g., age at first pregnancy) have been defined with sufficient precision to provide a preventive strategy for most tumors of the gastrointestinal tract and endocrine-dependent organs, such as breast, uterus, ovary, and prostate. It is assumed that the above risk factors related to diet and behavior will eventually be defined in objective physiologic or biochemical terms.

EPIDEMIOLOGIC APPROACHES AND THE ROLE OF RISK FACTORS

The theoretical importance of integrating sophisticated laboratory techniques into human epidemiologic studies to permit meaningful understanding of fundamental etiologic relationships has been appreciated for some time (8, 10, 22, 29-31).
Although this approach was among the earliest programs proposed for the International Agency for Research on Cancer, the development of specific investigations has proved difficult. Terms such as "metabolic epidemiology" and "molecular epidemiology" have not added any new concepts to the study of life-style factors, but they illustrate a growing change in research attitudes. Furthermore, the acceptance of the multistage hypothesis of carcinogenesis (7, 8) and recent progress in technology, permitting the application of more sophisticated nonintervention methods to humans, have provided a sound intellectual base for the direction of future research (29-31).

However, such approaches have some statistical and biologic limitations. The limitations are well illustrated in dietary studies in relation to cancer, which often show inconsistencies as to the biologic significance of individual components, such as unsaturated fats, protein, cholesterol, green vegetables, etc. The overall impact of an individual food is not easily measured in retrospective case history studies, which is not surprising given the biologic complexity of individual components (26, 32-39). For example, green vegetables contain factors that both activate and inactivate carcinogens (26, 39). Dietary questionnaires can give only general and imprecise responses to many quantitative questions. Also, some nutrients, such as protein and fat, which may enhance carcinogenesis in animals, are essential for the maintenance of normal health. (Among nutritionists, no agreement has been reached as to what constitutes an "optimum" diet.)
Relevant experimental models may test such risk factors at extreme pharmacologic or toxicologic doses, whereas variations in humans may be within a comparatively small range, so that effects may be subtle and almost impossible to detect in a homogeneous population, inasmuch as precise measurements of the risk can seldom be made.

Similarly, many uncertainties surround the role of endocrine secretions in human cancer, both as initiators or promoters and in their interactions with diet and behavior, although few would deny their importance (40-45). Although differences in cancer frequencies among females may be explained partly by hormonal variations associated with behavioral habits, the possible role of other agents acting as co-carcinogens, e.g., herpes virus in carcinoma of the cervix, cannot be excluded. So far, seroepidemiologic and specific viral studies are equivocal. Investigators working on urinary and blood levels of hormones in persons living in different environments suggest that variations may often be too slight to evaluate, so that causal associations with cancer patterns may prove difficult to establish. Such practical limitations must be recognized in planning any cohort study, and individual parameters must be chosen and measured with discrimination so that overloading the study with too many variables is avoided. In such instances, geographic correlations may provide guidelines to possible key questions in a cohort study and thus define objectives.

Animal models have demonstrated the great number and complexity of the potential modulating mechanisms, relevant to human life-style, that can be identified. For example, in relation to diet, such factors include not only classical promoters but a wide range of co-carcinogens, enhancers, enzyme inducers, inhibitors, etc., as described in the report of the National Research Council (32, 33).
Furthermore, even for the many factors classified as promoters, such as phorbol esters, estrogens, phenobarbital, unsaturated fat, etc., common pivotal mechanisms remain to be confirmed, although certain possibilities have been proposed by Trosko and Chang (46). However, when suspected individual promoters can be identified and measured, such as conjugated estrogens in endometrial cancer or asbestos in lung cancer, they may be analyzed by standard epidemiologic techniques and regarded as "operational carcinogens" from a public health viewpoint.

In some models, promoters have been shown to induce cancer in the absence of a defined initiator, and the question has been raised as to whether tumor induction in such situations depends on the presence of already spontaneously initiated cells. Theoretically, such initiated cells could result from spontaneous error-prone repair, background mutation, oncogene activation, etc., as suggested by the existence of mouse strains with high or low spontaneous hepatoma rates. Alternatively, they may reflect a random but rare response to any of the many exogenous and endogenous mutagenic or genotoxic factors (chemical or viral) to which an individual cell is continuously exposed. Thus the endogenous formation of N-nitroso compounds in humans is an established fact, as is the existence of a wide range of animal and human carcinogens in the normal environment. Infection by certain suspected viruses, e.g., hepatitis B, Epstein-Barr virus, or herpes virus type II, is widespread, yet such infection is only seldom associated with cancer induction. The concept of such background initiated cells was emphasized by Berenblum (7, 8) as important in developing cancer control strategies, because it directed attention to late-stage carcinogenesis and life-style factors; controlling the myriad of potential initiators was not feasible.
INDIVIDUAL SUSCEPTIBILITY
The increased susceptibility to cancer of certain individuals with inherited diseases is widely accepted. However, many researchers also tend to assume that individual susceptibility in the general population will be readily measurable through a limited number of biochemical or metabolic parameters. In this view, such parameters would distinguish high- from low-risk individuals using techniques easily applicable to large populations. In many instances, however, individual susceptibility may be random and may reflect a myriad of biochemical and metabolic interactions, partly inherited and partly environmental, with no individual factor definable or obviously predominant. Under such conditions, susceptibility will not be easily analyzed in a typical case-control or cohort study in a homogeneous community. Moreover, if certain endocrine-related cancers depend on hormones operating within the normal physiologic range, the role of those hormones will be difficult to identify in small-scale studies. Thus it may be necessary to search for clues, or to test hypotheses, by examining differences in metabolic and enzymatic profiles among populations with different patterns of cancer and other chronic diseases, such as migrants, Mormons, or Africans (22, 24, 43-45). However, Conney et al. (39) have shown that significant changes can be induced by short-term dietary variations, so establishing meaningful long-term base lines may prove difficult. Further complicating factors are the long latent period of cancer and the effect of age. Rates of change in migrants for various cancers (18), together with experimental studies, suggest that neoplasia may be significantly affected in the fetal stage or in preadolescence by dietary and endocrine stimuli. These stimuli presumably program individual susceptibility permanently at an early age.
This may explain why the marked changes in the diet consumed in the United States over the last three decades (approximately 50% in the type of fat eaten) have had little obvious impact on many cancers in adults (34). On the other hand, a geographic study of adolescents in the United States and Japan who differed in their dietary and breast cancer profiles found no differences in endocrine profiles (44). The fundamental question of the age at which a prospective cohort study should begin must be addressed, because studies beginning in childhood will increase logistical problems enormously. Differences in susceptibility to cancer and other diseases between socioeconomic groups can also be linked to life-style (47), a fact recognized at the end of the last century (48). Some modern industries have workforces that are unusually healthy compared with the general population. For cancer, this difference cannot readily be ascribed to the healthy worker effect or to selection. Oncologists exploring preventive strategies can no longer ignore the implications of such socioeconomic factors for the total burden of ill health. Although widely neglected, socioeconomic studies may offer a useful approach to identifying dietary and behavioral life-style factors within an otherwise homogeneous community.
FURTHER STUDIES
Despite the attractiveness of large-scale prospective studies, epidemiologists' attempts to design specific studies on the nature of life-style remain difficult when concrete hypotheses are lacking. Sophisticated laboratory techniques do not remove the need to define, and to ask, specific questions of biologic relevance. Initially, transverse correlation or ecologic studies on populations differing in cancer profiles and environments may prove more useful, and less expensive, for testing preliminary hypotheses.
The following approaches to cohort studies are worth consideration:
1) The recognition in an epidemiologic study that an association exists between a life-style risk factor and a specific cancer should lead to studies on drugs or chemicals that may involve similar mechanisms, e.g., oral contraceptives, retinoids, cholesterol-inhibiting drugs, enzyme inducers, etc. If a causal association exists, its impact should be magnified in such studies.
2) The limitations of questionnaires on dietary background are well recognized. They suggest that more intensive analytical studies identifying and measuring specific dietary components are needed for evaluation in a prospective cohort. However, the recent reports by the National Academy of Sciences (33) and by Ames (38) show how complex and difficult it is to identify the individual effects of multiple dietary risk factors.
3) Analysis of the metabolites of exogenous carcinogens, e.g., DNA adducts in tissues, body fluids, or excreta, in populations living in different environments is now considered a possible approach because the methods have become extremely sensitive (30, 49). Such approaches may be useful as markers of exposure at high exposure levels. At low levels, however, they may prove unsatisfactory because they imply a search for a rare event in a single or few somatic cells, and the problems of other confounding adducts may prove insoluble. Similar limitations apply to other approaches identifying altered DNA and oncogenes (30, 31, 49). However, if specific damage attributable to an agent can be demonstrated directly or indirectly, its potential application in appropriate epidemiologic studies may be great.
4) The analysis, in high- and low-risk populations, of persistent exogenous xenobiotics present in fat or organs, e.g., chlorinated hydrocarbons, may provide useful evidence of previous exposures and allow evaluation of their cancer-causing potential (49).
5) The application of in vitro tests, e.g., mutagen activation, to the study of biologic variations in body fluids in different external environments may prove useful in two ways. It may help identify carcinogen exposures, and it may help evaluate the impact of life-style factors on carcinogen activation at high exposure levels.
6) The measurement in tissues and excreta of endogenously formed carcinogens, such as N-nitroso compounds, formaldehyde, cholesterol epoxides, etc. (31), remains largely at the experimental level.
7) Studies of enzymatic profiles, including enzyme induction, sex-linked enzyme imprinting, etc., which are believed to influence individual susceptibility, require intensification (49, 50). Such studies may partly explain the effects of diet in children and adults.
8) The study of certain familial diseases, such as familial polyposis, may provide clues to risk factors affecting late-stage carcinogenesis.
9) Cohort studies may be most important in indicating that certain life-style factors are unimportant under a range of conditions in a population. Such data may be of considerable value in defining practical public health strategies. Furthermore, a correlation or cohort study may show no effect from an extensive mixture of suspected life-style or carcinogenic factors. In that case, no individual component is likely to be of great significance except in exceptional circumstances, and analysis of individual components may be impossible.
When investigators apply sophisticated laboratory techniques to epidemiologic studies, especially for identifying exposure markers, it is essential that the techniques rest on an adequate experimental foundation. If the particular parameters selected are not of pivotal etiologic importance, few inferences may be possible. Moreover, concentrating on a limited set of parameters without considering interactions with other unknown or unmeasured parameters may lead to false conclusions.
Thus the inverse relationship of fat and fiber in many diets may complicate evaluation of the role of fat. The National Institute of Environmental Health Sciences has recently published a review of possible laboratory approaches for identifying and measuring low-dose factors that can modify early- or late-stage carcinogenesis in human populations, including factors related to life-style (49). Many of these approaches were suggested over a decade ago, but concrete results overall have been disappointing. This is partly because of biologic or technologic limitations and partly because of perceived difficulties; such studies have not been adequately exploited.
While emphasizing geographic variations in biochemical epidemiology, we should not forget that defined factors explain many of the differences in cancer incidence between the low-risk populations of Africa and Asia and the high-risk populations of New Zealand, Switzerland, and North America. These factors include smoking, alcohol ingestion, and exposure to sunlight. Thus the studies by Jensen (51) on Seventh-Day Adventists and members of temperance societies emphasize the tremendous impact of tobacco and alcohol on the cancer burden in males, and the lesser role of other life-style factors in such low-risk populations. This contrasts with the situation in females, whose cancer spectrum differs and in whom life-style cancers tend to predominate.
Nonetheless, newer approaches may prove more encouraging, such as that of Bartsch and co-workers (52), who have developed a suitable method for evaluating and measuring endogenous N-nitroso formation in different dietary contexts. The method has been tested in areas of China with high and low incidences of esophageal cancer. The conclusion was that exposure to endogenous N-nitroso compounds is much higher in the high-incidence region.
Furthermore, ascorbic acid has been shown to inhibit N-nitroso formation. Such studies have also led to the identification of a new, potentially significant N-nitroso compound. If these compounds are in fact significant factors in human carcinogenesis, as many believe, correlation studies of this type may be useful. In the high-incidence area, however, almost all individuals show preneoplastic esophageal mucosal lesions, which suggests universal exposure to the unknown factor. In such situations, neither case-history nor cohort studies with cancer as the only end point are likely to be effective unless a clear dose-response relationship can be demonstrated.
The extent to which developments in oncogene research (53) may influence epidemiologic investigations on life-style is unclear. However, there is evidence that certain markers may be identified with carcinogen activation and that a cascade of events is involved in transformation. Given this evidence, the possibility that life-style may enhance or trigger gene expression cannot be excluded, and this may provide new approaches to evaluating the role of late-stage risk factors. Nonetheless, the evidence suggests that current epidemiologic research is rightly concentrated on identifying exogenous carcinogenic factors and risk factors that may affect late-stage carcinogenesis; this is not a misplaced effort. Molecular epidemiology is now a defined science and is pertinent to the study of risk factors in humans.
CONCLUSIONS
Although the evidence strongly supports the role of life-style factors in human cancer, there is no certainty that the pertinent modulating mechanisms will be easily understood or identified with sufficient precision for cancer control, even by an integrated multidisciplinary approach.
The wider the range of interactions involved, the more difficult it becomes to separate or define the individual components. Furthermore, the epidemiologist must distinguish between stimuli causing pivotal or fundamental effects and a wide range of ill-defined enhancing or inhibiting factors that act only indirectly in the presence of a further factor. At best, experimental models can provide only guidelines to possible mechanisms, which may not be easily extrapolated to humans at present. They may nevertheless be useful in indicating active approaches to cancer control, such as chemoprevention. However, recognition of the role of modulating and late-stage mechanisms provides a biologic basis for accepting the importance of carcinogenic risk factors as distinct from defined carcinogens, and this concept is open to testing by epidemiologic methods. Such research programs must be long-range and divorced from the immediate needs of generic legislation or cancer prevention.
REFERENCES
(1) CLEMMESEN J (ed): The female genital system. In Statistical Studies in Malignant Neoplasms: Review and Results, vol I. Copenhagen: Munksgaard, 1965, pp 249-342
(2) BEATSON GT: On the treatment of inoperable cases of carcinoma of the mamma: Suggestions for a new method of treatment, with illustrated cases. Lancet 104:162-167, 1896
(3) LACASSAGNE A: Appearance of mammary adenocarcinomas in male mice treated with a synthetic estrogenic substance. C R Soc Biol (Paris) 129:641-643, 1938 (in French)
(4) WILLIS RA: Pathology of Tumours. London: Butterworths, 1948
(5) LANE-CLAYPON JE: A Further Report on Cancer of the Breast with Special Reference to Its Associated Antecedent Conditions. Rep No. 32 of the Ministry of Health. London: HM Stat Off, 1926
(6) TANNENBAUM A, SILVERSTONE H: Nutrition and genesis of tumours. In Cancer (Raven RW, ed), vol 1. London: Butterworths, 1957, pp 306-334
(7) BERENBLUM I: Historical perspectives.
In Carcinogenesis: A Comprehensive Survey. Mechanisms of Tumor Promotion and Cocarcinogenesis (Slaga TJ, Sivak A, Boutwell RK, eds), vol 2. New York: Raven Press, 1978
(8) BERENBLUM I: Theoretical and practical aspects of the two-stage mechanism of carcinogenesis. In Carcinogens: Identification and Mechanisms of Action (Griffin AC, Shaw CR, eds). New York: Raven Press, 1979, pp 25-36
(9) DAY NE, BROWN CC: Multistage models and primary prevention of cancer. JNCI 64:977-989, 1980
(10) CLEMMESEN J (ed): Symposium on Geographical Pathology and Demography of Cancer. Paris: Council Int Org Med Sci, 1950
(11) KENNAWAY EL: Forms of cancer in man suitable for investigation. In Symposium on Geographical Pathology and Demography of Cancer (Clemmesen J, ed). Paris: Council Int Org Med Sci, 1950, pp 122-124
(12) HIGGINSON J, OETTLE AG: Cancer incidence in the Bantu and "Cape colored" races of South Africa: Report of a cancer survey in the Transvaal (1953-55). J Natl Cancer Inst 24:589-671, 1960
(13) DAVIES JN, WILSON BA, KNOWELDEN J: Cancer incidence of the African population of Kyadondo, Uganda. Lancet 283:328-330, 1962
(14) International Union Against Cancer: Cancer in Africa. Report of a Symposium in Leopoldville, September, 1956. Acta Un Int Contra Cancr 13:881-975, 1957
(15) KHANOLKAR VR: Cancer in India in relation to race, nutrition and customs. In Symposium on Geographical Pathology and Demography of Cancer (Clemmesen J, ed). Paris: Council Int Org Med Sci, 1950, pp 51-60
(16) WYNDER EL, GORI GB: Contribution of the environment to cancer incidence: An epidemiologic exercise. J Natl Cancer Inst 58:825-832, 1977
(17) WYNDER EL, HOFFMANN D, McCOY D, et al: Tumor promotion and cocarcinogenesis as related to man and his environment. In Carcinogenesis: A Comprehensive Survey. Mechanisms of Tumor Promotion and Cocarcinogenesis (Slaga TJ, Sivak A, Boutwell RK, eds), vol 2.
New York: Raven Press, 1978, pp 59-77
(18) HAENSZEL W, KURIHARA M: Studies of Japanese migrants. I. Mortality from cancer and other diseases among Japanese in the United States. J Natl Cancer Inst 40:43-68, 1968
(19) MACMAHON B, COLE P, LIN TM, et al: Age at first birth and breast cancer risk. Bull Osaka Med Sch 43:209, 1979
(20) MILLER EC, MILLER JA, BROWN RR, et al: On the protective action of certain polycyclic aromatic hydrocarbons against carcinogenesis by aminoazo dyes and 2-acetylaminofluorene. Cancer Res 18:464, 1958
(21) MILLER JA, MILLER EC: Perspectives on the metabolism of chemical carcinogens. In Environmental Carcinogenesis: Occurrence, Risk Evaluation, and Mechanisms (Emmelot P, Kriek E, eds). Amsterdam: Elsevier/North-Holland Biomedical Press, 1979, pp 241-263
(22) HIGGINSON J: Present trends in cancer epidemiology. In Proceedings of the Eighth Canadian Cancer Conference (Morgan DF, ed). Toronto: Pergamon Press, 1969, pp 40-75
(23) WEISBURGER JH, COHEN LA, WYNDER EL: On the etiology and metabolic epidemiology of the main human cancers. In Origins of Human Cancer (Hiatt HH, Watson JD, Winsten JA, eds). Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory, 1977, pp 567-602
(24) WYNDER EL, HOFFMANN D, CHAN P, et al: Interdisciplinary and experimental approaches: Metabolic epidemiology. In Persons at High Risk of Cancer. An Approach to Cancer Etiology and Control (Fraumeni JF Jr, ed). New York: Academic Press, 1975, pp 485-502
(25) HIGGINSON J, MUIR CS: Environmental carcinogenesis: Misconceptions and limitations to cancer control. J Natl Cancer Inst 63:1291-1298, 1979
(26) WATTENBERG LW: Inhibitors of chemical carcinogens. In Environmental Carcinogenesis: Occurrence, Risk Evaluation, and Mechanisms (Emmelot P, Kriek E, eds). Amsterdam: Elsevier/North-Holland Biomedical Press, 1979, pp 241-263
(27) CAIRNS J: The origin of human cancers. Nature 289:353-357, 1981
(28) DOLL R, PETO R: The Causes of Cancer.
Oxford: Oxford Univ Press, 1981
(29) DOLL R, VODOPIJA I: Host Environment Interactions in the Etiology of Cancer in Man. IARC Sci Publ No. 7. Lyon: IARC, 1973
(30) PERERA FP, WEINSTEIN IB: Molecular epidemiology and carcinogen-DNA adduct detection: New approaches to studies of human cancer causation. J Chronic Dis 35:581-600, 1982
(31) BARTSCH H, ARMSTRONG B: Host Factors in Human Carcinogenesis. IARC Sci Publ No. 39. Lyon: IARC, 1982
(32) National Research Council, Assembly of Life Sciences: Diet, Nutrition, and Cancer. Washington, D.C.: Natl Acad Press, 1982
(33) National Research Council, Commission on Life Sciences: Diet, Nutrition, and Cancer: Directions for Research. Washington, D.C.: Natl Acad Press, 1983
(34) HIGGINSON J: Summary: Nutrition and cancer. Cancer Res 43(Suppl):2515s-2518s, 1983
(35) FEINLEIB M: Summary of a workshop on cholesterol and noncardiovascular disease mortality. Prev Med 11:360-367, 1982
(36) ZARIDZE DG: Environmental etiology of large-bowel cancer. JNCI 70:389-400, 1983
(37) KINLEN LJ: Fat and cancer. Br Med J 286:1081-1082, 1983
(38) AMES BN: Dietary carcinogens and anticarcinogens. Science 221:1256-1264, 1983
(39) CONNEY AH, PANTUCK EJ, PANTUCK CB, et al: Role of environment and diet in the regulation of human drug metabolism. In The Induction of Drug Metabolism (Estabrook RW, Lindenlaub E, eds). Stuttgart, New York: F. K. Schattauer Verlag, 1978, p 583
(40) McMICHAEL AJ, POTTER JD: Reproduction, endogenous and exogenous sex hormones, and colon cancer: A review and hypothesis. JNCI 65:1201-1207, 1980
(41) HENDERSON BE, ROSS RK, PIKE MC, et al: Endogenous hormones as a major factor in human cancer. Cancer Res 42:3232-3239, 1982
(42) KIRSCHNER MA, SCHNEIDER G, ERTEL NH, et al: Obesity, androgens, estrogens, and cancer risk. Cancer Res 42(Suppl):3281s-3285s, 1982
(43) MACMAHON B, COLE P, BROWN JB, et al: Urine estrogens, frequency of ovulation, and breast cancer risk: Case-control study in premenopausal women.
JNCI 70:247-250, 1983
(44) GRAY GE, PIKE MC, HIRAYAMA T, et al: Diet and hormone profiles in teenage girls in four countries at different risk for breast cancer. Prev Med 11:108-113, 1982
(45) MACMAHON B, COLE P, BROWN JB, et al: Urine estrogen profiles of Asian and North American women. Int J Cancer 14:161-167, 1974
(46) TROSKO JE, CHANG CC: An integrative hypothesis linking cancer, diabetes and atherosclerosis: The role of mutations and epigenetic changes. Med Hypotheses 6:455-468, 1980
(47) Office of Population Censuses and Surveys: Occupational mortality. The Registrar General's Decennial Supplement for England and Wales, 1970-72. Ser DS, No. 1. London: HM Stat Off, 1978
(48) LOGAN WP: Cancer mortality by occupation and social class, 1851-1971. IARC Sci Publ No. 36. Lyon: IARC, 1982
(49) U.S. Department of Health and Human Services: Health Effects of Toxic Wastes. Environmental Health Perspectives, vol 48. Washington, D.C.: U.S. DHHS, 1980, p 144
(50) CIBA Foundation: CIBA Foundation Symposium 76: Environmental Chemicals, Enzyme Function and Human Disease (Wolstenholme G, ed). Amsterdam: Excerpta Medica, 1980
(51) JENSEN OM: Cancer risk among Danish male Seventh-Day Adventists and other temperance society members. JNCI 70:1011-1014, 1983
(52) BARTSCH H, OHSHIMA H, MUÑOZ N, et al: Assessment of endogenous nitrosation in humans in relation to the risk of cancer of the digestive tract. In Proceedings of the Third International Toxicology Conference, San Diego, 1983. Amsterdam: Elsevier/Amsterdam Biomedical Press. In press
(53) HAMLYN P, SIKORA K: Oncogenes. Lancet 2:326-330, 1983

Biologic Banking in Cohort Studies, With Special Reference to Blood¹,²
Nicholas L. Petrakis³,⁴
ABSTRACT: Those who conduct cohort studies in cancer epidemiology increasingly use biochemical analyses as an important component. When banked blood is used, potentially important considerations include the conditions and temperature of storage, the effects of thawing, and the stability of specific substances under prolonged subfreezing temperatures. I have reviewed a selected number of biochemical substances. Natl Cancer Inst Monogr 67:193-198, 1985.
ABBREVIATION: IARC = International Agency for Research on Cancer.
¹ Presented at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
² Supported in part by Public Health Service grant P01-CA13556 from the National Cancer Institute.
³ Department of Epidemiology and International Health, School of Medicine, University of California, San Francisco, California 94143.
⁴ I thank Dr. Sheldon Margen of the University of California at Berkeley; Drs. Girish Vyas, Selna Kaplan, and John Kane of the University of California, San Francisco; and Dr. Robert Gellert of the Kevex Corporation, Burlingame, California, for information and suggestions on the stability of certain biochemical substances at subfreezing temperatures. I also acknowledge the assistance of Dr. George Comstock of The Johns Hopkins University School of Hygiene and Public Health, Baltimore, Maryland; Dr. Walter Willett of the Harvard School of Public Health, Boston, Massachusetts; and Dr. Abraham Nomura of the University of Hawaii, Honolulu, Hawaii, for sharing information on their unpublished data on the presence of retinols in banked frozen blood.
Laboratory methods were introduced into epidemiologic research during the 19th century, when bacteriology and pathology became the basis for investigating infectious diseases (1). More recently, specific concepts and techniques derived from molecular biology, biochemistry, immunology, genetics, endocrinology, nutrition, etc., have given rise to various labels for epidemiology, such as biochemical, molecular, genetic, metabolic, and nutritional. The potential value of the laboratory approach lies in its use of specific tests for biochemical, immunologic, genetic, or other factors thought to be involved in the pathogenesis of a particular disease. These methods are supplemented by less specific but essential descriptive techniques. Together they offer important new opportunities to help define the high- and low-risk members of the population and to develop preventive measures.
More recently, the cancer field has been stimulated by reports of findings based on banked blood obtained in cohort studies of cardiovascular disease, e.g., in Framingham, Massachusetts, and Evans County, Georgia (2). These reports indicate that persons who had low plasma cholesterol levels at entry into the study are at higher risk of cancer than those who had elevated entry levels. In other studies, low serum vitamin A levels were associated with increased cancer risk (3-5). These reports have provoked some discussion regarding their interpretation. One view of the cholesterol findings is that they likely reflect existing latent cancers. Many investigators have interpreted the vitamin A studies, which are supported by experimental animal data, as being of possible etiologic significance. These studies have led to major intervention trials with beta-carotene and retinol in an attempt to prevent cancer (6). Peto (7) recently reviewed some of the problems in the biochemical epidemiology of the retinoids and carotenoids.
Many new laboratory tests are of potential interest to epidemiologists. A brief review is therefore in order of two topics: the methods used to collect and store blood and other biologic materials in cancer cohort studies, and how results obtained from such banked materials are interpreted. Considerable information exists in the literature on the stability of specific biochemical substances in blood stored at subfreezing temperatures, but this important information is neither known nor readily available to many epidemiologists.
The potential importance of banked biologic materials for research in cancer epidemiology is emphasized by a recent memorandum from the IARC on "Biologic Materials Banks" (8). The memorandum contains information on existing banks of biologic materials obtained from persons who are said to be traceable should they later develop cancer. As of January 31, 1983, the IARC had received reports of 383 collections of varying sizes and materials. Various materials have been stored, with blood serum and cells comprising 54%. The sizes of the populations vary considerably, as do their geographic locations, traceability, and the extent of information available on them. In the IARC table, significant information is missing on the conditions of collection and on the temperatures of long-term storage. These factors are of considerable importance in interpreting the results of biochemical and other studies that have used or will use these collections. Even if this information can be made available, many of these banks are likely maintained under conditions unfavorable for proper preservation and subsequent useful analysis of many constituents of interest. This applies especially to markers not yet recognized that may be of future importance. However, many substances, e.g., trace elements, will still be measurable, whereas others will have been lost.
BIOLOGIC BANKING
General Considerations
Recent research in cryobiology has provided valuable practical information on methods of banking blood and other tissues (9-11) for the subsequent analysis of various potentially labile substances.
The factors to be considered in most subsequent analyses include the use of anticoagulants, temperature of storage, method of sealing test tubes, type of test tube (glass, plastic, or other material), thawing technique, effects of repeated thawing and freezing, changes in ionic strength or pH, etc. In addition to these technical considerations, variations in the biochemical composition of banked materials may be due to other factors. These include exercise, body position during venesection, season, diurnal variations, phase of the menstrual cycle in women, concurrent infections, pharmaceutical agents, smoking, and antecedent medical conditions. Depending on the sensitivity and specificity of the analytical technique, these factors may influence the interpretation of epidemiologic studies.
A detailed review here of all the factors that might affect biochemical, immunologic, or other types of analyses in cohort studies is not possible; the relevant factors will depend on the specific hypotheses being tested in a particular investigation. When considering the stability of deep-frozen blood and serum specimens, one must ask whether a specific test for a biochemical substance is to be used for qualitative purposes (as with genetic polymorphic markers) or for quantitative measurement (as for levels of retinol). Although many banks of biologic materials exist, one cannot be certain without careful investigation that proper conditions of preparation and storage have been maintained. Selected biochemical substances in blood and their stability at −20° and −70° C are listed in table 1.
Erythrocytes, Blood Groups, Red Cell Enzymes
A voluminous literature exists on techniques of erythrocyte preservation. For genetic epidemiologic studies of blood group antigens and isoenzymes, several techniques have been applied successfully. For blood group antigens, only freshly drawn specimens should be typed.
When this is not feasible, small samples of erythrocytes can be frozen rapidly in liquid nitrogen (−196° C). Huntsman and associates (12-14) have reported excellent preservation of fresh blood diluted with one-half its volume of 1.2 M sucrose immediately before being introduced dropwise from a syringe into liquid nitrogen. They reported a reduction in intact red cells of only 6% after 2 years of storage. They also found no deterioration in the blood group antigens ABO, Rh, MN, S, P1, Lutheranᵃ, Kell, Lewisᵃ, Lewisᵇ, and Duffy. These results are not surprising: blood group substances can still be detected qualitatively in Egyptian mummies after embalming and burial for 3,000 years, although quantitation is poor. Glucose-6-phosphate dehydrogenase and 6-phosphogluconate dehydrogenase were completely preserved for 2 years at −196° C, whereas glutamic-oxaloacetic transaminase and aldolase decreased to 80 and 50% of their original levels, respectively. The Huntsman studies indicate that complete stability of the biochemical systems of red blood cells has not been obtained even at −196° C.
TABLE 1. Long-term stability of selected constituents of blood at subfreezing temperatures(a)
Blood constituent    −20° C    −70° C
Erythrocytes
  Blood groups
  Isoagglutinins
  Red blood cell enzymes
    Glucose-6-phosphate dehydrogenase
    6-Phosphogluconate dehydrogenase
    Acid phosphatase
    Adenylate kinase
    Esterase
    Malic dehydrogenase
    Catalase
    Phosphoglucomutase
    Glutamic-oxaloacetic transaminase
    Aldolase
Serum
  Antibodies
    IgA
    IgG
    IgM
  Beta-globulin
  Beta-lipoprotein
  Apolipoprotein
  Polyunsaturated fatty acids
  Vitamins
    Retinol
    Beta-carotene
    Retinol-binding protein
    Tocopherol
    Ascorbic acid
  Hormones
    Gonadotropins
    Prolactin
    Thyroxine
    Insulin
    Adrenocorticotropin
    Testosterone
    Estrogen
    Progesterone
    Sex-hormone binding globulin
(a) + = stable; − = unstable. The individual + and − entries for each row are not legible in the source scan.
Other studies indicate that clotted blood stored at −70° C for over 1 year can be successfully typed for glucose-6-phosphate dehydrogenase, 6-phosphogluconate dehydrogenase, acid phosphatase, adenylate kinase, esterase, malic dehydrogenase, catalase, lactic dehydrogenase, and phosphoglucomutase (15).
Serum Proteins, Lipoproteins, Fat-Soluble Vitamins
Epidemiologists who preserve sera for infectious disease studies generally take it for granted that most serum antibodies are stable at temperatures of −4° to −20° C. Although this is probably true for many antibodies, it is not necessarily true for all proteins in blood. Blood serum and plasma are complex mixtures of neutral molecules and electrolytes, small-molecular-weight species, and a wide range of macromolecules (16). These molecules and their solvent water constitute the plasma of the blood. Many proteins are complexed with lipids and held together by weak association forces that may be disrupted by freezing. Many sera contain proteolytic proenzymes that may be activated by bacterial products if the sera are left at room temperature for a time before freezing. Prompt freezing of plasma protects against proteolysis. However, cooling to a temperature just above the eutectic point (−22° C) may denature proteins through high salt concentrations and changes in pH. Serum does not freeze completely until about −50° to −60° C. Some authors recommend that sera be cooled in liquid nitrogen (−196° C), then stored at −70° C, and thawed quickly at the time of analysis (17). Repeated thawing and freezing may denature many proteins and must be avoided. Containers made of acetyl acetate should be avoided because they may absorb serum water and lower serum pH, which may denature protein.
In regard to sealing and freezing, sealing test tubes with Parafilm is not recommended, because desiccation of the contents may occur and alter the concentration of constituents. One should avoid glass test tubes and instead use polyethylene test tubes closed with Teflon gaskets and screw tops. The samples should be divided into multiple aliquots of 0.5 to 1 cc in separate test tubes, so that the entire sample need not be thawed when it is needed for a specific analysis. Specimens of blood plasma and serum should be frozen rapidly to −70° C, thawed rapidly for use, and thawed only once (Kane JP: Personal communication). Storage at −20° C appears to be adequate for prolonged preservation of most types of antibodies, with some notable exceptions. Paul and White (1), when retesting sera stored 14 years at −20° C, found the same antibody titers as had been obtained before freezing. However, decreases in and loss of reactivity of certain globulins, or qualitative changes in their structure that can affect immunoreactivity, have been reported at temperatures of −4° and −20° C [(17, 18); Vyas GN: Personal communication]. Augustin and Hayward (18) found abnormalities in electrophoretic analysis of serum gamma-globulin in samples stored at −4° C for 5 to 8 years. Fessel (19) found that serum alpha-2 globulin lost immunologic reactivity after being frozen at −20° C for 3 years. His studies of 1,100 serum samples indicated a loss of 1 of 2 immunodiffusion precipitin lines with time. When antibodies are characterized by class (IgA, IgG, and IgM), differences in storage stability are apparent (Vyas GN: Personal communication); IgA and IgG antibodies are stable at −20° C for indefinite periods. However, a single thawing and refreezing can significantly reduce IgG levels. Vyas found that IgM antibodies frozen at −20° C are stable for 1 year or longer, but a single thawing and refreezing can reduce IgM activity to 25% below prefreezing levels.
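The aliquoting and single-thaw practice recommended above can be expressed as a small specimen-tracking routine. This is a hypothetical sketch, not any laboratory's actual system: the names `Aliquot`, `split_into_aliquots`, and `thaw_for_analysis` are invented for illustration, and the 0.5-cc default simply takes the lower end of the 0.5- to 1-cc range given in the text.

```python
from dataclasses import dataclass


@dataclass
class Aliquot:
    """One small portion of a banked serum or plasma specimen (hypothetical record)."""
    specimen_id: str
    index: int
    volume_cc: float
    thaw_count: int = 0


def split_into_aliquots(specimen_id: str, total_cc: float, aliquot_cc: float = 0.5) -> list[Aliquot]:
    """Divide a specimen into equal aliquots of 0.5 to 1 cc; a remainder smaller than one aliquot is not stored."""
    if not 0.5 <= aliquot_cc <= 1.0:
        raise ValueError("aliquot volume should be between 0.5 and 1 cc")
    n_aliquots = int(total_cc // aliquot_cc)
    return [Aliquot(specimen_id, i, aliquot_cc) for i in range(n_aliquots)]


def thaw_for_analysis(aliquot: Aliquot) -> Aliquot:
    """Record the single permitted thaw; refuse to thaw the same aliquot a second time."""
    if aliquot.thaw_count >= 1:
        raise RuntimeError(
            f"aliquot {aliquot.specimen_id}-{aliquot.index} has already been thawed once"
        )
    aliquot.thaw_count += 1
    return aliquot


# Usage: a 4.2-cc specimen yields eight 0.5-cc aliquots; each analysis thaws a fresh one.
aliquots = split_into_aliquots("S-0001", 4.2)
first = thaw_for_analysis(aliquots[0])
print(len(aliquots), first.thaw_count)
```

Refusing a second thaw mirrors the rule that a thawed aliquot is used once and not refrozen for later work, while the remaining aliquots of the same specimen stay frozen until a new analysis calls for them.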
Serum complement is unstable at −20° C and, once thawed and refrozen, is completely destroyed. More information exists on the stability at low temperatures of plasma beta-lipoproteins, cholesterol, and triglycerides, substances extensively used in atherosclerosis studies. Serum cholesterol and triglycerides stored at −20° C are stable for 5 years or longer (20). Plasma beta-lipoproteins are disrupted by prolonged freezing at −20° C, which invalidates ultracentrifugal studies (21); therefore, they should be frozen at −70° C. Destruction of beta-lipoproteins probably occurs at −20° C because that temperature is close to the eutectic point of plasma. We have at best only partial information on the stability of beta-lipoproteins, cholesterol, and triglycerides, and even less on other lipoid substances. Apolipoprotein (A) is damaged at −70° C and rapidly loses its radioimmunoreactivity when thawed (Kane JP: Personal communication); consequently, it must be examined fresh. Polyunsaturated fatty acids undergo slow peroxidation in serum and tissues even when preserved at low temperatures (22). The loss of flavor in meat that has been frozen for a long time is due in part to lipid peroxidation. Unless samples are protected by antioxidants or by exclusion of oxygen and UV light, peroxidation may reduce the stability of other fat-soluble substances such as vitamins A and E and beta-carotene, inasmuch as they are carried in fatty substances in the blood or stored in fat depots. Ascorbic acid and other water-soluble vitamins, in contrast, are stable at −70° C. Steroid hormones in plasma appear to be stable for prolonged periods at −70° C, as are gonadotropins, prolactin, thyroxine, insulin, etc., but thawing and refreezing may damage these hormones. For example, adrenocorticotropic hormone is highly unstable and cannot be frozen and thawed; it must be analyzed fresh (Kaplan S: Personal communication).
Trace Elements
Trace elements stored at subfreezing temperatures of −20° C or less will obviously not deteriorate with time. However, the method of analysis determines what can be learned about the chemical form of elements such as selenium and cobalt, i.e., their valence, whether they are free or bound, the degree of binding, etc. Biochemical methods can distinguish among these forms, whereas physical techniques such as neutron activation analysis or x-ray excitation spectroscopy cannot. In long-term storage of blood that is to be analyzed for trace elements, the container material is an important factor. Glass should be avoided because serum proteins, such as albumin, can chelate small amounts of minerals from glass even when frozen (Gellert R: Personal communication).

USE OF BANKED BLOOD IN COHORT STUDIES OF RETINOL AND CANCER
Considerable interest has developed since the demonstration that retinoids (retinol, retinyl esters, ethers, etc.) can affect keratinization and differentiation of epithelial tissues and can inhibit carcinogenesis in experimental animals (23). Mettlin et al. (24) reported that dietary intake of vitamin A and beta-carotene is lower in cancer patients than in controls and that serum levels of retinol and beta-carotene are reduced in many cancer patients. Reduced plasma retinol levels were found in persons who later developed cancer compared with controls, but the number of cancer patients in 2 large prospective studies was small. In England, Wald and associates (3) noted that 86 men who had developed cancer, compared with 172 controls, had significantly lower retinol levels in plasma samples drawn and frozen about 3 years before the diagnosis of cancer. Mean levels were 210 ± 52 IU/dl (63 ± 15.6 µg/dl) for all cancer patients compared with 231 ± 46 IU/dl (69.3 ± 13.8 µg/dl) for controls. For the 86 men with lung cancer, the values were 183 ± 62 IU/dl (54.9 ± 18.6 µg/dl).
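The paired units in the Wald results can be checked with the conventional factor of 0.3 µg of all-trans retinol per IU of vitamin A. That factor is an assumption supplied here (it applies to retinol, not to beta-carotene) rather than one stated by the authors, but it reproduces every pair of values quoted above.

```python
# Assumed conversion: 1 IU of vitamin A activity = 0.3 µg all-trans retinol.
UG_RETINOL_PER_IU = 0.3


def iu_to_ug(value_iu: float) -> float:
    """Convert a retinol concentration from IU/dl to µg/dl, rounded to 0.01."""
    return round(value_iu * UG_RETINOL_PER_IU, 2)


# Mean and standard deviation (IU/dl) as reported above for the Wald study.
wald_iu = {
    "all cancer patients": (210, 52),
    "controls": (231, 46),
    "lung cancer patients": (183, 62),
}

for group, (mean_iu, sd_iu) in wald_iu.items():
    print(f"{group}: {iu_to_ug(mean_iu)} ± {iu_to_ug(sd_iu)} µg/dl")

# Control-minus-case difference in mean retinol, in µg/dl.
print(round(iu_to_ug(231) - iu_to_ug(210), 2))
```

Expressed this way, the case-control difference in the Wald study is about 6 µg/dl, which is a useful scale against which to judge the differences discussed below.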
Kark and associates (4) used data and banked blood from a 15-year cohort study of atherosclerosis in Evans County, Georgia. Eighty-five persons in the cohort developed cancer during this period. Compared with 174 age-, race-, and sex-matched controls, the cancer patients had a statistically significantly lower level of retinol (41 µg/dl vs. 47 µg/dl, respectively). Kark et al. used regression and residual analysis in conjunction with epidemiologic techniques. Stahlein and co-workers (5), in their study of vitamin A and cardiovascular risk, found a slight overall decrease in serum retinol levels for most cancer sites, except for increased retinol levels in lung cancer. Three additional investigations in which banked frozen blood is being used, in progress or nearly completed, are those of Comstock and associates in Washington County, Maryland; Willett et al. in Boston; and Nomura in Hawaii. Data by Willett and associates were presented at the 1983 meeting of the Society for Epidemiologic Research (25).

The Wald and Kark studies reported reduced serum retinol levels in cancer patients compared with controls, whereas Stahlein and associates (5) found minimal or no differences (table 2). Willett et al. and Nomura found no significant differences in serum retinol levels between patients and controls. Apart from its short follow-up time of 5 years, the Wald study appears to have been carefully conducted. The Kark study, on the other hand, included many sera that had been frozen and thawed several times for earlier studies, and the investigators were uncertain which sera had undergone this treatment. To address this concern, they conducted an experiment in which samples of their blood were frozen and thawed as many as 12 times and exposed to UV light for up to 24 hours. The treatment had no effect on serum retinol levels measured before and after.
However, this test of stability is not predictive of what might occur during a 15-year period of storage at −18° C, interrupted by several episodes of thawing and refreezing of sera. Such handling would have had an even greater effect on beta-carotene, which is more readily oxidized than retinol. Kark and associates were unable to measure beta-carotene in their serum samples because of low levels, which further suggests that beta-carotene is labile under such storage conditions.

TABLE 2. Comparison of cohort studies on retinol and cancer risk

Item compared | Wald et al. (3) | Kark et al. (4) | Stahlein et al. (5) | Willett et al. (25)
Serum storage, yr | ~5 | 15 | ~10 | 9
Storage temperature, degrees C | −40 | −18 | Not given | −70
Conditions of storage and transport | Cracked vials for 37 cases, 27 controls; average retinol values 7% lower than in uncracked tubes | Many samples repeatedly thawed and frozen | Not given | Undisturbed until analysis
Technique of thawing | Room temperature | Not specified | Not given | Not specified
Method of retinol analysis | High-pressure liquid chromatography | Trifluoroacetic acid | [not legible in source] | High-pressure liquid chromatography
Epidemiologic analysis | Case-control; matched for age, smoking, date of drawing blood | 1) Matched case-control for age; 2) regression, residual analysis | No. of deaths from cancer vs. survivors | Case-control
No. of cases/controls | 86/172 | 85/162 | 108/357 | 111/210
Serum retinol levels | Cancer patients: 210 IU; controls: 231 IU | Cancer patients: 38.47 µg/dl; controls: 45.24 µg/dl | Cancer patients: lung, 280 ± 78 IU/dl; stomach, 248 ± 44 IU/dl; colon, 275 ± 56 IU/dl; other sites, 277 ± 75 IU/dl; controls: 275 ± 67 IU/dl | Cancer patients: 67.3 µg/dl; controls: 68.7 µg/dl
Retinol-binding protein | Not reported | Not tested | Not tested | Cancer patients: 6.01 mg/dl; controls: 5.94 mg/dl
Beta-carotene | Not reported | Levels too low to test | Not tested | Cancer patients: 114.5 µg/dl; controls: 111.6 µg/dl
Smoking history | Matched for smoking | Patients: 50% smokers; controls: 36% smokers | Lung cancer: 74%; controls: 44% | Not given
Medical history | Two groups defined: suspected vs. not suspected for cancer | No information given | Evaluations for cardiovascular disease | Subjects in hypertension study
Social class | I and II | Controlled; no effect | Not given | Not given

In view of these contradictory reports, it is pertinent to question the biologic significance of an 8- to 10-µg/dl mean difference in retinol levels between patients and controls. Is the difference due to 1) redistribution of retinol among the body compartments, 2) excess utilization or excretion of retinol in cancer pathogenesis, 3) decreased dietary intake of carotene by cancer patients, or 4) interaction with infection, a condition commonly associated with bronchial, oral, intestinal, and bladder cancers? The cohort studies reported to date have not specifically addressed these possibilities.

One factor not considered in the interpretation of retinol and beta-carotene studies is the potential influence of infection on these substances. It is well documented that long-term heavy smoking is commonly associated with low-grade mucopurulent chronic bronchitis, and that smokers, as well as lung cancer patients, have more bouts of upper respiratory infections and pneumonitis than nonsmokers (26, 27). That low-grade infection can significantly depress retinol levels has been demonstrated by Arroyave and Calcano (28) and by others (29, 30). Recent studies by Davis and associates on persons in the Lipid Research Clinics Prevention Trial indicate that cigarette smoking alone is associated with significant reductions in serum retinol and beta-carotene levels (31). A similar inverse relationship between serum beta-carotene levels and smoking (correlation coefficient = −0.28) was found by Nomura in Hawaii.
It is reasonable to suggest that the association of smoking with reduced serum retinol levels results from secondary, low-grade, chronic bronchial infection, from smoking itself, or from both, and that this possibility must be considered in cohort studies of lung cancer.

CONCLUSION
This cursory review of the effects of frozen storage on blood components indicates that considerable caution should be exercised before epidemiologists use banked materials for cohort studies of cancer and other chronic diseases. Before invoking the counterargument that control sera are exposed to the same conditions, so that any effect will “all even out,” it is prudent to consider, before a study is initiated, whether prolonged storage at subfreezing temperatures may have a deleterious influence on the specific biochemical component of interest. This requirement will make it difficult to use banked blood in the future for many biochemical analyses of interesting but unstable substances hypothesized to have a role in cancer etiology. This may be particularly vexing for investigators who collect specimens for future analyses as new hypotheses or laboratory tests develop. Probably the safest policy for all to follow is to divide the blood samples into multiple aliquots of 0.5 to 1 cc, to freeze them at −70° C, and to thaw them only once for any new biochemical analysis, unless it is established that repeated thawing and freezing will have no effect on the specific factor in question. Even this premise can be doubtful, however, especially when the newer, highly sensitive radioimmunologic reagents are used. Finally, we know that most cancers have incubation periods of many years before their clinical detection (32), and many individuals with occult cancers may be included in cohort studies as apparently healthy controls.
Many studies indicate that occult cancers can produce humoral substances, including peptides, hormones, etc., that can have widespread effects in the host years before clinical diagnosis (33). Even altered taste perception has been reported to occur in cancer patients several years before diagnosis (34, 35), possibly before some cohorts are assembled. Such factors could affect diet and nutritional status in subtle ways and thus affect interpretations of cohort studies. Until more is known about such factors, one must be careful not to attribute etiologic significance to small biochemical differences in serum components between cancer patients and controls as observed in cohort studies.

My purpose was to bring to the attention of epidemiologists some potentially important considerations in the use of banked blood, because biochemical epidemiologic studies are, and will increasingly become, an essential component of cohort studies in cancer and other chronic diseases.

ADDENDUM
Recent preliminary studies indicate that the percentage of free estradiol in serum increases after several years of storage at −32° C and suggest an effect of prolonged freezing on sex-hormone binding globulin (Murai J, Siiteri P: Personal communication).

REFERENCES
(1) PAUL JR, WHITE C (eds): Serological Epidemiology. New York: Academic Press, 1973
(2) LEVY RI: Cholesterol and disease: What are the facts? JAMA 248:2888-2889, 1982
(3) WALD N, IDLE M, BOREHAM J, et al: Low serum vitamin A and subsequent risk of cancer. Lancet 2:813-815, 1980
(4) KARK JD, SMITH AH, SWITZER BR, et al: Serum vitamin A (retinol) and cancer incidence in Evans County, Georgia. JNCI 66:7-16, 1981
(5) STAHLEIN HB, BUESS E, ROSEI F, et al: Vitamin A, cardiovascular risk factors, and mortality. Lancet 1:394-395, 1982
(6) PETO R, DOLL R, BUCKLEY JD, et al: Can dietary beta-carotene materially reduce human cancer rates? Nature 290:201-208, 1981
(7) PETO R: The marked differences between carotenoids and retinoids: Methodologic implications for biochemical epidemiology. Cancer Surv 2:327-340, 1983
(8) International Agency for Research on Cancer: Memorandum on Biologic Materials Banks. Lyon: IARC, 1983
(9) PEGG DE: Banking of cells, tissues, and organs at low temperatures. In Current Trends in Cryobiology (Smith AU, ed). New York: Plenum Press, 1970
(10) DOEBBLER GF, ROWE AW, RINFRET AP: Freezing of mammalian blood and its constituents. In Cryobiology (Meryman HT, ed). New York: Academic Press, 1966, pp 407-450
(11) World Health Organization: Multipurpose Serological Surveys and WHO Serum Reference Banks. WHO Tech Rep Ser No. 454. Geneva: WHO, 1970
(12) HUNTSMAN RG, HURN BA, IKIN EW, et al: Blood groups and enzymes of human red cells after a year’s storage in liquid nitrogen. Br Med J 2:1508-1514, 1962
(13) HUNTSMAN RG, HURN BA, IKIN EW, et al: Blood groups and enzymes of human cells after two years’ storage in liquid nitrogen. Br Med J 2:1315, 1963
(14) HUNTSMAN RG, HURN BA, IKIN EW, et al: Liquid nitrogen storage of haemoglobin variants. J Clin Pathol 17:99-100, 1964
(15) WILKINSON JH: Isoenzymes, 2d ed. Philadelphia: Lippincott, 1965
(16) LOVELOCK JE: The denaturation of lipid-protein complexes as a cause of damage by freezing. Proc R Soc Lond [Biol] 147:427-432, 1957
(17) SLAVIN B: Protein chemistry in a general hospital. In Structure and Function of Plasma Proteins (Allison AC, ed), vol 2. New York: Plenum Press, 1976
(18) AUGUSTIN R, HAYWARD BJ: Cleavage of human gamma-globulin. Nature 187:129-130, 1960
(19) FESSEL WJ: Loss of immunological reactivity of an alpha-2 globulin after prolonged freezing of serum. Nature 187:1307, 1963
(20) ANDERSON JT, KEYS A: Cholesterol in serum and lipoprotein fractions: Its measurement and stability. Clin Chem 2:145-159, 1956
(21) MILLS GL, WILKINSON PA: Some effects of storage on plasma beta-lipoproteins. Clin Chim Acta 7:685-693, 1962
(22) LOGANI MK, DAVIES RE: Lipid oxidation: Biologic effects and antioxidants, a review. Lipids 15:485-495, 1980
(23) SPORN MB, NEWTON DL: Chemoprevention of cancer with retinoids. Fed Proc 38:2528-2534, 1979
(24) METTLIN C, GRAHAM S, SWANSON M: Vitamin A and lung cancer. J Natl Cancer Inst 62:1435-1438, 1979
(25) WILLETT W, POLK BF, UNDERWOOD BA, et al: Prediagnostic serum vitamins A and E and total carotenoids and the risk of cancer. N Engl J Med 310:430-434, 1984
(26) FERRIS B JR: Chronic bronchitis and emphysema. Classification and epidemiology. Med Clin North Am 57:637-649, 1973
(27) LE ROUX BT: Bronchial Carcinoma. London: Churchill Livingstone, 1968
(28) ARROYAVE G, CALCANO M: Descenso de los niveles sericos de retinol y su proteina de enlace (RBP) durante las infecciones. Arch Latinoam Nutr 29:233-260, 1979
(29) VITALE JJ: The impact of infection on vitamin metabolism: An unexplored area. Am J Clin Nutr 30:1473-1477, 1977
(30) Anonymous: Depression of serum levels of retinol and retinol binding protein during infection. Nutr Rev 39:165-166, 1981
(31) DAVIS CE, BRITTIAN E, HUNNINGHAKE DB, et al: Relation between cigarette smoking and serum vitamin A and carotene in candidates for the Lipid Research Clinics Coronary Prevention Trial. N Engl J Med 310:430-434, 1984
(32) ARMENIAN HK, LILIENFELD A: Incubation period of disease. Epidemiol Rev 5:1-15, 1983
(33) ROSE DP: Ectopic hormone syndromes. In Concepts in Cancer Medicine (Kahn SB, Low RR, Sherman C, et al, eds). New York: Grune & Stratton, 1983
(34) DEWYS W: Abnormalities of taste as a remote effect of a neoplasm. Ann NY Acad Sci 230:427-434, 1974
(35) BREWIN TB: Can a tumor cause the same appetite perversion or taste change as a pregnancy? Lancet 2:907-908, 1980

Epidemiology and the Inference of Cancer Mechanisms¹
David G. Hoel²
ABSTRACT. Through the use of molecular methods and mathematical models, epidemiologists are contributing to an improved understanding of the mechanisms of cancer. Multistage models, with their mechanistic basis, have been useful in describing initiator-promoter type behavior of some carcinogens, as well as genetic predisposition to rare tumors and reproductive risk factors in breast cancer. The use of biochemical and molecular laboratory techniques on tissue and fluid samples should provide important information in the near future concerning the basic mechanisms of human cancer. These methods can potentially describe not only exposure to carcinogens but also various host factors and their relevance to the risk of cancer. (Natl Cancer Inst Monogr 67: 199-203, 1985.)

Epidemiologic studies in man are used extensively to assess the risk of cancer associated with exposures to chemicals and the effects of various life-style factors. Because epidemiology is primarily an observational rather than an experimental science, attention has been directed primarily at establishing cause-and-effect relationships. Many experts accept the view that epidemiologic studies can establish causality. This is the position taken by the IARC, which has established criteria for the degree of evidence needed to establish causality. In the evaluation of environmental carcinogens by the IARC working groups (1), the epidemiologic studies are classified according to the degree of evidence they provide, and this classification is the basis for a working group's statement that a causal association exists between exposure and human cancer. Armstrong (2) recently reviewed the evaluations made by the various IARC working groups.
He found that, among the 54 chemicals considered by the IARC in its first 20 monographs for which some human data were evaluated, 18 compounds or processes were identified as having sufficient evidence of a causal association with human cancer, and all 18 had some information derived from cohort studies, which are considered to provide the strongest evidence for causality. Information from case-control studies was also often available, though to a lesser extent. Generally, ecologic or nonanalytic epidemiologic studies are not believed to provide sufficient evidence in themselves for establishing causality.

ABBREVIATION: IARC = International Agency for Research on Cancer.
¹ Presented at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
² Biometry and Risk Assessment Program, National Institute of Environmental Health Sciences, P.O. Box 12233, Research Triangle Park, North Carolina 27709.

Beyond the qualitative establishment of causality by the epidemiologic method, investigators express considerable interest in quantitative risk assessment. From a public health standpoint, control of exposures to human carcinogens is essential, and this requires estimates of risk at given human exposure levels. Given actual exposure estimates or, more likely, upper bounds on exposures in the epidemiologic studies, one can make statements concerning the risk of cancer from prolonged exposures. Radiation probably provides the best example, through the studies and analyses conducted by the United Nations Scientific Committee on the Effects of Atomic Radiation (3) and the Committee on Biological Effects of Ionizing Radiation (4). The problem becomes much more difficult with chemical exposures, for which few studies have been completed and those studies are of limited scope for quantification purposes.
Nonetheless, it is essential that efforts be made toward development of quantitative risk estimates based on human data. If these estimates are not available, one must rely on experimental evidence, which often involves extrapolating the results of animal studies to man. Epidemiology has not generally been thought of as providing mechanistic evidence in carcinogenesis. However, recent developments suggest that this may no longer be true. Two aspects of epidemiologic studies either support or generate mechanistic theories. The first deals with quantitative models: the multistage model as developed by Armitage and Doll (5) and discussed by Peto (6) has provided considerable information on the mechanism of carcinogenesis. The second, referred to as biochemical or molecular epidemiology, measures the presence of chemicals or their metabolites in human tissue and body fluids and assesses the associated biologic and molecular changes.

QUANTITATIVE MODELS
Mathematical models have been constructed for the quantitative description of cancer incidence in epidemiologic studies for well over 30 years [see Whittemore and Keller (7) for an excellent review]. This effort has been stimulated partly by the observation that the relationship between cancer incidence rates and age is often linear on a log/log scale. The models range from empirical descriptions of epidemiologic data to incidence rates derived from mathematical descriptions of presumed biologic mechanisms of cancer. These models then provide hypotheses about the basic mechanisms of cancer and allow mechanistic hypotheses to be tested on epidemiologic populations.
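The log/log regularity noted above is what gives the multistage model its mechanistic reading: under the Armitage-Doll approximation, incidence at age $t$ rises as $t^{n-1}$ for $n$ stages, so the slope of log incidence on log age estimates $n-1$. The sketch below fits that slope by ordinary least squares. The age-specific rates are synthetic (generated from an exact 5-stage curve with an arbitrary constant), not data from any study cited here, and the function names are illustrative.

```python
import math


def loglog_slope(ages, rates):
    """Ordinary least-squares slope of ln(rate) regressed on ln(age)."""
    xs = [math.log(a) for a in ages]
    ys = [math.log(r) for r in rates]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def implied_stages(ages, rates):
    """Armitage-Doll reading of the fit: slope = n - 1, so n = slope + 1."""
    return loglog_slope(ages, rates) + 1


# Synthetic rates from an exact 5-stage model, I(t) = c * t**4, with c arbitrary.
ages = [40, 50, 60, 70, 80]
rates = [1e-9 * a ** 4 for a in ages]
print(round(implied_stages(ages, rates), 3))  # 5.0
```

With real incidence data the points scatter about the line; a fitted slope of about 3 to 5 would correspond to the 4 to 6 stages the text describes.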
Ideally, then, mechanistic work in the laboratory and its counterpart in the study of human populations will interact closely. Multistage models have received the most attention among researchers involved in quantification of epidemiologic data from cancer studies. In the most common formulation of the multistage model, a cancer arises as a clone of malignant cells descended from a single malignant cell. This cell is assumed to have undergone a finite number of heritable changes in its transformation from normality to malignancy. If the rate of change from one stage to the next is assumed to be constant throughout life and small, then, according to Armitage and Doll (5), the incidence rate at age $t$ is proportional to $t^{n-1}$ when the cell must undergo $n$ transitional stages before becoming malignant. It follows that the log of the incidence rate is a linear function of the log of age, $\log I(t) = c + (n-1)\log t$, with slope $n-1$. Most data suggest that 4 to 6 stages are involved and that the log/log plot reasonably describes many epithelial cancers in both man and rodent. Lung cancer and cigarette smoking offer perhaps one of the best examples of the multistage model as applied to epidemiologic data. Doll and Peto (8) observed that, for cigarette smokers, lung cancer incidence is approximately proportional to the amount smoked raised to the second power times the duration of smoking raised to the fourth or fifth power. A similar relationship has been shown by Peto et al. (9) for the incidence of mesothelioma and time since first exposure to asbestos; in this study, incidence increased as the third or fourth power of time since first exposure. In both examples, the multistage model adequately describes the relationship between incidence, dose, and duration. A difficulty in applying these types of models lies in the quality and amount of available epidemiologic data.
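A power law of this kind implies that duration of exposure matters far more than daily amount. The sketch below computes the ratio of incidences for two exposure histories under $I \propto \text{dose}^{a}\,\text{duration}^{b}$, with $a = 2$ and $b = 4$ or $5$ for lung cancer as in the Doll and Peto observation, and $b = 3$ for mesothelioma. The function name and example ratios are illustrative choices; the proportionality constant cancels, so no absolute rates are implied.

```python
def relative_incidence(dose_ratio: float, duration_ratio: float,
                       dose_power: float = 2.0, duration_power: float = 4.0) -> float:
    """Incidence ratio of two exposure histories under I proportional to dose**a * duration**b.

    The proportionality constant cancels, so only the ratios of dose and of duration matter.
    """
    return dose_ratio ** dose_power * duration_ratio ** duration_power


# Lung cancer, a = 2: doubling cigarettes per day at the same duration.
print(relative_incidence(2, 1))                     # 4.0
# Same daily amount smoked for twice as many years, b = 4 or b = 5.
print(relative_incidence(1, 2))                     # 16.0
print(relative_incidence(1, 2, duration_power=5))   # 32.0
# Mesothelioma: incidence rising as the 3rd power of time since first exposure.
print(relative_incidence(1, 2, duration_power=3))   # 8.0
```

Doubling the years of smoking multiplies the predicted incidence 16- to 32-fold, against 4-fold for doubling the daily amount.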
As shown by Doll and Peto (8), the latency parameter and the power of duration of smoking are highly correlated; this correlation in turn leads to a range of acceptable values for the power of time in the incidence formula. Furthermore, errors in exposure measurement may lead to an incorrect estimate of the power of dose, which in turn undermines conclusions about which stages are affected by the exposure. The relationship between incidence rate and dose rate may not be a simple integer power of dose. For example, Day (10) points out that, for cancers of the esophagus and bladder, the relationship with tobacco smoke and alcohol appears to depend on the square root of cigarette consumption. One of the most interesting applications of multistage models is found in the analyses of Day and Brown (11) and Whittemore (12). They considered the effect of cessation of exposure to the carcinogen on the time-incidence curve. Taking smoking as an example, with incidence related to the second power of dose, the effect on cancer incidence after cessation of smoking is consistent with the idea that, mechanistically, smoking affects both early and late stages in the multistage process for lung cancer. Problems in drawing conclusions from models relate to the quality of available data and to the delicate nature of the models. This latter issue is well illustrated by one of the largest animal studies, in which a chemical carcinogen was administered for the lifetime of the animals and exposure was discontinued for some of the experimental groups. In this study, 2-acetylaminofluorene was given to 24,000 male mice, resulting in both liver and bladder tumors. With the data on the liver tumors, the multistage model is best fitted by 4 stages and a quadratic function of dose.
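Why the latency parameter and the power of duration are hard to separate can be seen from a one-line calculation. Suppose, as an assumption for illustration, that incidence is written as $I(t) = c\,(t - w)^{k}$, where $t$ is duration of exposure and $w$ a latency (lag) in years. Its local slope on a log/log plot against $t$ is $k\,t/(t - w)$, which exceeds $k$, so a model with a lag and a smaller power can mimic a model with no lag and a larger power over the range of durations observed. The values of $k$ and $w$ below are hypothetical.

```python
def local_loglog_slope(t: float, k: float, w: float) -> float:
    """d ln I / d ln t for I(t) = c * (t - w)**k, which equals k * t / (t - w)."""
    if t <= w:
        raise ValueError("duration t must exceed the latency w")
    return k * t / (t - w)


# Hypothetical: power k = 4 with a 10-year lag, evaluated at several durations.
for t in (30, 45, 60):
    print(t, round(local_loglog_slope(t, k=4.0, w=10.0), 2))
# 30 6.0
# 45 5.14
# 60 4.8
```

The apparent power drifts from 6 toward 4 as duration grows, so data covering only a limited span of durations cannot pin down $k$ and $w$ separately.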
Using the data from the experimental groups in which exposure was discontinued, one finds that the incidence of liver tumors is consistent either with an early and a late stage being affected or with the 2 middle stages being affected (13). A closer examination of the data indicates that the dose effect would be better described by a “hockey-stick” type function than by a quadratic; in that case, only 1 stage is affected, and it is clearly an early stage. All of this illustrates how difficult it is to draw mechanistic conclusions concerning “initiation-promotion” from limited amounts of epidemiologic data, especially when the dose information is poor. An example of this problem is the study of lung cancer and copper smelting involving arsenic exposures: Brown and Chu (14) concluded that arsenic affected a late stage in the multistage model but could not rule out an early-stage effect. Multistage models have been important in describing the possible mechanistic behavior of the joint action of 2 or more agents. The classic examples are the epidemiologic studies of the combined action of cigarette smoking with exposure to alcohol, radon daughters, or asbestos. If the individual agents in these situations operate on different stages of the multistage process, one would expect a multiplicative effect on incidence (15). If, however, the 2 agents act on the same stage, the effect may be either additive or multiplicative. Pike et al. (16) have extended the multistage model in an attempt to describe quantitatively the impact of various reproductive factors on the risk of breast cancer.
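The additive and multiplicative benchmarks against which joint effects are usually judged can be written in a few lines. This is a generic sketch, not a method taken from the cited studies; the single-agent relative risks are hypothetical, and each is assumed to be measured against persons unexposed to either agent.

```python
def joint_rr_additive(rr_a: float, rr_b: float) -> float:
    """Additive benchmark: excess relative risks add, RR_ab = RR_a + RR_b - 1."""
    return rr_a + rr_b - 1.0


def joint_rr_multiplicative(rr_a: float, rr_b: float) -> float:
    """Multiplicative benchmark: relative risks multiply, RR_ab = RR_a * RR_b."""
    return rr_a * rr_b


# Hypothetical single-agent relative risks, each relative to persons exposed to neither agent.
rr_smoking_only, rr_other_only = 10.0, 5.0
print(joint_rr_additive(rr_smoking_only, rr_other_only))        # 14.0
print(joint_rr_multiplicative(rr_smoking_only, rr_other_only))  # 50.0
```

Under the multistage interpretation in the text, an observed joint relative risk near the multiplicative benchmark is what action on different stages predicts, while one near the additive benchmark is compatible with action on the same stage.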
They considered the duration or age factor in the relationship between dose, age, and incidence to be a function of “tissue age.” In this model, tissue age was taken to be a function of hormone activity in the breast, so that its rate of change varied with the occurrence of various reproductive events, e.g., menarche, age at first full-term pregnancy, and age at menopause. With this model, Pike and associates (16) explained some of the effects of reproductive factors on breast cancer risk, and attributed much of the difference between breast cancer rates in Japan and the United States to differences in reproductive factors between the 2 countries.

GENETIC CONSIDERATIONS
Because some investigators believe that the cancer process does not involve as many as 5 or 6 discrete steps, a number of them have considered simpler 2-stage models. To describe the log/log relationship observed between cancer incidence and age with a 2-stage model, one must assume that the cellular descendants of the first change replicate at a faster rate than do the normal cells in that tissue. Armitage and Doll (17) worked out the mathematics for this case by assuming exponential growth of the cells after the first change. Furthermore, they were able to describe the incidence-age relationship for a number of human epithelial cancers with their 2-stage model. Moolgavkar and Knudson (18) extended this 2-stage model to a fuller description of cellular kinetics, including cell division and death; the extended model has been used to describe several human cancers, including breast cancer, for which the relationship between age and incidence rate changes after the onset of menopause. Moreover, their models have been especially useful in characterizing various childhood cancers. In particular, Knudson has extensively studied retinoblastomas.
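For the 2-stage model with clonal growth, the form of the age dependence can be sketched as follows. This is a simplified derivation under assumptions introduced here, not a reproduction of Armitage and Doll's own calculation: $N$ normal cells (held constant), a first-stage transition rate $\mu_1$ per cell per year, exponential growth at rate $k$ of each first-stage clone, a second transition rate $\mu_2$ per first-stage cell per year, and events rare enough that the incidence rate $h(t)$ is approximated by the expected rate of second-stage events.

```latex
h(t) \approx \int_{0}^{t} \mu_1 N \, e^{k(t-s)} \, \mu_2 \, ds
      = \frac{\mu_1 \mu_2 N}{k}\left(e^{kt} - 1\right),
\qquad
\frac{d \log h(t)}{d \log t} = \frac{kt}{1 - e^{-kt}} .
```

When $kt$ is small, $h(t) \approx \mu_1 \mu_2 N t$ and the log/log slope is close to 1, the $n = 2$ case of the $t^{n-1}$ rule. As $kt$ grows the slope increases, so over a limited range of ages the faster replication of first-stage cells lets a 2-stage model mimic the steeper rise otherwise attributed to additional stages.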
By observing the incidence of bilateral retinoblastomas, Knudson has shown that this tumor is consistent with a 2-stage model in which, in about 40% of the cases, the risk is due to the inheritance of a dominant gene for the disease. In other words, in these cases the first stage of the 2-stage model occurs as an inherited genetic defect. Other tumors that have been suggested as being due to the inheritance of an autosomal dominant cancer gene include neuroblastomas, Wilms’s tumor, and familial polyposis coli. McKusick (19) has listed about 30 cancers that may be inherited in a dominant manner and account for approximately 1% of all the cancers in the United States. Knudson has estimated that 100 to 200 human cancer genes exist.

Peto (20) divided individuals into 3 broad categories according to their susceptibility to cancer. The first and most sensitive group, in whom risks may be increased by about three orders of magnitude, includes the following:

- bilateral retinoblastomas;
- squamous cell skin cancers in patients with xeroderma pigmentosum;
- colon cancers in individuals with polyposis coli.

Peto’s second category includes cancers for which a genetic predisposition results in risks one to two orders of magnitude higher than those of the general population; he believes that a large proportion of cancers fall into this category. The third category would then consist of persons whose genetic risk is within an order of magnitude of the population base rates and who, as such, would be difficult to detect.

From his epidemiologic study of 2 genetically susceptible subgroups, Cairns (21) discussed some mechanistic theories of carcinogenesis. He speculates that the initiation of cancer is not primarily due to an increased frequency of point mutations and the resulting errors of replication at damaged DNA sites but is related to chromosomal rearrangements.
He first studied patients with xeroderma pigmentosum (a condition with an enzymatic defect in DNA excision repair), who are highly susceptible to skin cancer initiated by UV radiation. However, Cairns observed that these patients do not seem to be at substantially increased risk for other fatal cancers, such as those of the lung and breast. If these cancers resulted from point mutations, one would expect these patients to be at increased risk for them, and they are not. On the other hand, patients with Bloom’s syndrome have fragile chromosomes, in the sense that they are at increased risk for chromosomal aberrations and exchanges, and their cancer rates are generally elevated. One would then conclude that chromosomal breakage and rearrangement is a more critical factor in carcinogenesis than point mutation.

Using laboratory data, Weinstein (22) also arrived at this conclusion. He based his arguments on the fact that the frequency of transformation in rodent cell cultures is ten to several hundred times that of induced mutation. This is not what one would expect from a point mutation model. If cell transformation is a multistep process in which one step supposedly involves a DNA point mutation, transformation should occur less often, not more often, than that mutation. Another interesting observation pointed out by Weinstein is that human and rodent cell cultures seem to be equally susceptible to mutagenesis, whereas human cells are more resistant to cell transformation.

BIOCHEMICAL AND MOLECULAR CANCER EPIDEMIOLOGY

The greatest contributions of epidemiologic studies to a better understanding of the mechanisms of cancer probably lie in the area of molecular epidemiology. As many investigators define it, this term refers to the combination of sophisticated laboratory techniques with analytical epidemiologic studies.
These methods range from laboratory procedures for identifying host factors that may modify an individual’s susceptibility to carcinogens (including various subgroups with genetic defects) to tests of tissue and body fluids for exposure to carcinogens. Exposure detection would include measuring the effective dose of chemicals at their target sites and using various sophisticated probes to measure biologic responses to carcinogens. As mentioned earlier, one of the major problems in epidemiologic studies is the accurate measurement of the exposures to which individuals and cohorts have been subjected. Through the use of biochemical and molecular methods, one may deal with doses of chemicals or their appropriate metabolites rather than with environmental exposures. This option should greatly improve the quality of data on doses. These techniques range from traditional chemical methods for measuring various organic compounds present in tissue, such as a pesticide in adipose tissue, to the highly sophisticated use of monoclonal antibodies for measuring particular DNA adducts, e.g., those of benzo[a]pyrene metabolites, as studied by Perera and Weinstein (23). Measurements also include indirect assessment of exposure to mutagens by microbial assays for mutagens in urine, amniotic fluid, and feces. These methods are part of an exciting and rapidly progressing field. Technically, they are often specific and precise, as in the use of antibodies to measure specific adducts; however, difficulties do arise in detecting low levels of exposure to environmental carcinogens.

A good example of the application of molecular techniques in epidemiology is the use of immunoassays as a probe for detecting benzo[a]pyrene-DNA adducts. Perera et al. (24) have shown a dose response in lung tissue after injection of benzo[a]pyrene in the mouse.
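A rough calculation shows why the amount of DNA can become limiting at the very low adduct levels these immunoassays target. It computes how much adduct a DNA sample holds at one adduct per 10⁷ nucleotides. The sketch assumes an average nucleotide residue mass of about 330 g/mol and ignores extraction losses and assay sensitivity, neither of which is given in the text. The function name is hypothetical.

```python
# Rough arithmetic for DNA adduct assays.
# ASSUMPTION: an average DNA nucleotide residue weighs about 330 g/mol.
NUCLEOTIDE_G_PER_MOL = 330.0
AVOGADRO = 6.02214076e23  # molecules per mole


def adducts_in_sample(dna_micrograms, adducts_per_nucleotide=1e-7):
    """Return (adduct molecules, adduct femtomoles) in a DNA sample."""
    nucleotide_mol = dna_micrograms * 1e-6 / NUCLEOTIDE_G_PER_MOL
    adduct_mol = nucleotide_mol * adducts_per_nucleotide
    return adduct_mol * AVOGADRO, adduct_mol * 1e15


for micrograms in (1, 10, 100):
    molecules, fmol = adducts_in_sample(micrograms)
    print(f"{micrograms:>4} ug DNA at 1 adduct per 10^7 nucleotides: "
          f"{molecules:.2e} molecules = {fmol:.2f} fmol")
```

Even 100 µg of DNA holds only tens of femtomoles of adduct at this level. This illustrates why securing enough DNA from human samples is a practical obstacle.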
Perera and colleagues also observed differences between cancer patients and controls after examining samples of lung tissue and lymphocytes. However, differences in cigarette smoking levels could not be detected with this assay. Currently, these immunoassay techniques are being used in studies of occupational groups such as coke oven workers. The monoclonal antibody method of Perera and Weinstein (23) can detect adduct levels as low as a single adduct per 10⁷ nucleotides. Even so, investigators encounter difficulties in securing sufficient amounts of DNA and in understanding the impact of the various DNA repair systems. One could avoid this latter problem by studying mitochondrial DNA, in which repair processes apparently are either absent or much less effective, and obtaining this DNA by collecting platelets. The use of these adduct probes is extremely exciting because they provide a highly specific method for measuring effective doses of a particular agent. One does, however, pay the cost of having to develop antibodies for each type of adduct studied. Nonetheless, the high sensitivity of the method suggests that it will be of great use both in general epidemiologic studies and in attempts to better understand the mechanisms of cancer.

Numerous methods are available for detecting early biologic responses to carcinogens; the most familiar are, of course, the study of chromosome aberrations in peripheral lymphocytes and assays for sister chromatid exchange. Other assays include morphologic changes in sperm and the detection of deficiencies of hypoxanthine-guanine phosphoribosyltransferase in lymphocytes [see (25) for references on these topics]. Currently, these assays are being applied in epidemiologic studies. For example, Vijayalaxmi et al. (26) showed a presumptive loss of hypoxanthine-guanine phosphoribosyltransferase activity in lymphocytes among patients with Bloom’s syndrome after assaying for 6-thioguanine resistance.

The study of chromosomal aberrations has long been associated with exposures to carcinogens like ionizing radiation and benzene. Recently, high-resolution banding techniques have been used to show some specific chromosome defects associated with particular cancers. Specific translocations associated with certain leukemias have been observed and may be related in some instances to the presence of oncogenes at the site of the chromosomal break. This area of research was recently discussed by Yunis (27), who illustrated that the particular translocation of chromosomes 8 and 14 in Burkitt’s lymphoma occurs at the site of the myelocytomatosis (c-myc) oncogene. With the use of an immortal mouse fibroblast cell line, in vitro transformation has been demonstrated for the transforming gene of 2 human bladder carcinoma cell lines. This transforming gene is related to the sarcoma virus oncogene v-Ha-ras and differs by a single point mutation from the normal human gene, which codes for the protein p21 (28, 29). Using a nonimmortal cell line of hamster fibroblasts, Newbold and Overell (30) showed that a prior mutation in the hamster cells was necessary before the oncogene could produce transformed foci in the culture. This type of research provides further evidence for the multistage nature of some cancers.

Other effects on molecular and biochemical activity at the cellular level can be detected with the new molecular probe techniques. For some time, investigators have examined aryl hydrocarbon hydroxylase activity in patients with lung cancer and in controls. This particular assay and its relationship to the P-450 mixed-function oxidase system are relevant to the metabolism of polycyclic aromatic hydrocarbons, which, of course, is highly relevant to human cancer.
Perhaps, with molecular probes, we need not assay the actual enzyme activity but could instead measure the RNA levels related to the various metabolic processes of the P-450 system. As these modern molecular techniques rapidly evolve, they will have a major impact on the way epidemiologic studies are conducted. These techniques should permit a much more scientifically refined approach to human studies, possibly permit smaller cohorts, and provide a closer relationship among clinical, epidemiologic, and laboratory research.

REFERENCES
(1) International Agency for Research on Cancer: IARC Monographs on the Evaluation of the Carcinogenic Risk of Chemicals to Humans, vols 17-31. Lyon: IARC, 1978-83
(2) ARMSTRONG B: The use of epidemiological data to assess human cancer risk. In Methods for Estimating Risk in Humans and Chemical Damage in Nonhuman Biota and Ecosystems (Vouk VB, Butler GC, Hoel DG, et al, eds). New York: Wiley. In press
(3) United Nations Scientific Committee on the Effects of Atomic Radiation (UNSCEAR): Ionizing Radiation: Sources and Biological Effects. 1982 Report to the General Assembly. New York: United Nations, 1982
(4) Committee on the Biological Effects of Ionizing Radiation: The Effects on Populations of Exposure to Low Levels of Ionizing Radiation, 1980. Washington, D.C.: Natl Acad Press, 1980
(5) ARMITAGE P, DOLL R: Stochastic models for carcinogenesis. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability (Neyman J, ed), vol 4. Berkeley, Los Angeles: Univ Calif Press, 1961, pp 19-38
(6) PETO R: Epidemiology, multistage models, and short-term mutagenicity tests. In Origins of Human Cancer: Human Risk Assessment (Hiatt HH, Watson JD, Winsten JA, eds), Book C. Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory, 1977, pp 1403-1428
(7) WHITTEMORE A, KELLER JB: Quantitative theories of carcinogenesis. Soc Ind Appl Math Rev 20:1-30, 1978
(8) DOLL R, PETO R: Cigarette smoking and bronchial carcinoma: Dose and time relationships among regular smokers and lifelong non-smokers. J Epidemiol Community Health 32:303-313, 1978
(9) PETO J, SEIDMAN H, SELIKOFF I: Mesothelioma mortality in asbestos workers: Implications for models of carcinogenesis and risk assessment. Br J Cancer 45:124-135, 1982
(10) DAY N: Risk estimation models. In Methods for Estimating Risk in Humans and Chemical Damage in Nonhuman Biota and Ecosystems (Vouk VB, Butler GC, Hoel DG, et al, eds). New York: Wiley. In press
(11) DAY NE, BROWN CC: Multistage models and primary prevention of cancer. JNCI 64:977-989, 1980
(12) WHITTEMORE AS: The age distribution of human cancer for carcinogenic exposures of varying intensity. Am J Epidemiol 106:418-432, 1977
(13) BROWN KG, HOEL DG: Multistage prediction of cancer in serially dosed animals with application to the ED01 study. Fundam Appl Toxicol 3:470-477, 1983
(14) BROWN CC, CHU KC: Implications of the multistage theory of carcinogenesis applied to occupational arsenic exposure. JNCI 70:455-463, 1983
(15) SIEMIATYCKI J, THOMAS DC: Biological models and statistical interactions: An example for multistage carcinogenesis. Int J Epidemiol 10:383-387, 1981
(16) PIKE MC, KRAILO MD, HENDERSON BE, et al: “Hormonal” risk factors, “breast tissue age” and the age-incidence of breast cancer. Nature 303:767-770, 1983
(17) ARMITAGE P, DOLL R: A two-stage theory of carcinogenesis in relation to the age distribution of human cancer. Br J Cancer 11:161-169, 1957
(18) MOOLGAVKAR SH, KNUDSON AG JR: Mutation and cancer: A model for human carcinogenesis. JNCI 66:1037-1052, 1981
(19) MCKUSICK VA: Mendelian Inheritance in Man: Catalogs of Autosomal Dominant, Autosomal Recessive, and X-linked Phenotypes, 5th ed. Baltimore: Johns Hopkins Univ Press, 1978, p 975
(20) PETO R: Genetic predisposition to cancer. In Cancer Incidence in Defined Populations; Banbury Report 4 (Cairns J, Lyon JL, Skolnick M, eds). Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory, 1980, pp 203-213
(21) CAIRNS J: The origins of human cancers. Nature 289:353-357, 1981
(22) WEINSTEIN IB: Current concepts and controversies in chemical carcinogenesis. J Supramol Struct 17:99-120, 1981
(23) PERERA FP, WEINSTEIN IB: Molecular epidemiology and carcinogen-DNA adduct detection: New approaches to studies of human cancer causation. J Chronic Dis 35:581-600, 1982
(24) PERERA FP, POIRIER MC, YUSPA SH, et al: A pilot project in molecular cancer epidemiology: Determination of benzo[a]pyrene-DNA adducts in animal and human tissues by immunoassays. Carcinogenesis 3:1405-1410, 1982
(25) Cold Spring Harbor Laboratory: Indicators of Genotoxic Exposure; Banbury Report 13 (Bridges BA, Butterworth BE, Weinstein IB, eds). Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory, 1982
(26) VIJAYALAXMI, EVANS HJ, RAY JH, et al: Bloom’s syndrome: Evidence for an increased mutation frequency in vivo. Science 221:851-853, 1983
(27) YUNIS JJ: The chromosomal basis of human neoplasia. Science 221:227-236, 1983
(28) PARADA LF, TABIN CJ, SHIH C, et al: Human EJ bladder carcinoma oncogene is homologue of Harvey sarcoma virus ras gene. Nature 297:474-478, 1982
(29) SANTOS E, TRONICK SR, AARONSON SA, et al: T24 human bladder carcinoma oncogene is an activated form of the normal human homologue of BALB- and Harvey-MSV transforming genes. Nature 298:343-347, 1982
(30) NEWBOLD RF, OVERELL RW: Fibroblast immortality is a prerequisite for transformation by EJ c-Ha-ras oncogene. Nature 304:648-651, 1983

Age at Exposure Versus Years of Exposure¹
Herbert Seidman²,³
ABSTRACT: For many forms of cancer, the pattern of incidence rates by age has been found to be in reasonable accord with the equation $I_t = bt^k$, or some modification of it. Here $I_t$ is the incidence rate at age $t$, and $b$ and $k$ are constants. An alternative equation postulates that the risk of cancer is determined not by the age of a person but by the length of time exposed to a carcinogenic agent: $I_t = b(t-w)^k$, where $t-w$ represents the “effective exposure” between first exposure and clinical evidence of cancer. Mesothelioma rates in asbestos insulation workers were strongly related to time from onset of exposure, regardless of age at first exposure. However, the same pattern was not evident for lung cancer mortality in the same workers compared with blue collar worker controls from the American Cancer Society Cancer Prevention Study I. Lung cancer mortality rates by attained age and by duration of smoking were shown for current smokers of cigarettes only in the Cancer Society study, classified by the age at which they started smoking. Lung cancer results were also given for men who never smoked regularly. (Natl Cancer Inst Monogr 67:205-209, 1985)

From previous discussions at this Workshop, it is clear that the results of a given model that fits the data well are subject to differing interpretations as to biologic mechanisms and meaning. Also, different models may fit the data satisfactorily and thereby lead to further disparities as to what the data connote. For many forms of cancer, frequency increases sharply with advancing age, at least to age 80 years or so. These findings are consistent with either of 2 explanations. One is a multistage model of carcinogenesis, according to which carcinogens act at a later stage on a growing number of partially transformed cells. The other is the thesis that susceptibility to cancer increases with age because of systemic changes.
Perhaps immune or other regulatory factors are important in certain situations, and the theories may be regarded as complementary rather than as alternatives.

In the early 1950s, Dr. Hammond was starting to conduct the first American Cancer Society cohort study on the smoking habits of American men (1). At about that time, Muller (2) and Nordling (3) were propounding the k-stage theory of cell transformation to explain the observation that cancer mortality rates for many sites increased according to the fifth or sixth power of age. Several variations and modifications have been proposed to fit specific situations more appropriately, but the simplest form of the model may be written as

$$I_t = b\,t^k,$$

where $I_t$ is the incidence rate at age $t$, and $b$ and $k$ are constants. Taking natural logarithms of each side of this equation, we obtain

$$\ln(I_t) = \ln(b) + k\ln(t),$$

which plots as a straight line with slope $k$ on log/log graph paper.

ABBREVIATION: CPS I = Cancer Prevention Study I.
¹ Presented at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
² Epidemiology and Statistics Department, American Cancer Society, 4 West 35th Street, New York, N.Y. 10001.
³ I wish to thank Edwin Silverberg and Steven Gelb for their assistance in processing the data for this study.

In the ensuing 30 years, a number of investigators have concerned themselves with the age patterning of cancer rates, but I think it is fair to say that the major contributions have come from Dr. Richard Doll and his colleagues, in particular, Richard Peto (4-14). Figure 1, adapted from (9), shows the incidence by age of 6 forms of cancer in men in England and Wales. It shows a fairly close fit to the model, although a much higher value of k was seen for cancer of the prostate and a low one for cancer of the skin.
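Because $\ln(I_t) = \ln(b) + k\ln(t)$ is linear in $\ln(t)$, $b$ and $k$ can be estimated by ordinary least squares on the logarithms of age-specific rates. The sketch below is one such estimator, checked on synthetic (not observed) rates generated from a known power law. It uses unweighted least squares on age-group midpoints; these are our assumptions, and Cook et al. (9) may have weighted or grouped differently.

```python
import math


def fit_power_law(ages, rates):
    """Least-squares fit of ln(rate) = ln(b) + k*ln(age); returns (b, k).

    Unweighted fit on age-group midpoints; every rate must be positive.
    """
    xs = [math.log(t) for t in ages]
    ys = [math.log(r) for r in rates]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    k = sxy / sxx
    b = math.exp(y_bar - k * x_bar)
    return b, k


# Synthetic rates per 100,000 following I_t = b * t**k with b = 2e-9, k = 5,
# at midpoints of the 5-year age groups from 35-39 to 70-74.
ages = [37.5 + 5 * i for i in range(8)]
rates = [2e-9 * t ** 5 for t in ages]
b_hat, k_hat = fit_power_law(ages, rates)
print(f"b = {b_hat:.3g}, k = {k_hat:.3f}")
```

Unweighted least squares gives every age group equal influence. A fit weighted by the number of cases would give less influence to the sparse younger ages.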
Cook et al. (9) used cancer incidence data compiled by the International Union Against Cancer (15) to examine the relationship postulated by this model for 22 types of cancer in men and 9 in women. They studied 11 large populations for which cancer registration was presumably good. Cancers such as leukemia and those of the breast, cervix uteri, and lung were omitted because they were known to show a grossly nonlinear relationship between the logarithms of incidence and age.

Peto (11) remarks that cancer involves (among other biologic abnormalities) impairment of the ordinary control of mitosis. The mechanism of cancer induction is therefore likely to be fundamentally different in types of cells whose mechanisms of mitotic control differ fundamentally. He finds it useful to divide the cell types of the body into 3 groups:

1) sex-specific epithelial cells, i.e., epithelial cells of the breast, uterus, prostate, etc.;
2) other epithelial cells, those common to both sexes;
3) nonepithelial cells, connective and soft tissues, etc.

He also notes that the log/log relationship, perhaps with some modifications, is applicable to most group 2 tissues, the non-sex-specific carcinomas, but not to the tissues of the other groups.

In the study by Cook et al. (9), the analysis was restricted to rates at ages 35 to 74 years. This secured sufficient numbers, which were not available at younger ages, and avoided possibly unreliable data at the older ages. They tested the adequacy of the model by finding the best fitting values of k and b for each type of cancer in each population and seeing whether adding a quadratic term significantly improved the fit. The value of k varied much more between cancers than between populations. For most of the cancers examined, the mean value of k per population ranged between 4 and 6. A much higher value (10.7) was shown for cancer of the prostate, an organ composed of group 1 tissue in Peto’s classification.
On visual examination of the 338 sets of rates, only 21% appeared to fit the simple model at all closely. In 54%, the curve was downward, with the rate of increase falling progressively below predicted values with advancing age; in 25%, the converse, an upward curve, was seen. In one-third of the total instances, the differences were statistically significant. Cook et al. (9) found that, although the simple model was certainly useful, cancer incidence often does not vary according to a simple power of age.

FIGURE 1. Age distribution of patients with cancers of the esophagus, stomach, rectum, pancreas, prostate, and skin (excluding melanoma) in England and Wales. Logarithm of incidence vs. logarithm of age (annual incidence per 100,000 males against age in years, both on log scales). See (9).

With respect to duration of exposure, the simple model may be rewritten as

$$I_t = b\,(t-w)^k,$$

where $w$ is a constant such that $t-w$ represents the effective exposure between the start of exposure and the first appearance of cancer as a clinical entity. Cook and associates (9) found that w = 3214 years gave a plausible fit for cancer of the prostate. They examined differences in the estimated value of k (the power of effective exposure time in the above equation) among types of cancer within given populations and among populations for given types of cancer. The authors concluded that the results, although not wholly consistent, did suggest that the value of k is a biologic constant characteristic of the tissue in which cancer is produced. The importance of duration of exposure has been clearly demonstrated in experiments in skin painting of mice with cigarette smoke fractions or benzo[a]pyrene (16-18). In human experience, surely some of the most remarkable results are to be seen in mesothelioma of the peritoneum and pleura.
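Cook et al. (9) checked the simple model by asking whether a quadratic term improves the log/log fit. The sign of that term separates the downward curves from the upward ones. The sketch below fits $\ln(I_t) = a + kx + cx^2$, with $x$ the centered value of $\ln(t-w)$, and reports an F statistic for the quadratic term. It uses plain Python and unweighted least squares; this is our assumption and not the authors' procedure, and the data are synthetic. Setting $w > 0$ corresponds to the effective-exposure form $I_t = b(t-w)^k$. In this toy, rates generated with a lag curve downward when plotted against attained age and become a straight line once the same lag is applied. Algebraically, the second derivative of $k\ln(e^x - w)$ in $x = \ln t$ is $-kwt/(t-w)^2 < 0$.

```python
import math


def _solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [v] for row, v in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * p for a, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def poly_fit(x, y, degree):
    """Least-squares polynomial fit via normal equations; returns (coeffs, rss)."""
    p = degree + 1
    xtx = [[sum(xi ** (i + j) for xi in x) for j in range(p)] for i in range(p)]
    xty = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(p)]
    coeffs = _solve(xtx, xty)  # coeffs[i] multiplies x**i
    rss = sum((yi - sum(c * xi ** i for i, c in enumerate(coeffs))) ** 2
              for xi, yi in zip(x, y))
    return coeffs, rss


def curvature_check(ages, rates, w=0.0):
    """Return (k, c, F): log/log slope, quadratic coefficient, F for c."""
    raw = [math.log(t - w) for t in ages]
    mean = sum(raw) / len(raw)
    x = [v - mean for v in raw]  # centering keeps the normal equations stable
    y = [math.log(r) for r in rates]
    (_, k), rss1 = poly_fit(x, y, 1)
    (_, _, c), rss2 = poly_fit(x, y, 2)
    df = len(x) - 3
    f_stat = (rss1 - rss2) / (rss2 / df) if rss2 > 0 else math.inf
    return k, c, f_stat


# Synthetic rates with a 20-year lag: I_t = b * (t - 20)**4.
ages = [37.5 + 5 * i for i in range(8)]
rates = [1e-3 * (t - 20) ** 4 for t in ages]
print(curvature_check(ages, rates, w=0.0))   # c < 0: downward curve vs attained age
print(curvature_check(ages, rates, w=20.0))  # c near 0, k near 4 once lag applied
```

The F statistic would be compared with an F(1, n - 3) critical value. It is not computed here, because the standard library has no F distribution.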
Figure 2 shows the results obtained from 1967 through 1979 in a study by J. Peto and associates (19). It gives the cumulative probability of dying of mesothelioma by age 80 in the absence of other causes of death. The data are presented according to the age at which the study subjects started working with asbestos, both for attained age and for number of years since onset of work. Because mesothelioma rates are so low in the general population not exposed to asbestos, we may take the onset of work to correspond to the onset of exposure. Also, mesothelioma usually runs a rapid course from clinical appearance to death, so the date of death is not far from the date of detection in most instances. Clearly, the risks of dying of mesothelioma were much the same at a given time from onset of exposure, regardless of the age at which exposure started. Mesothelioma death rates appear to be in accord with the third or fourth power of time from first exposure. If one allows an induction period of 10 years, a power of 2 also gives a good fit.

FIGURE 2. Probability of dying of mesothelioma for insulation workers by age at which they began work and years since onset of work. (Panels: cumulative percent dying by attained age and by years since onset.)

Much of the modeling work has been based on cigarette smoking in relation to lung cancer. I have used the CPS I cohort (20) to illustrate aspects of this approach involving age, duration of smoking, and daily dose. I have included some new findings, classified by the age at which the men began smoking, observed during the 12 years of the study. The data are also intended to illustrate some of the nuances and problems of cohort studies to which Dr. Hammond was so sensitive, some of which have already been noted during this Workshop.
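The “cumulative probability of dying ... in the absence of other causes of death” used in figure 2 is a cumulative risk built from age- or interval-specific death rates. One standard actuarial way to compute it is $P = 1 - \exp(-\sum_i \lambda_i \Delta_i)$. This sketch assumes that each rate $\lambda_i$ is constant within its interval of width $\Delta_i$ and that competing causes are ignored. The exact computation used by Peto and associates is not described in this section, and the rates below are hypothetical.

```python
import math


def cumulative_probability(rates_per_100k, width_years=5.0):
    """Probability of dying of the cause by the end of the last interval,
    in the absence of other causes, from interval-specific death rates.

    Assumes each rate (per 100,000 person-years) is constant within its
    interval of width_years; the cumulative hazard is the sum of rate*width.
    """
    cumulative_hazard = sum(r / 100_000 * width_years for r in rates_per_100k)
    return 1.0 - math.exp(-cumulative_hazard)


# Hypothetical rates for successive 5-year intervals since onset of exposure.
rates = [0, 10, 40, 100, 200, 300, 400]
print(f"cumulative percent dying: {100 * cumulative_probability(rates):.1f}%")
```

Exponentiating the cumulative hazard, rather than simply summing rate × width, keeps the result a probability even when the rates are high.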
The median date of enrollment in the study was about the middle of November 1959. Survivors were questioned again about their smoking habits at 2-year intervals, in the latter part of 1961, 1963, and 1965, and then, after a longer lapse, in 1971-72. Follow-up was 98% complete through June 1971 and 93% complete in the twelfth year of the study. At the start of this analysis, there were 99,000 men who had never smoked regularly and 152,000 who were smokers of cigarettes only. We classified the smokers by the age at which they started smoking:

- less than 15 years: 18,000;
- 15-24 years: 115,000;
- 25 years or more: 13,000;
- starting age not known: 6,000.

Thus most had started at ages 15-24 years, or at roughly 20 years of age.

In their study, Doll and Peto (13) were mostly concerned with male smokers of cigarettes only who began when 16-25 years old and who continued to smoke approximately the same number of cigarettes/day. Our routine recording of the number of cigarettes smoked/day is much broader than Doll and Peto’s. Moreover, as Dr. Hammond and Mr. Garfinkel have shown, not only does smoking influence health, but health influences smoking (21). The amount smoked is often increased or reduced according to how well the smoker feels, and ill health is a principal reason many stop smoking. Consequently, a smoker was retained in this study even though he changed the amount smoked. He ceased to be counted as a smoker only when he reported giving up the habit or starting regular smoking of a pipe or cigar. Among the nonsmokers at the beginning of the study, the few men who took up regular smoking were dropped from subsequent observation. The interval between requestionings on smoking habits affords some protection against the problem that Dr. Breslow called lag in classification (22). This is the problem of missing lung cancer deaths in men who stopped smoking recently because of poor health.
It is prudent to check whether this is likely to be of appreciable magnitude, and figure 3 addresses this point. It shows death rates from lung cancer by age, standardized for number of cigarettes smoked/day at the start of the study. The rates are shown for men who smoked at the start of the study (initial classification) versus those who continued to smoke (continuing classification). Data are shown for those who smoked less than a pack a day and those who smoked a pack or more a day. Men who never smoked regularly are shown for comparison. During this short run of 12 years, the death rates among the continuing smokers were generally a bit higher than among all those smoking at the start of the study. Essentially, however, they were much the same, even though one-quarter of the cigarette smokers had stopped smoking at some time during the 12-year period.

FIGURE 3. Lung cancer death rates for cigarette smokers only (standardized for No. of cigarettes smoked/day, excluding ex-smokers at start of study) of a pack or more/day and less than 1 pack/day and for men who never smoked regularly. Comparisons of cigarette smokers and nonsmokers classified at the start of study are made with those who continued in that category throughout the study. Subjects were classified by attained age and by years since onset of smoking. NSR = never smoked regularly. (Panels: death rate per 100,000 person-years by attained age and by years since onset.)

Many factors in cigarette smoking vary aside from the amount smoked daily and current or former usage. As Dr. Hammond remarked, manufacturers have been changing cigarettes markedly for several decades. The changes include such things as tar and nicotine content and the prevalence of filtering devices.
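The rates in figure 3 are “standardized for number of cigarettes smoked/day.” One common way to do this is direct standardization. The rate in each smoking-amount stratum is weighted by a fixed standard distribution, so that groups with different smoking mixes can be compared. This section does not say which standard was used. The strata labels, weights, and rates below are hypothetical.

```python
def direct_standardize(stratum_rates, standard_weights):
    """Directly standardized rate: weighted mean of stratum-specific rates,
    using a fixed standard distribution (e.g., person-years by stratum)."""
    total = sum(standard_weights.values())
    return sum(stratum_rates[s] * w for s, w in standard_weights.items()) / total


# Hypothetical lung cancer death rates per 100,000 person-years by
# cigarettes/day, and a hypothetical standard distribution of smokers.
standard = {"1-9": 10, "10-19": 25, "20-39": 50, "40+": 15}
continuing = {"1-9": 60.0, "10-19": 120.0, "20-39": 250.0, "40+": 400.0}
initial = {"1-9": 55.0, "10-19": 110.0, "20-39": 230.0, "40+": 380.0}
print(direct_standardize(continuing, standard), direct_standardize(initial, standard))
```

Because both groups are weighted by the same standard, any difference between the two standardized rates cannot come from a different mix of daily amounts smoked.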
Smokers also differ in depth of inhalation, number of puffs per cigarette, length of butt to which a cigarette is smoked, number of years smoking various amounts per day, etc. Regardless of all these considerations, simple classifications regularly give consistent results. Dr. Doll found that for men who started smoking cigarettes at ages 16-25 years, standardized for number of cigarettes smoked per day, a power of 7 fit the data for attained age for the smokers in the model I have shown, whereas for nonsmokers a power of 4 was indicated. When he looked at the data with respect to duration of smoking, he found that a power of 4 fit the duration data well for the smokers, which corresponded to the nonsmokers, taking age zero as the latter's age at start of exposure (14). Our results for attained age are closer for cigarette smokers and nonsmokers: a power of 5 for smokers and one of 4 for nonsmokers. In the duration data, both the smokers and the nonsmokers are close to Doll's observation of a power of 4. Figure 4 is designed to illustrate that, because men who started cigarette smoking at ages 15-24 made up such a large proportion of all smokers, the death rates of the total group differ little from those of men who started at these ages. This has relevance to later findings I am going to show for asbestos workers who had a history of cigarette smoking.

FIGURE 4.—Lung cancer death rates among all cigarette smokers only and among those who started smoking when 15-24 yr old (standardized for No. of cigarettes smoked/day, excluding ex-smokers) and among men who never smoked regularly. Subjects were classified by attained ages and by years since onset of smoking.

In figure 5 are some new results on men in CPS I who had smoked cigarettes only, who were currently smoking at the start of the study, and who then continued smoking cigarettes throughout the course of the 12-year study. They are classified by the age at which they started smoking cigarettes. The cumulative percent dying is based on the assumption that other causes of death are not operative. The comparison here is of attained age versus duration of smoking. In the attained-age panel, it is clear that, by a given age, the younger one starts smoking, the higher the probability of dying of lung cancer. This is consistent with a number of factors: men who started younger had smoked for longer periods, and they also tended to inhale more deeply (20). For all 3 age-at-start groups, a power of 5 is indicated in the model discussed earlier. In the right-hand panel, the cumulative probabilities are much closer to one another. The men who were older at the start of smoking show the effects of smoking more rapidly than the younger men. In the early years since onset, the younger men are the lowest group. Eventually, the men who started smoking at an earlier age reach a probability matching or even exceeding that of the men who started at ages 25-34.

FIGURE 5.—Lung cancer death rates among cigarette smokers only (standardized for No. of cigarettes smoked/day, excluding ex-smokers) by age started smoking compared with those who never smoked regularly (NSR). Subjects were classified by attained age and years since onset of smoking.

What happens to a group when one adds asbestos exposure to cigarette smoking (23)? Mr. Edward Lew wanted to see some figures on absolute rates rather than the ratios of the rates for the workers to those for the controls. I hope the probabilities in figures 6 and 7 provide the answers he wanted. The controls were blue-collar workers from CPS I (24) with a history of cigarette smoking who were of the same ages as the asbestos workers but not necessarily of matching ages at start of employment. By attained age, the asbestos workers who started work under age 25 or at ages 25-34 were similar in cumulative probability of lung cancer death by age 80 in the absence of other causes, and much higher than those who started asbestos work when older, whereas the probabilities for the controls of the same attained ages were about the same for all 3 groups. For duration, in both the asbestos workers and the controls, those starting asbestos work at older ages showed a more rapid effect.

FIGURE 6.—Probability of dying of lung cancer for insulation workers with history of cigarette smoking compared with controls by age at which they began working and attained age.

FIGURE 7.—Probability of dying of lung cancer for insulation workers with history of cigarette smoking compared with controls by age at which they began working and years since onset of work.
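The two quantities used in these comparisons, the power in the model and the cumulative probability of dying in the absence of other causes, can be sketched as follows. This is a minimal illustration under stated assumptions, not the method used in the analyses reported here. It assumes the death rate follows rate = b × t^k in attained age or duration t (in the multistage model of Armitage and Doll (5), this exponent is one less than the number of stages), fits k by unweighted least squares on the log-log scale, and computes the cumulative probability as 1 - exp(-Σ rate × interval width). The function names `fit_power` and `cumulative_probability` and the rates are hypothetical; the rates are synthetic values generated from an exact power of 5.

```python
import math


def fit_power(times, rates):
    """Fit rate = b * t**k by least squares of log(rate) on log(t).

    Hypothetical helper: an unweighted log-log regression used only to
    illustrate the power model. Returns (k, b).
    """
    xs = [math.log(t) for t in times]
    ys = [math.log(r) for r in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    b = math.exp(mean_y - k * mean_x)
    return k, b


def cumulative_probability(rates_per_100k, widths_years):
    """Probability of dying of the cause by the end of the last interval,
    assuming no other causes of death operate:
    1 - exp(-(sum of rate * interval width)).
    """
    cumulative_hazard = sum(
        (rate / 100_000) * width
        for rate, width in zip(rates_per_100k, widths_years)
    )
    return 1.0 - math.exp(-cumulative_hazard)


# Synthetic rates per 100,000 person-years that follow an exact power of 5.
ages = [40, 50, 60, 70, 80]
rates = [1e-6 * age ** 5 for age in ages]

k, b = fit_power(ages, rates)
print(f"fitted power k = {k:.2f}")  # recovers the power used to build the data

# Cumulative probability over five 10-year intervals.
p = cumulative_probability(rates, [10] * 5)
print(f"cumulative probability = {p:.3f}")
```

On real data the points scatter about the line, and the fitted k summarizes how steeply the rate rises with age or with duration, which is the contrast drawn between the two panels of each figure.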
Although the pattern in the asbestos workers was similar to that in the controls, their probabilities of dying of lung cancer were at a higher level. From these results, it seems to me that when a small value of k is seen, implying a small number of stages in the cancer process, it may be possible to make an unequivocal inference, as, for example, for duration from start of asbestos exposure for mesothelioma. However, when larger values of k are seen, more stages are indicated as likely, and the ambiguities of interpretation are much greater.

REFERENCES

(1) HAMMOND EC, HORN D: Relationship between human smoking habits and death rates: Follow-up study of 187,766 men. JAMA 155:1316-1328, 1954
(2) MULLER HJ: Radiation damage to the genetic material. Sci Prog 7:93-493, 1951
(3) NORDLING CO: A new theory on the cancer-inducing mechanism. Br J Cancer 7:68-72, 1953
(4) STOCKS P: A study of the age curve for cancer of the stomach in connection with a theory of the cancer-producing mechanism. Br J Cancer 6:407-417, 1953
(5) ARMITAGE P, DOLL R: The age distribution of cancer and a multi-stage theory of carcinogenesis. Br J Cancer 8:1-12, 1954
(6) ARMITAGE P, DOLL R: A two-stage theory of carcinogenesis in relation to the age distribution of human cancer. Br J Cancer 11:161-169, 1957
(7) ARMITAGE P, DOLL R: Stochastic models for carcinogenesis. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability (Neyman J, ed). Berkeley, Calif: Univ Calif Press, 1961, pp 19-38
(8) PIKE MC, DOLL R: Age at onset of lung cancer: Significance in relation to effect of smoking. Lancet 1:665-668, 1965
(9) COOK P, DOLL R, FELLINGHAM SA: A mathematical model for the age distribution of cancer in man. Int J Cancer 4:93-112, 1969
(10) DOLL R: The age distribution of cancer: Implications for models of carcinogenesis (with discussion). J R Stat Soc [Ser A] 134:133-166, 1971
(11) PETO R: Epidemiology, multistage models and short-term mutagenicity tests. In Origins of Human Cancer: Human Risk Assessment (Hiatt HH, Watson JD, Winsten JA, eds), Book C. Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory, 1977, pp 1403-1428
(12) WHITTEMORE AS: The age distribution of human cancer for carcinogenic exposures of varying intensity. Am J Epidemiol 106:418-432, 1977
(13) DOLL R, PETO R: Cigarette smoking and bronchial carcinoma: Dose and time relationships among regular smokers and lifelong non-smokers. J Epidemiol Community Health 32:303-313, 1978
(14) DOLL R: An epidemiological perspective of the biology of cancer. Cancer Res 38:3573-3583, 1978
(15) DOLL R, PAYNE P, WATERHOUSE J, eds: Cancer Incidence in Five Continents, vol I, UICC Tech Rep. Berlin: Springer-Verlag, 1966
(16) LEE PN, ROTHWELL K, WHITEHEAD JK: Fractionation of mouse skin carcinogens in cigarette smoke condensate. Br J Cancer 35:730-742, 1977
(17) LEE PN, O'NEILL JA: The effect both of time and dose applied on tumour incidence rates in benzpyrene skin painting experiments. Br J Cancer 25:759-770, 1971
(18) PETO R, ROE FJ, LEE PN, et al: Cancer and ageing in mice and men. Br J Cancer 32:411-426, 1975
(19) PETO J, SEIDMAN H, SELIKOFF IJ: Mesothelioma mortality in asbestos workers: Implications for models of carcinogenesis and risk assessment. Br J Cancer 45:124-135, 1982
(20) HAMMOND EC: Smoking in relation to the death rates of one million men and women. Natl Cancer Inst Monogr 19:127-204, 1966
(21) HAMMOND EC, GARFINKEL L: The influence of health on smoking habits. Natl Cancer Inst Monogr 19:269-285, 1966
(22) BRESLOW N: Multivariate cohort analysis. Natl Cancer Inst Monogr 67:149-156, 1985
(23) NICHOLSON W: Selection factors in cohort studies. Natl Cancer Inst Monogr 67:111-115, 1985
(24) HAMMOND EC, SELIKOFF IJ, SEIDMAN H: Asbestos exposure, cigarette smoking and death rates. Ann NY Acad Sci 330:473-490, 1979

Co-Chairman's Remarks
George Hutchison
I would like to make a few comments about the sessions we have just had and then open the discussion to all of you. Dr. Wald introduced this last session of the Workshop by emphasizing our need to understand the biologic and biochemical mechanisms of carcinogenesis. He pointed out that if human studies are going to contribute to our understanding, they have to be guided by models, many of which come from the underlying basic sciences. Dr. Higginson said that the responses to date were disappointing if we consider either control or prevention of disease as related to the behavioral risk factors that have been prominent in our talking and thinking for a long time. Although the result is disappointing, Dr. Higginson told us it was not surprising: the stage of knowledge this area has reached has not given us much guidance as to what ought to be done. Certainly in the dietary area, although we may retain the idea that diet is a highly important factor, we have not translated that idea into practical terms likely to produce changes in cancer prevention and control. It follows from the same argument that those who fund cancer research and those who design cancer research projects at this time might be much better advised to direct their thoughts to elucidating biologic, biochemical, and carcinogenetic mechanisms than to direct the research toward the more practical and more exciting matters of cancer control and prevention announced in the news media. Dr. Petrakis told us about a valuable tool and described both its limitations and its value, dwelling more on the limitations. However, in his final remarks, he said that we were not to go away with the idea that he is pessimistic about all of this.
He suggested that some of the serum banks and other biologic banks are being maintained under unfavorable conditions and that maximum use is not being made of this material. Indeed, in some instances, we may not be able to use the material at all because of the unfavorable way it is being handled. If that is so, it clearly is a waste of time and effort. This part of carcinogenic epidemiologic research has to be done with as much planning and careful study design as all the rest of epidemiologic research. According to Mr. Peto, much of the work that is being done in biochemical epidemiology is inadequate with regard to sample size. Some have had the notion that when we got into this more specific epidemiology, the biochemical analyses could be done with much smaller sample sizes, so that we did not have to think so carefully about sample size. Mr. Peto pointed out clearly that, just as with all other research planning, careful planning is mandatory in this area if we are not to spin our wheels and waste time, effort, and funds. Those comments reinforce the spirit of what Dr. Petrakis had said about careful planning in a different area. Dr. Hoel began his paper with some words of caution. He noted that the results of some carefully planned, large, well-controlled animal studies were, despite their size, difficult to interpret in relation to biologic mechanisms. I thought he was going to tell us that meaningful research in human populations was hopeless because it would clearly be impossible to maintain adequate research control. However, Dr. Hoel was optimistic about the research being done in human populations.

1 Presented at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
2 Harvard School of Public Health, 677 Huntington Avenue, Boston, Massachusetts 02115.
Indeed, he enumerated a number of epidemiologic studies that are closely tied to underlying biologic and biochemical mechanisms and that use appropriate study designs. He predicted that this was going to become more and more the way of the immediate future and that fundamental epidemiologic research was moving at a good pace. Mr. Seidman mentioned some particular models. His remarks were the most appropriate to have at the end of this meeting because much of his work was closely allied with that of Dr. Hammond and Dr. Selikoff. He also referred to the work of Dr. Richard Doll and his numerous colleagues.

Discussion VI

R. Peto: I would like to start by reviewing the material that Dr. Petrakis mentioned on the serum vitamin A studies. I think the studies may have been too small. The average blood retinol of the patients with lung cancer and other cancers differed slightly from that of the controls, but the difference was not large enough to be really conclusive. This kind of question needs to be answered on a large scale, on the same sort of scale as the classic studies that Dr. Hammond coordinated, including data on a few hundred thousand samples followed prospectively for many years. At least a million person-years of observation would be needed. Some aspects of the methodology of serum studies are interesting. Simply because the measurements are technically more complicated does not mean that a smaller number of persons will suffice. If one is going to make sense of the nutritional and hormonal factors in cancer epidemiology, then questionnaire-based epidemiology needs to be augmented by a systematic search for the biochemical correlates of disease among apparently healthy people. Of course, some will object. Dr. Petrakis listed some of the possible obscurities of interpretation, but if we do not go this way, I think that we cannot sort out the subtle nutritional and personal factors involved.
I think any understanding of the direct correlates of the disease that can be measured in blood is likely to be a necessary step toward getting definite evidence that can lead to confident advice as to what types of dietary modifications to recommend. Without epidemiology we would not have made the advances we have in identifying causes of liver cancer. It was the strong association between chronic active hepatitis and primary liver cancer that provided the most convincing piece of evidence that hepatitis B virus was a cause of this cancer. In the field of heart disease, we would have made little advancement in our understanding of its causes and of the relevance of cholesterol to it without blood-based epidemiology and the knowledge that low-density lipoprotein cholesterol is positively related to heart disease. If students of vascular disease epidemiology had used only questionnaires estimating cholesterol intake as the main index of cholesterol, instead of moving toward blood-based studies, we would probably have almost no idea that a definite cause-and-effect relationship existed. I do not intend to minimize the complexity of the subject. The kind of blood-based epidemiology I am referring to has been used to study heart disease for 20 years. What those doing studies in cancer epidemiology need to do is to progress to the point cardiologists have already reached. Of course, we can avoid some of the mistakes made by cardiovascular epidemiologists in using blood studies, and we will need to make our surveys much larger.

1 Conducted at a Workshop on the Selection, Follow-up, and Analysis in Prospective Studies held at the Waldorf-Astoria Hotel, New York, N.Y., October 3-5, 1983.
2 Address reprint requests to Lawrence Garfinkel, Epidemiology and Statistics Department, American Cancer Society, 4 West 35th Street, New York, N.Y. 10001.
When we talk about the search for correlates, we are often misinterpreted as thinking naively that every correlate is a simple cause. Of course, this is no more true than it is with questionnaire-based epidemiology. Using questionnaire information, you find that people who drink more alcohol have an increased risk of lung cancer. Obviously, this is not proof that drinking alcohol causes lung cancer. It is the same in blood-based epidemiology: gamma-glutamyltransferase is a fairly good marker of alcohol intake. If you find that gamma-glutamyltransferase in a patient's blood is predictive of lung cancer, that is no indication that alcohol intake is the cause of lung cancer. We will see many artifactual correlates. For example, if a vegetable-rich diet is protective against various forms of cancer, then blood-based epidemiology will show that blood carotene is inversely related to subsequent cancer onset rates, but that will not be good evidence that carotene itself is the cause of the protection. A correlate need not point directly at a cause, but it is much better to speculate about causes with knowledge of the biochemical correlates of disease than without it; one is then in a much better position to talk sense. I think that if we are going to make sense of diets and nutritional factors, we have to look for more biochemical correlates. At the same time, we need to recognize the difficulties that Dr. Petrakis outlined and overcome them. The best, most practical, and most open-ended kind of study is the type that Dr. Wald and various other investigators have already undertaken. In these studies, blood is drawn from many individuals and stored, and the researchers then observe what happens to the people. Those who develop the disease under study are identified, and their blood samples are assayed for differences from the blood samples collected from individuals who did not develop the disease in question.
However, if you do this, it will be necessary to collect hundreds of thousands of blood samples and wait long enough for a few hundred of the individuals in the study to develop the particular disease. Then those few hundred (or possibly only those few dozen) blood samples become amazingly valuable; they are like gold! The number of questions you might want to ask is enormous, and those blood samples cannot possibly provide all the answers. For example, those who conducted the Multiple Risk Factor Intervention Trial collected blood samples (2-6 ml/person), and these are now being used to investigate various scientific questions. The people who are interested in hormonal factors say that the minimum volume of sample they need for their studies is at least 4 ml, but that can be the total volume of a sample. Are they going to be given the whole sample? How can you do biochemical epidemiology if, having collected the samples, the performance of just a few tests consumes whole samples? It is really a major practical problem that requires a simple solution if prospective biochemical epidemiology is to make advances. The usual procedure, if you want to estimate the average level of retinol in 2 blood samples, is to measure the concentration of retinol in each and calculate the average. Alternatively, and here lies the solution, you could mix the samples and assay the mixture. If you want to determine the mean retinol level in 100 blood samples, you can either perform 100 assays or take an aliquot from each sample, pool the aliquots, and assay the retinol in the mixture, which also gives you the mean retinol level in those 100 samples. At the same time, one can analyze many other substances in the pooled sample. You could collect pools from people with various disorders and pools from controls and use them to test etiologic hypotheses.
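The arithmetic behind pooling can be checked directly: when aliquots are mixed, the concentration of the mixture is the total amount of the substance divided by the total volume, so with equal volumes it equals the arithmetic mean of the individual concentrations. The sketch below assumes ideal mixing and an assay whose reading is proportional to concentration; the function name `pool_concentration` and the retinol values are hypothetical.

```python
from statistics import mean


def pool_concentration(concentrations, volumes_ml):
    """Concentration of a mixture of aliquots: total amount / total volume.

    Assumes ideal mixing and no interaction between samples.
    With equal volumes this equals the arithmetic mean of the inputs;
    with unequal volumes it is a volume-weighted mean.
    """
    if len(concentrations) != len(volumes_ml):
        raise ValueError("need exactly one volume per sample")
    total_amount = sum(c * v for c, v in zip(concentrations, volumes_ml))
    return total_amount / sum(volumes_ml)


# Hypothetical serum retinol values (ug/dl) for 5 cases and 5 controls.
cases = [48.0, 52.0, 55.0, 60.0, 45.0]
controls = [58.0, 62.0, 65.0, 57.0, 60.0]

# One pool per group from 0.5-ml aliquots: 2 assays instead of 10.
case_pool = pool_concentration(cases, [0.5] * len(cases))
control_pool = pool_concentration(controls, [0.5] * len(controls))

print(case_pool, mean(cases))    # the pool reading equals the mean of the individual values
print(control_pool - case_pool)  # difference in group means from just 2 assays
```

The volume-weighted form also shows why aliquots should be of equal size: unequal volumes give a weighted rather than a simple mean.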
Of course, there are disadvantages to pooling. A comparison of pools indicates only the difference in the means, and the result may be influenced by a few outliers. It is possible that one bizarre sample may damage all the others; e.g., 1 specimen with a high level of rheumatoid factor may interfere with various immunologic assays. Pooling does not give you any idea of the SE, so this needs to be estimated separately. Also, interactions cannot be investigated: if beta-carotene is correlated with cotinine because cigarette smoking is associated with beta-carotene level, that correlation would be lost in the pooling. Therefore, pooled analyses cannot replace detailed analyses of individual samples, but they do provide a method of initially identifying the few main questions that are worth pursuing without wasting your entire resources. I would like to give you a few examples of the ways in which serum pooling has been used in connection with 4 types of epidemiologic inquiry:
1) Geographic correlation studies, which provide biochemical analyses of groups of people in prescribed geographic areas.
2) Case-control studies that use blood samples as the primary measure of exposure, i.e., in which investigators believe that the biochemistry is not seriously distorted by the disease.
3) Case-control studies of diseases that may themselves affect the blood level. Here the blood level is essential for validating questionnaire information.
4) The retrospective-prospective type of study, in which blood is collected from a large number of healthy individuals, stored for years without significant deterioration of the substances of interest, and assayed later for cases and controls.
An example of blood studies that help to determine geographic correlations is the collection of serum samples from 72 different counties in China. These counties have been selected to cover a wide span of cancer incidence rates.
In each county, 2 production teams (i.e., roughly 2 villages) are being selected at random. Obtaining 2 groups within each county allows us to assess the variation within counties and so to judge whether the differences between counties are significant. Within each production team, 25 people are chosen at random. Blood, hair, urine, etc. are collected from them, together with information on dietary habits and food samples for biochemical analysis. We want to know whether blood selenium levels are correlated with the incidence of cancer, so the selenium levels will be compared with the risk of cancer across the different counties. To do this, we will take 1 ml from each of the serum samples collected and thereby create a 25-ml pool for the males in 1 production team, a 25-ml pool for the females in the same production team, and corresponding 25-ml pools for the males and the females in the other production team in that county. This will give us 4 pools per county. When we do our correlation of selenium and cancer over the 72 counties, we will only have to do 4 × 72 = 288 analyses, even though we will have obtained data on several thousand individuals. The same is true for all sorts of other substances that might be worth measuring. We could not possibly do all the investigations we want on each sample, nor would we have the funds or the volume of blood needed. Pooling is thus a good way of investigating speculative ideas; if one becomes definite about a particular hypothesis, one can always go back to the individual samples for more detailed biochemical and statistical analyses. We have also been involved in a case-control study of a disease that affects the blood biochemistry, in which the biochemical analysis is useful as a means of validating questionnaire information.
This study of beta-carotene intake is being done in Brazil, in an area where differences in intake are large, mainly because some people cook their food in red palm oil (a rich source of beta-carotene) whereas others apparently do not. The variation in long-term beta-carotene intake in this area is about fivefold to tenfold. Apparently, about one-third of the people use red palm oil regularly, another third never use it, and the final third use it to an intermediate extent. These eating habits are well entrenched because the oil is cheap, and the people are unlikely to change their habits much during the course of a lifetime. The results of the case-control study, although preliminary, indicate that beta-carotene intake is not associated with a lower risk of cancer. How can we show that this negative result is informative? We will take our control patients and divide them into 3 groups according to their reported intake of beta-carotene. We will take 0.5 ml of blood from each patient, merge these 0.5-ml aliquots into 3 pools according to reported beta-carotene intake, and see whether the difference in intake is reflected in a difference in serum beta-carotene level. If it is, we have a strong negative result indeed. An example of the retrospective-prospective approach comes from Finland. In 1972, blood samples were collected from 50,000 adults who were a representative sample of the total population. In 1976, the survey was repeated on 17,000 people. Serial samples of this kind are useful because they enable us to investigate whether the results from a single sample are likely to be informative. Such data give us a measure of within-individual variability. For example, we can determine whether the concentration of substance "X" in 1972 is correlated with its 1976 values. If it is, the substance may provide useful information about future risk of disease. If there is no correlation, we know that it is uninteresting to investigate.
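The serial-sample check described here amounts to computing the correlation between 2 measurements of the same substance in the same people at 2 surveys. A minimal sketch, assuming paired values are available for each person; the data and the function name `pearson_r` are hypothetical, and Pearson's product-moment coefficient is used as the measure of agreement because the discussion does not say which coefficient was computed.

```python
import math


def pearson_r(first, second):
    """Pearson correlation between paired measurements (e.g., 1972 vs 1976)."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least 2 paired measurements")
    n = len(first)
    mean_1 = sum(first) / n
    mean_2 = sum(second) / n
    sxy = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    sxx = sum((a - mean_1) ** 2 for a in first)
    syy = sum((b - mean_2) ** 2 for b in second)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical serum levels for the same 6 people at the two surveys.
survey_1972 = [40.0, 45.0, 50.0, 55.0, 60.0, 65.0]
survey_1976 = [46.0, 41.0, 58.0, 49.0, 57.0, 63.0]

r = pearson_r(survey_1972, survey_1976)
# Near 0: a single sample says little about a person's usual level.
# Near 1: a single sample is a good guide to it.
print(f"r = {r:.2f}")
```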
The Finnish data have, in fact, shown that serum retinol correlates with itself reasonably well 4 years apart (correlation coefficient, 0.5), so there is at least the possibility that the blood retinol level will be predictive of cancer. I hope that I have shown the importance of biochemical epidemiology and how, with a bit of careful foresight and planning, considerably more information can be obtained from a cohort study than would otherwise be available.

W. Haenszel: I would like to contribute a footnote to the discussion on biochemical epidemiology that would interest Dr. Selikoff in that it introduces information on biologic end points. In Colombia, a cohort drawn from the general, noninstitutionalized population has been gastroscoped, and the gastric mucosa was classified on a scale from normal mucosa to intestinal metaplasia and dysplasia. Sera have been collected from this population, and information was obtained on the distribution of micronutrients by gastric pathology. I emphasize this only to point out that, if we have information on precursor states, we can look at the variations in the micronutrients and get a better feeling for which candidates are likely to be exploited profitably. I think there is nothing novel in this; the idea could be extended to cancers of the esophagus and buccal cavity. I see no future for this approach with respect to cancer of the prostate or pancreas, but I think in our discussions on biochemical epidemiology we ought to try to correlate the findings with a variety of available biologic end points.

G. Comstock: I wish to make some comments regarding storage of serum and storage of other data. First of all, it is extraordinarily difficult for one to determine losses with storage because they are invariably confounded with laboratory variation from year to year.
My experience with laboratory tests is that they are not nearly as reproducible as many think they are, and we have had some real difficulties in trying to study the losses of hormones in sera from year to year. Even though we thought we had a foolproof scheme, it was not. On the positive side, as long as you have only moderate losses and you treat your subjects and controls in exactly the same way, you will be able to detect case-control differences. That is the big advantage of such comparisons. With respect to data storage, I believe the basic problem with cohort studies is that the length of observation often has to be extremely long before we can obtain a definitive answer. The obvious solution for that is long-term follow-up, and we have discussed how to do that with studies that are still in progress. Many studies are cancelled after a time, but many of them could provide important data for future prospective or historical cohort studies. Three questions need serious consideration by epidemiologists, particularly those who are interested in cohort studies: 1) Who determines which base-line data should be stored, and how should they be stored? 2) What is the best method of storage? 3) How should these data sources be indexed so they can be readily available? The last question, I think, also applies to the serum banks, because I believe Dr. Petrakis encountered considerable difficulty in locating them, and that has been my experience too.

N. Wald: Mr. Peto raised a point in connection with the pooling of serum samples from all the subjects and controls in a study. A weakness of this approach is that one would not obtain an estimate of the variance of the substance being assayed, and so it would be impossible to judge whether the value obtained from the patients was significantly different from that obtained from the controls.
Therefore, interpretation of the result would be impossible. However, the approach would have the advantage of avoiding the need to test all the samples and possibly wasting precious material. I suggest that a sensible compromise would be to pool patient sera in bundles of 10 or 20 and to do the same for the controls. Such pooling would be economical and would also allow us to obtain an estimate of the variance in the populations of cases and controls, so that the significance of any difference could be assessed.

R. Peto: You need to know the extent to which a substance is reproducible from one time to another. For example, if beta-carotene in blood is erratically variable, then carotene measured at one time will not be correlated with carotene measured at another time. In this instance, it is epidemiologically uninteresting to determine whether the carotene in a single sample is predictive of subsequent cancer onset rates; you do not learn anything of value. As a preliminary to this type of study, you perhaps need hundreds of samples taken a few years apart to find out whether you can measure the same thing consistently in different individuals. As a byproduct of that analysis, you will find the mean and SD of the samples taken at the initial survey, which give you minimal information. The analysis tells you, first, whether the mean is roughly what you expected it to be on the basis of published values. It does not tell you whether any denaturing has occurred, but you do learn whether 90% of the assay material is gone. You need that mean value, and you want it to be reasonably normal. Second, you get an SD, to within a factor of less than 2, for the samples later pooled.
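Dr. Wald's compromise, pooling sera in bundles of 10 or 20 within each group, yields several pool values per group, and the spread among those values gives an SE for the case-control difference. A minimal sketch, assuming equal-volume aliquots (so each pool reads as the mean of its members), samples already in random order before they are split into pools, and at least 2 full pools per group. The function names and data are hypothetical, and the unequal-variance SE used here is one reasonable choice, not a method stated in the discussion.

```python
import math
from statistics import mean, stdev


def pool_values(values, pool_size):
    """Simulated assay results for consecutive pools of pool_size samples.

    Assumes equal-volume aliquots, so each pool reads as the mean of its
    members; a final incomplete pool is dropped.
    """
    n_pools = len(values) // pool_size
    return [mean(values[i * pool_size:(i + 1) * pool_size]) for i in range(n_pools)]


def pooled_difference(case_values, control_values, pool_size=10):
    """Case-control difference in means, its SE from the pools, and assay count."""
    case_pools = pool_values(case_values, pool_size)
    control_pools = pool_values(control_values, pool_size)
    if len(case_pools) < 2 or len(control_pools) < 2:
        raise ValueError("need at least 2 full pools per group to estimate variance")
    difference = mean(case_pools) - mean(control_pools)
    se = math.sqrt(
        stdev(case_pools) ** 2 / len(case_pools)
        + stdev(control_pools) ** 2 / len(control_pools)
    )
    return difference, se, len(case_pools) + len(control_pools)


# Hypothetical values for 40 cases and 40 controls, pooled in 10s.
cases = [50 + (i % 7) for i in range(40)]
controls = [55 + (i % 3) for i in range(40)]

difference, se, n_assays = pooled_difference(cases, controls, pool_size=10)
print(f"difference = {difference:.2f}, SE = {se:.2f}, assays = {n_assays} instead of 80")
```

Because each pool value is itself a mean of 10 or 20 samples, the SE comes from the between-pool spread rather than from individual assays, which is the economy Dr. Wald describes.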
If you do those other studies that you have to do anyway, then you have some check on the extent to which the serum or other material has deteriorated, and you also have some check on the SE of your pool values; I think you need not go into the business of pooling 10s and 20s separately. You may choose to do that for some other reason, e.g., if you are afraid that 1 sample has contaminated all the rest. If you are looking at the prevalence of a rare antibody, then pooling in 10s and 20s might be a reasonable thing to do; generally, it would not be necessary.

Wald: Mr. Peto, you indicated that it was necessary to be cautious about the interpretation of the prospective case-control studies on serum retinol and cancer because the 3 published studies have been small. Although I would not disagree with your view that we should be cautious, I think this need does not arise from the studies being too small. The position is more complicated. Of the 3 studies, 2 were significantly positive, so small size was not a problem there; clearly the differences observed were large enough to avoid the need for a larger study. However, the problem was that in the study of men in the British United Provident Association, the period between blood sampling and development of cancer was short (3-4 yr), and some of the patients may have been incubating the cancer at the time we took the samples. To some extent, the low serum retinol levels in the men who developed cancer may have been due to the cancer rather than the retinol being directly or indirectly associated with the subsequent risk of cancer. In a second positive study, from Evans County, the sera from patients and controls were handled differently: samples from patients were frozen and thawed more often than samples from controls. I find it difficult, however, to believe that this was a real problem, because serum retinol appears to be stable and not easily affected by storage or sample manipulation.
The third study did not demonstrate a statistically significant association between low serum retinol and cancer, but the difference observed, although small, was in the direction of an effect, and the SE of the difference was such that the result was not inconsistent with those from the other 2 studies. The difficulty in interpreting the 3 studies on serum retinol and cancer arises not from small numbers but from short follow-up in one and the possibility that differences in sample handling introduced some bias into the other.

G. Hutchison: A large amount of good information is available, but unfortunately it is not well presented in the literature. Dr. Petrakis' paper from this Workshop will be an important contribution to the literature on this subject.

N. Petrakis: I want to point out that blood levels of a specific biochemical substance may not always reflect its concentration or uptake at a specific tissue site. An example from our studies on the biochemistry and cytology of nipple aspirates of breast fluid illustrates this point. We found that serum cholesterol is not related to the cholesterol level in breast secretions. Breast fluid cholesterol increases progressively with age, whereas serum levels increase only slightly. In the 20- to 29-year age group, breast fluid cholesterol levels are similar to those in the blood; in the 30- to 39-year age group, the mean level is about 1,000 mg/dl; and in the 40- to 49-year and older age groups, the mean level is about 3,000 mg/dl. This finding on cholesterol levels has additional significance. High levels of breast fluid cholesterol are often associated with high levels of alpha- and beta-cholesterol epoxides. These substances have been reported to have DNA-damaging properties and to transform mammalian epithelial cells in tissue culture. However, epoxides were not detected in the blood.
Also, in similar studies of estrogens in blood and breast fluid, levels of the estrogens E1 and E2 averaged an order of magnitude higher in breast fluid than in blood. Schaffner reported that the prostate contains high levels of cholesterol and cholesterol epoxide compared with blood levels. These examples illustrate the need for physiologically based models for the proper interpretation of serum levels of any biochemical constituent in cancer etiology.

J. Higginson: Levels alone may be meaningless. Blood levels reflect metabolism, such as absorption, increased production, reduced excretion, etc. Each of these may have a different significance. Large blood banks, if properly operated, are very expensive. The International Agency for Research on Cancer collected blood samples, still in existence, from 45,000 children. They were collected for a specific purpose and used for that purpose. The bank is expensive to maintain, and its future value is uncertain. Who is going to decide what is to be done with it? Opinions are numerous. My advice is to start with an objective and to take aliquots right from the beginning. I do not believe in freezing serum and taking an aliquot later. If you are interested in metabolism and turnover, pooling material is a waste of time because you may be throwing away pertinent material.

Peto: No one is suggesting that you should pool the entire sample. When one is trying to decide which questions are pertinent and which are not, would not a useful guide be whether the difference between subjects and controls is substantial? Using a small aliquot of test material, you make pools of samples from cases and pools of various control samples. The pools are good for helping you sort out what is worth trying, i.e., what is a promising hypothesis and what is not.

Higginson: When you have defrosted an aliquot, it may be that the kind of difference you are going to look for will be subtle.
If someone had looked at pooled retinol in each of these studies, I think the differences would have been such that the studies would have had to be repeated.

N. Mantel: I can see some value in these biochemical studies if they are done in limited ways. I think trying to do too much with them may be the death knell for cohort and case-control studies. One example I have in mind is the Hirayama study of smoking. I should have liked to see it confirmed that women who claimed to be nonsmoking wives of smoking husbands were indeed nonsmokers, which could possibly have been verified by biochemical methods. For the kinds of results and situations that we have heard discussed relative to case-control studies, there may be some value in making a biochemical analysis now, but we also want to know what the status of these people was 20 years ago. We generally look for past exposures, not something you can find in the serum today, although the serum might confirm a past exposure. If we are going into cohort-type studies, I see no point in looking at a pool of material when we are going to study the outcomes for individuals later on. When it comes to storing the material, it may be that we can examine some pools of sera now and bank the rest. We are following these individuals in time. Let us look at a hypothetical cohort study. Somebody develops cancer and dies of it. We will pull his serum sample and study what it shows, but for a meaningful comparison, we also have to take samples from those who did not develop cancer. Next week someone else develops cancer, and we examine his serum or his pool and then those of other controls. I see impossible situations arising. Maybe with limited studies we can do something. I think that Mr. Peto’s position is strongly in favor of pools, even if he does not tell us exactly how to use them.

Peto: I think that this discussion is based on a misunderstanding of what I proposed as regards pooling.
The idea is that after a number of years, when a fair number of people have developed the disease of interest, you take the samples of sera or tissues from those who have it and you pool an aliquot of those samples. If you have taken 100,000 samples and about 200 of those people have developed lung cancer, you remove the samples of patients who died of lung cancer in the first year and keep them separate. You can also take out the samples of those who developed the disease or died more than 2 years after the samples were first taken. Having obtained the samples for each of these groups, you then take 0.2 ml of each sample and pool them. For controls, you could use randomly chosen controls, stratified by age, sex, and, if you like, vital status at certain points. In Finland, we have about a dozen controls, 6 of each age and sex. We pool a 0.2-ml aliquot from each of those controls, get control pools, and study the values of various substances in both the case and control pools. That leads us to decide which factors to investigate more fully.

Mantel: I can see more feasibility now in what you are proposing, but I would still have misgivings as to whether the volunteers of the American Cancer Society will be able to get the people whom they enroll to provide samples of material.

Wald: Mr. Peto’s suggestion has to be the correct one. Actually, we have no choice. If you have 2-ml samples of sera from cancer patients, you will soon find that your supply is exhausted after you have made 4, 5, or 6 determinations.

I. Selikoff: This discussion is valuable because we are freezing other materials; we can already freeze lymphocytes well, and, in our laboratory, we have retested them after 2 or 3 years and found them to be still viable and good. Most are maintained and, with the rapid progress in monoclonal antibodies, I think we are going to move rapidly in that direction as well.
Also, on a test basis, we tried a 1-mm punch biopsy of the skin to get some cells and found it highly acceptable. Here too I am sure this technique is going to be added to the list. The methodology is no great problem; you can do everything that Mr. Peto has said simply by dividing a 10-cc sample among 20 vials calibrated to 0.5 cc. In this way, you do not have to defrost the entire sample every time someone wants to do a study. With regard to the methodology, however, I would suggest that we focus on high-risk groups because here we can increase the yield sharply (at least double it) in most instances, and maybe stratify these groups.

PARTICIPANTS¹

Bernard Benjamin, Ph.D., D.Sc.
Tait Building, Room CM 514
Northampton Square
London EC1V 0HB
United Kingdom

John W. Berg, M.D.
Departments of Pathology and Preventive Medicine and Biometrics
University of Colorado School of Medicine
Denver, Colorado 80262

Norman Breslow, Ph.D.
Department of Biostatistics SC-32
University of Washington
Seattle, Washington 98195

Gerald R. Chase, Ph.D.
Manville Corporation
P. O. Box 5108
Denver, Colorado 80217

George W. Comstock, M.D., Dr. P.H.
Training Center for Public Health Research
Washington County Health Department
Box 2067
Hagerstown, Maryland 21742

Manning Feinleib, M.D.
National Center for Health Statistics
Center Building, Room 2-19
3700 East-West Highway
Hyattsville, Maryland 20782

Joseph F. Fraumeni, Jr., M.D.
Division of Cancer Etiology
National Cancer Institute
Landow Building, Room 4C03
Bethesda, Maryland 20205

Mr. Lawrence Garfinkel
Epidemiology and Statistics Department
American Cancer Society
4 West 35th Street
New York, N.Y. 10001

Leon Gordis, M.D.
Department of Epidemiology
School of Hygiene and Public Health
The Johns Hopkins University
615 North Wolfe Street
Baltimore, Maryland 21205

Mr. William Haenszel
Illinois Cancer Council
36 South Wabash Avenue, Suite 700
Chicago, Illinois 60603

E. Cuyler Hammond, Sc.D.
Epidemiology Training Program
Mt. Sinai Hospital
10 East 102d Street
New York, N.Y. 10029

John Higginson, M.D.
Universities Associated for Research and Education in Pathology, Inc.
9650 Rockville Pike
Bethesda, Maryland 20814

David G. Hoel, M.D.
Biometry and Risk Assessment Program
National Institute of Environmental Health Sciences
P. O. Box 12233
Research Triangle Park, North Carolina 27709

Geoffrey R. Howe, Ph.D.
National Cancer Institute of Canada Epidemiology Unit
Faculty of Medicine
McMurrich Building
University of Ontario
Toronto, Ontario M5S 1A8
Canada

George Hutchison, M.D.
Harvard School of Public Health
677 Huntington Avenue
Boston, Massachusetts 02115

Mr. Seymour Jablon
Commission on Life Sciences
National Research Council
2101 Constitution Avenue
Washington, D.C. 20418

Paul Kotin, M.D.
4505 South Yosemite, #339
Denver, Colorado 80237

Leonard Kurland, M.D.
Department of Medical Statistics and Epidemiology
Mayo Clinic
200 First Street, S.W.
Rochester, Minnesota 55905

Philip J. Landrigan, M.D.
Division of Surveillance, Hazard Evaluation, and Field Studies
National Institute for Occupational Safety and Health
4676 Columbia Parkway
Cincinnati, Ohio 45226

Mr. Edward A. Lew
Route 1, Box 745
Punta Gorda, Florida 33950

Robert A. Lew, Ph.D.
Department of Neurology
University of Massachusetts Medical Center
Worcester, Massachusetts 01605

Abraham Lilienfeld, M.D.
Department of Epidemiology
School of Hygiene and Public Health
The Johns Hopkins University
615 North Wolfe Street
Baltimore, Maryland 21205

Mr. Nathan Mantel
Department of Mathematics, Statistics and Computer Science
The American University
Washington, D.C. 20016

Miss Margaret Mushinski
American Cancer Society
Epidemiology and Statistics Department
4 West 35th Street
New York, N.Y. 10001

William J. Nicholson, Ph.D.
Environmental Sciences Laboratory
Mount Sinai School of Medicine
The City University of New York
New York, N.Y. 10029

Mr. Richard Peto
Regius Department of Medicine
Oxford University
Radcliffe Infirmary
Oxford OX2 6PS
United Kingdom

Nicholas L. Petrakis, M.D.
Department of Epidemiology and International Health
School of Medicine
University of California
San Francisco, California 94143

Carol K. Redmond, Ph.D.
Department of Biostatistics
Graduate School of Public Health
University of Pittsburgh
130 DeSoto Street
Pittsburgh, Pennsylvania 15261

Howard E. Rockette, Ph.D.
Department of Biostatistics
Graduate School of Public Health
University of Pittsburgh
130 DeSoto Street
Pittsburgh, Pennsylvania 15261

Miss Ruth Roeser
Health Insurance Plan of Greater New York
220 West 58th Street
New York, N.Y. 10019

David Schottenfeld, M.D.
Epidemiology and Preventive Medicine Service
Memorial Sloan-Kettering Cancer Center
1275 York Avenue
New York, N.Y. 10021

Mr. Herbert Seidman
Epidemiology and Statistics Department
American Cancer Society
4 West 35th Street
New York, N.Y. 10001

Irving Selikoff, M.D.
Environmental Sciences Laboratory
Mt. Sinai Hospital
10 East 102d Street
New York, N.Y. 10029

Mr. Sam Shapiro
Health Service Research and Development Center
School of Hygiene and Public Health
The Johns Hopkins University
Baltimore, Maryland 21205

Richard B. Singer, M.D.
RFD 1, Box 109
York, Maine 03909

Jeanne M. Stellman, Ph.D.
Division of Health Administration
School of Public Health
Columbia University
21 Audubon Avenue
New York, N.Y. 10032

Steven D. Stellman, Ph.D.
Epidemiology and Statistics Department
American Cancer Society
4 West 35th Street
New York, N.Y. 10001

Philip Strax, M.D.
Department of Community Medicine
New York Medical College
Valhalla, New York 10595

Louis Venet, M.D.
Department of Surgery
Beth Israel Medical Center
10 Nathan B. Perlman Place
New York, N.Y. 10003

Wanda Venet, B.S., R.N.
Health Insurance Plan of Greater New York
220 West 58th Street
New York, N.Y. 10019

Nicholas J. Wald, M.B.
Department of Environmental and Preventive Medicine
St. Bartholomew’s Hospital Medical College
Charterhouse Square
London EC1M 6BQ
United Kingdom

J. A. H. Waterhouse, M.B.
Regional Cancer Registry
Queen Elizabeth Medical Center
Birmingham B15 2TH
United Kingdom

Ernst Wynder, M.D.
American Health Foundation
320 East 43d Street
New York, N.Y. 10017

¹ Participants include authors, co-authors, and attendees who either chaired a session or participated in discussions of papers presented.

U.S. GOVERNMENT PRINTING OFFICE 1985 0-461-334