Public Health Service Publication No. 1000-Series 2-No. 13
For sale by the Superintendent of Documents, U.S. Government Printing Office
Washington, D.C., 20402 - Price 35 cents
NATIONAL CENTER FOR HEALTH STATISTICS | Series 2, Number 13
VITAL and HEALTH STATISTICS
DATA EVALUATION AND METHODS RESEARCH
Computer Simulation of
Hospital Discharges
Micro-simulation of measurement errors in hospital discharge
data reported in the Health Interview Survey.
Washington, D.C. February 1966
U.S. DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE
John W. Gardner, Secretary
Public Health Service
William H. Stewart, Surgeon General
NATIONAL CENTER FOR HEALTH STATISTICS
FORREST E. LINDER, Ph. D., Director
THEODORE D. WOOLSEY, Deputy Director
OSWALD K. SAGEN, Ph. D., Assistant Director
WALT R. SIMMONS, M.A., Statistical Advisor
ALICE M. WATERHOUSE, M.D., Medical Advisor
JAMES E. KELLY, D.D.S., Dental Advisor
LOUIS R. STOLCIS, M.A., Executive Officer
OFFICE OF HEALTH STATISTICS ANALYSIS
IWAO M. MORIYAMA, Ph. D., Chief
DIVISION OF VITAL STATISTICS
ROBERT D. GROVE, Ph. D., Chief
DIVISION OF HEALTH INTERVIEW STATISTICS
PHILIP S. LAWRENCE, Sc. D., Chief
DIVISION OF HEALTH RECORDS STATISTICS
MONROE G. SIRKEN, Ph. D., Chief
DIVISION OF HEALTH EXAMINATION STATISTICS
ARTHUR J. MCDOWELL, Chief
DIVISION OF DATA PROCESSING
SIDNEY BINDER, Chief
Library of Congress Catalog Card Number 65-62273
PREFACE
The purpose of the study described in this
report was two-fold: (1) the underlying consideration
was methodology, with emphasis on model
building and on experience to be gained in the use
of computer simulation techniques employed in
analysis of health statistics; and (2) the immedi-
ate target was a better understanding of the im-
pact of certain measurement deficiencies present
in health interview surveys.
The specific problems studied are set forth
in sections I and II of the report. The subject
matter is hospital discharges, and more espe-
cially the discrepancies between the number of
discharges as reported by household respondents
to interview and those that actually occur. The
Health Interview Survey of the National Center
for Health Statistics in its household inquiry includes
questions asking for the number and characteristics
of hospital discharges experienced by
household members in the year prior to interview.
There are many reasons for discrepancy
between the reported number of discharges and
the true number. Two of these causes have been
given particular attention. One is that hospital
experience during the reference period for per-
sons not living at the time of interview is not
reported in a survey of living persons. This de-
ficiency is relatively more important the longer
the reference period. A second principal cause of
discrepancy between reported and true data is the
response error in the report for a living person.
Empirical data and theory have indicated that this
error, too, increases with length of reference
period.
The interaction of these factors and their
impact on reported data have been explored pre-
viously in a variety of ways, using record-check
techniques, internal analysis of reported data,
and hypothetical models. This research has con-
tributed substantially to better knowledge of the
subject but has left several questions unanswered.
It seemed likely that understanding would be
further promoted, and especially that better
judgments could be made of the effect of changes
in interview procedure, if the process were to be
studied through a technique for simulating on a
computer the hospital experience of a model pop-
ulation of individual persons, and subsequently
simulating interviews of this population. Such an
undertaking might have particular merit since
the main threads of logic for the hospital problem
might have considerably wider potential applica-
tion—for example, a close analogy can be made
between periods of unemployment and hospital
episodes.
Accordingly, through a contractual arrange-
ment the present study was carried out by Re-
search Triangle Institute, Durham, N.C., in close
cooperation with staff members of the National
Center for Health Statistics. Dr. D. G. Horvitz
of the Research Triangle Institute was the pro-
ject director and principal author of this report.
He was assisted by Dr. D. T. Searls, formerly on
the Research Triangle Institute staff, and by
Irving Drutman (deceased) of North Carolina State
University. Mr. Drutman did most of the computer
programming. Other contributors to the study
were Mr. Joseph Snavely of the North Carolina
State University Computing Center and Mr.
Francis Giesbrecht of the Research Triangle
Institute, who developed appropriate expected
values and variances for the computer-generated
discharge rates. Walt R. Simmons prepared an
initial outline of the problem, proposed the simu-
lation approach, and coordinated contributions of
the staff of the Center to the project. Wilbur M.
Sartwell of the Center staff supervised much of the
computer calculation.
CONTENTS
Preface
I. Introduction
II. Project Objectives
III. Procedures
   Summary
   A Stochastic Model for Hospital Episodes
   Hospital Admissions Model
   Duration-of-Stay Model
   Computer Simulation of Hospital Episodes
   Interview Simulation Model
   Underreporting of Hospital Episodes
   Length-of-Stay Response Errors
   Month-of-Discharge Response Errors
   Computer Simulation of Interviews
   Simulation Estimates of Errors in Hospital Discharge Data
IV. Results
   Evaluation of Hospital Episodes Simulation
   Evaluation of Interview Simulation
   Estimates of Specific Error Components
   Methods for Increasing Accuracy
V. Conclusions
Detailed Tables
Appendix. Outline for Computer Simulation of Hospital Discharges
IN THIS REPORT a study is presented on computer micro-simulation
of discharges from short-stay hospitals, and on the associated measure-
ment errors that occur in household interview surveys, as set forth in
the preface. A synthetic universe of 10,000 persons was established with
demographic characteristics similar to those of the U.S. civilian, non-
institutional population. On the basis of earlier theoretical work and em-
pirical record-check studies, this universe was subjected to a series of
stochastic operations to simulate hospital experience, and the reporting
of that experience in household interviews.
Each individual person was moved from one state to another (e.g., from
not-in-a-hospital to in-a-hospital, or from in-a-hospital to discharged-alive)
by a random process with probabilities which varied by such factors
as age, sex, distance from death, number of days already in the
hospital, and a general health index. Thus it was possible to count the
simulated hospital discharges over a 12-month period, and to tabulate
them in a variety of ways.
At monthly intervals the living persons in the synthetic population then
were "interviewed" by the computer and reported their hospital experience
over the previous year. Two sets of simulated interview data were
tabulated. In one, respondents reported without error. For this set, comparison
with total experience reflected the impact on discharge statistics
of the missing data for persons not living at the time of interview. In
the other, response was conditioned by probabilities of reporting cor-
rectly, which varied by distance between interview and discharge, length
of stay, reason for hospitalization, and other less significant factors.
Comparison of this latter set of data with total experience gives a mech-
anism for studying a wide range of problems found in the interview data.
Throughout the study, emphasis was placed on the development and use
of a flexible method of analysis. The report is not an evaluation of the
reporting of hospital discharges in the Health Interview Survey.
SYMBOLS
Data not available                                   ---
Category not applicable                              ...
Quantity zero                                        -
Quantity more than 0 but less than 0.05              0.0
Figure does not meet standards of
  reliability or precision                           *
COMPUTER SIMULATION OF
HOSPITAL DISCHARGES
I. INTRODUCTION
The Health Interview Survey of the National
Center for Health Statistics provides estimates of
the number of discharges from hospitals on an
annual basis for the living, civilian, noninstitu-
tional population. The data are gathered in a
household interview survey by means of personal
interviews conducted each week, during a 52-
week period, in area probability samples of house-
holds throughout the United States. The informa-
tion on discharges (along with hospital utiliza-
tion) is obtained for each resident in the sample
households for a reference period of 12 months
prior to the week of interview.
There are some readily recognized factors
in the survey procedure which cause the number
of discharges reported by the respondents to dif-
fer from the actual number which occurred in
hospitals during the reference year. One important
factor is the failure of the respondents to
report correctly each hospital episode during the
reference year. A second factor is that the survey
covers only persons living on the date of interview.
The hospital experience of persons who
died in the year prior to interview is not included.
If the difference between reported discharges
and all discharges taking place during the ref-
erence year is examined on a weekly or monthly
basis, a definite decreasing trend or decay, mov-
ing backward in time from the date of interview,
of the number of discharges reported by the re-
spondents in the Health Interview Survey is ob-
served. Explanations for this decay curve include
the following factors.
1. Response errors.—Underreporting can be
expected to increase with increasing length of the
recall period. In other words, recent discharges
are more likely to be recalled and reported ac-
curately than discharges which occurred earlier
in the reference year.
2. Persons in their last year of life.—A
study of hospital utilization in the last year of
life reports that the "daily discharge rate per
1,000 deaths increases gradually from less than
1 during the twelfth month before death to about
3 on the day before death."¹ The Health Interview
Survey obtains information from persons
who will die in the year following the date of interview.
The discharges for these persons for the
reference year are more frequent for the period
immediately prior to the date of interview than
for earlier periods in the reference year, thus
contributing to the observed decay curve.
3. Population growth.—Only living persons
residing in the sample households on the date of
the interview are eligible for the survey. The
size of this population is probably at least 1.5
percent smaller 12 months prior to the date of
interview, since during this period there are
births and other additions to the household pop-
ulation such as returnees from mental and penal
institutions. During this same period, losses in
the household population occur, but these are not
recorded since they involve persons who died or
were institutionalized.
4. Hospital discharge trend.—A portion of the
observed trend may be a legitimate consequence
of natural phenomena related to the hospitaliza-
tion needs of the population. If there is an in-
creasing trend in hospital admission rates, then
the same trend will be present in the discharge
rates. Such a trend is not expected to be very
great during a period as short as 1 year.
Response errors in reported hospital dis-
charges have been studied by the Survey Research
Center, University of Michigan, in cooperation
with the Bureau of the Census and the National
Center for Health Statistics. The first study
employed a sample of individuals with known
hospitalization records.² These persons were
interviewed concerning their hospital experience,
and the results were compared with the records
obtained from hospitals. The comparisons con-
firmed that underreporting of hospitalization in-
creases with length of recall period. For dis-
charges occurring near the beginning of the 12-
month period prior to interview such underreport-
ing was particularly serious. The study estimated
underreporting of hospital episodes for the ref-
erence year to be 10 percent.
A second study compared three survey pro-
cedures for obtaining hospital episode data, in-
cluding the Health Interview Survey procedure
which was used as the standard.³ Reporting accuracy
was found to be significantly improved by
using a revised interview schedule with a mail
followup to obtain information concerning hospital
stays that had been overlooked in the interview.
With respect to decedents during the ref-
erence year, the Division of Vital Statistics of
the Center conducted a study of hospitalizations
during the last year of life from the records of a
sample of deaths in the Middle Atlantic States,
i.e., New York, New Jersey, and Pennsylvania.¹
The study estimated that the hospital discharges
reported in the Health Interview Survey for the
Middle Atlantic States needed to be adjusted up-
ward by approximately 8 percent to include the
experience of decedents. A similar study on a
national scale is now nearing completion.
The Health Interview Survey collects data
from a new sample of households each week.⁴
It is therefore possible to compare the hospital
discharges reported for a particular calendar
period by two or more of these weekly samples.
For example, consider the number of hospital
discharges reported for the month prior to inter-
view of each weekly sample and compare this
with the number of hospital discharges reported
for the same month by each sample interviewed
4 weeks later. The average discrepancy for the
paired weekly samples represents an estimate of
the combined effects of mortality and response
errors for the second month prior to interview.
Such factors as population growth or hospitaliza-
tion trends are not included in the observed dif-
ference.
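The paired-sample comparison described above can be illustrated with a short Python sketch. The function name and the rates in the usage note are hypothetical and are not Health Interview Survey figures. The sketch assumes each pair holds the discharge rate reported for a calendar month by the sample interviewed immediately afterward, and the rate reported for the same month by the sample interviewed 4 weeks later.

```python
from statistics import mean


def average_paired_discrepancy(recent_rates, later_rates):
    """Average shortfall in reported discharge rates between paired samples.

    recent_rates[i]: rate reported for a calendar month by the sample
                     interviewed just after that month (hypothetical input).
    later_rates[i]:  rate reported for the same month by the sample
                     interviewed 4 weeks later (hypothetical input).
    Returns (average difference, average difference relative to the
    mean of the recent rates).
    """
    if len(recent_rates) != len(later_rates):
        raise ValueError("each recent sample needs a paired later sample")
    differences = [r - l for r, l in zip(recent_rates, later_rates)]
    avg_diff = mean(differences)
    return avg_diff, avg_diff / mean(recent_rates)
```

For example, `average_paired_discrepancy([10.0, 12.0], [9.0, 11.0])` gives an average difference of 1.0 discharge per unit of population, about 9 percent of the average first-month rate.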
Analyses of this type have been carried out
with Health Interview Survey data to estimate the
relationship between underreporting (including
mortality and response errors) and the time interval
between discharge and date of interview.
Simmons and Bryant derived adjustment factors
based on these internal analyses by which hos-
pital discharges reported in the Health Interview
Survey need to be inflated according to the dis-
tance between discharge and interview to produce
an estimate of total hospital discharges, including
discharges for persons dying during the reference
year. Although so extensive an adjustment pro-
cedure has not been adopted, publication of hos-
pital discharges reported in the Health Interview
Survey is now based on data for the most recent
6 months of the reference year. The 12-month
reference period is retained in the interview.
While research has resulted in greater un-
derstanding and knowledge of the role played by
various factors affecting observed discrepancies,
this understanding and knowledge is still insuffi-
cient for specification of a completely satisfactory
procedure of data collection and estimation. Part
of this difficulty might be explained by the fact
that the major studies of response error and mortality
factors have been carried out independently.
An ideal research design might conduct a pro-
spective study on a large population sample for 1
year, observe (independently) the actual hospitali-
zation experience of this sample, and interview
those persons living at the end of the year. The
required data for a fuller understanding would
probably result from such a study. However, this
is not considered a feasible research project; it
might be impossible to carry it out satisfactorily.
An alternative research approach is to simulate
this prospective study on a computer. This
implies specifying a population to be followed over
time, with the initial state of each individual known,
such as age, whether or not in a hospital, and if
so, the number of days the individual has already
spent in a hospital. It also requires the specification
of the transition probabilities for each pair
of possible states for each time period (such as
a week), including mortality. The division of the
population into the various states for each time
period is then generated successively by means of
the transition probabilities. In this way the hos-
pital discharges can be counted for each time
period, including those of individuals discharged
dead as well as those of individuals who die in
subsequent time periods.
The household interview among living persons
in the generated population at the end of 1 year
can also be simulated. This simulation uses a
probability function relating failure to report hospital
episodes to the number of weeks between
discharge and interview. The simulated interview
data can then be compared with the generated hos-
pital discharge data and the distribution of the dis-
crepancy among the contributing factors deter-
mined for each time period.
The computer simulation approach was used
in this project.
II. PROJECT OBJECTIVES
The major purpose of this project was to
develop a research tool for comparison of alter-
native hospital episode interview survey proce-
dures. It was expected that the computer simula-
tion approach could lead to relatively inexpensive
evaluation of the effects of alternative procedures
and eventually to more efficient and accurate pro-
cedures for the continuous collection and estima-
tion of hospital discharge statistics.
Specific objectives of the project were:
1. To develop probability models for gener-
ating (a) hospital admissions and durations
of stay for a given population, and (b) in-
terview data on hospital episodes as col-
lected in the Health Interview Survey.
2. To determine suitable parameter inputs
for the models from existing data.
3. To program an IBM 1410 computer for
experimental simulation under the mod-
els.
4. To estimate, through computer simula-
tions, the specific effects of the various
factors related to the discrepancy between
hospital discharges reported in the inter-
view survey and all discharges.
5. To suggest, on the basis of the research
results, a method for continuous collec-
tion and adjustment of hospital discharge
data.
III. PROCEDURES
SUMMARY
The initial phase of this project was concerned
primarily with developing a probability model for
generating hospital episodes for individuals on a
computer. The model adopted assumes that each
individual in the population of interest has a par-
ticular probability of being hospitalized each week.
It further assumes that this weekly hospital ad-
mission probability remains constant for a given
individual over the time period of interest (pro-
vided he is not in his last year of life), but varies
from individual to individual. Based on empirical
studies of data available from the Health Inter-
view Survey and on theoretical considerations, it
was determined that the generalized gamma dis-
tribution provides a suitable and consistent model
for the distribution of the weekly admission prob-
abilities over the population. Once an individual
is hospitalized, the model provides for discharge
from the hospital on a daily probability basis
with the chance of discharge conditional on the
number of days already hospitalized. The log-
normal distribution was adopted as the duration-
of-stay model, following empirical analysis of
length-of-stay data available from the Health In-
terview Survey.
A computer program was developed in the
second phase of this project to generate hospital-
ization histories for each individual in a model
U.S. population. The weekly admission probabil-
ities and daily discharge probabilities employed
in the computer program were estimated for in-
dividuals in each of 12 age-sex groups consistent
with the hospital episodes model developed in the
first phase. In brief, the computer program gen-
erates uniform random numbers to compare with
the appropriate weekly hospital admission probability
for an individual during each week that the
individual is not hospitalized. When an individual
is hospitalized by the computer, it then generates
uniform random numbers to compare with the ap-
propriate daily discharge probabilities until the
individual is discharged. The computer records
the day of admission and day of discharge for
each hospital episode generated.
This basic computer program, with some
modifications, was carried out for an initial pop-
ulation of 10,000 individuals, distributed by age
and sex to represent the U.S. civilian, noninstitu-
tional population, for a period of 108 weeks or
756 days. The modifications included introducing
births and deaths in order to give a dynamic di-
mension to the population and using a separate
set of daily hospital admission probabilities for
individuals in their last year of life. These latter
probabilities increased gradually as the day of
death approached. Except for deliveries, reasons
for hospitalization were not assigned in the computer
simulation program. The computer determined
on a random basis those deliveries which
were to occur in a hospital.
In the third phase of the project a relatively
simple model was devised to simulate the re-
sponses obtained in household interviews for in-
dividuals experiencing one or more hospital epi-
sodes in the year prior to interview. For each
hospital episode, the model simulates on a prob-
ability basis failure to report the episode, reported
length of stay (if the episode is reported), and re-
ported month of discharge. The model treats re-
porting of each hospital episode as a random event
dependent on length of the recall period and length
of hospital stay for the episode. The distribution
of errors in reported length of stay is approxi-
mated in the model by a normal or Gaussian dis-
tribution. Response errors in the reported month
of discharge are simulated in the model by first
approximating errors in the reported date of ad-
mission by a normal distribution. The reported
length of stay is then added to the reported date
of admission to obtain the reported discharge
date.
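The interview response model summarized above can be sketched in Python as follows. This is only an illustration under stated assumptions: the reporting-probability function, its coefficients, and the two standard deviations are hypothetical placeholders, not the parameter values used in the study. The sketch shows the three steps in order: a random decision on whether the episode is reported, a normally distributed error in reported length of stay, and a normally distributed error in reported admission date, with the reported discharge date obtained by addition.

```python
import random


def report_probability(weeks_since_discharge, length_of_stay):
    """Hypothetical probability that an episode is reported at interview.

    Declines with length of recall and is lower for very short stays;
    the coefficients are illustrative assumptions only.
    """
    p = 0.98 - 0.004 * weeks_since_discharge
    if length_of_stay <= 1:
        p -= 0.10
    return max(0.0, min(1.0, p))


def simulate_report(admit_day, stay_days, interview_day, rng,
                    stay_sd=1.0, admit_sd=7.0):
    """Return the simulated interview report for one episode, or None
    if the episode goes unreported. stay_sd and admit_sd are assumed
    standard deviations (in days) of the two normal response errors."""
    discharge_day = admit_day + stay_days
    weeks = (interview_day - discharge_day) / 7.0
    if rng.random() >= report_probability(weeks, stay_days):
        return None  # episode not reported
    # Reported length of stay: true value plus a normal error, at least 1 day.
    reported_stay = max(1, round(stay_days + rng.gauss(0.0, stay_sd)))
    # Reported admission date: true value plus a normal error.
    reported_admit = round(admit_day + rng.gauss(0.0, admit_sd))
    # Reported discharge date follows from the two reported quantities.
    return {"admit": reported_admit,
            "stay": reported_stay,
            "discharge": reported_admit + reported_stay}
```

The reported month of discharge would then be read off the reported discharge day; an error in either reported quantity can move the episode into a neighboring month.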
A computer program to generate interview
results consistent with the interview simulation
model was developed in the fourth phase of the
project. The input data for this program con-
sisted of the 108 weeks of hospital episode data
generated by the first computer program together
with parameter values for the interview simula-
tion model. Estimates of the necessary param-
eters were based on evidence from exploratory
work which had been done in the National Center
for Health Statistics and especially on the results
obtained in the previously mentioned response
error study conducted by the Survey Research
Center, University of Michigan. This interview
simulation computer program was run for 13
separate interview dates 4 weeks apart beginning
with week 60 of the 108-week period for which
hospital episode data had been generated. The
results were tabulated in three separate categories
by the computer for each interview date.
These results included number of discharges and
number of hospital days, by sex, age, and each of
13 four-week periods prior to the interview date.
The three tabulation categories were "interview
reported" results for persons alive on the date of
interview, which include simulated interview reporting
errors; "perfect interview" results for
persons alive on the date of interview, which simulate
the results which would be obtained by the
household interviews if there were no response
errors of any kind; and "all discharges," which
consist of the actual results for all hospital episodes
generated by the first computer program
for the year prior to the interview date for all
persons, whether alive or dead on the interview
date.
The data generated by the computer for the
13 interview dates were averaged and estimates
of annual hospital discharge rates and annual hos-
pital days per 1,000 persons by age and sex were
derived for each of the three tabulation categories.
Using these results, both separate and combined
estimates of the effects of interview response
errors and of exclusion of persons who died dur-
ing the reference year on hospital discharge data
collected in the Health Interview Survey can be
derived.
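One way to separate the two effects from the three tabulation categories is sketched below in Python. The function name and the choice to express every component relative to the "all discharges" rate are assumptions made for illustration; the report may use a different base. The rates in the usage note are invented.

```python
def error_components(all_rate, perfect_rate, reported_rate):
    """Split the total shortfall in reported discharges into two parts.

    all_rate:      rate from the "all discharges" tabulation
    perfect_rate:  rate from the "perfect interview" tabulation
    reported_rate: rate from the "interview reported" tabulation
    Each component is a proportion of all_rate (an assumed convention).
    """
    if all_rate <= 0:
        raise ValueError("all_rate must be positive")
    return {
        # Experience lost because decedents are not interviewed.
        "mortality": (all_rate - perfect_rate) / all_rate,
        # Experience lost through response errors of living persons.
        "response": (perfect_rate - reported_rate) / all_rate,
        # Combined shortfall; equals the sum of the two parts.
        "combined": (all_rate - reported_rate) / all_rate,
    }
```

With invented rates of 100, 95, and 85 discharges per 1,000 persons, the mortality component is 5 percent, the response component 10 percent, and the combined shortfall 15 percent.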
A STOCHASTIC MODEL
FOR HOSPITAL EPISODES
Hospital Admissions Model
The model for hospital admissions was deter-
mined soon after the project was initiated. This
was primarily due to a fortunate exposure to re-
search on a mathematical model of an index of
health by Dr. Chin Long Chiang, University of
California at Berkeley.⁵ The hospital admissions
of an individual during a time interval of length
t can be treated as random events in time, that
is, as a stochastic process. A simplified model
assumes that the probability of the individual
being hospitalized during a small time interval
dt is given by λ dt, where λ is a positive
constant.ᵃ If it is further assumed that this probability
λ dt is independent of the number of previous
hospital admissions for the individual, then
the process is a Poisson process. It follows that
the probability of exactly x admissions of the individual
occurring during the time t is given by

$$P_x(t) = \frac{e^{-\lambda t} (\lambda t)^x}{x!}, \qquad x = 0, 1, 2, \ldots \tag{1}$$
If the time interval t is taken as 1 year (i.e.,
t = 1), then the probability density function for the
number of hospitalizations annually for the individual
is Poisson, where the parameter λ is the
expected number of hospital episodes during this
period.
Suppose now that the probability of being
hospitalized in a small time interval varies from
individual to individual in a population so that λ
varies over the population. If the distribution of
the λ's is gamma, then the distribution of the
population by number of hospital episodes yearly
is negative binomial, derived as follows.
ᵃMore rigorously, the probability of one or more hospital
admissions for an individual in the small interval dt is given
by λ dt + o(dt), where the term o(dt) denotes a
quantity which is of smaller order of magnitude than dt and
is the probability that more than one admission occurs.
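From equation (1) above, the distribution of
admissions annually for an individual with parameter
λ is

$$f(x \mid \lambda) = \frac{e^{-\lambda} \lambda^x}{x!}, \qquad x = 0, 1, 2, \ldots \tag{2}$$

For all individuals in the population, the distribution
of λ's is assumed to be a gamma distribution,
i.e.,

$$g(\lambda) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\beta\lambda}, \qquad \lambda > 0,\ \alpha > 0,\ \beta > 0 \tag{3}$$

Then the joint distribution of x and λ is

$$f(x, \lambda) = \frac{\beta^{\alpha}}{\Gamma(\alpha)\, x!} \lambda^{\alpha + x - 1} e^{-\lambda(\beta + 1)} \tag{4}$$

The distribution of the population by number of
hospital episodes annually, that is f(x), is found
by integrating equation (4) with respect to λ.
Thus,

$$f(x) = \int_0^{\infty} f(x \mid \lambda)\, g(\lambda)\, d\lambda = \frac{\Gamma(\alpha + x)}{x!\, \Gamma(\alpha)} \left(\frac{\beta}{1 + \beta}\right)^{\alpha} \left(\frac{1}{1 + \beta}\right)^{x}, \qquad x = 0, 1, 2, \ldots \tag{5}$$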
which is the negative binomial distribution.
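The mixing argument can be checked numerically. The Python sketch below is an illustration, not part of the original study: it evaluates the negative binomial probabilities that result from a gamma mixture of Poisson rates, with the gamma written in terms of a shape α and a rate β, and compares them with counts simulated by first drawing an individual's rate from the gamma distribution and then drawing that individual's number of episodes from a Poisson distribution. The function names and the parameter values in the usage note are assumptions.

```python
import math
import random


def neg_binomial_pmf(x, alpha, beta):
    """P(X = x) for a Poisson rate mixed over a gamma(shape=alpha, rate=beta)."""
    log_p = (math.lgamma(alpha + x) - math.lgamma(alpha) - math.lgamma(x + 1)
             + alpha * math.log(beta / (1.0 + beta))
             - x * math.log(1.0 + beta))
    return math.exp(log_p)


def poisson_draw(lam, rng):
    """Draw a Poisson variate by Knuth's multiplication method
    (adequate for the small rates considered here)."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_episode_counts(alpha, beta, n_persons, rng):
    """Tabulate persons by number of annual episodes under the mixture."""
    counts = {}
    for _ in range(n_persons):
        # random.gammavariate takes a scale parameter, i.e. 1 / rate.
        lam = rng.gammavariate(alpha, 1.0 / beta)
        x = poisson_draw(lam, rng)
        counts[x] = counts.get(x, 0) + 1
    return counts
```

With assumed values such as α = 0.5 and β = 4, the simulated proportion of persons with no episode should lie close to `neg_binomial_pmf(0, 0.5, 4.0)`, which is the square root of 0.8.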
Data available from the Health Interview Survey
for the period July 1958-June 1960 were used
to determine the goodness of fit of the negative
binomial distribution to the observed frequencies
of persons with 0, 1, 2, 3, and 4 or more hospital
episodes in the average year. A separate fit was
made for males and females in each of the follow-
ing six age groups: under 15 years, 15-24, 25-34,
35-44, 45-64, and 65 years and older. Each fit
was accomplished by estimating the parameters
α and β by the method of moments, that is, from
the relations

$$\bar{x} = \alpha / \beta$$
$$s^2 = \alpha (1 + \beta) / \beta^2$$

where x̄ and s² are the observed mean and variance,
respectively. The comparisons of the observed
and expected frequencies for the 12 age-sex
groups were considered to be fairly good.
While a satisfactory fit of the negative binomial
distribution is not sufficient evidence to claim
the model to be valid, it does indicate that the
model provides an excellent basis for generating
hospital episodes reasonably consistent with ob-
servation.
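The method-of-moments step can be written out explicitly. Setting the observed mean equal to α/β and the observed variance equal to α(1+β)/β² and solving gives β = mean / (variance - mean) and α = mean × β. The Python sketch below implements that solution; the function name is hypothetical, and the numbers in the usage note are invented rather than taken from the Health Interview Survey.

```python
def fit_negative_binomial_moments(sample_mean, sample_variance):
    """Solve mean = alpha/beta and variance = alpha*(1+beta)/beta**2.

    Returns (alpha, beta). The variance must exceed the mean; otherwise
    the data show no overdispersion and the negative binomial does not apply.
    """
    if sample_mean <= 0:
        raise ValueError("mean must be positive")
    if sample_variance <= sample_mean:
        raise ValueError("variance must exceed the mean")
    beta = sample_mean / (sample_variance - sample_mean)
    alpha = sample_mean * beta
    return alpha, beta
```

For instance, an invented mean of 0.125 episodes per person per year with variance 0.15625 gives α = 0.5 and β = 4.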
Duration-of-Stay Model
Once an individual is hospitalized, his length
of stay depends largely on the reason for the hos-
pitalization. Each diagnosis can be considered to
generate its own length-of-stay distribution; for
example, the length-of-stay distribution for ton-
sillectomies will be different from that for pneu-
monia cases. Since the overall length-of-stay
distribution is a mixture of many different dis-
tributions, it is not expected that any one distribu-
tion will fit well. For purposes of computer sim-
ulation, the distribution of duration of stay ob-
served in the Health Interview Survey could have
been used, except that the data had been grouped
into fairly large intervals, particularly for the
upper tail of the distribution. A smoothed distri-
bution was preferred.
In order to obtain some insight into an ap-
propriate theoretical distribution for duration of
stay, the conditional probabilities of discharge on
a particular day, given that the individual has been
hospitalized up to that day, were computed for the
July 1958-June 1960 Health Interview Survey data
for grouped periods on an average daily basis.
The rise and fall of these conditional probabil-
ities as duration of stay increased was charac-
teristic of the log-normal distribution. Accord-
ingly, this distribution was fitted to the available
duration-of-stay data separately within age and
sex groups. Since the agreement between these
expected and observed proportions was considered
satisfactory, the log-normal distribution was
adopted as the duration-of-stay model.
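The rise and fall of the conditional discharge probabilities under a log-normal model can be reproduced with a short Python sketch. The discretization is an assumption made here for illustration: a stay of j days is taken to mean a continuous duration between j-1 and j days. The parameters mu and sigma (mean and standard deviation of the logarithm of duration) are hypothetical, not the fitted values from the study.

```python
import math
from statistics import NormalDist


def lognormal_cdf(days, mu, sigma):
    """P(duration <= days) when log(duration) is normal(mu, sigma)."""
    if days <= 0:
        return 0.0
    return NormalDist(mu, sigma).cdf(math.log(days))


def daily_discharge_probabilities(mu, sigma, max_days):
    """Conditional probability of discharge on day j, given that the
    person is still hospitalized after j-1 days, for j = 1..max_days."""
    probabilities = []
    for j in range(1, max_days + 1):
        still_in = 1.0 - lognormal_cdf(j - 1, mu, sigma)
        leaves = lognormal_cdf(j, mu, sigma) - lognormal_cdf(j - 1, mu, sigma)
        probabilities.append(leaves / still_in if still_in > 0 else 1.0)
    return probabilities
```

With assumed values such as mu = log 5 and sigma = 0.9, the conditional probabilities start low, peak after a few days, and then decline slowly, which is the pattern described in the text.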
Computer Simulation of Hospital Episodes
The stochastic models for hospital admission
and duration of stay developed above suggest that
hospital episodes for the U.S. civilian, noninstitu-
tional population can be readily simulated on a
computer by means of a set of daily (or weekly)
transition probabilities for each individual. These
probabilities are assumed to remain constant over
time for an individual, at least for periods up to
2 years, but to vary from individual to individual.
On a given day, say i, an individual can be
in one of S+1 states. These states are:
   H   = not in hospital
   H_j = in hospital j days for a particular
         episode, j = 1, 2, ..., S.
For each state on day i, transition probabilities
are specified for the two eligible states for the
individual on day i+1. Thus, for individual k
in state H on day i:
   P_k      = the probability of being hospitalized
              on day i+1
   1 - P_k  = the probability of remaining out of
              the hospital on day i+1.
Similarly, for individual k in state H_j on day i:
   P_kj     = the probability of being discharged
              on day i+1 (i.e., going to state H)
   1 - P_kj = the probability of remaining in the
              hospital on day i+1 (i.e., going to
              state H_{j+1}).
In brief, then, by specification of S+1 probabilities
(P_{0k} and P_{jk}, j = 1, 2, ..., S) for individual
k, a computer can be programmed to generate a
hospitalization history for this individual during
a designated time period. If the individual is not
in the hospital initially, the computer generates
a uniform random number R_1 between zero and one
to compare with P_{0k}. If R_1 < P_{0k}, individual k is
hospitalized on the first day (i.e., transferred
from state H to state H_1). The computer then
generates a second uniform random number R_2 to
compare with P_{1k}. If R_2 < P_{1k}, individual k is
discharged on the second day; otherwise individual
k remains in the hospital for a second day and a
third uniform random number R_3 is generated for
comparison with P_{2k}, etc., until discharge occurs.
Following discharge, the next uniform random
number is again compared with P_{0k}. If the
initial random number R_1 > P_{0k}, individual k
remains in state H and R_2 is compared with P_{0k},
etc., until hospitalization occurs or the designated
time period is exhausted. The computer is
programmed to record the day of admission and the
day of discharge for each hospital episode generated.
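The generation procedure just described can be sketched in a few lines of Python. This is an illustrative sketch only, not the program used in the study: the function name, argument names, and the use of Python's random module are assumptions, and the last listed discharge probability is reused for any longer stay.

```python
import random

def simulate_history(p_admit, p_discharge, n_days, rng):
    """Generate (admission_day, discharge_day) pairs for one individual.

    p_admit     -- daily probability of admission while in state H
    p_discharge -- list; p_discharge[j-1] is the probability of discharge
                   on the next day for a person in state H_j
    n_days      -- length of the designated time period in days
    rng         -- a random.Random instance (source of uniform numbers)
    """
    episodes = []
    days_in_hospital = 0          # 0 stands for state H (not in hospital)
    admitted_on = None
    for day in range(1, n_days + 1):
        r = rng.random()          # uniform random number on [0, 1)
        if days_in_hospital == 0:
            if r < p_admit:       # transfer from H to H_1
                days_in_hospital = 1
                admitted_on = day
        else:
            # Reuse the last probability for stays longer than the list.
            j = min(days_in_hospital, len(p_discharge))
            if r < p_discharge[j - 1]:    # transfer from H_j to H
                episodes.append((admitted_on, day))
                days_in_hospital = 0
            else:                         # transfer from H_j to H_{j+1}
                days_in_hospital += 1
    return episodes

if __name__ == "__main__":
    # Hypothetical probabilities, chosen only to produce some episodes.
    print(simulate_history(0.01, [0.1, 0.2, 0.3], 756, random.Random(1)))
```

An episode still in progress when the period is exhausted is simply not recorded in this sketch.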
The probability of hospital admission (P_{0k})
was specified on a weekly basis rather than a
daily basis, except for individuals in their last
year of life. This change was necessary in order
to reduce computer time. If an individual was
admitted to the hospital in a given week, the
computer assigned the specific day of the week, and
hence the day of admission, by means of a random
sequence.
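A minimal sketch of the weekly admission step follows. The report says only that the day within the week was assigned "by means of a random sequence"; the equal chance for each of the seven days used here is an assumption, as are the function and argument names.

```python
import random

def weekly_admission_day(p_week, week_index, rng):
    """Return the admission day (counted from day 1) or None.

    p_week     -- weekly admission probability for the individual
    week_index -- 0 for the first week, 1 for the second, and so on
    rng        -- a random.Random instance
    """
    if rng.random() < p_week:
        # Assumed: each day of the week is equally likely.
        day_of_week = rng.randint(1, 7)
        return 7 * week_index + day_of_week
    return None

if __name__ == "__main__":
    print(weekly_admission_day(1.0, 0, random.Random(3)))
```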
The weekly admission probabilities were estimated
by first fitting a negative binomial distribution
to the distribution of the population by number
of hospital episodes annually, as observed in
the July 1958-June 1960 Health Interview Surveys,
for each of 12 age-sex groups. Delivery episodes
were excluded from the female age groups. The
α and β parameters estimated in the fitting process
for a particular age-sex group (table A) are
also, in accordance with the hospital admissions
model, the parameters of the gamma distribution
of λ (equation 3), where λ is the expected annual
number of hospital episodes for a given individual
in the group. While it would have been possible
to determine a λ for each individual in a group
by sampling the appropriate gamma distribution
at random, this was not considered necessary.
Rather, each of the 12 age-sex groups was divided
further into 10 equal subgroups. It was planned
initially to assign the first subgroup in each
age-sex group a value of λ corresponding to the 5%
point of the appropriate gamma distribution, the
second a λ corresponding to the 15% point, and
so on to the λ corresponding to the 95% point for
the 10th subgroup. Since the gamma distributions
of interest were highly skewed, the tables of the
incomplete gamma-function used to determine
these λ values were lacking in some detail.” The
tables are entered for arguments u and p where
    u = \beta\lambda / \sqrt{\alpha}
    p = \alpha - 1.
However, the tables did not give values of the
argument u below the 40th percentile in all cases
of interest and below the 50th percentile in a few
cases. Thus, the first four or five subgroups in
each age-sex group were assigned λ's corresponding
to the interpolated 20th percentile (or
25th percentile) values of u. The average value
of the assigned λ's in each age-sex group was
adjusted to the observed mean of the distribution
of hospital episodes annually by adjusting the λ
corresponding to the 95% point.
The constant weekly admission probability
P_{0k}, which applied to all individuals in a subgroup,
was obtained by dividing each assigned λ by 52.
These weekly admission probabilities for the 120
subgroups are given in table B. Each newborn
individual was assigned to one of the 10 subgroups
in the "under 15 years" age group of the same
sex.
Table A. α and β parameters of the negative binomial distributions fitted to the
distribution of the population in 12 age-sex groups by number of annual hospital
episodes
[See equation 5]

                           Male                 Female
Age                     α         β          α         β
Under 15 years       0.3097    4.9090     0.2432    4.7083
15-24 years          0.2369    3.7665     0.1398    1.4480
25-34 years          0.2824    4.2410     0.2290    1.7924
35-44 years          0.2834    3.6292     0.3901    3.3889
45-64 years          0.2833    2.6920     0.3622    3.2396
65+ years            0.3906    2.6129     0.3569    2.8701
Table B. Estimated weekly hospital admission rates per 1,000 persons not in their last
year of life, excluding deliveries, by age, sex, and 10 percent subgroups, and average
weekly and annual hospital admission rates for all subgroups combined (computer input
probabilities x 10^3)

                                            Age groups
Subgroup                    Under 15   15-24    25-34    35-44    45-64     65+
                             years     years    years    years    years    years
Male
 1                            0.146     0.233    0.273    0.318    0.430    0.548
 2                            0.146     0.233    0.273    0.318    0.430    0.548
 3                            0.146     0.233    0.273    0.318    0.430    0.548
 4                            0.146     0.233    0.273    0.318    0.430    0.548
 5                            0.219     0.233    0.273    0.318    0.430    0.822
 6                            0.439     0.350    0.409    0.477    0.644    1.370
 7                            0.768     0.700    0.793    0.927    1.250    2.284
 8                            1.382     1.325    1.202    1.404    1.894    3.655
 9                            2.522     2.574    2.788    3.257    4.395    6.076
10                           ¹5.549     6.074    6.144    7.177    9.685   12.336
Average weekly rate for
all subgroups combined       1.1463    1.2188   1.2701   1.4832   2.0018   2.8735
Average annual rate for
all subgroups combined       59.608    63.378   66.045   77.126  104.094  149.422
Female
 1                            0.187     0.327    0.344    0.422    0.332    0.375
 2                            0.187     0.327    0.344    0.422    0.332    0.375
 3                            0.187     0.327    0.344    0.422    0.332    0.375
 4                            0.187     0.327    0.344    0.422    0.332    0.375
 5                            0.187     0.327    0.344    0.633    0.499    0.563
 6                            0.280     0.327    0.516    1.055    0.890    1.005
 7                            0.560     0.491    1.238    1.759    1.521    1.729
 8                            1.060     1.325    2.475    2.814    2.564    2.895
 9                            2.061     3.435    5.054    4.678    4.452    5.025
10                            4.904    11.354   13.709    9.497   10.102   11.404
Average weekly rate for
all subgroups combined       0.9800    1.8567   2.4712   2.2124   2.1366   2.4121
Average annual rate for
all subgroups combined       50.960    96.548  128.502  115.045  111.103  125.429

¹This rate was incorrectly computed. The correct value is 6.227. The error was
not discovered until after the computer runs. The expected annual rate for the
computer generated episodes would have been raised from 59.6 to 63.1 per 1,000
persons by use of the correct value.
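The arithmetic behind the averages in table B, and behind the footnoted correction, can be checked with a short sketch. The entries for male subgroups 9 and 10 under 15 years are read here as 2.522 and 5.549, the readings that reproduce the printed average; the variable and function names are illustrative assumptions.

```python
# Weekly admission rates per 1,000 persons for the ten male
# "under 15 years" subgroups, as read from table B.
male_under_15 = [0.146, 0.146, 0.146, 0.146, 0.219,
                 0.439, 0.768, 1.382, 2.522, 5.549]

def average_rates(weekly_rates):
    """Return (average weekly rate, average annual rate = 52 x weekly)."""
    weekly = sum(weekly_rates) / len(weekly_rates)
    return weekly, 52 * weekly

# Rates as actually used in the computer runs.
weekly_used, annual_used = average_rates(male_under_15)

# Rates with the footnoted correct value for subgroup 10.
corrected = male_under_15[:-1] + [6.227]
weekly_fixed, annual_fixed = average_rates(corrected)

print(round(weekly_used, 4), round(annual_used, 1))    # 1.1463 59.6
print(round(weekly_fixed, 4), round(annual_fixed, 1))  # 1.2141 63.1
```

The second line of output agrees with the footnote: the corrected value would have raised the expected annual rate from 59.6 to 63.1 per 1,000 persons.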
A slightly different model was used to generate
the hospital histories of persons in their last
year of life. Prior to generating a random number
to determine if an individual would be hospitalized
in the week of interest, the computer first
checked whether or not the individual had entered
his last year of life. If so, the computer changed
to a set of daily probabilities of being hospitalized
which increased gradually as the day of death
approached. These probabilities were estimated
from data on hospital utilization during selected
time periods prior to death reported in the Middle
Atlantic States study.¹ First, rough estimates of
admission rates per 1,000 deaths and number of
persons per 1,000 deaths not in the hospital as a
function of the time period prior to death were
derived from changes (first differences) in the
nights of care rates and from the discharge rates.
The ratio of these two quantities provided the
required estimates of daily admission probabilities
as a function of days to death. These estimates,
shown in table C, were then plotted and the function
smoothed graphically. The smoothed function
provided 365 admission probabilities, one for
each day in the last year of life.

Table C. Estimated daily hospital admission probabilities for persons in their last
year of life as a function of time period to death

                          Daily           Persons not        Daily
Period prior            admissions        in hospital       admission
to death                per 1,000         per 1,000         probabilities¹
                         deaths            deaths
1 and 2 days              41.8              674.9            0.061935
2 and 3 days              30.4              702.3            0.043286
3 and 4 days              31.8              731.1            0.043496
4 and 5 days              18.8              746.9            0.025171
5 and 6 days              23.1              767.0            0.030117
6 and 7 days              27.5              791.5            0.034744
1 and 2 weeks              7.3              813.5            0.008919
2 and 3 weeks              6.2              845.9            0.007329
3 and 4 weeks              7.0              880.2            0.007953
1 and 2 months             3.1              915.2            0.003387
2 and 3 months             2.6              949.3            0.002739
3 and 4 months             1.8              963.9            0.001867
4 and 5 months             1.1              967.1            0.001137
5 and 6 months             1.3              977.4            0.001330
6-12 months                0.65             985.1            0.000660

¹Ratio of first to second column.

Table D. Probability of birth occurring in a hospital, by age of mother, 15-44
years

                    Total annual       Annual births
Age of mother       births per         in hospital         Probability
                    1,000 females,     per 1,000           of delivery
                    1960               females             in hospital
15-24 years           166.32             135.86             0.816859
25-34 years           152.86             145.79             0.953749
35-44 years            36.60              31.40             0.857923
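The ratio construction used for table C can be illustrated with a short sketch. It uses an unsmoothed step function in place of the graphically smoothed curve of the study, which is an assumption of the sketch, as are the 30-day months used for the period boundaries and all names. Because the printed rates are rounded, a few recomputed ratios (for example, 1 and 2 weeks) differ slightly from the printed probabilities.

```python
import bisect

# Rows read from table C: (end of period in days before death,
# daily admissions per 1,000 deaths, persons not in hospital per
# 1,000 deaths). Month boundaries use 30-day months (an assumption).
TABLE_C = [
    (2, 41.8, 674.9), (3, 30.4, 702.3), (4, 31.8, 731.1),
    (5, 18.8, 746.9), (6, 23.1, 767.0), (7, 27.5, 791.5),
    (14, 7.3, 813.5), (21, 6.2, 845.9), (28, 7.0, 880.2),
    (60, 3.1, 915.2), (90, 2.6, 949.3), (120, 1.8, 963.9),
    (150, 1.1, 967.1), (180, 1.3, 977.4), (365, 0.65, 985.1),
]
UPPER_BOUNDS = [row[0] for row in TABLE_C]

def daily_admission_probability(days_to_death):
    """Unsmoothed estimate: admissions divided by persons not in hospital."""
    index = bisect.bisect_left(UPPER_BOUNDS, days_to_death)
    index = min(index, len(TABLE_C) - 1)   # beyond 365 days, use last row
    _, admissions, not_in_hospital = TABLE_C[index]
    return admissions / not_in_hospital

if __name__ == "__main__":
    for days in (2, 5, 45, 300):
        print(days, round(daily_admission_probability(days), 6))
```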
Except for deliveries, reasons for hospital-
ization were not assigned in the simulation pro-
gram. Females with delivery dates less than 31
days away from the day of interest were not ad-
mitted to hospital during this period. On the as-
signed delivery dates, the computer determined
on a random basis which deliveries were to oc-
cur in hospitals. The probability of a delivery
taking place in a hospital was estimated for three
age groups of mothers by dividing the number of
births in hospitals per 1,000 females® by the rate
for all births. These probabilities are shown in
table D.
The log-normal distribution

    f(t) = \frac{1}{\sigma t \sqrt{2\pi}} \exp\left[-\frac{(\ln t - \mu)^2}{2\sigma^2}\right], \quad t \ge 0        (6)

was fitted to the observed distribution of length
of hospital stay (excluding deliveries) for each of
the 12 age-sex groups using unpublished Health
Interview Survey data for the period July 1958-
June 1960. The parameters, μ and σ, in f(t)
were estimated from the equations

    \bar{x} = e^{\mu + \sigma^2/2}
    s^2 / \bar{x}^2 = e^{\sigma^2} - 1,

where x̄ and s² are the mean and variance of the
observed duration-of-stay distribution. The
conditional probabilities (P_{tk}) of discharge on day
t, given that the individual had been hospitalized
for the previous t-1 days, were then estimated
from the fitted log-normal duration-of-stay
distributions. The computer program limited length
of stay to a maximum of 100 days so that P_{100,k}
was set equal to .999999.
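The two estimation steps above (moment estimates of μ and σ, then conditional discharge probabilities) can be sketched as follows. The report does not say how the continuous log-normal was converted to daily probabilities; this sketch assumes that day t corresponds to the interval from t-1 to t, and the function names are illustrative.

```python
import math

def lognormal_params(mean, variance):
    """Moment estimates of mu and sigma from the observed mean and variance."""
    sigma2 = math.log(1.0 + variance / mean ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)

def lognormal_cdf(t, mu, sigma):
    """Cumulative log-normal distribution function F(t)."""
    if t <= 0:
        return 0.0
    z = (math.log(t) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

def discharge_probabilities(mu, sigma, max_stay=100):
    """Probability of discharge on day t given t-1 days in hospital.

    Assumed discretization: [F(t) - F(t-1)] / [1 - F(t-1)].
    """
    probs = []
    for t in range(1, max_stay):
        still_in = 1.0 - lognormal_cdf(t - 1, mu, sigma)
        leaving = lognormal_cdf(t, mu, sigma) - lognormal_cdf(t - 1, mu, sigma)
        probs.append(leaving / still_in if still_in > 0.0 else 1.0)
    probs.append(0.999999)   # forced discharge at the maximum stay
    return probs

if __name__ == "__main__":
    # mu and sigma for males under 15 years, from table E.
    first_week = discharge_probabilities(1.22, 1.12)[:7]
    print([round(p, 3) for p in first_week])
```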
Separate sets of discharge probabilities were
estimated for females 15-24, 25-34, and 35-44
years of age hospitalized for deliveries. The es-
timates were derived in the same manner as dis-
cussed above, using unpublished length-of-stay
data for deliveries obtained from the Health In-
terview Survey, July 1958-June 1960. Length of
stay was limited to a maximum of 21 days for fe-
males 15-24 years, 24 days for females 25-34
years, and 30 days for females 35-44 years.
Duration-of-stay distributions were not
available for persons in their last year of life.
However, average length-of-stay estimates by
sex in age classes under 45, 45-64, and 65 years
and over were obtained from the study of hospital
utilization by decedents in the Middle Atlantic
States.! The variances of the duration-of-stay
distributions for these age-sex classes were
imputed by using the relationship observed between
s² and x̄ for these distributions among persons
not in their last year of life. Thus, estimates of
the conditional discharge probabilities were de-
rived as above with length of stay limited to a
maximum of 100 days.
The estimates of the parameters μ and σ for
the log-normal fit of the duration-of-stay
distributions in each of the above cases are given in
table E.
The computer operations for generating
hospitalization histories for persons not in their last
year of life (Phase I) and for persons in their last
year of life (Phase II) are given in detail in the
Appendix.
The basic computer program, with modifica-
tions as discussed below, was carried out for an
initial population of 10,000 individuals for 108
weeks or 756 days. This population was distributed
by age and sex to represent the U.S. civilian,
noninstitutional population.
The initial population was given a dynamic
dimension by introducing births and deaths. The
births were distributed over a 2-year period
according to 1960 monthly birth rates and then
assigned specific days within months at random. A
total of 237 births (121 male and 116 female) were
assigned the first year and 240 (123 male and 117
female) the second year. Coinciding with the birth
dates, deliveries were assigned to females in the
15-24, 25-34, and 35-44 years of age groups.
A simple three-digit code was used to record
dates on the computer, with the first day of the
108-week period coded 001. The first 26 days of
the hospital episodes simulation program were
utilized to establish the appropriate initial
distribution of the population over the states H and
H_j. This was necessary since all individuals
were in state H (i.e., not in hospital) on day 001.
An alternative procedure would have required
assignment of about 22 individuals to the hospital
states H_j on day 001. Since the average length of
stay in short-term hospitals is approximately 8
days and less than 10 percent of the episodes
exceed 15 days, allowing the computer 26 days to
establish an equilibrium distribution over the
states H and H_j is considered adequate. There
were no additions to the population from births
assigned prior to day 027. Hospitalization
histories for newborn infants were generated by the
computer only for the days following birth.
In order to introduce appropriate hospital
admission rates for individuals entering their
last year of life, death dates were assigned by
age and sex covering a 3-year period. A total
of 93 deaths were assigned in the first year, 94
in the second, and 89 in the third. As with the
birth dates, these were distributed first accord-
ing to 1960 monthly death rates and then were
assigned specific days within months at random.
The third year death dates were necessary since
individuals scheduled to die in that year enter
their last year of life sometime during the second year.
Table E. Estimates of the parameters μ and σ for log-normal distributions fitted to
duration-of-stay distributions, by sex and age

Persons not in their last year of life

                                              Female
Age                    Male          Deliveries       Deliveries
                                      excluded           only
                     μ      σ        μ      σ        μ      σ
Under 15 years     1.22   1.12     1.16   1.15      ...    ...
15-24 years        1.51   1.10     1.19   1.01     1.32   0.47
25-34 years        1.63   1.02     1.46   0.94     1.33   0.53
35-44 years        1.74   1.00     1.65   0.90     1.37   0.69
45-64 years        2.08   0.93     1.94   0.9       ...    ...
65+ years          2.30   0.89     2.33   0.85      ...    ...

Persons in their last year of life

Age                    Male            Female
                     μ      σ        μ      σ
Under 45 years     2.21   0.90     1.96   0.93
45-64 years        2.53   0.82     2.94   0.70
65+ years          2.64   0.80     2.36   0.84
A four-digit number was used to code the day of
death for computer purposes; all individuals not
in their last year of life at the end of the second
year were assigned 9999 as their day of death.
No deaths were assigned prior to day 0027.
INTERVIEW SIMULATION MODEL
A relatively simple model was devised for
simulating the responses obtained in interviews
with individuals experiencing one or more hos-
pital episodes during the 12 months prior to the
date of interview. For each hospital episode, the
model simulates on a probability basis failure to
report the episode, reported length of stay (if the
episode is reported), and reported month of dis-
charge.
Underreporting of Hospital Episodes
The response error study by the Survey Re-
search Center, University of Michigan, reported
three major factors related to underreporting of
hospital episodes.? It was found that underreport-
ing increases with increasing time between dis-
charge and interview, decreases with increasing
length of stay, and increases for personally em-
barrassing or threatening types of illness. Only
the first two factors are included in the interview
simulation model. The Michigan study reported
percent underreporting by number of weeks be-
tween hospital discharge and interview for three
length-of-stay groups.? The Center also had produced,
through internal analysis of reported data,
rough distributions of underreporting by number
of weeks between discharge and interview for
four length-of-stay classes. After study of data
from these sources, smooth curves were fitted
for each of the length-of-stay groups, and estimates
of underreporting rates for hospital episodes
as a function of the time interval between
discharge and interview (in 4-week periods) were
obtained for the model. The model treats reporting
of each hospital episode as a random event
dependent on length of the recall period and length
of the hospital stay for the episode.

Table F. Probability of failure to report hospital episodes, by length of stay and
number of weeks between discharge and interview, and average probability of
failure

Weeks between discharge            Length of stay
and interview                  1 day    2-4 days   5+ days

                               Nondelivery episodes
1-4 weeks                       0.07      0.04      0.01
5-8 weeks                       0.13      0.05      0.02
9-12 weeks                      0.18      0.06      0.04
13-16 weeks                     0.22      0.07      0.05
17-20 weeks                     0.24      0.08      0.06
21-24 weeks                     0.26      0.09      0.07
25-28 weeks                     0.28      0.11      0.08
29-32 weeks                     0.29      0.14      0.09
33-36 weeks                     0.30      0.18      0.09
37-40 weeks                     0.30      0.22      0.10
41-44 weeks                     0.31      0.27      0.10
45-48 weeks                     0.32      0.33      0.11
49-52 weeks                     0.32      0.39      0.46
53-56 weeks                     0.32      0.39      0.46
57-60 weeks                     0.32      0.39      0.46
Average probability
of failure                      0.257     0.187     0.147

                               Delivery episodes
1-4 weeks                       0.00      0.00      0.00
5-8 weeks                       0.00      0.00      0.00
9-12 weeks                      0.01      0.01      0.00
13-16 weeks                     0.01      0.01      0.00
17-20 weeks                     0.02      0.02      0.01
21-24 weeks                     0.02      0.02      0.01
25-28 weeks                     0.03      0.03      0.03
29-32 weeks                     0.03      0.03      0.03
33-36 weeks                     0.03      0.03      0.03
37-40 weeks                     0.04      0.04      0.04
41-44 weeks                     0.05      0.04      0.04
45-48 weeks                     0.05      0.05      0.05
49-52 weeks                     0.06      0.05      0.05
53-56 weeks                     0.07      0.06      0.06
57-60 weeks                     0.07      0.06      0.06
Average probability
of failure                      0.033     0.030     0.027
These estimated underreporting rates were
used for nondelivery episodes only. Since the data
upon which they were based included all episodes,
these estimates are slightly optimistic. The re-
sponse error study mentioned above found only
3 percent underreporting of deliveries, whereas
the average underreporting for all diagnoses was
10 percent. A separate set of underreporting rates,
averaging 3 percent, was constructed for delivery
episodes. These were also made dependent on
length of recall period and length of hospital stay.
The estimated rates of underreporting of non-
delivery and delivery episodes were treated as
probabilities in the computer simulation. They
are shown in table F for 15 four-week periods
prior to interview. The last two intervals (53-56
weeks and 57-60 weeks) were included to allow
for overreporting of episodes occurring more than
12 months prior to interview. These were in-
cluded in the model by telescoping forward, again
on a probability basis as discussed below, epi-
sodes reported by the respondent with actual dis-
charge dates in the 14th or 15th 4-week periods
prior to interview. The same underreporting rates
were used for these latter two periods as were
estimated for weeks 49-52 (the 13th 4-week
period).
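The use of table F as a set of probabilities can be sketched as a simple lookup. The values below are read from table F; the conversion from days since discharge to a 4-week period (days 1-28 as the first period, and so on) is an assumption of this sketch, as are the names.

```python
# Failure-to-report probabilities from table F, one row per 4-week
# period before interview; columns are stays of 1, 2-4, and 5+ days.
NONDELIVERY = [
    (0.07, 0.04, 0.01), (0.13, 0.05, 0.02), (0.18, 0.06, 0.04),
    (0.22, 0.07, 0.05), (0.24, 0.08, 0.06), (0.26, 0.09, 0.07),
    (0.28, 0.11, 0.08), (0.29, 0.14, 0.09), (0.30, 0.18, 0.09),
    (0.30, 0.22, 0.10), (0.31, 0.27, 0.10), (0.32, 0.33, 0.11),
    (0.32, 0.39, 0.46), (0.32, 0.39, 0.46), (0.32, 0.39, 0.46),
]
DELIVERY = [
    (0.00, 0.00, 0.00), (0.00, 0.00, 0.00), (0.01, 0.01, 0.00),
    (0.01, 0.01, 0.00), (0.02, 0.02, 0.01), (0.02, 0.02, 0.01),
    (0.03, 0.03, 0.03), (0.03, 0.03, 0.03), (0.03, 0.03, 0.03),
    (0.04, 0.04, 0.04), (0.05, 0.04, 0.04), (0.05, 0.05, 0.05),
    (0.06, 0.05, 0.05), (0.07, 0.06, 0.06), (0.07, 0.06, 0.06),
]

def failure_probability(days_since_discharge, length_of_stay, delivery=False):
    """Probability that a completed episode is not reported."""
    # Assumed mapping: days 1-28 -> period 0, days 29-56 -> period 1, ...
    period = min(max(days_since_discharge - 1, 0) // 28, 14)
    if length_of_stay <= 1:
        column = 0
    elif length_of_stay <= 4:
        column = 1
    else:
        column = 2
    table = DELIVERY if delivery else NONDELIVERY
    return table[period][column]

if __name__ == "__main__":
    print(failure_probability(10, 1), failure_probability(350, 6))
```

Averaging each column over the 15 periods reproduces the "average probability of failure" row of the table.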
Length-of-Stay Response Errors
The Michigan study found the average length
of stay reported in household interviews to be
slightly greater than the average calculated from
hospital records.? One explanation given for this
is that underreporting is more likely for short-
stay episodes than for longer episodes, so that
the average of reported episodes has an upward
bias. Thus, it is quite possible that duration-of-
stay response errors are symmetrically distrib-
uted about zero. The model for interview sim-
ulation in this study made use of this hypothesis,
but also introduced a slight positive shift in the
mean of the distribution of reporting errors in
length of hospital stay.
The model approximates the distribution of
length-of-stay response errors by a normal or
Gaussian distribution with a mean error of zero
in an expected 95 percent of the responses and a
mean error of 2 days in the remaining 5 percent.
Thus, the overall distribution of errors is assumed
normal with mean equal to 0.05 x 2.0 or
0.1 day. Unit variance was assigned these normal
error distributions; this is considered a conservative
value for this parameter.
A reported length of stay for a given episode
is generated in two steps according to this model.
First, a uniform random number between zero and
one is compared with 0.05. If it is less than 0.05,
2 days are added to the actual length of stay;
otherwise the actual length of stay is left un-
changed. Second, a random normal deviate is
generated and added to either the adjusted length
of stay or the actual length of stay, depending on
the previous comparison of the random number
with 0.05. The resulting length of stay in days is
accepted as the reported duration of stay.
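The two-step generation of a reported length of stay can be sketched as below. The function name and the use of Python's random module are assumptions; no rounding is applied, since the report does not describe any.

```python
import random

def reported_length_of_stay(actual_days, rng):
    """Return a simulated reported length of stay in days.

    Step 1: with probability 0.05, add 2 days to the actual stay.
    Step 2: add a random normal deviate with mean 0 and unit variance.
    """
    shift = 2.0 if rng.random() < 0.05 else 0.0
    return actual_days + shift + rng.gauss(0.0, 1.0)

if __name__ == "__main__":
    rng = random.Random(7)
    print([round(reported_length_of_stay(8, rng), 1) for _ in range(5)])
```

Over many episodes the average error approaches 0.05 x 2.0 = 0.1 day, the overall mean stated in the text.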
Month-of-Discharge Response Errors
The first Michigan study found that for 82
percent of the episodes, the respondent correctly
reported the month of admission; about 11 percent
were reported 1 or more months later than shown
in the hospital records, and 7 percent were ear-
lier by 1 or more months. ? The later study, com-
paring three alternative hospitalization survey
procedures, showed 14 percent reported the month
of discharge later, 9 percent earlier, and 77 per-
cent correctly, using the Health Interview Survey
procedure.® The month of discharge is calculated
by use of the reported admission date and the
reported length of hospitalization. The evidence
in these two studies indicates a greater tendency
to telescope the hospital episode forward rather
than backward in time, although the shift is a
modest one. The bulk of the inaccurate reports
were within plus or minus 1 month of the correct month.
The model adopted for simulation of response
errors leading to incorrect classification of the
month of discharge also approximates errors in
the date of admission by a normal distribution.
As with the length-of-stay response errors, this
distribution is a weighted combination of two nor-
mal distributions, the first with mean zero to apply
in an expected 95 percent of the episodes and the
second with a mean of 10 days applicable to the
remaining 5 percent. The overall error distri-
bution has mean equal to 0.05 x 10 or 0.5 days.
The variance assigned these distributions depend-
ed on the number of weeks between date of inter-
view and date of admission. This interval was di-
vided into 4-week periods and the assigned stand-
ard deviation was set equal to 0.4 times the num-
ber of 4-week periods in the interval. Thus, the
model permits larger errors in reported date of
admission with increasing length of recall peri-
od. As with the length-of-stay model, these pa-
rameters are considered conservative.
A reported month of discharge for a given
episode is generated in three steps. In the first
step a uniform random number between zero and
one is compared with 0.05. If it is less than 0.05,
10 days are added to the actual admission date;
otherwise the actual admission date is left un-
changed. In the second step, a random normal
deviate is generated and multiplied by a standard
deviation σ depending on the number of weeks
between the interview date and the date of
admission. This product is added to either the
adjusted admission date or the actual admission
date, depending on the prior comparison of the
random number with 0.05. In the third step, the
reported length of stay is added to the adjusted
admission date obtained in step two to yield the
reported discharge date and hence the reported
month of discharge.
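The three steps can be sketched as follows. The report sets the standard deviation to 0.4 times the number of 4-week periods between admission and interview; counting a partial period as a whole one (rounding up) is an assumption of this sketch, as are the function names.

```python
import math
import random

def recall_sigma(interview_day, admission_day):
    """Standard deviation (days) of the admission-date error."""
    # Assumed: a partial 4-week period counts as a full period.
    periods = math.ceil((interview_day - admission_day) / 28)
    return 0.4 * max(periods, 1)

def reported_discharge_day(admission_day, reported_stay, interview_day, rng):
    """Return the simulated reported discharge day for one episode."""
    # Step 1: with probability 0.05, shift admission forward 10 days.
    shift = 10.0 if rng.random() < 0.05 else 0.0
    # Step 2: add a normal deviate scaled by the recall-dependent sigma.
    sigma = recall_sigma(interview_day, admission_day)
    reported_admission = admission_day + shift + sigma * rng.gauss(0.0, 1.0)
    # Step 3: add the reported length of stay.
    return reported_admission + reported_stay

if __name__ == "__main__":
    print(recall_sigma(500, 360))   # 140 days = 5 periods, so sigma = 2.0
    print(reported_discharge_day(360, 6.0, 500, random.Random(2)))
```

The reported month of discharge then follows from the calendar month containing the returned day.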
Computer Simulation of Interviews
The output of each computer-generated
hospitalization includes the day admitted, whether the
episode was for a delivery or not, and the day
discharged. The output also includes the age, sex,
and day of death for each individual experiencing
one or more episodes during the 108 weeks of
interest. These data make up the input for
computer simulation of interviews on a specified
interview date. The basic steps in the computer
program for this simulation are outlined below.
1. The death date for each individual is
   compared with the interview date to
   determine if the individual is alive and
   hence eligible for interview. If the
   individual has died the computer proceeds
   to the next individual.
2. If the individual is alive on the
   interview date, the computer determines
   whether the admission date for the first
   episode occurred prior to the interview
   date. If not, the next episode is examined.
3. If the admission date is earlier than the
   interview date, the discharge date for the
   episode is checked to determine if it is
   a completed episode. If not, the computer
   records an incomplete episode and
   proceeds to the next episode.
4. If the episode is completed prior to the
   interview date, the number of days
   between interview and discharge is
   computed to determine if discharge occurred
   more than 420 days prior. If so, the
   computer proceeds to the next episode.
5. If the episode is completed less than
   420 days prior to the interview date, a
   uniform random number is generated and
   compared with the appropriate probability
   of failure to report the episode (based
   on the number of weeks between
   interview and discharge dates, length of stay,
   and reason for hospitalization as shown
   in table F). If the generated random
   number is less than this probability, the
   episode is recorded as nonrecalled and the
   computer proceeds to the next episode.
6. If the episode is recalled, a second
   uniform random number is generated and
   compared with 0.05. If it is less than 0.05,
   the computer adds 10 days to the actual
   admission date and continues. If not, the
   computer continues.
7. A random normal deviate is generated
   and multiplied by the appropriate
   standard deviation σ (based on number of
   weeks between interview and admission
   dates). The resulting product is added
   to the adjusted or actual admission date,
   whichever is appropriate as per step (6),
   to obtain the reported admission date of
   the episode.
8. A third uniform random number is
   generated and compared with 0.05. If it is less
   than 0.05, the computer adds 2 days to
   the actual length of stay for the episode
   and continues. If not, the computer
   continues.
9. A second random normal deviate is
   generated and added to the adjusted or actual
   length of stay, whichever is appropriate
   as per step (8), to obtain the reported
   length of stay.
10. The reported length of stay is added to
    the reported admission date to
    determine the reported discharge date.
11. The interval between the interview date
    and reported discharge date is compared
    with 364 to determine if the episode is
    reported with discharge date in the year
    prior to interview. If so, the computer
    records the appropriate output data for
    the reported episode and proceeds to
    obtain "interview data" for the next
    episode. If the reported discharge date is
    more than 364 days prior to the
    interview date, the computer proceeds to the
    next episode.
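The eleven steps can be drawn together in a compact sketch. This is an illustration under stated assumptions, not the Phase III program itself: the record layout (a dict with 'death_day' and 'episodes'), the failure_prob callable, the treatment of ties on the interview date, the definition of length of stay as discharge day minus admission day, and the rounding up of 4-week periods are all hypothetical choices.

```python
import math
import random

def simulate_interview(person, interview_day, failure_prob, rng):
    """Return one outcome per episode that reaches an output category.

    person       -- {'death_day': int, 'episodes': [episode, ...]} where an
                    episode is {'admitted': int, 'discharged': int,
                    'delivery': bool} (hypothetical layout)
    failure_prob -- callable(days_since_discharge, stay, delivery) giving
                    the probability of failure to report (table F)
    rng          -- object with random() and gauss(mean, sd) methods
    """
    if person["death_day"] <= interview_day:                 # step 1
        return []
    outcomes = []
    for ep in person["episodes"]:
        if ep["admitted"] >= interview_day:                   # step 2
            continue
        if ep["discharged"] >= interview_day:                 # step 3
            outcomes.append(("incomplete", ep))
            continue
        elapsed = interview_day - ep["discharged"]
        if elapsed > 420:                                     # step 4
            continue
        stay = ep["discharged"] - ep["admitted"]
        if rng.random() < failure_prob(elapsed, stay, ep["delivery"]):  # step 5
            outcomes.append(("nonrecalled", ep))
            continue
        admitted = ep["admitted"] + (10 if rng.random() < 0.05 else 0)  # step 6
        periods = max(1, math.ceil((interview_day - ep["admitted"]) / 28))
        admitted += 0.4 * periods * rng.gauss(0.0, 1.0)                 # step 7
        reported_stay = stay + (2 if rng.random() < 0.05 else 0)        # step 8
        reported_stay += rng.gauss(0.0, 1.0)                            # step 9
        reported_discharge = admitted + reported_stay                   # step 10
        if interview_day - reported_discharge <= 364:                   # step 11
            outcomes.append(("reported", reported_discharge, reported_stay))
    return outcomes

if __name__ == "__main__":
    person = {"death_day": 9999,
              "episodes": [{"admitted": 100, "discharged": 105, "delivery": False}]}
    print(simulate_interview(person, 418, lambda d, s, v: 0.1, random.Random(5)))
```

Episodes discharged 365 to 420 days before interview pass the screening in step 4 and can therefore be telescoped forward into the reference year by the admission-date error.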
This interview simulation program (Phase
III) was carried out for 13 interview dates 28 days
apart beginning with day 418. The hospitalization
histories for the 1,870 individuals with one or
more episodes generated by the hospital simulation
program (Phases I and II) over the 108-week
period provided the interview simulation input
data. The results of the simulation for each inter-
view date were tabulated by the computer and the
following tables printed out.
1. Number of nonrecalled discharges by sex
and age in each of 13 four-week periods
prior to the interview date.
2. Number of nonrecalled delivery dis-
charges for females by age in each of
the 13 four-week periods.
3. Number of incomplete episodes by sex,
age, and type of episode (i.e., nonde-
livery and delivery).
4. Number of reported discharges of 1-day
stays by sex and age for the 13 four-
week periods.
5. Number of reported discharges of 2-4-
day stays by sex and age for the 13 four-
week periods.
6. Number of reported discharges of 5-or-
more-day stays by sex and age for the
13 four-week periods.
7. Number of reported discharges by sex
and age for the 13 four-week periods.
8. Number of reported delivery discharges
for females by age for the 13 four-week
periods.
9. Number of reported hospital days as-
sociated with reported discharges in the
13 four-week periods by sex and age.
10. Number of persons by sex and age and
reported number of completed episodes
in the year prior to interview.
11. Number of persons by sex and age and
reported number of completed nondelivery
episodes in the year prior to interview.
12. Number of reported days in hospital in
each of 17 four-week periods prior to
interview for reported discharges by sex
and age.
13. Number of reported days in hospital in
each of 17 four-week periods prior to
interview for reported delivery dis-
charges for females by age.
The computer print-out of these tables is
designated by the heading "interview reported."
The computer program also tabulated this same
set of tables using actual results for all episodes
with discharge in the year prior to interview
experienced by the persons alive on the date of
interview, that is, with no response errors of any
kind. These tables are designated in the computer
print-out by the heading "perfect interview."
Finally, the results for persons who died in the
year prior to the interview date were tabulated
by the computer and added to the "perfect
interview" tables. The computer print-out of these
tables is designated by the heading "all
discharges."
SIMULATION ESTIMATES OF ERRORS
IN HOSPITAL DISCHARGE DATA
The computer-generated data for the 13
interview dates were averaged and estimates of
annual hospital discharge rates by age and sex
derived for the "interview reported," "perfect
interview," and "all discharges" data tabulation
categories. Similar sets of estimates were also
derived for discharge rates excluding deliveries,
annual hospital days per 1,000 persons with and
without deliveries included, and average length
of stay. These estimates are given in tables 1-5.
The population bases for these rate estimates are
given in table 6.
Estimates of the effects of interview re-
sponse errors (using data for the full 12 months
prior to interview) and of exclusion of persons
who died during the reference year on hospital
discharge data can be derived from tables 1-5.
For example, interview response errors are
estimated to reduce the annual discharge rate
per 1,000 living persons by 106.0 - 94.0 = 12.0 or
11.3 percent (table 1). In addition, exclusion of
persons who died during the reference year re-
duces the annual discharge rate by an estimated
additional 6.6 discharges per 1,000 persons
(112.6 - 106.0) or 5.9 percent. The overall annual
rate based on the interview procedure is esti-
mated to be less than the actual annual discharge
rate by 112.6 - 94.0 = 18.6 per 1,000 persons or
16.5 percent. Similar estimates of effects of pro-
cedural errors on hospital discharge data can be
determined from the tables for specific age-sex
groups. Although input parameters for this study
were based in part on empirical data, the specific
output estimates of underreporting should be
considered illustrative rather than necessarily a
reflection of the situation which prevails in the
Health Interview Survey.
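The differences and percentages quoted above from table 1 can be reproduced with a few lines of arithmetic. The variable and function names are illustrative; each percentage is taken relative to the larger of the two rates being compared, which is the base that reproduces the figures in the text.

```python
# Annual discharge rates per 1,000 persons quoted from table 1.
interview_reported = 94.0
perfect_interview = 106.0
all_discharges = 112.6

def shortfall(reference, observed):
    """Return (difference per 1,000, percent of the reference rate)."""
    diff = reference - observed
    return round(diff, 1), round(100.0 * diff / reference, 1)

print(shortfall(perfect_interview, interview_reported))  # (12.0, 11.3)
print(shortfall(all_discharges, perfect_interview))      # (6.6, 5.9)
print(shortfall(all_discharges, interview_reported))     # (18.6, 16.5)
```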
Estimates of the percent underreporting of
hospital discharges by number of weeks between
discharge and interview for all discharges,
deliveries only, and discharges excluding deliveries
were computed for "interview reported" versus
"perfect interview," "perfect interview" versus
"all discharges," and "interview reported" versus
"all discharges." These estimates are given in
tables 7-9. A similar set of percent underreporting
estimates was computed for hospital discharges
by recall period and actual length of stay
and are shown in tables 10-12.
IV. RESULTS
EVALUATION OF
HOSPITAL EPISODES SIMULATION
Several aspects of the computer-generated
hospital episode data were examined in order to
evaluate the accuracy of the simulation. First,
the generated distributions of the persons in each
of the 12 age-sex groups by number of annual
nondelivery episodes (perfect interview data) were
compared with the expected distributions. With
but minor exceptions, the computer simulation
program generated distributions of the number of
nondelivery episodes equivalent to the expected
negative binomial distributions.
It is noted that, except for females 35-44
years of age, the expected frequencies of two or
more episodes were higher than generated. This
tendency on the low side could be due to inade-
quate representation of the upper tail of the gam-
ma distribution of the weekly admission probabil-
ities (i.e., the X values). It is possible that this
aspect could be improved by subdividing the 10th
subgroup in order to include X values corresponding,
for example, to the 99th percentile. An alternative
explanation of the observed deficiency of
persons with two or more episodes is that the uni-
form random number subroutine, used in the
computer program, failed to generate small ran-
dom numbers in close order proximity as fre-
quently as expected statistically.
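
The first explanation can be illustrated with a minimal sketch. It rests on assumptions that are not taken from this report: the gamma shape and scale are hypothetical, a person's annual episode count is approximated as Poisson given that person's annual rate, and each of the 10 subgroups is represented by its median rate. Under those assumptions, replacing the full gamma distribution by 10 representative values lowers the share of persons with two or more episodes.

```python
import math
import random

def prob_two_or_more(annual_rate):
    """P(N >= 2) for a Poisson count with the given mean."""
    return 1.0 - math.exp(-annual_rate) * (1.0 + annual_rate)

def decile_medians(values):
    """Split the sorted values into 10 equal groups; return each group's median."""
    ordered = sorted(values)
    size = len(ordered) // 10
    return [ordered[g * size + size // 2] for g in range(10)]

rng = random.Random(1966)          # fixed seed so the run is repeatable
SHAPE, SCALE = 0.5, 0.2            # hypothetical gamma parameters (mean rate 0.1)
rates = [rng.gammavariate(SHAPE, SCALE) for _ in range(20000)]

# Share with 2+ episodes when every person keeps an individual rate.
p2_continuous = sum(prob_two_or_more(r) for r in rates) / len(rates)

# Share with 2+ episodes when each tenth of the population shares one rate.
groups = decile_medians(rates)
p2_discrete = sum(prob_two_or_more(r) for r in groups) / len(groups)

print(f"share with 2+ episodes, full gamma:   {p2_continuous:.4f}")
print(f"share with 2+ episodes, 10 subgroups: {p2_discrete:.4f}")
```

Splitting the highest subgroup further, as suggested above, would move the discretized figure toward the full-distribution figure, since the loss comes mainly from the upper tail.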
The second aspect examined was a compari-
son of the generated annual discharge rates by
age and sex, excluding deliveries, with the ex-
pected rates (table G). The sampling errors in-
dicate that the differences in these rates are not
statistically significant. The annual discharge
rates generated by the computer for males and
females 65 years and older are greater than the
expected rates shown in table G since they include
persons in their last year of life who were
alive on the interview date (and hence subject to
higher admission rates). The expected rates were
not adjusted for the higher admission probabil-
ities assigned to persons in their last year of life.
The Health Interview Survey annual discharge
rates, excluding deliveries, reported for the peri-
od July 1958-June 1960 are higher than the ex-
pected rates for the computer simulation since
the published rates are based on data reported for
the most recent 6 months of the year prior to in-
terview. On the other hand, the weekly admission
probabilities were derived from unpublished
Health Interview Survey data on the distribution
of the population by number of annual nondelivery
episodes based on reported experiences for the
12 months prior to interview.
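
The significance statement for table G can be checked roughly as follows. The rows are copied from table G for a few age-sex groups; the 1.96 cutoff (a two-sided 5-percent normal criterion) is an assumption, since the report does not state which criterion was applied.

```python
# (observed rate, expected rate, standard deviation of observed rate),
# all per 1,000 persons per year, from table G.
TABLE_G = {
    "male, under 15": (64.9, 62.1, 5.20),
    "male, 45-64": (102.0, 100.1, 9.04),
    "male, 65+": (168.1, 142.0, 18.11),
    "female, 25-34": (105.8, 125.8, 12.85),
    "female, 65+": (135.2, 119.3, 14.92),
}

CRITICAL_Z = 1.96  # assumed two-sided 5-percent normal cutoff

def z_score(observed, expected, sd):
    """Difference between observed and expected rate in standard deviation units."""
    return (observed - expected) / sd

z_scores = {group: z_score(*row) for group, row in TABLE_G.items()}
for group, z in z_scores.items():
    verdict = "significant" if abs(z) > CRITICAL_Z else "not significant"
    print(f"{group:15s} z = {z:+.2f}  {verdict}")
```

None of the groups shown exceeds the assumed cutoff, which is in line with the statement that the differences are not statistically significant.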
The third aspect examined in evaluating the
computer simulation of hospitalization histories
was the distribution of persons in the hospital
on the interview date by age in comparison with
the unpublished Health Interview Survey distribution
for the Sunday prior to interview. The data,
given in table H, show the two distributions to
be in close agreement.
Fourth, the average length of stay in days by
sex and age for the computer episodes (perfect
interview data) are compared with the July 1958-
June 1960 Health Interview Survey results in table
J. Agreement, slightly better for females than
males, is fairly good. The sample size (episodes)
for males 15-24, 25-34, and 35-44 years of age,
is only about 30 for each of these age classes,
accounting in part for the variability observed in
their length-of-stay averages.
The distribution of the generated lengths of
stay has not been tabulated in detail. However,
the distribution for 1-day, 2-4-day, and 5-or-
more-day stays is available from table 10. This
distribution is compared with the distribution
of discharge rates for these same length-of-stay
groups as derived from unpublished July 1958-
June 1960 Health Interview Survey data in table
K. Agreement is quite good.

Table G. Comparison of computer generated and expected number of nondelivery
episodes per 1,000 persons per year, and simulated population base and
standard deviation of observed rate, by sex and age

                                               Simulated     Standard deviation
Sex and age            Observed    Expected    population    of observed
                       number¹     number      base          rate

Male
  Under 15 years         64.9        62.1        1,740           5.20
  15-24 years            58.7        62.2          608           S77
  25-34 years            54.0        64.5          613           8.87
  35-44 years            68.5        75.1          641           9.44
  45-64 years           102.0       100.1          965           9.04
  65+ years             168.1       142.0          345          18.11

Female
  Under 15 years         51.5        50.2        1,675           4.70
  15-24 years            97.8        95.0          683          10.95
  25-34 years           105.8       125.8          669          12.85
  35-44 years           122.3       112.4          695          11.22
  45-64 years           105.0        97.4        1,043           8.63
  65+ years             135.2       119.3          429          14.92

¹The observed rates are inflated slightly by the experience of persons in their
last year of life. These persons are not included in the expected number.

Table H. Number and percent distribution of persons in hospital on day of
interview, by age: computer simulation¹ versus Health Interview Survey²

                       Computer simulation         Health Interview Survey
Age                    Number    Percent           Number in    Percent
                                 distribution      thousands    distribution

  All ages               344       100.0              367         100.0
  Under 15 years          43        12.5               48          13.1
  15-24 years             37        10.8               42          11.4
  25-34 years             40        11.6               43          11.7
  35-44 years             42        12.2               54          14.7
  45-64 years            110        32.0              106          28.9
  65+ years               72        20.9               74          20.2

¹Total of incomplete episodes for 13 interview dates.
²Average number of persons in short-stay hospitals last Sunday night, United
States, July 1959-June 1960.

Table J. Comparison of average length of stay in days, by sex and age:
computer generated¹ versus Health Interview Survey²

                       Length of stay in days
Sex and age            Computer     Health Interview
                       generated    Survey

Male
  All ages               10.1         10.3
  Under 15 years          6.0          6.1
  15-24 years             9.6          8.2
  25-34 years            10.7         92.3
  35-44 years             8.4         11.8
  45-64 years            13.3         12.2
  65+ years              13.7         15.9

Female
  All ages                6.9          7.2
  Under 15 years          5.6          5.8
  15-24 years             4.4          4.5
  25-34 years             4.6          5.2
  35-44 years             6.6          6.7
  45-64 years            10.9         11.4
  65+ years              15.4         14.0

¹Perfect interview data; average of 13 interview dates.
²See table 1, p. 14, in reference 8.
It seems clear from the above analysis that
the hospital episodes simulation model and com-
puter program are quite satisfactory. Further
improvements, one of which has already been
mentioned, are possible. It would be desirable
that the various hospitalization statistics within
age-sex groups generated by the computer have
greater reliability than can be obtained with a
population run of 10,000. The computer program
should also be revised to permit individuals to
shift over time from their initial age group to the
next higher age group. This is particularly im-
portant for the two older age groups, as will be
made clear from results discussed in later sec-
tions. For example, under the present program
when 2-year histories are generated, the number
of persons 65 years and older for the second year
is reduced significantly due to deaths during the
first year. The assignment of reasons for hospital-
ization within age-sex groups can be added to the
computer program with relatively little difficulty.
Length-of-stay distributions for each reason or
condition would be more realistic if this change
were made in the program.
Table K. Comparison of length-of-stay distributions: computer generated
discharges¹ versus Health Interview Survey discharges²

                       Computer generated          Health Interview
                       discharges                  Survey discharges
Length of stay         Number      Percent         Rate per     Percent
                                   distribution    1,000        distribution
                                                   persons

  Total                1,071.1      100.0           114.5        100.0
  1 day                  131.8       12.3            12.6         11.0
  2-4 days               383.5       35.8            41.0         35.8
  5+ days                555.8       51.9            60.9         53.2

¹Perfect interview data; average of 13 interview dates.
²Unpublished data, July 1958-June 1960.
EVALUATION OF
INTERVIEW SIMULATION
The interview simulation model introduced
errors due to failure to report hospital discharges
which occurred in the year prior to interview,
failure to report discharge dates accurately, and
failure to report length of stay accurately. As dis-
cussed previously, the parameters for generating
these errors were based largely on results ob-
tained in the Michigan study. Percent underre-
porting of hospital discharges as generated by
the computer is compared with the Michigan study
data in table L separately by length of stay and
by weeks between discharge and interview. As
expected, since the assigned probabilities were
based on these two factors, the generated results
essentially reproduced the Michigan study data.
A more detailed comparison of the computer-
generated underreporting rates with the assigned
rates jointly by length of stay and interval between
discharge and interview is given in table M. As
in table L, the generated underreporting rates in-
clude the effect of reporting the discharge date
inaccurately. Thus, the computer overreported
2-4-day stays and 5-or-more-day stays for the
4-week period immediately prior to interview. The
agreement between the observed and expected re-
sults in table M is fairly good, but not outstand-
ing. The total number of episodes for each cell
was not large for any one interviewing date, ranging
from 10 for the 1-day stays to 30 for the 2-4-day
stays and 40 for the 5-or-more-day stays.
However, the generated results shown are aver-
ages for 13 interviewing dates, and hence are
based on fairly substantial numbers of cases.
The effect of inaccurately reported discharge
dates may be responsible for the several instances
of somewhat larger differences than expected.
The computer simulations of failure to report
the discharge date and/or the length of stay ac-
curately have not been evaluated in detail. As
discussed in the next section, the net shifting of
discharge dates by the computer was essentially
negligible. The proportion of discharge dates re-
ported accurately (i.e., within the same 4-week
period as the actual discharge date) has not been
determined. The average length of stay for the
interview reported discharges was 0.3 of a day
greater than for the perfect interview discharges,
which agrees with the Michigan study (reference 2).
The distributions of reported length of stay by
actual length of stay have not been tabulated,
however.

Table L. Percent underreporting of hospital discharges, by length of stay and
number of weeks between discharge and interview: computer generated¹ versus
Michigan study²

Length of stay and weeks between       Computer      Michigan
discharge and interview                generated     study

Length of stay
  Total                                  11.3          12.9
  1 day                                  23.2          26.9
  2-4 days                               11.3          1%.0
  5+ days                                 8.5           9.9

Weeks between discharge and interview
  Total                                  11.3          12.0
  1-20 weeks                              4.9           5.0
  21-40 weeks                            10.7           9.0
  41-52 weeks                            23.0          24.0

¹Interview reported versus perfect interview; average of 13 interview dates.
Includes errors in reported discharge dates.
²See table 15, p. 21, and table 40, p. 36, in reference 2.
Based on this limited evaluation, the inter-
view simulation program appears to have been
fairly successful. Further analysis is necessary
before any suggestions regarding revisions in the
model and computer program can be made.
ESTIMATES OF
SPECIFIC ERROR COMPONENTS
As mentioned in the introduction, a definite
decreasing trend can be observed in the number
of discharges reported in the Health Interview
Survey when tabulated by month prior to interview.
It is of considerable interest to determine the
factors contributing to this decay curve and the
magnitude of their respective effects. Accordingly,
estimates have been derived of the component
parts of the discrepancy between the interview
reported discharges and all discharges in 4-week
intervals prior to interview, using the computer
generated hospital episode and interview simulation
data. These estimates are given in absolute
numbers of discharges (average of 13 interview
dates) and also as a percent of all discharges in
each of the 13 four-week periods in the year prior
to interview in table N. The average estimates
for 12, 24, 36, and 52 weeks prior to interview
are also shown in this table.

Table M. Percent underreporting of hospital discharges by actual length of stay
and number of weeks between hospital discharge and interview: computer
generated¹ versus assigned rates²

                      1-day stay            2-4-days stay         5+-days stay
Weeks between         Computer   Assigned   Computer   Assigned   Computer   Assigned
discharge and         generated  rate       generated  rate       generated  rate
interview

  Total                 23.2      24.8        12.3      15.6        5.53       9.8
  1-4 weeks              3.2       7.0        ³0.3       4.0        ³0.3       1.0
  5-8 weeks             16.3      13.0         4.9       5.0         2.3       2.0
  9-12 weeks            20.0      18.0         2.8       6.0         1.2       4.0
  13-16 weeks           21.6      22.0         3.1       7.0         S21       5.0
  17-20 weeks           18.8      24.0        11.4       8.0         5.1       6.0
  21-24 weeks           23.5      26.0         4.4       9.0         4.9       7.0
  25-28 weeks           22.8      23.0        10.3      11.0         9.1       8.0
  29-32 weeks           31.0      29.0         8.1      14.0         9.6       9.0
  33-36 weeks           25.5      30.0        13.3      18.0        10. k      9.0
  37-40 weeks           32.1      30.0        11.1      22.0         5.0      10.0
  41-44 weeks           20.4      31.0        21.7      27.0         6.7      10.0
  45-48 weeks            3.5      32.0        26.4      33.0        15.0      11.0
  49-52 weeks           32.4      32.0        29.0      39.0        37.2      46.0

¹Interview reported versus perfect interview; average of 13 interview dates.
Includes errors in reported discharge dates (see table 10).
²Nondelivery episodes only.
³Percent overreported.
The observed decay curve is shown in the
column headed "interview reported." The discrepancy
(i.e., all discharges less interview reported
discharges) increases as the interval between
discharge and interview increases, as does
the number of not reported discharges and also
the number of discharges of persons who died in
the year prior to interview (all discharges less
perfect interview discharges). The error component
due to shifting of discharge dates fluctuates
from positive (back in time) to negative
(forward in time), but remains at a fairly low
level; the average of this component is essentially
zero for the year prior to interview.
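
The relation among the columns of table N can be sketched as follows. The additive form (discrepancy = not reported + discharges of decedents + net shifting) is inferred here from the table's arithmetic rather than quoted from the program; the three rows are copied from table N, and the names are illustrative.

```python
# Average number of discharges per 4-week period, from table N:
# (all discharges, perfect interview, interview reported,
#  not reported, net shifting of discharge date)
TABLE_N = {
    "1-4 weeks": (85.5, 82.2, 82.2, 1.5, -1.5),
    "33-36 weeks": (88.1, 82.1, 71.2, 9.9, 1.0),
    "49-52 weeks": (89.4, 82.7, 54.8, 28.5, -0.6),
}

def decompose(all_dis, perfect, reported, not_reported, net_shift):
    """Return (discrepancy, decedent component, unexplained residual)."""
    discrepancy = all_dis - reported      # all less interview reported
    decedents = all_dis - perfect         # all less perfect interview
    residual = discrepancy - (not_reported + decedents + net_shift)
    return discrepancy, decedents, residual

results = {period: decompose(*row) for period, row in TABLE_N.items()}
for period, (discrepancy, decedents, residual) in results.items():
    print(f"{period:12s} discrepancy {discrepancy:5.1f}  "
          f"decedents {decedents:4.1f}  residual {residual:+.2f}")
```

A residual near zero in each period indicates that the three components account for the whole discrepancy, up to rounding in the published figures.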
It is clear that the number of discharges of
persons who died in the year prior to interview
should increase as the interval between discharge
and interview increases, since this group is
somewhat larger numerically at the beginning of
the year of interest and decreases in size as the
interview date is approached. This might suggest
that the total number of discharges should also
increase as the interval between discharge and
interview increases. This is incorrect, although
the average of the generated 'all discharges"
over the 13 interview dates does exhibit this in-
correct relationship in table N and also in table
8. This error is due to the unfortunate oversight
of failing to age the population in the computer
simulation program. Since the living population is
aging and also increasing in size during the year
and since the number of persons living on the date
of interview, but already in their last year of life,
is somewhat larger on the date of interview than
at the beginning of the reference year, the number
of discharges of persons alive on the interview
date (perfect interview discharges) should decrease
as the time interval between discharge and
interview increases. This is the key phenomenon
previously stated in the introduction. Hence "all
discharges" should either decrease or remain
constant as the interval between discharge and
interview increases.

Table N. Estimated contribution of error components to discrepancy between
interview reported and all discharges, by number of weeks between discharge
and interview

[Average of 13 interview dates]

                                                      Discrepancy:              All discharges   Net shifting
Weeks between       All          Perfect     Interview   all less     Not        less perfect     of discharge
discharge and       discharges   interview   reported    interview    reported   interview¹       date²
interview                                                reported

                                          Number of discharges

  1-4 weeks           85.5         82.2        82.2         3.3         1.5          3.3            -1.5
  5-8 weeks           86.2         81.8        77.8         8.4         3.8          4.4             0.2
  9-12 weeks          86.5         81.7        78.4         8.1         4.2          4.8            -0.9
  13-16 weeks         88.6         83.5        78.1        10.5         5.3          5.1             0.1
  17-20 weeks         88.1         82.3        74.9        13.2         5.8          5.8             1.6
  21-24 weeks         87.6         82.1        76.3        11.3         7.2          5.5            -1.4
  25-28 weeks         87.8         82.2        73.0        14.8         8.2          5.5             1.0
  29-32 weeks         88.3         82.4        72.8        15.5         9.5          5.9             0.1
  33-36 weeks         88.1         82.1        71.2        16.9         9.9          6.0             1.0
  37-40 weeks         89.1         82.7        73.8        15.3        10.4          6.4            -1.5
  41-44 weeks         89.3         82.9        71.4        17.9        11.3          6.4             0.2
  45-48 weeks         89.0         82.5        64.8        24.2        16.2          6.5             1.5
  49-52 weeks         89.4         82.7        54.8        34.6        28.5          6.7            -0.6

Average estimate for:
  1-12 weeks          86.1         81.9        79.5         6.6         3.2          4.1            -0.7
  1-24 weeks          87.1         82.3        78.0         9.1         4.6          4.8            -0.3
  1-36 weeks          87.4         82.3        76.1        11.3         6.1          5.2             0.02
  1-52 weeks          88.0         82.4        73.0        15.0         9.4          5.6            -0.02

                                   Percent distribution of all discharges

  1-4 weeks          100.0         96.1        96.1         3.9         1.8          3.9            -1.8
  5-8 weeks          100.0         94.9        90.3         9.7         4.4          5.1             0.2
  9-12 weeks         100.0         94.5        90.6         9.4         4.9          5.5            -1.0
  13-16 weeks        100.0         94.2        88.1        11.9         6.0          5.8             0.1
  17-20 weeks        100.0         93.4        85.0        15.0         6.6          6.6             1.8
  21-24 weeks        100.0         93.7        87.1        12.9         8.2          6.3            -1.6
  25-28 weeks        100.0         93.6        83.1        16.9         9.3          6.4             1.2
  29-32 weeks        100.0         93.3        82.4        17.6        10.8          6.7             0.1
  33-36 weeks        100.0         93.2        80.8        19.2        11.2          6.8             1.2
  37-40 weeks        100.0         92.8        82.8        17.2        11.7          7.2            -1.7
  41-44 weeks        100.0         92.8        80.0        20.0        12.6          7.2             0.2
  45-48 weeks        100.0         92.7        72.8        27.2        18.2          7.3             1.7
  49-52 weeks        100.0         92.5        61.3        38.7        31.9          7.5            -0.7

Average estimate for:
  1-12 weeks         100.0         95.2        92.3         7.7         3.8          4.8            -0.9
  1-24 weeks         100.0         94.5        89.5        10.5         5.4          5.5            -0.4
  1-36 weeks         100.0         94.1        87.1        12.9         7.0          5.9             0.02
  1-52 weeks         100.0         93.7        82.9        17.1        10.7          6.4            -0.02

¹Discharges of persons who died during the year prior to interview.
²A negative value means discharge date shifted forward in time.
The computer incorrectly generated a rela-
tively constant monthly number of discharges
during the reference year for persons alive on
the interview date (perfect interview discharges),
at least on the average for the 13 interview dates
(see table N), because persons 65 years and older
who died were not replaced by new persons from
the 45-64 year age group. This reduced the 65
years and over age group over time. The number
of discharges of living persons was reduced from
1,088 in the year prior to the first interview date
to 1,049 in the year prior to the last interview date.
Similarly, the number of all discharges was re-
duced from 1,162 in the year prior to the first
interview date to 1,111 in the year prior to the
last interview date. Without these decreases
(which should not have occurred) the total number
of discharges by weeks between discharge and in-
terview would have remained approximately con-
stant and the number of discharges among persons
living on the date of interview would have de-
creased with increasing time interval between
discharge and interview.
While the levels shown in table N (and in table 8)
for all discharges, perfect interview discharges,
and interview reported discharges are not correct,
the estimates of the error components and of the
discrepancy itself are considered satisfactory.
This should be
clear, since the weaknesses in the generation
model tend to be compensating when the discrep-
ancy and its components are computed.
Table N shows the underestimate of all dis-
charges from an interview procedure using data
reported for the entire reference year to be 17.1
percent. If only the data reported for the 24 weeks
(approximately 6 months) immediately prior to
interview are used, the underestimate of all discharges
is reduced to 10.5 percent. The major
source of this reduction is the not reported error
component which is cut in half (5.4 versus 10.7
percent). It is of interest to note that, even if no
response errors were made, the number of re-
ported discharges in the interview is estimated to
be lower than all discharges by approximately 4
percent if reporting is confined to the 4 weeks
immediately prior to interview and 6.4 percent
when reporting for the year prior to interview.
METHODS FOR
INCREASING ACCURACY
Inspection of tables 1-4 shows that the aver-
age annual hospital discharges and hospital days
for persons alive on the interview date within each
age-sex group are underestimated by approxi-
mately 11 percent when a procedure using all data
reported for the 12 months prior to interview is
employed. The estimates are improved when they
are based only on the episodes with reported dis-
charge dates occurring in the most recent 6 months
prior to interview. The generated data have not
been tabulated on this basis so that the improve-
ment for each of the age-sex groups has not been
ascertained. However, the average underestimate
is reduced by a factor of two, approximately, with
this procedure. It is doubtful that basing the
estimates of interest only on hospitalizations reported
within a shorter time interval than 6 months
between interview and discharge would be economically
efficient. Apparently it is possible to further
increase accuracy by use of Procedure B as reported
in the study by the University of Michigan
in which three alternative survey procedures were
compared (reference 3). The relative biases in the
average annual number of discharges and hospital days by
age and sex with this procedure can be estimated
by means of the interview simulation program on
the computer. The program would require a set
of parameters (i.e., probabilities of failure to
report the episode, etc.) appropriate to Procedure
B. Apparently, the data for estimating these pa-
rameters are available from the study which com-
pared Procedure B with the standard procedure
used in this project.
Further improvement in the accuracy of the
hospital statistics based on the Health Interview
Survey through changes in the interview procedure
is doubtful. A method of adjusting the survey
statistics is necessary. One such method, discussed
briefly in the introductory section, uses the J-
analysis technique of Simmons and Bryant to de-
rive inflation factors by which reported hospital
discharges are weighted to estimate total actual
discharges, including those of persons not alive
on the interview date. Because of limited time,
evaluation of the Simmons and Bryant approach by
means of the generated data was not carried out.
Estimation of inflation factors to improve the
accuracy of published hospital statistics based on
the Health Interview Survey appears both feasible
and desirable. Using the observed data to derive
the adjustment factors has considerable appeal.
It seems advisable to explore alternative methods
of estimating adjustment factors using simulation
models.
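
A minimal sketch of one such adjustment factor is given below. It is a simple ratio adjustment built from the simulated underestimates in table N and is not the Simmons and Bryant procedure, which was not evaluated in this project; the function name is illustrative.

```python
def inflation_factor(percent_underestimate):
    """Factor by which reported discharges are multiplied to estimate all discharges."""
    return 1.0 / (1.0 - percent_underestimate / 100.0)

# Simulated underestimates of all discharges (table N, percent).
factor_12_months = inflation_factor(17.1)   # data for the full reference year
factor_24_weeks = inflation_factor(10.5)    # data for the most recent 24 weeks

# Interview reported discharges per 4-week period, 1-52 week average (table N).
reported_per_period = 73.0
adjusted_per_period = reported_per_period * factor_12_months

print(f"12-month factor: {factor_12_months:.3f}")
print(f"24-week factor:  {factor_24_weeks:.3f}")
print(f"adjusted discharges per period: {adjusted_per_period:.1f}")
```

Applied to the 52-week average of interview reported discharges, the 12-month factor returns approximately the "all discharges" average shown in table N, as it must by construction. In practice the factor would have to be estimated from survey data rather than from a simulation in which the true total is known.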
V. CONCLUSIONS
A probability model for generating hospital
admissions and duration of stay for the U.S. pop-
ulation together with an IBM 1410 computer pro-
gram for simulation of hospitalization histories
under the model were developed in this project.
The simulation program was carried out for an
initial population of 10,000 individuals for a peri-
od of 108 weeks; while the results were judged
very satisfactory, there is room for improvement
in several aspects. These are:
(1) Estimation of weekly admission probabilities
should, at the very minimum, be based on
data obtained in the Health Interview Survey
for the most recent 6 months prior to interview.
These probabilities should be improved
further by appropriate adjustment of the observed
episodes distributions to reflect all
hospitalizations rather than reported hospitalizations.
(2) The estimated daily admission probabilities
for persons in their last year of life were
based on sketchy data and should be improved,
using data obtained from a national study.
(3) The simulation program should permit individuals
in specific age-sex groups to shift to
the next older group over time. This is particularly
essential for the 45-64 and 65 years
and over age groups, since deaths reduce
these groups significantly over time if the
population is age-static. This could be accomplished,
with relatively little change in
the existing program, by adding an age-shifting
date to be treated in a manner similar
to the birth and death dates already in the
program.
(4) Reasons for hospitalization should be included
in the program, to be assigned on a probability
basis, provided sufficient data are available
for developing length-of-stay distributions
by reason.
A probability model and computer program
for simulating interview data on hospital episodes
as collected in the Health Interview Survey were
also developed in this project. The computer pro-
gram was carried out for 13 interview dates 28
days apart using the data generated by the hos-
pital episodes simulation program as input. The
generated interview data were also judged satis-
factory, providing estimates of the relative biases
due to measurement errors for each of the princi-
pal hospitalization statistics obtained in the Health
Interview Survey. It is noted that the estimated
relative biases are fairly substantial.
The interview simulation model was not an-
alyzed intensively, due to limited time available
to complete this project. The parameters asso-
ciated with errors in reporting length of stay and
discharge date are considered conservative. Fur-
ther study and analysis is necessary before any
suggestions on revisions in the model and com-
puter program can be made.
It is doubtful that further significant reductions
in the measurement errors of hospitalization
data collected in the Health Interview Survey are
possible without adding unduly to the cost. The
survey design suggests that satisfactory adjust-
ment factors can be estimated from the collected
data. The simulation models and computer pro-
grams developed in this project provide a useful
research tool for studying alternative methods of
adjustment.
The computer program for generating hos-
pitalization histories is essentially a program for
distributing episodes in the population consistent
with the negative binomial distribution. Hence, it
should be useful, with but minor revisions, for
simulating the distributions of other events which
have been observed to be negative binomial. These
include, for example, the distribution of the pop-
ulation by number of colds annually and by number
of doctor visits annually. Undoubtedly there are
other health variables in this class.
The hospital episodes computer program, re-
vised as suggested, should also be useful for stud-
ies of the effects on the demand for hospital beds
of trends in such variables as age, sex, reasons for
hospitalization, and duration of stay.
REFERENCES
1. National Center for Health Statistics: Hospital utilization
in the last year of life. Vital and Health Statistics. PHS
Pub. No. 1000-Series 2-No. 10. Public Health Service.
Washington. U.S. Government Printing Office, July 1965.
2. National Center for Health Statistics: Reporting of
hospitalization in the Health Interview Survey. Vital and Health
Statistics. PHS Pub. No. 1000-Series 2-No. 6. Public Health
Service. Washington. U.S. Government Printing Office, July
1965.
3. National Center for Health Statistics: Comparison of
hospitalization reporting in three survey procedures. Vital and
Health Statistics. PHS Pub. No. 1000-Series 2-No. 8. Public
Health Service. Washington. U.S. Government Printing
Office, July 1965.
4. U.S. National Health Survey: The statistical design of
the Health Household-Interview Survey. Health Statistics.
PHS Pub. No. 584-A2. Public Health Service. Washington.
U.S. Government Printing Office, July 1958.
5. Simmons, Walt R., and Bryant, E. E.: An evaluation of
hospitalization data from the Health Interview Survey. Am.J.
Pub.Health 52(10):1638-1647, Oct. 1962.
6. National Center for Health Statistics: An index of health,
mathematical models. Vital and Health Statistics. PHS Pub.
No. 1000-Series 2-No. 5. Public Health Service. Washington.
U.S. Government Printing Office, May 1965.
7. Pearson, K., ed.: Tables of the incomplete Γ-function.
Cambridge, England. Cambridge University Press, 1957
printing of original 1922 edition.
8. U.S. National Health Survey: Hospital discharges, United
States, 1958-1960. Health Statistics. PHS Pub. No. 584-B32.
Public Health Service. Washington. U.S. Government
Printing Office, Apr. 1962.
DETAILED TABLES

Table 1. Average annual number, number per 1,000 persons, and percent distribution of
patients discharged in year prior to interview for each of three types of simulation,
by sex and age
Table 2. Average annual number, number per 1,000 persons, and percent distribution of
patients discharged in year prior to interview, excluding deliveries, for each of
three types of simulation, by sex and age
Table 3. Average annual number, days per 1,000 persons, and percent distribution of
hospital days in year prior to interview, for each of three types of simulation, by
sex and age
Table 4. Average annual number, days per 1,000 persons, and percent distribution of
hospital days in year prior to interview, excluding deliveries, for three types of
simulation, by sex and age
Table 5. Average length of stay in days for each of three types of simulation, by sex
and age
Table 6. Population changes during year prior to interview and population bases used
in obtaining rates
Table 7. Percent underreporting of hospital discharges, by type of discharge and number
of weeks between discharge and interview: interview reported versus perfect interview
Table 8. Percent underreporting of hospital discharges, by type of discharge and number
of weeks between discharge and interview: perfect interview versus all discharges
Table 9. Percent underreporting of hospital discharges, by type of discharge and number
of weeks between discharge and interview: interview reported versus all discharges
Table 10. Percent underreporting of hospital discharges, by actual length of stay and
number of weeks between discharge and interview: interview reported versus perfect
interview
Table 11. Percent underreporting of hospital discharges, by actual length of stay and
number of weeks between discharge and interview: perfect interview versus all discharges
Table 12. Percent underreporting of hospital discharges, by actual length of stay and
number of weeks between discharge and interview: interview reported versus all
discharges
Table 1. Average annual number, number per 1,000 persons, and percent distribution of patients
discharged in year prior to interview for each of three types of simulation, by sex and age

[Average of 13 interview dates]

                     For living persons
                     Interview reported              Perfect interview
                     discharges                      discharges                      All discharges
Sex and age          Number   Number    Percent      Number   Number    Percent      Number   Number    Percent
                              per       distri-               per       distri-               per       distri-
                              1,000     bution                1,000     bution                1,000     bution
                              persons                         persons                         persons

Both sexes
  All ages            949.6     94.0     100.0      1,071.0    106.0     100.0      1,143.4    112.6     100.0
  Under 15 years      167.0     48.9      17.6        199.2     58.3      18.6        202.8     59.3      17.7
  15-24 years         176.8    136.9      18.6        196.6    152.3      18.4        196.8    152.3      17.2
  25-34 years         186.2    145.2      19.6        201.4    157.1      18.8        201.9    157.4      17.7
  35-44 years         133.0     99.6      14.0        149.9    112.2      14.0        157.6    117.7      13.8
  45-64 years         183.4     91.3      19.3        207.9    103.5      19.4        221.1    109.5      19.3
  65+ years           103.2    133.3      10.9        116.0    149.9      10.8        163.2    203.5      14.3

Male
  All ages            332.9     67.8     100.0        382.1     77.8     100.0        421.7     85.4     100.0
  Under 15 years       95.5     54.9      28.7        113.0     64.9      29.6        116.3     66.8      27.6
  15-24 years          29.3     48.5       8.9         35.7     58.7       9.3         35.7     58.6       8.5
  25-34 years          30.2     49.3       9.1         33.1     54.0       8.7         33.1     53.9       7.8
  35-44 years          39.2     61.2      11.3         43.9     68.5      11.5         47.3     73.7      11.2
  45-64 years          87.3     90.5      26.2         98.4    102.0      25.8        104.9    107.9      24.9
  65+ years            51.2    148.4      15.3         58.0    168.1      15.1         84.4    235.1      20.0

Female
  All ages            616.7    118.7     100.0        688.9    132.6     100.0        721.7    138.4     100.0
  Under 15 years       71.5     42.7      11.6         86.2     51.3      12.5         86.5     51.6      12.0
  15-24 years         147.3    215.7      23.9        160.9    235.6      23.4        161.1    235.9      22.3
  25-34 years         156.0    233.2      25.3        168.3    251.6      24.4        168.8    251.9      23.4
  35-44 years          93.8    135.0      15.2        106.0    152.5      15.4        110.3    158.5      15.3
  45-64 years          96.1     92.1      15.6        109.5    105.0      15.9        116.2    111.0      16.1
  65+ years            52.0    121.2       8.4         58.0    135.2       8.4         78.8    177.9      10.9
Table 2. Average annual number, number per 1,000 persons, and percent distribution of patients
discharged in year prior to interview, excluding deliveries, for each of three types of simulation,
by sex and age
[Average of 13 interview dates]
[Rate = number per 1,000 persons; Percent = percent distribution]

                    For living persons
                    Interview reported            Perfect interview             All discharges
                    discharges                    discharges
Sex and age         Number      Rate   Percent    Number      Rate   Percent    Number      Rate   Percent

Both sexes
All ages             741.4      73.4     100.0     858.4      84.9     100.0     930.0      91.6     100.0
Under 15 years       167.0      48.9      22.5     199.2      58.3      23.2     202.8      59.3      21.8
15-24 years           84.9      65.8      11.5     102.5      79.4      11.9     102.6      79.4      11.0
25-34 years           90.1      70.3      12.2     103.9      81.0      12.1     104.5      81.4      11.2
35-44 years          112.8      84.4      15.2     128.9      96.5      15.0     135.8     101.4      14.6
45-64 years          183.4      91.3      24.7     207.9     103.5      24.2     221.1     109.5      23.8
65+ years            103.2     133.3      13.9     116.0     149.9      13.6     163.2     203.5      17.6

Male
All ages             332.9      67.8     100.0     382.1      77.8     100.0     421.7      85.4     100.0
Under 15 years        95.5      54.9      28.7     113.0      64.9      29.6     116.3      66.8      27.6
15-24 years           29.5      48.5       8.9      35.7      58.7       9.3      35.7      58.6       8.5
25-34 years           30.2      49.3       9.1      33.1      54.0       8.7      33.1      53.9       7.8
35-44 years           39.2      61.2      11.3      43.9      68.5      11.5      47.3      73.7      11.2
45-64 years           87.3      90.5      26.2      98.4     102.0      25.8     104.9     107.9      24.9
65+ years             51.2     148.4      15.3      58.0     168.1      15.1      84.4     235.1      20.0

Female
All ages             408.5      78.6     100.0     476.3      91.7     100.0     508.3      97.5     100.0
Under 15 years        71.5      42.7      17.5      86.2      51.5      18.1      86.5      51.6      17.0
15-24 years           55.4      81.1      13.6      66.8      97.8      14.0      66.9      98.0      13.2
25-34 years           59.9      89.5      14.7      70.8     105.8      14.9      71.4     106.6      14.0
35-44 years           73.6     105.9      18.0      85.0     122.3      17.3      88.5     127.2      17.4
45-64 years           96.1      92.1      23.5     109.5     105.0      23.0     116.2     111.0      22.9
65+ years             52.0     121.2      12.7      58.0     135.2      12.2      78.8     177.9      13.5
Table 3. Average annual number, days per 1,000 persons, and percent distribution of hospital days
in year prior to interview for each of three types of simulation, by sex and age
[Average of 13 interview dates]
[Days = number of days; Rate = days per 1,000 persons; Percent = percent distribution]

                    For living persons
                    Interview reported            Perfect interview             All discharges
Sex and age           Days      Rate   Percent      Days      Rate   Percent      Days      Rate   Percent

Both sexes
All ages           7,917.1     783.3     100.0   8,604.6     851.4     100.0   9,303.4     916.1     100.0
Under 15 years     1,066.3     312.1      13.5   1,164.4     340.9      13.5   1,186.2     346.8      12.8
15-24 years          992.6     768.9      12.5   1,057.1     818.8      12.4   1,057.6     818.6      11.3
25-34 years        1,082.0     844.0      13.7   1,133.9     884.5      13.2   1,135.2     884.8      12.2
35-44 years          988.9     740.2      12.5   1,068.1     799.5      12.4   1,105.6     825.6      11.9
45-64 years        2,271.2   1,131.1      28.7   2,496.1   1,243.1      29.0   2,678.3   1,326.5      28.8
65+ years          1,516.1   1,958.8      19.1   1,685.0   2,177.0      19.5   2,140.5   2,669.0      23.0

Male
All ages           3,497.4     711.9     100.0   3,844.8     782.6     100.0   4,238.2     857.9     100.0
Under 15 years       622.5     357.8      17.8     681.4     391.6      17.7     702.2     403.1      16.6
15-24 years          303.7     499.5       8.7     342.9     564.0       8.9     342.9     563.1       8.1
25-34 years          335.2     546.8       9.6     352.7     575.4       9.2     352.7     574.4       8.3
35-44 years          337.0     525.7       9.6     368.1     574.3       9.6     392.7     611.7       9.3
45-64 years        1,186.5   1,229.5      33.9   1,305.2   1,352.5      34.0   1,360.2   1,399.4      32.1
65+ years            712.5   2,065.2      20.4     794.5   2,302.9      20.6   1,087.5   3,029.2      25.6

Female
All ages           4,419.7     850.9     100.0   4,759.8     916.4     100.0   5,065.2     971.1     100.0
Under 15 years       443.8     265.0      10.1     483.0     288.4      10.2     484.0     288.6       9.6
15-24 years          688.9   1,008.6      15.6     714.2   1,045.7      15.0     714.7   1,046.4      14.1
25-34 years          746.8   1,116.3      16.9     781.2   1,167.7      16.4     782.5   1,167.9      15.4
35-44 years          651.9     938.0      14.7     700.0   1,007.2      14.7     712.9   1,024.3      14.1
45-64 years        1,084.7   1,040.0      24.5   1,190.9   1,141.8      25.0   1,318.1   1,259.0      26.0
65+ years            803.6   1,873.2      18.2     890.5   2,075.8      18.7   1,053.0   2,377.0      20.8
Table 4. Average annual number, days per 1,000 persons, and percent distribution of hospital days
in year prior to interview, excluding deliveries, for three types of simulation, by sex and age
[Average of 13 interview dates]
[Days = number of days; Rate = days per 1,000 persons; Percent = percent distribution]

                    For living persons
                    Interview reported            Perfect interview             All discharges
Sex and age           Days      Rate   Percent      Days      Rate   Percent      Days      Rate   Percent

Both sexes
All ages           7,042.0     696.7     100.0   7,740.9     765.9     100.0   8,439.7     831.1     100.0
Under 15 years     1,066.3     312.1      15.1   1,164.4     340.9      15.0   1,186.2     346.8      14.1
15-24 years          618.5     479.1       8.8     688.9     533.6       8.9     689.4     533.6       8.2
25-34 years          698.7     545.0       9.9     756.4     590.0       9.8     757.7     590.6       9.0
35-44 years          871.2     652.1      12.4     950.1     711.2      12.3     987.6     737.6      11.7
45-64 years        2,271.2   1,131.1      32.3   2,496.1   1,243.1      32.2   2,678.3   1,326.5      31.7
65+ years          1,516.1   1,958.8      21.5   1,685.0   2,177.0      21.8   2,140.5   2,669.0      25.3

Male
All ages           3,497.4     711.9     100.0   3,844.8     782.6     100.0   4,238.2     857.9     100.0
Under 15 years       622.5     357.8      17.8     681.4     391.6      17.7     702.2     403.1      16.6
15-24 years          303.7     499.5       8.7     342.9     564.0       8.9     342.9     563.1       8.1
25-34 years          335.2     546.8       9.6     352.7     575.4       9.2     352.7     574.4       8.3
35-44 years          337.0     525.7       9.6     368.1     574.3       9.6     392.7     611.7       9.3
45-64 years        1,186.5   1,229.5      33.9   1,305.2   1,352.5      34.0   1,360.2   1,399.4      32.1
65+ years            712.5   2,065.2      20.4     794.5   2,302.9      20.6   1,087.5   3,029.2      25.6

Female
All ages           3,544.6     682.4     100.0   3,896.1     750.1     100.0   4,201.5     805.5     100.0
Under 15 years       443.8     265.0      12.5     483.0     288.4      12.4     484.0     288.6      11.5
15-24 years          314.8     460.9       8.9     346.0     506.6       8.9     346.5     507.3       8.2
25-34 years          363.5     543.3      10.3     403.7     603.4      10.4     405.0     604.5       9.6
35-44 years          534.2     768.6      15.1     582.0     837.4      14.9     594.9     854.7      14.2
45-64 years        1,084.7   1,040.0      30.6   1,190.9   1,141.8      30.6   1,318.1   1,258.9      31.4
65+ years            803.6   1,873.2      22.6     890.5   2,075.8      22.8   1,053.0   2,377.0      25.1
Table 5. Average length of stay in days for each of three types of simulation, by sex and age
[Average of 13 interview dates]
[Days = number of days; Disch. = number of discharges; Stay = average length of stay in days]

                    For living persons
                    Interview reported            Perfect interview             All discharges
Sex and age           Days    Disch.      Stay      Days    Disch.      Stay      Days    Disch.      Stay

Both sexes
All ages           7,917.1     949.6       8.3   8,604.6   1,071.0       8.0   9,303.4   1,143.4       8.1
Under 15 years     1,066.3     167.0       6.4   1,164.4     199.2       5.8   1,186.2     202.8       5.8
15-24 years          992.6     176.8       5.6   1,057.1     196.6       5.4   1,057.6     196.8       5.4
25-34 years        1,082.0     186.2       5.8   1,133.9     201.4       5.6   1,135.2     201.9       5.6
35-44 years          988.9     133.0       7.4   1,068.1     149.9       7.1   1,105.6     157.6       7.0
45-64 years        2,271.2     183.4      12.4   2,496.1     207.9      12.0   2,678.3     221.1      12.1
65+ years          1,516.1     103.2      14.7   1,685.0     116.0      14.5   2,140.5     163.2      13.1

Male
All ages           3,497.4     332.9      10.5   3,844.8     382.1      10.1   4,238.2     421.7      10.1
Under 15 years       622.5      95.5       6.5     681.4     113.0       6.0     702.2     116.3       6.0
15-24 years          303.7      29.5      10.3     342.9      35.7       9.6     342.9      35.7       9.6
25-34 years          335.2      30.2      11.1     352.7      33.1      10.7     352.7      33.1      10.7
35-44 years          337.0      39.2       8.6     368.1      43.9       8.4     392.7      47.3       8.3
45-64 years        1,186.5      87.3      13.6   1,305.2      98.4      13.3   1,360.2     104.9      13.0
65+ years            712.5      51.2      13.9     794.5      58.0      13.7   1,087.5      84.4      12.9

Female
All ages           4,419.7     616.7       7.2   4,759.8     688.9       6.9   5,065.2     721.7       7.0
Under 15 years       443.8      71.5       6.2     483.0      86.2       5.6     484.0      86.5       5.6
15-24 years          688.9     147.3       4.7     714.2     160.9       4.4     714.7     161.1       4.4
25-34 years          746.8     156.0       4.8     781.2     168.3       4.6     782.5     168.8       4.6
35-44 years          651.9      93.8       6.9     700.0     106.0       6.6     712.9     110.3       6.5
45-64 years        1,084.7      96.1      11.3   1,190.9     109.5      10.9   1,318.1     116.2      11.3
65+ years            803.6      52.0      15.5     890.5      58.0      15.4   1,053.0      78.8      13.4
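
The average length of stay in table 5 is the number of hospital days divided by the number of discharges. The short Python sketch below reproduces the "All ages, both sexes" row under that reading; the function name and the dictionary layout are illustrative and are not part of the report.

```python
def average_length_of_stay(days, discharges):
    """Average length of stay = hospital days / discharges, to one decimal."""
    return round(days / discharges, 1)


# "All ages, both sexes" row of table 5: (days, discharges) per simulation type.
row = {
    "interview reported": (7917.1, 949.6),
    "perfect interview": (8604.6, 1071.0),
    "all discharges": (9303.4, 1143.4),
}
stays = {kind: average_length_of_stay(d, n) for kind, (d, n) in row.items()}
print(stays)
```

The same division can be used to spot-check any other row of the table.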
Table 6. Population changes during year prior to interview and population bases used in obtaining rates
[Average of 13 interview dates]
[Initial = initial population¹; B-prior, D-prior = births, deaths prior to first day of year;
B-year, D-year = births, deaths during year. Rate bases: Living = living persons;
Reported = interview reported; Perfect = perfect interview; All = all discharges]

                                                                                Rate bases
Sex and age        Initial   B-prior   D-prior    B-year    D-year    Living  Reported   Perfect       All

Both sexes
All ages            10,000     144.5      58.6     235.6      96.4    10,225    10,107    10,107    10,155
Under 15 years       3,167     144.5       5.4     235.6       7.9     3,534     3,416     3,416     3,420
15-24 years          1,293       ...       0.3       ...       2.0     1,291     1,291     1,291     1,292
25-34 years          1,286       ...       1.4       ...       2.2     1,282     1,282     1,282     1,283
35-44 years          1,343       ...       1.5       ...       5.3     1,336     1,336     1,336     1,339
45-64 years          2,045       ...      14.9       ...      22.6     2,008     2,008     2,008     2,019
65+ years              866       ...      35.1       ...      56.4       774       774       774       802

Male
All ages             4,866      74.0      32.8     119.5      53.3     4,973     4,913     4,913     4,940
Under 15 years       1,615      74.0       3.6     119.5       4.5     1,800     1,740     1,740     1,742
15-24 years            610       ...       0.2       ...       1.8       608       608       608       609
25-34 years            615       ...       0.6       ...       1.0       613       613       613       614
35-44 years            645       ...       1.0       ...       2.6       641       641       641       642
45-64 years            989       ...       9.4       ...      14.7       965       965       965       972
65+ years              392       ...      18.0       ...      28.7       345       345       345       359

Female
All ages             5,134      70.5      25.8     116.1      43.1     5,252     5,194     5,194     5,219
Under 15 years       1,552      70.5       1.8     116.1       3.4     1,733     1,675     1,675     1,677
15-24 years            683       ...       0.1       ...       0.2       683       683       683       683
25-34 years            671       ...       0.8       ...       1.2       669       669       669       670
35-44 years            698       ...       0.5       ...       2.7       695       695       695       696
45-64 years          1,056       ...       5.5       ...       7.9     1,043     1,043     1,043     1,047
65+ years              474       ...      17.1       ...      27.7       429       429       429       443

¹Distribution based on table 29, p. 42, of reference 8.
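
The rates and percent distributions in tables 1-4 appear to follow directly from the counts and the rate bases in table 6. The Python sketch below checks this for the "All ages, both sexes" row of table 1; the function names are illustrative, and the pairing of each count with its base (10,107 for the two interview simulations, 10,155 for all discharges) is read off the table headings rather than stated as a formula in the report.

```python
def rate_per_1000(count, base):
    """Number per 1,000 persons in the rate base, to one decimal."""
    return round(1000 * count / base, 1)


def percent_of_total(part, total):
    """Percent distribution entry, to one decimal."""
    return round(100 * part / total, 1)


# Table 1, both sexes, all ages: discharges paired with the table 6 rate base.
reported_rate = rate_per_1000(949.6, 10107)    # interview reported
perfect_rate = rate_per_1000(1071.0, 10107)    # perfect interview
all_rate = rate_per_1000(1143.4, 10155)        # all discharges

# Share of interview-reported discharges contributed by persons under 15.
under_15_share = percent_of_total(167.0, 949.6)

print(reported_rate, perfect_rate, all_rate, under_15_share)
```

Hospital-day rates in tables 3 and 4 can be checked the same way by substituting days for discharges.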
Table 7. Percent underreporting of hospital discharges, by type of discharge and number of weeks
between discharge and interview: interview reported versus perfect interview
[Average of 13 interview dates]
[Reported = interview reported; Perfect = perfect interview; Percent = percent underreported]

Weeks between       Delivery and nondelivery      Delivery discharges           Discharges excluding
discharge and       discharges                                                  deliveries
interview         Reported   Perfect   Percent  Reported   Perfect   Percent  Reported   Perfect   Percent

Total                949.5   1,071.1      11.4     208.3     212.7       2.1     741.2     858.4      13.7
1-4                   82.2      82.2       0.0      16.8      16.2      ¹3.7      65.4      66.0       0.9
5-8                   77.8      81.8       4.9      16.4      16.5       0.6      61.4      65.3       6.0
9-12                  78.4      81.7       4.0      16.2      16.5       1.8      62.2      65.2       4.6
13-16                 78.1      83.5       6.5      16.3      16.5       1.2      61.8      67.0       8.8
17-20                 74.9      82.3       9.0      15.7      16.5       4.8      59.2      65.8      10.0
21-24                 76.3      82.1       7.1      16.4      16.7       1.8      59.9      65.4       8.4
25-28                 73.0      82.2      11.2      15.8      16.5       4.2      57.2      65.7      12.9
29-32                 72.8      82.4      11.7      16.0      16.4       2.4      56.8      66.0      13.9
33-36                 71.2      82.1      13.3      15.8      16.2       2.5      55.4      65.9      15.9
37-40                 73.8      82.7      10.8      15.6      16.0       2.5      58.2      66.7      12.7
41-44                 71.4      82.9      13.9      16.1      16.2       0.6      55.3      66.7      17.1
45-48                 64.8      82.5      21.5      15.0      16.2       7.4      49.8      66.3      24.9
49-52                 54.8      82.7      33.7      16.2      16.3       0.6      38.6      66.4      41.9

¹Percent overreported.
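
The percent underreported in tables 7-12 is consistent with the shortfall of one count relative to the reference count it is compared with. A minimal Python sketch of that calculation follows; the function name is illustrative, and a negative result is used here to stand for the cells footnoted as overreported.

```python
def percent_underreported(reported, reference):
    """Shortfall of `reported` relative to `reference`, in percent, to one decimal.

    A negative value means the reported count exceeds the reference count
    (the cells footnoted "percent overreported" in the tables).
    """
    return round(100 * (reference - reported) / reference, 1)


total_all = percent_underreported(949.5, 1071.1)   # table 7, total, all discharges
weeks_49_52 = percent_underreported(54.8, 82.7)    # table 7, weeks 49-52
deliveries_1_4 = percent_underreported(16.8, 16.2) # table 7, deliveries, weeks 1-4

print(total_all, weeks_49_52, deliveries_1_4)
```

The delivery cell for weeks 1-4 comes out negative, matching the 3.7 percent overreporting shown in the table.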
Table 8. Percent underreporting of hospital discharges, by type of discharge and number of weeks
between discharge and interview: perfect interview versus all discharges
[Average of 13 interview dates]
[Perfect = perfect interview; All = all discharges; Percent = percent underreported]

Weeks between       Delivery and nondelivery      Delivery discharges           Discharges excluding
discharge and       discharges                                                  deliveries
interview          Perfect       All   Percent   Perfect       All   Percent   Perfect       All   Percent

Total              1,071.1   1,143.5       6.3     212.7     212.7       0.0     858.4     930.8       7.8
1-4                   82.2      85.5       3.9      16.2      16.2       0.0      66.0      69.3       4.8
5-8                   81.8      86.2       5.1      16.5      16.5       0.0      65.3      69.7       6.3
9-12                  81.7      86.5       5.3      16.5      16.5       0.0      65.2      70.0       6.9
13-16                 83.5      88.6       5.8      16.5      16.5       0.0      67.0      72.1       7.1
17-20                 82.3      88.1       6.6      16.5      16.5       0.0      65.8      71.6       8.1
21-24                 82.1      87.6       6.8      16.7      16.7       0.0      65.4      70.9       5.3
25-28                 82.2      87.8       6.4      16.5      16.5       0.0      65.7      71.3       7.9
29-32                 82.4      88.3       6.7      16.4      16.4       0.0      66.0      71.9       8.2
33-36                 82.1      88.1       6.8      16.2      16.2       0.0      65.9      71.9       8.3
37-40                 82.7      89.1       7.2      16.0      16.0       0.0      66.7      73.1       8.8
41-44                 82.9      89.3       7.2      16.2      16.2       0.0      66.7      73.1       8.8
45-48                 82.5      89.0       7.3      16.2      16.2       0.0      66.3      72.8       8.9
49-52                 82.7      89.4       7.5      16.3      16.3       0.0      66.4      73.1       9.2
Table 9. Percent underreporting of hospital discharges, by type of discharge and number of weeks
between discharge and interview: interview reported versus all discharges
[Average of 13 interview dates]
[Reported = interview reported; All = all discharges; Percent = percent underreported]

Weeks between       Delivery and nondelivery      Delivery discharges           Discharges excluding
discharge and       discharges                                                  deliveries
interview         Reported       All   Percent  Reported       All   Percent  Reported       All   Percent

Total                949.5   1,143.5      17.0     208.3     212.7       2.1     741.2     930.8      20.4
1-4                   82.2      85.5       3.9      16.8      16.2      ¹3.7      65.4      69.3       5.6
5-8                   77.8      86.2       9.7      16.4      16.5       0.6      61.4      69.7      11.9
9-12                  78.4      86.5       9.4      16.2      16.5       1.8      62.2      70.0      11.1
13-16                 78.1      88.6      11.9      16.3      16.5       1.2      61.8      72.1      14.9
17-20                 74.9      88.1      15.0      15.7      16.5       4.8      59.2      71.6      17.3
21-24                 76.3      87.6      12.9      16.4      16.7       1.8      59.9      70.9      15.5
25-28                 73.0      87.8      16.9      15.8      16.5       4.2      57.2      71.3      19.7
29-32                 72.8      88.3      17.6      16.0      16.4       2.4      56.8      71.9      21.0
33-36                 71.2      88.1      19.2      15.8      16.2       2.5      55.4      71.9      22.9
37-40                 73.8      89.1      17.2      15.6      16.0       2.5      58.2      73.1      20.4
41-44                 71.4      89.3      20.0      16.1      16.2       0.6      55.3      73.1      24.4
45-48                 64.8      89.0      27.2      15.0      16.2       7.4      49.8      72.8      31.6
49-52                 54.8      89.4      38.7      16.2      16.3       0.6      38.6      73.1      47.2

¹Percent overreported.
Table 10. Percent underreporting of hospital discharges, by actual length of stay and number of
weeks between discharge and interview: interview reported versus perfect interview
[Average of 13 interview dates]
[Reported = interview reported discharges; Perfect = perfect interview discharges;
Percent = percent underreported]

Weeks between       1-day stay                    2-4-day stay                  5+-day stay
discharge and
interview         Reported   Perfect   Percent  Reported   Perfect   Percent  Reported   Perfect   Percent

Total                101.2     131.8      23.2     339.9     383.5      11.4     508.8     555.8       8.5
1-4                    9.1       9.4       3.2      29.5      29.4      ¹0.3      43.7      43.5      ¹0.5
5-8                    8.2       9.8      16.3      27.1      28.5       4.9      42.5      43.5       2.3
9-12                   8.0      10.0      20.0      27.3      28.3       2.3      42.9      43.4       1.2
13-16                  8.0      10.2      21.6      28.4      29.3       3.1      41.7      44.0       5.2
17-20                  7.8       9.6      18.8      26.4      29.8      11.4      40.7      42.9       5.1
21-24                  7.5       9.8      23.5      28.2      29.5       4.4      40.7      42.8       4.9
25-28                  7.8      10.1      22.8      26.2      29.2      10.3      39.0      42.9       9.1
29-32                  6.9      10.0      31.0      27.2      29.6       8.1      38.7      42.8       9.6
33-36                  7.6      10.2      25.5      25.5      29.4      13.3      38.2      42.5      10.1
37-40                  7.2      10.6      32.1      26.5      29.8      11.1      40.1      42.2       5.0
41-44                  8.6      10.8      20.4      23.8      30.4      21.7      39.0      41.8       6.7
45-48                  7.4      10.8      31.5      22.3      30.3      26.4      35.1      41.3      15.0
49-52                  7.1      10.5      32.4      21.3      30.0      29.0      26.5      42.2      37.2

¹Percent overreported.
Table 11. Percent underreporting of hospital discharges, by actual length of stay and number of
weeks between discharge and interview: perfect interview versus all discharges
[Average of 13 interview dates]
[Perfect = perfect interview discharges; All = all discharges; Percent = percent underreported]

Weeks between       1-day stay                    2-4-day stay                  5+-day stay
discharge and
interview          Perfect       All   Percent   Perfect       All   Percent   Perfect       All   Percent

Total                131.8     136.1       3.2     383.5     405.4       5.4     555.8     602.2       7.7
1-4                    9.4       9.6       2.1      29.4      30.4       3.3      43.5      45.5       4.4
5-8                    9.8      10.2       3.9      28.5      30.0       5.0      43.5      46.1       5.6
9-12                  10.0      10.3       2.9      28.3      30.0       5.7      43.4      46.2       6.1
13-16                 10.2      10.5       2.9      29.3      31.1       5.8      44.0      47.1       6.6
17-20                  9.6       9.8       2.0      29.8      31.6       5.7      42.9      46.6       7.9
21-24                  9.8      10.0       2.0      29.5      31.2       5.4      42.8      46.5       8.0
25-28                 10.1      10.4       2.2      29.2      30.8       5.2      42.9      46.6       7.9
29-32                 10.0      10.3       2.2      29.6      31.2       5.1      42.8      46.8       8.5
33-36                 10.2      10.5       2.9      29.4      31.0       5.2      42.5      46.6       8.8
37-40                 10.6      11.0       3.6      29.8      31.7       6.0      42.2      46.4       9.1
41-44                 10.8      11.2       3.6      30.4      32.3       5.9      41.8      45.8       8.7
45-48                 10.8      11.3       4.4      30.4      32.2       5.6      41.3      45.5       9.2
49-52                 10.5      11.0       4.5      30.0      31.9       6.0      42.2      46.5       9.2
Table 12. Percent underreporting of hospital discharges, by actual length of stay and number of
weeks between discharge and interview: interview reported versus all discharges
[Average of 13 interview dates]
[Reported = interview reported discharges; All = all discharges; Percent = percent underreported]

Weeks between       1-day stay                    2-4-day stay                  5+-day stay
discharge and
interview         Reported       All   Percent  Reported       All   Percent  Reported       All   Percent

Total                101.2     136.1      25.6     339.9     405.4      16.2     508.8     602.2      15.3
1-4                    9.1       9.6       5.2      29.5      30.4       3.0      43.7      45.5       4.0
5-8                    8.2      10.2      19.6      27.1      30.0       9.7      42.5      46.1       7.3
9-12                   8.0      10.3      22.3      27.3      30.0       8.3      42.9      46.2       7.1
13-16                  8.0      10.5      23.8      28.4      31.1       8.7      41.7      47.1      11.5
17-20                  7.8       9.8      20.4      26.4      31.6      16.5      40.7      46.6      12.7
21-24                  7.5      10.0      25.0      28.2      31.2       9.6      40.7      46.5      12.5
25-28                  7.8      10.4      25.0      26.2      30.8      14.9      39.0      46.6      16.3
29-32                  6.9      10.3      33.0      27.2      31.2      12.8      38.7      46.8      17.3
33-36                  7.6      10.5      27.6      25.5      31.0      17.7      38.2      46.6      18.0
37-40                  7.2      11.0      34.5      26.5      31.7      16.4      40.1      46.4      13.6
41-44                  8.6      11.2      23.2      23.8      32.3      26.3      39.0      45.8      14.8
45-48                  7.4      11.3      34.5      22.3      32.2      30.7      35.1      45.5      22.9
49-52                  7.1      11.0      35.5      21.3      31.9      33.2      26.5      46.5      43.0
APPENDIX
OUTLINE FOR COMPUTER SIMULATION OF HOSPITAL DISCHARGES
[Input data are found in table B for the MP1 matrix, in table C for the MP2 matrix, and in table D for the MP3 matrix. For other matrices in the computer program, data are not reproduced in this report because of their bulk.]
Each age-sex group of n individuals is assigned birth dates b_k, delivery dates c_k, and death dates d_k, where k = 1, 2, ..., n. The input data also includes:
1. Weekly admission probabilities P appropriate to the kth individual according to his age, sex, and subgroup as per the MP1 matrix;
2. Daily discharge probabilities P appropriate to the kth individual according to his age, sex, and number of days already hospitalized as per the MP2 matrix;
3. Probabilities P of being hospitalized for a delivery according to age as per the MP3 matrix;
4. Daily discharge probabilities P for delivery hospitalizations according to age and number of days already hospitalized as per the MP4 matrix.
These probability matrices are all used in Phase I. In Phase II, the input data consists of birth dates, delivery dates, death dates, and the number of days to death m for individuals determined in Phase I to be in their last year of life. The input data for Phase II also includes:
1. Daily admission probabilities P according to the number of days of life remaining to the individual as per the MP7 matrix;
2. Daily discharge probabilities P according to age, sex, and number of days already hospitalized as per the MP8 matrix.
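
The inputs listed above can be read as the parameters of a day-by-day random process: on each day a person who is not in the hospital may be admitted, and a person who is in the hospital may be discharged. The Python sketch below illustrates that reading for a single person. It is a simplification under stated assumptions and not the report's program: the admission probability is applied per day rather than per week, the probabilities are hypothetical constants or functions rather than entries of the MP matrices, and deliveries and the last-year-of-life phase are left out. The horizon of day 756, with 757 recorded for a stay still open at the end, follows the outline.

```python
import random

HORIZON = 756  # last simulated day in the outline


def simulate_person(start_day, p_admit, p_discharge, rng):
    """Return (admission_day, discharge_day) pairs for one simulated person.

    p_admit:     hypothetical probability of admission on a day spent at home
    p_discharge: hypothetical function j -> probability of discharge on the
                 j-th day after admission
    rng:         a random.Random instance supplying uniform random numbers
    """
    stays = []
    day = start_day
    while day <= HORIZON:
        if rng.random() >= p_admit:      # not admitted today
            day += 1
            continue
        admitted, j = day, 1
        while True:
            if admitted + j > HORIZON:   # still hospitalized at the horizon
                discharge = HORIZON + 1  # recorded as day 757
                break
            if rng.random() < p_discharge(j):
                discharge = admitted + j
                break
            j += 1
        stays.append((admitted, discharge))
        day = discharge + 1              # resume the day after discharge
    return stays


rng = random.Random(1966)
history = simulate_person(1, 0.002, lambda j: 0.15, rng)
print(history)
```

Running the function once per individual in an age-sex group, and tabulating discharges by date, gives the kind of discharge history from which the detailed tables are built.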
Histories are generated separately for each of the n individuals in an age-sex group. Starting with the first individual the basic steps in the computer program are as follows:
I. Determine whether d_k - b_k > 364. If no, set m = 365 - (d_k - b_k) and day i = 1 and proceed to III (Phase II). If yes, set i = b_k and:
a. Generate uniform random numbers R_i for each day from b_k to 756 as outlined below. First, however, check, is d_k - i < 365? If yes, set m = 365 - (d_k - i) and proceed to II. If no, is c_k - 30 ...
... 756? If yes, record i + 1 as the discharge date for this admission and loop back to I for the next (k + 1st) individual. If no, proceed to I-e.
e. Generate R_(i+j) (j = 1 to 100) and proceed to I-f.
f. Is R_(i+j) < P? If no, proceed to I-g. If yes, proceed to I-h.
g. Is i + j = 756? If yes, record 757 as the discharge date and loop to I for the next individual. If no, loop back to I-e, taking j = j + 1.
h. Record i + j as the discharge date for this admission. Is i + j = 756? If yes, loop back to I for the next individual. If no, loop back to I-a, taking i = i + j + 1.
II. Is c_k = i? If no, loop back to I-a, taking i = i + 1. If yes, proceed to II-a.
a. Generate random number R_i. ... no, loop back to I-a, taking i = i + 1. ... proceed to II-b.
b. Record i as a delivery admission date; then proceed to II-c.
c. Generate R_(i+s) (s = 1 to 30). Is R ...
OUTLINE OF REPORT SERIES FOR VITAL AND HEALTH STATISTICS
Public Health Service Publication No. 1000
Series 1. Programs and collection procedures. Reports which describe the general programs of the National Center for Health Statistics and its offices and divisions, data collection methods used, definitions, and other material necessary for understanding the data.
Reports number 1-4
Series 2. Data evaluation and methods research. Studies of new statistical methodology including: experimental tests of new survey methods, studies of vital statistics collection methods, new analytical techniques, objective evaluations of reliability of collected data, contributions to statistical theory.
Reports number 1-13
Series 3. Analytical studies. Reports presenting analytical or interpretive studies based on vital and health statistics, carrying the analysis further than the expository types of reports in the other series.
Reports number 1-4
Series 4. Documents and committee reports. Final reports of major committees concerned with vital and health statistics, and documents such as recommended model vital registration laws and revised birth and death certificates.
Reports number 1 and 2
Series 10. Data From the Health Interview Survey. Statistics on illness, accidental injuries, disability, use of hospital, medical, dental, and other services, and other health-related topics, based on data collected in a continuing national household interview survey.
Reports number 1-26
Series 11. Data From the Health Examination Survey. Statistics based on the direct examination, testing, and measurement of national samples of the population, including the medically defined prevalence of specific diseases, and distributions of the population with respect to various physical and physiological measurements.
Reports number 1-12
Series 12. Data From the Health Records Survey. Statistics from records of hospital discharges and statistics relating to the health characteristics of persons in institutions, and on hospital, medical, nursing, and personal care received, based on national samples of establishments providing these services and samples of the residents or patients.
Reports number 1-3
Series 20. Data on mortality. Various statistics on mortality other than as included in annual or monthly reports: special analyses by cause of death, age, and other demographic variables, also geographic and time series analyses.
Reports number 1
Series 21. Data on natality, marriage, and divorce. Various statistics on natality, marriage, and divorce other than as included in annual or monthly reports: special analyses by demographic variables, also geographic and time series analyses, studies of fertility.
Reports number 1-7
Series 22. Data From the National Natality and Mortality Surveys. Statistics on characteristics of births and deaths not available from the vital records, based on sample surveys stemming from these records, including such topics as mortality by socioeconomic class, medical experience in the last year of life, characteristics of pregnancy, etc.
Reports number 1
For a list of titles of reports published in these series, write to:
National Center for Health Statistics
U.S. Public Health Service
Washington, D.C. 20201
Public Health Service Publication No. 1000-Series 2-No. 14
For sale by the Superintendent of Documents, U.S. Government Printing Office
Washington, D.C., 20402 - Price 35 cents
NATIONAL CENTER FOR HEALTH STATISTICS
Series 2, Number 14
VITAL and HEALTH STATISTICS
DATA EVALUATION AND METHODS RESEARCH
Replication
An Approach to the
Analysis of Data From
Complex Surveys
Development and evaluation of a replication
technique for estimating variance.
Washington, D.C. April 1966
U.S. DEPARTMENT OF
HEALTH, EDUCATION, AND WELFARE Public Health Service
John W. Gardner William H. Stewart
Secretary Surgeon General
NATIONAL CENTER FOR HEALTH STATISTICS
FORREST E. LINDER, Ph. D., Director
THEODORE D. WOOLSEY, Deputy Director
OSWALD K. SAGEN, Ph. D., Assistant Director
WALT R. SIMMONS, M.A., Statistical Advisor
ALICE M. WATERHOUSE, M.D., Medical Advisor
JAMES E. KELLY, D.D.S., Dental Advisor
LOUIS R. STOLCIS, M.A., Executive Officer
OFFICE OF HEALTH STATISTICS ANALYSIS
Iwao M. Moriyama, Ph. D., Chief
DIVISION OF VITAL STATISTICS
Robert D. Grove, Ph. D., Chief
DIVISION OF HEALTH INTERVIEW STATISTICS
Philip S. Lawrence, Sc. D., Chief
DIVISION OF HEALTH RECORDS STATISTICS
Monroe G. Sirken, Ph. D., Chief
DIVISION OF HEALTH EXAMINATION STATISTICS
Arthur J. McDowell, Chief
DIVISION OF DATA PROCESSING
Sidney Binder, Chief
Public Health Service Publication No. 1000-Series 2-No. 14
Library of Congress Catalog Card Number 66-60010
PREFACE
The theory of design of surveys has advanced greatly in the past
three decades. One result is that many surveys now rest upon complex
designs involving such factors as stratification and poststratification,
multistage cluster sampling, controlled selection, and ratio, regression,
or composite estimation. Another result is a growing concern and
search for valid and efficient techniques for analysis of the output from
such complex surveys.
A central difficulty is that most of the standard classical techniques
for statistical analysis assume that observations are independent of one
another and are the result of simple random sampling, often from a
universe of normal or other known distribution—a situation that does not
prevail in modern complex design. This report reviews several aspects
of the problem and the limited literature on the topic. It offers a new
method of balanced half-sample pseudoreplication as a solution to
one phase of the problem.
The entire matter of how best to analyze data from complex sur-
veys is nearly as broad as statistical theory itself. It encompasses not
only the technical features of analysis, but also relationships among
purpose and design of the survey, and the character of inferences
which may be drawn about populations other than the finite universe
which was sampled. This report treats only a very small sector of
the subject but, it is believed, introduces a scheme which may be
widely useful. There is good reason to hope that the method, or
possibly variations of it, may have utility beyond the somewhat narrow
area with which it deals specifically.
The exploration and developments reported here are the outgrowth
of discussions among a number of people, as is nearly always true
when the subject is a pervasive one. But they are particularly the
product of a study by Philip J. McCarthy of Cornell University under
a contractual arrangement with the National Center for Health Sta-
tistics. Contributions of the Center were coordinated by Walt R.
Simmons. Professor McCarthy wrote the report. Garrie J. Losee of
the Center was responsible for initial work on half-sample replication,
as employed in NCHS surveys, and prepared the appendix to this re-
port.
CONTENTS
Preface
Introduction
Complex Sample Surveys and Problems of Critical Analysis
General Approaches for Solving Problems of Critical Analysis
Two Extreme Approaches
Obtaining "Exact" Solutions
Replication Methods of Estimating Variances
Pseudoreplication
Half-Sample Replication Estimates of Variance From Stratified Samples
Balanced Half-Sample Replication
Partially Balanced Half Samples
Half-Sample Replication and the Sign Test
Jackknife Estimates of Variance From Stratified Samples
Half-Sample Replication and the Jackknife Method With Stratified Ratio-Type Estimators
Summary
Bibliography
Appendix. Estimation of Reliability of Findings From the First Cycle of the Health Examination Survey
Survey Design
Requirements of a Variance Estimation Technique
Development of the Replication Technique
Computer Output
Illustration
A key feature of statistical techniques necessary to the analysis of data
from complex surveys is the method of calculating variance of the sam-
ple estimates. Earlier direct computational procedures are either in-
appropriate or much too difficult, even with high speed electronic com-
puters, to cope with the elaborate stratification, multistage cluster sam-
pling, and intricate estimation schemes found in many current sample
surveys. A different approach is needed.
A number of statisticians have attempted solution through a variety of
schemes which employ some form of replication or random grouping of
observations. These efforts are recalled in this report, as a part of the
background review of principal issues present in choice of analytic
methods suitable to the complex survey.
Among the estimating schemes in recent use is a half-sample pseudo-
replication technique adapted by the National Center for Health Statis-
tics from an approach developed by the U.S. Bureau of the Census. This
method is described in detail in the report. Typically, it involves sub-
sampling a parent sample in such a way that 20-40 pseudoreplicated es-
timates of any specified statistic are produced, with the precision of the
corresponding statistic from the parent sample being estimated from the
variability among the replicated estimates.
One difficulty in using this method is that the 20-40 estimates are cho-
sen from among the thousands or millions of possible replicates of the
same character, and hence may yield an unstable estimate of the var-
iability among the possible replicates. The report presents a system for
controlled choice of a limited number of pseudoreplicates (often no
more than 20-40 for a major national survey) such that for some classes
of statistics the chosen small number of replicates has a variance al-
gebraically identical with that of all possible replicates of the same
character within the parent sample, and the same expected value as the
variance of all possible replicates of the same character for all possi-
ble parent samples of the same design. Illustrations of the technique and
guides for its use are included.
SYMBOLS
Data not available                            ---
Category not applicable                       ...
Quantity zero                                 -
Quantity more than 0 but less than 0.05       0.0
Figure does not meet standards of
  reliability or precision                    *
REPLICATION
AN APPROACH TO THE ANALYSIS OF DATA
FROM COMPLEX SURVEYS
Philip J. McCarthy, Ph. D., Cornell University
INTRODUCTION
A considerable body of theory and practice
has been developed relating to the design and
analysis of sample surveys. This material is
available in such books as Cochran (1963),
Deming (1950), Hansen, Hurwitz, and Madow
(1953), Kish (1965), Sukhatme (1954), and Yates
(1960), and in numerous journal articles. Much
of this theory and practice has the following
characteristics: the sampled populations contain
finite numbers of elements; no assumptions are
made concerning the distributions of the pertinent
variables in the population; major emphasis is
placed on the estimation of simple population
parameters such as percentages, means, and
totals; and the samples are assumed to be "large"
so that the sampling distributions of estimates
can be approximated by normal distributions.
Furthermore, it has frequently been appropriate
to regard the principal goal of sample design as
that of achieving a stated degree of precision for
minimum cost, or alternatively, of maximizing
precision for fixed cost.
Sample surveys in which major emphasis is
placed on the estimation of population parameters
such as percentages, means, or totals have been
variously called "descriptive" or "enumerative"
surveys and, as noted above, the work in finite-
population sampling theory has been primarily
concentrated on the design of such surveys.
Increasingly, however, one finds reference in
the sample survey literature to "analytical"
surveys or to the use of "analytical statistics."
Cochran (1963, p. 4), for example, says:
In a descriptive survey the objective is
simply to obtain certain information about
large groups: for example, the numbers of
men, women, and children who view a tele-
vision program. In an analytical survey,
comparisons are made between different sub-
groups of the population, in order to dis-
cover whether differences exist among them
that may enable us to form or to verify
hypotheses about the forces at work in the
population. . . . The distinction between de-
scriptive and analytical surveys is not, of
course, clear-cut. Many surveys provide
data that serve both purposes.
Although there are some differences in emphasis,
Deming (1950, chap. 7), Hartley (1959), and Yates
(1960, p. 297) make essentially the same dis-
tinction. Kish (1957, 1965) more or less auto-
matically assumes that data derived from most
complex sample surveys will be subjected to
some type of detailed analysis, and applies the
term "analytical statistical methods" to procedures
that go much beyond the mere estimation
of population percentages, means, and totals.
The Health Examination Survey (HES) of the
National Center for Health Statistics (NCHS) is
one example of a sample survey that, in some
respects, might be classified as an enumerative
survey, but whose principal value will undoubtedly
be in providing data for analytical purposes. Some
of the basic features of Cycle I of HES are pre-
sented in a publication of the National Center for
Health Statistics (Series 11, No. 1). A brief de-
scription of the survey, quoted from the report,
is as follows:
The first cycle of the Health Examination
Survey was the examination of a sample of
adults. It was directed toward the collection
of statistics on the medically defined prev-
alence of certain chronic diseases and of a
particular set of dental findings and physical
and physiological measurements. The prob-
ability sample consisted of 7,710 of all non-
institutional, civilian adults in the age range
18-79 years in the United States. Altogether,
6,672 persons were examined during the
period of the Survey which began in October
1959 and was completed in December 1962.
A rather detailed account of the survey design has
been published by the Center (Series 1, No. 4).
The enumerative and analytical aspects of
this survey, and the inevitable blending of one
into the other, are well illustrated in two reports
that have been published on the blood pressure
of adults (NCHS, Series 11, Nos. 4 and 5). Not
only does one find in these reports the distribution
of blood pressure readings for the entire sample,
but one also finds the comparison (with respect
to blood pressure) of subgroups of the population
defined by a variety of combinations of such
demographic variables as age, sex, arm girth,
race, area of the United States, and size of
place of residence. It seems unnecessary to
argue where the enumerative aspects end and
the analytical aspects begin. For all practical
purposes, and by any definition one chooses to
adopt, the survey is analytical in character. The
same will be true of almost any sample survey
that one examines, at least as far as many users
of the data are concerned. Certainly this is the
view of the staff at NCHS.
The principal goal of this report will be to
examine some of the problems that arise when
data from a complex sample survey operation
are subjected to detailed and critical analysis,
and to discuss some of the procedures that have
been suggested for dealing with these problems.
Particular emphasis will be placed on a pro-
cedure for estimating variances which is especial-
ly suitable for sample designs similar to those
used in the Health Interview Survey and the Health
Examination Survey.
COMPLEX SAMPLE SURVEYS AND
PROBLEMS OF CRITICAL ANALYSIS
Simple random sampling, usually without
replacement, provides the base upon which the
presently existing body of sample survey theory
has been constructed. Major modifications of
random sampling have been dictated by one or
both of two considerations. These are as follows.
(1) One rarely attempts to survey a finite
population without having some prior knowledge
concerning either individual elements in the pop-
ulation, or subgroups of population elements, or
the population as an entity. This prior information,
depending upon its nature, can be used in the
sample design or in the method of estimation to
increase the precision of estimates over that
which would be achieved by simple random sam-
pling. Thus we have such techniques as stratifica-
tion, stratification after the selection of the
sample (poststratification), selection with prob-
abilities proportional to the value of some auxil-
iary variable, ratio estimation, and regression
estimation.
(2) Many finite populations chosen for survey
study are characterized by one or both of the
following two circumstances: the ultimate population
elements are dispersed over a wide geographic
area, and groups or "clusters" of elements can
be readily identified in advance of taking the sur-
vey, whereas the identification of individual pop-
ulation elements would be much more costly.
These circumstances have led to the use of multi-
stage sampling procedures, where one first
selects a sample of clusters and then selects a
sample of elements from within each of the chosen
clusters.
In addition to these two main streams of
development, whose results are frequently com-
bined in any one survey undertaking, there is a
wide variety of related and special techniques
from which choices can be made for sample
design and for estimation. Thus one can use
systematic sampling, rotation sampling (in which
a population is sampled over time with some
sample elements remaining constant from time
to time), two-phase sampling (in which the results
of a preliminary sample are used to improve
design or estimation for a second sample), un-
biased ratio estimators instead of ordinary ratio
estimators, and so on. Finally, it is necessary
to recognize that measurements may not be ob-
tained from all elements that should have been
included in a sample, and that such nonresponse
may influence the estimation procedure and the
interpretation of results.
As sample design and estimation move from
simple random sampling and the straightforward
estimation of population means, percentages, or
totals to a stratified, multistage design with ratio
or regression forms of estimation, it becomes
increasingly difficult to operate in an "ideal"
manner even for the purest of enumerative sur-
veys. Ideally, one would like to be assured that
the "best" possible estimate has been obtained
for the given expenditure of funds, that the bias
of the estimate is either negligible or measur-
able, and that the precision of the estimate has
been appropriately evaluated on the basis of the
sample selected. Numerous difficulties are en-
countered in achieving this goal. Among these
are: (1) the expressions that must be evaluated
from sample data become exceedingly complex,
(2) in many instances, these expressions are only
approximate in that their validity depends upon
having "large" samples, and (3) most surveys
provide estimates for many variables—that is,
they are multivariate in character—and this, in
conjunction with the first point, implies an ex-
tremely large volume of computations, even for
modern electronic computing equipment. This
last point is accentuated when one wishes to study
the relationships among many variables in numerous
subpopulations. The foregoing difficulties are
well illustrated by the Health Interview Survey
(No. A-2), the Health Examination Survey (Series
1, No. 4), and the Current Population Survey
(Technical Paper No. 7). The sample designs
and estimation techniques for these three sur-
veys are somewhat similar, although the Current
Population Survey employs a composite estimation
technique (made possible by the rotation of sample
elements) that is not employed in the other two
surveys.
We have at various points in the preceding
discussion used the term "complex" sample
surveys, implying thereby that the sample design
is in some sense or other complex. Little is to
be gained by arguing the distinction between simple
or complex under these circumstances, although
several observations are perhaps in order.
We are, of course, primarily concerned with
the complexities of analysis that result from the
use of a particular sample design and estimation
procedure. These complexities arise from various
combinations of such factors as the following.
The assumption of a functional form for the dis-
tribution of a random variable over a finite pop-
ulation is rarely feasible and thus analytical
power for devising statistical procedures is lost.
The selection of elements without replacement,
or in clusters, introduces dependence among ob-
servations. Estimators are usually nonlinear and
we are forced to use approximate procedures for
evaluating their characteristics. Some design
techniques that are known to increase the pre-
cision of estimates almost invariably lead to the
negation of assumptions required by such common
statistical procedures as the analysis of variance
(e.g., strata having unequal within-variances).
Further comments on these points will be made
later in this report.
New dimensions of complexity, both concep-
tual and technical, arise as one progresses from
a truly enumerative survey situation to a purely
analytical survey. Each of these will now be
discussed briefly.
On the conceptual side, the major question
concerns the manner in which one chooses to
view a finite population—either as a fixed set of
elements for which a statistical description is
desired, or as a sample from an infinite super-
population to which inferences are to be made.
In simplest terms, this can be viewed as follows.
An infinite superpopulation, characterized by a random
variable $y$ with mean $E(y) = \mu$ and variance
$E(y - \mu)^2 = \sigma^2$, is assumed as a basis for the
sampling process. $N$ independent observations on $y$ lead
to a finite population with mean and variance
\[ \bar{Y} = \frac{1}{N} \sum_{i=1}^{N} y_i , \qquad S^2 = \frac{1}{N-1} \sum_{i=1}^{N} (y_i - \bar{Y})^2 , \]
while a simple random sample, drawn without replacement
from the finite population, has the observed mean and variance
\[ \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i , \qquad s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2 . \]
Ordinary sampling theory assumes that we wish
to describe the realized finite population of $N$ elements,
and we have
\[ E(\bar{y} \mid \text{fixed } N \text{ values of } y) = \bar{Y} \]
\[ V(\bar{y} \mid \text{fixed } N \text{ values of } y) = \frac{N-n}{N} \, \frac{S^2}{n} \]
\[ \hat{V}(\bar{y} \mid \text{fixed } N \text{ values of } y) = \frac{N-n}{N} \, \frac{s^2}{n} \]
where the symbol ^ (the hat) indicates an estimator of a
population parameter. If, however, we wish to
draw inferences for the infinite superpopulation
from our observed sample, and therefore take
expectations over an infinite set of finite populations
of $N$ elements, then it is straightforward to
demonstrate that
\[ E(\bar{y}) = \mu \]
\[ V(\bar{y}) = \sigma^2 / n \]
\[ \hat{V}(\bar{y}) = s^2 / n \]
In effect, the only formal difference in the two
views is that the finite population correction is
omitted in the variance of $\bar{y}$ and in the estimate
of the variance of $\bar{y}$. This point has been made
by Deming (1950, p. 251) and Cochran (1963,
p. 37). Cochran says, in reference to the com-
parison of two subpopulation means:
One point should be noted. It is seldom of
scientific interest to ask whether $\bar{Y}_j = \bar{Y}_k$, because
these means would not be exactly equal
in a finite population, except by a rare chance,
even if the data in both domains were drawn
at random from the same infinite population.
Instead, we test the null hypothesis that the
two domains were drawn from infinite pop-
ulations having the same mean. Consequently
we omit the fpc when computing $V(\bar{y}_j)$ and
$V(\bar{y}_k)$.
Actually, it can also be argued that one would
rarely expect to find two infinite populations
with identical means. Careful accounts of statis-
tical inference sometimes emphasize this fact
by distinguishing between "statistically significant
difference" and "practically significant difference,"
and by pointing out that null hypotheses
are probably never "exactly" true.
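The practical effect of the two views can be seen in a small numerical sketch. The Python below is illustrative only: the helper name `variance_of_mean` and the sample values are assumptions made for this example and do not come from the report. It computes the estimated variance of a sample mean with the finite population correction (inference to the realized finite population) and without it (inference to an assumed infinite superpopulation).

```python
from statistics import variance


def variance_of_mean(sample, population_size=None):
    """Estimate the variance of the mean of a simple random sample.

    If population_size (N) is given, the finite population correction
    (N - n) / N is applied. If it is None, the correction is omitted,
    as in the superpopulation view.
    """
    n = len(sample)
    s2 = variance(sample)  # s^2, computed with divisor n - 1
    v = s2 / n
    if population_size is not None:
        v *= (population_size - n) / population_size
    return v


# Hypothetical readings; not data from any survey.
sample = [118.0, 126.0, 134.0, 122.0, 140.0]

v_finite = variance_of_mean(sample, population_size=50)  # with fpc
v_super = variance_of_mean(sample)                       # fpc omitted
```

With these made-up values, s^2 is 80, so the estimate without the correction is 80/5 = 16, and the correction factor 45/50 reduces it to 14.4. The two estimates converge as the sampling fraction n/N becomes small.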
In practice, the survey sampler is ordinarily
in a position to control only the inference from
the sample to the finite population. He may know
that the finite population is indeed a sample,
drawn in some completely unknown fashion from
an infinite superpopulation, but when he tries to
specify this superpopulation, his definition will
ordinarily be blurred and indistinct. Professional
knowledge and judgment will therefore play a
major role in such further inferences. Furthermore,
comparisons with other studies, comparisons
among subgroups in his finite population,
and a consideration of related data must be brought
to bear on the problem. There seems to be little
that one can say in a definite way at present about
this general problem, but very perceptive com-
ments on this subject have been made by Deming
and Stephan (1941) and by Cornfield and Tukey
(1956, sec. 5). Even the answer to the specific
question of whether or not to use finite popula-
tion corrections in the comparison of domain
means would appear to depend upon the circum-
stances.
The technical problems raised by the ana-
lytical use of data from complex surveys differ
in degree but not really in kind from those faced
in the consideration of enumerative survey data.
These problems are primarily of two types:
1. As indicated earlier, most analytical uses
of survey data involve the comparison of sub-
groups of the finite population from which the
sample is selected. These subgroups have been
frequently referred to as "domains of study."
The basic difficulty raised by this fact is that
various sample sizes, which in ordinary sam-
pling theory would be regarded as fixed from
sample to sample, now become random variables.
Furthermore, this occurs in such a manner that
it is usually not possible to use a conditional
argument—that is, it is not possible to consider
the drawing of repeated samples in which the
various sample sizes are viewed as being equal
to the size actually observed—as can be done
when estimating the mean of a domain on the
basis of simple random sampling.
2. In making critical analyses of survey
data, one is much more apt to use statistical
techniques that go beyond the mere estimation
of population means, percentages, and totals
(e.g., multiple regression). Ordinary survey
theory has attacked the problem of providing
estimates of sampling error for certain esti-
mates, e.g., ratio and regression estimates, but
the body of available theory leaves much to be
desired. An example of each of these problems
will now be described briefly.
One of the most frequently used techniques
from sampling theory is that of stratification.
A population is divided into $L$ mutually exclusive
and exhaustive strata containing $N_1, N_2, \ldots, N_L$
elements; random samples of predetermined
size $n_1, n_2, \ldots, n_L$ are drawn from the respective
strata; a value of the variable $y$ is obtained
for each of the sample elements; and the population
mean is estimated by
\[ \bar{y}_{st} = \sum_{h=1}^{L} W_h \bar{y}_h , \]
where $\bar{y}_h$ is the mean of the $n_h$ elements drawn
from the $h$th stratum, $N$ is the total size of the
population, and $W_h = N_h / N$. It is easily shown
that an unbiased estimate of the variance of $\bar{y}_{st}$
is given by
\[ \hat{V}(\bar{y}_{st}) = \sum_{h=1}^{L} W_h^2 (1 - f_h) \frac{s_h^2}{n_h} , \]
where $s_h^2$ is the variance for the variable $y$ as
estimated in the $h$th stratum, and $f_h = n_h / N_h$ is
the sampling fraction in the $h$th stratum. For
purposes of illustration, let us assume that the
strata are geographic areas of the United States,
that $n_h$, $N_h$, and $N$ refer to all adults (18 years of
age and over), and that the variable $y$ is blood
pressure.
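As a rough illustration of the stratified estimate and its variance estimate, the following Python sketch evaluates both for two strata. The function name `stratified_mean_and_variance`, the data layout, and all numbers are assumptions chosen for the example; they are not taken from the Health Examination Survey.

```python
from statistics import mean, variance


def stratified_mean_and_variance(strata):
    """strata: list of (N_h, sample_values) pairs, one per stratum.

    Returns (ybar_st, v_hat), where
      ybar_st = sum over h of W_h * ybar_h
      v_hat   = sum over h of W_h^2 * (1 - f_h) * s_h^2 / n_h
    with W_h = N_h / N and f_h = n_h / N_h.
    """
    N = sum(N_h for N_h, _ in strata)
    ybar_st = 0.0
    v_hat = 0.0
    for N_h, values in strata:
        n_h = len(values)
        W_h = N_h / N              # stratum weight
        f_h = n_h / N_h            # sampling fraction
        ybar_st += W_h * mean(values)
        v_hat += W_h ** 2 * (1 - f_h) * variance(values) / n_h
    return ybar_st, v_hat


# Two hypothetical strata with made-up blood pressure readings.
strata = [
    (60, [120.0, 130.0, 125.0]),
    (40, [140.0, 150.0, 145.0, 155.0]),
]
ybar_st, v_hat = stratified_mean_and_variance(strata)
```

For these values the stratum means are 125 and 147.5, giving an estimate of 0.6(125) + 0.4(147.5) = 134, and the two variance terms are 2.85 and 1.5, for a total of 4.35. The computation needs only the stratum sizes and the within-stratum sample values, which is what makes the ordinary stratified case simple.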
Suppose now that one wishes to estimate
the average blood pressure for males in the 40-
45 year age range with arm girth between 38 and
40 centimeters. This special group of adults is a
subpopulation, or domain of study, with reference
to the total finite population, and elements of the
domain will be found in each of the defined strata.
The weights for the strata and the fixed sample
sizes do not refer to this subpopulation and, over
repeated drawings of the main sample, the number
of domain elements drawn from a stratum will
be a random variable. Furthermore, the total
number of domain elements in a stratum is un-
known. Under these circumstances, let
$n_{h,d}$ = the number of sample elements in
the $h$th stratum falling in domain $d$.
$y_{h,i,d}$ = the value of the variable for the $i$th
sample element from domain $d$ in the
$h$th stratum.
$\hat{N}_d = \sum_h (1/f_h) \, n_{h,d}$ = the estimated total
number of elements in the domain.
$\bar{y}_{h,d} = (1/n_{h,d}) \sum_i y_{h,i,d}$ = sample mean,
for the $h$th stratum, of elements falling
in domain $d$.
Then an estimate of the domain mean and its estimated
variance are given by
\[ \hat{\bar{Y}}_d = \frac{\sum_h (1/f_h) \sum_i y_{h,i,d}}{\sum_h (1/f_h) \, n_{h,d}} \]
\[ \hat{V}(\hat{\bar{Y}}_d) = \frac{1}{\hat{N}_d^2} \sum_h \frac{N_h^2 (1 - f_h)}{n_h (n_h - 1)} \left[ \sum_i (y_{h,i,d} - \bar{y}_{h,d})^2 + n_{h,d} \left( 1 - \frac{n_{h,d}}{n_h} \right) (\bar{y}_{h,d} - \hat{\bar{Y}}_d)^2 \right] \]
where $h = 1, 2, \ldots, L$ and $i = 1, 2, \ldots, n_{h,d}$.
These expressions have been presented and discussed
by a number of authors: Durbin (1958, p.
117), Hartley (1959, p. 15), Yates (1960, p. 202),
Kish (1961, p. 383), and Cochran (1963, p. 149;
the factor $1/\hat{N}_d^2$ was evidently omitted in the
printing of this formula).
Three points concerning these results are
worthy of note in the context of the present
discussion. These are:
1. The estimate is actually a ratio estimate
(technically, a combined ratio estimate). It is therefore
almost always biased for small sample sizes,
and the variance formula is only approximately
correct.
2. The complexity of the formulas, as re-
gards derivation and computation, has been in-
creased considerably over that of ordinary strati-
fication.
3. The variance has a between-strata component
as well as a within-strata component and,
if $n_{h,d}$ is small as compared with $n_h$, this
between-strata component can contribute substantially
to the variance. Thus we see that changing
emphasis from the total population to a sub-
population has introduced added complexities in
theory and computations.
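The added computational burden can be made concrete with a short sketch of the domain estimate, written here as a combined ratio estimate with the approximate variance formula cited above. The function name `domain_mean_and_variance`, the triple layout of the input, and the numbers are hypothetical and serve only to illustrate the arithmetic.

```python
def domain_mean_and_variance(strata):
    """strata: list of (N_h, n_h, domain_values) triples.

    N_h            stratum population size
    n_h            full stratum sample size (fixed by the design)
    domain_values  y-values of sampled elements falling in the domain;
                   their count n_hd varies from sample to sample
    Returns (estimated domain mean, estimated variance).
    """
    # Combined ratio estimate: weighted domain total over weighted count.
    weighted_total = sum((N_h / n_h) * sum(v) for N_h, n_h, v in strata)
    N_d_hat = sum((N_h / n_h) * len(v) for N_h, n_h, v in strata)
    y_d = weighted_total / N_d_hat

    total = 0.0
    for N_h, n_h, values in strata:
        n_hd = len(values)
        if n_hd == 0:
            continue  # no domain elements sampled: both terms are zero
        f_h = n_h / N_h
        ybar_hd = sum(values) / n_hd
        within = sum((y - ybar_hd) ** 2 for y in values)
        between = n_hd * (1 - n_hd / n_h) * (ybar_hd - y_d) ** 2
        total += N_h ** 2 * (1 - f_h) / (n_h * (n_h - 1)) * (within + between)
    return y_d, total / N_d_hat ** 2


# Hypothetical example: 10 elements sampled per stratum, of which only
# 2 and 3 respectively fall in the domain of interest.
strata = [
    (100, 10, [130.0, 134.0]),
    (200, 10, [140.0, 144.0, 148.0]),
]
y_d, v_d = domain_mean_and_variance(strata)
```

When every sampled element belongs to the domain, the between-strata term vanishes (because 1 - n_hd/n_h is zero) and the expression reduces to the ordinary stratified variance estimate; when the domain is sparse, as in the example, the between-strata term contributes a large share of the total.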
As regards the use of more advanced sta-
tistical techniques in the critical analysis of
survey data, we shall simply refer to the diffi-
culties that have been encountered in obtaining
exact theory when ordinary regression techniques
are applied to random samples drawn from a
finite population. Cochran (1963, p. 193) summa-
rizes this very well:
The theory of linear regression plays a
prominent part in statistical methodology.
The standard results of this theory are not
entirely suitable for sample surveys because
they require the assumptions that the popula-
tion regression of y on x is linear, that the
residual variance of y about the regression
line is constant, and that the population is
infinite. If the first two assumptions are
violently wrong, a linear regression esti-
mate will probably not be used. However, in
surveys in which the regression of y on x
is thought to be approximately linear, it is
helpful to be able to use $\bar{y}_{lr}$ without having
to assume exact linearity or constant residual
variance.
Consequently we present an approach
that does not demand that the regression in
the population be linear. The results hold
only in large samples. They are analogous
to the large-sample theory for the ratio es-
timate.
Somewhat the same point is made by Hartley
(1959, p. 24) in his paper on analyses for do-
mains of study. He says:
. . . nevertheless we shall not employ regres-
sion estimators. The reason for this is not
that we consider regression theory inappro-
priate, but that this theory for finite popula-
tions requires considerable development be-
fore it can be applied in the present situation.
Some developments have arisen since Hartley's
paper and reference to these will be given in the
next section.
As a final complicating factor, we note that
certain techniques used in some sample survey
designs are such that their effects on the pre-
cision of estimates cannot be evaluated from a
sample, even in the case of an enumerative sur-
vey. We refer specifically to the technique of
controlled selection, described by Goodman and
Kish (1950), and to instances in which only one
first-stage sampling unit is selected from each
of a set of strata (Cochran, 1963, p. 141).
In order to provide a convenient illustration
of some of the foregoing points, the appendix
presents a brief description of the "complex"
sample design and estimation procedure employ-
ed in the Health Examination Survey, together
with a selection of examples that arose in the
more or less routine analysis of information
collected on blood pressure. The data given are
estimates of the percentage of individuals with
hypertension in various subclasses of the popu-
lation of adults, with the subclasses defined in
terms of such demographic variables as race,
sex, age, family income, education, occupation,
and industry of employment. These subclasses
cut across the strata used in the selection of
primary sampling units, and the variances of the
estimates are also affected by the various clus-
tering and estimation features of the design.
Most of the cited cases refer to the estimation
of variance for the percentage of hypertension
in a single subclass, although several examples
are given in which the percentages in two sub-
classes are compared. The variances were es-
timated by a replication technique that will be
introduced in "Balanced Half-Sample Replication,"
a technique that to some extent overcomes
the problems of analysis that have just been raised.
The results obtained through the application of
this technique will be used for illustration at
several points throughout this report.
GENERAL APPROACHES FOR
SOLVING PROBLEMS OF
CRITICAL ANALYSIS
Two Extreme Approaches
It is possible to identify two extreme views
that one may hold with respect to the problems
raised in the preceding section. First of all,
one might conceivably argue that analytical work
with survey data should be done only "by design."
That is, areas and methods of analysis should
be set forth in advance of taking the survey and
the sample should be selected so as to conform
as closely as possible to the requirements of the
stated methods. On the other extreme, one might
decide to throw up his hands in dismay, ignore
all the complicating factors of an already executed
survey design, and treat the observations as
though they had been obtained by random sampling,
presumably from some extremely ill-defined
superpopulation.
The first approach, that of "design for a-
nalysis," is certainly the most rational view that
one can adopt. No careful examination of the
literature was made to search out actual ex-
periences on this point, but it would appear un-
likely that one could find any examples of large-
scale, complex, and multipurpose surveys in
which this approach had been attempted. A pos-
sible exception might be the Census Enumerator
Variation Study of the 1950 U.S. Census, as de-
scribed by Hanson and Marks (1958), although
this study was based primarily on the complete
enumeration of designated areas rather than on
a sample of individuals selected in accordance
with a complex sample design. In other cases
individuals have been randomly selected from a
defined population of adults to provide observa-
tions for a complex "experimental design," as
in the Durbin and Stuart (1951) experiment on
response rates of experienced and inexperienced
interviewers, but again this differs considerably
from the type of problem raised in the preceding
section. Another example in which an experi-
mental design has been applied to survey data is
provided by Keyfitz (1953) and discussed by
Yates (1960, pp. 308-314). In this case, the
sample elements were obtained by cluster sam-
pling, but the author investigates the possible
effects of the clustering and concludes that it
can be ignored in the analysis of variance.
Some recent work by Sedransk (1964, 1964a,
1964b) bears directly on the problem of design
for analysis and it assumes that the primary goal
of an analytical survey is to compare the means
of different domains of study. If $\bar{y}_i$ and $\bar{y}_j$ are
the estimated means for the $i$th and $j$th domains,
Sedransk places constraints on the variance
of their difference, for all $i$ and $j$, and
searches for sample-size allocations that will
minimize simple cost functions. A variety of
different situations are considered. Random sam-
ples can be selected from each of the domains;
random samples can be selected from the over-
all population, but the number of elements falling
in each domain then becomes a random variable;
two-stage cluster samples can be selected from
each of the domains; and two-stage cluster
samples can be selected from the total population,
but again the number of elements falling in each
domain is a random variable. In the second and
fourth cases, the author considers double sampling
procedures and obtains approximate solutions to
guide one in choosing sample sizes for sampling
from the total population so as to satisfy the con-
straints which are phrased in terms of all pos-
sible pair-wise domain comparisons. Even if
one does not wish to impose constraints on
domain comparisons and on minimization of cost,
the cited papers contain of necessity many de-
velopments in theory that will be of assistance
in attacking the problems raised in the preceding
section. It should be observed that the com-
plexity of the designs considered is still far
from that of the Current Population Survey or
the Health Examination Survey.
Major difficulties in designing for analysis
are and will continue to be encountered when the
primary goal of a survey is to describe a large
and dispersed population with respect to many
variables, as the analytical purposes are some-
what ill defined at the design stage. Thus the
broad primary purposes of HES were to provide
statistics on the medically defined prevalence in
the total U.S. population of a variety of specific
diseases, using standardized diagnostic criteria;
and to secure distributions of the general popula-
tion with respect to certain physical and physio-
logical measurements. Nevertheless, analysis of
relationships among variables is also an impor-
tant product of the survey.
A similar set of circumstances arises with
respect to data on unemployment collected by the
Current Population Survey. Clearly the primary
goal is to describe the incidence of employment
and unemployment in the total U.S. population,
and yet the data obtained must also be used for
comparison and analysis. Faced with difficulties
of analysis, as described in the preceding sections,
one may wish to retreat to the opposite extreme
from design for analysis and view the observa-
tions as coming from a simple random sample.
Actually, this type of retreat would appear
to place the analyst in a difficult, if not un-
tenable, position. Cornfield and Tukey (1956)
speak of an inference from observations to
conclusions as being composed of two parts,
where the first part is a statistical bridge from
observations to an island (the island being the
studied population) and the second part is a
subject-matter span from the island to the far
shore (this being, in some vague sense, a pop-
ulation of populations obtained by changes in
time, space, or other dimensions). The first
bridge is the one that can be controlled by the
use of proper procedures of sampling and of
statistical inference. One may be willing to in-
troduce some uncertainty into the position of the
island, for example by ignoring finite population
corrections, in the hope of placing it nearer the
far shore than would otherwise be the case. How-
ever, there seem to be no grounds for suggesting
statistical procedures that may, unbeknownst to
their user, succeed only in moving the island a
short distance from the near shore.
The Health Examination Survey was carried
out on a sample chosen to be broadly "representative"
of the total U.S. population. Among
other characteristics of design, the sample was
of necessity a highly clustered one. As is well
known, a highly clustered sample leads to es-
timates that have much larger standard errors
than would be predicted on the basis of simple
random sampling theory if the elements within
clusters tend to be homogeneous with respect
to the variables of interest. Since geographic
clustering leads to homogeneity on such char-
acteristics as racial background, socioeconomic
status, food habits, availability and use of medi-
cal care, and the like, it can therefore be ex-
pected that there will also be homogeneity with
respect to many of the variables of interest in
the Health Examination Survey. A portion of
this loss of precision is undoubtedly recovered
by stratification and poststratification, but there
is no guarantee that the two effects will balance
one another. Hence the ignoring of sample design
features might well lead to gross errors in
determining the magnitude of standard errors
of survey estimates. In effect, the situation might
be viewed as one in which inferences are being
made to some ill-defined population of adults,
rather than to the population from which the sam-
ple was so carefully chosen. These points have
been emphasized by Kish (1957, 1959) as they
relate to social surveys in general.
The effects of these sample design and
estimation features on the variances of estimates
for the Health Examination Survey are
illustrated by the data presented in the appendix. For
each of 30 designated subclasses, the variance
of the estimated percentage of adults in a sub-
class with hypertension was estimated by two
methods: (1) the replication technique to be
introduced in "Balanced Half-Sample Replication"
was employed, thus accounting for most of the
survey features, and (2) the observations falling
in a subclass were treated as if they had arisen
from simple random sampling. In the second
case, variances were computed as pq/n, where
p was the observed fraction of hypertensive
individuals among the n sample individuals falling
in a particular subclass and q = 1 - p. The ratios of the first
of these variances to the second, for the 30
comparisons, ranged from .45 to 2.87, with an
average value of 1.31; the ratios of the standard
errors ranged from .67 to 1.69, with an average
value of 1.12. These ratios probably overes-
timate slightly the true ratios, since the rep-
lication technique uses the method of collapsed
strata and in this instance does not account for
the effects of controlled selection. Furthermore,
they are subject to sampling variability.
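The comparison just described can be sketched numerically. The following Python fragment is an illustration only: the function names and the figures for the subclass are hypothetical and are not taken from the appendix. It computes the simple random sampling variance pq/n and the ratio of a replication-based variance estimate to it.

```python
import math

def srs_variance(p, n):
    """Variance of a sample proportion under simple random sampling: p*q/n."""
    return p * (1.0 - p) / n

def variance_ratio(replication_variance, p, n):
    """Ratio of a design-based variance estimate to the SRS value pq/n."""
    return replication_variance / srs_variance(p, n)

# Hypothetical subclass: 15 percent hypertensive among 400 sample persons,
# with a made-up replication variance estimate.
p, n = 0.15, 400
v_rep = 0.0004
ratio = variance_ratio(v_rep, p, n)
se_ratio = math.sqrt(ratio)  # ratio of standard errors for this one subclass
print(round(ratio, 3), round(se_ratio, 3))
```

For a single subclass the ratio of standard errors is the square root of the ratio of variances. The averages of the two kinds of ratio over many subclasses need not stand in that relation, which is consistent with the averages 1.31 and 1.12 quoted above.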
Also included in the appendix are three
examples which refer to the estimated difference
between the percentages of hypertensive individ-
uals in each of two subclasses. In this instance,
the average ratio of variances is 1.51 while the
average ratio of standard errors is 1.23. These
comparisons are not as "clean" as the ones for
single subclasses since the random sampling
variances were computed on the assumption of
independence of the two estimates, and this is
not necessarily the case. This set of data, limited
though it may be, tends to confirm the general
experience that estimates made from stratified
cluster samples will tend to have larger sampling
variances than would be the case for simple ran-
dom samples of the same size, although the
differences are not so pronounced as in situations
in which the intraclass correlation is stronger
than it is for this statistic.
Obtaining “Exact” Solutions
If one wishes to consider that the principal
goal of analytical surveys is either the estimation
or the direct comparison of the means of various
domains of study, then there already exists in
the literature a number of results that can assist
in achieving this goal. This "exact" theory can
be generally characterized as follows.
1. Ratio estimates of population means are
employed, primarily because sample size is a
random variable as a result of sampling clusters
with unequal and unknown sizes. This of course
introduces the possibility of bias for these es-
timates, although empirical research—e.g., Kish,
Namboodiri, and Pillai (1962)—indicates that the
amount of bias is apt to be negligible.
2. Expressions for the variance of a single
estimate and the covariances of two or more
estimates are obtained from the Taylor series
approximation, and variance estimates are con-
structed by direct substitution into these ex-
pressions. Hence variance estimates are subject
to possible bias.
3. In multistage sampling, it is either as-
sumed that the first-stage units are drawn with
replacement or that the first-stage sampling
fractions are very small. This means that vari-
ance estimates can be obtained without explicitly
treating within-first-stage unit sampling variabil-
ity.
4. The most powerful tool for deriving results
for domain-of-study estimates has been that
of the "pseudovariable" and the "count variable."
That is,

$$y'_{hij} = \begin{cases} y_{hij}, & \text{if the } j\text{th element in the } i\text{th first-stage unit of the } h\text{th stratum belongs to the domain of interest} \\ 0, & \text{otherwise} \end{cases}$$

$$u_{hij} = \begin{cases} 1, & \text{if the } j\text{th element in the } i\text{th first-stage unit of the } h\text{th stratum belongs to the domain of interest} \\ 0, & \text{otherwise} \end{cases}$$

Using this approach, which is related to Corn-
field's (1944) earlier work, it is possible to
specialize ordinary results to domain-of-study
results.
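The pseudovariable and count-variable device in item 4 can be sketched as follows. This is a minimal illustration, not the procedure of any cited author: the record layout, the sampling weights, and the function name `domain_mean` are assumptions introduced here. The domain mean is estimated as the ratio of the weighted total of the pseudovariable to the weighted total of the count variable.

```python
def domain_mean(records):
    """Ratio estimate of a domain mean from (weight, y, in_domain) records.

    y_prime equals y for elements in the domain and 0 otherwise (pseudovariable);
    u equals 1 for elements in the domain and 0 otherwise (count variable).
    """
    total_y_prime = 0.0
    total_u = 0.0
    for weight, y, in_domain in records:
        y_prime = y if in_domain else 0.0
        u = 1.0 if in_domain else 0.0
        total_y_prime += weight * y_prime
        total_u += weight * u
    return total_y_prime / total_u

# Hypothetical sample: (sampling weight, observed value, domain membership).
sample = [
    (10.0, 120.0, True),
    (10.0, 135.0, False),
    (20.0, 150.0, True),
    (20.0, 110.0, False),
]
print(domain_mean(sample))
```

Because both totals are ordinary whole-sample totals, any variance formula available for a ratio of two totals applies to the domain estimate without special treatment.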
A very brief summary of some of the litera-
ture on these aspects of analysis is as follows,
where no attempt has been made to assign
priorities to the various authors. Results that are
directly phrased in terms of domains of study
are given by Cochran (1963), Durbin (1958),
Hartley (1959), Kish (1961, 1965), and Yates
(1960). Related work on the estimation of the
variance of a variety of functions of ratio esti-
mators is presented by Jones (1956), Keyfitz
(1957), Kish (1962), and Kish and Hess (1959).
Aoyama (1955), Garza (1961), and Okamoto (1963)
discuss chi-square contingency-table analyses
in the presence of stratification. McCarthy (1965)
considers the problem of determining distribu-
tion-free confidence intervals for a population
median on the basis of a stratified sample.
Finally, we observe that, in the case of "small"
samples, not even the ordinary normality as-
sumptions are able to do away with difficulties,
even without domain-of-study complications. Thus
unequal strata variances lead to difficulties in
obtaining tests and confidence intervals for a
population mean, although some approximate
solutions are available—e.g., Aspin (1949), Meier
(1953), Satterthwaite (1946), and Welch (1947).
If one has essentially unrecognized stratification—
that is, normal variables with common variance
but differing means—then it is necessary to
work with noncentral $t$, $\chi^2$, and $F$ distributions as
described by Weibull (1953).
Replication Methods of
Estimating Variances
As a result of the indicated theoretical and
practical difficulties associated with the esti-
mation of variances from complex sample sur-
veys, interest has long been evidenced in de-
veloping shortcut methods for obtaining these
estimates. For example, we noted earlier that
Keyfitz (1957) and Kish and Hess (1959) have
emphasized the computational simplicity that
can result when primary sampling units are drawn
with replacement from each of a number of strata,
and when one can work with variate values as-
sociated with the primary sampling units. There
are, however, other approaches that have been
suggested and applied to accomplish these same
ends. We refer in particular to methods that
have variously been referred to as interpene-
trating samples, duplicated samples, replicated
samples, or random groups. In the succeeding
discussion, the term "replicated sampling" will
be used to cover all of these possibilities. Ref-
erences will be made to the pertinent literature
but no attempt will be made to assign priorities
or to be exhaustive. Deming has been a consistent
and firm advocate of replicated sampling. He
first wrote of it as the Tukey plan (1950); his
recent book (1960) presents descriptions of the
applications of replicated sampling to many dif-
ferent situations, and contains a wide variety
of ingenious devices that he has developed for
solving particular problems.
In simplest form, replicated sampling is
as follows. Suppose one obtains a simple random
sample of n observations (drawn with replacement
from a finite population or drawn independently
from an infinite population) and that the
associated values of the variable of interest are
$y_1, y_2, \ldots, y_n$. Then, if $\bar{y}$ denotes the sample
mean and $\bar{Y}$ the population mean, $E(\bar{y}) = \bar{Y}$ and

$$v_1(\bar{y}) = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n(n-1)}$$

provides an unbiased estimate of $V(\bar{y})$.
Suppose now that the n observations are randomly
divided into t mutually exclusive and exhaustive
groups, each containing (n/t) elements, and that
the means of these groups are denoted by
$\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_t$. It is clear that

$$\bar{y} = \sum_{j=1}^{t} \bar{y}_j / t$$

and that the variance of $\bar{y}$ can be estimated by

$$v_2(\bar{y}) = \frac{\sum_{j=1}^{t} (\bar{y}_j - \bar{y})^2}{t(t-1)}$$

In this simple case, the advantages gained by
using $v_2(\bar{y})$ rather than $v_1(\bar{y})$ lie in the fact that
one has to compute the sum of t squared
deviations instead of the sum of n squared
deviations. If t is considerably smaller than n and
if such computations must be carried out for
many variables, the savings in computational
time may be substantial. Also, the kurtosis of
the distribution of the $\bar{y}_j$ is less than that of
the $y_i$, possibly offsetting some of the effect of
having fewer degrees of freedom to estimate $V(\bar{y})$.
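The two estimates can be compared in a short Python sketch. The function names `v1` and `v2`, the seed, and the simulated normal data are assumptions made for illustration; `v1` uses all n observations and `v2` uses only the t group means after a random division into equal groups.

```python
import random

def v1(y):
    """Variance estimate of the mean from all n observations."""
    n = len(y)
    ybar = sum(y) / n
    return sum((yi - ybar) ** 2 for yi in y) / (n * (n - 1))

def v2(y, t, rng):
    """Random-group variance estimate of the mean from t group means."""
    n = len(y)
    if n % t != 0:
        raise ValueError("n must be a multiple of t")
    shuffled = list(y)
    rng.shuffle(shuffled)          # random division into groups
    size = n // t
    group_means = [sum(shuffled[j * size:(j + 1) * size]) / size for j in range(t)]
    ybar = sum(group_means) / t    # equals the overall mean for equal-sized groups
    return sum((m - ybar) ** 2 for m in group_means) / (t * (t - 1))

rng = random.Random(1966)
y = [rng.gauss(50.0, 10.0) for _ in range(1200)]
print(v1(y), v2(y, 60, rng))
```

With t equal to n every group holds one observation and the two estimates coincide. With t much smaller than n the second estimate needs far fewer squared deviations but varies more from one random division to another.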
There is, of course, a loss of information
associated with the subsample approach for
estimating variances since $v_2(\bar{y})$ is subject to
greater sampling variability than is $v_1(\bar{y})$. A
variety of ways have been suggested for measuring
this loss of information. Hansen, Hurwitz,
and Madow (1953, vol. I, pp. 438-449), who des-
ignate this the random group method of esti-
mating variances, make the comparison in terms
of the relative-variance of the variance estimate.
For example, they show by way of illustration
that the relative-variance of a variance estimate
based on 1,200 observations drawn from a nor-
mal distribution is 4.1 percent while the relative-
variance based on a sample of 60 random groups
of 20 observations each is 18.3 percent. Actually,
this approach places emphasis on the variance
estimate itself rather than on the fact that one
usually wants to use the variance estimate in
setting confidence limits for a population mean
or in testing hypotheses about a population
mean. Under these circumstances, Fisher (1942,
sec. 74) suggests a measure for the amount of
information that a sample mean provides re-
specting a population mean. Since his approach
has been questioned by numerous authors—e.g.,
Bartlett (1936)—we shall simply adopt the ex-
pedient of taking the ratio of, say, the 97.5
percentiles of Student's t distribution for 1,200
and 60 degrees of freedom, which is .981, and
interpreting this as a measure of the relative
width of the desired confidence intervals. This
is evidently the approach used by Lahiri (1954,
p. 307).
The foregoing is, of course, the familiar
argument that a sample of roughly 30 or more
is a "large" sample when dealing with normal
populations, since the two-sided 5-percent point
of Student's t for 30 degrees of freedom is 2.042
and that of the normal distribution is
1.960. Ninety-five percent confidence intervals
will, on the average, be only about four percent
wider when $\sigma^2$ is estimated with 30 degrees of
freedom than when $\sigma^2$ is known. A slightly
different measure has been proposed by Walsh
(1949), formulated in terms of the power of a
t-test.
If one wishes to use replicated sampling in
conjunction with drawing without replacement
from a finite population, then two different pos-
sibilities arise. One can first draw without re-
placement a sample of (n/t) elements, then re-
place these elements in the population and draw
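a second sample of (n/t) elements, and continue
this process until t samples have been selected.
Denoting the sample means by $\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_t$,
we have

$$\bar{y} = \sum_{j=1}^{t} \bar{y}_j / t$$

$$v(\bar{y}) = \frac{\sum_{j=1}^{t} (\bar{y}_j - \bar{y})^2}{t(t-1)}$$

and

$$E[v(\bar{y})] = V(\bar{y}) = \frac{N - (n/t)}{N} \, \frac{S^2}{n}$$
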
This type of replication makes the successive
samples independent of one another, but it does
permit the possible duplication of elements in
successive samples and hence lowers the precision
of $\bar{y}$ as compared with the original drawing
of a sample of n elements without replacement.
However, there may be a saving in the cost
of measuring the duplicated items. Hence a
slightly larger sample could be drawn for the
same total cost, recapturing some of the loss of
precision.
Finally, one can draw a sample of (n/t)
elements without replacement, a second sample
without replacing the first sample, and so on.
This is, of course, equivalent to drawing a sam-
ple of n elements without replacement and then
randomly dividing the sample into t groups. It
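follows that

$$\bar{y} = \sum_{j=1}^{t} \bar{y}_j / t$$

$$v(\bar{y}) = \frac{N - n}{N} \, \frac{\sum_{j=1}^{t} (\bar{y}_j - \bar{y})^2}{t(t-1)}$$

and

$$E[v(\bar{y})] = V(\bar{y}) = \frac{N - n}{N} \, \frac{S^2}{n}$$

This latter variance is smaller than the preceding
one because of the difference in finite population
corrections. Note, however, that the independence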
achieved by the first method of drawing makes
it easy to apply nonparametric methods (e.g.,
the use of order statistics) for estimation and
hypothesis testing. These points have been dis-
cussed, in a more general framework, by Koop
(1960) and by Lahiri (1954).
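The difference between the two finite population corrections can be made concrete with a small Python sketch. The function names and the numerical values of N, n, t, and the population variance are hypothetical. The first function gives the variance of the mean when t samples of n/t elements are each drawn without replacement and returned before the next draw; the second gives the variance for a single sample of n drawn without replacement and then divided into t groups.

```python
def var_replaced_groups(N, n, t, S2):
    """V(ybar) for t independent samples of n/t, each drawn without replacement."""
    return (N - n / t) / N * S2 / n

def var_single_sample(N, n, S2):
    """V(ybar) for one sample of n drawn without replacement."""
    return (N - n) / N * S2 / n

# Hypothetical population of 1,000 elements with variance 25,
# total sample of 100 taken as 10 replicates of 10.
N, n, t, S2 = 1000, 100, 10, 25.0
first = var_replaced_groups(N, n, t, S2)
second = var_single_sample(N, n, S2)
print(first, second)
```

When t equals 1 the two expressions agree, and the gap between them widens as the overall sampling fraction n/N grows.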
Although replicated sampling for simple
random sampling, as described in the preceding
paragraphs, does provide the possibility of a-
chieving some gains in terms of computational
effort, the principal advantages of replication
arise from other facets of the variance estimation
problem. Some of these facets may be identified
as follows.
1. There are instances of sample designs
in which no estimate of sampling precision can
be obtained from a single sample unless certain
assumptions are made concerning the population.
Systematic sampling is a case in point. (See
Cochran, 1963, pp. 225,226.) If the total sample is
obtained as the combination of a number of rep-
licated systematic samples, then one can obtain
a valid estimate of sampling precision. This
approach was suggested by Madow and Madow
(1944, pp. 8,9) and has been discussed at greater
length and with a number of variations by Jones
(1956). In some instances, estimates made from
replicated systematic samples may be less ef-
ficient than from a single systematic sample, and
then one must choose between loss of efficiency
and ease of variance estimation, as discussed by
Gautschi (1957).
2. As is well known, the ordinary Taylor
series approximation for obtaining the variance
and the estimated variance of a ratio estimate,
even for simple random sampling, provides a
possibly biased estimate of sampling precision.
As an alternative, one can consider drawing a
number of independent random samples, com-
puting a ratio estimate for each sample, and
then averaging these ratio estimates for the final
estimate. A valid estimate of sampling precision
can then be obtained from the replicated values
of the estimate. It is true, however, that the
bias of the estimator itself is undoubtedly larger
for the average of the separate estimates than
it is for a ratio estimate computed for the com-
plete sample since this bias is ordinarily a de-
creasing function of sample size. Thus gains
may be achieved in one respect, while losses
may be increased in the other. As far as the
author knows, no completed research is available
to guide one in making a choice between these two
specific alternatives. This problem is, however,
related to some suggestions and work by Mickey,
Quenouille, Tukey, and others, and their results
will be discussed in some detail in the following
section.
3. After an estimate and an estimated var-
iance have been obtained, confidence intervals
are ordinarily set by appealing to large sample
normality and to the approximate validity of
Student's t distribution. Replication can some-
times assist in providing "better" solutions.
For example, consider a stratified population in
which the variable of interest has a normal dis-
tribution within each stratum, but where the
variance is different for the separate strata.
Difficulty is then encountered in applying the chi-
square distribution to the ordinary estimate of
variance, as discussed by Satterthwaite (1946),
Welch (1947), Aspin (1949), and Meier (1953).
However, the mean of a replicate will be nor-
mally distributed, being a linear combination of
normally distributed variables, and the chi-
square distribution can be applied directly to a
variance estimated from the means of a number
of independent replicates. This aspect of the
problem has been discussed at some length by
Lahiri (1954, p. 309).
4. If one is using a highly complex sample
design and estimation procedure, and if independ-
ent replicates can be obtained, then replicated
sampling permits one to bypass the extremely
complicated variance estimation formulas and
the attendant heavy programming burdens. Vari-
ance estimates based upon the replicated esti-
mates will mirror the effects of all aspects of
sampling and estimation that are permitted to
vary randomly from replicate to replicate. This,
of course, includes the troublesome domain-of-
study type of problem.
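The alternative described in item 2 above, a ratio estimate computed separately for each independent replicate and then averaged, can be sketched in Python. The function names and the replicate totals are hypothetical. The sketch also computes the ratio from the pooled totals so that the two estimators can be compared on the same data.

```python
def replicate_ratio_estimate(replicates):
    """Average of per-replicate ratios and a variance estimate from their spread.

    `replicates` is a list of (y_total, x_total) pairs, one per independent replicate.
    """
    t = len(replicates)
    ratios = [y / x for y, x in replicates]
    r_bar = sum(ratios) / t
    variance = sum((r - r_bar) ** 2 for r in ratios) / (t * (t - 1))
    return r_bar, variance

def combined_ratio(replicates):
    """Ratio computed once from the pooled totals of all replicates."""
    return sum(y for y, _ in replicates) / sum(x for _, x in replicates)

# Hypothetical totals from four independent replicates.
reps = [(52.0, 40.0), (47.0, 38.0), (60.0, 45.0), (55.0, 44.0)]
r_bar, v_r_bar = replicate_ratio_estimate(reps)
print(r_bar, v_r_bar, combined_ratio(reps))
```

The variance estimate refers to the average of the replicate ratios, not to the pooled ratio, which is the distinction taken up in the paragraph that follows.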
One major disadvantage of replicated sam-
pling has been mentioned in the preceding para-
graphs, namely that the variance estimate re-
fers to the average of replicate estimates rather
than to an estimate prepared for the entire sam-
ple. If the estimates are linear in the individual
observations, the two will be the same. They will
not be the same, however, for the frequently em-
ployed ratio estimator and the other nonlinear
estimators, and the average of the replicate es-
timators may possibly be subject to greater bias
than is the case for the overall sample estimate.
Another major disadvantage arises from the
difficulty of obtaining a sufficient number of
replicates to provide adequate stability for the
estimated variance. Thus the commonly used
design of two primary sampling units per stratum
(frequently obtained by collapsing strata from each
of which only a single unit has been drawn) gives
only two independent replicates, and the resulting
confidence intervals for an estimate are much
wider than they should be or need to be. Some
suggestions have been made for attacking this
problem, and they will be discussed in the follow-
ing section.
Another, but subsidiary, problem arises
with replication if one wishes to estimate com-
ponents of variance—that is, to determine what
fraction of the total variance of an estimate
arises from the sampling of primary sampling
units, what fraction arises from sampling within
primary sampling units, and the like. This prob-
lem does not appear to have been discussed at
any great length in the sampling literature and
will not be considered here since it bears more
directly on design than on analysis. Some of
Sedransk's work (1964, 1964a, and 1964b) does
relate to the problem, and McCarthy (1961) has
discussed the matter in connection with sampling
for the construction of price indexes.
PSEUDOREPLICATION
Half-Sample Replication Estimates of
Variance From Stratified Samples
If a set of primary sampling units is strati-
fied to a point where the sample design calls for
the selection of two primary sampling units per
stratum, there are only two independent repli-
cates available for the estimation of sampling pre-
cision. Confidence intervals for the correspond-
ing population parameter will then be much wider
than they need to be. To overcome this difficulty,
at least partially, the U.S. Bureau of the Census
originated a pseudoreplication scheme called
half-sample replication. The scheme has been adapted
and modified by the NCHS staff and has been used
in the HES reliability measurements. A brief de-
scription of this approach is given in a report of
the U.S. Bureau of the Census (Technical Paper
No. 7, p. 57), and a reference to the Census Bu-
reau method of half-sample replication was made
by Kish (1957, p. 164). We shall first present a
technical description of half-sample replication
as used by NCHS in the Health Examination Survey.
The theoretical development duplicates, in part,
work by Gurney (1962). We shall then suggest
several ways in which the method can be
modified to increase the precision of variance
estimates.
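Consider a stratified sampling procedure
where two independent selections are made
in each stratum. Let the population and
sample characteristics be denoted as
follows:

Stratum   Weight   Population   Population   Sample               Sample
                   mean         variance     observations         mean
1         $W_1$    $\bar{Y}_1$  $S_1^2$      $y_{11}, y_{12}$     $\bar{y}_1$
2         $W_2$    $\bar{Y}_2$  $S_2^2$      $y_{21}, y_{22}$     $\bar{y}_2$
.         .        .            .            .                    .
h         $W_h$    $\bar{Y}_h$  $S_h^2$      $y_{h1}, y_{h2}$     $\bar{y}_h$
.         .        .            .            .                    .
L         $W_L$    $\bar{Y}_L$  $S_L^2$      $y_{L1}, y_{L2}$     $\bar{y}_L$

An unbiased estimate of the population mean
$\bar{Y}$ is $\bar{y}_{st} = \sum_{h=1}^{L} W_h \bar{y}_h$, and the ordinary sample
estimate of $V(\bar{y}_{st})$ is

$$v(\bar{y}_{st}) = (1/2) \sum_{h=1}^{L} W_h^2 s_h^2 = (1/4) \sum_{h=1}^{L} W_h^2 d_h^2$$

where $d_h = (y_{h1} - y_{h2})$.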
Under these circumstances, a half-sample
replicate is obtained by choosing one of $y_{11}$ and
$y_{12}$, one of $y_{21}$ and $y_{22}$, ..., and one of $y_{L1}$
and $y_{L2}$. The half-sample estimate of the
population mean is

$$\bar{y}_{hs} = \sum_{h=1}^{L} W_h y_{hi}$$

where i is either one or two for each h. There
are $2^L$ possible half samples, and it is easy to
see that the average of all half-sample estimates
is equal to $\bar{y}_{st}$. That is, for a randomly selected
half sample

$$E(\bar{y}_{hs} \mid y_{11}, y_{12}, \ldots, y_{L1}, y_{L2}) = \bar{y}_{st}$$

If one considers the deviation of the mean
determined by a particular half sample, for
example $\bar{y}_{hs,1} = \sum W_h y_{h1}$, from the overall sample
mean, the result is obtained that

$$(\bar{y}_{hs,1} - \bar{y}_{st}) = \sum W_h y_{h1} - (1/2) \sum W_h (y_{h1} + y_{h2}) = (1/2) \sum W_h (y_{h1} - y_{h2}) = (1/2) \sum W_h d_h$$

In general, these deviations are of the form

$$(\bar{y}_{hs} - \bar{y}_{st}) = (1/2)(\pm W_1 d_1 \pm W_2 d_2 \pm \cdots \pm W_L d_L)$$

where the deviation for a particular half sample
is determined by making an appropriate choice of
a plus or minus sign for each stratum. In the
example given above, each sign was taken as plus.
The squared deviation from the overall sample
mean is therefore of the general form

$$(\bar{y}_{hs} - \bar{y}_{st})^2 = (1/4) \sum_{h} W_h^2 d_h^2 + (1/2) \sum_{h<k} \pm W_h W_k d_h d_k \qquad (4.1)$$

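The behavior of the half-sample deviations can be checked by brute force. In the Python sketch below, the weights and the paired observations are hypothetical and the function names are introduced only for this illustration. All $2^L$ half samples are enumerated. Their average reproduces the stratified mean, and, because each cross-product term in the squared deviation carries a plus sign in half of the half samples and a minus sign in the other half, their average squared deviation reproduces $(1/4) \sum W_h^2 d_h^2$.

```python
from itertools import product

def stratified_mean(weights, pairs):
    """Stratified estimate of the mean with two observations per stratum."""
    return sum(w * (a + b) / 2.0 for w, (a, b) in zip(weights, pairs))

def half_sample_means(weights, pairs):
    """Estimates from all 2**L half samples (one observation per stratum)."""
    return [
        sum(w * pair[i] for w, pair, i in zip(weights, pairs, choice))
        for choice in product((0, 1), repeat=len(pairs))
    ]

weights = [0.5, 0.3, 0.2]                          # hypothetical W_h, summing to 1
pairs = [(10.0, 14.0), (20.0, 17.0), (5.0, 9.0)]   # hypothetical (y_h1, y_h2)

y_st = stratified_mean(weights, pairs)
means = half_sample_means(weights, pairs)
avg_mean = sum(means) / len(means)
avg_sq_dev = sum((m - y_st) ** 2 for m in means) / len(means)
v_st = sum(w ** 2 * (a - b) ** 2 for w, (a, b) in zip(weights, pairs)) / 4.0
print(y_st, avg_mean, avg_sq_dev, v_st)
```

Any single half sample gives a squared deviation that differs from the ordinary variance estimate by the cross-product terms; it is only the average over a suitable set of half samples that removes them.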
$$\bar{q}^* = \sum_{h=1}^{L} \sum_{i=1}^{2} q^*_{(hi)} / 2L$$

and its variance by

$$\sum_{h=1}^{L} \sum_{i=1}^{2} (q^*_{(hi)} - \bar{q}^*)^2 / 2$$

In the simple linear case that is being
considered here, it is easy to show that an analysis
in terms of the $q^*_{(hi)}$'s produces results that are
identical with those obtained by the standard
analysis. For example,
9p = Tot * Wid, /2, qf) = 34 — Wd,/2
9g = Fx - wd /2, Uy = Fu + Wd /2
Us) = Ya w,d,/2, Us = yg — Wd;/2
Go = To = Wd/2. af) = Fu + Wdi/2
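$$\bar{q} = \bar{y}_{st}, \qquad \bar{q}^* = \bar{y}_{st}$$

Furthermore

$$\sum_{h} \sum_{i} (q^*_{(hi)} - \bar{q}^*)^2 / 2 = (W_1^2 d_1^2 + W_2^2 d_2^2 + W_3^2 d_3^2) / 4$$

which is the ordinary estimate of the variance of
$\bar{y}_{st}$, just as in the case of the balanced
half-sample replicates.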
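The identity can be verified numerically for the linear case. The Python sketch below rests on an assumption that should be read as such: each pseudo-value is taken to be $2\bar{y}_{st} - q_{(hi)}$, where $q_{(hi)}$ is the stratified estimate recomputed with observation i of stratum h omitted. The weights, the observations, and the function name are hypothetical.

```python
def jackknife_linear(weights, pairs):
    """Pseudo-values and variance estimate for the stratified mean, n_h = 2."""
    y_st = sum(w * (a + b) / 2.0 for w, (a, b) in zip(weights, pairs))
    pseudo = []
    for w, pair in zip(weights, pairs):
        for i in (0, 1):
            kept = pair[1 - i]  # observation remaining in this stratum
            # Estimate with observation i omitted: the stratum mean is
            # replaced by the single remaining observation.
            q_hi = y_st - w * (pair[0] + pair[1]) / 2.0 + w * kept
            pseudo.append(2.0 * y_st - q_hi)  # assumed pseudo-value form
    q_bar = sum(pseudo) / len(pseudo)
    variance = sum((q - q_bar) ** 2 for q in pseudo) / 2.0
    return y_st, q_bar, variance

weights = [0.5, 0.3, 0.2]                          # hypothetical W_h
pairs = [(10.0, 14.0), (20.0, 17.0), (5.0, 9.0)]   # hypothetical (y_h1, y_h2)
y_st, q_bar, v_jack = jackknife_linear(weights, pairs)
v_ordinary = sum(w ** 2 * (a - b) ** 2 for w, (a, b) in zip(weights, pairs)) / 4.0
print(y_st, q_bar, v_jack, v_ordinary)
```

Under that assumption the mean of the pseudo-values equals the stratified mean and the variance expression equals the ordinary estimate, as the text states for the linear case.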
There is another variant of the Quenouille
type of estimate which is closely related to the
half-sample approach. Suppose that a particular
half sample is chosen. Denote its elements by
$y_{1i}, y_{2i}, \ldots, y_{Li}$ and its mean by $\bar{y}_{hs}$. The
set of remaining elements, one in each stratum,
then constitutes an independent half sample whose
mean we denote by $\bar{y}'_{hs}$. A Quenouille-type
estimate is then defined by

$$2\bar{y}_{st} - (\bar{y}_{hs} + \bar{y}'_{hs})/2$$

which, in the simple linear case that is being
considered here, is identically equal to $\bar{y}_{st}$. This
approach does not provide an estimate of vari-
ance in the present instance since only one esti-
mate is obtained. In more complicated situations,
however, different half samples will provide dif-
ferent values of the Quenouille half-sample esti-
mate, and it might be possible to base estimates
of variance on these different values. This pos-
sibility will be discussed briefly in the following
section.
Half-Sample Replication and the
Jackknife Method With Stratified
Ratio-Type Estimators
We have introduced half-sample replication
and the Jackknife method in the setting of a simple
linear situation, where they obviously have no real
utility. Under these circumstances, they merely
reproduce results that can be obtained by direct
analysis. If, however, more complicated methods
of sampling and estimation are employed, then
direct methods of analysis may not be available,
may require a prohibitive amount of computation
in comparison with the methods being considered
here, or may even give results that are in one
way or another inferior to those provided by half-
sample replication and the Jackknife.
Although one may accept on intuitive grounds
the general premise that half-sample replication
and the Jackknife do permit the "easy" computation
of variance estimates that in one way or
another mirror most of the standard complexities
of sample design and estimation, the exact
characteristics of the resulting estimates and their
corresponding estimates of variance are, for the
most part, unknown. This is particularly true for
half-sample replication, even though the intuitive
appeal of this method may be more direct than
that of the Jackknife. No published or unpublished
references to the behavior of half-sample rep-
lication in complex situations were discovered,
and the notion of balanced half samples was
introduced in this report for the first time. On
the other hand, there is a growing body of
literature and data relating to the Jackknife. We
shall now summarize this material on the Jack-
knife and then report the results of a very small
experiment which compares results obtained by
balanced half-sample replication and by the Jack-
knife.
Although Quenouille (1956) introduced his
method of adjustment as a means for reducing
the bias of an estimator, our interest in the
Jackknife is primarily focused on its utility for
variance estimation. One is naturally interested in
obtaining any reductions in bias that are possible,
but there is a considerable body of empirical
evidence—notably in the work of Kish, Nam-
boodiri, and Pillai (1962)— which indicates that the
"combined ratio estimator" for population means,
subpopulation means, and differences of sub-
population means probably has negligible bias in
most practical surveys. On p. 863, Kish et al.
say "Our empirical investigations, set in a
theoretical framework, show that the bias in most
practical surveys is usually negligible; the ratio of
bias to standard error (B/s) was small in every
test, even those based on small sub-classes."
There is actually very little published ma-
terial which has a direct bearing on our present
concern. The pertinent items are briefly sum-
marized.
1. Quenouille (1956) has shown by formal
analysis that the variance of his esti-
mator, where such an estimator is ap-
propriate, differs from the variance of
the unadjusted estimator by terms of
order $1/n^2$.
2. Durbin (1959) applied the method of Quenouille
to the ratio estimator $r = y/x$,
where a random sample was divided into
two groups of equal size. Thus he considered
the estimator, of $E(y)/E(x)$,

$$t = 2r - (r_1 + r_2)/2$$

where $r = y/x$, $r_1 = y_1/x_1$, $r_2 = y_2/x_2$,
y and x are sample totals, $y_1$ and $x_1$
are half-sample totals, and $y_2$ and $x_2$ are
the other half-sample totals. Durbin considers
two cases: (1) x is a normal variable
with variance $O(n^{-1})$, and the regression
of y on x is linear, not necessarily
through the origin; (2) x is a gamma
variable with mean m and the regression
of y on x is linear. For the first case,
when terms of $O(n^{-4})$ are ignored, the
result is obtained that the variance of t
is smaller than the variance of r. For
the second case, it was not necessary to
use an approximate form of analysis.
Durbin concludes that ". . . whenever the
coefficient of variation of x is less than
1/4, which will be satisfied by all except
the most inaccurate estimators, Quenouille's
estimator has a smaller mean
square error than the ordinary ratio
estimator. This is an exact result for any
sample size."
3. Brillinger (1964) studies the properties
of these estimators in relation to maximum
likelihood estimators. His conclusions
are: "Summing up the results of the
paper, one may say that Tukey's general
technique of setting approximate confidence
limits is asymptotically correct,
under regularity conditions, when applied
to maximum likelihood estimates and that
the technique provides a useful method of
estimating the variance of an estimate.
Also one may say that the estimate proposed
by Quenouille will on many occasions
have reduced variance, smaller
mean-squared error and closer to asymptotic
normality properties, when compared
to the usual maximum likelihood
estimates."
4. Robson and Whitlock (1964) apply Quenouille's
method of construction to obtain
estimates of a truncation point of a distribution.
One of the interesting features
of their work relates to the construction
of estimators that successively eliminate
bias terms of order $n^{-2}$, $n^{-3}$, etc. They
find for their particular problem that the
variance of these estimators increases
as the bias is decreased.
5. Miller (1964) is concerned with conditions
under which a Jackknife estimator, and its
associated estimated variance, will asymptotically
have a Student's t distribution.
Both of the situations described by
Miller are ones in which the unjackknifed
estimator had a proper finite or limiting
distribution under weaker conditions than
required for the Jackknife.
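Durbin's half-sample form of the Quenouille estimator, described in item 2, can be written out in a few lines of Python. The function name and the data are hypothetical, and the sketch assumes that the paired observations are already in random order, so that the first and second halves of the lists serve as the two random groups.

```python
def quenouille_ratio(y, x):
    """Half-sample Quenouille estimator t = 2r - (r1 + r2)/2 for paired lists."""
    n = len(y)
    if n != len(x) or n % 2 != 0:
        raise ValueError("need paired observations of even length")
    half = n // 2
    r = sum(y) / sum(x)                    # ratio of full-sample totals
    r1 = sum(y[:half]) / sum(x[:half])     # ratio of first half-sample totals
    r2 = sum(y[half:]) / sum(x[half:])     # ratio of second half-sample totals
    return 2.0 * r - (r1 + r2) / 2.0

# Hypothetical paired observations.
y = [3.0, 5.0, 4.0, 9.0]
x = [4.0, 4.0, 6.0, 8.0]
t_hat = quenouille_ratio(y, x)
print(t_hat)
```

When y is exactly proportional to x the three ratios coincide and the adjustment leaves the estimate unchanged; the adjustment matters only when the half-sample ratios differ from the full-sample ratio.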
The foregoing five references are concerned
with estimators and with their bias and variance.
None deals with the problem of estimating vari-
ances. However, Lauh and Williams (1963) do
present some Monte Carlo results which relate
primarily to the estimation of variance. They are
again concerned with the estimation of a ratio,
E(y)/E(x), but the Quenouille procedure is ap-
plied to the individual sample observations instead
of to half samples as in the Durbin investigation.
That is, they compare the behavior of
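$$q = \sum_{i=1}^{n} y_i \Big/ \sum_{i=1}^{n} x_i$$

with the behavior of

$$q^* = \sum_{j=1}^{n} q^*_{(j)} / n$$

where

$$q_{(j)} = \Big( \sum_{i=1}^{n} y_i - y_j \Big) \Big/ \Big( \sum_{i=1}^{n} x_i - x_j \Big)$$

and

$$q^*_{(j)} = nq - (n-1) q_{(j)}$$
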
This is similar to the estimator that was pro-
posed for stratified sampling in the preceding
section. Lauh and Williams define two populations
which are used for empirical sampling: (1) x is
a normal variable, while the regression of y on
x is linear through the origin; (2) x is a chi-
square variable with 2 degrees of freedom, and
the regression of y on x is linear through the
origin. Since the regressions are forced to go
through the origin, both $q$ and $q^*$ are unbiased
estimators of E(y)/E(x). For each population
1,000 samples of n are drawn, n=2,3,..., 9,
and a variety of variance estimators are considered.
In particular, the ordinary estimate of
variance obtained from a Taylor series approximation
was employed, denoted by $v_2(q)$; also an
estimate of variance was obtained from the $q^*_{(j)}$'s,
namely

$$v(q^*) = \sum_{j=1}^{n} (q^*_{(j)} - q^*)^2 / n(n-1)$$

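The delete-one form of the procedure can be sketched directly from these definitions. In the Python fragment below the function name and the data are hypothetical. It returns the ordinary ratio, the average of the pseudo-values, and the variance estimate computed from the spread of the pseudo-values.

```python
def jackknife_ratio(y, x):
    """Delete-one Jackknife for the ratio of totals.

    Returns (q, q_star, v_q_star): the ordinary ratio, the mean of the
    pseudo-values, and the variance estimate based on the pseudo-values.
    """
    n = len(y)
    sy, sx = sum(y), sum(x)
    q = sy / sx
    # Pseudo-value j: n*q - (n-1)*q_(j), with q_(j) the ratio omitting pair j.
    pseudo = [n * q - (n - 1) * (sy - yj) / (sx - xj) for yj, xj in zip(y, x)]
    q_star = sum(pseudo) / n
    v_q_star = sum((p - q_star) ** 2 for p in pseudo) / (n * (n - 1))
    return q, q_star, v_q_star

# Hypothetical sample of three (y, x) pairs.
y = [3.0, 5.0, 7.0]
x = [4.0, 4.0, 3.0]
q, q_star, v_q_star = jackknife_ratio(y, x)
print(q, q_star, v_q_star)
```

The computation needs only n recomputed ratios, each obtained by subtracting one pair from the sample totals, which is why the method is attractive when no convenient variance formula is at hand.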
The results of this investigation are most in-
teresting, particularly the fact that the precision
of $v(q^*)$ is much better than that of $v_2(q)$ when x
has an exponential distribution, and are sum-
marized by the authors as follows:
From the results of these two studies, it
may be inferred that the bias of the estimator
$v_2(q)$ is dependent upon the degree
of skewness of the original y and x populations.
Estimators of the true variance taken
from higher order approximations lead only
to slight improvements over the second
order approximation $v_2(q)$, and in some cases
the estimate is actually worse. The precision
of $v(q^*)$ is nearly double that of $v_2(q)$
for exponential x distributions and the bias of
$v(q^*)$ is smaller than that of $v_2(q)$. Thus
it appears that the split-sample estimator
$q^*$ may be definitely preferable to $q$ in some
situations.
Finally, we note that extensive Monte Carlo
investigations of many of these points have been
initiated by Dr. Benjamin Tepping, Director of the
Center for Measurement Research in the U.S.
Bureau of the Census. Results of these investiga-
tions are not yet available.
The foregoing references are concerned only
with random samples drawn from infinite popula-
tions. Our principal concern is with stratified
samples drawn, usually without replacement, from
finite populations where complex estimation
procedures are applied to the basic sample data.
Under these circumstances, a careful investigation
of the behavior of estimators and of variance
estimators would undoubtedly require a large-scale,
Monte Carlo type of program, integrated with as
much analytic work as possible. This was not
feasible within the confines of the present study,
even in terms of planning. Nevertheless, we did
desire some small numerical model that would
illustrate the various points that have been raised.
As an example, we started with the small
artificial population that is used for illustrative
purposes in Cochran (1963, pp. 178-179). However,
since we wished to enumerate all possible samples
and thereby investigate the behavior of both balanced
half-sample replication and the Jackknife
method of variance estimation, and since computations
were to be carried out on a desk calculator,
we were not able to use the full population
as given by Cochran. (It did not appear worthwhile
to invest computer programming time on one
isolated and artificial example.) Accordingly one
observation was dropped from each stratum and
the following population was used as shown in
table 5.

Table 5. Artificial population

                          Stratum
             I              II             III
          y     x        y     x        y     x
          3     4        5     4        7     3
          4     6        9     8        9     4
         11    20       24    23       25    12
Total    18    30       38    35       41    19

$R_1 = 18/30 = .6000$   $R_2 = 38/35 = 1.0857$   $R_3 = 41/19 = 2.1579$
$R = 97/84 = 1.1548$

For this population, all possible samples of
six, $n_h = 2$ in each stratum, were enumerated:
$3^3 = 27$ possible samples. For each sample,
the following quantities were computed.

1. The combined ratio estimate.
2. The estimate of variance based on the
   ordinary Taylor series approximation to
   the variance of the combined ratio estimate.
3. The Quenouille estimate of the population
   ratio, using individual observations as
   previously described.
4. The Jackknife estimate of variance, as
   described in the preceding section.
5. The average of four balanced half-sample
   estimates, as described in "Balanced
   Half-Sample Replication." It was necessary
   to consider two sets of balanced half
   samples, one complementary to the other,
   since this is a nonlinear situation. It is
   assumed that one of these two sets will be
   chosen randomly in practical applications.
6. The estimate of variance based on the
   four balanced half-sample estimates.
   This estimate of variance is the sum of
   squared deviations of four half-sample
   estimates about the combined ratio estimate,
   divided by four, and multiplied by
   the finite population correction. This is the
   manner in which the half-sample estimate
   has been applied in the work of the
   U.S. Bureau of the Census and in HES.
   Again the estimate was made for each of
   the two sets of balanced half samples.
7. The Quenouille estimate based on the
   balanced half samples. That is, a Quenouille
   estimate was obtained from a half
   sample and its complement. This was
   carried out for each half sample and then
   averaged over the set of four balanced
   half samples.

The results of these computations are summarized in table 6.
Table 6. Behavior of estimates and estimates of variance obtained by enumerating
samples drawn from artificial population of table 5

                                                               Mean     Average    Variance
                                          Standard             square   variance   of the variance
Estimate                         Bias     error     Variance   error    estimate   estimates

Combined ratio estimate         +.0118    .122      .0148      .0149    .0099      .000034
Quenouille estimate,
  individual observations       +.0034    .126      .0160      .0160    .0110      .000040
Balanced half-sample
  estimate                      +.0428    .104      .0109      .0127    .0122      .000047
Quenouille estimate,
  half samples                  -.0198    .137      .0188      .0192    ...        ...
In obtaining the variance estimates a finite population
correction of $(1 - n/N)$ was applied uniformly.
It can be readily demonstrated that this
is appropriate for the Jackknife and balanced
half-sample estimates of variance, at least in the
simple linear case that was used to introduce
these techniques. Jones (1965) describes a modification
of the Jackknife, whose purpose is to
introduce the finite population correction into the
"bias-reducing" argument upon which the Quenouille
adjustment rests. This modification was
not used here.
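The enumeration described above is small enough to reproduce directly. The following Python sketch, which is not part of the original desk-calculator work, lists the 27 samples from the table 5 population and summarizes the behavior of the combined ratio estimate only. The function names are illustrative assumptions. Because every stratum here has three units with two sampled, the stratum weights are equal and the combined ratio estimate reduces to the ratio of the sample totals of y and x.

```python
from itertools import combinations, product

# Table 5 population: (y, x) for the three units in each stratum.
POPULATION = {
    "I": [(3, 4), (4, 6), (11, 20)],
    "II": [(5, 4), (9, 8), (24, 23)],
    "III": [(7, 3), (9, 4), (25, 12)],
}


def combined_ratio(sample):
    """Ratio of sample totals; valid here because N_h/n_h is the same in all strata."""
    y_total = sum(y for stratum in sample for y, _ in stratum)
    x_total = sum(x for stratum in sample for _, x in stratum)
    return y_total / x_total


def enumerate_estimates(population, n_h=2):
    """Combined ratio estimate for every sample of n_h units per stratum."""
    per_stratum = [list(combinations(units, n_h))
                   for units in population.values()]
    return [combined_ratio(sample) for sample in product(*per_stratum)]


def summarize(population):
    """Bias, variance, and mean square error over all possible samples."""
    units = [unit for stratum in population.values() for unit in stratum]
    true_ratio = sum(y for y, _ in units) / sum(x for _, x in units)
    estimates = enumerate_estimates(population)
    mean = sum(estimates) / len(estimates)
    bias = mean - true_ratio
    variance = sum((e - mean) ** 2 for e in estimates) / len(estimates)
    return {
        "samples": len(estimates),
        "true_ratio": true_ratio,
        "bias": bias,
        "variance": variance,
        "mse": variance + bias ** 2,
    }


SUMMARY = summarize(POPULATION)
```

Run as written, the sketch gives a bias of about +.0118, a variance of about .0148, and a mean square error of about .0149 for the combined ratio estimate, in agreement with the first line of table 6.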
Although it is clearly impossible to draw
any general conclusions from one artificial ex-
ample such as the above, the results are interest-
ing. In particular, we find that the combined ratio
estimate has almost negligible bias and that the
ordinary variance estimate seriously underesti-
mates the true variance; that the Quenouille es-
timate with individual observations does reduce
the bias at the expense of increasing the vari-
ance by about 8 percent, and that the Jackknife
estimate of variance again seriously underesti-
mates the true variance; that the average of the
four balanced half-sample estimates has the
smallest variance and the largest bias, while at
the same time providing a reasonable estimate
of the corresponding variance. The variance of the
variance estimates is largest for the estimate
based on the balanced half samples. Even if one
makes the comparison on the basis of mean square
error, the balanced half-sample average is still
superior to the other estimates in spite of its
larger bias, except for the variance of the variance
estimate. As a final point, the Quenouille
adjustment applied to the balanced half samples
does reduce the bias, but at the expense of a
marked increase in terms of variance.
This example, trivial and artificial as it
may be, does raise one question that concerns
half-sample replication. The use of the Quenouille
estimate and the Jackknife method of estimating
variances have usually been considered simulta-
neously. On the other hand, the half-sample rep-
lication method of estimating variances has
always been used to estimate the variance of the
estimate obtained from the entire sample, and
not the variance of the average of the half-sample
estimates. Of course when one does not have an
"exhaustive" set of half samples, as in the case
where they are drawn with replacement, the
average of half-sample estimates would not be
appropriate. Here, however, we do have an
"exhaustive" set of balanced half-sample estimates,
and we might well consider using their
average in place of the combined ratio estimate.
In terms of our example, a variance estimate
with average value .0122 is being used to estimate
the variance of the combined ratio estimate,
whose true value is .0148. This is somewhat
better than the ordinary Taylor series variance
estimate, whose average value is .0099,
but not nearly as good as if one uses the half-
sample estimate of variance to estimate the
mean square error of the average of the balanced
half-sample estimates, namely, a quantity whose
average is .0122 to estimate a quantity whose
true value is .0127.
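The comparison in this paragraph rests on three ratios of table 6 entries, which the following Python lines make explicit. The variable and function names are illustrative; the four values are those quoted in the text.

```python
# Values quoted in the text from table 6.
TRUE_VARIANCE_COMBINED = 0.0148   # variance of the combined ratio estimate
AVERAGE_TAYLOR = 0.0099           # average Taylor series variance estimate
AVERAGE_HALF_SAMPLE = 0.0122      # average balanced half-sample variance estimate
MSE_HALF_SAMPLE_AVERAGE = 0.0127  # MSE of the average of the half-sample estimates


def relative_level(average_estimate, target):
    """Average value of a variance estimate as a fraction of its target."""
    return average_estimate / target


TAYLOR_VS_VARIANCE = relative_level(AVERAGE_TAYLOR, TRUE_VARIANCE_COMBINED)
HALF_SAMPLE_VS_VARIANCE = relative_level(AVERAGE_HALF_SAMPLE,
                                         TRUE_VARIANCE_COMBINED)
HALF_SAMPLE_VS_OWN_MSE = relative_level(AVERAGE_HALF_SAMPLE,
                                        MSE_HALF_SAMPLE_AVERAGE)
```

The three fractions are roughly .67, .82, and .96, which is the ordering described in the text.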
A small amount of data which relates to the
foregoing point is presented in table IV of the
appendix. For each of the six subclasses for
which comparisons of percentages are presented,
the estimate obtained from the entire sample can
be compared with the average of the 16 balanced
half-sample estimates. It follows from the argument
used in developing the Quenouille-type estimate
that the difference between the two can be
viewed as an estimate of bias for the overall
estimate. This approach has been used by Deming
(1960, p. 425). These estimates of bias, expressed
as fractions of the estimated standard errors,
range from approximately .03 to about .38. These
data are reassuring, but they are also too frag-
mentary to support any general conclusions con-
cerning the bias of estimates for HES analysis.
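The bias calculation described in this paragraph can be sketched in Python as follows. The function name, the sign convention (average of the half-sample estimates minus the overall estimate), and the numbers in the example are assumptions made for illustration; they are not values from table IV.

```python
def bias_as_fraction_of_se(overall_estimate, half_sample_estimates):
    """Return (bias_estimate, |bias| / estimated standard error).

    bias_estimate : average of the half-sample estimates minus the
                    overall estimate (assumed sign convention)
    standard error: square root of the mean squared deviation of the
                    half-sample estimates about the overall estimate
    """
    k = len(half_sample_estimates)
    average = sum(half_sample_estimates) / k
    bias = average - overall_estimate
    variance = sum((z - overall_estimate) ** 2
                   for z in half_sample_estimates) / k
    return bias, abs(bias) / variance ** 0.5


# Hypothetical overall estimate and four half-sample estimates.
EXAMPLE_BIAS, EXAMPLE_FRACTION = bias_as_fraction_of_se(
    10.0, [9.0, 11.0, 10.5, 10.5])
```

With these hypothetical inputs the estimated bias is 0.25 and the fraction is about 0.32, which would fall inside the range of values reported in the text.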
In conclusion, attention should be drawn to
another avenue of approach to estimation and
variance estimation which is somewhat related to
the Jackknife method, although this relation has
not been explored or even noticed in the litera-
ture. Mickey (1959) presents a general method
for obtaining finite population unbiased ratio and
regression estimators building on the work of
Goodman and Hartley (1958). In addition, he con-
structs unbiased estimates of the estimator vari-
ance by the process of breaking up the sample
into subsamples, more or less along the lines of
Tukey's general version of the Jackknife. Williams
(1958, 1961) specializes these results to regres-
sion estimators, and considers their properties
in some detail. No detailed attention has been
given to this topic in connection with the prepara-
tion of the present report.
SUMMARY
Sampling theory provides a wide variety of
techniques which can be applied in sample design
to obtain estimates having essentially maximum
precision for fixed cost. These techniques are
particularly useful when populations are spread
over wide geographic areas so that highly clustered
samples must be obtained, and when extensive
prior information about the population under study
can be used in sample selection or in estima-
tion. Such complex sample designs do, however,
require extremely complicated and only approxi-
mate expressions for estimating from a sample
the variance of survey estimates. If an extremely
large number of widely differing types of esti-
mates are to be made from a single large-scale
survey, the burden of developing appropriate
variance expressions, of programming these for
a computer, and of carrying out the computations
may become excessive.
The foregoing problems are intensified, al-
though not appreciably changed in kind, if survey
analysis is to go much beyond the estimation of
population means, percentages, and totals. This
is particularly true when the goals of analysis are
to compare and study the relationships among
subpopulations, or domains of study. Investiga-
tors are then interested in applying such standard
statistical techniques as multiple regression or
analysis of variance, and find that many of the
assumptions required for the application of these
techniques are violated by the complexities of the
sample design. Some authors have used the term
"analytical survey" to refer to any survey in
which extensive comparisons are made among
subpopulations; other authors reserve this term
for surveys that are specifically designed to con-
trol the precision of these comparisons. There
seemed to be little point in arguing this issue in
the present report, since most surveys are multi-
purpose in character and it is usually impossible
to design for a specific comparison. The major
portions of the first three sections of the report
are devoted to a literature survey and discussion
of these topics.
Survey design (as opposed to the analysis of
survey data) requires the use of "exact" variance
expressions since it is necessary to balance the
effects on precision of a wide variety of sampling
techniques. It is possible, however, to bypass the
corresponding detailed variance estimation tech-
niques in the actual analysis of survey data through
the use of replication. This approach is discussed
in "Replication Methods of Estimating Variances,"
where an attempt is made to set forth its advan-
tages and limitations. Emphasis has been placed
upon variance estimation, although it is clear that
covariances can also be treated in the same
manner.
One of the most serious limitations of rep-
lication as applied to the analysis of complex
sample survey data arises from the difficulty of
obtaining a sufficient number of independent
replications to assure reasonably stable variance
estimates. This fact has been particularly obvious
when a set of primary sampling units is stratified
to a point where the sample design calls for the
selection of two primary sampling units per
stratum, thus leading to only two independent rep-
licates. To overcome this difficulty the U.S.
Bureau of the Census and the National Center for
Health Statistics have been using a pseudoreplica-
tion method for variance estimation, called half-
sample replication. This procedure is described
in "Half-Sample Replication Estimates of Vari-
ances From Stratified Samples," and several
improvements, balanced half-sample replication
and partially balanced half-sample replication,
are introduced in "Balanced Half-Sample Replication"
and "Partially Balanced Half Samples."
Still another, but related, variance estimation
technique, the Jackknife, is described in "Jackknife
Estimates of Variance From Stratified
Samples." The application of these methods is
illustrated on an artificial set of data in "Half-
Sample Replication and the Jackknife Method
With Stratified Ratio-Type Estimators," and the
appendix shows how balanced half-sample repli-
cation has been used in analyzing data obtained
from the Health Examination Survey.
It would appear that replication and pseudo-
replication are extremely useful procedures for
obtaining variance estimates when one is making
detailed analyses of data derived from complex
sample surveys. Nevertheless, there are many
unresolved problems relating to the application
of these methods. Among these are the following:
(1) The effects of certain sampling techniques
on variances will not be picked up—e.g., the
selection of one primary unit per stratum and
controlled selection; (2) The variance estimate
ordinarily refers to the average of the replication
estimates, whereas the ordinary procedure is to
use an overall sample estimate, and the two will
not be the same except in the rare case that the
estimate is linear in form; (3) No investigations
have been carried out of the applicability of these
approaches to such problems as contingency table
analyses and standard analysis of variance ap-
proaches; and (4) It is extremely difficult to
attack any of these problems analytically, and
the development of empirical approaches that
will have widespread applicability seems most
difficult.
As a final point, we call attention to the prob-
lems that arise when survey data, as opposed to
experimental data, are used to develop general
scientific conclusions. This topic has not been
more than mentioned in this report, but reference
may be made to discussions by Yates (1960),
Kish (1959), and Blalock (1964).
BIBLIOGRAPHY
Aoyama, H.: A study of the stratified random sampling.
Annals of the Institute of Statistical Mathematics VI(1):1-36,
1954-55.
Aspin, A.: Tables for use in comparisons whose accuracy
involves two variances, separately estimated. Biometrika
36:290-293, 1949.
Bartlett, M. S.: The information available in small samples.
Proc.Camb.Phil.Soc. 32:560, 1936.
Blalock, H.M., Jr.: Causal Inferences in Nonexperimental
Research. Chapel Hill. University of North Carolina Press,
1964.
Brillinger, D. R.: The asymptotic behavior of Tukey's general
method of setting confidence levels (the Jackknife) when
applied to maximum likelihood estimates. Review of International
Statistical Institute 32:202-206, 1964.
Cochran, W. G.: Sampling Techniques, ed. 2. New York.
John Wiley and Sons, Inc., 1963.
Cornfield, J.: On samples from finite populations.
J.Am.statist.Ass. 39(226):236-239, June 1944.
Cornfield, J., and Tukey, J. W.: Average values of mean
squares in factorials. Ann.math.Statist. 27:907-949, 1956.
Deming, W. E.: Some Theory of Sampling. New York. John
Wiley and Sons, Inc., 1950.
Deming, W. E.: Sample Design in Business Research. New
York. John Wiley and Sons, Inc., 1960.
Deming, W. E.: On the correction of mathematical bias by
use of replicated designs. Metrika 6:37-42, 1963.
Deming, W. E., and Stephan, F. F.: On the interpretation
of censuses as samples. J.Am.statist.Ass. 36(213):45-49,
Mar. 1941.
Durbin, J.: Sampling theory for estimates based on fewer
individuals than the number selected. Bulletin of the
International Statistical Institute 36:113-119, 1958.
Durbin, J.: A note on the application of Quenouille's method
of bias reduction to the estimation of ratios. Biometrika
46:477-480, 1959.
Durbin, J., and Stuart, A.: Differences in response rates
of experienced and inexperienced interviewers. Journal of
the Royal Statistical Society, Series A (General), 114, Pt. II,
pp. 163-206, 1951.
Fisher, R. A.: The Design of Experiments, ed. 3. London.
Oliver and Boyd, Ltd., 1942.
Gautschi, W.: Some remarks on systematic sampling.
Ann.math.Statist. 28:385-394, 1957.
Garza-Hernandez, T.: An Approximate Test of Homogeneity
on the Basis of a Stratified Random Sample. M.S. thesis, New
York State School of Industrial and Labor Relations, Cornell
University, 1961.
Goodman, L. A., and Hartley, H. O.: The precision of unbiased
ratio-type estimators. J.Am.statist.Ass. 53(282):491-
508, June 1958.
Goodman, R., and Kish, L.: Controlled selection - a technique
in probability sampling. J.Am.statist.Ass. 45(251):350-
372, Sept. 1950.
Gupta, S. S.: Probability integrals of multivariate normal
and multivariate t. Ann.math.Statist. 34:792-828, 1963.
Gurney, M.: The Variance of the Replication Method for
Estimating Variances for the CPS Sample Design. Unpublished
memorandum, U.S. Bureau of the Census, 1962.
Gurney, M.: McCarthy's Orthogonal Replications for Estimating
Variances, With Grouped Strata. Unpublished memorandum,
U.S. Bureau of the Census, 1964.
Hansen, M. H., Hurwitz, W. N., and Madow, W. G.: Sample
Survey Methods and Theory, Vols. I and II. New York. John
Wiley and Sons, Inc., 1953.
Hanson, R. H., and Marks, E. S.: Influence of the inter-
viewer on the accuracy of survey results. J.Am.statist.Ass.
53(283):635-655, Sept. 1958.
Hartley, H. O.: Analytic Studies of Survey Data. Instituto
di Statistica, Rome, Volume in onora di Corrado Gini. Ames,
Iowa. Statistical Laboratory, Iowa State University of Science
and Technology. Reprint Series 63. 1959.
Jones, H. L.: Investigating the properties of a sample mean
by employing random subsample means. J.Am.statist.Ass.
51(273):54-83, Mar. 1956.
Jones, H. L.: The Jackknife method. Proceedings of the
IBM Scientific Computing Symposium on Statistics. White
Plains, New York. IBM Data Processing Division, 1965.
Keyfitz, N.: A factorial arrangement of comparisons of family
size. Am.J.Soc. 58(5):470-480, Mar. 1953.
Keyfitz, N.: Estimates of sampling variance where two
units are selected from each stratum. J.Am.statist.Ass.
52(280):503-510, Dec. 1957.
Kish, L.: Confidence intervals for clustered samples.
American Sociological Review 22(2):154-165, Apr. 1957.
Kish, L.: Some statistical problems in research design.
American Sociological Review 24:328-338, June 1959.
Kish, L.: Efficient allocation of a multi-purpose sample.
Econometrica 29:363-385, 1961.
Kish, L.: Variances for indexes from complex samples.
Proceedings of the Social Statistics Section of the American
Statistical Association, 1962, pp. 190-199.
Kish, L.: Survey Sampling. New York. John Wiley and
Sons, Inc., 1965.
Kish, L., and Hess, I.: On variances of ratios and their
differences in multistage samples. J.Am.statist. Ass. 54(286):
416-446, June 1959.
Kish, L., Namboodiri, N. K., and Pillai, R. K.: The ratio
bias in surveys. J.Am.statist. Ass. 57(300):863-876, Dec. 1962.
Koop, J. C.: On theoretical questions underlying the tech-
nique of replicated or interpenetrating samples. Proceedings
of the Social Statistics Section of the American Statistical
Association, 1960, pp. 196-205.
Lahiri, D. B.: Technical paper on some aspects of the development
of the sample design, in P. C. Mahalanobis "Technical
Paper No. 5 on the National Sample Survey." Sankhya
14:264-316, 1954.
Lauh, E., and Williams, W. H.: Some small sample results
for the variance of a ratio. Proceedings of the Social Statis-
tics Section of the American Statistical Association, 1963,
pp. 273-283.
Madow, W. G., and Madow, L. H.: On the theory of systematic
sampling. Ann.math.Statist. XV:1-24, 1944.
McCarthy, P. J.: Sampling considerations in the construc-
tion of price indexes with particular reference to the United
States Consumer Price Index. U.S. Congress, Joint Economic
Committee, Government Price Statistics, Hearings. Washing-
ton. U.S. Government Printing Office. Part 1, pp. 197-232,
1961.
McCarthy, P. J.: Stratified sampling and distribution-free
confidence intervals for a median. Accepted for publication
in J.Am.statist.Ass., 1965.
Meier, P.: Variance of a weighted mean. Biometrics 9(1):
59-73, Mar. 1953.
Mickey, M. R.: Some finite population unbiased ratio and
regression estimators. J.Am.statist. Ass. 54(287):594-612,
Sept. 1959.
Miller, R. G., Jr.: A trustworthy Jackknife. Ann.math.Sta-
tist. 35(4):1594-1605, 1964.
National Center for Health Statistics: Cycle I of the Health
Examination Survey, sample and response, United States,
1960-62. Vital and Health Statistics. PHS Pub. No. 1000-
Series 11-No. 1. Public Health Service. Washington. U.S.
Government Printing Office, Apr. 1964.
National Center for Health Statistics: Blood pressure of
adults by age and sex, United States, 1960-62. Vital and
Health Statistics. PHS Pub. No. 1000-Series 11-No. 4. Public
Health Service. Washington. U.S. Government Printing
Office, June 1964.
National Center for Health Statistics: Blood pressure of
adults by race and area, United States, 1960-62. Vital and
Health Statistics. PHS Pub. No. 1000-Series 11-No. 5. Public
Health Service. Washington. U.S. Government Printing
Office, July 1964.
National Center for Health Statistics: Plan and initial program
of the Health Examination Survey. Vital and Health
Statistics. PHS Pub. No. 1000-Series 1-No. 4. Public Health
Service. Washington. U.S. Government Printing Office, July
1965.
National Health Survey: The statistical design of the Health
Household-Interview Survey. Health Statistics. PHS Pub. No.
584-A2. Public Health Service. Washington. U.S. Government
Printing Office, July 1958.
Okamoto, M.: Chi-square statistic based on the pooled
frequencies of several observations. Biometrika 50:524-528,
1963.
Plackett, R.L., and Burman, J.P.: The design of optimum
multifactorial experiments. Biometrika 33:305-325, 1943-46.
Quenouille, M. H.: Notes on bias in estimation. Biometrika
43:353-360, 1956.
Robson, D. S., and Whitlock, J. H.: Estimation of a trun-
cation point. Biometrika 51:33-39, 1964.
Satterthwaite, F. E.: An approximate distribution of esti-
mates of variance components. Biometrics 2:110-114, 1946.
Sedransk, J.: Sample Size Determination in Analytical Sur-
veys. Ph.D. dissertation, Harvard University, 1964.
Sedransk, J.: Analytical Surveys With Cluster Sampling.
Unpublished paper, Iowa State University, 1964a.
Sedransk, J.: A Double Sampling Scheme for Analytical
Surveys. Unpublished paper, Iowa State University, 1964b.
Sukhatme, P. V.: Sampling Theory of Surveys With Applications.
Ames, Iowa. Iowa State College Press, 1954.
Tukey, J. W.: Bias and confidence in not-quite large samples.
Abstracted in Ann.math.Statist. 29:614, 1958.
U.S. Bureau of the Census: The Current Population Survey,
A Report on Methodology. Technical Paper No. 7. Washington.
U.S. Government Printing Office, 1963.
Walsh, J. E.: Concerning the effect of intraclass correlation
on certain significance tests. Ann.math.Statist. 18(1):
88-96, 1947.
Walsh, J. E.: On the "information" lost by using a t-test
when the population variance is known. J.Am.statist.Ass.
44(245):122-125, Mar. 1949.
Weibull, M.: The distributions of t- and F-statistics and of
correlation and regression coefficients in stratified samples
from normal populations with different means. Skand.Aktuar-
Tidskr. 36(suppl. 1, 2):9-106, 1953.
Welch, B. L.: The generalization of "Student's" problem
when several different population variances are involved.
Biometrika 34:28-35, 1947.
Williams, W. H.: Unbiased Regression Estimators and Their
Efficiencies. Ph.D. dissertation, Iowa State College, 1958.
Williams, W. H.: Generating unbiased ratio and regression
estimators. Biometrics 17(2):267-274, June 1961.
Yates, F.: Sampling Methods for Censuses and Surveys,
ed. 3. New York. Hafner Publishing Co., 1960.
APPENDIX
ESTIMATION OF RELIABILITY OF FINDINGS FROM THE FIRST CYCLE OF
THE HEALTH EXAMINATION SURVEY
Survey Design
The sampling plan of the first cycle of the Health
Examination Survey followed a highly stratified, mul-
tistage probability design in which a sample of the
civilian, noninstitutional population of the conterminous
United States, 18-79 years of age, was selected. In
the first stage of this design, the 1,900 primary sam-
pling units (PSU's), geographic units into which the
United States was divided, were grouped into 42 strata.
Here a PSU is either a standard metropolitan statistical
area (SMSA) or one to three contiguous counties. By
virtue of their size in population, the six largest SMSA's
were considered to be separate strata and were in-
cluded in the first-stage sample with certainty. As
New York was about three times the size of other strata
and Chicago twice the average size, New York was
counted as three strata and Chicago as two, making a
total of nine certainty strata. One PSU was selected
from among the PSU's in each of the 33 noncertainty
strata to complete the first-stage sample. Later stages
resulted in the random selection of clusters of typically
four persons from segments of households within the
sample PSU's. The total sampling included some 7,700
persons in 29 different States.
All examination findings for sample persons are
included in tabulations as weighted frequencies, the
weight being a product of the reciprocal of the prob-
ability of selecting the individual, an adjustment for
nonresponse cases, a stratified ratio adjustment of the
first-stage sample to 1960 Census population controls
within 6 region-density classes, and a poststratified
ratio adjustment at the national level to independent
population controls for the midsurvey period (October
1961) within 12 age-sex classes.
The sample design is such that each person has
roughly the same probability of selection. However,
there were sufficient deviations from that principle in
the selection and through the technical adjustments to
produce the following distribution of sample weights as
required to inflate to U.S. civilian, noninstitutional pop-
ulation levels:
                                   1-digit     Percent
                    Class          relative    distribution of
Weight class        average        weight      examined persons

 7,000-20,999       14,000         1           78.7
21,000-34,999       28,000         2           18.4
35,000-48,999       42,000         3            1.9
49,000-62,999       56,000         4            0.6
63,000-76,999       70,000         5            0.0
77,000-90,999       84,000         6            0.4
A more detailed description of the sampling plan
and estimation procedures is included in Vital and
Health Statistics, Series 11, No. 1, 1964: "Cycle I of the
Health Examination Survey, Sample and Response."
Requirements of a Variance Estimation
Technique
The Health Examination Survey is obviously com-
plex in its sampling plan and estimation procedure. A
method for estimating the reliability of findings is re-
quired which reflects both the losses from clustering
sample cases at two stages and the gains from strati-
fication, ratio estimation, and poststratification. Ideally,
an appropriate method once programmed for an electronic
computer can be used for a wide range of statistics
with little or no modification to the program.
This feature of adaptability is an important and special
requirement in HES. The small staff of analysts in the
Division of Health Examination Statistics typically works
on only a few sections of the examination and laboratory
results at a time. Consequently tabulation specifications
and edited input for a sizable variety of report topics
are not available until shortly prior to the need for
estimates of sampling error. New tables of sampling
error have been prepared for each of the 12 reports
published to date and at least a dozen more will be
prepared before the Cycle I publication series is com-
pleted.
Development of the Replication Technique
The method adopted for estimating variances in
the Health Examination Survey is the half-sample
replication technique. The method was developed at
the U.S. Bureau of the Census prior to 1957 and has
at times been given limited use in the estimation of
the reliability of results from the Current Population
Survey. A description of the half-sample replication
technique, however, has not previously been published,
although some references to the technique have ap-
peared in the literature.
The half-sample replication technique is particu-
larly well suited to the Health Examination Survey
because the sample, although complex in design, is
relatively small (7,000 cases) in sample size. Only
a few minutes are required for a pass of all cases
through the computer. This feature permitted the de-
velopment of a variance estimation program which is
an adjunct to the general computer tabulation program.
Every data table comes out of the computer with a
table of desired estimates of aggregates, means, or dis-
tributions together with a table identical in format but
with the estimated variances instead of the estimated
statistics. The computations required by the method are
indeed simple and the internal storage requirements are
well within the limitation of an IBM 1401-1410 computer
system.
The variance estimates computed for the first
few reports of Cycle I findings were based on 20
random half-sample replications. A half sample was
formed by randomly selecting one sample PSU from
each of 16 pairs of sample PSU's, the sample repre-
sentatives of 16 pairings of similar noncertainty strata,
and 8 of 16 random groups of clusters of sample per-
sons selected from the 9 certainty strata and the San
Francisco SMSA, the largest sample PSU of the 33
noncertainty strata. The concept of balanced half
samples is utilized in present variance estimates for
HES. The variance estimates are derived from 16
balanced half-sample replications. The composition
of the 16 half samples, shown in tables I and II, was
determined by an orthogonal plan. In the tables an "X"
indicates that the PSU or random group was included
in the half sample. The construction using 16 balanced
half-sample replications results from viewing the
certainty and noncertainty strata as independent uni-
verses. This is only approximately true as the post-
stratified ratio adjustment to independent population
controls is made across both certainty and noncertainty
strata. An alternative construction, and perhaps a
slightly more accurate one, would have been to use 24 bal-
Table I. Composition of the 16 balanced half-sample replicates—certainty strata
       Random group of
       segments in             Balanced half-sample replications
Pair   certainty PSU's    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
1 XIX XIX Xi XX X
X X X) X
2 X X X X
X X X X X X
3 X X|X X X X
X X X X
4 X X |X X 20x X
XX |) X X| X
5 Xl X{ Z| 2{ X X
X X X X
6 X X X X X X X
X X |X X X
7 X X|X X| X X X
X XX |X X XX! X
8 X X X X
X X|X X X| X X| X
Table II. Composition of the 16 balanced half-sample replicates—noncertainty strata
                                   Balanced half-sample replications
Pair | Sample PSU             1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
1 |Pittsburgh, Pa., SMSA-r- |X |X |X |X |X |X |X |X |X| X |X| X| X| X| X| X
Providence, R.I., SMSA
2 | Columbus, Ohio, SMSA---- X X X X X X X X
Akron, Ohio, SMSA------- X X x x X x
3 | York, Pa., SMSA----=--== Xx X xi x A |
Muskegon-Ottawa, Mich---| X | X X|X X| X X| X
4 | Cayuga-Wayne, N,Y-=--==- X X XX X X
York, Me---e-cccccncean- XiX X X| X X| X
5 |Baltimore, Md., SMSA---- Zixizix Xx x
Louisville, Ky., SMSA--- |X |X |X [|X X| X| X| X
6 | Nashville, Tenn., SMSA-- |X X X X |X X X X
San Antonio, Tex., SMSA- X X|X X Xi X X
7 Savannah, Ga., SMSA----- X |X X[X|[X | X X X
Midland, Tex., SMSA--=-= XixIX X| X| X| X
8 | Barbour, Ala-==--=cecece== X X X X X
Independent cities in
Virginia in 1950----==-- X X X |X X X XxX! xX
9 | Brooks-Echols-
Lowndes, Ga-=====ccece== XX XixX| Xx] X| x X
Jackson-Lawrence, Ark--- | X |X |X |X [X |X |X |X
10 | Horry, S.Ce==--ceececcca= X X X X X X X X
Franklin-Nash, N.Cooomem X X |X X X X
11 | Lafayette-Panola, Miss-- |X [X X X| X X
E. Feliciana-St.
Helena, La-=--ccecceccce= X |X X(X|X[X X
12 | San Jose, Calif., SMSA--~ X |X X|X X X| X X
Minneapolis-St. Paul,
SMSA-vermmmmmcc mc mmca—— X X|X X X| X X| X
13 | Ft. Wayne, Ind., SMSA--- [X |X [X Xi Xx x X
Topeka, Kans., SMSA-===- X X|X |X| X| X
14 | Grant, Wash---cceecnecena- X X|X X X X
Apache-Navajo, Ariz-=--- | X X X X X X| X X
15 | Dunklin-Pemiscot, Mo---- xx X X| Xx
Franklin-Jackson=-
Williamson, Ill--cee--- X |X X XX X
16 | Bates, Mo--=-=-=-=--- ——————— X X X X X
Bayfield, Wis--eecceccce== XIX X 1X | X X - X X
Providence, R.I., SMSA figures into the variance computations as it is always a
part of the right-hand side of the difference $z'_i - z'$ in the variance equation.
Table III. Estimates of the percent of demographic subgroups of the U.S. adult population with hypertension and estimates of variance in percent

Demographic subgroup | HES estimate of percent (col. 1) | Replication estimate of variance (col. 2) | SRS estimate of percent (col. 3) | SRS estimate of variance (col. 4) | Sample persons examined (col. 5) | Ratio of variances (col. 6)
Males aged 35-44 with income less than $2,000 | 19.69 | 34.1056 | 22.22 | 27.4348 | 63 | 1.2432
Females aged 55-64 with income of $4,000-$6,999 | 25.75 | 17.2225 | 24.49 | 18.8697 | 98 | 0.9127
Females with income of $10,000+ | 11.75 | 4.7524 | 11.95 | 2.7326 | 385 | 1.7391
White males with income of $4,000-$6,999 | 12.21 | 1.2321 | 12.39 | 1.2114 | 896 | 1.0171
White females with income of $7,000-$9,999 | 11.48 | 5.7600 | 10.59 | 2.0066 | 934 | 2.8705
Negroes with income of $2,000-$3,999 | 23.08 | 10.2400 | 23.41 | 8.7474 | 205 | 1.1706
Males aged 18-24 with 9-12 years of school | 2.45 | 1.0609 | 2.37 | 0.9151 | 253 | 1.1593
Females aged 25-34 with none or less than 5 years of school | 2.49 | 5.3361 | 3.33 | 10.7407 | 30 | 0.4968
Males with 5-8 years of school | 17.82 | 1.8225 | 17.92 | 1.7805 | 826 | 1.0236
White males with 13+ years of school | 9.34 | 2.3716 | 9.01 | 1.4211 | 577 | 1.6688
White females with 9-12 years of school | 10.33 | 0.5329 | 9.54 | 0.5187 | 1,666 | 1.0274
Negroes with 5-8 years of school | 30.73 | 10.3684 | 31.19 | 7.4948 | 286 | 1.3834
Males aged 65-74 who are married | 27.26 | 19.2721 | 26.87 | 9.7751 | 201 | 1.9716
Females aged 35-44 who are separated | 18.80 | 78.3225 | 22.22 | 96.0217 | 18 | 0.8157
Females who are divorced | 13.47 | 13.7641 | 14.50 | 9.4658 | 131 | 1.4541
White males who are single | 9.05 | 1.9321 | 9.98 | 2.2394 | 401 | 0.8628
White females who are widowed | 35.77 | 15.1321 | 33.55 | 7.1223 | 313 | 2.1246
Negroes who are married | 27.79 | 5.1076 | 28.79 | 3.8316 | 535 | 1.3330
Males aged 55-64 who are craftsmen | 15.153 | 29.9209 | 14.29 | 19.4363 | 63 | 1.5394
Females aged 35-44 who are private household workers | 10.60 | 21.2521 | 10.67 | 12.7052 | 75 | 1.6727
Males who are laborers | 19.93 | 11.4921 | 18.25 | 5.6730 | 263 | 2.0258
White males who are farmers or farm managers | 10.89 | 2.8900 | 11.39 | 6.3889 | 158 | 0.4523
White females who are clerical and sales workers | 9.80 | 2.8561 | 9.78 | 1.9522 | 451 | 1.4630
Negroes who are professional workers | 16.57 | 31.2481 | 16.22 | 36.7202 | 37 | 0.8510
Males aged 25-34 who are employed in construction and mining | 7.46 | 9.3025 | 9.52 | 13.6775 | 63 | 0.6801
Females aged 18-24 who are employed in wholesale and retail trade | 2.29 | 6.8121 | 2.27 | 5.0477 | 44 | 1.3495
Males who are employed in transportation | 11.29 | 4.1209 | 10.76 | 4.3067 | 223 | 0.9569
White males who are employed in finance, insurance and real estate | 12.34 | 24.5025 | 11.54 | 13.0860 | 78 | 1.8724
White females who are employed in services | 10.48 | 3.0276 | 10.59 | 2.3324 | 406 | 1.2981
Negroes who are employed in Government | 22.19 | 9.3636 | 22.65 | 9.6800 | 243 | 0.9673
Table IV. Estimates of differences in percent between demographic subgroups of the U.S. adult population with hypertension and estimates of variance in percent

Demographic subgroup | HES estimate of percent (col. 1) | Replication estimate of variance (col. 2) | SRS estimate of percent (col. 3) | SRS estimate of variance (col. 4) | Sample persons examined (col. 5) | Ratio of variance (col. 6) | Average of replicate percents (col. 7)
1. Adults with income less than $2,000 | 26.23 | 4.5796 | 26.66 | 1.7791 | 1,099 | 2.5741 | 26.06
2. Adults with income of $10,000+ | 11.75 | 1.3924 | 12.17 | 1.3933 | 764 | 0.9994 | 11.34
Difference (1-2) | 14.48 | 5.2850 | 14.48 | 3.1784 | | 1.6628 | 15.72
3. Males with income $2,000-$3,999 | 15.36 | 4.2025 | 15.31 | 2.4499 | 535 | 1.7154 | 14.75
4. Males with income $4,000-$6,999 | 12.83 | 1.4641 | 13.24 | 1.1797 | 975 | 1.2411 | 12.35
Difference (3-4) | 2.33 | 4.5600 | 2.27 | 3.6296 | | 1.2563 | 2.40
5. Females aged 55-64 with income $4,000-$6,999 | 25.75 | 17.2225 | 24.49 | 18.8697 | 98 | 0.9127 | 25.91
6. Females aged 55-64 with income $7,000-$9,999 | 25.25 | 99.8001 | 24.32 | 49.7500 | 37 | 2.0060 | 26.67
Difference (5-6) | 0.50 | 110.1069 | 0.17 | 68.6197 | | 1.6046 | 0.66
anced half-sample replications, viewing the 16 pairs of
noncertainty strata and 8 pairs of randomly grouped
clusters from the certainty strata as a single universe.
After the composition of each of the balanced half
samples was determined, the resulting half samples
were then separately subjected to all the estimation
procedures and tabulations used to produce the final
estimates from the entire sample.
An estimated variance $s^2_{z'}$ of an estimated statistic $z'$ of the
parameter $z$ is obtained by applying the formula

$$ s^2_{z'} = \frac{1}{16} \sum_{i=1}^{16} \left( z'_i - z' \right)^2 $$

where $z'_i$ is the estimate of $z$ based on the $i$th half
sample and $z'$ is the estimate of $z$ based on the
entire sample.
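The variance formula above can be sketched in a few lines of Python. This is only an illustration, not the Survey's computer program: the function name and the 16 replicate values are hypothetical, and the computation is simply the mean squared deviation of the half-sample estimates about the full-sample estimate.

```python
from typing import Sequence


def half_sample_variance(replicate_estimates: Sequence[float],
                         full_estimate: float) -> float:
    """Mean squared deviation of the half-sample estimates about the
    full-sample estimate (the divisor is the number of replicates)."""
    k = len(replicate_estimates)
    if k == 0:
        raise ValueError("need at least one replicate estimate")
    return sum((z_i - full_estimate) ** 2 for z_i in replicate_estimates) / k


# Hypothetical percents from 16 half-sample replicates (not Survey data).
replicates = [19.1, 20.4, 18.7, 21.0, 19.9, 20.2, 18.5, 20.8,
              19.4, 20.1, 19.0, 20.6, 19.7, 20.3, 18.9, 20.5]
full = 19.8  # hypothetical estimate from the entire sample

variance = half_sample_variance(replicates, full)
print(f"variance = {variance:.4f}, standard error = {variance ** 0.5:.4f}")
```

The deviations are taken about the full-sample estimate rather than about the mean of the replicates, which is what the formula in the text specifies.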
Computer Output
For the Health Examination Survey the variance
tabulations and prepublication tabulations of estimates
are derived from the same computer output. Since the
findings are generally expressed as rates, means, or
percentages, each output "table" actually consists of
three tables: the statistic of interest (such as the
percent of persons with hypertension), the numerator
of each cell in the "table," and the denominator of each
cell. The cells of the table are a cross-classification
of the statistic by age and sex with one of about a
dozen demographic variables for which information
was collected in the survey. The analyst can also
receive a printout of the same three tables for each
of the 16 half-sample replications. The replication
tables are useful when estimates of the variance of
estimated differences between statistics, or of such
derived statistics as medians, are needed, or when
evidence is sought to support or refute a hypothesis
concerning observed patterns in the data. In addition
to the "table" of findings, the output also includes a
"table" of estimated standard errors (of the statistic,
its numerator, and its denominator), a "table" of
estimated relative variances (the estimated variance of
a statistic divided by the square of the statistic), and
a "table" of the number of sample observations on which
the statistic, its numerator, and its denominator are
based. The last table together with the others gives
some insight into the effect of the sampling plan and
estimation procedures.
Illustration
The figures in table III are estimates from the
Health Examination Survey of the percent of demographic
subgroups of the adult population with hypertension and
their estimated variances. The official HES estimates,
based on unbiased inflation factors adjusted for
nonresponse and ratio adjusted to independent population
controls, are shown in column 1. Estimates of their
variance, derived from 16 balanced half-sample
replications treating the estimated percent of replicate
i as $z'_i$, are shown in column 2. For comparison, the
estimates of percent and variance which would have
resulted if the 6,600 examined persons had been a
simple random sample of the U.S. population, with the
sample size in each demographic subgroup or domain
considered to be fixed, are shown in columns 3 and
4. The number of examined sample persons in the
demographic subgroup or domain (the base of the percent)
is shown in column 5. The ratios of the two variance
estimates are shown in column 6. These ratios are
indicative of the net effect of clustering and stratification
in the sample design, deviations from equal probabilities
of selection, and nonresponse and ratio adjustment in
the estimation procedures, and reflect as well the
variance of the estimated variance.
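As a rough check on how columns 4 and 6 of table III relate to the other columns, the sketch below assumes that the simple random sample variance of a percent is the usual binomial expression p(100 - p)/n, in squared percentage points. That formula is not stated in the report; it is an inference that agrees with the printed values. The function names are hypothetical, and the count of 14 hypertensive persons among 63 examined is likewise inferred from the printed percent of 22.22.

```python
def srs_variance_of_percent(percent: float, n: int) -> float:
    """Binomial variance of a percent under simple random sampling,
    in squared percentage points: p(100 - p) / n."""
    if n <= 0:
        raise ValueError("n must be positive")
    return percent * (100.0 - percent) / n


def variance_ratio(replication_variance: float, srs_variance: float) -> float:
    """Ratio of the replication variance to the simple random sample variance."""
    return replication_variance / srs_variance


# First row of table III: males aged 35-44 with income less than $2,000.
# Assumption: 14 of the 63 examined persons were hypertensive (14/63 = 22.22%).
p = 100.0 * 14 / 63
srs_var = srs_variance_of_percent(p, 63)
ratio = variance_ratio(34.1056, srs_var)  # 34.1056 is column 2 of that row

print(f"SRS variance = {srs_var:.4f}, ratio = {ratio:.4f}")
```

Under these assumptions the computed variance agrees with the 27.4348 printed in column 4, and the ratio agrees with column 6 to within rounding.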
The median ratio of replication variance to simple
random variance (i.e., of an appropriate variance to
a much cruder measure) is 1.30. The mean ratio is
1.31. As one would expect, there is a tendency for the
ratio to be higher for larger values of the statistic,
although this tendency is not very pronounced.
The criterion for hypertension was 160 mm. Hg.
or over systolic blood pressure and 95 mm. Hg. or
over diastolic. The average of three blood pressures
taken over a 30-minute period was used for each
examined person.
Table IV is similar to table III but it also includes
the estimated difference in percent between two demo-
graphic subgroups. Estimates of variance of the dif-
ference between two estimated percents which would
have resulted if the sample had been a simple random
sample were obtained by summing the estimated
variances of the two estimated percents. The average
of the estimated percents over the 16 replicates is
shown in column 7.
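The two ways of estimating the variance of a difference can be sketched as follows. The simple random sample figure is the sum of the two variances, as described above; the replication figure applies the half-sample formula to the replicate-by-replicate differences, which is one natural reading of how the replication tables would be used and is labeled here as an assumption. Function names and the replicate values in the comments are hypothetical; the numeric inputs are rows 5 and 6 of table IV.

```python
from typing import Sequence


def srs_variance_of_difference(var_a: float, var_b: float) -> float:
    """Variance of a difference of two estimates treated as independent:
    the sum of the two variances."""
    return var_a + var_b


def replication_variance_of_difference(reps_a: Sequence[float],
                                       reps_b: Sequence[float],
                                       full_a: float,
                                       full_b: float) -> float:
    """Half-sample variance applied to the differences d_i = a_i - b_i,
    with deviations taken about the full-sample difference."""
    if len(reps_a) != len(reps_b) or len(reps_a) == 0:
        raise ValueError("replicate lists must be non-empty and equal in length")
    full_diff = full_a - full_b
    return sum(((a - b) - full_diff) ** 2
               for a, b in zip(reps_a, reps_b)) / len(reps_a)


# Rows 5 and 6 of table IV (SRS variances, column 4).
srs_var_diff = srs_variance_of_difference(18.8697, 49.7500)
# Column 2 of the "Difference (5-6)" row divided by the SRS figure.
ratio = 110.1069 / srs_var_diff

print(f"SRS variance of difference = {srs_var_diff:.4f}, ratio = {ratio:.4f}")
```

Because the replicate differences are formed within each half sample, any covariance between the two subgroup estimates is carried into the replication variance, whereas the simple sum of variances ignores it.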
U.S. GOVERNMENT PRINTING OFFICE: 1966 O - 211-926
OUTLINE OF REPORT SERIES FOR VITAL AND HEALTH STATISTICS
Public Health Service Publication No. 1000
Series 1. Programs and collection procedures. Reports which describe the general programs of the National
Center for Health Statistics and its offices and divisions, data collection methods used, definitions, and
other material necessary for understanding the data.
Reports number 1-4
Series 2. Data evaluation and methods research. Studies of new statistical methodology including: experimental
tests of new survey methods, studies of vital statistics collection methods, new analytical techniques,
objective evaluations of reliability of collected data, contributions to statistical theory.
Reports number 1-15
Series 3. Analytical studies. Reports presenting analytical or interpretive studies based on vital and health
statistics, carrying the analysis further than the expository types of reports in the other series.
Reports number 1-4
Series 4. Documents and committee reports. Final reports of major committees concerned with vital and health
statistics, and documents such as recommended model vital registration laws and revised birth and
death certificates.
Reports number 1 and 2
Series 10. Data From the Health Interview Survey. Statistics on illness, accidental injuries, disability, use of
hospital, medical, dental, and other services, and other health-related topics, based on data collected in
a continuing national household interview survey.
Reports number 1-29
Series 11. Data From the Health Examination Survey. Statistics based on the direct examination, testing, and
measurement of national samples of the population, including the medically defined prevalence of
specific diseases, and distributions of the population with respect to various physical and physiological
measurements.
Reports number 1-12
Series 12. Data From the Health Records Survey. Statistics from records of hospital discharges and statistics
relating to the health characteristics of persons in institutions, and on hospital, medical, nursing, and
personal care received, based on national samples of establishments providing these services and
samples of the residents or patients.
Reports number 1-4
Series 20. Data on mortality. Various statistics on mortality other than as included in annual or monthly reports:
special analyses by cause of death, age, and other demographic variables, also geographic and time
series analyses.
Reports number 1
Series 21. Data on natality, marriage, and divorce. Various statistics on natality, marriage, and divorce other
than as included in annual or monthly reports: special analyses by demographic variables, also
geographic and time series analyses, studies of fertility.
Reports number 1-8
Series 22. Data From the National Natality and Mortality Surveys. Statistics on characteristics of births and
deaths not available from the vital records, based on sample surveys stemming from these records,
including such topics as mortality by socioeconomic class, medical experience in the last year of life,
characteristics of pregnancy, etc.
Reports number 1
For a list of titles of reports published in these series, write to: National Center for Health Statistics
U.S. Public Health Service
Washington, D.C. 20201
NATIONAL CENTER FOR HEALTH STATISTICS
Series 2, Number 15
Psychological Measures
Used in the Health
Examination Survey
of children ages 6-11
U.S. DEPARTMENT OF
HEALTH, EDUCATION, AND WELFARE
Public Health Service
Public Health Service Publication No. 1000-Series 2-No. 15
For sale by the Superintendent of Documents, U.S. Government Printing Office
Washington, D.C. 20402 - Price 45 cents
VITAL and HEALTH STATISTICS
DATA EVALUATION AND METHODS RESEARCH
Evaluation of
Psychological Measures
Used in the Health
Examination Survey
of children ages 6-11
A critical review of literature pertaining to the psychological
measures used in Cycle II, with recommendations
concerning validity, reliability, and applicability
to the Survey data.
Washington, D.C. March 1966
U.S. DEPARTMENT OF
HEALTH, EDUCATION, AND WELFARE Public Health Service
John W. Gardner William H. Stewart
Secretary Surgeon General
NATIONAL CENTER FOR HEALTH STATISTICS
FORREST E. LINDER, Ph. D., Director
THEODORE D. WOOLSEY, Deputy Director
OSWALD K. SAGEN, Ph. D., Assistant Director
WALT R. SIMMONS, M.A., Statistical Advisor
ALICE M. WATERHOUSE, M.D., Medical Advisor
JAMES E. KELLY, D.D.S., Dental Advisor
LOUIS R. STOLCIS, M.A., Executive Officer
OFFICE OF HEALTH STATISTICS ANALYSIS
Iwao M. Moriyama, Ph. D., Chief
DIVISION OF VITAL STATISTICS
Robert D. Grove, Ph. D., Chief
DIVISION OF HEALTH INTERVIEW STATISTICS
Philip S. Lawrence, Sc. D., Chief
DIVISION OF HEALTH RECORDS STATISTICS
Monroe G. Sirken, Ph. D., Chief
DIVISION OF HEALTH EXAMINATION STATISTICS
Arthur J. McDowell, Chief
DIVISION OF DATA PROCESSING
Sidney Binder, Chief
Public Health Service Publication No. 1000-Series 2-No. 15
Library of Congress Catalog Card Number 65-62272
FOREWORD
The practice of comparing one individual with
another is as old as recorded history. Man's
earliest writings are replete with statements in-
dicating that he has long viewed his fellow man in
terms of whether or not he measured up to an
expected ideal. Similarly, the performance of a
man has traditionally been described in terms of
how it compares with that of another man.
However, subjecting these "known" differences to
the scientific method of inquiry is a recent
development.
In the area of individual differences in
behavior and psychological characteristics, re-
search has progressed from the simple to the
complex. The first studies dealt with the simple
functions of speed of reaction time. Today, studies
are aimed at measuring individual differences in
the complex functions of motivation, ego-integra-
tion, and cognition.
The technology for measuring behavior has
developed in a similar
manner. Instruments are available which, most
scientists will agree, accurately measure the
speed with which an individual taps his finger in
response to a given signal. Scientists do not
agree, however, on the adequacy of the equipment
used to measure individual differences in intelli-
gence. Moreover, there will even be some dis-
agreement over the use of the word "intelligence"
to describe certain aspects of behavior.
Because of the present state of the art of
psychological measurement, studies such as those
conducted by the Health Examination Survey
encounter difficult problems in attempting to esti-
mate the prevalence of various mental health
factors in the population.
The Health Examination Survey is part of the
U.S. National Health Survey, authorized by
Congress in 1956 to collect information about the
Nation's health. Data are collected by direct
examinations of individual persons chosen to
constitute a probability sample of some segment of
the total population of the United States.
The first sample represented the adult popu-
lation aged 18 through 79 years. Since the study
was primarily concerned with the prevalence of
chronic physical disease, the examination did not
include psychological measurements. The second
sample consisted of noninstitutionalized children
ages 6 through 11, among whom the incidence of
chronic disease is insignificant. The important
health factors in this group are found in those
functions which result in growth and development.
These, then, were the factors to be studied.
Many authorities in the field of growth and
development contributed to the planning phase of
the Survey. Although they generally agreed on what
factors should be measured, they could not agree
on how the measurements should be obtained. They
did conclude that present instruments were inade-
quate but that these were the only tools available.
The tests which are discussed in the following
report were those selected for use by the Health
Examination Survey. In choosing these instruments,
primary consideration was given to those
which best met the following criteria:
1. They were capable of yielding data in
those areas considered most important
to the study of growth and development.
2. They would produce data in a form which
would be meaningful to the individuals
responsible for children's health.
3. They were suitable for use in a survey
operation where examiners change frequently,
where only 1 hour is available
to conduct the examination, and where
examining conditions are less than optimal.
The selected instruments are not ideal, but
they are felt to be the best compromise offered
by the present state of the art of measurement.
How much was compromised? What can be
said about the growth and development of chil-
dren from the data obtained by the use of these
instruments?
Through a contractual arrangement with Dr.
Sells, the first step has been taken in answering
these questions.
Lois R. Chatham, Ph.D.
Psychological Advisor
Division of Health Examination Statistics
CONTENTS
Foreword
Introduction
I. The Wechsler Intelligence Scale for Children, the Vocabulary and Block Design Subtests
    Description of the WISC
    Research on Short Forms of the WISC
    Reliability and Stability
    Validity
    Factors Affecting WISC Scores
    Anxiety
    Sex Differences
    Qualitative Differences by Level
    Developmental Factors
    Special Groups
    Reading Disability
    Auditory Disability
    Visually Handicapped
    Stutterers
    Cerebral Palsy
    Organic Impairment of Central Nervous System
    Gifted
    Mentally Retarded and Defective
    Bilingual
    Negro
    Socioeconomic Status
    Comparison of WISC and Stanford-Binet IQ's
    Summary and Conclusions
    Bibliography
II. The Wide Range Achievement Test, the Oral Reading and Arithmetic Subtests
    Evaluative Criteria
    1946 Edition of WRAT
    Research on the 1946 WRAT
    Reading
    Arithmetic
    1963 Edition of WRAT
    Validity and Norms
    Comparison of the Two Editions
    Validation of 1963 Edition
    Validity Variances
    Validity Data in 1963 Manual
    Grade Equivalents
    Standard Scores
    Percentiles
    Summary and Conclusions
    Bibliography
III. The Goodenough Draw-A-Man Test
    Background and Development
    Rationale
    Point-Scoring System
    Standardization
    Perspective
    Evaluation of Intelligence by Human Figure Drawings
    Effective Range
    Relation to Artistic Ability
    Perturbing Factors
    [entry illegible in source]
    Sex Differences
    Personality Study by Children's Drawings
    Research on the Goodenough Test
    Reliability Studies
    Correlations With Other Tests
    The Harris Revision of the Goodenough Test
    Comparison of Goodenough and Goodenough-Harris Scores
    Recommendation
    Summary and Conclusions
    Bibliography
IV. The Thematic Apperception Test
    Review of the Literature on the TAT
    Overview
    Research Demonstrating Developmental Factors
    Other Relevant Research
    Prospects for Developing an Objective Scoring Key for the Survey's TAT
    Bibliography
V. Total Psychological Test Battery
VI. Cross-Disciplinary Analyses
    Data Available
    Analyses Indicated
    Growth Indexes
    Other Factors Related to Test Scores
Acknowledgments
Glossary of Abbreviations
IN THIS REPORT the psychological procedures used in the Health
Examination Survey conducted between June 1963 and December 1965 for
children ages 6 through 11 are critically evaluated.
In his analysis, the author combines his own professional competence
with the information obtained in an extensive survey of literature
pertaining to the four procedures used: the Wechsler Intelligence Scale for
Children, the Wide Range Achievement Test, a modification of the
Draw-A-Man Test, and the Thematic Apperception Test. The result is an
evaluation of the instruments which is made in terms of their validity,
reliability, and applicability for use in the Health Examination Survey.
Finally, the author points out the strengths and weaknesses of each
procedure and makes recommendations concerning the eventual use of data
obtained in the Survey.
SYMBOLS
Data not available: ---
Category not applicable: ...
Quantity zero: -
Quantity more than 0 but less than 0.05: 0.0
Figure does not meet standards of reliability or precision: *
EVALUATION OF
PSYCHOLOGICAL MEASURES
USED IN THE HEALTH EXAMINATION SURVEY
OF CHILDREN AGES 6-11
S. B. Sells, Ph.D., Institute of Behavioral Research, Texas Christian University
INTRODUCTION
This report is the outcome of a contract with
the National Center for Health Statistics. The
purpose of the contract was to obtain an objective
critical evaluation of the psychological procedures
chosen for use in the Health Examination Survey
of children ages 6 through 11. The objectives may
be summarized as follows:
1. To prepare a critical review concerning
the development and use of the psycholog-
ical procedures used in Cycle II based on
available literature and unpublished re-
ports (theses, dissertations, and others).
These measures include the Vocabulary
and Block Design subtests of the Wechsler
Intelligence Scale for Children, the Oral
Reading and Arithmetic subtests of the
Wide Range Achievement Test (1963 edi-
tion), the Draw-A-Man Test, and cards
1, 2, 5, 8BM, and 16 of the Thematic
Apperception Test.
2. To make recommendations concerning the
appropriate inferences which can be made
concerning individual growth and develop-
ment based on scores derived from the
test battery described above.
3. To recommend what research must be
done if the objectives of the Health Ex-
amination Survey are to be accomplished.
4. To make original recommendations con-
cerning the types of cross-disciplinary
analyses that can be performed on data
obtained in the Health Examination Survey
of children.
An extensive survey of the literature was
made, but only the most relevant material was
included in this final report. Literature was con-
sidered relevant if it was either empirical re-
search or a review which included or made ref-
erence to the tests used in the Survey. Empirical
studies which were conducted on samples of U.S.
children ages 6 to 12 years were given preference.
A few important reports which did not meet these
criteria were included because of their method-
ological features or their significant content. Un-
published master's theses and dissertations were
obtained, as extensively as possible, by inter-
library loan. Information was sought and, with
some success, obtained from the publishers and
selected users of the reviewed tests.
One empirical study was carried out under
this contract. Its results are included in the sec-
tion on the Goodenough Draw-A-Man Test. The
study was stimulated by a recent publication by
Dale B. Harris entitled Children's Drawings as
Measures of Intellectual Maturity. This text is
basically a revision of the 1926 book by Florence
L. Goodenough entitled Measurement of Intelli-
gence by Drawings. In his publication, Harris in-
cludes new point-score scales and modernized
norms for scoring drawings of the human figure.
The text of this report is divided into six
sections. Sections I-IV present critical discus-
sions of various tests used by the Health Examina-
tion Survey. The tests are discussed in the follow-
ing order:
I. The Wechsler Intelligence Scale for
Children, Vocabulary and Block Design
subtests
II. The Wide Range Achievement Test, the
Oral Reading and Arithmetic subtests
III. The Goodenough Draw-A-Man Test
IV. The Thematic Apperception Test
Section V briefly discusses some of the issues
which arise when these tests are used as a battery.
Finally, section VI considers the cross-disciplinary
relationships between "psychological"
and "nonpsychological" measures.
Each research study or review referred to
in this report is identified by a number placed in
parentheses immediately following the cited ref-
erence. Bibliographies following each of the first
four sections of the report contain all references
cited in the respective sections.
Research studies which were abstracted as
part of the literature-review portion of this con-
tract are also included in the four bibliographies.
The actual abstracts of the reviewed literature
appear as appendixes to the report. For conven-
ience, numbers which identify the abstracts cor-
respond to the number given when the reference
is cited in the text of the report.
These abstracts have been deposited as docu-
ment number 8486 with the Library of Congress.
A copy may be secured by sending the document
number and $28.80 for photoprints or $3.20 for
35mm. microfilm to the American Documenta-
tion Institute Auxiliary Publication Project, Pho-
toduplication Service, Library of Congress, Wash-
ington, D.C., 20541. Advance payment is required.
Checks or money orders should be made payable
to Chief, Photoduplication Service, Library of
Congress.
I. THE WECHSLER INTELLIGENCE SCALE FOR CHILDREN,
THE VOCABULARY AND BLOCK DESIGN SUBTESTS
This section reviews the measurement char-
acteristics of the Vocabulary (Voc.) and Block
Design (BD) subtests of the Wechsler Intelli-
gence Scale for Children (WISC), both as a sepa-
rate unit and as a WISC short form. It also reviews
behavioral correlates of intelligence as reported
in the literature and critically evaluates the appro-
priateness of their use in Cycle II of the Health
Examination Survey.
The selection of the Vocabulary and Block
Design subtests for use as part of the psycho-
logical test battery for Cycle II, in effect, treats
these subtests as a short form of the WISC. In
addition to providing an estimate of the WISC
score, the two subtests may be interpreted sepa-
rately, in combination with other test scores, or
in conjunction with other Survey data. Combina-
tions of these measures with other data obtained
in the Survey are discussed in section II.
DESCRIPTION OF THE WISC
The WISC, which was published in 1949,
extended the well-known Wechsler intelligence
scales for adolescents and adults into the child-
hood range of 5 to 15 years. During the decade
and a half since its publication the WISC has
been the subject of extensive investigation and
has achieved wide school and clinic use where
individual measures of intelligence are desired.
The WISC is patterned after the Wechsler-
Bellevue Intelligence Scale both in the structure
of the subtests and the scales and in the use of
the deviation intelligence quotient. The test con-
sists of 12 subtests—6 Verbal and 6 Perform-
ance—of which 2 (Digit Span of the Verbal Scale
and Mazes of the Performance Scale) are supple-
mentary and not routinely used. The 5 subtests
comprising the Verbal Scale are as follows:
Information, Comprehension, Arithmetic, Simi-
larities, and Vocabulary. The 5 Performance Scale
subtests are Picture Completion, Picture Ar-
rangement, Block Design, Object Assembly, and
Coding (Digit Symbols).
An important innovation in the Wechsler in-
telligence tests is the use of the deviation IQ.
This device supplants the mental age concept and
evaluates the performance of each individual on
the basis of the distribution of scores of a representative
sample of his own chronological age. In
the standardization of the WISC, Wechsler kept
the standard deviation of intelligence quotients
constant from year to year, with the result that
"a child's obtained IQ does not vary unless his
actual test performance as compared with his
peers varies."
Raw scores for each subtest are converted
to scaled scores which have a mean of 10 and
standard deviation of 3 for each age level. The
sum of five scaled scores for the Verbal Series
constitutes the Verbal Scale score (VS), and simi-
larly the Performance Scale score (PS) is the sum
of the five Performance Series scaled scores. The
Full Scale score (FS) is the sum of the Verbal
Scale and the Performance Scale. Deviation in-
telligence quotients have been derived by a sim-
ilar conversion process for VS, PS, and FS. The
IQ scales at each age have a mean of 100 and
standard deviation of 15.
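The arithmetic behind a deviation score can be illustrated with a short sketch. It assumes a simple linear standardization within an age group, which is not the WISC's published procedure (the manual supplies conversion tables); the function name, the age-group means and standard deviations, and the example scores are all hypothetical.

```python
def standardize(score: float, group_mean: float, group_sd: float,
                target_mean: float, target_sd: float) -> float:
    """Re-express a score relative to its age group on a scale with the
    given target mean and standard deviation (linear transformation)."""
    if group_sd <= 0:
        raise ValueError("group_sd must be positive")
    z = (score - group_mean) / group_sd
    return target_mean + target_sd * z


# Hypothetical subtest: raw score 36 in an age group with mean 28, SD 8.
scaled = standardize(36.0, 28.0, 8.0, target_mean=10.0, target_sd=3.0)

# Hypothetical Full Scale sum of scaled scores 120 in an age group whose
# sums have mean 100 and SD 20, placed on the IQ scale (mean 100, SD 15).
iq = standardize(120.0, 100.0, 20.0, target_mean=100.0, target_sd=15.0)

print(f"scaled score = {scaled:.1f}, deviation IQ = {iq:.1f}")
```

Because each child is compared only with the distribution for his own age, the same relative standing yields the same deviation IQ at every age, which is the property the quoted passage describes.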
The standardization of the WISC is reported
in Wechsler's manual (101), and the standardiza-
tion sample is summarized in terms of age, sex,
geographic representation, urban-rural compo-
sition, and composition by socioeconomic status
(reflected by occupation of fathers). The WISC
was standardized on a total sample of 2,200 cases,
including 100 white boys and 100 white girls at
each age from 5 to 15 years. The proportion of
urban children in the sample was slightly higher
than in comparable United States population sta-
tistics.
Reviewers have commented very favorably
on the WISC as a test of superior quality (102-
104), but, as in all areas of mental measurement,
imperfections have been noted and users have
attempted to employ it for purposes for which it
was not specifically designed. In general, the
deviation IQ has been accepted as an improvement
over the IQ computed by dividing mental age by
chronological age. Except for a slight bias toward
urban and small-town areas (as opposed to rural
areas), the sampling basis of the WISC for a native
white population has been regarded as good.
Maxwell (106) and Wilson (139) have
criticized the linearity of the transformation of
raw scores to scaled scores, which may be a
problem when sampling extreme cases and widely
varying regional, ethnic, and linguistic groups.
Hite (112) reported that the WISC lacks items of
middle-range difficulty at all age levels and is too
difficult for young children, particularly those in
the age range 5 to 6 years. In the studies reviewed,
WISC Full Scale IQ's have indeed tended to be
lower than comparable Stanford-Binet IQ's. This
is especially true at the lower age levels. McCand-
less (103) noted that girls tend to test lower than
boys on the WISC, but support for this generali-
zation is equivocal in the present review.
In evaluating the utility of the Vocabulary and
Block Design short form of the WISC for the Survey,
it is appropriate to consider shortcomings of these
tests in relation to alternatives that might have
been considered—given the constraints of testing
time available in the Survey schedule and the
general problems of a national survey. It may be
noted that although the WISC norms are inappro-
priate in varying degrees for Negro, bilingual
and foreign-born, illiterate, retarded, defective,
rural, and other special groups for which the test
was not designed, there is no adequate measure
that can be applied to all. On the other hand,
because of the extensive research on the WISC,
reported below, it may be possible to estimate
errors in the Vocabulary and Block Design sub-
tests and in the scores derived from them for
various components of the Survey sample. In ad-
dition, relationships of these variables to the
Goodenough Draw-A-Man Test offer further op-
portunities for compensatory analysis.
RESEARCH ON SHORT FORMS
OF THE WISC
Several investigators have combined two or
more subtests in order to develop an efficient
short form of the WISC that correlates well with
the Full Scale and produces comparable means
and standard deviations (175-179, 231, and 235).
Of these, only one article, by Simpson and Bridges
(177), reported favorable results with the combi-
nation of Vocabulary and Block Design. They used
a sample of 120 children over the age range of
65 to 192 months.
Finley and Thompson (231) developed for a
sample of 309 mentally retarded persons a short
form with five subtests, including Block Design,
which correlated 0.89 with FS IQ. Significantly,
their report included correlations of 0.55 and
0.45, respectively, for Voc. and BD with FS IQ,
while the correlation of Voc. and BD was only 0.1.
Further, estimation of mean FS IQ by proration of
the sum of Voc. and BD, as reported by these
authors, approximated the actual FS IQ quite
closely.
Schwartz and Levitt (235) also reported a
short form of the WISC for educable retarded chil-
dren, consisting of six subtests including Voc. and
BD which correlated 0.95 with FS IQ. However,
their best combination of five subtests, which reduced
the correlation to 0.92, eliminated Block
Design. Osborne and Allen (239), on the other
hand, cross-validated two triads of WISC subtests
including Voc. and BD, one with Picture Com-
pletion and one with Picture Arrangement, using
samples of 240 (initial) and 50 (validation) retarded
children aged 7 to 14 years, and obtained correlations
with FS IQ of 0.88 to 0.90.
At the same time, Hite (112) has confirmed
Wechsler's data (101) indicating that Vocabulary
and Block Design are the most reliable subtests
in the WISC battery. Hagen (109) and Cohen (111)
in the United States and Gault (110) in Australia
have reported that both of these subtests are
highly loaded on the general factor obtained in
factor analysis of the WISC over the entire age
range of 5 to 15 years. Cohen found that Vocabu-
lary was the strongest single measure of the
general factor. Nevertheless, a problem exists in
determining the optimal combination of these sub-
tests to estimate the FS IQ and various parameters
related to the Survey objectives.
Simpson and Bridges (177) estimated the FS
IQ on the basis of a simple sum of the scaled
scores of Voc. and BD and reported a conversion
table for this purpose. Inasmuch as their results
have not been replicated, so far as is known,
cross-validation on a substantial sample should
be considered before this table is adopted. The
importance of this recommendation is illustrated
by some computations based on the Finley and
Thompson data (231). The sum of mean Voc. and
BD scaled scores, 11, multiplied by 5 to prorate
the FS score, gives a WISC Full Scale IQ of 70
(as compared with the actual mean of 68), while
the score of 11 in the Simpson and Bridges tables
yields an FS IQ of 77. Further, in view of Max-
well's criticism of the transformation of raw
scores to scaled scores (106), it may be advisa-
ble also to explore empirically the alternative
of predicting the FS IQ from raw scores.
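The proration arithmetic in the comparison above can be written out as a short sketch. The function name is hypothetical; the figures are those quoted from the Finley and Thompson data (231) and the Simpson and Bridges table (177). Converting the prorated Full Scale score to an IQ still requires the table in the WISC manual (101), so the IQ values are entered directly rather than computed.

```python
def prorate_full_scale(short_form_sum, n_short=2, n_full=10):
    """Prorate a sum of scaled scores from n_short subtests to n_full subtests."""
    return short_form_sum * n_full / n_short


# Sum of mean Voc. and BD scaled scores in the Finley and Thompson data (231).
voc_bd_sum = 11
fs_score = prorate_full_scale(voc_bd_sum)   # 11 multiplied by 5

# IQ values quoted in the text; they are table lookups, not computed here.
actual_mean_iq = 68
estimates = {"proration": 70, "Simpson and Bridges table": 77}

print("Prorated Full Scale score:", fs_score)
for method, iq in estimates.items():
    print(f"{method}: estimated IQ {iq}, error {iq - actual_mean_iq:+d}")
```

The multiplier of 5 follows from the Full Scale being the sum of ten subtests (five Verbal and five Performance), of which the short form uses two.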
In reviewing the WISC literature every effort
was made to focus on the Voc. and BD subtests,
and considerable data have been assembled.
Nevertheless, the major portion of the information
referred to in this report is based on the full test,
and assumptions of equivalence of short form
scores to the Full Scale must be made in gener-
alizing the results reported. As indicated above,
this assumption is not entirely inappropriate, but
caution is certainly indicated.
RELIABILITY AND STABILITY
Wechsler's manual (101, p. 13) reported cor-
rected split-half reliability coefficients of 0.77,
0.91, and 0.90, respectively, for Vocabulary, and
0.84, 0.87, and 0.88, respectively, for Block De-
sign for samples of 200 children at each of the
following age levels: 7 1/2, 10 1/2, and 13 1/2
years. The corresponding FS reliabilities were
0.92, 0.95, and 0.94, respectively. As noted above,
these two subtests were the most reliable of all the
WISC subtests. These results for Voc. and BD have
been confirmed by Hite (112) for children in the
age range of 5 to 7 years.
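The "corrected" split-half coefficients cited above are conventionally obtained with the Spearman-Brown formula (the method named for Wechsler's data in table 1), which steps the correlation between two half-tests up to the reliability of the full-length test. A minimal sketch follows; the function names are illustrative, and the half-test correlation in the example is back-calculated from the corrected value rather than reported in the source.

```python
def spearman_brown(r_part, length_factor=2.0):
    """Step a part-test correlation up to the reliability of a test
    length_factor times as long (2.0 for the split-half case)."""
    return length_factor * r_part / (1.0 + (length_factor - 1.0) * r_part)


def half_test_correlation(r_full):
    """Invert the split-half correction: r_half = r_full / (2 - r_full)."""
    return r_full / (2.0 - r_full)


# Vocabulary at age 7 1/2 has a corrected coefficient of 0.77 (101, p. 13).
r_half = half_test_correlation(0.77)
print(round(r_half, 3))                   # implied uncorrected half-test r
print(round(spearman_brown(r_half), 2))   # recovers the corrected value
```

The correction always raises the coefficient, so corrected split-half values are not directly comparable with uncorrected half-test correlations or with retest correlations.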
Stability of the WISC on retest has also been
found satisfactory by Gehman and Matyas (113)
over a 4-year period (age 11 years at initial test),
by Reger (115), who tested a sample at ages 10,
11, and 12 years, and by Whatley and Plant (116),
who used a 17-month interval. In these studies,
retest correlations were generally of the order of
the corrected split-half reliabilities. These and
related data are summarized in table 1.
VALIDITY
Despite the fact that Wechsler developed the
WISC in protest against the measurement concept
of mental age (and the IQ based on it) implicit in
the Stanford-Binet test, and despite the additional
Table 1. Studies reporting reliability coefficients of the WISC

Investigator                       | Year | Subjectsᵃ                           | Age range    | Σ   | M   | F   | Voc. | BD   | VS   | PS   | FS   | Type of coefficient
-----------------------------------|------|-------------------------------------|--------------|-----|-----|-----|------|------|------|------|------|--------------------
Throne, Schulman, and Kaspar (227) | 1962 | Retarded                            | 11-0 - 14-11 | 39  | 39  | -   | 0.79 | 0.82 | 0.92 | 0.89 | 0.95 | Test-retest
Armstrong (175)                    | 1955 | Guidance clinic                     | 5-0 - 14-11  | 200 | 100 | 100 | 0.94 | N.R. | N.R. | N.R. | N.R. | Split-half, [illegible]
                                   |      |                                     | 5-7 years    | 20  | 20  | -   | 0.92 | N.R. | N.R. | N.R. | N.R. |
                                   |      |                                     | 5-7 years    | 20  | -   | 20  | 0.90 | N.R. | N.R. | N.R. | N.R. |
                                   |      |                                     | 7-9 years    | 20  | 20  | -   | 0.93 | N.R. | N.R. | N.R. | N.R. |
                                   |      |                                     | 7-9 years    | 20  | -   | 20  | 0.91 | N.R. | N.R. | N.R. | N.R. |
                                   |      |                                     | 9-11 years   | 20  | 20  | -   | 0.87 | N.R. | N.R. | N.R. | N.R. |
                                   |      |                                     | 9-11 years   | 20  | -   | 20  | 0.89 | N.R. | N.R. | N.R. | N.R. |
                                   |      |                                     | 11-13 years  | 20  | 20  | -   | 0.88 | N.R. | N.R. | N.R. | N.R. |
                                   |      |                                     | 11-13 years  | 20  | -   | 20  | 0.88 | N.R. | N.R. | N.R. | N.R. |
                                   |      |                                     | 13-15 years  | 20  | 20  | -   | 0.90 | N.R. | N.R. | N.R. | N.R. |
                                   |      |                                     | 13-15 years  | 20  | -   | 20  | 0.96 | N.R. | N.R. | N.R. | N.R. |
Gehman and Matyas (113)            | 1956 | Normals                             | 11-1         | 60  | 29  | 31  | N.R. | N.R. | 0.77 | 0.74 | 0.77 | Test-retestᵇ
Caldwell (252)                     | 1954 | Normals (Negro)                     | 9-7 - 10-6   | 60  | --- | --- | 0.70 | 0.89 | 0.82 | 0.90 | 0.84 | Split-half
Jones (154)                        | 1962 | Normals (England)                   | ---          | 240 | 120 | 120 | ---  | ---  | ---  | ---  | ---  | Split-half, [illegible]
                                   |      |                                     | 7-6 - 8-5    | 80  | 40  | 40  | 0.70 | 0.74 | 0.86 | 0.80 | 0.89 |
                                   |      |                                     | 8-6 - 9-5    | 80  | 40  | 40  | 0.70 | 0.68 | 0.87 | 0.81 | 0.90 |
                                   |      |                                     | 9-6 - 10-5   | 80  | 40  | 40  | 0.70 | 0.75 | 0.90 | 0.85 | 0.94 |
Wechsler (101)                     | 1949 | Normals (WISC standardization data) | ---          | 600 | 300 | 300 | ---  | ---  | ---  | ---  | ---  | Split-half, Spearman-Brown
                                   |      |                                     | 7-6          | 200 | 100 | 100 | 0.77 | 0.84 | 0.88 | 0.86 | 0.92 |
                                   |      |                                     | 10-6         | 200 | 100 | 100 | 0.91 | 0.87 | 0.96 | 0.89 | 0.95 |
                                   |      |                                     | 13-6         | 200 | 100 | 100 | 0.90 | 0.88 | 0.96 | 0.90 | 0.94 |
Hite (112)                         | 1953 | Normals                             | ---          | 200 | 117 | 83  | ---  | ---  | ---  | ---  | ---  | Split-half
                                   |      |                                     | 5-6          | 50  | 34  | 16  | 0.71 | 0.77 | 0.77 | 0.81 | 0.90 |
                                   |      |                                     | 6-6          | 100 | 56  | 44  | 0.72 | 0.84 | 0.89 | 0.89 | 0.91 |
                                   |      |                                     | 7-6          | 50  | 27  | 23  | 0.76 | 0.89 | 0.89 | 0.86 | 0.94 |
Hagen (109)ᶜ                       | 1952 | Normals (WISC standardization data) | ---          | 400 | 200 | 200 | ---  | ---  | ---  | ---  | ---  | Split-half, Spearman-Brown
                                   |      |                                     | 5 years      | 200 | 100 | 100 | 0.68 | 0.77 | N.R. | N.R. | N.R. |
                                   |      |                                     | 15 years     | 200 | 100 | 100 | 0.91 | 0.89 | N.R. | N.R. | N.R. |

ᵃDesignations of subjects are always white Americans unless otherwise specified.
ᵇTime between testings was 49 months.
ᶜData are from the WISC standardization sample, but were not reported in the WISC manual.
NOTES: All correlation coefficients are Pearson Product-Moment unless otherwise specified.
Σ = total population; M = male; F = female; Voc. = Vocabulary; BD = Block Design; VS = Verbal Scale; PS = Performance Scale; FS = Full Scale; N.R. = not reported.
Table 2. Studies reporting correlation between the WISC and Stanford-Binetᵃ
Investigator | Year | Subjectsᵇ | Age range | Number: Σ, M, F | Correlation: Voc., BD, VS, PS, FS
Nale (216) -=-==-==-==--mceeeun 1951 | Mental defectives---=--=-==== ~8-10 - 15-11 |104 54 50 | N.R, | N.R, | N.R, |N.R. {0,91
Stacey and Levin (228) ------- 1951 | Mental defectives--------=-= 7-2 - 15-11 70 |---| --- | N.R., | N.R. | N.R. |N.R. [0.68
Sloan and Schneider (217)----| 1951 | Mental defectives----------- N.R. 40 20 20 | N.R. |N.R. [0.75 [0.64 | 0.76
OLE CiBE) man mmm meri mt 1950 | Retarded-n=mm=mmmm=mmmmmmn- = N.R. 10 |---| --- | N.R. | N.R. [0.81 [0.49 0.71
Sharp (229) --=-cmemmcmmmeem am 1957 | Slow learners-=-=--========- 8-0 - 16-5 50 |---| =---| N.R. | N.R., | 0.62 | 0.67 | 0.69
Post (198) --===ccemmcmme mee 1952 | Stutterers=--======-eeeeee=— 5-5 - 15-10 30 27 3 | N.R. [N.R. [0.80 [0.37] 0.78
Kent and Davis (207) ----=-----| 1957 | Normals and clinic referrals
(England) ~==-======mmmmumn— 8-12 years [213 |133
NOTA LE mw mis mim wis mim wien 59
Delinquents-==============| comm maa 55 | 48
Psychiatric outpatients---|-==m=mwmmen=— 40 | 26
Muhr (119) ==-mmecmm meee 1952 | Institutional (orphans and
various problems) --=--=-=-==- 5-0 - 6-11 42 | === | === | N.R. | N.R. | 0.46 | 0.52 | 0.62
5 years 21 |---| ==-| N.R. | N.R. | 0.65 | 0.66 | 0.74
6 years 21 |---| =---| N.R. | N.R, | 0.44 | 0.39] 0.49
Davidson (162) ===-=-m-ceeeenx 1954 | Normals-====-=cmmocmmmonoann 14-0 - 14-3 30 |---| =--| N.R. | N.R. | 0.79 | 0.71] 0.83
Kardos (l6l)-==-==--m-mmememae-n 1954 | Normals====-=====-moommmonnn 11-11 - 13-0 [100 | 50 | 50 | N.R. | N.R. | 0.87 | 0.82] 0.89
MAEYaE? (L184) meme mmm LTB HOT mmr em rem mmr mm mse fi im in G01 20] BE [emmanuel mmm foe sim
Grade S5-==-mmmmmmmm——————— 11-1 (mean) 60 29 31| N.R. | N.R. [0.78 | 0.46] 0.73
Grade 9 (retest)------=--- 15-2 (mean) 60 29 81 | N.R. | B.R. [0.76 | 0.64 | 0.77
Raleigh (191) -===-==mmmmeo-mn 1952 | Normals=-=========commmmeonn— 10-8 - 14-9 100 52 48 NR. { N.R. | 0.77 | 0.59 0.80
Schwitzgoebel (189)-=-=====--= 1952 | Normals=-======mmmmmmomaan~ 9-11 - 13-8 |100 | 52 | 48 | N.R. | N.R. [0.78 |0.61| 0.84
Clarke (160) --===meemmmmemmae 1950 | Normals=====-==mcommmemanaaou 9-7 - 12-9 84 39 45 | N.R. | N.R. | 0.83]0.57| 0.79
Frandsen and Higginson (159)-| 1951 | Normals-========-ccuomoaoon- 9-1 - 10-3 54 |---| ---| N.R, [| N.R. | 0.71 | 0.63| 0.80
Reidy (171) ===memmmmmmmmmeeem 1952 | Normals==-=-===mmoooccmmnanan 9-0 - 11-11 | 60 | 30 | 30| N.R. | N.R. | 0.87 0.69] 0.86
Jones (154) ~=-==eemmmmmanaaan 1962 | Normals (England) ------=---- 8-10 years |240 |120 | 120 | N.R. | N.R. | 0.84] 0.59 0.81
8 years 40 40 -| N.R. | N.R. | 0.77 | 0.48] 0.72
8 years 40 -| 40 | N.R. | N.R. | 0.79 |0.46| 0.76
z 8 years 80 40 40 | N.R. | N.R. | 0.78 {0.47 | 0.74
9 years 40 | 40 -| N.R. | N.R. | 0.89 | 0.65] 0.90
9 years 40 -| 40| N.R. | N.R. | 0.78] 0.58] 0.75
Zz 9 years 80 40 40 | N.R. | N.R. | 0.84|0.61| 0.84
10 years 40 | 40 -| N.R. | N.R. | 0.86 | 0.64] 0.83
10 years 40 -| 40| N.R. | NR. | 0.90 | 0.67 | 0.86
2 10 years 80 | 40 | 40| N.R. | N.R.| 0.88] 0.66] 0.85
Arnold and Wagner (158) ------ 1955 | Normalg§====m=memmmmeememmaam = 8-9 years 50 | === | ---| N.R. | N.R. | 0.85] 0.75| 0.88
Wagner (156) -=-=-=--c=comeunn 1951 | Normals-====omemomommmaaaoan 8-9 years 50 |---| ---| N.R. [ N.R. | 0.77 | 0.87 | 0.81
Scott (155) -=====m=mmmmmmanan 1950 | Normals-=-==-=mcmcmmmmoanonn 7-7 - 11-1 30 |---| ---|0.63|0.60| 0.86 | 0.86] 0.92
Beeman (153) -=<===c-memncmcoan 1960 | Normals====--==c-momomemaann 7-2 - 11-9 36 |---| === | N.R. | N.R. | 0.64 | 0.42] 0.67
Harlow, Price, Tatham, and 1957 | Normals§==-=m-mmm comme | mmmmcmm meee o 60 | === | === |mmmmmm mmm meee meen
Dovdidson (148). 6-6 - 6-7 30 |---| ---| N.R. | N.R. | 0.64 | 0.61] 0.64
10-0 - 10-1 30 |---|] --~-| N.R. | N.R. [ 0.88 0.52] 0.83
Cohen and Collier (124)------ 1952 | Normals-=-=-=comcmcmcmmananq 6-5 - 8-9 51 |---| ---| N.R. | N.R. | 0.82 0.80] 0.85
Tatham (152) =-=-=-=-ememmaomn 1952 | Normals========momommmmeeann 6-5 - 6-7 30 |---| =--| N.R. [ N.R. | 0.64 | 0.51| 0.64
Mussen, Dean, and Rosenberg | 1952| Normals=-=--===mmm-memcmmenan 6-0 - 13-1 39 |---| ---| N.R. | N.R. | 0.83] 0.72] 0.85
Qalz),
Krugman, Justman, Wright-
stone, and Krugman (144) ---= | 1951 | NOTMALE = mmm mmm mmm oom cm om mm ct mf on 222 | === | mmm |e mp mmm mf mee meee
6 years 38 |---| ---| N.R. |N.R. [0.73 |0.74 [0.82
7 years 43 | === | === | N,R. |N.R. | 0.64 |0.49 | 0.73
8 years 44 | === |---| N.R, | N.R. {0.78 |0.57 | 0.82
9 years 31 |=---|=---| N.R., | N.R., | 0.83 ]0.79 0.87
10 years 29 | === | === | N.R. | N,R., [0.88 | 0.54 | 0.86
11 years 37 |---| ---| N.R. [N.R. | 0.69 [0.53 0.76
Pastovic® (121) ====memmmmmemm 1951 | NOTMAlS === mmm moon mmm mmm mm mmm mm mm mmm me oo] 100 | === | === |m=mmmmf mmm mem meee ene
- 50 | === | === | N.R. | N.R. [0.63 0.57 [0.71
- 50 | === | =--| N.R. | N.R. {0.82 (0.71 |0.88
Winpenny (105) ---=--=====c-z= 1951 | Normals=====mmmmommm mm mmm ee meme mmm =] 185 | === | === |-===== fromm mem me mee
Kindergarten--------==-w-=- 5-4 - 5-8 50 | === | =--| N.R. | N.R. | N.R. | N.R. [0.71
Grade 2--=-~cmmmmmm mmm 7-4 - 7-8 50 | === | ---| N.R, | NR. | N,R, [ N,R., | 0.88
‘Grade S5----=--=---mmmm——— 9-7 - 12-9 85 | === | ---| N.R, | N.R. | N.R. | N.R. [0.79
Dunsdon and Roberts (170) =---- | 1955 | Normals (England) -----=-=-===- 5-0 = 14-11|1,947| 980 | 967 |-=====|-mmmmmmmmm mmf mm mmm mm mm
980 | 980 -| N.R. | N.R. | N.R. | N.R. | 0.82
967 -| 967 | N.R. | NR, | N.R. | N.R. | 0.77
loruszak (146) ---=---====-=m= 1954 | Normals-=====mmmmommmm mm 5-14 years 80 | 40 | 40| N.R. | N.R. | 0.87 |0.78 | 0.90
5-14 years 40 | 40 -] N.R. | N.R., | 9.89 | 0,720.93
5-14 years 40 -] 40({ NR, | N.R. 10,86 [0.71] 0.93
olland (149) ----=-==--=-ocm- 1953 | Normals-==~-===-=mc=mmmmmmmm 5-13 years 52 |---| ---| N.,R, [| N.,R., [0.88 |0.73|0.87
eider, Noller, and Schraumm
(150) ==mmmmm mm mmm meme 1951 | Normals======== == == m=m—————— 5-0 - 11-11 | 106 | --- | ---| N.,R. | N.R. | 0.89 | 0.77 | 0.89
5-0 - 7-11 44 | === | ---| N.R. | N.R. [0.82 (0.79 0.90
8-0 - 11-11 62 | --- | ---| N.R. [N.R. [0.92 [0.78] 0.90
wureth, Muhr, and Weisgerber
(118) ~==mmmm mmm mmm meme 1952 | Normals========mmcm momen ———— 5-6 years 100 | ---| ---| 0.51 {0.61 |0.75|0.71 0.81
5 years 50 | --- |---| 0.42|0.65|0.79|0.73|0.84
6 years 50 | === |---| 0.65 |0.55| 0.71 {0.71 | 0.79
Rottersman (151) -----c-emenen 1950 | Normals=m-mmmmmmmm mmm 6 years 50] 21; 29] NuR. {N.R. [0.71 | 0.49 | 0.7)
[riggs and Cartee (148)------ 1953 | Normals (S-B, Form M)------- 5 years 46 | =-- | === | N,R., | N.,R., | 0.58 | 0.48 | 0.61
Orr (188) =---m-ommmmmm meee 1950 | Normals===m=mm mmm mmm m mmm mele meme mmm 40 | mmm | mmm [mmm meee ee mee meee
Grade l---me-memmemmm mee] N.R. 15| ---| =--| N.R. | N.R. [ 0.63 | 0.62 | 0.77
Grade femmmmmmmeemmeanen=- N.R. 14 | === | === | N.R. | N.R. [0.64 [0.65 (0.67
Grade Je--mmmeeemmmonansned N.R. 11 | === | === | N.R. | N.R. [0.88 [0.66 [0.79
Stanley (157) --=-==-=cmmmcaz= 1955 | Normals (from Frandsen and
Higginson, 159, above)----- N.R. 50 | === | ===] N.R. | N.R. | N.R. | N.R. | 0.71
"
Schachter and Apgar (147)---- | 1958 | Normals, mixed sample--=----- N.R. 113 61 52| N.R. | N.R. [0.64 [0.48] 0.67
White-==mcmmmmmm mm mem mm meee meee 39 | === | ===
Negro-=======meommmeneon—— 66 | === | ---
Puerto Rican-~--====mnemmm- 6 |---| ---
Oriental---=-===c=-mm=m=n-q 2] ===] ---
Estes, Curtin, DeBurger, and
Denny (125) -==--=-==cemcmeean 1961 | Normals, Grades 1-8--=-----n--m-mcmnnmmnn 82 | 47 | 35|==--mmermmmmdemm meme meme
Form Le-==-=smmmmeemmmm meme N.R. 82 | 47 35| N.R. | N.R. | N.R., | N.R. | 0.80
Form L-M---mmmmccmmemmmnmn N.R. 82| 47 | 35| N.R. |N.R. | N.R. |N.R. | 0.74
ᵃUnless otherwise noted, Stanford-Binet, Form L.
ᵇDesignations of subjects are always white Americans unless otherwise specified.
ᶜRank difference correlation. ᵈAlso reported by Gehman and Matyas in 1956.
ᵉAlso reported by Pastovic and Guthrie in 1951. ᶠIntraclass correlation.
ᵍAverage time between S-B and WISC administration was 50.8 months.
NOTES: All correlation coefficients are Pearson Product-Moment unless otherwise specified.
Σ = total population; M = male; F = female; Voc. = Vocabulary; BD = Block Design; VS = Verbal Scale; PS = Performance Scale; FS = Full Scale; N.R. = not reported.
Table 3. Studies reporting correlation between the WISC and other measures
Investigator | Year | Test or criterion | Subjectsᵃ | Age range | Number: Σ, M, F | Correlation: Voc., BD, VS, PS, FS
Smith (126) ------- 1961 | Full Range Picture | Normals-----=-=-==-=-- 6-11 - 8-10 | 100 51| 49] N.R. | N.R. | 0.63| 0.42| 0,60
Vocabulary Test.
McBrearty (123)---| 1951 | Arthur Point Scale | Normals------------| 10-3 - 12-11) 52 | 22| 30| N.R.| N.R.| N.,R.| 0.65] 0.71
of Performance
Tests.
b
Cohen and Collier | 1952 | Arthur Point Scale | Normals----=-==-=-==-- 6-5 - 8-9 49 | === | ---| N.R. | N.R.| 0,77 0.81 0.80
(124). of Performance
Tests.
Winpenny (105)=----| 1951 | Arthur Point Scale | Normals----=-------- 9-7 - 12-9 | 85| ==-| --=| N.R.| N.R.| N.R.| N.R,| 0.70
of Performance
Tests.
Armstrong and Hauck | 1960 | Visual Motor Ge=- Nonorganic child 6-12 years 98 49 49 N.R.| N.R.|-0.22|-0.07(-0.23
(130). stalt Test. guidance popu- F
lation.
Winpenny (105) ----| 1951 | Bernreuter-Winpenny= Normals--=---=-=-==-=je======c=-cox|men=d==--q rm [mmm sn of ne wr of og af ce 4--m--
Kindergarten----- 5-4 - 5-8 50 | =--| ---| N.R.| N.R.| N,R.| N.R. | 0.92
Grade 2---------- 7-4 - 7-8 50 | =--| ---| N.R.| N.R.| N.R.| N.R.|[ 0.92
Grade 5-----==-=- 9-7 - 12-9 85| ---| =---| N.R.| N.R.| N,R.| N.R.| 0.97
Cooper (242)=-=---- 1958 | california Achieve- | Bilinguals N.R 51| --~| ===; N.R.| N.R.[ 0.80] 0.54] 0.77
ment Tests. (Guam), Grade 5.
Altus (122) -=====-= 1952 | California Test of | Normals, junior N.R. 55| =--| =---| N.R.| N.R.| N.R.,| N.R.| 0.81
Mental Maturity. high.
Altus (134) -====-== 1955 | California Test of | Retarded, elemen- N.R. 100 71| 29| ~=mm=tfmmmmmdmmmmel——- 4mm——
Mental Maturity tary school.
Language ---| ===| =--| N.R.| N.R.| 0.71] 0.57} 0.70
Non- language === | ===| ==-| N.R.| N.R.| 0.65| 0.67| 0.68
Total------==--=~4 -==| ===| ===| N.R.| N.R.| 0.76 0.68] 0.77
Cooper (242) ------ 1958 | California Test of | Bilinguals N.R. 51| ---| ---| N.R.| N.R.| 0.66| 0.68| 0.74
Mental Maturity. (Guam) , Grade 5.
Schwitzgoebel 1952°| California Test of | Normals--===m=m=m=m 9-11 - 13-8 | 100 | 52| 48| N.R.| N.R.| 0.55] 0.59 0.75
(189). Mental Maturity.
Barratt (138)----- 1956 | Columbia Mental NOEL Emm wns n 9-2 - 10-1 | 60| 26] 34/°0.45/°0.47%.56|%.48 0.61
Maturity Scale.
Warren and Collier | 1960 | Columbia Mental Retarded--==--=---- 9-30 years | 49 | ---| ---| N,R.| N.R.| N.R,| N.R, | 0.68
(224). Maturity Scale.
Thompson (193) ----| 1961 | Gates Advanced Normals---=-=-=====--| 6-4 - 8-0 105 62| 43] ---=- domo mdm mmm meme
Primary Reading
Tests.
Word Recognition=ee===mmemmmme cee. —————————————— === | ===| ===] N.R.| N.R.[ 0.58] 0.42 0.55
Paragraph Reading=---=-===-=-coemmmmaloe ccm emo] === | ===| ===] N.R.| N.R.|[ 0.55] 0.46 | 0.56
Composite Reading=-=======-c-ececmeeodecm ccm canna === | ===| ===] N.R.| N.R.|[ 0.57] 0.47 0.58
Warren and Collier | 1960 | Goodenough Intelli- | Retarded--=-======~- 9-30 years | 49 | ---| ---| N.R.| N.R.| N.R.| N.R.| 0.43
(224). gence Test.
Armstrong and 1960 | Goodenough Intelli- | Child guidance 6-12 years | 98 | 49 49) N.R.| N.R.| 0.37| 0.51 0.49
Hauck (130). gence Test. clinic,
Rottersman (151)--| 1950 | Goodenough Intelli- | Normals===========- 6 years 50 21) 29 N.R.| N.R.| 0.38] 0.43| 0.47
gence Test.
Kimbrell (136)----| 1960 | Grade placement=----- Mental defec- 10.5 - 15.8 | 62 | ---| ---1 N.R.| N.R.| N.R.| N.R. | 0.40
tives.
Smith (126) ------- 1961 | Wide Range Normalg==-========= 6-11 - 8-10 | 100 51| 49] N.R.| N.R.| 0.55] 0.47] 0.61
Achievement Test.
Delp (135)--=--=---- 1953 | Kent EGY Test=-=---=-- Normals===========-| 6-15 years 74 | ---| ---| N.,R.| N.R.| 0.60 0.55] 0.62
Cooper (242) ------ 1958 | Leiter Interna- Bilinguals N.R. 51 =--| =--| N.R.| N.R.| 0.73] 0.78] 0.83
tional Perform- (Guam) , Grade 5.
ance Scale.
Sharp (229) ------- 1957 | Leiter Interna- Slow learners--=---- 8-0 - 16-5| 50| ---| ---| N.R.| N.R.| 0.78] 0.80 0.83
tional Perform-
ance Scale.
Alper (221)------- 1958 | Leiter Interna- Mental defec- 7-2 - 17-3 30 15 15 | N.R. N.R, {0.4070,79 | 0,77
tional Perform- tives.
ance Scale.
Dunn and Brooks 1960 | Peabody Picture Retarded-=-=mmmm- N.R. 56 | --= |--- N.R. N.R. | N.R. | N.R. 0.61
2 . Vocabulary Test.
Kimbrell (136)---- | 1960 | Peabody Picture Mental defec- 10.5 - 15.8 62 | --- | --= N.R. N.R N.R N.R. 0.30
Vocabulary Test. tives.
Himelstein and 1962 | Peabody Picture Emotionally 6-2 - 14-8 48 | --- | --- | N.R. N.R. | 0.64] 0.52 ( 0.63
Herndon (137). Vocabulary Test. | disturbed.
McBrearty (123)--- | 1951 | Progressive Normals-=-======xq 10-3 - 12-11 52 22 30 N.R. N.R. [ 0.78 | 0.50 0.81
Achievement.
Tests.
Dunsdon and 1955 | Mill Hill Vocabu=- | Normals 5-0 - 14-11 [1947 | 980 | 967 |------ ft nf nf
Roberts (170). lary Scale. (England).
POI freemen A= irre rrr Ean aT 980 | 980 - | 0.83 | N.R. | N.R. | N.R. | N.R.
FOrm A--==mmmmod mmm eee meee meme mee 967 - {967 (0.81 | N.R. | N.R. | N.R, | N.R.
FOI Boss mmr Simm mind hi ——— 980 | 980 - | 0.85 | N.R. | N.R. | N.R. | N.R.
Form B-=--==-cmedmme mmm mcm memo mmm mmm mmm ema 967 - [967 | 0.82 | N.R. | N.R, | N.R. | N.R.
Brown, Hakes, and | 1959 | Raven Progressive | Retarded--=====-- N.R. N.R. | =-- [=== | N.R. | N.R. | N.R. | N.R. | 0.39-
Malpass (233). Matrices, 0.49
Malpass, Brown, 1960 | Raven Progressive | Retarded--------- 11-8 (mean) 104 | --- | === | N.R. N.R. | N.R. | N.R. | 90.51
and Hakes (140). Matrices.
C ic
Barratt (138)----- 1956 | Raven Progressive | Normals -----munno 9-2 - 10-1 60| 26 | 3¢ [70.56 [0.60 [0.69 0.70 | 0.75
Matrices.
Wilson (139)------ 1952 | Raven Progressive | British Columbia | 5-6 - 13-0 90 | --- | --- | N.R. N.R. | N.R. | N.R. | o=----
Matrices. Hospitalized |--==---c-een-a- JO [www [mm [mmm ede me dm (0.75
Americans 0, 27
Indians.
Hospitalized [-==-===m=ww=e- 30 | mem | [mmf nd i mmm] °0.83
whites, $0.42
High socioeco= |-====-cococnnn 30 | === | === [mmmmmedemmmmee meena °5.51
nomic whites. 0.49
Martin and Wiech- | 1954 | Coloured Progres- | Normals=-=-======- 9-0 - 10-0 100| 60 | 40 | 0.73 | 0.74|0.84(0.83| 0.91
ers (142), sive Matrices.
Stacey and Carle- 1955 | Coloured Progres-| Mental defec- 7-5 - 15-9 150 | =-- | --- | ,N.R, | N.R.,|0.54)|0.52] 0.55
ton (141). sive Matrices. tives. 0.36 [“0.41[0.51|°0.55| 0.62
Hite (112)======== 1953 | SRA Primary Mental| Normals-----=-=-=--| 5-6 years 50 | 34 | 16 |eecemodemmme meen eee mee
Abilities Test.
Verbal 0.38 | N.R. | N.R. N.R.
Perception-===-dmecomoc mmm mmm memo bm oo mmm mm——— 0.30 0.83 | N.R. | N.R. N.R.
Quantitative--=qeeecoc mcm ee meee mm homme mmm L me 0.35 0.53 | N.R. | N.R. N.R.
- 0.68 | N.R. | N.R N.R.
Stempel (143)----- 1953 | SRA Primary Mental| Superior 8-5 - 10-4 50 | === | === |==mmmmq-m--- EE
Abilities. intelligence.
SpPaCe======m mmm] mmm mem mm me mmm mmm hmmm mmm mmm N.R. N.R. [ 0.45] 0.34 | N.R.
NUmbET === = === =m mm mmm mm mmm mmm mle mmm mmf eeee Lee N.R. | N.R. [0.15] 0.38 | N.R.
RBG OTLLING rm wim we mm mr ti mm mm a sr oa N.R. N.R. | 0.63] 0.55 | N.R.
Perception=====-qemmmmeece meme meee mmm mbm mee mm mmm ee N.R. N.R. [0.18] 0.42 | N.R.
Verbal---==mn==+ i N.R. | N.R. | 0.68 0.40 | N.R.
IQ-mm=m mm mmm mm dem mmm mmm mmf eee he N.R. | N.R. | N.R. | N,R. | 0.68
Jones (154)-----=- | 1962 | Teacher ratings--- Normals 7-6 - 10-5 240 | 120 | 120 | N.R. N.R. ( 0.73] 0.57} 0.74
(England) . 8 years 80| 40 | 40 |N.R. |N.R. [0.70] 0.48 | 0.70
9 years 80 | 40 | 40 | N.R. | N.R. |0.71]0.59| 0.73
10 years 80 | 40 | 40 | N.R. N.R. | 0.76 | 0.62 | 0.76
Stark (163) ------- 1954 | The Drawing=- Normals=========~ 8-4 - 9-10 50 30 20 0.72 | 0.49 | N.R. | N.R. 0,79
Completion Test.
Bacon (127) ------- 1954 | Wechsler-Bellevue | Normals--===-====~ 11-9 - 12-3 32 | 16 16 [0.84 |0.65|0.86| 0.65| 0.77
Intelligence
, Scale, Form I.
Delattre and Cole | 1952 | Wechsler-Bellevue | Normals----=-===-- 10-5 - 15-7 50 { --- |-=-- | 0.55 [0.49 0.86 0.82 0.87
Q . Intelligence
Scale, Form I.
ᵃDesignations of subjects are always white Americans unless otherwise specified. ᵇEta coefficient.
ᶜWISC scaled scores. ᵈPartial correlations with chronological age removed.
ᵉRaw scores. ᶠScaled scores.
NOTES: All correlation coefficients are Pearson Product-Moment unless otherwise specified.
Σ = total population; M = male; F = female; Voc. = Vocabulary; BD = Block Design; VS = Verbal Scale; PS = Performance Scale; FS = Full Scale; N.R. = not reported.
fact that the validity of the WISC must be judged
principally in relation to the logic of Wechsler's
approach and the adequacy of his development and
standardization of the test, a surprisingly large
number of papers dealing with the validity of the
WISC have used the Stanford-Binet as a criterion.
As may be expected, unless one assumes naively
that the theoretical objections to mental age scores
involve gross discrepancies, which they usually
do not, the correlations between WISC Full Scale
IQ's and Stanford-Binet IQ's are generally high,
in about the same range as the respective reli-
abilities of these tests. (See table 2.) There seems
to be little doubt that both the WISC and the Stan-
ford-Binet merit their reputations as outstanding
individual intelligence tests.
There are, however, differences between the
WISC and Stanford-Binet in score levels. As noted
above, the WISC IQ's tend to be substantially lower
than the corresponding Stanford-Binet IQ's for the
very young and for the gifted (153 and 215), as
well as for many samples reported across the
normal range (119, 120, 124, 147, 148, 151, 154,
156, 159, and 161). This problem is discussed
below.
The WISC has been correlated with a wide
range of verbal and performance tests that pur-
port to measure various aspects of intelligence.
Correlations with the Wechsler-Bellevue, Form
I, have been reported by Bacon (127) for a sample
of 36 children in the age range 11 years 9 months
to 12 years 3 months and by Delattre (128) for 50
students aged 10-5 to 15-7. Their results for FS
were 0.77 and 0.87, respectively, while both corre-
lated 0.86 for VS. For PS their respective corre-
lations were 0.65 and 0.82; for Voc., 0.84 and 0.55.
Finally, for BD their results were 0.65 and 0.49.
Variations of the magnitude indicated must be expected
for small samples from different settings.
Dunsdon and Roberts (170) administered four
vocabulary tests including the WISC to 2,000
British children and obtained intercorrelations
exceeding 0.8 for both sexes.
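The sampling variation to be expected in the Bacon and Delattre comparisons above can be gauged with an approximate confidence interval for a correlation based on Fisher's z transformation. This is a standard technique introduced here for illustration, not one used in the studies reviewed; the function name is hypothetical, and the sample sizes are those given in the text.

```python
import math


def correlation_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a Pearson r from a sample of n,
    using Fisher's z transformation (z_crit = 1.96 gives about 95 percent)."""
    if n <= 3:
        raise ValueError("n must exceed 3")
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


# Full Scale correlations with the Wechsler-Bellevue, Form I.
for label, r, n in [("Bacon (127)", 0.77, 36), ("Delattre (128)", 0.87, 50)]:
    low, high = correlation_ci(r, n)
    print(f"{label}: r = {r:.2f}, n = {n}, interval {low:.2f} to {high:.2f}")
```

Under these assumptions the two intervals overlap, which is consistent with treating the difference between the reported coefficients as ordinary small-sample fluctuation.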
Table 3 summarizes reported correlation
coefficients between WISC scores and other tests
of intelligence, mental maturity, and achievement
in school subjects, teacher ratings, and related
criteria. For the FS IQ these are generally quite
high and positive, considering sample size and
variation in sample composition and setting. In
view of these variations, the specific coefficients
are of less interest than the general trend, which
supports the validity of the WISC as a general
measure of what Wechsler labels "the total effective
intelligence of the individual" (101, pp. 4 and 5).
For the purposes of a national survey, the
robustness of the validity data over wide sample
fluctuations is very encouraging. It is shown by
the use of the WISC on samples of varying geographic
and ethnic characteristics, on samples ranging in
ability from defective to gifted, and with special
groups such as retarded readers (133), bilinguals
(242), stutterers (198), and low school achievers
(190).
FACTORS AFFECTING WISC SCORES
Both qualitative and quantitative variations in
WISC scores have been reported by various inves-
tigators in relation to a wide range of factors.
Those discussed in this section are considered
relevant to the objectives and problems of the
Survey. Where feasible and appropriate, implica-
tions and recommendations are noted.
Anxiety
Hafner, Pollie, and Wapner (132) and Carrier,
Orton, and Malpass (205) have both reported negative
correlations between the WISC FS and the
Children's Manifest Anxiety Scale (CMAS), indicating
that anxiety, as measured by this scale,
tends to interfere with effective WISC performance.
Hafner and others found a significant correlation
of -0.31 between CMAS and BD. The Carrier
study observed the relationship (-0.54) over a
range of ability but not among the exceptionally
bright. It appears to be most marked in the sub-
normal; Feldhusen and Klausmeier (167) found the
following mean CMAS scores for
three groups at different IQ levels: low IQ, 20.2;
average, 14.8; and high, 12. These results are not
entirely consistent with those of Burns (206), how-
ever, who found similar correlations between
WISC Vocabulary and California Personality Test
measures of Social Adjustment (0.55) and Personal
Adjustment (0.45) but obtained nonsignificant co-
efficients of 0.12 and 0.10, respectively, for Block
Design.
Although anxiety and adjustment may be re-
garded generally as factors that tend to depress
WISC (Voc. and BD) scores for some segments of
the child population on some occasions, it would
seem unwise to attempt any correction for these
factors. Presumably, some valid evidence on ad-
justment will become available from the Thematic
Apperception Test (TAT), the School Information
Form, and the extensive background and medical
information being collected in the Health Exam-
ination Survey. However, the relationships are not
clearly enough defined for fine quantitative manip-
ulation. One alternative is to regard fluctuations
on these variables as a source of error which
may possibly be crudely estimated later but is
probably well randomized in the total sample.
Another is to accept the error pragmatically with
the attitude that depressed scores resulting from
affective factors probably reflect depressed a-
bility of the individual to function effectively.
Sex Differences
The statement by McCandless (103), cited
earlier, that boys do better on the WISC than girls,
is not supported by the present review. Data on
sex differences are presented in nine studies
(130, 146, 154, 169, 175, 192, 194, 196, and 232),
and only one (130) reports a significant mean dif-
ference favoring boys on FS IQ. However, none of
them employed a sampling design encouraging
confidence in the group comparisons.
Some correlational differences mentioned by
several authors do appear interesting: The cor-
relation of WISC Full Scale IQ with Bender-Gestalt
was negative and higher for boys (-0.34, p<0.01)
than for girls (-0.09, ns) (130). The correlation of
WISC Full Scale IQ with the Ammons Picture Vo-
cabulary Test was 0.71 for boys and 0.45 for girls
(169). The correlations of WISC FS and VS IQ's
with the spelling subtest of the Iowa Test of Basic
Skills were higher for boys than for girls. No data
were reported in which sex differences favored
girls. The absence of sex differences in studies of
normal American (146) and English (154) children,
deaf American (194) and English (196) children,
and retarded American children (232) suggests
considerable generality for the negative con-
clusion.
Qualitative Differences by Level
Gallagher and Lucito (164) found a negative
rank order between the mean scores of gifted
and retarded children on the WISC. The three
highest and three lowest subtests for five com-
parison groups in their study are shown below.
These results agree with others, to be discussed
below, which indicate that Block Design scores
are least affected by population variations, in
contrast with Vocabulary, which is the highest
test of the gifted groups and the lowest of the re-
tarded.
Baroff (223) described a WISC profile for a
sample of 53 low-IQ patients with a mean FS IQ
of 63; Block Design was highest, and Vocabulary
ranked 11 out of 12. Although Fisher (225) failed
to verify the Baroff patterning, Baroff's results
are in agreement with those of Gallagher and
Lucito with respect to Vocabulary. Matthews (230)
found that nonachievers in school tend to be higher
on Block Design than on Vocabulary. Levinson
(243 and 244), working with Jewish children in
New York, and Altus (240), with Mexican and
Anglo-American children in California, both found
that monolinguals exceeded bilinguals on Vocabu-
lary, but that the differences on Block Design
Group          Number of       Three highest subtests        Three lowest subtests
               subjects (N)

1 Gifted            50         Similarities, Information,    Picture Completion, Picture
                               Vocabulary                    Arrangement, Digit Span
2 Gifted            43         Vocabulary, Information,      Picture Completion, Picture
                               Similarities                  Arrangement, Digit Span
3 Average          565         Arithmetic, Digit Symbol,     Block Design, Information,
                               Picture Arrangement           Similarities
4 Retarded         150         Object Assembly, Picture      Information, Vocabulary,
                               Completion, Digit Span        Arithmetic
5 Retarded          52         Object Assembly, Digit        Vocabulary, Information,
                               Span, Picture Completion      Picture Arrangement
were not significant. Burks and Bruce (186)
found that poor readers score significantly high
on Block Design, and Kallos, Grabow, and Guarino
(180) obtained a significant difference between
Block Design and Vocabulary, favoring Block De-
sign, for a sample of poor readers.
Results such as these suggest the possibility
of investigating a Voc.-BD ratio which may prove
to have some diagnostic use, in conjunction with
the Goodenough Draw-A-Man Test, the Wide Range
Achievement Test (WRAT), the Thematic Apper-
ception Test, and school information, in evaluating
various categories of subnormal and deviant
performance such as those enumerated above.
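The ratio proposed above can be sketched as follows. This is a minimal illustration only: the function name, the cutoff of 1.0, and the scaled scores are hypothetical and are not taken from any study cited in this review.

```python
def voc_bd_ratio(voc_scaled: float, bd_scaled: float) -> float:
    """Return the Vocabulary scaled score divided by the Block Design scaled score."""
    if bd_scaled <= 0:
        raise ValueError("Block Design scaled score must be positive")
    return voc_scaled / bd_scaled


# Hypothetical (Vocabulary, Block Design) scaled scores for three children.
children = {"child_a": (7, 12), "child_b": (13, 10), "child_c": (10, 10)}

ratios = {name: voc_bd_ratio(voc, bd) for name, (voc, bd) in children.items()}

# A ratio below 1 means Block Design exceeds Vocabulary, the direction
# reported above for poor readers and nonachievers. The cutoff is an assumption.
low_vocabulary = sorted(name for name, r in ratios.items() if r < 1.0)
print(low_vocabulary)
```

Any working cutoff would have to be established empirically against the other instruments named above.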
On the Vocabulary subtest, Stacey and Port-
noy (168) also observed qualitative differences
between a borderline group (IQ range 66-79) and
a defective group (IQ range 50-65) in conceptual
approaches to word definition. Defectives ex-
ceeded borderlines significantly in the use of
functional definitions, while the borderlines were
significantly higher in use of descriptive defini-
tions. Neither group used abstract concepts to
more than a slight degree.
Carleton and Stacey (219) made an item anal-
ysis of the Vocabulary and Block Design subtests
with a sample of 366 low-IQ children (mean FS
IQ 67) and found four Voc. items and two BD items
displaced. In view of the greater dependence on
these two subtests in a short form than is usually
required with the full test, consideration might
well be given by the Survey staff to a repetition
of this study for a substantial sample.
Maxwell (211) observed that the WISC vari-
ances for a sample of neurotic children were
greater than for a normal sample, which led him
to criticize the transformations of raw scores to
scaled scores. This point was also made by Wilson
(139), whose work was with Indian children. Walker
(209), in a highly creative study, enumerated a
lengthy list of qualitative variations of WISC re-
sponses that appear to have promise for person-
ality diagnosis. Walker's study merits further
followup.
Developmental Factors
Klausmeier and Check (166) investigated a
number of developmental correlates of the WISC.
They reported that children with high intelligence
quotients grow taller than those in the average or
low range, but that weight is not significantly re-
lated to sex or IQ. On strength of grip, they
found low-IQ children weaker than those with
average or high IQ's, the average group weaker
than the high-IQ group, and girls weaker than
boys. Girls were found to have more permanent
teeth and a higher carpal age than boys of the
same age. No sex differences or IQ differences
were found in relation to emotional adjustment.
Girls also exceeded boys on achievement in
relation to capacity, integration of self concept,
and estimation of own ability. These observations
are of interest in suggesting cross-disciplinary
analysis of psychological and biomedical data.
SPECIAL GROUPS
The following discussion includes research on
the WISC with reference to a number of special
groups—those involving various disabilities, af-
flictions, deviations, social and ethnic character-
istics, and other definitive attributes commonly
recognized in the literature—for which at least
some information has been found. Each of these
groups involves some variables which affect
WISC scores, and this review might properly
have been included in the preceding section.
However, most of the research referred to here
was organized in terms of samples of persons in
various categories rather than by underlying
variables. As a result, the organization of the
discussion follows the organization of the material
reviewed.
Reading Disability
As noted earlier, Kallos and others (180)
found that Block Design scores were significantly
higher than Vocabulary scores for a reading
disability sample of 37 boys aged 9 to 14 years whose
IQ's ranged from 90 to 109. The elevation of BD
was supported by Burks and Bruce (186). Altus
(181), Sheldon and Garton (182), and Karlsen (185)
published WISC profiles for retarded readers,
based on small but similar groups. No consistent
pattern is unequivocally shown. Robeck (183) used
a more sophisticated method to study subtest
patterning of problem readers on the WISC, repre-
senting subtest scores as deviations of scaled
scores from the respective age-group means. By
this method problem readers were significantly
higher than the norms on both Block Design and
Vocabulary (as well as on Comprehension, Simi-
larities, and Picture Arrangement) and lower on
Digit Span, Arithmetic, Information, and Coding.
Rogge (187) reported no significant differences
on WISC VS, PS, or FS IQ's between a sample of
132 delinquents 14 to 16 years of age and a control
sample of good readers.
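The deviation method attributed to Robeck (183) can be sketched as below. It assumes each subtest score is expressed as the scaled score minus the mean for the child's age group. The age-group means and the child's scores are hypothetical placeholders, not values from Robeck's study or from the WISC norms.

```python
def deviation_profile(scaled_scores, age_group_means):
    """Express each subtest as scaled score minus the age-group mean."""
    return {
        subtest: scaled_scores[subtest] - age_group_means[subtest]
        for subtest in scaled_scores
    }


# Hypothetical age-group means and one hypothetical child's scaled scores.
age_group_means = {
    "Block Design": 10.0, "Vocabulary": 10.0,
    "Digit Span": 10.0, "Arithmetic": 10.0,
}
child = {"Block Design": 12, "Vocabulary": 11, "Digit Span": 8, "Arithmetic": 9}

profile = deviation_profile(child, age_group_means)
above_norm = sorted(s for s, d in profile.items() if d > 0)
below_norm = sorted(s for s, d in profile.items() if d < 0)
print(above_norm, below_norm)
```

Averaging such deviations over a sample of problem readers, subtest by subtest, would give a group profile of the kind described above.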
Correlations of WISC scales with reading
tests are generally moderate, in the range of 0.3
to 0.5 (171, 172, and 173). On the other hand, ap-
proaches involving score patterns or profiles,
such as discussed above, and qualitative analyses
of responses, exemplified by the analyses of the
understanding of the concept of opposite, by Ro-
binowitz (108) and by Flamand (172), appear to offer
greater promise than linear regression methods
for the evaluation of reading disability cases. The
latter approach does not appear feasible with only
Voc. and BD in the battery, but the pattern ap-
proach, as discussed above, merits consideration.
In the Survey battery the WRAT is, of course,
most directly related to estimation of reading dis-
ability, but a Voc.-BD ratio may be a useful sup-
plement.
Auditory Disability
Murphy (196) administered the WISC to an
equally divided sample of 300 deaf boys and girls
in English schools for the deaf. Deaf children did
not differ significantly from normal children on
the Performance Scale in this study, and there was
no meaningful relation between hearing loss and
PS. It is of interest, though, that Block Design
correlated 0.71 with PS in this sample. In addition,
teacher ratings of emotional adjustment corre-
lated 0.76 with PS, suggesting that here also, as
in the samples evaluated in relation to the Chil-
dren's Manifest Anxiety Scale, anxiety may be
a deterrent to effective performance.
Graham and Shapiro (195) compared the per-
formance of the deaf and normal children on the
WISC with standard and pantomime instructions.
Both groups did equally well on PS with pantomime
instructions, but the normals were superior with
standard instructions. Mean scores on BD were
approximately equal under all three conditions.
For deaf children, then, the pantomime instructions
are appropriate on BD.
Glowatsky (194) found that WISC Performance
Scale IQ's were comparable with Draw-A-Man
Test IQ's for a sample of 24 deaf and hard-of-
hearing children in Santa Fe. PS scores were sub-
stantially higher than VS scores in this group, but
bilingualism (noted in 13 cases) was not a factor.
Thompson gave Wepman's Auditory Discrim-
ination Test, the WISC, and other tests of reading
and auditory acuity to 105 children, including good
and poor readers. She found that a significant and
substantial proportion of first graders (71 percent)
had inadequate auditory discrimination, but that
this number was reduced to 24 percent by the
second grade. Auditory Discrimination scores
correlated more highly with reading (0.59 to 0.66)
than with WISC IQ's (0.55 to 0.58). The correlation
of Auditory Discrimination with WISC Verbal
Scale IQ, the highest correlation reported, was
0.61.
Where hearing disability is noted by audiometer
test it would be advantageous to estimate
intelligence level by a combination of Draw-A-
Man and Block Design scores.
Visually Handicapped
According to a study by Scholl (197), the
Block Design test may be administered with
normal procedures to the partially blind. For the
totally blind only the Vocabulary test would be
appropriate in the Survey, and no data are avail-
able to evaluate their scores adequately.
Stutterers
Post (198) found no significant differences
between the mean scores of 30 stutterers and 30
controls, predominantly boys in the age range of
5-5 to 15-10, on the Stanford-Binet (S-B) and the
WISC. The correlation of WISC Full Scale IQ with
the S-B was 0.78 for the stutterers. The only
difference found between the two groups was in
the correlation of WISC Verbal Scale and Perform-
ance Scale IQ's, which was 0.26 for the stutterers
and 0.60 (the same as in Wechsler's standardiza-
tion sample) for the controls. Both group means
were higher on PS than VS.
Cerebral Palsy
Bortner and Birch (199) studied the adminis-
tration of the Block Design subtests with twenty-
eight 13-year-old cerebral palsied children. They
found, as may be expected, that the ability to dis-
criminate block designs in a choice situation may
be intact even though motor factors impair re-
productive ability.
Organic Impairment of
Central Nervous System
Beck and Lam (200) found that WISC Full
Scale IQ's of diagnosed organics were lower than
those of nonorganics, but failed, as others have, to
verify Wechsler's subtest diagnostic pattern for
organics. Young and Pitts (202) compared the
WISC scores of 40 rural juvenile congenital
syphilitics (aged 6 to 16 years) with 40 normal
controls matched on age, sex, race, region, and
father's occupation. The controls were signifi-
cantly superior on IQ's and on Vocabulary, but
not on Block Design, where the critical ratio was
marginal.
Gifted
In Edmonton, Chalmers (213) administered
the WISC to 57 superior children with IQ's above
120 (mean FS IQ 128) and found that 11 obtained
perfect scores on one or more tests. However,
there were no perfect scores on Vocabulary and
only one on Block Design. Nevertheless, Chalmers
questioned the adequacy of the WISC ceilings for
precise measurement in the very high range.
Trauba (214), with a similar sample of 71 gifted
Kansas children, found that WISC Vocabulary has
a correlation of 0.71 with the McCall-Crabbs
Standard Test Lesson in Reading. Lucito and Gallagher
(215) obtained a mean WISC Full Scale IQ
of 141 for a sample of 50 children whose mean
S-B IQ was 161. In this group the boys' scores
were slightly higher than those of the girls. In
agreement with Gallagher and Lucito (164), men-
tioned earlier, Similarities, Information, and Vo-
cabulary were the three highest tests for boys and
girls. Object Assembly, Coding, and Picture Ar-
rangement were lowest for boys, while Digit Span,
Picture Arrangement, and Picture Completion
were lowest for girls (only partially in agreement
with Gallagher and Lucito).
The adequacy of the WISC for precise meas-
urement of the gifted may be questioned, but it
is possible that more accurate measurement may
be obtained by use of the present short form of
Vocabulary and Block Design than with the Full
Scale. This is a problem, however, that will re-
quire further attention.
Mentally Retarded and Defective
The research on the use of the WISC with
retarded and defective groups is very favorable,
in contrast with research on its use for the gifted.
This is indicated by virtually all the studies re-
viewed: (a) reliabilities reported— Throne and
others (227) obtained retest reliabilities over 3
to 4 months of 0.79 for Vocabulary and 0.82 for
Block Design on a sample of 39 retarded boys aged
11 to 14 years; (b) correlations of the WISC with
other tests—Stanford-Binet (216, 217, 228, and
229), Leiter International Performance Scale (221
and 229), Wechsler Adult Intelligence Scale (222),
Columbia Mental Maturity Scale (224), Goodenough
Draw-A-Man Test (224), Progressive Matrices
(233), Peabody Picture Vocabulary Test (234), and
grade placement (238); (c) patterning studies,
mentioned earlier; (d) absence of sex differences
(232); and (e) amenability to short forms based on
Vocabulary and Block Design, as discussed above.
(See Research on Short Forms of the WISC.) Dif-
ferences between WISC and Stanford-Binet IQ's
are smaller in this range than in any other. It
appears that estimates of retardation in the pop-
ulation should be justified on the basis of a com-
posite score of Voc. and BD, but the desirability
of further research to develop a conversion table
to the Full Scale should not be minimized.
Bilingual
The effect of bilingualism appears to be in the
direction of lowering the Vocabulary scores; no
effects have been reported on Block Design. Altus
(240) reported such results for Mexicans in Cali-
fornia; Kralovich (241), for children of Slavic
origin in New Jersey; and Levinson (243 and 244),
for Jewish children in New York. Kralovich re-
ported a correlation of 0.61 between the Verbal
and Performance scales of the WISC for 28 mono-
linguals and -0.04 for 28 bilinguals. Where bi-
lingualism is known to exist, verbal tests may be
expected to be invalid measures and greater re-
liance on performance-type tests such as Block
Design and Draw-A-Man is indicated.
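The contrast between the two correlations reported by Kralovich (0.61 for 28 monolinguals, -0.04 for 28 bilinguals) can be examined with Fisher's r-to-z transformation. The sketch below is an illustration added here and is not an analysis reported in the study cited. It assumes the two samples are independent and uses a two-sided normal approximation.

```python
import math


def compare_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for the difference between two independent correlations.

    Returns the z statistic and a two-sided p value (normal approximation).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided


# Correlations and sample sizes as quoted in the text above.
z, p = compare_correlations(0.61, 28, -0.04, 28)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

With samples this small the result is only suggestive, but it indicates that the two coefficients are unlikely to differ by chance alone.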
Negro
The WISC norms do not apply to Negro chil-
dren, and research by Young and Bright (251),
Caldwell (252), Blakemore (253), and Racheile
(254), as well as others, does nothing to alter
this fact. Negroes score lower than whites, and
it is generally accepted that cultural experience
and caste factors not only account for the Negro-
white differences, but also render comparable
measurement by culture-fair or culture-free
methods as difficult as other ethnic comparisons.
The sampling designs of the studies cited, which
used the WISC, were not adequate to qualify them
for any detailed comment on differences found.
Socioeconomic Status
Laird (250) compared children of different
socioeconomic status (SES) on the WISC and noted,
in common with the general trend in the literature,
superior performance at upper levels. Estes (247
and 248) found similar differences at grade 2 but
not at grade 5. At both grades the WISC Full Scale
IQ was more highly correlated with the Metro-
politan Achievement Test for the higher SES sam-
ple.
COMPARISON OF WISC
AND STANFORD-BINET IQ'S
Despite the theoretical objections to the mental
age concept, discussed earlier, which led to
the adoption of the deviation IQ as a distinctive
feature of the Wechsler scales and which set
them apart from the venerable Stanford-Binet
test, the relation of the WISC to the S-B has been
a matter of great interest, as evidenced by the
number of papers on this topic in the present re-
view.
The Stanford-Binet is indeed one of the giants
among psychological tests, a veritable landmark
in the history of psychological measurement, and
still enjoys extensive school and clinical use, not-
withstanding the fact that its popularity has been
somewhat reduced by the success of the relatively
recent WISC. Although the standardization of the
WISC has been impressive and supported by so-
phisticated conceptualization, many users have
been relieved to find that it is highly correlated
with the Stanford-Binet. The correlation is in fact
so high (accounting for over 80 percent of common
variance) that one wonders about the significance
of the theorizing which describes them so differ-
ently.
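The common-variance figure follows from squaring the correlation coefficient. As a worked check, the short sketch below computes the smallest correlation that would account for 80 percent of common variance. For contrast it applies the same arithmetic to the coefficient of 0.78 quoted earlier for Post's sample of stutterers. The function name is illustrative only.

```python
import math


def common_variance(r: float) -> float:
    """Proportion of variance shared by two measures correlated r."""
    return r * r


# Smallest correlation whose square reaches 0.80.
minimum_r_for_80_percent = math.sqrt(0.80)

# Coefficient quoted earlier for WISC Full Scale IQ and S-B among stutterers.
shared_at_078 = common_variance(0.78)

print(round(minimum_r_for_80_percent, 3), round(shared_at_078, 3))
```

A correlation of about 0.89 or higher is therefore implied by the 80 percent figure, while a coefficient of 0.78 corresponds to roughly 61 percent.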
The impression of similarity of measurement
results given by the correlations does not, how-
ever, stand up when mean scores of different
groups are compared. As noted earlier, WISC
IQ's tend to be lower than Stanford-Binet IQ's at
the lower age levels and among the gifted. These
observations are illustrated by data extracted
from the following 12 studies in which comparison
means were cited: 119, 120, 124, 147, 148, 151,
153, 156, 159, 161, 215, and 216. Their results
are epitomized briefly on the following page.
Data from Jones' (154) British study of 240 chil-
dren in the age range 8 to 10 years are also of
interest. For this group the WISC means were,
on the average, 7.2 IQ points below the S-B, the
WISC always being administered first.
Allowing for sampling fluctuations and errors
of measurement in routine testing, there never-
Figure 1. Summary of the amount Stanford-Binet
Intelligence Test scores differ from Wechsler
Intelligence Test scores. (Schematic chart: curves
labeled Gifted, Normal range, and Retarded-Defective,
plotted against age in years, 5 to 14, on a vertical
scale running from 0 to 22.)
Study                          Group                    N      Mean     Mean     Difference
                                                               S-B      WISC     (WISC minus S-B)

Normal (White) Samples

Schachter and Apgar (147)¹     Mean age 4-1 (S-B),      113    104.3     98.9      -5.4
                               mean age 8-3 (WISC)      (61m, 62f)
Triggs and Cartee (148)        Kindergarten, age 5       48    124.1    107.6     -16.5
Muhr (119)                     5-year group              21     97.4     88.1      -9.3
                               6-year group              21    102.2     96.6      -5.6
Pastovic and Guthrie (120)     5-year group              50    113.0    103.2      -9.8
                               7-year group              50    115.1    111.5      -3.6
Rottersman (151)               6-year group              50    110.2    101.5      -8.7
Cohen and Collier (124)        6- to 9-year group,       53    104.8     99.3      -5.0
                               ages 6-5 to 8-9
Wagner (156)                   8- to 9-year group        50    104.5    103.3      -1.2
Frandsen and Higginson (159)   9-year group              50    105.8    102.4      -3.4
Kardos (161)                   13- to 14-year group     100    113.7    109.4      -4.3

Gifted (White) Samples

Beeman (153)                   Full sample               36                        -15
                               IQ over 130                                         -20
                               IQ 120-129                                          -11.4
Lucito and Gallagher (215)                               50    160.8    141.2     -19.6

Retarded Samples

Nale (216)                     9- to 11-year group      104     55.4     58.0      +2.6

¹Interval between S-B and WISC administration, 50 months.
NOTE: N = number; m = male; f = female.
theless appears to be a common trend in these
reports which can be summarized as follows. The
differences between WISC and S-B IQ's are great-
est among the gifted. In the normal range they are
high among the very young, dropping off as age
increases, but persisting to some degree throughout
the age range 5 to 14 years. The data suggest
an upturn after age 9, but this is not certain. No
significant differences appear for the subnormal.
The schematic chart in figure 1 suggests the na-
ture of the age- and level-related difference
functions on the basis of the results cited.
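The age trend in the normal range can be checked roughly from the tabulated means. The sketch below uses only the single-age groups from the comparison table, computes WISC minus S-B for each, and takes an unweighted average at each age. The selection of rows and the use of unweighted means are simplifying assumptions made for illustration.

```python
from collections import defaultdict

# (study, age in years, mean S-B IQ, mean WISC IQ), copied from the comparison table.
rows = [
    ("Triggs and Cartee (148)", 5, 124.1, 107.6),
    ("Muhr (119)", 5, 97.4, 88.1),
    ("Pastovic and Guthrie (120)", 5, 113.0, 103.2),
    ("Muhr (119)", 6, 102.2, 96.6),
    ("Rottersman (151)", 6, 110.2, 101.5),
    ("Pastovic and Guthrie (120)", 7, 115.1, 111.5),
    ("Frandsen and Higginson (159)", 9, 105.8, 102.4),
]

differences_by_age = defaultdict(list)
for _study, age, mean_sb, mean_wisc in rows:
    differences_by_age[age].append(mean_wisc - mean_sb)

# Unweighted mean difference at each age (sample sizes are ignored here).
mean_difference_by_age = {
    age: round(sum(diffs) / len(diffs), 2)
    for age, diffs in sorted(differences_by_age.items())
}

for age, diff in mean_difference_by_age.items():
    print(f"age {age}: mean WISC minus S-B = {diff:+.2f}")
```

The averages shrink in absolute size from age 5 to age 7, in line with the trend summarized above. Weighting by sample size would be a natural refinement.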
Unfortunately it is possible only to speculate
on the nature of the true curves which those in
figure 1 are intended to suggest, and speculation
on what they would be for a short form composed
only of Vocabulary and Block Design is difficult.
Some of the data presented earlier for these subtests
suggest that the differences might be smaller,
but in the absence of empirical evidence this
is only an educated guess.
For the purposes of the Survey there are
only two alternatives. One is to carry out some
ad hoc research on the short form, as suggested
earlier, for the purpose of estimating the Full
Scale IQ from Voc. and BD, using the results to
conform to Wechsler's norms. The other is to
regard the full Survey sample as the unprecedented
opportunity to carry out a complete new standardization
of the short form on a basis that, in sampling
sophistication, far exceeds any work of its
kind in the history of testing. There are a number
of problems related to the second alternative,
including the availability of funds for this purpose.
However, if this standardization were accom-
plished, the new norms for Voc. and BD would be
superior to those now available, and the compu-
tations of FS IQ based on them would permit more
accurate population estimates than any others
conceivable for the age range included.
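The first alternative, estimating Full Scale IQ from Voc. and BD, might take the form of a simple least-squares line fitted to a calibration sample that received the full test. The sketch below is one possible form under stated assumptions and is not a procedure adopted by the Survey. The calibration pairs are hypothetical, and the single predictor (the sum of the two scaled scores) is chosen only for brevity.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical calibration sample: sum of Voc. and BD scaled scores, and the
# Full Scale IQ obtained by the same children on the complete WISC.
voc_bd_sum = [8, 14, 18, 20, 22, 26, 32]
full_scale_iq = [66, 83, 94, 101, 105, 117, 136]

slope, intercept = fit_line(voc_bd_sum, full_scale_iq)


def estimate_fs_iq(voc_scaled, bd_scaled):
    """Estimate Full Scale IQ from the two short-form scaled scores."""
    return intercept + slope * (voc_scaled + bd_scaled)


print(round(estimate_fs_iq(10, 10), 1))
```

Separate lines for normal, gifted, and retarded groups, or separate weights for the two subtests, could be fitted in the same way if the calibration data warranted it.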
SUMMARY AND CONCLUSIONS
This review is based on 154 published studies,
reviews, and unpublished theses and disserta-
tions related to the WISC, interpreted in a frame
of reference of measurement theory and psy-
chometric principles. The evidence considered
strongly supports the judgment of the Survey
staff in the selection of the WISC Vocabulary and
Block Design subtests as a short form of the WISC
for the national survey, but at the same time it
raises questions concerning the acceptance of
either the scaled scores of these subtests or of
prorated Full Scale Intelligence Quotients based
on them without further empirical research. It
is the reviewer's considered opinion that, given
the alternatives presented, the selection was an
eminently wise one. The research recommended
reflects principally the nature of the unprecedent-
ed testing problems and the generally imprecise
nature of psychological measurement.
The most important recommended investiga-
tions discussed in this section involve the follow-
ing steps:
1. Restandardization of the Vocabulary and
Block Design tests on the full Survey
sample. As part of this study, item diffi-
culties should be checked and a formula or
set of formulas should be developed for
estimating Full Scale IQ's from revised
Voc. and BD scaled scores (based on
samples of normal, gifted, and retarded
groups—and if possible several ethnic
groups, such as Negroes or Mexicans—to
whom the Full Scale has been adminis-
tered). Consideration should be given
to estimation of IQ's directly from raw
scores by age group.
2. Research on correlates of a Voc.-BD
ratio, for use with the WRAT and with the
Draw-A-Man Test in the identification of
poor readers, bilinguals, and verbally im-
paired children and in estimating IQ's of
culturally deviant ethnic groups.
3. Cross-disciplinary developmental anal-
yses of Vocabulary, Block Design, and de-
rived scores and of item responses with
biomedical data obtained in other sections
of the Survey. This area is discussed in
detail elsewhere. See Klausmeier and
Check (166).
BIBLIOGRAPHY
General References to WISC
101. Wechsler, D.: Wechsler Intelligence Scale for Children.
New York. Psychological Corp., 1949.
102. Littell, W. M.: The Wechsler Intelligence Scale for Chil-
dren, review of a decade of research. Psychological
Bull. 57:132-156, 1960.
103. McCandless, B. R.: Review of the WISC, in O.K. Buros,
ed., Fourth Mental Measurements Yearbook. Highland
Park, N.J. The Gryphon Press, 1953. pp. 480-481.
104. Frost, B. P.: An application of the method of extreme
deviations to the Wechsler Intelligence Scale for Chil-
dren. J.Clin.Psychol. 16:420, 1960.
105. Winpenny, N.: An Investigation of the Use and the Va-
lidity of Mental Age Scores on the Wechsler Intelligence
Scale for Children. Unpublished master’s thesis, Penn-
sylvania State College, 1951.
106. Maxwell, A. E.: Inadequate reporting of normative test
data. J.Clin.Psychol. 17:99-101, 1961.
107. Seashore, H. G.: Differences between verbal and per-
formance IQ’s on the Wechsler Intelligence Scale for
Children. J.Consult.Psychol. 15:62-67, 1951.
108. Robinowitz, R.: Learning the relation of opposition as
related to scores on the Wechsler Intelligence Scale for
Children. J.Genet.Psychol. 88:25-30, 1956.
Factor Analytic Studies
109. Hagen, E. P.: A Factor Analysis of the Wechsler Intelligence
Scale for Children. Unpublished doctoral dissertation,
Columbia University, 1952.
110. Gault, U.: Factorial patterns on the Wechsler Intelligence
Scales. Aust.J.Psychol. 6:85-90, 1954.
111. Cohen, J.: The factorial structure of the WISC at ages
7-6, 10-6, and 13-6. J.Consult.Psychol. 23:285-299,
1959.
Reliability and Stability
112. Hite, L.: Analysis of Reliability and Validity of the
Wechsler Intelligence Scale for Children. Unpublished
doctoral dissertation, Western Reserve University, 1953.
113. Gehman, I. H., and Matyas, R. P.: Stability of the WISC
and Binet tests. J.Consult.Psychol. 20:150-152, 1956.
114. Matyas, R. P.: A Longitudinal Study of the Revised Stanford-Binet
and the WISC. Unpublished master’s thesis,
Pennsylvania State University, 1954.
115. Reger, R.: Repeated measurements with the WISC. Psy-
chol.Rep. 11:418, 1962.
116. Whatley, R. G., and Plant, W. T.: The stability of WISC
IQ’s for selected children. J.Psychol. 44:165-167, 1957.
Validity
117. Mussen, P., Dean, S., and Rosenberg, M.: Some further
evidence on the validity of the WISC. J.Consult.Psychol.
16:410-411, 1952.
118. Kureth, G., Muhr, J. P., and Weisgerber, C. A.: Some data
on the validity of the Wechsler Intelligence Scale for
Children. Child Development 23:281-287, 1952.
119. Muhr, J. P.: Validity of the Wechsler Intelligence Scale
for Children at the Five and Six Year Level. Unpublished
master’s thesis, University of Detroit, 1952.
120. Pastovic, J. J., and Guthrie, G. M.: Some evidence on
the validity of the WISC. J.Consult.Psychol. 15:385-
386, 1951.
121. Pastovic, J. J.: A Validation Study of the Wechsler In-
telligence Scale for Children at the Lower Age Level.
Unpublished master’s thesis, Pennsylvania State Col-
lege, 1951.
122. Altus, G. T.: A note on the validity of the Wechsler In-
telligence Scale for Children. J.Consult.Psychol. 16:
231, 1952.
Relations with Other Tests: Batteries
123. McBrearty, J. F.: Comparison of the WISC With the Arthur
Performance Scale, Form I, and Their Relationship to
the Progressive Achievement Test. Unpublished mas-
ter’s thesis, Pennsylvania State College, 1951.
124. Cohen, B. D., and Collier, M. J.: A note on WISC and
other tests of children six to eight years old. J.Consult.
Psychol. 16:226-227, 1952.
125. Estes, B. W., Curtin, M. E., DeBurger, R. A., and Denny,
C.: Relationships between 1960 Stanford-Binet, 1937
Stanford-Binet, WISC, Raven, and Draw-A-Man. J.Con-
sult.Psychol. 25:388-391, 1961.
126. Smith, B. S.: The relative merits of certain verbal and
non-verbal tests at the second-grade level. J.Clin.Psy-
chol. 17:53-54, 1961.
Relations with Other Tests: Wechsler-Bellevue
127. Bacon, C. S.: A Comparative Study of the Wechsler-Bel-
levue Intelligence Scale for Adolescents and Adults,
Form I, and the Wechsler Intelligence Scale for Children
at the Twelve-Year Level. Unpublished master’s thesis,
University of North Dakota, 1954.
128. Delattre, L., and Cole, D.: A comparison of the WISC
and the Wechsler-Bellevue. J.Consult.Psychol. 16:228-
230, 1952.
Relations with Other Tests: Bender-Gestalt Perceptual Tests
129. Koppitz, E. M.: Relationships between the Bender-Ge-
stalt Test and the Wechsler Intelligence Scale for Chil-
dren. J.Clin.Psychol. 14:413-416, 1958.
130. Armstrong, R. G., and Hauck, P. A.: Correlates of the
Bender-Gestalt scores in children. J.Psychol.Stud. 11:
153-158, 1960.
131. Goodenough, D. R., and Karp, S. A.: Field dependence
and intellectual functioning. J.Abnorm.&Social Psychol.
63:241-246, 1961.
Relations with Other Tests: CMAS
132. Hafner, A. J., Pollie, D. M., and Wapner, I.: The relation-
ship between the CMAS and WISC functioning. J.Clin.
Psychol. 16:322-323, 1960.
Relations with Other Tests:
Ammons Full Range Picture Vocabulary
133. Smith, L. M., and Fillmore, A. R.: The Ammons FRPV
Test and the WISC for remedial reading cases; abstracted,
J.Consult.Psychol. 18:332, 1954.
Relations with Other Tests: CTMM
134. Altus, G. T.: Relationships between verbal and non-ver-
bal parts of the CTMM and WISC. J.Consult.Psychol.
19:143-144, 1955.
Relations with Other Tests: Kent EGY
135. Delp, H. A.: Correlations between the Kent EGY and the
Wechsler batteries. J.Clin.Psychol. 9:73-75, 1953.
Relations with Other Tests: Peabody Picture Vocabulary Test
136. Kimbrell, D. L.: Comparison of Peabody, WISC, and ac-
ademic achievement scores among educable mental de-
fectives. Psychol.Rep. 7:502, 1960.
137. Himelstein, P., and Herndon, J. D.: Comparison of the
WISC and Peabody Picture Vocabulary Test with emo-
tionally disturbed children. J.Clin.Psychol. 18:82, 1962.
Relations with Other Tests: Raven Progressive Matrices
138. Barratt, E. S.: The relationship of the Progressive Ma-
trices (1938) and the Columbia Mental Maturity Scale to
the WISC. J.Consult.Psychol. 20:294-296, 1956.
139. Wilson, L.: A Comparison of the Raven Progressive Matrices
(1947) and the Performance Scale of the Wechsler
Intelligence Scale for Children for Assessing the Intelligence
of Indian Children. Unpublished master’s thesis,
University of British Columbia, 1952.
140. Malpass, L. F., Brown, R., and Hade, D.: The utility of
the Progressive Matrices (1956 edition) with normal and
retarded children. J.Clin.Psychol. 16:350, 1960.
141. Stacey, C.L., and Carleton, F.O.: The relationship be-
tween Raven’s Colored Progressive Matrices and two
tests of general intelligence. J.Clin.Psychol. 11:84-85,
1955.
142. Martin, A. W., and Wiechers, J. E.: Raven’s Colored Pro-
gressive Matrices and the Wechsler Intelligence Scale
for Children. J.Consult.Psychol. 18:143-144, 1954.
Relations with Other Tests: SRA-PMA
143. Stempel, E. F.: The WISC and the SRA Primary Mental
Abilities Test. Child Development 24:257-261, 1953.
Relations with Other Tests: Stanford-Binet
144. Krugman, J.I., Justman, J., Wrightstone, J. W., and Krug-
man, M.: Pupil functioning on the Stanford-Binet and the
Wechsler Intelligence Scale for Children. J.Consult.
Psychol. 15:475-483, 1951.
145. Harlow, J. E., Jr., Price, A. C., Tatham, L. J., and
Davidson, J. F.: Preliminary study of comparison between
Wechsler Intelligence Scale for Children and Form
L of the Revised Stanford Binet Scale at three age levels.
J.Clin.Psychol. 13:72-73, 1957.
146. Boruszak, R. J.: A Comparative Study to Determine the
Correlation Between the IQ’s of the Revised Stanford
Binet Scale, Form L, and the IQ’s of the Wechsler Intelligence
Scale for Children. Unpublished master’s thesis,
Wisconsin State College, 1954.
147. Schachter, F. F., and Apgar, V.: Comparison of preschool
Stanford-Binet and school-age WISC IQ’s. J.Educ.
Psychol. 49:320-323, 1958.
148. Triggs, F. O., and Cartee, J. K.: Pre-school pupil performance
on the Stanford-Binet and the Wechsler Intelligence
Scale for Children. J.Clin.Psychol. 9:27-29,
1953.
149. Holland, G. A.: A comparison of the WISC and Stanford-
Binet IQ’s of normal children. J.Consult.Psychol. 17:
147-152, 1953.
150. Weider, A., Noller, P. A., and Schraumm, T. A.: The
Wechsler Intelligence Scale for Children and the Revised
Stanford-Binet. J.Consult.Psychol. 15:330-333,
1951.
151. Rottersman, L.: A Comparison of the IQ Scores on the
New Revised Stanford Binet, Form L, the Wechsler Intelligence
Scale for Children, and the Goodenough “Draw
A Man” Test at the Six Year Age Level. Unpublished
master’s thesis, University of Nebraska, 1950.
152. Tatham, L. J.: Statistical Comparison of the Revised
Stanford-Binet Intelligence Test Form L With the Wechsler
Intelligence Scale for Children Using the Six and
One-Half Year Level. Unpublished master’s thesis,
University of Florida, 1952.
153. Beeman, G.: A comparative study of the WISC and Stanford-Binet
with a group of more able and gifted 7-11 year
old students. Calif.J.Educ.Res. 11:77, 1960.
154. Jones, S.: The Wechsler Intelligence Scale for Children
applied to a sample of London primary school children.
Br.J.Educ.Psychol. 32(2):119-133, 1962.
155. Scott, G. R.: A Comparison Between the Wechsler Intelligence
Scale for Children and the Revised Stanford-Binet
Scales. Unpublished master’s thesis, Southern Methodist
University, 1950.
156. Wagner, W. K.: A Comparison of Stanford-Binet Mental
Ages and Scaled Scores on the Wechsler Intelligence
Scale for Children for Fifty Bowling Green Pupils. Un-
published master’s thesis, Bowling Green State Univer-
sity, 1951.
Stanley, J. C.: Statistical analysis of scores from coun-
terbalanced tests. J.Exp.Educ. 23:187-207, 1955.
Arnold, F. C., and Wagner, W. K.: A comparison of Wech-
sler Children’s Scale and Stanford-Binet scores for eight-
and nine-year-olds. J.Ezp.Educ. 24:91-94, 1955.
Frandsen, A. N., and Higginson, J. B.: The Stanford-
Binet and the Wechsler Intelligence Scale for Children.
J.Consult.Psychol. 15:236-238, 1951.
160. Clarke, F. R.: A Comparative Study of the Wechsler In-
telligence Scale for Children and the Revised Stanford
Binet Intelligence Scale, Form L, in Relation to the Scho-
lastic Achievement of a 5th Grade Population. Unpub-
lished master’s thesis, Pennsylvania State College,
1950.
161. Kardos, M. S.: A Comparative Study of the Performance
of Twelve-Year-Old Children on the WISC and the Re-
vised Stanford-Binet, Form L, and the Relationship of
Both to the California Achievement Tests. Unpublished
master’s thesis, Marywood College, 1954.
162. Davidson, J. F.: A Preliminary Study in Statistical Com-
parison of the Revised Stanford-Binet Intelligence Test
Form L With the Wechsler Intelligence Scale for Chil-
dren Using the Fourteen Year Level. Unpublished mas-
ter’s thesis, University of Florida, 1954.
Relations with Other Tests: Wartegg Drawing Completion Test
163. Stark, R.: A Comparison of Intelligence Test Scores on
the Wechsler Intelligence Scale for Children and the War-
tegg Drawing Completion Test with School Achievement
of Elementary School Children. Unpublished master’s
thesis, University of Detroit, 1954.
WISC: Response Patterns of Gifted, Average, and Retarded
164. Gallagher, J. J., and Lucito, L. L.: Intellectual patterns
of gifted compared with average, and retarded. Except.
Children 27:479-482, 1961.
165. Klausmeier, H. J., and Feldhusen, J. F.: Retention in
arithmetic among children of low, average, and high in-
telligence at 117 months of age. J.Educ.Psychol. 50:
88-92, 1959.
166. Klausmeier, H. J., and Check, J.: Relationships among
physical, mental, achievement, and personality measures
in children of low, average, and high intelligence at 113
months of age. Am.J.Ment.Deficiency 63:1059-1068,
1959.
167. Feldhusen, J. F., and Klausmeier, H. J.: Anxiety, intel-
ligence, and achievement in children of low, average,
and high intelligence. Child Development 33:403-409,
1962.
WISC: Vocabulary, Language Skills, Reading
168. Stacey, C. L., and Portnoy, B.: A study of the differen-
tial responses on the vocabulary subtest of the Wechsler
Intelligence Scale for Children. J.Clin.Psychol. 6:401-
403, 1950.
169. Winitz, H.: A Comparative Study of Certain Language
Skills in Male and Female Kindergarten Children. Un-
published doctoral dissertation, State University of Iowa,
1959.
170. Dunsdon, M. I., and Roberts, J. A. F.: A study of the
performance of 2,000 children on four vocabulary tests.
Br.J.Statist.Psychol. 8:3-15, 1955.
171. Reidy, M. E.: A Validity Study of the Wechsler-Bellevue
Intelligence Scale for Children and Its Relationship to
Reading and Arithmetic. Unpublished master’s thesis,
Catholic University of America, 1952.
172. Flamand, KE. K.: The Relationship Between Various Meas-
ures of Vocabulary and Performance in Beginning Read-
ing. Unpublished doctoral dissertation, Temple Univer-
sity, 1961.
173. Triggs, F. O., Cartee, J. K., Binks, V., Foster, D., and
Adams, N. A.: The relationship between specific reading
skills and general ability at the elementary and junior-
senior high school levels. Educ.Psychol.Measur. 14:
176-185, 1954.
174. Fitzgerald, L. A.: Some Effects of Reading Ability on
Group Intelligence Test Scores in the Intermediate
Grades. Unpublished doctoral dissertation, State Uni-
versity of Iowa, 1960; abstracted, Diss.Abstr. 21:1844,
1961.
WISC: Short Forms
175. Armstrong, R. G.: A reliability study of a short form of
the WISC vocabulary subtest. J.Clin.Psychol. 11:413-
414, 1955.
176. Throne, J. M.: A Short Form of the Wechsler-Bellevue
Intelligence Test for Children. Unpublished master’s
thesis, University of Florida, 1951.
177. Simpson, W. H., and Bridges, C. C., Jr.: A short form of
the Wechsler Intelligence Scale for Children. J.Clin.Psy-
chol. 15:424, 1959.
178. Carleton, F. O., and Stacey, C. L.: Evaluation of se-
lected short forms of the Wechsler Intelligence Scale for
Children. J.Clin.Psychol. 10:258-261, 1954.
179. Yalowitz, J. M., and Armstrong, R. G.: Validity of short
forms of the Wechsler Intelligence Scale for Children
(WISC). J.Clin.Psychol. 11:275-277, 1955.
WISC: Reading Disability
180. Kallos, G. L., Grabow, J. M., and Guarino, E. A.: The
WISC profile of disabled readers. Personnel Guid.J.
39:476-478, 1961.
181. Altus, G. T.: A WISC profile for retarded readers. J.Con-
sult.Psychol. 20:155-156, 1956.
182. Sheldon, M. S., and Garton, J.: A note on “a WISC pro-
file for retarded readers.” Alberta J.Educ.Res. 5:264-
267, 1959.
183. Robeck, M. C.: Subtest patterning of problem readers on
WISC. Calif.J.Educ.Res. 11:110-115, 1960.
184. Abrams, J. C.: A Study of Certain Personality Character-
istics of Non-Readers and Achieving Readers. Unpub-
lished doctoral dissertation, Temple University, 1955.
185. Karlsen, B.: A Comparison of Some Educational and Psy-
chological Characteristics of Successful and Unsuccess-
ful Readers at the Elementary School Level. Unpub-
lished doctoral dissertation, University of Minnesota,
1954.
186. Burks, H. F., and Bruce, P.: The characteristics of poor
and good readers as disclosed by the Wechsler Intelli-
gence Scale for Children. J.Educ.Psychol. 46:488-493,
1955.
187. Rogge, H. J.: A Study of the Relationships of Reading
Achievement to Certain Other Factors in a Population
of Delinquent Boys. Unpublished doctoral dissertation,
University of Minnesota, 1959.
WISC: School Achievement
188. Orr, K. N.: The Wechsler Intelligence Scale for Children
as a Predictor of School Success. Unpublished master’s
thesis, Indiana State Teachers College, 1950.
189. Schwitzgoebel, R. R.: The Predictive Value of Some Re-
lationships Between the Wechsler Intelligence Scale for
Children and Academic Achievement in Fifth Grade. Un-
published doctoral dissertation, University of Wisconsin,
1952.
190. Barratt, E. S., and Baumgarten, D. L.: The relationship
of the WISC and Stanford-Binet to school achievement.
J.Consult.Psychol. 21:144, 1957.
191. Raleigh, W. H.: A Study of the Relationships of Academic
Achievement in Sixth Grade With the Wechsler Intelli-
gence Scale for Children and Other Variables. Unpub-
lished doctoral dissertation, Indiana University, 1952.
192. Stroud, J. B., Blommers, P., and Lauber, M.: Correlation
of WISC and achievement tests. J.Educ.Psychol. 48:
18-26, 1957.
WISC: Auditory Disability, Visual Handicap,
Stuttering, Cerebral Palsy, Brain Damage
193. Thompson, B. B.: The Relation of Auditory Discrimina-
tion and Intelligence Test Scores to Success in Primary
Reading. Unpublished doctoral dissertation, Indiana Uni-
versity, 1961.
194. Glowatsky, E.: The verbal element in the intelligence
scores of congenitally deaf and hard of hearing children.
Amer.Ann.Deaf 98:328-385, 1953.
195. Graham, E. E., and Shapiro, E.: Use of the Performance
Scale of the Wechsler Intelligence Scale for Children
with the deaf child. J.Consult.Psychol. 17:396-398,
1953.
196. Murphy, L. J.: Tests of abilities and attainments, pupils
in schools for the deaf aged six to ten, in A. W. G. Ewing,
ed., Educational Guidance and the Deaf Child. Man-
chester, England. Manchester University Press, 1957.
pp. 213-251.
197. Scholl, G.: Intelligence tests for visually handicapped
children. Except.Children 20:116-120, 1953.
198. Post, D. P.: A Comparative Study of the Revised Stan-
ford Binet and the Wechsler Intelligence Scale for Chil-
dren Administered to a Group of Thirty Stutterers. Un-
published master’s thesis, University of Southern Cali-
fornia, 1952.
199. Bortner, M., and Birch, H. G.: Perceptual and perceptual-
motor dissociation in cerebral palsied children. J.Nerv.
&Ment.Dis. 134:103-108, 1962.
200. Beck, H. S., and Lam, R. L.: Use of the WISC in pre-
dicting organicity. J.Clin.Psychol. 11:154-157, 1955.
201. Kilman, B. A., and Fisher, G. M.: An evaluation of the
Finley-Thompson abbreviated form of the WISC for un-
differentiated, brain damaged and functional retardates.
Am.J.Ment.Deficiency 64:742-746, 1960.
202. Young, F. M., and Pitts, V. A.: The performance of con-
genital syphilitics on the Wechsler Intelligence Scale
for Children. J.Consult.Psychol. 15:239-242, 1951.
203. Rowley, V. N.: Analysis of the WISC performance of
brain damaged and emotionally disturbed children. J.Con-
sult.Psychol. 25:553, 1961.
WISC: Personality Measures (Normal), Discipline, Delinquency
204. Gourevitch, V., and Feffer, M. H.: A study of motivational
development. J.Genet.Psychol. 100:361-375, 1962.
205. Carrier, N. A., Orton, K. D., and Malpass, L. F.: Re-
sponses of bright, normal, and EMH children to an orally-
administered manifest anxiety scale. J.Educ.Psychol.
53:271-274, 1962.
206. Burns, L.: A Correlation of Scores on the Wechsler In-
telligence Scale for Children and the California Test of
Personality Obtained by a Group of 5th Graders. Unpub-
lished master’s thesis, Pennsylvania State College,
1954.
207. Kent, N., and Davis, D. R.: Discipline in the home and
intellectual development. Brit.J.M.Psychol. 30:27-33,
1957.
208. Wall, H. R.: A Differential Analysis of Some Intellective
and Affective Characteristics of Peer Accepted and Re-
jected Pre-Adolescent Children. Unpublished doctoral
dissertation, University of Kansas, 1960.
209. Walker, H. A.: The Wechsler Intelligence Scale for Chil-
dren as a Diagnostic Device. Unpublished master’s the-
sis, Utah State Agricultural College, 1956.
210. Schonborn, R.: A comparative study of the differences
between adolescent and child male enuretics and non-
enuretics as shown by an intelligence test. Psychol.
Newsletter 6:1-9, 1954.
211. Maxwell, A. E.: Discrepancies in the variances of test
results for normal and neurotic children. Br.J.Statist.
Psychol. 13:165-172, 1960.
212. Richardson, H. M., and Surko, E. F.: WISC scores and
status in reading and arithmetic of delinquent children.
J.Genet.Psychol. 89:251-262, 1956.
WISC: Gifted
213. Chalmers, J. M.: An Analysis of Results Obtained on
the Wechsler Intelligence Scale for Children by Mentally
Superior Subjects. Unpublished master’s thesis, Uni-
versity of Alberta, 19583.
214. Trauba, R. G.: A Study of the Aspects of Differentiation
of Abilities in Interpretation of Reading With a Group
of Gifted Children. Unpublished doctoral dissertation,
University of Kansas, 1959.
215. Lucito, L., and Gallagher, J.: Intellectual patterns of
highly gifted children on the WISC. Peabody J.Educ.
38:131-136, 1960.
WISC: Mental Defectives
216. Nale, S.: The Childrens-Wechsler and the Binet on 104
mental defectives at the Polk State School. Am.J.Ment.
Deficiency 56:419-423, 1951.
217. Sloan, W., and Schneider, B.: A study of the Wechsler
Intelligence Scale for Children with mental defectives.
Am.J.Ment.Deficiency 55:573-575, 1951.
218. Atchison, C. O.: Use of the Wechsler Intelligence Scale
for Children with eighty mentally defective Negro chil-
dren. Am.J.Ment.Deficiency 60:378-379, 1955.
219. Carleton, F. O., and Stacey, C. L.: An item analysis of
the Wechsler Intelligence Scale for Children. J.Clin.
Psychol. 11:149-154, 1955.
220. Newman, J. R., and Loos, F. M.: Differences between
verbal and performance IQ’s with mentally defective chil-
dren on the Wechsler Intelligence Scale for Children.
J.Consult.Psychol. 19:16, 1955.
221. Alper, A. E.: A comparison of the WISC and the Arthur
adaptation of the Leiter International Performance Scale
with mental defectives. Am.J.Ment.Deficiency 63:312-
316, 1958.
222. Fleming, J. W.: The Relationships Among Psychometric,
Experimental, and Observational Measures of Learning
Ability in Institutionalized Endogenous Mentally Re-
tarded Persons. Unpublished doctoral dissertation, Uni-
versity of Colorado, 1959.
223. Baroff, G. S.: WISC patterning in endogenous mental de-
ficiency. Am.J.Ment.Deficiency 64:482-485, 1959.
224. Warren, S. A., and Collier, H. L.: Suitability of the Co-
lumbia Mental Maturity Scale for mentally retarded in-
stitutionalized females. Am.J.Ment.Deficiency 64:916-
920, 1960.
225. Fisher, G. M.: A cross-validation of Baroff’s WISC pat-
terning in endogenous mental deficiency. Am.J.Ment.De-
ficiency 65:349-350, 1960.
226. Baumeister, A., and Bartlett, C. J.: Further factorial in-
vestigations of WISC performance of mental defectives.
Am.J.Ment.Deficiency 67:257-261, 1962.
227. Throne, F. M., Schulman, J. L., and Kasper, J. C.: Re-
liability and stability of the Wechsler Intelligence Scale
for Children for a group of mentally retarded boys. Am.
J.Ment.Deficiency 67:455-457, 1962.
WISC: Mentally Retarded
228. Stacey, C. L., and Levin, J.: Correlation analysis of
scores of subnormal subjects on the Stanford-Binet and
Wechsler Intelligence Scale for Children. Am.J.Ment.
Deficiency 55:590-597, 1951.
229. Sharp, H. C.: A comparison of slow learner’s scores on
three individual intelligence scales. J.Clin.Psychol.
13:372-374, 1957.
230. Matthews, C. G.: Differential Performances of Non-
Achieving Children on the Wechsler Intelligence Scale.
Unpublished doctoral dissertation, Purdue University,
1958.
231. Finley, C. J., and Thompson, J.: An abbreviated Wech-
sler Intelligence Scale for Children for use with educable
mentally retarded. Am.J.Ment.Deficiency 63:473-480,
1958.
232. Finley, C., and Thompson, J.: Sex differences in intel-
ligence of educable mentally retarded children. Calif.
J.Educ.Res. 10:167-170, 1959.
233. Brown, R., Hakes, D., and Malpass, L.: The utility of
the Progressive Matrices Test (1956 revision); abstract-
ed, Am.Psychologist 14:341, 1959.
234. Dunn, L. M., and Brooks, S. T.: Peabody Picture Vocab-
ulary Test performance of educable mentally retarded
children. Train.Sch.Bull. 57:35-40, 1960.
235. Schwartz, L., and Levitt, E.: Short forms of the Wechsler
Intelligence Scale for Children in the educable, non-in-
stitutionalized mentally retarded. J.Educ.Psychol. 51:
187-190, 1960.
236. Salvati, S. R.: A Comparison of WISC IQ’s and Altitude
Scores as Predictors of Learning Ability of Mentally Re-
tarded Subjects. Unpublished doctoral dissertation, New
York University, 1960; abstracted, Diss.Abstr. 21:2370,
1961.
237. Baumeister, A. A.: The Dimensions of Abilities in Re-
tardates as Measured by the Wechsler Intelligence Scale
for Children. Unpublished doctoral dissertation, George
Peabody College for Teachers, 1961.
238. Thompson, J. M., and Finley, C. J.: The validation of
an abbreviated Wechsler Intelligence Scale for Children
for use with the educable mentally retarded. Educ.Psy-
chol.Measur. 22:539-542, 1962.
239. Osborne, R. T., and Allen, J.: Validity of short forms
of the WISC for mental retardates. Psychol.Rep. 11:167-
170, 1962.
WISC: Bilingualism
240. Altus, G. T.: WISC patterns of a selective sample of bi-
lingual school children. J.Genet.Psychol. 83:241-248,
1953.
241. Kralovich, A. M.: The Effect of Bilingualism on Intelli-
gence Test Scores as Measured by the Wechsler Intelli-
gence Scale for Children. Unpublished master’s thesis,
Fordham University, 1954.
242. Cooper, J. G.: Predicting school achievement for bilin-
gual pupils. J.Educ.Psychol. 49:31-36, 1958.
243. Levinson, B. M.: A comparison of the performance of bi-
lingual and monolingual native born Jewish preschool
children of traditional parentage on four intelligence
tests. J.Clin.Psychol. 15:74-76, 1959.
244. Levinson, B. M.: A comparative study of the verbal and
performance ability of monolingual and bilingual native
born Jewish preschool children of traditional parentage.
J.Genet.Psychol. 97:93-112, 1960.
WISC: Cultural Variations
245. Levinson, B. M.: Traditional Jewish cultural values and
performance on the Wechsler tests. J.Educ.Psychol.
50:177-181, 1959.
246. Levinson, B. M.: Subcultural variations in verbal and
performance ability at the elementary school level. J.
Genet.Psychol. 97:149-160, 1960.
WISC: Socioeconomic Status
247. Estes, B. W.: Influence of socioeconomic status on Wech-
sler Intelligence Scale for Children, an exploratory study.
J.Consult.Psychol. 17:58-62, 1953.
248. Estes, B. W.: Influence of socioeconomic status on Wech-
sler Intelligence Scale for Children, addendum. J.Con-
sult.Psychol. 19:225-226, 1955.
249. Roy, I., and Cohen, N.: Some psychometric variables
relative to change in sociometric status; abstracted,
Am. Psychologist 10:328, 1955.
250. Laird, D. S.: The performance of two groups of eleven-
year-old boys on the Wechsler Intelligence Scale for
Children. J.Educ.Res. 51:101-107, 1957.
WISC: Negro Samples, Negro-White Comparisons
251. Young, F. M., and Bright, H. H.: Results of testing 81
Negro rural juveniles with the Wechsler Intelligence
Scale for Children. J.Soc.Psychol. 39:219-226, 1954.
252. Caldwell, M. B.: An Analysis of Responses of a South-
ern Urban Negro Population to Items on the Wechsler
Intelligence Scale for Children. Unpublished doctoral
dissertation, Pennsylvania State University, 1954.
253. Blakemore, J. R.: A Comparison of Scores of Negro and
White Children on the Wechsler Intelligence Scale for
Children. Unpublished master’s thesis, College of the
Pacific, 1952.
254. Racheile, L. D.: A Comparative Analysis of Ten Year Old
Negro and White Performance on the Wechsler Intelligence
Scale for Children. Unpublished doctoral dissertation,
University of Denver, 1953.
II. THE WIDE RANGE ACHIEVEMENT TEST,
THE ORAL READING AND ARITHMETIC SUBTESTS
The requirement of the Survey for an indi-
vidually administered, brief, well-standardized,
reliable, valid, and flexible school achievement
test was filled by the selection of the Reading
and Arithmetic subtests of the 1963 revision of
the Wide Range Achievement Test. The 1963
WRAT, by J.F. Jastak, replaces the original 1946
edition by Jastak and S.W. Bijou and appears to
be quite similar to the original in design and item
content, except that the new edition is divided, for
the convenience of users, into two levels (Level I
covers ages 5 to 12 years; Level II, 12 years
through adulthood), in contrast with the broad
sweep of the original, from kindergarten through
adulthood.
The principal difference between the two edi-
tions appears to be in the method of standardi-
zation. The 1946 norms were computed to conform
to those of the New Stanford Achievement Test
(Reading, to New Stanford Word and Paragraph
Reading, and Arithmetic Computation, to New
Stanford Arithmetic Computation), whereas the
1963 norms, in each age bracket, depend on
"probability samplings based on IQ's... that
would correspond to the achievement of mentally
average groups with representative dispersions
of scores above and below the mean" (301).
The purpose of this section is both to review
the literature on the WRAT and to evaluate it in
relation to its suitability for the objectives of the
Survey. Unfortunately this must be done almost
entirely on the basis of the tests, manuals, and
research available on the 1946 edition, which is
itself extremely limited. Appropriate data for
critical evaluation of the 1963 edition are almost
totally lacking. Although released for sale in 1963,
the test manual for this edition was still incom-
plete in June 1964 (301) and no independent data
on validity have been found.
EVALUATIVE CRITERIA
Measurement experts believe that in addi-
tion to the standard questions concerning such
issues as reliability, validity, representativeness
of standardization sample, and agreement of
norms with criterion levels, some problems are
inherent in the wide-range type of design. These
are stated forthrightly by Chauncey and Dobbin
(310), in a discussion of various defects of tests:
The "wide-range" test. . . is the too-short
test in disguise. There are only a few of them
around. They are promoted as being suitable
measures of ability (or achievement) for
people of many ages—from third grade
through second year of college, for example.
Since only a small part of any such test can be
material suitable in difficulty for one indi-
vidual, the effective part of the test may
amount to no more than half a dozen ques-
tions—making it a very short test, indeed.
These remarks, by the president and one of
the project directors of the Educational Testing
Service, in a book written expressly to defend
educational testing at a time when it is under
attack from many sources, command attention
and concern by users of wide-range tests such
as the WRAT. The particular implication of the
critique is that reliabilities, validities, and score
levels must be evaluated at every level covered
(or at least at every level at which the test is
used) and that broad-band coefficients of relia-
bility and concurrent validity are likely to be
misleading.
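The caution about broad-band coefficients can be illustrated with a small simulation. The sketch below is not drawn from the WRAT manuals or from any study cited here; it assumes hypothetical scores in which each pupil's true achievement equals his grade level plus a random component, and in which two administrations of a test add independent measurement error. The grade range, sample size per grade, error variances, and all function names are assumptions chosen only for illustration.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate(levels=range(1, 13), n_per_level=200, seed=0):
    """Return (pooled r, {level: within-level r}) for hypothetical data.

    Assumption: true score = grade level + N(0, 1); each of two
    administrations adds independent N(0, 1) measurement error.
    """
    rng = random.Random(seed)
    within = {}
    all_first, all_second = [], []
    for level in levels:
        first, second = [], []
        for _ in range(n_per_level):
            true_score = level + rng.gauss(0.0, 1.0)
            first.append(true_score + rng.gauss(0.0, 1.0))
            second.append(true_score + rng.gauss(0.0, 1.0))
        within[level] = pearson(first, second)
        all_first.extend(first)
        all_second.extend(second)
    pooled = pearson(all_first, all_second)
    return pooled, within


pooled_r, within_r = simulate()
print("pooled (broad-band) r: %.2f" % pooled_r)
for level, r in sorted(within_r.items()):
    print("grade %2d r: %.2f" % (level, r))
```

Under these assumptions the pooled coefficient is much larger than any single-grade coefficient, because most of the pooled variance reflects differences between grades and not differences among pupils within a grade. This is why a coefficient computed over the whole range says little about the precision of the test at any one level.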
The problem of selecting a suitable achieve-
ment test for the Survey is highly complex. Time
restrictions favor short forms and short-cut
methods (such as the wide-range approach), pro-
vided that they meet reasonable standards of
acceptability. However, it is just as true in test-
ing as in all other areas that "you cannot get
more out than you put in." Compromises with
reality in testing often mean less reliable meas-
ures and less adequate coverage of appropriate
universes of content; sometimes they mean penal-
ties in relation to validity and consequent gener-
alizability of measures.
The application of these points to the WRAT
is considered as judicially as possible in this re-
view, and the reality demands are weighed against
possible shortcomings of this wide-range test in
relation to alternatives available in the situation.
A brief review of the 1946 edition and the general
conceptualization of the WRAT is followed by a
review of the 1963 edition used in Cycle II.
1946 EDITION OF WRAT
The conceptualization and rationale of this
test (302) could not help but appeal to clinical psy-
chologists in schools and mental health services.
Jastak made an extremely strong case for the
clinical use of his test, and it is not surprising
that the WRAT has enjoyed considerable popu-
larity in clinical circles despite psychometri-
cians' prejudice against wide-range tests.
Jastak's arguments are briefly as follows:
1. A thorough psychological examination
should include tests of school fundamen-
tals as well as intelligence tests. In-
telligence tests account for only a portion
of the variance in school achievement, and
failure in school and life adjustment may
result from factors other than low in-
telligence.
2. Reliable (and valid) school tests should be
used to assess discrepancies between in-
tellectual capacity and performance in
basic school subjects as well as dis-
crepancies in the organization of learning
abilities. Wide range discrepancies in
school achievement are the rule rather
than the exception, and their discovery is
important for the understanding of per-
sonality and school performance problems
and for the institution of proper remedial
programs.
3. Clinically recognized discrepancy pat-
terns in children are illustrated by the
tendency of neurotic and disorganized
children to be more proficient in reading
than in arithmetic. In addition, "if neu-
rotic tendencies and special reading
handicaps occur together the child may
function far below the level of his true
capacity in all school subjects." Of course,
failure in reading and in arithmetic may
also reflect unrelated processes.
Jastak's criteria of a satisfactory school
achievement test for (individual) clinical use are
(a) low cost, (b) individual standardization, (c)
ease and economy of administration, (d) suita-
bility of contents, (e) relevance of the functions
studied, and (f) comparability of results over the
entire range of the skills in question. It is appar-
ent that these criteria do in effect exclude such
standard school achievement batteries as the
Stanford, Iowa, Cooperative, and other well-known
and highly respected batteries that are designed
for group administration within a narrow grade
range and cover a large universe of content,
requiring considerable time to administer and
score. These criteria certainly appear to be
"tailor made" for the Survey (as well as for
clinical practice). However, in view of the test-
ing conditions for individually selected members
of the national sample, the question is, how well
are they implemented in the WRAT?
Jastak's views on test content are of partic-
ular interest. The WRAT focuses entirely on
three basic school study skills (reading, spelling,
and arithmetic) "around which most school stud-
ies revolve." The range of the subtests for each
is indeed wide, from kindergarten to college.
The test content is concerned principally
with mastery of the mechanics of the subject
rather than with comprehension. Thus the reading
test is in effect a test of reading as a motor
skill; the spelling test focuses on words without
sentence contexts; and the arithmetic test in-
volves number facility with minimal dependence
on reading.
This emphasis is a reflection of the author's
conception of the WRAT as an adjunct to tests of
intelligence and behavior adjustment. Information
concerning the subject's ability to comprehend
can be obtained from intelligence tests, but ac-
curate measurement of mechanics in the basic
tools chosen is essential because of the depend-
ence of most other studies on them. Further, it
is argued that correct answers can often be given
in conventional reading, arithmetic, and other
subject-matter achievement tests on the basis of
general knowledge and intellectual ability, even
when mastery of mechanics is poor; thus, im-
portant diagnostic cues are overlooked.
Although the WRAT Reading and Arithmetic
tests were reported to correlate satisfactorily
with other achievement tests, their limitations of
content and intended use were clearly outlined in
the manual.
As stated above, the 1946 edition of the WRAT
was standardized by anchoring the WRAT norms to
those of corresponding subtests of the New Stan-
ford Achievement Test. The standardization
sample consisted of the scores of 4,052 students
for Spelling and Arithmetic (about 1,500 were
individually tested; the remainder were tested in
groups) and 1,429 students, individually tested,
for Reading. Reliability coefficients (retest) were
reported as 0.95 for Reading (N=110) and 0.90
for Arithmetic (N=120). The Reading section of
the WRAT was reported to have correlated 0.81 with the
Paragraph and Word Reading sections of the New Stanford
Achievement Test; the WRAT Arithmetic section
correlated 0.91 with Stanford Arithmetic Computation.
The detailed composition of the various sam-
ples was not reported in the 1946 manual, and
the validation data were not specified by age level
as would be required to conform with the evalua-
tive criteria discussed above. This was not ex-
ceptional in 1946, however, when the professional
demands for rigorous reporting of critical infor-
mation by test publishers were less stringent
than they are today.
Nevertheless, despite the absence of com-
prehensive statistical information, the WRAT be-
came a favorite of a large number of clinicians,
and its use was extensive in the United States
and abroad within a short time of its publication.
It may appear surprising that so popular a test
generated so little research. However, it appears
that the principal use of the test was by clinicians
whose attitudes toward tests are usually validated
more by clinical experience than by statistics
and whose opportunities and motivations to con-
duct and publish research are generally limited.
RESEARCH ON THE 1946 WRAT
It is noteworthy that only seven research re-
ports have been found dealing with the 1946 edi-
tion and that of these seven, two were unpublished
mimeographed papers (303 and 306) furnished by
Dr. Jastak. Reliability coefficients and corre-
lations of the WRAT with other tests, abstracted
from these reports and the two test manuals (301
and 302), are reported in tables 4 and 5.
Reading
Hopkins, Dobson, and Oldridge (304) quoted
Sundberg (312), in a 1961 paper, to the effect that
although the WRAT was the second most popular
achievement test in clinics, Sundberg could not
find a single empirical study of it. They adminis-
tered the Reading subtest to 502 children in
grades 1 to 5 and correlated the scores with
teacher ratings and scores on the California
Reading Test (CRT). The correlations with teacher
ratings were high for grades 1 to 5—0.79, 0.74,
0.86, and 0.85, respectively. The correlations
with the total score of the California Reading
Test were 0.86 for grade 3 and 0.71 for grade 5.
The mean grade placements on the WRAT, for
the five grades in order, were 1.4, 2.4, 3.5, 4.1,
and 4.7.
Wagner and McCoy (303) reported correla-
tions of the WRAT Reading subtest with the
Sangren-Woody Silent Reading Test (grade level)
for two samples, one of 29 fifth graders and the
other of 57 primary school juvenile offenders.
The correlations were 0.78 and 0.74. In the first
sample, the WRAT Reading correlated 0.78 with
both teacher ratings and with rank order of mid-
term grades. The correlation with the Stanford
Reading Test, in the second sample, was 0.80.
Table 4. Studies reporting reliability coefficients of the WRAT

Investigator           | Year | Subjects        | Type of reliability | Level | Age range      | Reading N | Reading reliability | Arithmetic N | Arithmetic reliability
Jastak and Bijou (302) | 1946 | Normals¹        | Test-retest         |       | N.R.           | 110       | 0.95                | 120          | 0.90
Jastak (301)           | 1963 | N.R.            | Split-half          | II    | 20+ years      | 200       | 0.99                | 200          | 0.97
                       |      |                 |                     | II    | 18-19 years    | 200       | 0.98                | 200          | 0.97
                       |      |                 |                     | II    | 16-17 years    | 200       | 0.99                | 200          | 0.95
                       |      |                 |                     | II    | 15 years       | 200       | 0.99                | 200          | 0.97
                       |      |                 |                     | II    | 14 years       | 200       | 0.99                | 200          | 0.96
                       |      |                 |                     | II    | 13 years       | 200       | 0.99                | 200          | 0.96
                       |      |                 |                     | II    | 12 years       | 200       | 0.99                | 200          | 0.94
                       |      |                 |                     | I     | 11 years       | 200       | 0.99                | 200          | 0.95
                       |      |                 |                     | I     | 10 years       | 200       | 0.99                | 200          | 0.95
                       |      |                 |                     | I     | 9 years        | 200       | 0.99                | 200          | 0.94
                       |      |                 |                     | I     | 8 years        | 200       | 0.99                | 200          | 0.95
                       |      |                 |                     | I     | 7 years        | 200       | 0.99                | 200          | 0.96
                       |      |                 |                     | I     | 6 years        | 200       | 0.99                | 200          | 0.96
                       |      |                 |                     | I     | 5 years        | 200       | 0.98                | 200          | 0.97
                       |      | Standardization | Form I with Form II |       | 14-0 to 14-11  | 89        | 0.88                | 87           | 0.86
                       |      |                 |                     |       | 13-0 to 13-11  | 224       | 0.90                | 194          | 0.87
                       |      |                 |                     |       | 12-6 to 12-11  | 180       | 0.94                | 165          | 0.85
                       |      |                 |                     |       | 12-0 to 12-5   | 179       | 0.92                | 164          | 0.86
                       |      |                 |                     |       | 11-6 to 11-11  | 252       | 0.91                | 225          | 0.85
                       |      |                 |                     |       | 11-0 to 11-5   | 197       | [illegible]         | 191          | 0.82
                       |      |                 |                     |       | 10-6 to 10-11  | 214       | 0.93                | 195          | 0.89
                       |      |                 |                     |       | 10-0 to 10-5   | 207       | 0.90                | 190          | 0.84
                       |      |                 |                     |       | 9-6 to 9-11    | 165       | 0.91                | 160          | 0.79
                       |      |                 |                     |       | 9-0 to 9-5     | 81        | 0.90                | 78           | 0.88

¹Level of subjects and time interval between tests not reported.
NOTES: All correlation coefficients are Pearson Product-Moment unless otherwise specified.
N.R.—Not reported.
Table 5. Studies reporting correlation between the WRAT and other measures

WRAT Reading Test

Investigator | Year | Test or criterion variable | Subjectsᵃ | Age range | Number: Total | Number: M | Number: F | Correlation
Smith (126) | 1961 | Full Range Picture Vocabulary Test | Normals, Grade 2 | 6-11 - 8-10 | 100 | 51 | 49 | 0.42
Hopkins, Dobson, and Oldridge (304) | 1962 | California Achievement Test | Normals | N.R. | 257 | --- | --- | ---
 | | Reading Vocabulary | Grade 3 | N.R. | 171 | --- | --- | 0.83
 | | | Grade 5 | N.R. | 86 | --- | --- | 0.67
 | | Reading Comprehension | Grade 3 | N.R. | 171 | --- | --- | 0.84
 | | | Grade 5 | N.R. | 86 | --- | --- | 0.67
 | | Total Reading | Grade 3 | N.R. | 171 | --- | --- | 0.86
 | | | Grade 5 | N.R. | 86 | --- | --- | 0.71
Smith (126) | 1961 | California Test of Mental Maturity | Normals, Grade 2 | N.R. | 100 | 51 | 49 | 0.47
Lawson and Avila (305) | 1962 | Gray Standardized Oral Reading Paragraphs Test | Mental defectives | 16-45 years | 30 | 19 | 11 | 0.94
Reger (307) | 1962 | Metropolitan Achievement Tests, Reading | Retarded boys | 9-9 - 14-6 | 25 | --- | --- | ᵇ0.76
Wagner and McCoy (303) | N.R. | Midterm grades | Normals, Grade 5 | N.R. | 29 | --- | --- | 0.78 (rank order)
Jastak and Bijou (302) | 1946 | Stanford Achievement Test, Reading | Normals, Grades 7 and 8 | N.R. | [remainder of row illegible]
 | | Word Meaning | --- | N.R. | 389 | --- | --- | 0.84
 | | Paragraph Meaning | --- | N.R. | 389 | --- | --- | 0.81
Wagner and McCoy (303) | N.R. | Sangren-Woody Reading Test | --- | N.R. | [remainder of row illegible]
 | | | Normals, Grade 5 | N.R. | 29 | --- | --- | 0.78
 | | | Juvenile offenders | N.R. | 57 | --- | --- | 0.74
 | | Stanford Reading Tests | Juvenile offenders | N.R. | 47 | --- | --- | 0.80
 | | Teacher rating of reading ability | Normals, Grade 5 | N.R. | 29 | --- | --- | 0.78
Hopkins, Dobson, and Oldridge (304) | 1962 | Teacher rating of reading ability | Normals | --- | 502 | --- | --- | ---
 | | | Grade 1 | N.R. | 90 | --- | --- | 0.79
 | | | Grade 2 | N.R. | 106 | --- | --- | 0.74
 | | | Grade 3 | N.R. | 171 | --- | --- | 0.86
 | | | Grade 4 | N.R. | 49 | --- | --- | 0.86
 | | | Grade 5 | N.R. | 86 | --- | --- | 0.85
Smith (126) | 1961 | Wechsler Intelligence Scale for Children | Normals, Grade 2 | N.R. | 100 | 51 | 49 | ---
 | | Verbal Score | --- | --- | --- | --- | --- | 0.55
 | | Performance Score | --- | --- | --- | --- | --- | 0.47
 | | Full Score | --- | --- | --- | --- | --- | 0.61

See footnotes at end of table.
Table 5. Studies reporting correlation between the WRAT and other measures (continued)

WRAT Arithmetic Test

Investigator | Year | Test or criterion variable | Subjectsᵃ | Age range | Number: Total | Number: M | Number: F | Correlation
Holowinsky (309) | 1961 | California Reading Test | Normals and retarded | 12-17 years | 600 | --- | --- | 0.61
Murphy (306) | N.R. | First-quarter grades | Normals | --- | 241 | --- | --- | ---
 | | | Grade 5 | N.R. | 135 | --- | --- | 0.64
 | | | Grade 6 | N.R. | 106 | --- | --- | 0.56
Holowinsky (309) | 1961 | Grade placement | Normals and retarded | 12-17 years | 600 | --- | --- | 0.31
Reger (307) | 1962 | Metropolitan Achievement Tests, Arithmetic | Retarded boys | 9-9 - 14-6 | 25 | --- | --- | ᵇ0.87
Jastak and Bijou (302) | 1946 | Stanford Achievement Tests, Arithmetic Computation | Normals, Grades 7 and 8 | N.R. | 140 | --- | --- | 0.91
Holowinsky (309) | 1961 | Otis Quick Scoring Mental Ability Tests | Normals and retarded | 12-17 years | 600 | --- | --- | 0.30
 | | | | 12-13 years | N.R. | --- | --- | 0.59
 | | | | 13-14 years | N.R. | --- | --- | 0.39
 | | | | 14-15 years | N.R. | --- | --- | 0.54
 | | | | 15-16 years | N.R. | --- | --- | 0.02
 | | | | 16-17 years | N.R. | --- | --- | 0.09
Murphy (306) | N.R. | Stanford Achievement Tests, Arithmetic, and school grades | Normals | --- | 241 | --- | --- | ---
 | | | Grade 5 | N.R. | 135 | --- | --- | 0.59
 | | | Grade 6 | N.R. | 106 | --- | --- | 0.35
 | | Stanford Achievement Tests, Arithmetic, and school grades | Normals | --- | 241 | --- | --- | ---
 | | | Grade 5 | N.R. | 135 | --- | --- | 0.75 (Multiple r)
 | | | Grade 6 | N.R. | 106 | --- | --- | 0.70 (Multiple r)

ᵃSubjects are always white Americans unless otherwise specified.
ᵇSpurious correlation with age for small N.
NOTES: All correlation coefficients are Pearson Product-Moment unless otherwise specified.
Total: total population; M: male; F: female; N.R.: not reported; r: correlation.
The report by Lawson and Avila (305) of a
correlation of 0.94 between the WRAT Reading
subtest and the Gray Oral Reading Test, adminis-
tered to a sample of retarded adults ranging
widely in age and IQ, is probably inflated because
of the nature of the sample. Similarly, Reger's
(307) sample of 25 emotionally disturbed, re-
tarded boys (age range 9-9 to 14-6) is also quite
a diverse population. Reger reported a correlation
of 0.76 between the WRAT Reading subtest and
the Metropolitan Achievement Test.
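The inflation suspected in both samples can be illustrated with a first-order partial correlation, which removes the linear effect of a third variable such as age from two measures. The following Python sketch is illustrative only: the function name and the correlations with age are hypothetical, since neither study reported them; only the observed value of 0.76 comes from Reger's report.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0.0:
        raise ValueError("x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denom


# Observed correlation between two reading tests (from the text).
observed = 0.76
# Hypothetical correlation of each test with age in a wide-age sample.
with_age = 0.60

adjusted = partial_correlation(observed, with_age, with_age)
print(round(adjusted, 3))
```

With these assumed inputs the age-adjusted coefficient falls to about 0.625, which shows how a wide age range alone can raise an observed correlation.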
Holowinsky (309) had an apparently well-
designed sample of 600, including 75 children at
each age from 12 to 16 years. Each group was
divided into three categories on the basis of IQ
scores. The categories were as follows: 80-89 IQ,
90-99 IQ, and 100-109 IQ. For the total sample of
600 children, the California Reading Test corre-
lated 0.61 with the WRAT Arithmetic subtest.
Students of lower intellectual ability tended to show
better achievement in arithmetic than in reading.
For the total sample of 600 children the WRAT
had a correlation of 0.31 with grade placement.
These limited results tend to support the
claims for the WRAT with regard to concurrent
validity both with other reading tests and with
grade placement. The evidence is far from suf-
ficient to permit definitive evaluation, and the lack
of information on many points is obvious. However,
no contrary evidence was found and as far as these
papers are concerned, the report for the WRAT
Reading subtest is favorable.
Arithmetic
The most adequate independent study of the
WRAT Arithmetic subtest is that of Murphy (306),
who tested 135 fifth graders and 106 sixth graders (with
average IQ of 114) with the WRAT and the Stan-
ford Achievement Test (SAT). The correlation of
the two tests was 0.59 for grade 5 and 0.35 for
grade 6. The correlations between Arithmetic
grades and the WRAT were 0.64 for grade 5 and
0.56 for grade 6. Correlations between the SAT
and Arithmetic grades were 0.68 for grade 5 and
0.59 for grade 6. In Reger's sample, noted above
(307), the WRAT Arithmetic test had a correlation
of 0.87 with the Metropolitan Achievement Test.
Holowinsky's study mentions a correlation of 0.59
between the IQ scores of 12-year-olds and the
WRAT Arithmetic subtest, as compared with 0.71
for the Reading subtest.
These results are less satisfactory than
those for Reading in the respect that the corre-
lations reported compare less favorably with those
mentioned in the manual. This type of cross-
validation is imperative and demonstrates the
importance of independent reports to supplement
the data provided in a test manual. To Dr.
Jastak's credit, however, it should be noted that
the Murphy report, in which the lower corre-
lations appear, is an unpublished paper which he,
Dr. Jastak, furnished unsolicited for this review.
These studies are insufficient for an evaluation of
the WRAT Arithmetic subtest, to be sure. As the
only information available, they leave the case for
the Arithmetic test without strong independent
support.
1963 EDITION OF WRAT
Two major changes appear in the 1963 edition.
One is the division of the test into two levels.
Level I covers the age range of 5 to 12 years;
Level II covers the age range 12 years through
adulthood. It is pointed out in the mimeographed
manual for this edition that this change not only
has reduced the time of test administration, but
also has increased the number of items at each
level, thereby increasing "the already high
reliability" of the test. Indeed, the test has been
lengthened, and the reliabilities have been listed
for samples of 200 each for ages 5 through 11
years (Level I). For Reading, all—with the ex-
ception of 5 years of age—correlate 0.99. (Age 5
correlates 0.98.) Similarly computed reliabilities
for Arithmetic are listed at or above 0.94, with
the highest correlation, 0.97, occurring at 5 years
of age. Since these coefficients are based on corre-
lations between two forms of the test, they are
considered by the authors to be inflated. The text
of the reliability section of the manual (301, p.
47) states that the reliability coefficients are
more likely within the range 0.90 to 0.95 with a
mean of 0.92. At this level, they do not seem
perceptibly higher than the reliabilities reported
in the 1946 manual.
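The effect of test length on reliability is conventionally estimated with the Spearman-Brown formula. The sketch below assumes that formula applies here (the manual is not quoted as using it), and the lengthening factor of 1.5 is hypothetical; 0.92 is the mean reliability cited from the manual.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when test length is multiplied by length_factor."""
    if length_factor <= 0.0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (
        1.0 + (length_factor - 1.0) * reliability
    )


# Mean reliability cited from the manual, and a hypothetical 50 percent
# increase in the number of items.
predicted = spearman_brown(0.92, 1.5)
print(round(predicted, 3))
```

Under these assumptions the predicted reliability rises only to about 0.945, a modest gain when the starting reliability is already above 0.90.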
The second major change is in method of
standardization. The 1963 manual (301) describes
the development of norms and the normative popu-
lation sample as follows:
The revised WRAT was administered to
school children and adults in a number of
states: Delaware, Pennsylvania, New Jersey,
Maryland, Florida, Washington, and Cali-
fornia. No attempt was made to obtain a
representative national sampling. Nor is
such a sampling considered essential for
proper standardization. (italics added)
The groups of children were selected from
schools of known socioeconomic levels. The
IQ's of the children were also known from
group tests such as the Lorge-Thorndike, the
Kuhlmann-Anderson, and the California Men-
tal Maturity Test, administered at the
schools. Many of the cases (over 1,000) in
the standardization group had been given
individual tests such as the Stanford-Binet,
Wechsler Intelligence Scale for Children,
and others. In each age bracket, probability
samplings based on IQ's were studied to de-
velop WRAT norms that would correspond to
the achievement of mentally average groups
with representative dispersions of scores
above and below the mean. (italics added)
From the standpoint of the Health Exami-
nation Survey, with particular reference to Cycle
II (children aged 6-11 years), the first of the two
mentioned changes is an advantage. The age
range of Level I fits the age range of Cycle II
perfectly, and the increased length of the test
and more extensive reliability studies reported
support the claim of excellent reliability. The
second change, in standardization and norm
development, does, however, present a potential
problem which is accentuated by the absence of
validity data. This is discussed below.
Validity and Norms
Although published in 1963, the validity sec-
tion of the revised WRAT was not available for
review until late in June 1964. The delay was
explained by the author of the test as occasioned
by comparison of the WRAT "with a number of
other tests in order to determine the meaning
and diagnostic value of the three subtests in relation
to other abilities." In addition, his letter
disclosed that "specific methods to identify, in
individual cases, the size of the independent and
separate variances will have to be developed.
Since this is somewhat of a novel and pioneering
venture, it takes more time than routine manual
preparation." The latter quotation is discussed
separately below.
The basis for the present evaluation is, then,
a comparison of the content and structure of the
1946 and 1963 editions of the WRAT, supplemented
by the limited independent literature on the 1946
edition, reviewed above, and the limited data on
the 1963 edition provided in the manual furnished
by the author. No independent studies of the 1963
edition were available.
Comparison of the Two Editions
Examination of the two booklets indicates
close similarity in item content, format, adminis-
tration, and scoring. The Reading test for Level
I, in the revised edition, contains 55 words that
were in the 1946 edition, and the rank-order correlation
of their sequential positions in the two editions is about
0.99. It is presumed that the 20 new words were
empirically calibrated to fit into the previously
established word order. The arithmetic items of
the new test are of the same general type as in
the earlier test, although the format is slightly
different and the number of items is increased.
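A rank-order comparison of this kind can be computed with Spearman's coefficient. The sketch below assumes no tied positions and uses a hypothetical five-word example; the actual word lists and their positions are not reproduced in this review.

```python
from typing import Sequence


def spearman_rho(ranks_a: Sequence[int], ranks_b: Sequence[int]) -> float:
    """Spearman rank correlation for two untied rankings of the same items."""
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("rankings must have equal length of at least 2")
    expected = list(range(1, n + 1))
    if sorted(ranks_a) != expected or sorted(ranks_b) != expected:
        raise ValueError("each ranking must be a permutation of 1..n (no ties)")
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Hypothetical positions of five retained words in each edition.
positions_1946 = [1, 2, 3, 4, 5]
positions_1963 = [1, 3, 2, 4, 5]

rho = spearman_rho(positions_1946, positions_1963)
print(round(rho, 2))
```

A single swap of adjacent words among five gives 0.90; with 55 retained words, a coefficient near 0.99 implies that very few words changed relative position.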
In view of this similarity, it appears reason-
able to expect that the network of correlations of
the revised test with other measures would be
approximately the same as that reported for the
1946 edition. In fact, the correlations might even
be slightly higher as a result of the greater
length of the revision. To the extent that con-
current validity could be accepted for the 1946
edition, therefore, there is no reason to doubt
that it will be upheld with the 1963 edition. Al-
though the data are quite inadequate, tentative
acceptance on this point appears warranted,
based on the authors' reputations and the state-
ments in the manual. However, this is only part
of the problem.
Validation of 1963 Edition
It is equally important to be able to meaning-
fully interpret the grade ratings, standard scores,
and percentiles in relation to individual age and
grade placement and in relation to population
parameters. In the absence of empirical infor-
mation on this issue, nothing definite can be con-
cluded. It is appropriate to raise some questions
which have been generated by statements made in
the 1963 manual.
In the first place, the reviewer would take
issue with the test author's statement that a
representative national sampling is not essential
for proper standardization. A national sample is
certainly necessary if national norms are to be
promulgated. Although the 1946 edition was de-
veloped on a restricted (as opposed to national)
sample, its norms were presumably keyed to the
grade norms of the New Stanford Achievement
Test, for which a more extensive base existed.
Even though regional, ethnic, and other perturbing
effects were not known, it was at least possible
to invoke the Stanford norms in interpreting grade
levels. With the 1963 edition, however, no such
anchoring process was followed. The only indi-
cations concerning age-grade levels are, in fact,
disquieting.
The manual goes on to say that intelligence
quotients of a number of group and individual
tests (which are generally known to vary in level
among themselves) were used to select samples
in each age bracket "that would correspond to the
achievement of mentally average groups with
representative dispersions of scores above and
below the mean." (italics added) It would indeed
be remarkable if such a procedure could produce
a standard reference sample of known character-
istics for normative purposes. Therefore it is
doubtful that the resulting norms could have de-
pendable accuracy for individual assessment or
for analysis of groups in the manner required
for the national sample of the Health Examination
Survey. Perhaps the test author's current con-
cern with comparisons with other tests, referred
to above, reflects realization of this problem.
Furthermore, in view of the professed clinical
purposes of the WRAT, it is surprising that the
standardization research is confined to "mentally
average groups," and that no studies were undertaken
of such groups as gifted pupils, students
retarded in reading, arithmetic, and other school
subjects, disturbed children, and subnormal chil-
dren.
For the purposes of a national survey, prob-
lems of ethnic and regional variations in test
performance are important, as are other sources
of perturbation attributable to deviations of abili-
ty, personality, and physical and social factors.
The absence of such data for the 1963 WRAT is
certainly not the sole responsibility of the author-publisher;
ordinarily test producers do not assume
responsibility for all possible research of interest
to all possible users. If a test attracts interest,
information about it in various situations gradu-
ally accumulates in the literature. However, in
the present case it appears fair to say that the
author's confidence in his test led him to publish
the revision before he had completed his own
research and before research on it by any users
could be reported. The test was issued without
a formal designation of the norms as "tentative"
and without any qualifications.
Validity Variances
Instead, the 1963 manual (301, p. 2) concludes
its introductory section with the following para-
graph:
In addition to the three operational aspects
(of mechanics and comprehension in relation
to each skill test) the basic skills have sever-
al unique validities which will be explained
later by reference to appropriate research.
The validity variances will not only support
the empirical distinctness of mechanics and
comprehension, but will provide the degrees
to which each is important in learning to
read, spell and figure and the impact the
relationship between them has on the total
learning process.
The burden of proof is on the author. The
development of such an analytic scheme for inter-
pretation of test scores is indeed both novel and
ambitious and deserves all the time required to
complete it. It seems regrettable, however, that
the test was released before critical users could
evaluate not only these devices, but even the grade
ratings, percentiles, and standard scores included
in the manual.
Validity Data in 1963 Manual
The section of the manual entitled "Validity
of the WRAT" (301, p. 51) contains a table of
means and standard deviations of raw scores for
the Reading, Spelling, and Arithmetic subtests,
which indicates considerable need for refine-
ment of the tests in order to produce an even
progression of scores from grade to grade. The
difficulties are considerable at some levels (8.0
to 8.5, 9.5 to 10.0, and 10.5 to 11.0, on the Read-
ing test, for example), to say nothing of the fact
that the basic difficulties reported about the
standardization sample are not only not clarified,
but are not even referred to in this section of the
manual.
Two paragraphs on the validity of the Read-
ing test (301, p. 50) refer only to the studies
cited above, which involve the 1946 edition of the
WRAT. No validity data on the 1963 edition are
presented. Similarly, data are presented (301,
p. 52) on correlations of the WRAT with achieve-
ment tests and on the validity of the Arithmetic
subtest, but these are also identified as relating
to the 1946 edition.
Internal consistency data cited by the author
(301, p. 53) involve intercorrelations among the
three WRAT subtests and not validity, despite the
author's assertion that "criteria of internal con-
sistency, if properly interpreted, are usually
more valid than are external criteria of comparison."
These data are also presented as "one
method of cross-validation."
Correlations of the Wide Range Achievement
Test with the California Test of Mental Maturity
are given (301, p. 54) for a sample of 74 children
spanning the age range of 5 to 15 years. They
range from 0.74 to 0.84 and may be spuriously
high in view of the heterogeneity of the sample.
Similarly structured comparisons with the WISC
for 300 boys (aged 5 to 15 years) and 244 girls
(aged 5 to 15 years) are reported which indicate
correlations as follows:
Sex and test | Reading | Arithmetic
Boys
  Vocabulary¹ | 0.65 | 0.56
  Block Design | 0.41 | 0.41
Girls
  Vocabulary¹ | 0.56 | 0.56
  Block Design | 0.39 | 0.50

¹Based on Jastak's short-form revision (311).
In view of the composition of the sample, these are
surprisingly low.
The manual also reports (301, p. 55) cor-
relations of WISC Verbal Scale, Performance
Scale, and Full Scale with the WRAT (1963), with
samples covering narrower age ranges of 5 to 7
years and 8 through 11 years. The results here
are the most impressive concurrent validity data
in the manual, although they indicate correlations
in the 0.6 to 0.7 range with intelligence rather than
achievement criteria, for which they are intended.
As stated several times earlier, the accuracy
of score levels in the WRAT norms is regarded as
a more pressing problem for empirical demon-
stration than the concurrent validity (covariation
with related measures) of the test. On this point
the validity section of the manual is silent.
Grade Equivalents
The 1963 manual (301, p. 22) states that grade
norms were derived from "the actual mean grade
levels of the children in each grade group." Despite
variations in school grade-placement practices
over time, grade rating is characterized as
"rather stable." The manual further asserts
"striking comparability of grade ratings of the
old and the new WRAT's through nearly all educational
levels except the upper ranges." Grade
ratings below 14 years of age are said to be less
arbitrary than grade ratings over 14 years of age.
The grade scores are intended to be comparable to
mental ages.
Standard Scores
The WRAT standard scores can be converted
from raw scores by age group in a table provided
in the manual. The standard score has a mean of
100 and a standard deviation of 15 and is intended
to be equivalent to an IQ from the WAIS, WISC,
Stanford-Binet (Form L-M) or any of the major
intelligence scales. Although these scales are not
comparable themselves (as developed in some
detail in section I of this report), the manual states
that "the results from the WRAT test can thus be
directly compared with the major individual in-
telligence scales."
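A deviation-type standard score of this form rescales a raw score against the mean and standard deviation of the examinee's age group. The sketch below is an assumption about the underlying arithmetic only: the manual supplies a conversion table rather than a formula, and the age-group mean and standard deviation used here are hypothetical.

```python
def standard_score(
    raw: float,
    group_mean: float,
    group_sd: float,
    scale_mean: float = 100.0,
    scale_sd: float = 15.0,
) -> float:
    """Rescale a raw score to a scale with the given mean and SD."""
    if group_sd <= 0.0:
        raise ValueError("group_sd must be positive")
    z = (raw - group_mean) / group_sd
    return scale_mean + scale_sd * z


# Hypothetical age-group norms: mean raw score 36, standard deviation 8.
example = standard_score(raw=42.0, group_mean=36.0, group_sd=8.0)
print(example)
```

A raw score three-quarters of a standard deviation above the hypothetical group mean converts to 111.25. Two instruments that share this scale are comparable only if their reference groups are comparable, which is the point at issue in the text.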
The standard score is asserted to be the
"most precise and most meaningful score." It is
the only score that is comparable between sub-
tests and that provides for uniform differences
between scores.
Percentiles
Percentiles are included "because of their
present popularity and convenience," but the
manual appropriately downgrades them and dis-
courages their use.
SUMMARY AND CONCLUSIONS
The foregoing review of the WRAT is neces-
sarily incomplete because of lack of adequate
information on which to base a technical evalua-
tion. The test is well conceptualized and has much
face validity, but standardization information on
the 1946 edition was inadequate, and on the 1963
edition it is thus far insufficient.
Published research on the 1946 WRAT has
been extremely limited and fails to answer most
of the questions left unanswered by the authors’
manual. Moreover, analysis of the available in-
formation on the 1963 edition raises doubts about
normative score levels.
The selection of the WRAT over other avail-
able school achievement tests may be defended on
the grounds of administrative expediency and
suitability of the material for the purposes of
the Survey, in spite of the fact that inadequate
data exist to support the author's claims of va-
lidity. It is possible that such data may be pro-
duced, and every effort should be made to obtain
them. However, unless these results are con-
vincing—and reason to doubt that they will be
has been expressed—it is recommended that
serious consideration be given to carrying out a
complete restandardization of the Reading and
Arithmetic subtests on the entire national sample.
Unless this is done, projections of estimates to
population may be seriously in error.
BIBLIOGRAPHY
Research References and Manuals
301. Jastak, J. F.: Wide Range Achievement Test, rev. ed.
Wilmington, Del. Guidance Associates, 1963.
302. Jastak, J. F., and Bijou, S. W.: The Wide Range Achievement
Test. Wilmington, Del. C. L. Story Co., 1946.
303. Wagner, R. F., and McCoy, F.: Two validity studies of
the Wide Range Achievement Reading Test. Personal
communication.
304. Hopkins, K. D., Dobson, J. C., and Oldridge, O. A.: The
concurrent and congruent validities of the Wide Range
Achievement Test. Educ.Psychol.Measur. 22:791-793,
1962.
305. Lawson, J. R., and Avila, D.: Comparison of Wide Range
Achievement Test and Gray Oral Reading paragraphs
reading scores of mentally retarded adults. Percept.Mot.
Skills 14:474, 1962.
306. Murphy, G. M.: An investigation of the utility of mathe-
matics sub-test from the Wide Range Achievement Test,
as applied to intermediate level groups. Personal com-
munication.
307. Reger, R.: Brief tests of intelligence and academic
achievement. Psychol.Rep. 11:82, 1962.
308. Warren, S. A.: Academic achievement of trainable pupils
with five or more years of schooling. Train.Sch.Bull.
60:75-88, 1963.
309. Holowinsky, I.: The relationship between intelligence
(80-110 I.Q.) and achievement in basic educational
skills. Train.Sch.Bull. 58:14-22, 1961.
Other References
310. Chauncey, H., and Dobbin, J. E.: Testing, Its Place in
Education Today. New York. Harper and Row, 1963.
311. Jastak, J. F., and Jastak, S. R.: Short forms of the WAIS
and WISC vocabulary subtests. J.Clin.Psychol. 20:167-
199, 1964.
312. Sundberg, N. D.: The practice of psychological testing
in clinical services in the United States. Am.Psycholo-
gist 16:79-83, 1961.
ll. THE GOODENOUGH DRAW-A-MAN TEST
BACKGROUND AND DEVELOPMENT
A comprehensive historical survey of the
study of children's drawings appeared recently
in an important new book by Dale B. Harris (522),
a former colleague of Florence Goodenough and
apparent successor to her in the leadership role
in the measurement of children's intelligence by
point scales based on drawings of the human
figure. The present review does not duplicate
Harris' scholarly survey, but focuses more
specifically on the problems of the Goodenough
Test as used in the Health Examination Survey.
The first formal intelligence test based on
the analysis of children's drawings was published
by Florence Goodenough (595) in 1926, but the
literature on this subject goes back at least to
1885 (595, ch. I). Some of the early papers are
summarized in this study, but the major emphasis
has been placed on recent critical research on the
Draw-A-Man Test and its variants. Nevertheless,
it is of interest that in 1893 Herrick (501) demon-
strated the developmental significance of profile
drawings and that in the same year Barnes (502)
recognized that drawings are used by young chil-
dren as a means of expressing their ideas. Mean-
while, Lukens (503), in 1896, outlined many details
of human figure drawings which were later in-
corporated in the point-scoring systems of Good-
enough (595) and of Harris (522).
The Goodenough Test is referred to in this
discussion as the Draw-A-Man Test although the
specific instructions in Cycle II of the Survey are
to "make a picture of a person." However, the
instructions go on to state that "when a bust picture
has been drawn intentionally, the child is given
another sheet of paper with the instruction 'Now
make a picture of a whole person.'" Only one pic-
ture is used.
Rationale
In this procedure emphasis is placed on the
representation of details in the drawing to measure
conceptual maturity. Drawing technique is minimized,
and distortions potentially usable as cues
for personality evaluation are not scored. Recent
drawing tests focused on personality study have
used two or more drawings. For example, Machover
(596) instructs the subject to "draw a person"
and then to draw a person of the sex opposite to
the one previously drawn, while Buck (594) uses
drawings of a house, a tree, and a person. In
general, the cues and signs interpreted in person-
ality study of drawings are different from those
employed for the measurement of intelligence.
Point-Scoring System
The point system developed by Goodenough
(595) for drawings which can be recognized as
attempts to represent the human figure—no matter
how crude—involves the presence or absence of
51 detailed points, which are listed as follows:
1-4a Head, legs, arms, trunk present
4b Length of trunk greater than breadth
4c Shoulders definitely indicated
5a Attachment of arms and legs
5b Legs attached to trunk; arms attached to
trunk at correct point
6a Neck present
6b Outline of neck continuous with that of
the head, of trunk, or both
7a-c Eyes, nose, mouth present
7d Both nose and mouth shown in two dimensions;
two lips shown
7e Nostrils shown
8a Hair shown
8b Hair on more than circumference of head;
nontransparent
9a Clothing present
9b At least two clothing items nontransparent
9c Entire drawing free from transparencies
of any sort; sleeves and trousers shown
9d At least four clothing items definitely
indicated
9e Costume complete without incongruities
10a Fingers present
10b Correct number of fingers shown
10c Detail of fingers correct
10d Opposition of thumb shown
10e Hand shown as distinct from fingers or
arm
11a Arm joint shown (elbow, shoulder, or
both)
11b Leg joint shown (knee, hip, or both)
12a-e Proportion: head, arms, legs, feet, two
dimensions
13 Heel shown
14a-f Motor coordination
a Lines reasonably firm and joining usually
accurate
b Increased firmness of lines and increased
accuracy of line junctions
c Head outline free from unintentional ir-
regularity
d Trunk outline free from unintentional ir-
regularity
e Arms and legs without irregularities,
narrowing at point of body junction
f Features symmetrical
15a Ears present
15b Ears in correct position and proportion
16a-d Eye detail, brow, lashes, or both shown;
pupil shown; proportion; glance
17a Both chin and forehead shown
17b Projection of chin shown; chin clearly
differentiated from lower lip
18a-b Profile drawings
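The scoring rule implied by this list is a simple tally: one point for each item credited as present. The Python sketch below encodes the item labels as read from the list above and confirms that they total 51; the expansion of ranges such as 12a-e into single items, the function names, and the example drawing are assumptions made for illustration, not part of Goodenough's published procedure.

```python
from typing import Iterable, List


def _expand(number: int, letters: str) -> List[str]:
    """Turn (12, "abcde") into ["12a", "12b", "12c", "12d", "12e"]."""
    return [f"{number}{letter}" for letter in letters]


# Item labels as read from the list: 1-4a is taken as items 1, 2, 3, and 4a.
GOODENOUGH_ITEMS: List[str] = (
    ["1", "2", "3"]
    + _expand(4, "abc")
    + _expand(5, "ab")
    + _expand(6, "ab")
    + _expand(7, "abcde")
    + _expand(8, "ab")
    + _expand(9, "abcde")
    + _expand(10, "abcde")
    + _expand(11, "ab")
    + _expand(12, "abcde")
    + ["13"]
    + _expand(14, "abcdef")
    + _expand(15, "ab")
    + _expand(16, "abcd")
    + _expand(17, "ab")
    + _expand(18, "ab")
)


def point_score(credited: Iterable[str]) -> int:
    """Count the distinct credited items, rejecting unknown labels."""
    credited_set = set(credited)
    unknown = credited_set - set(GOODENOUGH_ITEMS)
    if unknown:
        raise ValueError(f"unknown item labels: {sorted(unknown)}")
    return len(credited_set)


# Hypothetical drawing: head, legs, arms, trunk, eyes, nose, mouth, fingers.
example_drawing = ["1", "2", "3", "4a", "7a", "7b", "7c", "10a"]
total = point_score(example_drawing)
print(len(GOODENOUGH_ITEMS), total)
```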
Standardization
In Goodenough's original research, point
scores based on these items were equated to age
norms from which intelligence quotients could be
computed in the same manner as in the Stanford-
Binet test. Data on reliability and validity were
reported in the 1926 book (595) and also in a
monograph (504) published the same year. Using
a basic standardization sample of 5,627 school
children from kindergarten to the sixth grade aged
4 to 12 years, split-half and retest reliabilities
were computed. A split-half reliability of 0.77
(corrected) was found to be constant from 5 to 10
years of age, and a retest reliability coefficient
of 0.94 was reported for 194 first-grade children.
Correlations with Stanford-Binet were 0.76 for
mental ages and 0.74 for intelligence quotients.
The experimental work, analysis, and reporting
which characterized this undertaking would be
regarded as impressive today, and the critical
reader of Goodenough's book can well appreciate
Lewis M. Terman's description of it (in the foreword)
as "a notable accomplishment."
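The quotient computed in the manner of the early Stanford-Binet is the ratio IQ: mental age divided by chronological age, multiplied by 100. In the sketch below the conversion from point score to mental age is omitted because the norm table is not reproduced in this review; the function name and the ages are hypothetical.

```python
def ratio_iq(mental_age_months: float, chronological_age_months: float) -> float:
    """Ratio IQ: 100 times mental age over chronological age."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * mental_age_months / chronological_age_months


# Hypothetical child aged 8 years 0 months whose drawing score corresponds,
# by some norm table, to a mental age of 8 years 6 months.
iq = ratio_iq(mental_age_months=8 * 12 + 6, chronological_age_months=8 * 12)
print(iq)
```

For this hypothetical child the quotient is 106.25. Because the quotient depends entirely on the age norms, any drift in the norms over time shifts every IQ computed from them.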
Perspective
In 1950, a quarter of a century after the pub-
lication of her book, Goodenough collaborated with
Dale Harris in a review (510) of the extensive literature
generated by her test. This review was
critical of many studies of graphic expression
that lacked quantification, but it acknowledged the
value of drawings used projectively as a source
of diagnostic cues. Goodenough and Harris made
special note of some writers' attempts to attribute
discrepancies between the Draw-A-Man Test and
the Stanford-Binet (in which Draw-A-Man IQ's
are markedly lower) as possible diagnostic cues
of emotional or nervous instability or of brain
damage. They also cautioned about the use of the
Draw-A-Man Test in cross-cultural comparisons,
pointing out that the Draw-A-Man is not a culture-free
test, as many users have incorrectly assumed.
This point is most dramatically illustrated
by the Near Eastern study of Dennis (555).
In the Fourth Mental Measurement Year-
book, 1953, Stewart (514), while presenting a
very favorable evaluation, suggested that the
Goodenough norms might require revision due to
social changes which have occurred since the
original standardization. Such a revision was
apparently justified, and the new Goodenough-
Harris Drawing Test (552), published in 1963,
fills an important need. This modified procedure
consists of three drawings: a man, a woman,
and "yourself." Separate point scales are pro-
vided for drawings of men and drawings of women;
separate norms are also provided for drawings
made by boys (men) and drawings made by girls
(women).
An empirical study on a sample of 195 draw-
ings taken from the Health Examination Survey
population, in which the Harris scoring and norms
were compared with the original Goodenough
scoring and norms, is reported below. This study
supports a recommendation that the Harris revi-
sion be adopted for scoring the Goodenough test in
this Survey.
EVALUATION OF INTELLIGENCE
BY HUMAN FIGURE DRAWINGS
Effective Range
Barnes' (502) early observation that children
draw candidly up to about 14 years of age and
then more abstractly is supported by Barnhart
(507), who described three types of drawings—
schematic (graphic representation), predominat-
ing in the age range 5 to 9 years; mixed, in the
range 8 to 13 years; and visual realistic (abstract-
ed, esthetic, nonspecific as to factual details),
principally in the range 10 to 16 years. This
apparently explains why the point scores cannot
be validly extended above 14 years of age (522).
The increase in point scores with age, up to
14 years of age, apparently reflects mental matur-
ity and not chronological age. This was noted by
Smith (506) and by McElwee (524), who reported
a correlation of 0.72 between the Draw-A-Man
and the Stanford-Binet mental ages for a sample
of 45 subnormal 14-year-old children. Israelite
(562) found a correlation of 0.71 between the
Draw-A-Man and the Stanford-Binet for 256 men-
tal defectives. Others have also successfully
tested mentally defective adults with the Draw-A-
Man Test.
Relation to Artistic Ability
An area of special interest in the interpreta-
tion of children's drawings has been the relation
of drawing "maturity," as reflected in point score,
and artistic ability. Goodenough acknowledged that
drawings could be influenced by special coaching
(as can most human responses) but that ordinary
art instruction in school has little effect on the
Draw-A-Man score. She reported a correlation
of 0.44 between the Draw-A-Man and teacher
ratings of drawing ability (504).
Perturbing Factors
Intelligence scores based on drawings are
relatively independent of artistic ability. However,
there is evidence that both internal factors, such
as health, emotions, and attitudes, and external
environmental factors affect the drawing content.
In the present review, studies have been found
which demonstrate the influence on drawings of
factors such as height and weight (543), sex and
body image (512, 537-539, and 541), physical
handicaps (571 and 572), mental age (521), affec-
tive states experienced and experimentally in-
duced (529, 530, and 532), institutionalization
(540), teacher attitude (533), sociometric popu-
larity (534), social acceptance (531), and social
class (536).
Although size of drawings appears to increase
with mental age over the effective range of the
Draw-A-Man, size standards have not been incor-
porated in any of the published point scores. In
general, the studies referred to in the preceding
paragraph may be viewed as minor perturbing
influences within a homogeneous cultural frame-
work. Variability among drawings attributable to
perturbing factors of the types enumerated within
the social boundaries of the American culture
appears to have significance for the study of
personality and social behavior, but it does not
appear to influence measures of intelligence de-
rived from children's drawings in the age range
5 to 12 years.
Culture
The factors which influence children's draw-
ings of the human figure most are those that re-
flect the effects of a culture's customs and
values, since these determine the way in which
children are exposed to different representations
of the human figure in dress, art, photographs,
religious practices, and sex roles and attitudes.
Hunkin (554) found the Goodenough norms inap-
plicable to Bantu school children, and Dennis
(555) attributed the steady decline in mean Draw-
A-Man IQ from 5 to 10 years of age (among
Egyptian and Lebanese children in the Near East)
to the Arab culture, which restricts access to
representations of the human figure. Studies of
the Draw-A-Man with children of various Ameri-
can Indian tribes on reservations (558-560) have
produced varying results which may perhaps be
understood only in the context of their respective
culture patterns.
On the other hand, Anastasi and DeJesus
(556) found sex differences in agreement with
Harris, discussed below, but found no ethnic dif-
ferences in a comparison of Draw-A-Man scores
of 50 Puerto Rican children of low socioeconomic
class in New York City with those of Negro and
white children of similar status which were re-
ported by other investigators. Similarly, Levinson
(243) found that the Draw-A-Man, as well as WISC
Block Design, is culturally "fair" for native-born
Jewish bilingual children in New York City.
The importance of taking into account cultural
variations when dealing with a heterogeneous pop-
ulation such as that sampled by the Health Exami-
nation Survey is illustrated by the following quota-
tions from Harris (522, pp. 131 and 132). These
quotations have been excerpted to illustrate how the
customary dress of Eskimo children affects point
scores on drawings of the human figure.
Eskimo children are less likely to depict the
neck, the ears, and to correctly place the
ears. These facts seem to reflect the greater
prevalence of parkas in the Eskimo group's
drawings and [this] is thus an artifact of the
drawing situation. Due to the voluminous
parka garments, elbow joints, knee joints and
modeling of the hips are less likely [to be]
shown, resulting in greater stiffness of fig-
ures portrayed.
Since the Eskimo boot does not have a heel,
Eskimo children are less likely to indicate
heels in their drawings. [Several instances],
however, show that when the garb is appro-
priate, the heel is shown. The children do
have the concept of heels; their drawings are
quite appropriate to the type of figure they
are representing at the time. Eskimo chil-
dren are also less likely to portray the arm
and shoulder performing some type of move-
ment, probably due to the loose parka, though
this is not invariably the case.
On the other hand, Eskimo children are more
likely to portray with exactness the nostrils,
the bridge of the nose, and, when portrayed
at all, the thumb or fingers. The character-
istic tendency of the Eskimo children to show
a mittened hand earns for them a greater
credit on the thumb opposition point and on
the hand as distinct from fingers or arm in
the age group ten to thirteen inclusive. In
this age group also the Eskimo is more
likely to draw the arms down at the side
than held out stiffly from the body. The Es-
kimo child is more likely to show the feet
with a wide stance, that is, with toes pointing
apart, or in perspective in either full-face
or profile drawings. The Eskimo drawings
include fewer transparencies in these age
groups, and a larger percentage of them earn
credit for showing a distinct costume, which
of course follows from the tendency to draw
the parka—the everyday costume in this part
of Alaska.
Aspects of the Eskimo drawings that are distinctive
and that are not apparent in the detailed
scoring technique of the Goodenough
method include: a greater emphasis on the
eyebrow, on the nostrils and nose (as in-
dicated above), and on general detail of facial
features. There is some evidence of a general
decrease in quality of the drawing in adolescence.
This is not sufficiently great, however,
to reveal itself markedly in the trend of
median scores as in the normative group. It
is most noticeable in the increased tendency
to draw the facial features and hands "sketchily."
Particularly among young Eskimo children
there is a very distinct tendency to draw
shorter arms and legs than in the norm group.
Here again there is the possibility that the
proportions of the body are distorted some-
what by so many children depicting the fig-
ures in parkas.
Cultural factors influence drawings in many
obvious ways such as type of garb, vehicles, im-
plements, and actions portrayed, but the nature
of the influence on a Goodenough-type point score
is subtle, as illustrated in the preceding quota-
tions from Harris. Because such variations are
often inconsequential within the mainstream of
American culture, there has been a wide tempta-
tion to use the Draw-A-Man as a culture-free
intelligence test. Nevertheless, as Harris prop-
erly insisted (522, p. 133), "the data . . . suggest
that the child's drawing of certain body features
or parts is influenced by garb, and possibly by
other conditions of living that call attention to
particular parts or their functions. Allowance
would have to be made, both in scoring and in
the norms, for parts omitted in one of these
cultures included in the present scoring system.
Such allowance would have to be worked out em-
pirically within each culture group." (italics
added)
Goodenough and Harris (510), in their 1950
review, affirmed that although the test may be
unsuited to comparing children across cultures,
it may still rank children within a culture accord-
ing to relative intellectual maturity. In his 1963
publication (522, p. 133) Harris has further amend-
ed this position to state that "for the most valid
results, the points of the scale should be re-
standardized for every group having a distinctly
different pattern of dress, mode of living, and
quality or level of academic education." In Harris'
judgment, "This conclusion virtually rules out the
scale for cross-cultural comparisons; indeed,
psychologists increasingly believe that mean dif-
ferences among large, representative samples
drawn from varying cultures express the gross
differences in conceptual experience and training
these groups have had. Further work, to determine
exactly which aspects of intellectual or conceptual
maturity the drawing task expresses, will be
necessary to explain scientifically these observed
cultural differences."
No systematic research such as Harris delineated
with respect to Eskimo children has been
done on the detailed effects of microvariations
within the American culture. Yet there is little
reason to doubt that subtle differences between
urban and rural, industrial and suburban, warm
climate and cold, eastern and western, and other
prominent contrasting situations within the con-
tinental United States (to say nothing of Alaska
and Hawaii) might produce some significant
variations. Undoubtedly, some of these subcul-
tural variations reflect ethnic factors, such as
the superstitious reluctance of some southwestern
children of Mexican origin to draw eyes because
of fear of the "evil eye."
It is also possible that secular trends, which
are revealed in the comparison of the 1926 and
1963 norms, may be occurring at differential
rates in different localities and segments of the
culture and that these also may subtly affect
point scores. For example, the high-fashion
announcements of transparent garments for fe-
males not only aroused different reactions among
different segments of the population but also re-
ceived widely varying prominence in different
localities. Although this is an extreme example,
it is nevertheless possible that some children
might draw the female figure appropriately re-
flecting a sophisticated transparent garment and
be penalized on the point score for what could be
considered a "bright" response.
Sex Differences
Both Goodenough (504) and Harris (522) have
reported qualitative and quantitative differences
in drawings which are related to the sex of the
person doing the drawing. Harris' more recent
work is of greater relevance. He believes that
these sex differences cannot be attributed to dif-
ferential selection of boys and girls according
to intellect. Harris' recent data show that sex
differences in total point scores appear at an
early age and are considerably greater than those
reported by Goodenough. Harris found that for the
drawing of a man, the mean score difference favors
girls by about one-half year of growth at each year
of age, while for the drawing of a woman, this
difference is roughly equal to a full year of growth.
The Harris point scale, applied differentially to
Man and Woman drawings by boys and by girls,
appears to reduce mean differences.
Sex differences in drawing point scores re-
flect differences in maturation, cultural factors—
including sex role and awareness—and perhaps
some degree of difference in drawing proficiency.
However, it is believed that these will be mini-
mized by the adoption of the Harris norms and
scoring system and that the remaining residual
error probably will be inconsequential. Without
doubt, the error will be smaller than that which
would result from the blanket use of one uniform
scoring system for the entire population.
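The effect of scoring against separate norms can be sketched as follows. The norm means and standard deviations are hypothetical placeholders, not values from the Harris manual, and the conversion to a scale with mean 100 and standard deviation 15 is assumed here purely for illustration.

```python
from typing import Dict, Tuple

# HYPOTHETICAL norms keyed by (drawing, sex of child, age in years);
# each entry is (mean raw point score, standard deviation). Placeholder values.
NORMS: Dict[Tuple[str, str, int], Tuple[float, float]] = {
    ("man", "boy", 8): (26.0, 7.0),
    ("man", "girl", 8): (28.0, 7.0),
}


def standard_score(raw: float, drawing: str, sex: str, age: int,
                   scale_mean: float = 100.0, scale_sd: float = 15.0) -> float:
    """Express a raw point score relative to the norm group for the child's own sex."""
    norm_mean, norm_sd = NORMS[(drawing, sex, age)]
    return scale_mean + scale_sd * (raw - norm_mean) / norm_sd


# The same raw score of 28 is average for a girl but above average for a boy
# under these assumed norms.
print(standard_score(28, "man", "girl", 8))
print(round(standard_score(28, "man", "boy", 8), 1))
```

A single uniform norm table would assign both children the same derived score, which is the source of the error that separate norms are meant to reduce.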
PERSONALITY STUDY
BY CHILDREN'S DRAWINGS
Although personality evaluation is not the
primary reason for including the Draw-A-Man
Test in the Survey, a review of the potentialities
for such analysis is relevant. Since this topic has
been covered more extensively by Harris in his
recent publication than in this review, the following
discussion is organized in relation to Harris'
summary. Below are eight widely accepted but not
necessarily established generalizations concerning
personality measurement by children's drawings.
These were evaluated by Harris in his recent
book (522, p. 52). As will be noted, several of the
generalizations are rejected.
1. Drawing interpretation is more valid when
based on a series of a subject's protocols
than when based on one drawing. Despite
the lack of clear-cut empirical evidence
on this issue, Harris equates additional
pictures as having the effect of increasing
the length and therefore the reliability of
the test. From this logical viewpoint, he
considers it justified.
2. Drawings are most useful for psychological
analysis when teamed with other available
information about the child. This, too,
is a logically sound principle, "especially
when it is the content of drawings alone
that is being used for psychological
interpretation."
3. Free drawings are more meaningful psychologically
than drawings of assigned
topics. This is probably true for certain
purposes, such as exploration of interests,
but systematic comparison of individuals,
as in a national survey, requires control
of the task.
4. When a human figure drawing is assigned,
the sex of the figure first drawn relates
to the image the drawer holds of his own
sex role. Of the studies summarized in
Appendix III, those most relevant to the
study of children ages 6 to 12 years are
as follows: 512, 537-539, 541, and 542.
According to Brown and Tolor (541), normal
individuals of both sexes tend to draw
their own sex first, while persons with
behavior disorders draw the opposite sex
first. Harris agrees that most children of
either sex will draw their own sex first
when asked to "draw a person." He further
elaborates that as girls grow older there
is an increasing tendency for them to draw
a male figure. This, he feels, reflects both
the cultural preference given to the male
role and an increasing dissatisfaction with
the female role.
Harris also hypothesizes that the male
figure is more culturally stereotyped and
easier to draw than is the female figure.
He considers deviates from this norm to
be psychologically different from non-
deviates. He also feels that the deviation
has different meanings for the two sexes
and has unique, idiosyncratic meanings
to individuals. Since many deviations from
the norm occur and since the meaning of
such deviations is as yet unknown, it is
unlikely that the principle (the figure
drawn first relates to the image the
drawer holds of his own sex role) is uni-
versally valid. Therefore, even though
about 86 percent of boys and 65 percent
of girls have been reported to draw their
own sex first, it is not possible to for-
mulate any reliable interpretation for
those who do not.
5. A child adopts a schema or style of drawing
which is peculiar to him and which becomes
highly significant psychologically.
Most of the evidence is opposed to this and
suggests rather that developmental pat-
terns do exist among children's drawings.
6. The manner in which certain elements are
portrayed in drawings may be used as
signs of certain psychological states or
conditions in the artist. In agreement with
Harris, the present writer regards this
statement as one of the eternal, unful-
filled wishful myths of the "depth psychol-
ogist." Two particular statements by
Harris are relevant to possible further
research in this frustrating area. First,
"whether or not 'signs' are selected by an
empirical or deductive procedure, there
is still the question whether form or con-
tent will provide the cues. Size, quality
or texture of line, degree of angularity,
pattern or shape, and placement on the
page are often thought to be highly signifi-
cant avenues for 'projecting' unconscious
motives or needs." References 512, 521,
537, 540, 543, 564, and 566 support this
view, but neither form nor content signs
of unequivocal value have thus far been
validated. Thus, Harris' second statement,
that "useful and valid signs leading
to dependable conclusions are, for the
most part, still to be ascertained," dis-
poses of this generalization.
7. Drawings must be interpreted as wholes
rather than segmentally or analytically.
This, too, has been a strong sentimental
favorite, but the evidence is mostly the
other way, particularly in personality
assessment. In fact, the history of psychometric
progress has been away from
global analysis toward specific analysis,
has favored linear over curvilinear rela-
tions, and generally has demonstrated that
quantitative procedures are more valid,
even if less spectacular, than those based
on scorer judgment.
Harris has cited analytic studies of com-
ponent qualities of children's drawings,
by Martin and Damrin and by Stewart
(522, p. 56), which suggest that "drawings
are actually appraised in terms of a few
general dimensions, although they may be
rated on a number of specifically defined
elements or qualities." Harris believes
that these studies lend credence to the
belief that broad, dimensional evaluations
(rather than highly particularistic ones),
based on such analytic results, may be
made more readily and more reliably. He
also believes that they suggest the direc-
tion these quantitatively and factorially
defined "global" ratings may take. "Their
findings in relation to personality quali-
ties, however, are not of such magnitude as
to support the use of drawings in diagnosing
individual cases."
8. The use of color in drawings can be significant
for studying personality. This is
another popular clinical belief, on which
the empirical evidence is equivocal.
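Harris' argument under generalization 1, that additional drawings act like a longer test, is the reasoning formalized by the Spearman-Brown formula, which also underlies the corrected split-half coefficients cited in this review. The Python sketch below is illustrative only: the function names and the single-drawing reliability of 0.70 are assumptions, and the back-calculation from 0.77 assumes the usual two-half correction was applied.

```python
def spearman_brown(r: float, k: float = 2.0) -> float:
    """Predicted reliability when a test of reliability r is lengthened k-fold."""
    return k * r / (1.0 + (k - 1.0) * r)


def half_test_correlation(r_corrected: float) -> float:
    """Invert the k = 2 case: the half-test correlation implied by a corrected split-half value."""
    return r_corrected / (2.0 - r_corrected)


# Half-test correlation implied by the corrected split-half reliability of 0.77
# (assuming the standard two-half Spearman-Brown correction).
r_half = half_test_correlation(0.77)
print(round(r_half, 3))

# ASSUMED single-drawing reliability of 0.70, projected to a three-drawing
# procedure (man, woman, self); this is not a figure reported by Harris.
print(round(spearman_brown(0.70, k=3), 3))
```

The projection treats the added drawings as parallel measurements, which is the logical assumption Harris makes rather than an empirically established fact.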
RESEARCH ON THE
GOODENOUGH TEST
Reliability Studies
Table 6 summarizes the reliability coeffi-
cients reported for the Draw-A-Man Test in the
studies included in this review (523-528). In
general, the reliabilities obtained by independent
investigators have confirmed those reported by
Goodenough. The reliability of the point scale
holds up in the mentally retarded range (523
and 524), and scorer agreement is high (526).
One problem observed in interscorer com-
parisons by the reviewer which is mentioned in
connection with the Goodenough vs. the Good-
enough-Harris comparison is that while the re-
sults of two scorers may show a very high
correlation, there may nevertheless be a constant
difference in score levels between them, reflecting
individual idiosyncrasies of their interpretations.
The safest method of coping with such constant
errors, in a survey in which a number of scorers
may be used for different segments of the total
sample, would be to have at least two people
score every test and to use the average of the
two for record.
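The point that a high interscorer correlation can coexist with a constant difference in level can be shown with a short Python sketch. The scores are hypothetical, and the three-point leniency of the second scorer is an assumption chosen to make the effect visible.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# HYPOTHETICAL point scores for six drawings from two scorers;
# scorer B is assumed to be uniformly 3 points more lenient than scorer A.
scorer_a = [18, 22, 25, 30, 34, 39]
scorer_b = [score + 3 for score in scorer_a]

r = pearson_r(scorer_a, scorer_b)                                # agreement in rank and spacing
offset = mean([b - a for a, b in zip(scorer_a, scorer_b)])       # constant difference in level
score_of_record = [(a + b) / 2 for a, b in zip(scorer_a, scorer_b)]

print(r, offset, score_of_record)
```

The correlation is blind to the offset, so comparing scorer means is a necessary second check. Averaging two scorers on every test gives all drawings the same blend of the two scorers' levels.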
Correlations With Other Tests
Correlations of the Draw-A-Man with the
Stanford-Binet are summarized in table 7, and
its correlations with other tests, in table 8.
Similar tables appear in Harris (522, pp. 96 and
97). With few exceptions, correlations of the
Draw-A-Man with the Stanford-Binet (in which
coefficients are based on IQ's) reported by other
investigators have averaged lower than those re-
ported by Goodenough in 1926 (504). The ex-
ceptions found are Williams (505), Israelite
(562), White (565), and Ellis (unpublished master's
colloquium paper, University of Minnesota, 1953),
whose data agree substantially with those of
Goodenough.
Unfortunately, most of the publications cited
which involve correlations of the Draw-A-Man
with the Stanford-Binet and a number of other
tests are based on very small samples (rarely
more than 100), are usually not representative
of their respective subuniverses, and do not
always present assurance of testing under stand-
ard conditions. As a result, the collection of
correlation coefficients can only be interpreted
very generally.
These results indicate a considerable as-
sociation between the Draw-A-Man Test and
general intelligence tests, such as the Stanford-
Binet and the WISC, which measure mental
maturity. The common variance is probably about
50 percent. Maturationally, the original rationale
presented by Goodenough—that drawing point
Table 6. Studies reporting reliability coefficients of human figure drawing tests
Investigator | Year | Test | Subjects¹ | Age range | Number: Σ | M | F | Type of coefficient | Reliability
Yepsen (523) | 1929 | Goodenough | Feebleminded | 9.0 - 18.2 | 37 | 37 | - | Test-retest |
 | | | | | | | | Administration 1-2 | 0.89
 | | | | | | | | Administration 2-3 | 0.91
 | | | | | | | | Administration 1-3 | 0.91
Brill (525) | 1935 | Goodenough | Feebleminded | N.R. | N.R. | - | - | Test-retest |
 | | | | | 71 | 73 | - | Administration 1-2 | 0.77
 | | | | | 65 | 65 | - | Administration 2-3 | 0.80
 | | | | | 67 | 67 | - | Administration 1-3 | 0.68
Albee and Hamlin (579) | 1949 | Human Figure Drawing, Paired Comparisons | VA Mental Hygiene Clinic. Range: normals to psychotics. | N.R. | N.R. | - | - | Interjudge | 0.95
 | | | | | | | | Spearman-Brown | 0.98
Albee and Hamlin (581) | 1950 | Machover | Neurotic, schizophrenic, normal | N.R. | 72 | - | - | Interjudge | 0.89
Hinrichs (586) | 1935 | Goodenough | Normals | 10-18 years | 81 | - | - | Split-half, Spearman-Brown | 0.88-0.90
Herron (532) | 1957 | Goodenough | Normals, Grades 3 and 4 | 113 months (mean) | 16 | 16 | - | Test-retest, group A² |
 | | | | | | | | Administration 1-2 | 0.52
 | | | | | | | | Administration 2-3 | 0.51
 | | | | | | | | Administration 1-3 | 0.27
 | | | | | 28 | - | 28 | Test-retest, group A² |
 | | | | | | | | Administration 1-2 | 0.79
 | | | | | | | | Administration 2-3 | 0.69
 | | | | | | | | Administration 1-3 | 0.85
 | | | | | 24 | 24 | - | Test-retest, group B² |
 | | | | | | | | Administration 1-2 | 0.92
 | | | | | | | | Administration 2-3 | 0.40
 | | | | | | | | Administration 1-3 | 0.86
 | | | | | 15 | - | 15 | Test-retest, group B² |
 | | | | | | | | Administration 1-2 | 0.85
 | | | | | | | | Administration 2-3 | 0.73
 | | | | | | | | Administration 1-3 | 0.63
McCurdy (527) | 1947 | Goodenough | Normals | 83.2 months (mean) | 59 | 59 | - | Test-retest | 0.69
Buhrer, de Navarro, and Velasco (511) | 1951 | Goodenough | Normals, Spanish-speaking | 7-14 years | 1,936 | - | - | N.R. | 0.97
Frankiel (518) | 1957 | Goodenough and Frankiel | Normals | - | 200 | 100 | 100 | - | -
 | | | | 7 years | 100 | 50 | 50 | Intrajudge | 0.83
 | | | | 7 years | 100 | 50 | 50 | Interjudge | 0.71-0.84
 | | | | 12 years | 100 | 50 | 50 | Intrajudge | 0.89
 | | | | 12 years | 100 | 50 | 50 | Interjudge | 0.81-0.86
McHugh (508) | 1945 | Goodenough | Normals, preschool | 62.0 months (mean) | 83 | - | - | Test-retest | 0.46 (IQ); 0.51 (MA)
Goodenough | 1926 | Goodenough | Normals | 4-12 years | 5,627 | - | - | Split-half, Spearman-Brown | 0.77
 | | | | | | | | Test-retest, Grade 1 only | 0.94
See footnotes at end of table.
Table 6.
Studies reporting reliability coefficients of human figure drawing tests—Con.
Investigator | Year | Test | Subjects¹ | Age range | Number: Σ | M | F | Type of coefficient | Reliability
Williams (505) | 1935 | Goodenough | Normals | 3-15 years | 100 | 50 | 50 | Interrater | 0.80-0.96
Smith (506) | 1937 | Goodenough | Normals | - | 1000 | - | - | [illegible] | -
 | | | | 6 years | 100 | - | - | - | 0.91
 | | | | 7 years | 100 | - | - | - | 0.91
 | | | | 8 years | 100 | - | - | - | 0.95
 | | | | 9 years | 100 | - | - | - | 0.96
 | | | | 10 years | 100 | - | - | - | 0.93
 | | | | 11 years | 100 | - | - | - | 0.95
 | | | | 12 years | 100 | - | - | - | 0.92
 | | | | 13 years | 100 | - | - | - | 0.92
 | | | | 14 years | 100 | - | - | - | 0.94
 | | | | 15-16 years | 100 | - | - | - | 0.84
McCarthy (526) | 1944 | Goodenough | Normals, Grades 3 and 4 | N.R. | B86 | - | - | - | -
 | | | | | | | | Intrascorer | 0.94
 | | | | | | | | Interscorer | 0.90
 | | | | | | | | Test-retest | 0.68
 | | | | | | | | Odd-even, Spearman-Brown | 0.89
McHugh (529) | 1952 | Goodenough | Normals, Grade 3 | N.R. | 118 | 58 | 60 | - | -
 | | | | | | | | Intrajudge | 0.98
 | | | | | | | | Interjudge | 0.97
Stone (582) | 1952 | Machover | Normals, Grade 6 | N.R. | BIZ | - | - | - | -
 | | | | | | | | Split-half |
 | | | | | | | | First drawing | 0.82
 | | | | | | | | Second drawing | 0.76
 | | | | | | | | Test-retest |
 | | | | | | | | Drawings 1 and 2, males | 0.56
 | | | | | | | | Drawings 1 and 2, females | 0.39
 | | | | | | | | Drawings 1 and 2, total | 0.50
¹Designations of subjects are always white Americans unless otherwise specified.
²Indicates conditions preceding Draw-A-Man testing:
Group | Initial test | Second test | Third test
A | Satisfying activity | Satisfying activity | Frustrating activity
B | Frustrating activity | Frustrating activity | Satisfying activity
NOTES: Unless otherwise indicated, it is assumed that reliability coefficients were Pearson Product-Moment and were computed
from raw scores.
Σ = total population; M = male; F = female; N.R. = not reported; IQ = intelligence quotient; MA = mental age.
Table 7. Studies reporting correlations between the Goodenough and Stanford-Binet
Investigator | Year | Subjects¹ | Age range | Number: Σ | M | F | Correlation: IQ | MA
McElwee (524) | 1932 | Retarded | 14 years | 45 | - | - | N.R. | 0.72
Rohrs and Haworth (569) | 1962 | Retarded | - | 46 | 23 | 23 | 0.28 (Form L-M) | N.R.
 | | Familial | 12.57 years (mean) | 20 | 10 | 10 | N.R. | N.R.
 | | Organic | 9.2 years (mean) | 26 | 13 | 13 | N.R. | N.R.
Birch (550) | 1949 | Retarded | 10-6 - 16-3 | 68 | 43 | 25 | 0.62 | 0.69
Israelite (562) | 1030 | Feebleminded | 6-3 - 40 years | 256 | 162 | 94 | N.R. | 0.71
Johnson, Ellerd, and Lahey (592) | 1950 | State hospital population | 6-9 - 17 years | 209 | - | - | 0.48 | N.R.
White (565) | 1945 | - | - | 141 | - | - | - | -
 | | Feebleminded | 8-0 - 19-4 | 47 | - | - | 0.63 | N.R.
 | | Epileptic | 8-0 - 19-4 | 47 | - | - | 0.52 | N.R.
 | | Normal | 4-8 - 10-6 | 47 | - | - | 0.71 | N.R.
Havighurst and Janke (544) | 1944 | Normals | 10 years | 114 | - | - | 0.50 | N.R.
Fowler (531) | 1953 | Normals | 9-2 - 12-1 | 41 | 19 | 22 | 0.41 | N.R.
Lessing (551) | 1961 | Normals | 8-9 years | 23 | 21 | 20 | 0.5) | N.R.
McHugh (549) | 1945 | Normals | 64 months (mean) | 90 | 43 | 47 | 0.41 | 0.45
Thompson and Finley (552) | 1963 | Guidance clinic referrals | 5-9 years | 164 | 81 | 83 | 0.67 (Form L-M) | [illegible]
Goodenough (504) | 1926 | Normals | 4-12 years | 5627 | - | - | 0.74 | 0.76
Williams (505) | 1935 | Normals | 3-15 years | 100 | 50 | 50 | 0.65 | 0.80
¹Designations of subjects are always white Americans unless otherwise specified.
NOTES: Unless otherwise indicated all correlations are Pearson Product-Moment, with the Stanford-Binet, Form L.
Σ = total population; M = male; F = female; IQ = intelligence quotient; MA = mental age; N.R. = not reported.
scores largely reflect the ability to form con-
cepts—is supported by the network of corre-
lations compiled from a variety of tests and
by studies such as that of McHugh (549), which
analyzed Draw-A-Man items. McHugh computed
biserial correlations of Goodenough items with
the Stanford-Binet and reported positive corre-
lations for 29 items; the remainder were zero or
slightly negative. The highest correlations, which
support the conceptual interpretation stated, were
the following:
Item | Correlation
2 (legs present) | 0.48
7a (eyes present) | 0.47
9a (clothing present) | 0.40
11b (leg joint shown) | 0.35
12e (proportion, two dimensions) | 0.54
13 (heel shown) | 0.35
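An item analysis of this kind correlates a pass/fail item with a continuous criterion. The Python sketch below uses the textbook biserial formula, with the point-biserial shown for comparison. McHugh's actual computing procedure is not described in this review, and the pass/fail pattern and Stanford-Binet mental ages are hypothetical values invented for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev


def _split(passed, scores):
    """Criterion scores of children who passed the item and of those who failed it."""
    hit = [s for s, ok in zip(scores, passed) if ok]
    miss = [s for s, ok in zip(scores, passed) if not ok]
    if not hit or not miss:
        raise ValueError("item must be passed by some children and failed by others")
    return hit, miss


def point_biserial(passed, scores):
    """Point-biserial r: treats the item as a true dichotomy."""
    hit, miss = _split(passed, scores)
    p = len(hit) / len(scores)
    return (mean(hit) - mean(miss)) / pstdev(scores) * sqrt(p * (1 - p))


def biserial(passed, scores):
    """Biserial r: assumes a normally distributed trait underlies the pass/fail item."""
    hit, miss = _split(passed, scores)
    p = len(hit) / len(scores)
    y = NormalDist().pdf(NormalDist().inv_cdf(p))  # normal ordinate at the pass/fail cut
    return (mean(hit) - mean(miss)) / pstdev(scores) * p * (1 - p) / y


# HYPOTHETICAL data: 1 = item credited in the drawing, 0 = not credited;
# criterion is an assumed Stanford-Binet mental age in months.
item_passed = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
binet_ma = [60, 66, 70, 72, 75, 80, 84, 88, 90, 96]

r_pb = point_biserial(item_passed, binet_ma)
r_b = biserial(item_passed, binet_ma)
print(f"point-biserial = {r_pb:.2f}, biserial = {r_b:.2f}")
```

The biserial coefficient is always larger in magnitude than the point-biserial for the same data, so item correlations from studies that used different coefficients are not directly comparable.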
Table 8. Studies reporting correlations between the Goodenough and other measures
Investigator | Year | Test or criterion variable | Subjects¹ | Age range | Number: Σ | M | F | Correlation
Havighurst, Gunther, and Pratt (558) | 1946 | Arthur Point Scale of Performance Tests (IQ) | American Indians | 6-11 years | 294 | - | - | -
 | | | Zuni | - | 42 | - | - | 0.10
 | | | Hopi | - | 78 | - | - | 0.21
 | | | Navaho | - | 47 | - | - | 0.23
 | | | Sioux | - | 53 | - | - | 0.33
 | | | Papago | - | 74 | - | - | 0.64
Albee and Hamlin (579) | 1949 | Clinical ratings of adjustments | VA Mental Hygiene Clinic. Range: normals to psychotics. | N.R. | N.R. | - | - | 0.62 (rank order); 0.64 (product moment)
Havighurst and Janke (544) | 1944 | Cornell-Coxe Performance Ability Scale | Normals | 10 years | 114 | - | - | 0.63
Havighurst, Gunther, and Pratt (558) | 1946 | Cornell-Coxe Performance Ability Scale | Normals | 6-11 years | 66 | 28 | 38 | 0.63
Hinrichs (586) | 1935 | Furfey Revised Scale for Measuring Developmental Age in Boys | Delinquents | 9-18 years | 425 | - | - | 0.35
Johnson (557) | 1953 | Hoffman Bilingual Schedule | Spanish bilinguals (U.S.) | N.R. | 30 | - | - | 0.05
Boehncke (546) | 1938 | Leiter International Performance Scale | Normals | 5-12 years | 257 | - | - | 0.83
Ansbacher (553) | 1952 | MacQuarrie Test for Mechanical Ability | Normals | 10 years | 100 | - | - | -
 | | Tracing | | | | | | 0.34
 | | Tapping | | | | | | 0.23
 | | Dotting | | | | | | 0.16
Brenner and Morse (517) | 1956 | Metropolitan Readiness Tests, Number Readiness (IQ) | Normals | 4-7 - 5-11 | 16 | 7 | 9 | 0.58 (rank order)
Havighurst and Janke (544) | 1944 | Revised Minnesota Paper Form Board Test, Form AR | Normals | 10 years | 110 | - | - | 0.48
Brenner and Morse (517) | 1956 | Monroe Visual subtest (IQ) | Normals | 4-7 - 5-11 | 16 | 7 | 9 | 0.64 (rank order)
Hornowski (547) | 1961 | Moray House Picture Intelligence Test | Normals (Scotland) | N.R. | N.R. | - | - | 0.34 (M); 0.49 (F)
Johnson (557) | 1953 | Otis Self-Administering Tests of Mental Ability | Spanish bilinguals (U.S.) | N.R. | 30 | - | - | -0.02
Brenner and Morse (517) | 1956 | Picture Judgment of Maturity (IQ) | Normals | 4-7 - 5-11 | 16 | 7 | 9 | 0.64 (rank order)
 | | Pintner-Cunningham Primary Mental Test (MA) | - | - | - | - | - | 0.66 (rank order)
Shirley and Goodenough (575) | 1932 | Pintner Non-Language Primary Mental Test (IQ) | Deaf | 5+ years | 229 | - | - | 0.33
Norman and Midkiff (559) | 1955 | Progressive Matrices | Normals, American Indian | 6-6 - 15-6 | 96 | - | - | 0.24 (IQ); 0.35 (MA)
Harris (548) | 1959 | Progressive Matrices | Normals | 5-1 - 6-1 | 98 | 45 | 53 | 0.22
Johnson (557) | 1953 | Reaction time | Spanish bilinguals (U.S.) | N.R. | 30 | - | - | 0.43
Brenner and Morse (517) | 1956 | Sangren Information Mental Age | Normals | 4-7 - 5-11 | 16 | 7 | 9 | 0.67 (rank order)
See footnotes at end of table.
Table 8. Studies reporting correlations between the Goodenough and other measures (Con.)

| Investigator | Year | Test or criterion variable | Subjects (1) | Age range | Number, total | Number, M | Number, F | Correlation |
|---|---|---|---|---|---|---|---|---|
| Buhrer, de Navarro, and Velasco (511) | 1951 | School grades | Normals, Spanish-speaking | 7-14 years | 1,936 | --- | --- | --- |
| | | Mathematics | | | | | | -0.04 |
| | | Language | | | | | | -0.10 |
| | | Language and Mathematics | | | | | | -0.01 |
| | | Drawing | | | | | | 0.27 |
| Fowler (531) | 1953 | Social Distance Scale (Fowler) | Normals | 9-2 - 12-1 | 41 | 19 | 22 | 0.40 |
| Shirley and Goodenough (575) | 1932 | Stanford Achievement, Education (quotient) | Deaf | 5+ years | 41 | --- | --- | 0.34 |
| Ansbacher (553) | 1952 | SRA Primary Mental Abilities | Normals | 10 years | 100 | --- | --- | --- |
| | | Word Vocabulary | | | | | | 0.23 |
| | | Picture Vocabulary | | | | | | 0.19 |
| | | Total Verbal Meaning | | | | | | 0.26 |
| | | [label illegible] | | | | | | 0.38 |
| | | Word Grouping | | | | | | 0.28 |
| | | Figure Grouping | | | | | | 0.34 |
| | | Total Reasoning | | | | | | 0.40 |
| | | Perception | | | | | | 0.37 |
| | | Number | | | | | | 0.24 |
| | | Total Nonreading | | | | | | 0.45 |
| | | Total Score | | | | | | 0.41 |
| | | SHRAP | | | | | | 0.48 |
| Harris (548) | 1959 | SRA Primary Mental Abilities | Normals | 5-1 - 6-1 | 98 | 45 | 53 | --- |
| | | Verbal | | | | | | 0.50 |
| | | Perception | | | | | | 0.44 |
| | | Quantitative | | | | | | 0.54 |
| | | Motor | | | | | | 0.40 |
| | | Space | | | | | | 0.51 |
| Brenner and Morse (517) | 1956 | Teacher rank of school readiness | Normals | 4-7 - 5-11 | 16 | 7 | 9 | 0.69 (rho) |
| Britton (536) | 1954 | Warner's Index of Status Characteristics | Normals | 9-11 years | 232 | 102 | 130 | 0.11 |
| Hanvik (593) | 1953 | WISC Full Scale (IQ) | Psychiatric patients | 5-12 years | 25 | --- | --- | 0.18 (rank order) |
| Rohrs and Haworth (569) | 1962 | Wechsler Intelligence Scale for Children (IQ) | Retarded, familial and organic | N.R. | 46 | 23 | 23 | --- |
| | | Verbal Scale | | | | | | 0.28 |
| | | Performance Scale | | | | | | 0.53 |
| | | Full Scale | | | | | | 0.46 |

(1) Designations of subjects are always white Americans unless otherwise specified.
NOTES: All correlation coefficients are Pearson Product-Moment unless otherwise specified.
Total = total population; M = male; F = female; IQ = intelligence quotient; N.R. = not reported; MA = mental age.
It is of interest that a careful survey of the
literature spanning a period of over 40 years
fails to disclose any definitive pattern of the
particular components of mental maturity meas-
ured by the Goodenough test. Harris believes
that this may be attributed to the fact that such
components are themselves not clearly differ-
entiated in young children. The correlational
results do, however, suggest strongly that the
Draw-A-Man is more highly associated with
factors measured by performance tests than with
verbal abilities.
In the Health Examination Survey, corre-
lations of the Draw-A-Man with WISC and, more
particularly, with the short form composed of
WISC Vocabulary and Block Design would be most
relevant. Table 3 includes three reports (115, 130,
and 224) which mention correlations between the
Draw-A-Man Test and the Full Scale IQ of the
WISC. Of these, none mentions correlations be-
tween the Draw-A-Man and the short form
of the WISC. Harris' summary also cites the
following unpublished data by Ellis.
| Age | Number | Correlation with FS | Correlation with VS | Correlation with PS |
|---|---|---|---|---|
| 8 years | 16 | 0.70 | 0.77 | 0.67 |
| 9 years | 34 | 0.67 | 0.63 | 0.59 |
| 10 years | 20 | 0.24 | 0.17 | 0.26 |
| 11 years | 17 | 0.50 | 0.45 | 0.46 |
| 12 years | 19 | 0.62 | 0.50 | 0.68 |
| 13 years | 17 | 0.13 | 0.05 | 0.15 |
Disregarding the 13-year-old group, since it is
outside the effective range of the test as well
as outside the age range of the Survey, Ellis’
results for the total sample of 106 have an
average correlation with the WISC Full Scale
IQ of 0.57. Again, this is higher than the corre-
lations reported by others.
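
The report does not say how the five age-group correlations were pooled into the 0.57 figure. As a rough check, and only as a sketch under that uncertainty, the Python below pools Ellis' Full Scale correlations for ages 8 through 12 in two common ways: a sample-size-weighted mean of the raw coefficients, and a sample-size-weighted mean taken on Fisher's z scale. The function names are illustrative and the choice of weights is an assumption.

```python
import math

# Ellis' unpublished data as cited above: (age, number of cases, r with WISC Full Scale).
# The 13-year-old group is left out, as in the text.
ELLIS_FS = [(8, 16, 0.70), (9, 34, 0.67), (10, 20, 0.24), (11, 17, 0.50), (12, 19, 0.62)]


def weighted_mean_r(groups):
    """Sample-size-weighted mean of the raw correlation coefficients."""
    total_n = sum(n for _, n, _ in groups)
    return sum(n * r for _, n, r in groups) / total_n


def fisher_pooled_r(groups):
    """Average on Fisher's z scale (z = atanh r), weighted by n, then back-transformed."""
    total_n = sum(n for _, n, _ in groups)
    mean_z = sum(n * math.atanh(r) for _, n, r in groups) / total_n
    return math.tanh(mean_z)


total_cases = sum(n for _, n, _ in ELLIS_FS)
print(total_cases)
print(round(weighted_mean_r(ELLIS_FS), 3))
print(round(fisher_pooled_r(ELLIS_FS), 3))
```

Under these assumptions the two pooled values fall slightly below and slightly above the reported 0.57, so either convention is roughly consistent with the text.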
In summary, it appears that the WISC corre-
lations with the Draw-A-Man Test are substantial
but lower than those of the Stanford-Binet.
They are, however, higher with the Performance
Scale than with the Verbal Scale (except in
Ellis' two lowest grades).
In comparing Draw-A-Man scores with WISC
Full Scale estimates, there is no reason to assume
any systematic differences in mean levels across
the entire population. However, for statistical
estimation as well as analytic purposes, it is
most appropriate to compute the regression of
Draw-A-Man on Voc., BD, and Total Score and
then to work with differences between regressed
and actual scores for discrepancy analysis,
rather than with differences between scaled
scores.
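
The regression approach described above can be sketched as follows. The scores are invented for illustration and are not Survey data; the single predictor stands in for any one of Voc., BD, or Total Score, and all names are hypothetical. The sketch fits an ordinary least-squares line predicting the Draw-A-Man score and reports each child's discrepancy as the actual score minus the regressed score.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def discrepancies(x, y):
    """Actual minus regressed (predicted) score for each case."""
    slope, intercept = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical scores for six children (illustration only).
wisc_total = [85, 92, 100, 104, 110, 121]
draw_a_man = [90, 88, 101, 97, 112, 108]

resid = discrepancies(wisc_total, draw_a_man)
for w, d, e in zip(wisc_total, draw_a_man, resid):
    print(w, d, round(e, 1))
```

Because least-squares residuals sum to zero, a discrepancy measured this way does not depend on any difference in mean level between the two tests, which a simple difference between scaled scores would.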
In view of the Draw-A-Man's sensitivity to
cultural variations, cases in which there are
large discrepancies between the Draw-A-Man
and the WISC should be thoroughly evaluated in
the light of the WRAT scores and other infor-
mation from the Health Examination Survey.
Although Harris' summary and the reports con-
sulted in this review have suggested a number of
promising diagnostic score patterns, none of them
seem well enough established to be adopted.
THE HARRIS REVISION OF THE
GOODENOUGH TEST
Dale Harris' 1963 publication (522), which
he has named the Goodenough-Harris Drawing
Test, is a thorough revision and extension of
Goodenough's test. As already mentioned, it bases
the lengthier point-score scales on both drawings
of the male figure and drawings of the female
figure, for which it provides separate norms for
boys and for girls. A third picture, in which the
child draws a representation of himself, has not
been empirically standardized.
Standardization of the Harris revision was
completed on a total sample of 2,965 children,
representative of four major geographic areas of
the country. The sample was also representative
of the 1960 census distribution of fathers' occupa-
tions. Total point scores are converted to standard
scores with a mean of 100 and a standard deviation
of 15. Conceptually, these are equivalent to the
WISC deviation IQ's. The new scales overlap
extensively with the original point scales, and
Harris found that children now earn substantially
higher scores when the 1963 norms, rather than
the 1926 ones, are utilized. The explanation for this
phenomenon is not clear. The new norms do
appear to take into account technical and social
changes which have occurred between 1926 and
1963. They also offer the advantages of greater
length (hence, higher reliability) and more ad-
equate provision for sex differences.
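
A deviation-type standard score of the kind described above can be sketched as below. Harris' published norms are tabled by age and sex and are not reproduced here, so the norm mean and standard deviation in the example are hypothetical placeholders, as is the function name.

```python
def standard_score(raw, norm_mean, norm_sd, mean=100.0, sd=15.0):
    """Convert a raw point score to a standard score with the given mean and SD."""
    z = (raw - norm_mean) / norm_sd
    return mean + sd * z


# Hypothetical norm-group values for one age and sex (not taken from Harris' tables).
NORM_MEAN = 30.0
NORM_SD = 8.0

print(standard_score(30.0, NORM_MEAN, NORM_SD))  # a raw score at the norm mean
print(standard_score(38.0, NORM_MEAN, NORM_SD))  # one standard deviation above it
```

A raw score at the norm-group mean maps to 100, and each standard deviation above or below it adds or subtracts 15 points.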
Comparison of Goodenough and
Goodenough-Harris Scores
It seems desirable to inquire whether the
Harris scales and norms could be used to score
human figure drawing obtained in the Health
Examination Survey. As noted above, in this
Survey only one picture is drawn by each child,
who is instructed, "Make a picture of a person.
Make the very best person that you can." To use
the Harris scales in the Survey it would be
necessary for the scorer to decide whether each
drawing was of a "Man" or of a "Woman."
A sample of 200 drawings, 100 drawn by boys
and the other 100 drawn by girls, was taken at
random from the Survey files. These drawings
were then carefully scored using Harris' norms,
and the scores obtained were compared with the
scores the drawings had already received on the
1926 Goodenough scale. (Scoring by the 1926
method is completed in the field by Survey staff
psychologists.)
Of the 200 cases, 195 were usable. Three
drawings were rejected because they contained
a face only, and for two cases age had been in-
advertently omitted, precluding the computation
of standard scores. For the remaining drawings,
neither scorer reported any difficulty in identi-
fying the sex represented, and their agreement
on this was perfect.
Table 9. Means of Goodenough-Harris and Goodenough variables and correlations between
scorers and between methods for total sample and six subsamples

| Variable | Total group (N=195) | Drawings of a woman (N=94) | Drawings of a man (N=101) | Woman, by boys (N=17) | Woman, by girls (N=77) | Man, by boys (N=83) | Man, by girls (N=18) |
|---|---|---|---|---|---|---|---|
| 1. Goodenough-Harris point (A) | 30.75 | 31.41 | 30.13 | 28.12 | 32.13 | 30.20 | 29.78 |
| 2. Goodenough-Harris SS (A) | 96.59 | 95.89 | 97.24 | 93.06 | 96.52 | 97.29 | 97.00 |
| 3. Goodenough-Harris point (B) | 36.02 | 36.62 | 35.47 | 34.71 | 37.04 | 35.54 | 35.11 |
| 4. Goodenough-Harris SS (B) | 105.97 | 105.15 | 106.73 | 104.06 | 105.39 | 106.63 | 107.22 |
| 1+3. Average Goodenough-Harris point (A,B) | 33.39 | 34.02 | 32.80 | 31.42 | 34.59 | 32.87 | 32.45 |
| 2+4. Average Goodenough-Harris SS (A,B) | 101.28 | 100.52 | 101.99 | 98.56 | 100.96 | 101.96 | 102.11 |
| 5. Goodenough point | 26.38 | 25.57 | 27.14 | 24.29 | 25.86 | 27.20 | 26.83 |
| 6. Subject's CA | 115.01 | 111.89 | 117.92 | 118.35 | 110.47 | 118.10 | 117.11 |
| 7. Goodenough MA | 114.61 | 112.48 | 116.59 | 108.88 | 113.27 | 116.71 | 116.06 |
| 8. Goodenough IQ | 101.23 | 102.27 | 100.27 | 92.59 | 104.42 | 100.10 | 101.06 |
| r(1,3) | 0.90 | 0.89 | 0.91 | 0.82 | 0.91 | 0.90 | 0.95 |
| r(2,4) | 0.90 | 0.88 | 0.91 | 0.79 | 0.89 | 0.92 | 0.83 |
| r(2,8) | 0.78 | 0.76 | 0.81 | 0.60 | 0.78 | 0.87 | 0.47 |
| r(4,8) | 0.81 | 0.78 | 0.84 | 0.58 | 0.82 | 0.89 | 0.48 |

NOTE: N = number; A = scorer A; B = scorer B; SS = standard score; CA = chronological age;
MA = mental age; r = correlation.
The usable sample of 195 cases consisted
of 100 boys and 95 girls. Of these, 17 boys drew
a Woman figure and 18 girls drew a Man figure.
The remaining 82 percent of the total group
(83 percent of the boys and 81 percent of the
girls) drew their own sex.
The following eight variables were recorded
for all 195 cases:
1. Harris method, point score, scorer A
2. Harris method, standard score, scorer A
3. Harris method, point score, scorer B
4. Harris method, standard score, scorer B
5. Goodenough point score
6. Subject's chronological age in months
7. Goodenough mental age
8. Goodenough IQ
Means, standard deviations, and intercorrelations
were computed for the total sample and for the
following six subsamples: (1) Woman drawings
(N=94), (2) Man drawings (N=101), (3) Woman
drawings by boys (N=17), (4) Woman drawings
by girls (N=77), (5) Man drawings by boys (N=83),
and (6) Man drawings by girls (N=18). A summary
of the most relevant results, for all seven sample
combinations, appears in table 9.
The correlations between the two scorers
(r(1,3) and r(2,4)) are high despite a systematic
tendency for scorer B's results to exceed those of
scorer A (they average 5.25 above scorer A on
point score and 9.38 higher on standard score).
As a more stable estimate of the Harris scores
for comparison with the Goodenough, average
mean scores for the two scorers were computed.
These appear in table 9 between variables 4 and 5.
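
As a small arithmetic check on the scorer comparison, the sketch below recomputes the gap between the two scorers and their average from the rounded total-sample means in table 9. The dictionary layout and function names are illustrative only. Computed from the rounded means, the point-score gap comes to 5.27, close to the 5.25 quoted in the text; the small difference may reflect rounding in the published figures.

```python
# Total-sample means from table 9 (N=195).
MEANS = {
    "point": {"A": 30.75, "B": 36.02},
    "standard": {"A": 96.59, "B": 105.97},
}


def scorer_gap(kind):
    """Mean of scorer B minus mean of scorer A, rounded to two decimals."""
    return round(MEANS[kind]["B"] - MEANS[kind]["A"], 2)


def scorer_average(kind):
    """Average of the two scorers' means, left unrounded."""
    return (MEANS[kind]["A"] + MEANS[kind]["B"]) / 2


for kind in MEANS:
    print(kind, scorer_gap(kind), scorer_average(kind))
```

Averaging the two scorers splits a systematic difference evenly between them, which is why the averaged means sit midway between the A and B rows of the table.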
Although agreement between the two scorers
is generally high, the lowest correlations were
found for the 17 boys who elected to draw a
female figure (subsample 3). The standard score
correlations for the 18 girls who elected to draw
a male figure (subsample 6) are also com-
paratively low. These opposite-sex drawings
also reflect the lowest correlations between
Harris and Goodenough IQ's for both scorers
(r(2,8) and r(4,8)). Thus, scorer agreement is lowest
on opposite-sex drawings, and the results for
these show the poorest agreement, correlation-
wise, between the Goodenough-Harris and Good-
enough IQ's. It is possible that these differences
could be eliminated by further training of scorers.
Certainly these results illustrate the importance
of quality control of scoring. The averaging pro-
cess is also highly recommended if systematic
scorer differences cannot be eliminated.
The principal support, indicating an advantage
of the Goodenough-Harris scale, appears in the
comparison of mean scores for boys and girls on
Woman and Man drawings as abstracted in table
10. In accordance with Harris' own findings, girls
score higher than boys, but the differences are
greater on the Goodenough scale than on the Good-
enough-Harris scales and are greater on the
Woman drawings than on the Man drawings. The
greatest discrepancy and resulting scoring pen-
alty by the Goodenough scale occurs in the case
of the 17 percent of boys (subsample 3) who
elected to draw a Woman. At the same time, the
81 percent of girls (subsample 4) who elected to
draw their own sex received disproportionately
high scores on the Goodenough, in comparison
with the mean levels on the Goodenough-Harris.
The Goodenough-Harris scores are higher than the
Goodenough for both sexes on the Man drawing.
The problems with the Woman drawing clearly
support the observation, first pointed out by
Goodenough and strongly reiterated by Harris,
that the female figure is more culture-bound
than the male, is less stereotyped, and is more
susceptible to individual interpretation. Although
the data on which the present analysis is based
are limited, they do suggest that the Harris
revision does less violence to the female figure
than does the Goodenough scoring and that, in
general, the Harris revision is more adequate for
opposite-sex drawings.
These data, which indicate a superiority of
girls over boys in drawing scores, a tendency
for the Goodenough-Harris scores to be higher
than the Goodenough scores, and a tendency for
girls who draw male figures to be older than girls
who draw their own sex (while no such differ-
entiation occurs among boys), are all consistent
with trends reported elsewhere in the literature.
However, the most important argument in favor
of using the Goodenough-Harris scoring system
is that the variation of mean scores among the
four subsamples is thereby greatly reduced around
a mean of 100. This range is from 92.59 to 104.42
(11.83) on the Goodenough and from 98.56 to 102.11
(3.55) on the Goodenough-Harris. Although the
standard deviations of the Goodenough-Harris
and Goodenough scores were not shown in table
9, the relative variability of scores based on the
two systems is indicated in table 11, which reports
coefficients of variation (standard deviation/mean) for
Goodenough-Harris standard scores and for Good-
enough IQ's for each of the subsamples. It is
apparent that in every case relative variability is
lower for the Harris scores.

Table 10. Comparison of Goodenough-Harris and Goodenough mean IQ's for boys and girls
on same-sex and opposite-sex drawings

| Sex | Woman: Goodenough IQ | Woman: Goodenough-Harris IQ | Woman: Difference | Man: Goodenough IQ | Man: Goodenough-Harris IQ | Man: Difference |
|---|---|---|---|---|---|---|
| Boys | 92.59 | 98.56 | +5.97 | 100.10 | 101.96 | +1.86 |
| Girls | 104.42 | 100.96 | -3.46 | 101.06 | 102.11 | +1.05 |
| Difference | 11.83 | 2.40 | --- | 0.96 | 0.15 | --- |

Table 11. Coefficients of variation of Harris and Goodenough IQ's for total sample and
six subsamples

| Variable | Total group | Drawings of a woman | Drawings of a man | Woman, by boys | Woman, by girls | Man, by boys | Man, by girls |
|---|---|---|---|---|---|---|---|
| Harris standard score | 0.16 | 0.15 | 0.15 | 0.10 | 0.16 | 0.17 | 0.13 |
| Goodenough IQ | 0.19 | 0.18 | 0.19 | 0.14 | 0.18 | 0.21 | 0.18 |
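
The two variability comparisons made here, the range of the four subsample means and the coefficient of variation, can be sketched as follows. The subsample means are those quoted in the text; the individual scores used to demonstrate the coefficient of variation are hypothetical, and the use of the sample (n-1) standard deviation is an assumption, since the report does not say which form was used.

```python
import statistics

# Mean IQ's for the four sex-by-drawing subsamples (table 10).
GOODENOUGH = [92.59, 104.42, 100.10, 101.06]
GOODENOUGH_HARRIS = [98.56, 100.96, 101.96, 102.11]


def spread(means):
    """Range (highest minus lowest) of a set of subsample means."""
    return round(max(means) - min(means), 2)


def coefficient_of_variation(scores):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(scores) / statistics.mean(scores)


print(spread(GOODENOUGH))         # 11.83
print(spread(GOODENOUGH_HARRIS))  # 3.55

# Hypothetical individual scores, for illustration only.
sample = [85, 92, 100, 104, 110, 121]
print(round(coefficient_of_variation(sample), 2))
```

Dividing by the mean makes the variability of the two scoring systems comparable even though their mean levels differ.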
Recommendation
On the basis of this analysis it is recom-
mended that the following steps be adopted in
relation to the Draw-A-Man Test in the Survey:
(1) the Goodenough-Harris system should be used;
(2) the entire sample should be scored centrally
by uniform standards, with adequate training of
scorers and quality control procedures routinely
followed; and (3) if scorer variations cannot be
eliminated by training, the procedure of averaging
the results of two or more scorers should be
adopted.
SUMMARY AND CONCLUSIONS
The foregoing review of the Draw-A-Man Test
supports the view that it is a reliable and valid
nonlanguage measure of mental maturity, although
highly sensitive to cultural influences on the
child's conceptual representation of the human
figure. Its use in a national survey in the 6 to 12
age range, in conjunction with the WISC and WRAT,
is logical and desirable, particularly as a means
of assessing intellectual development in cases in
which there is impairment of verbal development
or verbal performance.
Personality assessment by means of thematic
and qualitative assessment of children's drawings
would probably be unrewarding. Some indications
justifying further research have been noted; how-
ever, such research is not sufficiently promising
to warrant the expenditure of Survey funds. On
the other hand, several lines of empirical work
appear worthwhile. These are enumerated below.
As discussed in the final portion of the review
of the Draw-A-Man, there is strong evidence for
the adoption of the Harris revision of the Draw-A-
Man with central scoring by trained scorers, and
averaging of scores of two or more scorers, if
scorer variations cannot be eliminated in train-
ing. This procedure need not be regarded as
expensive, since it could leave the field psychol-
ogists free to test more children while the
scoring is done centrally by lower paid workers.
Although research on personality-assess-
ment uses of the drawings within the Survey pro-
gram is not recommended, the following lines of
empirical study and analysis are regarded as
useful and even important:
1. A systematic study of cultural variations
related to the principal geographic areas
in which Survey data were collected to
evaluate the effects of factors such as
customs, attitudes, dress, art, and social
roles in relation to the items in the point
scales by which the Draw-A-Man is scored.
Even if the results of such an analytic
study should be negative, they would be
very reassuring in relation to the use of
the Draw-A-Man scores in the Survey.
2. Regression studies of Draw-A-Man
scores with other psychometric variables
in the Survey so that comparisons can
be made on the basis of differences be-
tween regressed and actual scores rather
than directly between raw scores.
3. Further restandardization of the Good-
enough-Harris norms on a national sample
would be a valuable contribution to psycho-
logical measurement of children that
could only reflect credit on the Survey
and would be of major importance for
future use of this well-established and
useful intelligence test. This significant
undertaking, if approved, should include a
complete item analysis as well as recom-
putation of norms.
Some additional suggestions regarding cross-
disciplinary studies with reference to the Draw-A-
Man Test are presented in a later section of this
report.
BIBLIOGRAPHY
General References to Draw-A-Man
501. Herrick, M. A.: Children’s drawings. Ped.Sem. 3:338-
339, 1893.
502. Barnes, E.: A study of children’s drawings. Ped.Sem.
2:455-463, 1893.
503. Lukens, H.: A study of children’s drawings in the early
years. Ped.Sem. 4:79-110, 1896.
504. Goodenough, F.L.: A new approach to the measurement
of the intelligence of young children. J.Genet.Psychol.
33:185-211, 1926.
505. Williams, J. H.: Validity and reliability of the Goodenough
intelligence test. Sch.&Soc. 41:653-656, 1935.
506. Smith, F. O.: What the Goodenough intelligence test
measures. Psychological Bull. 34:760-761, 1937.
507. Barnhart, E.N.: Developmental stages in compositional
construction in children’s drawings. J.Exp.Educ. 11:156-
184, 1942.
508. McHugh, G.: Changes in Goodenough IQ at the public
school kindergarten level. J.Educ.Psychol. 36:17-30,
1945.
509. Lehner, G. F.J., and Silver, H.: Some relations between
own age and ages assigned on the Draw-a-Person Test;
abstracted, Am.Psychologist 3:341, 1948.
510. Goodenough, F. L., and Harris, D. B.: Studies in the
psychology of children’s drawings, II, 1928-1949. Psy-
chological Bull. 47:369-433, 1950.
511. Buhrer, L., de Navarro, R., and Velasco, E. S.: Ensayo
de tipificacion de la prueba mental ‘‘Dibujo de un hom-
bre’’ de F. Goodenough. Publ.Inst.Biotipol.Exp.U.Cuyo,
2:113, 1951.
512. Weider, A., and Noller, P. A.: Objective studies of chil-
dren’s drawings of human figures, II, Sex, age, intelli-
gence. J.Clin.Psychol. 9:20-23, 1953.
513. Tuska, S. A.: Developmental Concepts With the Draw-a-
Person Test at Different Grade Levels. Unpublished
master’s thesis, Ohio University, 1953.
514. Stewart, N.: Review of Goodenough Draw-A-Man Test, in
O.K. Buros, ed., The Fourth Mental Measurements Yearbook.
Highland Park, N.J. The Gryphon Press, 1953.
515. Woods, W. A., and Cook, W. E.: Proficiency in drawing
and placement of hands in drawings of the human figure.
J.Consult.Psychol. 18:119-121, 1954.
516. Bliss, M., and Berger, A.: Measurement of mental age as
indicated by the male figure drawings of the mentally
subnormal using Goodenough and Machover instructions.
Am.J.Ment.Deficiency 59:73-79, 1954.
517. Brenner, A., and Morse, N. C.: The measurement of chil-
dren’s readiness for school. Pap.Mich.Acad.Sci. 41:
333-340, 1956.
518. Frankiel, R. V.: A Quality Scale for the Goodenough
Draw-a-Man Test. Unpublished master’s thesis, Univer-
sity of Minnesota, 1957.
519. Zuk, G. H.: Children’s spontaneous object elaborations
on a visual-motor test. J.Clin.Psychol. 16:280-283,
1960.
520. Stoltz, R. E., and Coltharp, F. C.: Clinical judgments
and the Draw-A-Person Test. J.Consult.Psychol. 25:
43-45, 1961.
521. Zuk, G. H.: Relation of mental age to size of figure on
the Draw-A-Person Test. Percept.Mot.Skills 14:410,
1962.
522. Harris, D. B.: Children’s Drawings as Measures of Intel-
lectual Maturity. New York. Harcourt, Brace & World,
Inc., 1963.
Goodenough: Reliability Studies
523. Yepsen, L. N.: The reliability of the Goodenough draw-
ing test with feebleminded subjects. J.Educ.Psychol.
20:448-451, 1929.
524. McElwee, E. W.: The reliability of the Goodenough in-
telligence test used with sub-normal children fourteen
years of age. J.Appl.Psychol. 16:217-218, 1932.
525. Brill, M.: The reliability of the Goodenough Draw-a-Man
Test and the validity and reliability of an abbreviated
scoring method. J.Educ.Psychol. 26:701-708, 1935.
526. McCarthy, D.: A study of the reliability of the Good-
enough Drawing Test of Intelligence. J.Psychol. 18:
201-206, 1944.
527. McCurdy, H. G.: Group and individual variability on the
Goodenough Draw-A-Man Test. J.Educ.Psychol. 38:428-
436, 1947.
528. Harris, D. B.: Intra-individual vs. inter-individual con-
sistency in children’s drawings of a man; abstracted,
Am.Psychologist 5:293, 1950.
Goodenough: Factors Affecting Drawing Productions
529. McHugh, A. F.: The Effect of Preceding Affective States
on the Goodenough Draw-A-Man Test of Intelligence.
Unpublished master’s thesis, Fordham University, 1952.
530. Reichenberg-Hackett, W.: Changes in Goodenough Draw-
ings after a gratifying experience. Am.J.Orthopsychiat.
23:501-517, 1953.
531. Fowler, R. D.: The Relationship of Social Acceptance
to Discrepancies Between the IQ Scores on the Stanford-
Binet Intelligence Scale and the Goodenough Draw-a-Man
Test. Unpublished master’s thesis, University of Ala-
bama, 1953.
532. Herron, W. G.: The Effect of Preceding Affective States
on the Goodenough Drawing Test of Intelligence. Un-
published master’s thesis, Fordham University, 1957.
533. Koppitz, E. M.: Teacher’s attitude and children’s per-
formance on the Bender-Gestalt Test and Human Figure
Drawings. J.Clin.Psychol. 16:204-208, 1960.
534. Richey, M. H., and Spotts, J. V.: The relationship of
popularity to performance on the Goodenough Draw-A-
Man Test. J.Consult.Psychol. 23:147-150, 1959.
535. Tolor, A.: Teachers’ judgments of the popularity of chil-
dren from their human figure drawings. J.Clin.Psychol.
11:158-162, 1955.
536. Britton, J. H.: Influence of social class upon performance
on the Draw-A-Man Test. J.Educ.Psychol. 45:44-51, 1954.
Goodenough: Body Image, Sexual Identification
537. Weider, A., and Noller, P. A.: Objective studies of chil-
dren’s drawings of human figures. I, Sex awareness and
socioeconomic level. J.Clin.Psychol. 6:319-325, 1950.
538. Knopf, I. J., and Richards, T.W.: The child’s differen-
tiation of sex as reflected in drawings of the human fig-
ure. J.Genet.Psychol. 81:99-112, 1952.
539. Swenson, C. H., and Newton, K. R.: The development of
sexual differentiation on the Draw-a-Person Test. J.
Clin.Psychol. 11:417-419, 1955.
540. Lakin, M.: Certain formal characteristics of human fig-
ure drawings by institutionalized aged and by normal
children. J.Consult.Psychol. 20:471-474, 1956.
541. Brown, D. G., and Tolor, A.: Human figure drawings as
indicators of sexual identification and inversion. Per-
cept.Mot.Skills 7:199-211, 1957.
542. Fisher, G. M.: Sexual identification in mentally subnor-
mal females. Am.J.Ment.Deficiency 66:266-269, 1961.
543. Silverstein, A. B., and Robinson, H. A.: The represen-
tation of physique in children’s figure drawing. J.Con-
sult.Psychol. 25:146-148, 1961.
Goodenough: Relation to Other Tests
544. Havighurst, R. J., and Janke, L. L.: Relations between
ability and social status in a midwestern community.
I, Ten-year-old children. J.Educ.Psychol. 35:357-368,
1944.
545. Condell, J. F.: Note on the use of the Ammons Full-
Range Picture Vocabulary Test with retarded children.
Psychol.Rep. 5:150, 1959.
546. Boehncke, C. F.: A Comparative Study of the Goodenough
Drawing Test and the Leiter International Performance
Scale. Unpublished master’s thesis, University of South-
ern California, 1938.
547. Hornowski, B.: Interpretation psychologique des diffe-
rences entre sexes dans le dessin du bonhomme chez
les jeunes adolescents (Psychological interpretation of
sex differences in the Draw-a-Man Test among young
adolescents). Revue Psychol. Appl. 11:7-9, 1961.
548. Harris, D. B.: A note on some ability correlates of the
Raven Progressive Matrices (1947) in the kindergarten.
J.Educ.Psychol. 50:228-229, 1959.
549. McHugh, G.: Relationship between the Goodenough Draw-
a-Man Test and the 1937 revision of the Stanford-Binet
Test. J.Educ.Psychol. 36:119-124, 1945.
550. Birch, J. W.: The Goodenough Drawing Test and older
mentally retarded children. Am.J.Ment.Deficiency 54:
218-224, 1949.
551. Lessing, E. E.: A note on the significance of discrep-
ancies between Goodenough and Binet IQ scores. J.
Consult.Psychol. 25:456-457, 1961.
552. Thompson, J. M., and Finley, C. J.: The relationship
between the Goodenough Draw-a-Man Test and the Stan-
ford-Binet Form L-M in children referred for school guid-
ance services. Calif.J.Educ.Res. 14:19-22, 1963.
553. Ansbacher, H. L.: The Goodenough Draw-A-Man Test
and primary mental abilities. J.Consult.Psychol. 16:
176-180, 1952.
Goodenough: Cultural Variations, Bilingualism
554. Hunkin, V.: Validation of the Goodenough Draw-a-Man
Test for African children. J.Soc.Res.Pretoria 1:52-63,
1950.
555. Dennis, W.: Performance of Near Eastern children on the
Draw-a-Man Test. Child Development 28:427-430, 1957.
556. Anastasi, A., and DeJesus, C.: Language development
and nonverbal IQ of Puerto Rican preschool children in
New York City. J.Abnorm.&Social Psychol. 48:357-366,
1953.
557. Johnson, G. B., Jr.: Bilingualism as measured by a re-
action-time technique and the relationship between a
language and a non-language intelligence quotient. J.
Genet.Psychol. 82:3-9, 1953.
558. Havighurst, R. J., Gunther, M. X., and Pratt, I. E.: En-
vironment and the Draw-A-Man Test, the performance of
Indian children. J.Abnorm.&Social Psychol. 41:50-63,
1946.
559. Norman, R. D., and Midkiff, K. L.: Navaho children on
Raven Progressive Matrices and Goodenough Draw-A-
Man Tests. SWest.J.Anthrop. 11:129-136, 1955.
560. Carney, R. E., and Trowbridge, N.: Intelligence test per-
formance of Indian children as a function of type of test
and age. Percept.Mot.Skills 14:511-514, 1962.
Goodenough: With Subnormal, Retarded, and
Mentally Defective Children
561. McElwee, E. W.: Profile drawings of normal and subnor-
mal children. J.Appl.Psychol. 18:599-603, 1934.
562. Israelite, J.: A comparison of the difficulty of items for
intellectually normal children and mental defectives on
the Goodenough drawing test. Am.J.Orthopsychiat. 6:
494-503, 1936.
563. Spoerl, D. T.: Personality and drawing in retarded chil-
dren. Character. Pers. 8:227-239, 1940.
564. Spoerl, D. T.: The drawing ability of mentally retarded
children. J.Genet.Psychol. 57:259-277, 1940.
565. White, M. R.: The Performance of Epileptic, Feeble-
minded and Normal Children on the Goodenough Test of
Intelligence. Unpublished master’s thesis, State Univer-
sity of Iowa, 1945.
566. Gunzburg, H. C.: The significance of various aspects in
drawings by educationally subnormal children. J.Ment.
Sc. 96:951-975, 1950.
567. Fabian, A. A.: Clinical and experimental studies of
school children who are retarded in reading. Quart.J.
Child Behav. 3:15-18, 1951.
568. Hunt, B., and Patterson, R. M.: Performance of familial
mentally deficient children in response to motivation on
the Goodenough Draw-A-Man Test. Am.J.Ment.Deficien-
cy 62:326-329, 1957.
569. Rohrs, F. W., and Haworth, M. R.: The 1960 Stanford-
Binet, WISC, and Goodenough Tests with mentally re-
tarded children. Am.J.Ment.Deficiency 66:853-859, 1962.
Goodenough: Chronic Encephalitis
570. Bender, L.: The Goodenough Test (Drawing a Man) in
chronic encephalitis in children. J.Child Psychiat. 3:
449-459, 1951.
Goodenough: Physically Handicapped
571. Martorana, A. A.: A Comparison of the Personal, Emo-
tional, and Family Adjustment of Crippled and Normal
Children. Unpublished doctoral dissertation, University
of Minnesota, 1954.
572. Silverstein, A. B., and Robinson, H. A.: The represen-
tation of orthopedic disability in children’s figure draw-
ings. J.Consult.Psychol. 20:333-341, 1956.
573. Johnson, O. G., and Wawrzasek, F.: Psychologists’ judg-
ments of physical handicap from H-T-P drawings. J.Con-
sult.Psychol. 25:284-287, 1961.
Goodenough: Intelligence of Deaf Children
574. Peterson, E. G., and Williams, J. M.: Intelligence of deaf
children as measured by drawings. Am.Ann.Deaf 75:273-
290, 1930.
575. Shirley, M., and Goodenough, F. L.: A survey of the in-
telligence of deaf children in Minnesota schools. Am.
Ann.Deaf 77:238-247, 1932.
576. Springer, N. N.: A comparative study of the intelligence
of deaf and hearing children. Am.Ann.Deaf 83:138-152,
1938.
Goodenough: Measurement of Adjustment
577. Brill, M.: A study of instability using the Goodenough
drawing scale. J.Abnorm.&Social Psychol. 32:288-302,
1937.
578. Springer, N.N.: A study of drawings of maladjusted and
adjusted children. J.Genet.Psychol. 58:131-138, 1941.
579. Albee, G.W., and Hamlin, R.M.: An investigation of the
reliability and validity of judgments of adjustment inferred
from drawings. J.Clin.Psychol. 5:389-392, 1949.
580. Ochs, E.: Changes in Goodenough drawings associated
with changes in social adjustment. J.Clin.Psychol. 3:
282-284, 1950.
581. Albee, G. W., and Hamlin, R.M.: Judgment of adjustment
from drawings; the applicability of rating scale methods.
J.Clin.Psychol. 6:363-365, 1950.
582. Stone, P.M.: A Study of Objectively Scored Drawings of
Human Figures in Relation to the Emotional Adjustment
of 6th Grade Pupils. Unpublished doctoral dissertation,
Yeshiva University, 1952.
583. Palmer, H. R.: The Relationship of Differences Between
Stanford-Binet and Goodenough IQ’s to Personal Adjustment
as Indicated by the California Test of Personality.
Unpublished master’s thesis, University of Alabama,
1953.
584. Popplestone, J. A.: Male Human Figure Drawing in Normal
and Emotionally Disturbed Children. Unpublished
doctoral dissertation, Washington University, 1958.
585. Feldman, M. J., and Hunt, R. G.: The relation of difficulty
in drawing to ratings of adjustment based on human
figure drawings. J.Consult.Psychol. 22:217-219, 1958.
Goodenough: With Delinquents
586. Hinrichs, W. E.: The Goodenough drawing test in relation
to delinquency and problem behavior. Archs.Psychol.,
N.Y. No. 175, 1935.
587. Starke, P.: An Attempt To Differentiate Delinquents From
Non-delinquents by Tests of Dominance Behavior, Dominance
Feeling and the Goodenough Drawing of a Man.
Unpublished master’s thesis, University of Minnesota,
1950.
Goodenough: With Disturbed Persons
588. Berrien, F.K.: A study of the drawings of abnormal children.
J.Educ.Psychol. 26:143-150, 1935.
589. Despert, J.L.: Emotional Problems in Children. Utica.
State Hospitals Press, 1938.
590. Des Lauriers, A., and Halpern, F.: Psychological tests
in childhood schizophrenia. Am.J.Orthopsychiat. 17:57-67,
1947.
591. Holzberg, J. D., and Wexler, M.: The validity of human
form drawings as a measure of personality deviation. J.
Project.Tech. 14:343-361, 1950.
592. Johnson, A. P., Ellerd, A. A., and Lahey, T. H.: The
Goodenough Test as an aid to interpretation of chil-
dren’s school behavior. Am.J.Ment.Deficiency 54:516-
520, 1950.
593. Hanvik, L. J.: The Goodenough Test as a measure of in-
telligence in child psychiatric patients. J.Clin.Psychol.
9:71-72, 1953.
Goodenough: Other References Cited in Text
594. Buck, J. N.: The H-T-P technique, a qualitative and
quantitative scoring manual. J.Clin.Psychol. 4:317-396,
1948.
595. Goodenough, F.: Measurement of Intelligence by Draw-
ings. New York. Harcourt, Brace and World, Inc., 1926.
596. Machover, K.: Personality Projection in the Drawing of
the Human Figure. Springfield, Ill. Charles C. Thomas,
1949.
IV. THE THEMATIC APPERCEPTION TEST
The technology of personality measurement
lags far behind that of ability and achievement
measurement. This lag makes it difficult for
organizations (such as the Division of Health
Examination Statistics) which seek to estimate
population parameters on the basis of definitive
test scores. At present there is not a single personality
test for children that could be recommended
without qualification. In view of the
extensive use of personality tests in clinical
practices and in school situations, this sweeping
statement may appear extreme. It is, neverthe-
less, regrettably true. Perhaps clinical psychol-
ogists can justify their use of various personality
measures, on the basis of intensive individual case
study in which test responses and scores are in-
terpreted, by the clinician, in relation to con-
sistent patterns of performance in the context of
a total life record. The clinician usually feels
free to accept or disregard information in this
frame of reference, and he often employs informal,
unstandardized "tests" as well as published procedures
without regard for formal considerations
of reliability and validity. Furthermore, since
clinical judgments are confined to individual
cases, they are not subject to verification by the
rules of evidence observed in scientific studies.
Educators often justify their personality testing
as a contribution to research, which is important
and is the only tenable position in the light of the
facts.
In contrast with the clinical and research uses
of personality measures, where legitimacy is not
primarily a function of the proven adequacy of the
measurement instruments employed, surveys such
as this one (HES) operate under severe constraints.
The survey scientist must defend the validity and
reliability of his instruments as well as the ade-
quacy of his sampling design for the purposes of
his survey; both considerations affect the validity
of population estimates from sample data.
The choice of a personality measurement
instrument for Cycle II must be considered in the
context of the preceding discussion. Although the
California Personality Test and Cattell's Junior
Personality Quiz are, in the opinion of the writer,
the most adequately documented of the currently
published and objectively scored personality tests
for children, neither meets the reliability and
validity standards necessary for Survey use and
neither is appropriate for the entire age range of
6 through 11 years. Apart from these, no available
tests even approach the requirements of this
Survey.
In the psychometric sense, the Thematic
Apperception Test (TAT) is not a test. It is a
projective device consisting of a series of am-
biguous (unstructured) pictures individually pre-
sented to the subject (or patient), who is asked to
imagine and relate a story. The rationale of the
procedure is that people will seek to create
structure when a stimulus situation is unstruc-
tured and that in doing so they will draw on their
own experience, needs, attitudes, and values to
provide the details. This process is viewed as a
"projection" of inner processes on the unstructured
stimulus.
The TAT was developed by Henry A. Murray
of Harvard University in 1938 (788). At the same
time he presented a report which outlined a
motivational system of organismic needs and en-
vironmental presses. This report was highly in-
fluential and stimulated much research. Five
years later (in 1943), the TAT pictures and a
manual for their use were published (799).
From the objective scoring standpoint, it is
necessary to recognize that all projective methods
share a major problem, since in all of them the
testing strategy depends on the process by which
subjects add structure to ambiguous stimuli.
Although this structuring process does involve
projection, in the sense defined above, it also
simultaneously involves other factors. Indeed,
the structuring process may be as much a
function of external, situational factors, to which
the subject is responding, as of internal factors.
How these various factors combine is only
imperfectly understood in the scientific study
of perception; they have not, to the writer's
knowledge, been investigated in relation to the
TAT pictures. In spite of these facts, for the past
60 or more years users of projective techniques
have continued to assume that responses to
various stimuli represent projection only.
Cattell (796) has suggested that "projective"
tests (which he thinks should be called "misperception
tests") should employ stimuli of a much
lower order of complexity than those of the TAT
and the Rorschach inkblots in order to simplify
interpretation. Technically this may be an im-
provement, as Cattell has shown in the misper-
ception tests which he designed for his objective
test batteries. In these tests the subject's latitude
of response to a specific ambiguity (e.g., esti-
mating the number of communist party members
in the United States or the value of a college
degree) is extremely limited. A similar con-
clusion is also implicit in the modifications of
the TAT pictures made by McClelland (798) in
his studies of motivation measurement in fantasy.
In a complex projective technique such as
the TAT, the story produced by a subject may
represent his response to the entire picture or
only to certain parts of the stimulus picture. In
addition, the story itself necessarily requires
technical interpretation by the examiner to the
extent that it employs idiosyncratic language,
symbols, and ideation. Because of the freedom
and informality of the method, which is deliberate
(in order to avoid prompting or the addition of
extraneous variance contributed by the examiner),
it is virtually impossible to relate responses to
specific internal and external cues or patterns of
cues.
The very looseness of the interpretative
procedure, in contrast to fixed scoring keys in
the case of questionnaires (usually answered
"yes," "no," or "?"), led George Kelly (797), in
an Annual Review article, to observe that while
in the case of questionnaires the subject tries to
guess what the examiner is thinking, in projective
techniques the examiner must guess what the
subject is thinking. In either case, there is a good
deal of guessing going on.
The TAT has some similarity to the Draw-
A-Man Test in that the Draw-A-Man provides an
unstructured stimulus (the instruction to draw a
person) and permits wide latitude of response
structuring on the part of the subject. It is note-
worthy that the Draw-A-Man has produced no
acceptable schemes for personality interpreta-
tion. However, as pointed out in the discussion of
the Draw-A-Man, the most promising results in
personality, as well as in cognitive assessment,
have been those employing detailed, objective
techniques of scoring, such as the point scales.
The selection of five cards of the TAT for
the Survey undoubtedly reflects (1) the appraisal
of existing personality tests mentioned above,
combined with (2) the recognition of apparent
widespread acceptance of the TAT as a pro-
jective technique and (3) the belief that an
appropriate method of objective scoring of re-
sponses to them can be developed for the specific
use of the Survey as well as for later more
general use by professional workers. The basis
for this appraisal cannot be documented here,
although the writer is prepared to defend it.
Reference to the forthcoming Sixth Mental Meas-
urements Yearbook (O. Buros, ed., New Bruns-
wick, N.J., The Gryphon Press) might be suffi-
cient for this purpose. The evidence for the
recognition of acceptance of the TAT is discussed
below, together with an evaluation of the prospects
for successful development of an objective scoring
procedure.
REVIEW OF THE LITERATURE
ON THE TAT
The present review includes abstracts of pub-
lished research articles, theses, and critical
reviews of the TAT literature, as well as 5 general
references on the thematic apperception method.
These constitute only a small portion of the ex-
tensive psychological, anthropological, and socio-
logical research on the TAT and its variants which
have appeared in undiminished quantity over the
years (e.g., Thompson's Negro edition of the TAT,
Symonds’ Picture Story Test, Bellak's Children's
Apperception Test (CAT), Van Lennep's Four
Picture Test, Phillipson's Object Relations Tech-
nique, and numerous other techniques which can
be traced to the Murray version). Both the TAT
procedure and the Murray "need-press" concepts
have been used extensively in personality studies
and studies of motivation. The items selected for
inclusion in this report were judged relevant if
they (1) used a measurement approach, (2) were
validation or normative studies, (3) had an appli-
cable sample in terms of age, or (4) used an
adequate scoring procedure.
Overview
Treatment of the TAT by different writers
ranges from uncritical acceptance on the basis
of a priori assumptions, illustrated by Henry (749)
and Piotrowski (702), through qualified acceptance
with a "soft" attitude toward the contradictory
evidence, as demonstrated by Mayman (701) and
Lindzey (703), to objective evaluation, illustrated
by Eron (706), Windle (704), and others. Windle's
comment, that there is little agreement among
results reported by different investigators, seems
to describe accurately this field of research. One
area in which some agreement may be found,
however, is that of cognitive evaluation (714 and
737-739); this is highly reminiscent of the Draw-A-Man.
The TAT literature abounds in elaborate but
largely untested (critically, that is) scoring
systems. Most of these are too extensive for brief
summarization and go beyond the purposes of this
report. However, they have been reviewed in
anticipation of a further empirical study of the
Survey's Thematic Apperception Test data, and
references to 21 additional selected reports are
included in the bibliography of section IV.
Most of these, as well as a number of other
suggested analytic methods of scoring the TAT,
are well summarized in a 1951 publication by
Edwin S. Shneidman, Walther Joel, and Kenneth B.
Little (800). Although the modes of analysis vary
in detail and in terminology, the typical one in-
volves interpretation and frequency counting or
evaluation on a rating scale of all or part of the
following types of information, usually across all
of the stories obtained for a selection of cards.
(The full series of cards is often abridged because
of practical time limitations, as it is in the
Survey.)
Formal (structural) aspects of the stories
Compliance with instructions (including card
rejection)
Consistency of stories
Length of stories; vocabulary level
Grammatical forms (nouns, pronouns, verbs,
incomplete sentences)
Number and type of situations described
Number and type of characters included
Outcome of stories
Level of response (from description to im-
aginative interpretation)
Interpretive categories
Feelings, moods, worries, emotional tone
Needs expressed (or implied)
Conflict areas
Presses—physical, emotional, mental, eco-
nomic, social, religious
Characters—strivings, attitudes, obstacles,
barriers, traits, and roles of hero, major
characters, and minor characters
Outcomes reflecting success, failure
Thematic content—family dynamics, inner
adjustment, sexual adjustment, interpersonal
relations, aggression (physical, nonphysical)
Developmental level in Freudian (psycho-
sexual) context
Defense mechanisms utilized
Manner in which environment is assimilated
The number of variables enumerated under
these categories is extensive (Murray's need-
press system alone exceeds 83), and in most
cases the variables require detailed, careful
definition and intensive training of scorers. High
reliabilities have often been achieved among
scorers within a particular laboratory for a given
period of tenure of the staff members involved,
but these have not generally been maintained with
staff changes or when systems have been tried
out at other institutions. Often, definitions change
over time as new generations of protocols appear,
requiring decisions in relation to categories
developed on the basis of earlier samples.
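The inter-scorer reliabilities mentioned above can be checked in more than one way, and the studies reviewed do not name a single index. As a minimal sketch only, the following assumes two scorers have each assigned one category label per story, and computes simple percent agreement together with Cohen's kappa, which discounts the agreement expected by chance. The function name, the category labels, and the sample ratings are hypothetical and are not taken from any study cited here.

```python
from collections import Counter


def scorer_agreement(scorer_a, scorer_b):
    """Return (percent agreement, Cohen's kappa) for two scorers.

    Each argument is a list of category labels, one per story,
    in the same story order. Labels are illustrative only.
    """
    if len(scorer_a) != len(scorer_b) or not scorer_a:
        raise ValueError("both scorers must rate the same non-empty set of stories")
    n = len(scorer_a)
    # Observed proportion of stories given the same label by both scorers.
    p_observed = sum(a == b for a, b in zip(scorer_a, scorer_b)) / n
    # Agreement expected by chance from each scorer's marginal frequencies.
    count_a, count_b = Counter(scorer_a), Counter(scorer_b)
    p_chance = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / n ** 2
    kappa = (p_observed - p_chance) / (1 - p_chance) if p_chance < 1 else 1.0
    return p_observed, kappa


# Hypothetical ratings of story outcome for six stories.
ratings_a = ["success", "failure", "success", "success", "failure", "success"]
ratings_b = ["success", "failure", "failure", "success", "failure", "success"]
p_obs, kappa = scorer_agreement(ratings_a, ratings_b)
print(round(p_obs, 3), round(kappa, 3))
```

Computing such an index separately for each laboratory or staff period would make visible the loss of agreement that occurs when a scoring system is moved to new scorers.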
In spite of the logical (from some theoretical
positions) appeal of these analytic approaches,
they do not fit the requirements of psychometric
procedures. Such analytic approaches satisfy the
needs of various clinicians or investigators in
their individual practices and researches, but for
survey purposes they are useful primarily because
they suggest areas which may be suitable for
objective study. With the exception of some formal
characteristics (such as length of story and other
items that can be counted fairly accurately) which
have been related to developmental rather than
personality-adjustment concepts, there is so little
agreement in the literature on most scoring cate-
gories that an investigator seeking to develop an
objective scoring procedure might as well start
from "scratch."
Research Demonstrating
Developmental Factors
Edelstein (737) completed an interesting pilot
study demonstrating a system for scoring TAT
stories. From her system a total age-adjusted
score, correlating well with Stanford-Binet IQ's,
could be derived. She used the following six
scoring categories—number of words, qualifier/
word ratio, number of conditions, number of
responses, number of situations involved, and
number of characters. Her sample included only
15 boys and 13 girls (ages 9-5 to 12-5), but from
a methodological viewpoint her study is promising.
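Formal categories of this kind lend themselves to mechanical counting. The sketch below is an illustration only and does not reproduce Edelstein's scoring rules, which are not given in this review. It assumes a story has been transcribed as plain text, and it counts words and computes a qualifier/word ratio against a small, hypothetical list of qualifiers; a working system would need an explicit and validated definition of what counts as a qualifier.

```python
import re

# Hypothetical qualifier list for illustration; not Edelstein's definition.
QUALIFIERS = {"very", "quite", "rather", "little", "big", "sad", "happy", "old", "young"}


def formal_counts(story, qualifiers=QUALIFIERS):
    """Return simple countable features of one transcribed story."""
    # Treat runs of letters and apostrophes as words, ignoring case.
    words = re.findall(r"[a-z']+", story.lower())
    n_words = len(words)
    n_qualifiers = sum(1 for w in words if w in qualifiers)
    ratio = n_qualifiers / n_words if n_words else 0.0
    return {
        "n_words": n_words,
        "n_qualifiers": n_qualifiers,
        "qualifier_word_ratio": ratio,
    }


example = formal_counts("The little boy is very sad. He lost his old violin.")
print(example)
```

Counts of situations and characters require judgment by a scorer and are not attempted here.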
In a conceptually related study, Armstrong
(714) administered the CAT (cards 1, 2, 4, 8, and
10) to a sample of 60 children in grades 1 to 3 in
the University of Minnesota elementary school.
The findings of her study relevant to the present
review are as follows: (1) length of story in-
creases with grade, (2) girls' protocols are
longer than those of boys, (3) the use of first
person pronouns shows a slight but consistent
decline with grade progression, (4) girls tend
to make more subjective and personalized state-
ments than boys, and (5) girls have a consistently
longer reaction time than boys.
Slack (761) gave the TAT to 15 exogenous
feebleminded boys and 12 endogenous ones at the
Vineland Training School. He correlated a score
reflecting the number of causally and purpose-
fully connected statements with the Stanford-Binet
and with Thurstone's test of Primary Mental
Abilities (PMA). With chronological age held
constant, causally or purposefully connected
statements correlated with other variables as
follows: S-B MA, 0.58; PMA MA, 0.70; PMA
Verbal MA, 0.51; PMA Motor MA, 0.72. Length of
stories (number of words) correlated as follows
with the same variables (CA held constant): S-B
MA, 0.31 (ns); PMA MA, 0.34 (ns); PMA Verbal
MA, 0.53; PMA Motor MA, 0.48. The age-corrected
correlation of number of purposeful relations
with the PMA Verbal MA was 0.90, and the
correlation of number of causal relations with
the same measures was 0.42. Slack also reported
a significant difference between the endogenous
and exogenous groups on length of stories.
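Correlations reported "with chronological age held constant" are ordinarily first-order partial correlations. Slack's computational method is not described in this review, so the following is a sketch under that assumption. It applies the standard formula, in which the correlation between x and y with z held constant is the correlation of x and y, less the product of their correlations with z, divided by the square root of (1 - r_xz squared) times (1 - r_yz squared). The input values are invented for illustration and are not Slack's data.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denominator == 0:
        raise ValueError("x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denominator


# Illustrative values only: x = number of connected statements,
# y = mental age, z = chronological age.
r = partial_r(r_xy=0.65, r_xz=0.40, r_yz=0.50)
print(round(r, 3))
```

When both variables rise with age, the partial correlation is smaller than the raw correlation, which is why age must be controlled before a story measure is interpreted as an index of mental ability.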
These studies lend some limited support to
the possibility of developing an objective scoring
system based on developmental criteria for the
five TAT pictures used in the Survey.
Other Relevant Research
The following studies were selected for cita-
tion on the basis of their relevance to the Survey
problems. Lesser (720) demonstrated how a
Guttman-type scale could be developed for
measurement of aggressive fantasy. Bijou and
Kenny (732) and Murstein (734) investigated
ambiguity values of TAT cards. The former found
the following ambiguity ranks (out of 21) for the
four picture cards used in the Survey (card 16,
blank, was not rated):
Card number     Rank
1 ............. 2
2 ............. 3
5 ............. 17
8BM ........... 11
The latter reported that cards with medium
ambiguity (8BM) were most productive of thematic
content among college students.
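The Guttman-type scale mentioned above rests on the idea that items can be ordered so that a subject who shows a less common response also shows all of the more common ones. How closely a set of scored stories fits this pattern is usually summarized by a coefficient of reproducibility. The sketch below is one common way of counting scale errors and is not a reconstruction of Lesser's procedure. It assumes each child's stories have been scored 1 or 0 for the presence of each item, and the sample data are invented.

```python
def reproducibility(responses):
    """Coefficient of reproducibility for a table of 0/1 item scores.

    responses: list of equal-length lists, one row per child.
    Items are ordered from most to least frequently scored, and each
    row is compared with the ideal pattern for its total score.
    """
    n_items = len(responses[0])
    totals = [sum(row[j] for row in responses) for j in range(n_items)]
    # Most frequently scored item first; ties keep their original order.
    order = sorted(range(n_items), key=lambda j: -totals[j])
    errors = 0
    for row in responses:
        score = sum(row)
        for rank, j in enumerate(order):
            ideal = 1 if rank < score else 0
            errors += row[j] != ideal
    return 1 - errors / (len(responses) * n_items)


# Invented scores on three hypothetical aggression items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [0, 1, 0],  # deviates from the ideal pattern [1, 0, 0]
]
cr = reproducibility(scores)
print(round(cr, 3))
```

A coefficient near 1 indicates that total scores reproduce the individual item responses closely; other error-counting conventions give somewhat different values.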
Milam (735) demonstrated the sensitivity of
TAT responses to examiner influence. Apparently,
the attitudes and behavior of the examiner, as
perceived by the subject, account for variance in
the TAT responses. This is true of all psycho-
logical tests. It is not possible to say whether
this is a greater problem on the TAT than on the
WISC, for example, but it must be kept in mind
as a significant source of uncontrolled variation.
Gurevitz and Klapper (763) found that schizo-
phrenic children characteristically respond to
CAT cards with bizarre outcomes, evaluation of
stimuli, use of titles, hostility, and verbosity.
Holden (766) compared a small sample of cerebral
palsied children with normal controls. His results
clearly suggest that cerebral palsied respondents
tend to describe the cards, while normal controls
give more thematic content. The average number
of descriptions (out of 10 cards) was 6.0 for the
palsied children and 2.8 for the controls. Leitch
and Schafer (770) reported a number of response
criteria identifying psychotic responses.
From the standpoint of further research on
the development of a scoring procedure for the
TAT, the following list of specific items has been
recorded and evaluated in one or more of the
studies reviewed (reference numbers shown in
parentheses). In most cases the results were not
included in the main discussion either because of
sample limitations, subjective methods of scoring,
inconclusiveness of results, or unrelatedness to
the present problem. Many of them, however, do
appear definable and worthy of further study.
Frequency and duration
RT latency (705 and 747)
Total reaction time (705 and 747)
Number of words (707, 714, 737, 741, 746,
747, and 764)
Number of adjectives (737)
Number of adverbs (737)
Number of nouns (714)
Number of pronouns (714)
Number of verbs (714)
Number of questions (705)
Number of ego words (714)
Number of situations (737)
Number of characters (707 and 737)
Male, female
Nature of action
Crying (718)
Dancing (737)
Disaster (713)
Drunkenness (737)
Escape solutions (705 and 718)
Fear of punishment (742)
Fighting (720)
Hardship (713)
Illness (713)
Loss of ability, skill,
money (737)
Suicide (705)
Frightening (737)
Killing (720)
Ridiculing (720)
Making fun of (737)
Punishment (705 and 743)
Stealing (737)
Receiving aid (705)
Giving aid (705)
Teaching (737)
Laughing (737)
Singing (737)
Book or movie cited as source (705)
Criticism of picture (705)
Liked, disliked (705)
Title (763)
Number of themes (707, 712, and 764)
Card description
Parts referred to (705)
Number of rare picture details (705)
Compliance with instructions (705, 707,
and 721)
Examiner included in story (770)
Response
Bizarre (705 and 763)
Queer (770)
Contradictory (770)
Incoherent (705 and 770)
Transcendental (707 and 714)
Number of references
Future events (705 and 721)
Past events (705 and 721)
Present events (705 and 721)
Level (712, 721, 755, 766, and 776)
Enumerative
Descriptive
Interpretive
Language
Neologisms (770)
Stereotyped (705)
Vocabulary level (705)
Unusual wording (770)
Fluency (705)
Repetitions (770)
Foreign expressions
Relative age of characters (705)
Older
Peer
Younger
Sex role identification (705)
Own
Opposite
Ambiguous
Tone of story (712)
Emotional
Submission to fate
Rebellion
Fear
Worry
Lack of affect
Aspiration
Shift of tone
Theme of story
Unrelated (770)
Curiosity (738)
Scorning (720)
Social approval (713)
Positive
Negative
Evasive
Stressful (725)
Ordinary family activity (712)
Mental inadequacy (713)
Motivational inadequacy (713)
Physical inadequacy (713)
Perceptual distortions (705, 712, and 770)
Neatness or orderliness of story (705)
Overspecific statements (770)
Overgeneralizations (770)
Autistic logic (770)
Feelings
Anger toward parent(s) (743)
Aesthetic (705)
Ambivalent (705)
Benign (705)
Conflict (705)
Empathy (723)
Frustration (705 and 713)
Guilt (705 and 713)
Happiness (747)
Hate (720)
Independence (713)
Inferiority (705)
Paranoid (705)
Parental anger to child (743)
Pleasant (705)
Pleasure (713)
Sadistic (705)
Security (713)
Number of causal relations (761)
Number of purposeful relations (761)
Outcomes (713, 763, 772, and 775)
Failure
Success
Aggressive (772)
Clarity of statement (705)
Bizarre (763)
Self-reference (705)
Number of personalized statements (705 and
714)
Degree of response certainty (705)
Level of interpretation (Eron, 712)
Symbolic
Abstract
Descriptive
Unreal
Fairy tale
Central character not in picture
Autobiographical
Continuations
Alternate themes
Comments
Denial of theme
Rejection
Peculiar
Confused
Includes examiner in story
No connection between story and picture
Humorous
PROSPECTS FOR DEVELOPING
AN OBJECTIVE SCORING KEY
FOR THE SURVEY'S TAT
Although the TAT literature is scientifically
"sloppy" in comparison with the material reviewed
in relation to the WISC and the Draw-A-Man Test,
the following assumptions seemed warranted: (1)
a substantial number of items (both formal-struc-
tural and thematic-interpretive) can be reliably
defined and accurately scored, (2) discriminating
developmental criteria can be devised, and (3)
an objectively defined scoring system can be
developed which will contribute useful information
regarding development between ages 6 and 12
years.
It seems unlikely, in light of the literature
reviewed, that scoring scales can be constructed
which will measure factors such as motivation,
affective states, and personality traits. However,
this is not serious since there is no indication that
these factors have any developmental impli-
cations.
The anticipated developmental scales would
greatly enrich the information obtained in the
Survey by possibly providing developmental norms
with regard to behavioral aspects not encompassed
by the other tests, such as verbal expression,
thematic content of imagination in standard test
situations, associations to standard stimuli, role
concepts and attitudes in relation to self, peers of
same and opposite sex, parental and adult figures,
and common cultural values.
While the picture samples are limited, they
appear to be well chosen for the purpose. Card 1
has a boy as the central figure; card 2, a girl;
card 5, an adult-parental (mother) figure; and
card 8BM, a possible stressful situation—involving
a father figure—within the experience background
of most school-age children. Card 16, the
blank card, is completely unstructured. As a set
of cards having nearly universal applicability in
a United States national sample, the selection
appears excellent.
One of the advantages that an investigator
working on this problem would have over most of
those who have published reports in this area is
the large sample obtained under standardized
survey conditions. With adequate funds to work
with a fairly large sample of perhaps 1,000 or
more cases, a good test of these conclusions
could be made. Of course, there is no guarantee
that the results will be entirely satisfactory,
although the prognosis appears good.
However, the Survey is committed to doing
something with these data, and no suitable scoring
procedure is presently available. In the writer's
judgment, the options available were nearly all
unsatisfactory, and the one taken may prove to be
a wise decision.
BIBLIOGRAPHY
General References to TAT
701. Mayman, M.: Review of the literature on the Thematic
Apperception Test, in David Rapaport, Diagnostic Psychological
Testing. Vol. II, The Theory, Statistical
Evaluation, and Diagnostic Application of a Battery of
Tests. Chicago. Year Book Publishers, 1946. pp. 496-506.
702. Piotrowski, Z. A.: A new evaluation of the Thematic Apperception
Test. Psychoanalyt.Rev. 37:101-127, 1950.
703. Lindzey, G.: Thematic Apperception Test, interpretive
assumptions and related empirical evidence. Psychology
Bull. 49:1-25, 1952.
704. Windle, C.: Psychological tests in psychopathological
prognosis. Psychology Bull. 49:451-482, 1952.
705. Hartman, A. A.: An experimental examination of the Thematic
Apperception Technique in clinical diagnosis.
Psychological Monographs. Vol. 63, No. 8 (Whole No.
303). Washington, D.C. American Psychological Association,
Inc., 1950.
706. Eron, L. D.: Some problems in the research application
of the Thematic Apperception Test. J.Project.Tech.
19:125-129, 1955.
707. Lindzey, G., and Silverman, M.: Thematic Apperception
Test, techniques of group administration, sex differences,
and the role of verbal productivity. J.Personality
27:311-323, 1959.
708. Sanford, R. N., and others: Physique, personality and
scholarship; a cooperative study of school children, in
Society for Research in Child Development, Monograph,
Vol. 8, No. 2. Washington, D.C. National Research
Council, 1943.
TAT: Normative Data
709. Cox, B. F., and Sargent, H. D.: The common responses
of normal children to ten pictures of the Thematic Apperception
Test series; abstracted, Am.Psychologist 3:363,
1948.
710. Bell, J. E.: A comparison of children’s fantasies in two
equated projective techniques; abstracted, Am.Psychologist
3:263, 1948.
711. Whitehouse, E.: Norms for certain aspects of the Thematic
Apperception Test on a group of nine and ten year
old children; abstracted, Persona 1:12-15, 1949.
712. Eron, L. D.: A normative study of the Thematic Apperception
Test. Psychological Monographs. Vol. 64, No.
9. Washington, D.C. American Psychological Association,
Inc., 1950.
713. Cox, B., and Sargent, H. D.: TAT responses of emotionally
disturbed and emotionally stable children, clinical
judgment versus normative data. J.Project.Tech. 14:61-74,
1950.
714. Armstrong, M. A. S.: Children’s responses to animal and
human figures in thematic pictures. J.Consult.Psychol.
18:67-70, 1954.
715. Fisher, G. M., and Shotwell, A. M.: Preference rankings
of the TAT cards by adolescent normals, delinquents,
and mental retardates. J.Project.Tech. 25:41-43, 1961.
716. Brayer, R., Craig, G., and Teichner, W.: Scaling difficulty
values of TAT cards. J.Project.Tech. 25:272-276, 1961.
TAT: Scoring Schemes
717. Eron, L. D., Terry, D., and Callahan, R.: The use of
rating scales for emotional tone of TAT stories. J.Con-
sult.Psychol. 14:473-478, 1950.
718. Fine, R.: A scoring scheme for the TAT and other ver-
bal projective techniques. J.Project.Tech. 19:306-309,
1955.
719. Friedman, I.: Objectifying the subjective, a methodological
approach to the TAT. J.Project.Tech. 21:243-247,
1957.
720. Lesser, G. S.: Application of Guttman’s scaling method
to aggressive fantasy in children. Educ.Psychol.Measur.
18:543-551, 1958.
721. Dana, R. H.: Proposal for objective scoring of the TAT.
Percept.Mot.Skills 9:27-43, 1959.
TAT: Stability, Reliability
722. Porter, F. S.: A Study of Certain Aspects of the Relia-
bility and Validity of the Thematic Apperception Test.
Unpublished master’s thesis, Iowa State University, 1944.
723. Harrison, R., and Rotter, J. B.: A note on the reliability
of the Thematic Apperception Test. J.Abnorm.&Social
Psychol. 40:97-99, 1945.
724. Jeffre, M. F. D.: A Critical Study of the Thematic Ap-
perception Test Performance of Normal Children. Un-
published master’s thesis, University of Iowa, 1945.
725. Mayman, M., and Kutner, B.: Reliability in analyzing
Thematic Apperception Test stories. J.Abnorm.&Social
Psychol. 42:365-368, 1947.
726. Kagan, J.: The stability of TAT fantasy and stimulus
ambiguity. J.Consult.Psychol. 23:266-271, 1959.
TAT: Validity Studies
727. Calvin, J. S., and Ward, L. C.: An attempted experimental
validation of the Thematic Apperception Test. J.Clin.
Psychol. 6:377-381, 1950.
728. Saxe, C. H.: A quantitative comparison of psychodiag-
nostic formulations from the TAT and therapeutic con-
tacts. J.Consult.Psychol. 14:116-127, 1950.
729. Davenport, B. F.: The semantic validity of TAT inter-
pretations. J.Consult.Psychol. 16:171-175, 1952.
730. Bendig, A. W.: Predictive and postdictive validity of
need achievement measures. J.Ed.Res. 52:119-120, 1958.
731. Henry, W. E., and Farley, J.: The validity of the The-
matic Apperception Test in the study of adolescent per-
sonality. Psychological Monographs. Vol. 73, No. 17
(Whole No. 487). Washington, D.C. American Psychological
Association, Inc., 1959.
TAT: Ambiguity Values of Cards
732. Bijou, S.W., and Kenny, D. T.: The ambiguity values of
TAT cards. J.Consult.Psychol. 15:208-209, 1951.
733. Davenport, B. F.: The Ambiguity, Universality, and Reliable
Discrimination of TAT Interpretations. Unpublished
doctoral dissertation, University of Southern California,
1951.
734. Murstein, B. I.: The relationship of stimulus ambiguity
on the TAT to the productivity of themes. J.Consult.
Psychol. 22:348, 1958.
TAT: Examiner Influence, Interpreter Influence
735. Milam, J. R.: Examiner influences on Thematic Apperception
Test stories. J.Project.Tech. 18:221-226, 1954.
736. Young, R. D., Jr.: The Effect of the Interpreter’s Personality
on the Interpretation of Thematic Apperception
Test’s Protocols. Unpublished doctoral dissertation,
University of Texas, 1953.
TAT: Effects of Intelligence, Achievement
737. Edelstein, R. T.: The Evaluation of Intelligence From
TAT Protocols. Unpublished master’s thesis, College
of the City of New York, 1956.
738. Kagan, J., Sontag, L. W., Baker, D. T., and Nelson, V.
L.: Personality and IQ change. J.Abnorm.&Social Psy-
chol. 56:261-266, 1958.
739. Murstein, B.I., and Collier, H. L.: The role of the TAT
in the measurement of achievement as a function of ex-
pectancy. J.Project.Tech. 26:96-101, 1962.
TAT: Personality Variables
740. McDowell, J. V.: Development Aspects of Phantasy Production
on the Thematic Apperception Test. Unpublished
doctoral dissertation, Oklahoma State University, 1952.
741. Cook, R. A.: Identification and ego defensiveness in the-
matic apperception. J.Project.Tech. 17:312-319, 1953.
742. Mussen, P. H., and Naylor, H. K.: Relationships between
overt and fantasy aggression. J.Abnorm.&Social Psy-
chol. 49:235-240, 1954.
743. Kagan, J.: Socialization of aggression and the percep-
tion of parents in fantasy. Child Development 29:311-
320, 1958.
744. Fitzgerald, B. J.: The Relationship of Two Projective
Measures to a Sociometric Measure of Dependent Behavior.
Unpublished doctoral dissertation, Ohio State University,
1959.
745. Breger, L.: Conformity and the Expression of Hostility.
Unpublished doctoral dissertation, Ohio State University,
1961.
TAT: Effects of Set, Recent Experience,
Stimulus Variables
746. Lubin, B.: Some effects of set and stimulus property on
TAT stories. J.Project.Tech. 24:11-16, 1960.
747. Newbigging, P. L.: Influence of a stimulus variable on
stories told to certain TAT pictures. Can.J.Psychol.
9:195-206, 1955.
748. Coleman, W.: The Thematic Apperception Test. I, Ef-
fect of recent experience. II, Some quantitative obser-
vations. J.Clin.Psychol. 3:257-264, 1947.
TAT: Environmental Variations; Culture, Social Class,
Race, Ethnic Group, Home Conditions, Sex Role,
Sociometric Status, Social Acceptance
749. Henry, W. E.: The Thematic Apperception Technique in
the study of culture-personality relations. Genet.Psy-
chol.Monogr. 35:3-135, 1947.
750. Mason, B., and Ammons, R. B.: Note on social class
and the Thematic Apperception Test. Percept.Mot. Skills
6:88, 1956.
751. Fisher, S., and Fisher, R. L.: A projective test analysis
of ethnic subculture themes in families. J.Project.Tech.
24:366-369, 1960.
752. Mitchell, H. E.: Social class and race as factors affecting
the role of the family in Thematic Apperception Test
stories; abstracted, Am.Psychologist 5:299-300, 1950.
753. Mussen, P. H.: Differences between the TAT responses
of Negro and white boys. J.Consult.Psychol. 17:373-
376, 1953.
754. Mussen, P. H.: Some personality and social factors related
to changes in children’s attitudes toward Negroes.
J.Abnorm.&Social Psychol. 45:423-441, 1950.
755. Shields, D. L.: An Investigation of the Influences of
Disparate Home Conditions Upon the Level at Which
Children Responded to the Thematic Apperception Test.
Unpublished master’s thesis, University of Pittsburgh,
1950.
756. McArthur, C.: Personality differences between middle
and upper classes. J.Abnorm.&Social Psychol. 50:247-254,
1955.
757. Cox, F. N.: Sociometric status and individual adjust-
ment before and after play therapy. J.Abnorm.&Social
Psychol. 48:354-356, 1953.
758. Herman, G. N.: A Comparison of the TAT Stories of Preadolescent
School Children Differing in Social Acceptance.
Unpublished master’s thesis, University of Toronto,
1952.
759. Milner, E.: Effects of sex role and social status on the
early adolescent personality. Genet.Psychol.Monogr.
40:231-325, 1949.
760. Butler, O.P.: Parent Figures in Thematic Apperception
Test Records of Children in Disparate Family Situations.
Unpublished doctoral dissertation, University of Pitts-
burgh, 1948.
TAT: With Feebleminded, Retarded, Handicapped, Brain
Injured, Palsied, Disturbed, and Psychotic Children
761. Slack, C. W.: Some intellective functions in the Thematic
Apperception Test and their use in differentiating endogenous
feeblemindedness from exogenous feeblemindedness.
Train.Sch.Bull. 47:156-169, 1950.
762. Tolman, N. G., and Johnson, A. P.: Need for achieve-
ment as related to brain injury in mentally retarded chil-
dren. Am.J.Ment.Deficiency 62:692-697, 1958.
763. Gurevitz, S., and Klapper, Z. S.: Techniques for and
evaluation of the responses of schizophrenic and cere-
bral palsied children to the Children’s Apperception Test
(C.A.T.). Quart. J. Child Behavior 3:38-65, 1951.
764. Abel, T. M.: Responses of Negro and white morons to
the Thematic Apperception Test. Am.J.Ment.Deficiency
49:463-468, 1945.
765. Beier, E. G., Gorlow, L., and Stacey, C. L.: The fantasy
life of the mental defective. Am.J.Ment.Deficiency 55:
582-589, 1951.
766. Holden, R. H.: The Children’s Apperception Test with
cerebral palsied and normal children. Child Develop-
ment 27:3-8, 1956.
767. Hood, P.N., Shank, K. H., and Williamson, D.: Environ-
mental factors in relation to the speech of cerebral pal-
sied children. J.Speech & Hearing Disorders 13:325-331,
1948.
768. Bergman, M., and Fisher, L. A.: The value of the Thematic
Apperception Test in mental deficiency. Psychiat.
Quart. Suppl. 27:22-42, 1953.
769. Ericson, M.: A study of the Thematic Apperception Test
as applied to a group of disturbed children; abstracted,
Am.Psychologist 2:271, 1947.
770. Leitch, M., and Schafer, S.: A study of the Thematic Apperception
Tests of psychotic children. Am.J.Orthopsychiat.
17:337-342, 1947.
771. Shank, K.H.: An Analysis of the Degree of Relationship
Between the Thematic Apperception Test and an Origi-
nal Projective Test in Measuring Symptoms of Personal-
ity Dynamics of Speech Handicapped Children. Unpub-
lished doctoral dissertation, University of Denver, 1954.
772. Christensen, A. H.: A Quantitative Study of Personality
Dynamics in Stuttering and Non-Stuttering Siblings. Unpublished
master’s thesis, University of Southern California,
1951.
773. Young, F. M.: Responses of juvenile delinquents to the
Thematic Apperception Test. J.Genet.Psychol. 88:251-
259, 1956.
TAT: With CAT and Michigan Picture Test
774. Symonds, P. M.: Adolescent Fantasy, an Investigation
of the Picture-Story Method of Personality Study. New
York. Columbia University Press, 1949.
775. Light, B. H.: Comparative study of a series of TAT and
CAT cards. J.Clin.Psychol. 10:179-181, 1954.
776. Andrew, G., Walton, R. E., Hartwell, S. W., and Hutt, M.
L.: The Michigan Picture Test, the stimulus value of
the cards. J.Consult.Psychol. 15:51-54, 1951.
Special Bibliography of TAT Scoring Systems 1
777. Andrew, G., Hartwell, S. W., Hutt, M. L., and Walton, R.
E.: The Michigan Picture Test. Chicago. Science Re-
search Associates, Inc., 1953.
778. Arnold, M. B.: A demonstration analysis of the Thematic
Apperception Test in a clinical setting. J.Abnorm.&
Social Psychol. 44:97-111, 1949.
779. Aron, B.: A Manual for Analysis of the Thematic Apper-
ception Test. Berkeley, Calif. Willis E. Berg, 1949.
780. Bellak, L.: A Guide to the Interpretation of the Thematic
Apperception Test. New York. The Psychological Cor-
poration, 1947.
2 See also 717 to 721.
781. Cox, B., and Sargent, H.: TAT responses of emotionally
disturbed and emotionally stable children. J.Project.
Tech. 14:61-74, 1950.
782. Dana, R. H.: Norms for three aspects of TAT behavior.
J.Genet.Psychol. 57:83-89, 1957.
783. Fine, R.: Manual for Scoring Scheme for Verbal Projec-
tive Techniques (TAT, MAPS, Stories, and the Like).
Washington, D.C. Veterans Administration, 1948.
784. Fry, F.D.: Manual for scoring the TAT. J.Psychol. 35:
181-195, 1953.
785. Hartman, A. A.: An experimental examination of the The-
matic Apperception Technique in clinical diagnosis.
Psychological Monographs. Vol. 63, No. 8 (Whole No.
303). Washington, D.C. American Psychological Asso-
ciation, Inc., 1950. pp. 1-48.
786. Henry, W.E.: The Analysis of Fantasy. New York. John
Wiley and Sons, Inc., 1956.
787. Klebanoff, S.: Personality factors in symptomatic chronic
alcoholism as indicated by the Thematic Apperception
Test. J.Consult.Psychol. 11:111-119, 1947.
788. Murray, H. A.: Explorations in Personality. New York.
Oxford University Press, 1938.
789. Rappaport, D.: The Thematic Apperception Test, Ch. IV,
in Diagnostic Psychological Testing, Vol. II, Chicago.
Yearbook Publishers, Inc., 1946.
790. Shorr, J. E.: A proposed system for scoring the TAT. J.
Clin.Psychol. 4:189-195, 1948.
791. Stone, H.: The TAT Aggressive Content Scale. J.Proj.
Tech. 20:445-452, 1956.
792. Terry, D.: The use of a rating scale of level of response
in TAT stories. J.Abnorm.&Social Psychol. 47:507-511,
1952.
793. Tomkins, S. S., and Tomkins, E. S.: The Thematic Apperception
Test, the Theory and Technique of Interpretation.
New York. Grune and Stratton, 1948.
794. White, R.K.: Value Analysis, the Nature and Use of the
Method. New York. Society for the Psychological Study
of Social Issues, 1951.
795. Wyatt, F.: The scoring and analysis of the Thematic Ap-
perception Test. J.Psychol. 24:319-330, 1947.
Other References Cited in Text
796. Cattell, R. B.: Personality and Motivation Structure and
Measurement. New York. Harcourt, Brace and World,
1959.
797. Kelly, G. A.: The theory and technique of assessment,
in P. R. Farnsworth and Q. McNemar, eds., Annual Re-
view of Psychology, Vol. 9. Palo Alto, Calif. Annual
Reviews, Inc., 1958.
798. McClelland, D.: Studies in Motivation. New York.
Appleton-Century-Crofts, Inc., 1955.
799. Murray, H. A.: Thematic Apperception Test, Pictures
and Manual. Cambridge. Harvard University Press,
1943.
800. Shneidman, E. S., Joel, W., and Little, K. B.: Thematic
Test Analysis. New York. Grune and Stratton, 1951.
V. TOTAL PSYCHOLOGICAL TEST BATTERY
The foregoing reviews of the several components
of the Survey's psychological test battery
have discussed the strengths and weaknesses of
each test and the problems involved in estimating
population parameters on a national scale from
the sample data. In each case a number of specific
problems were raised, and suggestions for treat-
ment of data or for further research have been
made in the respective sections of the report.
However, the most important common problem
derives from the examination of the standardi-
zation basis of these tests. The norms for the
WISC are unquestionably the most satisfactory,
with the Draw-A-Man being second; the adequacy
of the Wide Range Achievement Test norms has
been questioned (see section II). Finally, new
norms, related to the scoring system to be
developed for the TAT, are yet to be constructed.
In order to achieve the soundest possible
basis for population estimates with this battery,
it is recommended that new national norms, based
on the total Survey sample, be developed for all
of the tests before any final population estimates
are published. While some preliminary estimates
may be warranted, using norms provided by the
test publishers, the discussions in the individual
sections of the report point up the necessity of
the recommended restandardization.
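As a rough illustration of what restandardization on the Survey sample could involve, the sketch below converts raw scores to deviation-type standard scores within each age group, the form this report attributes to the Goodenough-Harris and WRAT standard scores. The target mean of 100 and standard deviation of 15 follow the conventional Wechsler scaling and are assumed here, not taken from the report; the function name, variable names, and raw scores are invented for illustration.

```python
from statistics import mean, pstdev

def deviation_scores(raw_by_age, target_mean=100.0, target_sd=15.0):
    """Convert raw scores to deviation-type standard scores within each age group.

    raw_by_age maps an age group to the list of raw scores observed in it.
    target_mean and target_sd (100, 15) are assumed conventional values.
    """
    out = {}
    for age, raws in raw_by_age.items():
        m = mean(raws)
        sd = pstdev(raws)  # SD of the age group itself serves as the norm
        if sd == 0:
            raise ValueError(f"no score variation in age group {age}")
        out[age] = [target_mean + target_sd * (x - m) / sd for x in raws]
    return out

# Invented raw scores for two age groups (illustration only).
sample = {6: [10, 12, 14, 16, 18], 7: [15, 18, 21, 24, 27]}
scores = deviation_scores(sample)
```

Because each age group is scaled against its own mean and spread, a child scoring exactly at the group mean receives 100 whatever the raw-score level of that age. An actual norming study would also need sampling weights and smoothing across adjacent ages, which this sketch omits.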
In the event that this work cannot be fully
supported, the order of priority indicated by the
review would place the reanalysis of the WRAT
first, the Draw-A-Man Test second, and the WISC
third. It is assumed that this must be done for the
TAT when a new scoring procedure is completed
and adopted.
The issues in relation to the WRAT are as
follows: (1) No adequate sampling plan was fol-
lowed in standardizing the 1963 revision, and, in
fact, the bias of the sample is clearly mentioned
in the manual. (2) The test scores used to compile
the sample by levels are not equivalent; therefore,
only limited confidence can be placed in the re-
sulting norm levels, even though substantial
correlation of the WRAT scales with concurrent
criteria appears likely.
In the case of the Draw-A-Man Test, it is
recognized that (1) the Goodenough norms are
outmoded, and that (2) the use of the Harris
norms (which is recommended) without analysis
of the raw score distributions on the national
sample might lead to some errors. The adminis-
tration of the Draw-A-Man Test in the Survey
was different from that recommended by Harris,
and it would be prudent to proceed empirically
rather than to assume that the Survey drawings
are equivalent. In addition, Harris' own norms do
not reflect as good a national sample as even the
WISC, for which further standardization is un-
questionably justified.
One of the major problems with the WISC
subtests is that of examining further the optimal
basis for estimating Full Scale IQ's from the
Vocabulary and Block Design scores. Even if
restandardization should reveal no need for re-
scaling the subtest items, the adoption of published
conversion tables or direct proration is con-
sidered unjustified without further research. This
is discussed in more detail in section I.
The information expected from the test
battery may be summarized as follows:
1. WISC Vocabulary—score. This test individually
provides a good estimate of "g,"
the common "general intelligence" factor
in the WISC, and may be accepted as a
good measure of the verbal component
of the general measure of intelligence.
2. WISC Block Design—score. This test is
also well saturated in "g" and second only
to Vocabulary in reliability. It should be
accepted as a strong nonverbal intelligence
test and as an estimate of the nonverbal
component of the full test.
3. Draw-A-Man Test—Goodenough-Harris
standard score. The Goodenough-Harris
standard score (preferably restandardized
on the total Survey sample) can be
interpreted as a deviation IQ, in a manner
comparable to the WISC IQ's. This score
is a reliable and reasonably valid non-
language measure of mental maturity.
4. WRAT Oral Reading—grade equivalent
(RQ).
5. WRAT Oral Reading—standard score
(Rss).
6. WRAT Arithmetic—grade equivalent (Aq).
7. WRAT Arithmetic—standard score (Ass).
Both the grade equivalents and the stand-
ard scores will be useful for the WRAT
Reading and Arithmetic subtests (partic-
ularly if they are restandardized on the
total Survey sample). The grade equiva-
lents will permit assessment of school
retardation, while the standard scores,
which have the same characteristics as
deviation IQ's, will be more appropriate
in pattern analytic combination with the
WISC and Draw-A-Man scores.
8. TAT—developmental score(s). This may
actually be a series of scores. It is entered
"symbolically" at this time.
It is possible to think of these data as pro-
viding individual profiles or patterns which sup-
plement information represented by the individual
scores. For example, some children may rank
high or low on all scales, indicating general ex-
cellence or retardation in comparison with the
general population. There may also be discrimi-
nable test patterns associated with such special
conditions as reading disability, mental defi-
ciency, scholastic retardation, verbal impair-
ment due to physical or social reasons, behavior
disorders, and cultural deprivation. If such pat-
terns exist, it should be possible to identify them
by a standard research design based on discrim-
ination of experimentally formed criterion groups.
A hierarchical grouping analysis of score profiles,
seeking to identify characteristic profiles of
groups, would be an alternative approach.
In this procedure, identification of criterion
characteristics of the groups would follow rather
than precede the main analysis. In either case,
criterion data would be obtained from record
sources within the Health Examination Survey.
In this type of analysis it might also be profitable
to explore patterns based on scores representing
discrete residuals, with common variance par-
tialled out and represented by an additional
variable.
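The hierarchical grouping idea can be pictured with a small sketch. The report does not name a specific algorithm; Ward's criterion is assumed here as one common choice, in which the two groups merged at each step are those whose union adds least to the within-group sum of squares. The function name and the six score profiles are invented for illustration.

```python
def ward_grouping(profiles, n_groups):
    """Agglomerative grouping of score profiles under Ward's criterion (assumed).

    Starts with one group per child and merges until n_groups remain.
    Returns the member indices of each final group.
    """
    # Each group is (member indices, centroid of member profiles).
    groups = [([i], list(p)) for i, p in enumerate(profiles)]
    while len(groups) > n_groups:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                (ma, ca), (mb, cb) = groups[a], groups[b]
                na, nb = len(ma), len(mb)
                d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                cost = na * nb / (na + nb) * d2  # increase in within-group SS
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = groups[a], groups[b]
        na, nb = len(ma), len(mb)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        groups = [g for k, g in enumerate(groups) if k not in (a, b)]
        groups.append((ma + mb, centroid))
    return [sorted(members) for members, _ in groups]

# Invented standard-score profiles:
# (Vocabulary, Block Design, Draw-A-Man, Reading, Arithmetic)
profiles = [
    (115, 112, 110, 118, 114),  # uniformly high
    (112, 115, 113, 116, 111),
    (85, 88, 90, 84, 86),       # uniformly low
    (88, 84, 87, 86, 89),
    (105, 108, 104, 80, 102),   # reading well below the other scores
    (102, 104, 107, 78, 100),
]
groups = sorted(ward_grouping(profiles, 3))
```

With these invented data the three recovered groups correspond to the uniformly high, uniformly low, and reading-deficit profiles. As the text notes, the criterion characteristics of each group would be examined only after the grouping is formed.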
Computer programs for these types of analy-
sis are available, and such studies could be con-
ducted economically on subsamples of the Survey
sample.
The inclusion of these psychological tests in
the National Health Survey was a very important
step which has tremendous practical value to the
health, education, and welfare fields and which
also has immense scientific value in the life
sciences concerned with child development. De-
spite the technical criticisms, which are in-
evitable in a problem of the magnitude of this
national survey, the tests have been judged to be
either a good choice or at least an eminently
reasonable compromise with reality within the
constraints of the Survey.
The research recommended should be looked
on as an unprecedented opportunity to contribute
toward adequate mental measurement of children.
It is important for those working in this Survey
to bear in mind that this is the first general sur-
vey of psychological functions of children ever
conducted on a sophisticated national sample.
The standardization programs for the tests re-
viewed—and for others referred to—fail to qualify
for this distinction. National psychological surveys
of adults have been made in both World Wars,
and recently a national survey of adolescents was
conducted by Project TALENT. However, Cycle II
is, to the writer's knowledge, the first one of its
kind in the age range of 6 to 12 years.
VI. CROSS-DISCIPLINARY ANALYSES
The complete data of Cycle II may be regarded
as composing a matrix of several thousand vari-
ables (specific measures or components of meas-
urement procedures) over a sample of nearly
8,000 children. In the processes of data reduction
and analysis, many of these variables will remain
in the matrix without further manipulation (e.g.,
height, weight, body temperature, family income
level, twin status, number of siblings, and ages
of parents). Some will require prescheduled
analysis and computation of indexes according to
established procedures in the respective fields
(e.g., visual acuity, exercise tolerance, and
electrocardiogram), while others will require
extensive processing on the basis of empirically
constructed or revised scoring keys and norms,
as in the case of the psychological tests discussed
in this review.
Upon completion of segmental analysis of each
testing and examining procedure and reduction of
all data to indexes and primary variables, it would
be desirable to consider multivariate analysis of
the resulting matrix. This type of approach will
undoubtedly reveal many significant interrelation-
ships not previously investigated because of lack
of appropriate data. It is premature to consider
it now, however, before the reduced data schedule
is more definitely known.
The primary purpose of the present dis-
cussion is to explore possible linkages between
the psychological tests in the Survey battery and
other variables. This, too, is a formidable task,
but some important areas of investigation are
opened up by this Survey, and these opportunities
for significant research deserve special mention.
DATA AVAILABLE
From various sources within the Survey, data
on items such as the following, which have im-
portant behavioral implications, will be available:
Parents—age, nativity, education, income level,
language spoken, psychiatric history, marital
status, handedness, and use of medical care.
(The distributions of these variables are of
interest. In addition, an SES index of socio-
economic level can be derived.)
Siblings—number, twins, ages, education, marital
status, work status. (From these data an
additional variable, birth ordinal position,
can be derived.)
Family—size, living status, ethnic classification,
race, SES.
Child—school information: grade placement;
progress rate; absences; characterization as
requiring special provision for hard of hearing,
visually handicapped, speech therapy,
orthopedically handicapped, gifted, slow
learning, mentally retarded, emotionally dis-
turbed; description in relation to adjustment,
attention, interpersonal relations, discipline,
popularity, intellectual ability, academic per-
formance. (These data are worthy of some
detailed analysis in order to formulate ex-
ternal rating criteria for independent test
validation and to derive further indexes, such
as peer rejection (based on interpersonal
relations and popularity), general adjustment,
and general adequacy (based on a frequency
count of negative citation).)
Child— medical history: prenatal and birth cir-
cumstances, food habits, enuresis, thumb-
sucking, age of walking, talking, early
learning rate, attendance at kindergarten,
experience of unconsciousness, bad burns
(with resulting scars), serious illness, weak-
ness, nightmares, sleeping arrangements,
age at puberty (girls). (Frequency distribu-
tions of these items, particularly of food
habits, which would also provide a basis for
judging food idiosyncrasies, and sleeping
arrangements, which should correlate with
SES but may also relate to other variables,
should be of great interest. Correlations of
many of these items with other data may be
extremely important, as, for example, the
investigation of sequelae of early uncon-
sciousness and the development of a growth
retardation classification, a disturbance index,
and a "weakness" index.)
Child— sensory and motor indexes: visual acuity,
color vision, hearing indexes, handedness,
grip strength, vital capacity, exercise toler-
ance.
Child—body measurements: height, weight, anthropometry,
X-ray, dentition.
Child— psychophysiological indexes: blood pres-
sure, temperature, electrocardiogram, pho-
nocardiogram.
Child—medical findings: health status, pathology.
Child—psychological tests: IQ estimates; verbal
ability level; performance ability level;
reading, arithmetic, maturity level; adjust-
ment index.
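Two of the derived variables mentioned in the list above, birth ordinal position and a frequency count of negative citations, can be sketched as follows. The field names, the True-means-negative coding of school-report items, and the handling of same-age siblings are all assumptions made for illustration; the report does not specify them.

```python
def birth_ordinal_position(child_age, sibling_ages):
    """Return 1 plus the number of siblings older than the child.

    Same-age siblings (for example, twins) are treated as sharing a
    position; this tie rule is an assumption.
    """
    return 1 + sum(1 for age in sibling_ages if age > child_age)

# Hypothetical school-report items on which a negative rating may be cited.
NEGATIVE_ITEMS = (
    "adjustment",
    "attention",
    "interpersonal_relations",
    "discipline",
    "popularity",
)

def negative_citation_count(school_report):
    """Count items rated negatively (assumed coding: True means negative)."""
    return sum(1 for item in NEGATIVE_ITEMS if school_report.get(item, False))

# Invented example: a 9-year-old with siblings aged 12, 7, and 9.
position = birth_ordinal_position(9, [12, 7, 9])
citations = negative_citation_count(
    {"attention": True, "discipline": True, "popularity": False}
)
```

Here the child has one older sibling and so is second in birth order, and two of the five items are cited negatively. A general adequacy index of the kind the text describes could then be defined from such a count, though the scaling is left open.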
ANALYSES INDICATED
The organization and ordering of the lines of
analysis suggested in this section are tentative
and are not intended to suggest priorities. In
most cases, further study of the literature in the
particular areas and consultation with qualified
professional persons would be appropriate before
committing time and funds to particular studies.
Nevertheless, the richness of this "data bank" is
recognized as a source of new scientific knowledge,
and it is hoped that it can be adequately exploited.
Growth Indexes
It is expected that mean growth indexes for
boys and girls will be computed for as many
functions as possible over the six age periods.
Analysis of relations among growth trends—
separately for boys and for girls—and of growth
rate patterns would be of direct interest and
would also permit comparison of pattern indexes
with psychological test scores. Sex differences in
growth patterns and relations of sex-related
patterns to test scores are also of great interest.
Other Factors Related to Test Scores
Discriminant pattern analyses might be un-
dertaken systematically in a multivariate design
to investigate parental, sibling (including birth
order and twin resemblance for the twin sample),
family, school, medical, sensory and motor,
anthropometric, psychophysiological, and medical
correlates of psychological test scores. While
this recommendation may appear forbidding in
magnitude, the multivariate approach is actually
more efficient and economical in total perspective
than piecemeal analyses. Among the studies im-
plied in this broad prescription are the following
types of investigations:
1. Reading disability. Effects of visual and
auditory impairment; handedness; SES;
growth trends; developmental history;
early, recent, and continuing emotional
disturbance; illness; birth order, etc.
2. Mental retardation. Every item in the
above enumeration is potentially related
to mental retardation.
3. School retardation. Same as above.
4. Analyses of discrepancies between actual
and predicted status in relation to concomitant
or associated factors. These
data offer an excellent opportunity to look
for significant variance associated with
overachievement and underachievement in
school grade placement, reading achieve-
ment (WRAT and school report), scho-
lastic achievement (school report, WRAT
Arithmetic), and peer relations (deviation
from central tendency).
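One simple way to express the discrepancy between actual and predicted status in item 4 is as the residual from a least-squares regression of achievement on an ability measure. The sketch below assumes a single predictor and invented scores; the multivariate design recommended in the text would use several predictors, and the function name is illustrative only.

```python
from statistics import mean

def achievement_discrepancies(predictor, actual):
    """Fit actual = a + b * predictor by least squares and return residuals.

    A positive residual means achievement above the predicted level
    (overachievement); a negative one means underachievement.
    """
    mx, my = mean(predictor), mean(actual)
    sxx = sum((x - mx) ** 2 for x in predictor)
    sxy = sum((x - mx) * (y - my) for x, y in zip(predictor, actual))
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    return [y - (a + b * x) for x, y in zip(predictor, actual)]

# Invented data: an IQ estimate and a reading standard score for five children.
iq = [85, 95, 100, 105, 115]
reading = [88, 90, 104, 103, 115]
residuals = achievement_discrepancies(iq, reading)
```

The residuals sum to zero by construction, so they rank children relative to the sample rather than to an external standard. Relating these residuals to the associated factors listed above would be the next analytic step.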
While more detailed and specific investi-
gations could be enumerated, it is more con-
structive to emphasize the advisability of using
the multivariate approach, since computer equip-
ment and programs are available for such analyses
and since results of greater value can be obtained
at a far lower unit cost.
Acknowledgments
The literature review and preparation of ab-
stracts was under the immediate direction of
Samuel H. Cox, Research Associate at the Institute
of Behavioral Research. Principal persons assist-
ing Mr. Cox were Robert M. Marx, John McCrady,
Henry Orloff, and Max S. Taggart II.
The project also was greatly expedited through the
efforts of Miss Johnoween Gill, Reference Librar-
ian, Texas Christian University.
Without the loyal and competent help of these
individuals this report could not have been com-
pleted in only 3 months.
GLOSSARY OF ABBREVIATIONS
BD: Block Design subtest of the Wechsler Intelligence Scale for Children
CA: Chronological age
CAT: Children's Apperception Test
CMAS: Children's Manifest Anxiety Scale
CRT: California Reading Test
CTMM: Chicago Tests of Primary Mental Abilities
E-G-Y: Kent E-G-Y Test (Scale D, Kent Series of Emergency Scales)
FRPV: Full-Range Picture Vocabulary Test (by Ammons)
FS: Full Scale (or Full Score) of the Wechsler Intelligence Scales
g: General, or "global," intelligence factor
HES: Health Examination Survey
IQ: Intelligence quotient
M: Mean
MA: Mental age
N: Number
ns: Not significant
PPVT: Peabody Picture Vocabulary Test
PS: Performance Scale (or Performance Score) of the Wechsler Intelligence tests
R: Range
r: Correlation
RT: Response time
SAT: Stanford Achievement Test
S-B: Stanford-Binet Intelligence Scale
SES: Socioeconomic status
SRA: Science Research Associates, Inc.
SRA-PMA: SRA Primary Mental Abilities
SS: Standard score
TAT: Thematic Apperception Test
Voc.: Vocabulary subtest of the Wechsler Intelligence Scales
VS: Verbal Scale (or Verbal Score) of the Wechsler Intelligence tests
WAIS: Wechsler Adult Intelligence Scale
WISC: Wechsler Intelligence Scale for Children
WRAT: Wide Range Achievement Test
U.S. GOVERNMENT PRINTING OFFICE: 1966 O - 206-188
OUTLINE OF REPORT SERIES FOR VITAL AND HEALTH STATISTICS
Public Health Service Publication No. 1000
Series 1. Programs and collection procedures.—Reports which describe the general programs of the National
Center for Health Statistics and its offices and divisions, data collection methods used, definitions, and
other material necessary for understanding the data.
Reports number 1-4
Series 2. Data evaluation and methods research.—Studies of new statistical methodology including: experimental
tests of new survey methods, studies of vital statistics collection methods, new analytical techniques,
objective evaluations of reliability of collected data, contributions to statistical theory.
Reports number 1-15
Series 3. Analytical studies.—Reports presenting analytical or interpretive studies based on vital and health
statistics, carrying the analysis further than the expository types of reports in the other series.
Reports number 1-4
Series 4. Documents and committee reports.—Final reports of major committees concerned with vital and health
statistics, and documents such as recommended model vital registration laws and revised birth and
death certificates.
Reports number 1 and 2
Series 10. Data From the Health Interview Survey.—Statistics on illness, accidental injuries, disability, use of
hospital, medical, dental, and other services, and other health-related topics, based on data collected in
a continuing national household interview survey.
Reports number 1-27
Series 11. Data From the Health Examination Survey.—Statistics based on the direct examination, testing, and
measurement of national samples of the population, including the medically defined prevalence of specific
diseases, and distributions of the population with respect to various physical and physiological
measurements.
Reports number 1-12
Series 12. Data From the Health Records Survey.—Statistics from records of hospital discharges and statistics
relating to the health characteristics of persons in institutions, and on hospital, medical, nursing, and
personal care received, based on national samples of establishments providing these services and
samples of the residents or patients.
Reports number 1-4
Series 20. Data on mortality.—Various statistics on mortality other than as included in annual or monthly reports—
special analyses by cause of death, age, and other demographic variables, also geographic and time
series analyses.
Reports number 1
Series 21. Data on natality, marriage, and divorce.—Various statistics on natality, marriage, and divorce other
than as included in annual or monthly reports—special analyses by demographic variables, also geographic
and time series analyses, studies of fertility.
Reports number 1-7
Series 22. Data From the National Natality and Mortality Surveys.—Statistics on characteristics of births and
deaths not available from the vital records, based on sample surveys stemming from these records,
including such topics as mortality by socioeconomic class, medical experience in the last year of life,
characteristics of pregnancy, etc.
Reports number 1
For a list of titles of reports published in these series, write to: National Center for Health Statistics
U.S. Public Health Service
Washington, D.C. 20201