Survey of Income and Program Participation and Related Longitudinal Surveys: 1984
Selected Papers given at the 1984 Annual Meeting of the American Statistical Association in Philadelphia, Pa., August 13-16, 1984
Compiled by: DANIEL KASPRZYK and DELMA FRANKEL, Population Division, Bureau of the Census, Department of Commerce
January 1985
U.S. BUREAU OF THE CENSUS
John G. Keane, Director
C.L. Kincannon, Deputy Director
William P. Butz, Associate Director for Demographic Fields
POPULATION DIVISION
Roger A. Herriot, Chief
Acknowledgments
This publication is composed of papers prepared by many different authors for presentation at the American Statistical Association meetings, August 13-16, 1984. We would like to thank these authors for their cooperation in making the papers available for publication. (The views expressed are the authors' and do not necessarily represent the views of the Bureau of the Census.) These papers have been compiled and prepared for publication by Daniel Kasprzyk and Delma Frankel, Population Division. Clerical and editorial assistance was provided by Hazel Beaton.
Suggested Citation
"Survey of Income and Program Participation and Related Longitudinal Surveys: 1984," compiled by Daniel Kasprzyk and Delma Frankel. U.S. Bureau of the Census, Washington, D.C. 1984.
PREFACE
This report is comprised of a selection from the papers presented at the 144th Annual Meeting of the American Statistical Association (ASA) in Philadelphia, Pennsylvania, August 13-16, 1984.
Most of the papers in the volume deal with methodological and substantive studies related to the Survey of Income and Program Participation (SIPP) and the Income Survey Development Program (ISDP), an experimental program designed to test procedures used in conducting SIPP. The SIPP is a new Census Bureau survey collecting data that will help measure income distribution and poverty throughout the country more accurately. These data will be used to study Federal and State aid programs (such as food stamps, Social Security, Supplemental Security Income, Aid to Families with Dependent Children, Medicaid, Medicare, and others), to estimate future program costs and coverage, and to assess the effects of proposed changes in program eligibility rules or benefit levels.
Households in the survey will be interviewed at 4-month intervals over a period of 2 1/2 years. The reference period will be the 4 months preceding the interview. In all, about 20,000 households will be interviewed, approximately 5,000 each month. Field operations will be handled through the Census Bureau's 12 regional offices. Recurring questions will deal with employment, types of income, noncash benefits, assets, liabilities, and taxes. Periodic questions will be added dealing with school enrollment, marital history, migration, disability, and other topics. Additional supplemental questions will also be added to the SIPP questionnaire as the need arises.
Preliminary tabular data from the SIPP is distributed as part of the Current Population Report series P-70, Economic Characteristics of Households in the United States. The first report, covering the third quarter of 1983, was released in September 1984. In addition, the first in a series of microdata files from the SIPP is now available. This file contains the results of interviews conducted between October 1983 and January 1984.
Microdata files for Waves 2 through 9 of the 1984 SIPP Panel will be released approximately every 4 months, with Wave 2 appearing in February 1985.
Contents. As shown in the Table of Contents, the grouping of the papers (and accompanying discussion comments) basically is in keeping with the ASA sessions at which the presentations were originally given. There were four "Survey of Income and Program Participation" sessions, two of which were included in the Social Statistics Section and two in the Survey Research Methods Section of the ASA meetings. These sessions covered a range of topics, both methodological and substantive, about longitudinal surveys and SIPP. Finally, the Social Statistics Section sponsored a session on "Case Studies in Panel Survey Design: The International Experience." This session provided an opportunity for individuals involved in the design and development of longitudinal surveys to share experiences and discuss issues of mutual interest.
The contents of the report were prepared by the individual authors for publication in the 1984 American Statistical Association Proceedings. For this reason, the format conforms basically to that required by ASA.
Acknowledgements. This publication is composed of papers prepared by many different authors for presentation at the ASA meetings, August 13-16, 1984. We would like to thank these authors for their cooperation in making the papers available for publication. Clerical and editorial assistance was provided by Hazel Beaton.
SURVEY OF INCOME AND PROGRAM PARTICIPATION AND RELATED LONGITUDINAL SURVEYS: 1984
Preface
Survey of Income and Program Participation: Session I
Chair: PAUL PLANCHON, Mathematica Policy Research, Inc.
"Analysis of Intra-Year Income Flows on the ISDP"
Written by P. DOYLE, Mathematica Policy Research, Inc.
(Focuses on minimizing the error in the approximation of monthly income for poor or near-poor households, which form the universe of potentially eligible units for participation in major means-tested transfer programs.)
"An Analysis of Turnover in the Food Stamp Program"
Written by T. CARR and I. LUBITZ, Mathematica Policy Research, Inc.
(Analyses turnover in participation and the eligibility to participate in the Food Stamp program.)
"The Measurement of Household Wealth in SIPP"
Written by E. J. LAMAS and J. M. McNEIL, Bureau of the Census
(Discusses the measurement of wealth in the ISDP, imputation, and early SIPP results.)
"The Wealth and Income of Aged Households"
Written by D. B. RADNER, Social Security Administration
(Discusses the economic resources of aged households using data on both income and wealth.)
"Using Subjective Assessments of Income to Estimate Family Equivalence Scales: A Report on Work in Progress"
Written by D. VAUGHN, Social Security Administration
(Presents estimates of family size equivalence scales derived from survey respondents' subjective assessments of the adequacy of their family income.)
Discussion: COURTENAY SLATER, CEC Associates
Survey of Income and Program Participation: Session II
Chair: GEORGE HALL, Baseline Data Corporation
"Toward a Longitudinal Definition of Households (Working Paper Series No. 8402)"
Written by D. B. McMILLEN and R. HERRIOT, Bureau of the Census
(Discusses the cross-sectional/longitudinal conflict and attempts some reconciliation.)
"Lifetime Work Experience and Its Effects on Earnings: Data From the ISDP"
Written by J. M. McNEIL, Bureau of the Census, and J. T. SALVO, New York City Department of Planning
(Discusses differences between men's and women's work experience and the impact on earnings.)
"Panel Surveys as a Source of Migration Data"
Written by D.
DAHMANN, Bureau of the Census
(Discusses analyses of migration and previous panel surveys to assess how SIPP can further our understanding of geographical mobility processes.)
"SIPP and CPS Labor Force Concepts: A Comparison"
Written by P. M. RYSCAVAGE, Bureau of the Census
(Discusses differences between labor force concepts found in SIPP and the Current Population Survey, our official source of employment and unemployment estimates.)
"Matching Economic Data to the Survey of Income and Program Participation: A Pilot Study"
Written by S. HABER, George Washington University, and P. RYSCAVAGE, D. SATER, and V. VALDISERA, Bureau of the Census
(Discusses matching data from SIPP with data from files such as the Standard Statistical Establishment List, which contains employer data on payroll and sales receipts. Focuses on the mechanics of the matching process and the uses of the data.)
Discussion: MARTIN DAVID, University of Wisconsin; HAROLD WATTS, Columbia University
Survey of Income and Program Participation: Session III
Chair: DANIEL HORVITZ, Research Triangle Institute
"Obtaining a Cross-Sectional Estimate From a Longitudinal Survey: Experiences of the ISDP"
Written by H. HUANG, Bureau of the Census
(Examines alternative cross-sectional weighting schemes.)
"Weighting of Persons for SIPP Longitudinal Tabulations"
Written by L. ERNST, D. HUBBLE, D. JUDKINS, D. B. McMILLEN, and R. SINGH, Bureau of the Census
(Discusses changes in the sample during the survey and various weighting methods.)
"Longitudinal Family and Household Estimation in SIPP"
Written by L. ERNST, D. HUBBLE, and D. JUDKINS, Bureau of the Census
(Compares weighting approaches for use with longitudinal household and family concepts.)
"Longitudinal Item Imputation in a Complex Survey"
Written by M. E. SAMUHEL and V. HUGGINS, Bureau of the Census
(Discusses problems associated with item imputation in a SIPP annual file.)
"Early Indications of Item Nonresponse on SIPP"
Written by J.
CODER and A. FELDMAN, Bureau of the Census
(Studies nonresponse rates in the first SIPP interviews.)
Discussion: ROY WHITMORE, Research Triangle Institute
Survey of Income and Program Participation: Session IV
Chair: GRAHAM KALTON, Survey Research Center, University of Michigan
"Month-to-Month Recipiency Turnover in the ISDP"
Written by J. C. MOORE and D. KASPRZYK, Bureau of the Census
(Examines a tendency for reported program turnover in the 1979 Panel to occur between waves more often than within waves.)
"The Student Follow-Up Investigation of the 1979 ISDP"
Written by A. M. ROMAN and D. V. O'BRIEN, Bureau of the Census
(Discusses objectives, design, and results of an investigation to ascertain whether parents are reliable proxies for their children who are students living away from home.)
"The ISDP 1979 Research Panel as a Methodological Survey: Implications for Substantive Analysis"
Written by R. A. KULKA, Research Triangle Institute
(Discusses the potential for the data in this panel to address social, economic, and policy research issues either by cross-sectional or longitudinal analysis.)
"Some Data Collection Issues for Panel Surveys with Application to SIPP"
Written by A. JEAN and E. K. McARTHUR, Bureau of the Census
(Describes operational methods for dealing with household changes during the survey.)
"Managing the Data From the 1979 ISDP"
Written by P. DOYLE and C. CITRO, Mathematica Policy Research, Inc.
(Discusses data access and analysis problems in the ISDP and the use of a data base management system to resolve them.)
Discussion: GREG DUNCAN, Survey Research Center, University of Michigan; RICHARD ROCKWELL, Social Science Research Council
Case Studies in Panel Survey Design: The International Experience, Session V
Chair: BARBARA A. BAILAR, Associate Director for Statistical Standards and Methodology, Bureau of the Census
"The Survey of Income and Program Participation"
Written by R. A. HERRIOT and D.
KASPRZYK, Bureau of the Census
(Presents the rationale for the survey and the history of its development. Discusses the survey's content, design, frequency, data products, and current research activities.)
"The German Socio-Economic Panel"
Written by UTE HANEFELD, Deutsches Institut für Wirtschaftsforschung, Federal Republic of Germany
(Describes the aims, the sample, the interviewing procedure, the questionnaires, the methods to maintain the panel, and the interim findings of the fieldwork for the first wave of the socio-economic panel.)
"Household Market and Nonmarket Activities - The First Year of a Swedish Panel Study"
Written by N. ANDERS KLEVMARKEN, University of Goteborg, Sweden
(Describes the purpose and scope, population, general design, data collected, pretest experiences, sample design, fieldwork considerations, and the response rate to the interviews of the panel study.)
"The Australian National Longitudinal Survey"
Written by IAN McRAE, Bureau of Labour Market Research, Australia
(Discusses the background of the ANLS, both political and technical; describes the general structure of the survey, data to be collected, and sampling and estimation procedures; and concludes with initial thoughts on analysis.)

SURVEY OF INCOME AND PROGRAM PARTICIPATION: SESSION I
This section is comprised of the five papers presented in this session, which was sponsored by the Section on Social Statistics.

AN ANALYSIS OF INTRA-YEAR UNEARNED INCOME FLOWS ON THE ISDP
PAT DOYLE, MATHEMATICA POLICY RESEARCH, INC.
This paper presents excerpts from a study conducted for the Food and Nutrition Service, USDA (Doyle, 1984). The overall objective of the study was to provide the foundation upon which improvements could be made to the simulation of Food Stamp Program costs and caseloads. The simulations are currently conducted on CPS-type files which lack detail on intra-year income receipts.
The Food Stamp Program, however, has a monthly accounting period, thus requiring that the simulation model first allocate annual income to monthly amounts. In so doing, the current food stamp model relies on simple assumptions concerning variation in unearned income. These assumptions were made in the absence of information on intra-year income flows. Essentially the model assumes that, with the exception of welfare and unemployment compensation benefits, unearned income is received evenly throughout the year.
Examination of data from the ISDP shows, as expected, that for some sources this assumption is sufficient. However, for other income sources, the observed intra-year income fluctuations are large. Therefore, in order to improve the model of the Food Stamp Program, existing procedures for allocation of unearned income should be changed. This presentation reports on the monthly variation in income recipiency and amounts for unemployment compensation, workmen's compensation, asset income, and other unearned income exclusive of welfare.
THE DATA
The focus of this research was guided by the information available on CPS-type files. One important source of information on these files which contributes to the understanding of intra-year patterns of recipiency is the duration of the work period within a calendar year. That is, part-year workers can be distinguished from full-year workers. Hence, one dimension of the descriptive statistics which follow is work versus non-work. Furthermore, since the improvements to the simulation do not extend to the imputation of fluctuations in household composition within the year on the CPS, an analysis file was designed to replicate the cross-sectional nature of the CPS files. That is, individuals in households as they existed in Wave 5 were extracted along with retrospective longitudinal data on income receipts during the calendar months of 1979.
In order to target the research to the potentially eligible population, the universe for the study was further restricted to low income households. Low income households were defined as households, with composition defined as of the Wave 5 interview, where the total monthly income aggregated over all members was less than twice the Food Stamp eligibility limit for at least one month during calendar year 1979.* For the initial screen, if an observation reported recipiency for an income source but did not report the dollar amount, it was counted as receipt of a zero amount. Individuals not present in the sample for the entire reference period were treated as non-recipients of income during their absence. There were 9,383 individuals extracted, 7,468 of whom were present in the ISDP sample all 12 months of calendar year 1979.
THE RESULTS
Workmen's Compensation. For this source the ISDP provided mixed results. There were extremely few observations (71 individuals reporting receipt during the year, 65 of whom were part-year workers). Furthermore, individual reported amounts were observed to be unreasonable (one observation reported receipt of $16,666 in each of three months). The question still remains, therefore, as to whether or not to allocate this income source solely to periods of non-work. The ISDP data showed that more than half of the respondents (53% based on weighted counts) reported receipt during the work period. Of those, 57% received workmen's compensation only during the work period; the remainder received it during both work and non-work periods.
For purposes of modeling food stamp eligibility, this income source was lumped with the residual other income category discussed below. As described there, duration of receipt was first randomly imputed and then the actual calendar months of recipiency were randomly assigned. This stochastic process was independent of the assignment of work versus non-work periods.
Unemployment Compensation.
In the analysis of unemployment compensation the issue of whether or not to allocate recipiency evenly over the year is irrelevant because this source is directly related to labor force activity. It is currently assumed that when both receipt of unemployment compensation and weeks of unemployment are reported, the periods coincide. Therefore, the key issue is the extent to which UI recipiency coincides with work periods. Table 14 displays recipients of unemployment compensation by duration of the work period and by the coincidence of work and receipt of this income source. Months working in this case (as in all other cases in this report) are months during which earnings are reported.
This table shows that 17.6% of UI recipients in poor households were full-year workers. An additional 42% were part-year workers reporting UI receipt while working. This seems unusually high compared to other estimates. Burtless (1983) reports that 6% of UI recipients in a given week are underemployed rather than unemployed. It is useful, therefore, to examine the recipients more closely.
Table 15 shows the duration of the coincidence of work and UI receipt for the 2.6 million workers who reported UI receipt while working. In over 95% of the cases the periods of receipt of UI did not encompass the full period of work. Over half of this group received unemployment compensation during only one of the months reported working. An additional 30% reported receipt during only two months of working. This suggests that the coincidental period was one of transition. That is, either a job was lost early in the month and UI benefits were received later in the month, or UI benefits ended early in the month and work began later in it. Cases with one month of overlap reflect individuals with only one transition period, and the 30% with two months of overlap represent cases with two transition periods. Fifteen percent of the cases had more than two months of coincidental receipt of earnings and UI. These include both cases where more than two transitions occurred and underemployed individuals.
These results suggest that a significant number of the cases reporting earnings and UI receipt concurrently were in a transition period. This is further supported by the fluctuation in average benefits displayed in Table 16. Average UI benefits received during the work period were $190, whereas average benefits received during the nonwork period were $316. When disaggregated by whether or not individuals received UI benefits during both periods of work and nonwork, the contrast in average benefits across the two periods is increased. Recipients during both periods report an average of $191 in the work period and $331 in the nonwork period. Persons receiving benefits solely during non-work periods reported an average of $304 and persons receiving benefits solely during work periods reported $189.
Based on these results the allocation of UI within the year on CPS-type files will be modified. The process will be to first derive expected duration of receipt of UI from the reported annual amount, incorporating regulations regarding maximum weekly benefits and maximum weeks of receipt. Then the period of receipt will be allocated to periods of unemployment. If the duration of receipt exceeds the unemployment period, excess months of recipiency will be allocated to periods of work immediately surrounding periods of unemployment.
Asset Income. Table 17 shows the frequency counts of persons in low income households by number of months receiving income from assets. This category includes interest income from savings accounts, etc., dividend income from stocks and bonds, and rental income from property. Overall, 55% of the population reporting asset income received that income the full year. Of the remaining persons with asset income, the distribution by number of months received is biased due to the way in which asset income data were gathered in the ISDP.
For many sources included here, lump sum quarterly or semi-annual amounts were requested by the interviewer depending on whether the individual fell in the three month recall sample or the six month recall sample. Recipiency for these cases was evenly allocated over the relevant period, thus introducing a bias into the examination of recipiency patterns within the year.
Table 17 also shows the differences in recipiency patterns for elderly and non-elderly (elderly in this case refers to persons age 60 and older). For the older population, the proportion of individuals receiving asset income for 12 months is 72%, which is significantly higher than for the overall population. Forty-nine percent of the non-elderly recipients received asset income the full year.
Based on the fact that such a high proportion of the asset income recipients received that source the full year and that the ISDP questionnaire design prevents a detailed analysis of intra-year patterns of recipiency, asset income recipiency on the FNS data base will be evenly allocated across calendar months.
Table 18 shows the change in average monthly asset income between work and non-work periods for persons in low-income households who received asset income during both periods and for persons whose assets were less than $3,000. The asset screen was imposed to further isolate individuals who were potentially eligible to receive food stamps. For the potentially eligible group, asset income did not vary significantly across periods of work and non-work. Potentially eligible elderly individuals reported $10 on average during both periods while the non-elderly reported $5 during the nonwork period and $4 during the work period.
Based on the observation that individuals in low income households with small amounts of assets tend not to have significant variation within the year, asset income amounts in the FNS data base will be evenly allocated across calendar months.
Other Income.
The pattern of recipiency of other income varies significantly for elderly and non-elderly recipients. Table 19 shows that 77% of the elderly report receiving other income continuously throughout the year whereas only 24% of the non-elderly received this income for the full 12 months. Recipiency patterns for the non-elderly were further disaggregated by labor force activity, but there did not appear to be a significant difference between patterns of recipiency of part-year workers and the rest of the non-elderly population.
Recognizing that this residual income category includes income sources such as lump sum payments, which are received intermittently throughout the year, as well as regularly received sources such as pensions and social security, the non-elderly recipients were further examined by type of other income received. Table 21 shows the distribution of non-elderly recipients in poor households by number of months receiving regular and irregular income. Regular income consists of social security, railroad retirement, veterans benefits, other disability payments excluding workmen's compensation, and private and government pensions. Irregular income includes all other sources not classified in the previous sections or classified as regular income.
Over one-third of the recipients of regular income received that source for 12 months whereas only 15 percent of the recipients of irregular income received that source for the full year. In both cases, the majority of the remaining recipients fall at the lower end of the distribution. The balance of the recipients in both cases are probably fairly evenly distributed, but apparent reporting problems on the ISDP tend to bias the observed distribution. The apparent reporting problems result from respondents' tendency to report changes in recipiency status more often between waves than between consecutive pairs of months (Moore and Kasprzyk, 1984).
Table 22 shows the change in average amounts received for other income across periods of work and non-work separately for elderly and non-elderly recipients. Overall, average income amounts were 20% higher during work periods. As was true for recipiency, there was a significant difference between elderly and non-elderly income flows. For the elderly in low income households who received other income during work and non-work periods, amounts received while working were approximately the same as those received while not working. When restricted further to elderly individuals with low assets, more variation existed, but average amounts received while working were only six percent less than amounts received while not working. Non-elderly individuals, on the other hand, reported average benefits received while working to be 130 percent of those received while not working. When restricted to individuals with low assets, the variation was reduced but still remained high, with amounts received during work periods 20 percent higher than amounts received while not working.
Again this analysis of the ISDP data suggests that for the elderly an even allocation of other income across months would be sufficient for modeling food stamp eligibility. However, for the non-elderly, this method would understate the variation in amounts received from this source.
Based on the distribution presented here, the allocation of other income recipiency on the Food Stamp data bases will be a stochastic process whereby duration is randomly determined and then a period of recipiency is stochastically assigned within the twelve month reference period. The probability upon which the duration will be determined will be derived from the distributions presented in Tables 19 through 21. Recognizing that the irregularity of the distributions is caused, at least to some extent, by reporting patterns on the ISDP, they will be smoothed out in the construction of a cumulative probability function.
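The stochastic allocation described above can be illustrated with a short sketch. It is not the FNS model itself: the function names are hypothetical, the duration shares are simply the rounded irregular-income percentages reported for non-elderly recipients (used unsmoothed, whereas the paper says the distributions will be smoothed), and the assumption that the months of receipt form one contiguous spell is ours.

```python
import random
from bisect import bisect_right
from itertools import accumulate

# Illustrative shares only: percent of non-elderly irregular-income recipients
# by months of receipt (1..12). No smoothing is applied here (assumption).
DURATION_SHARES = [32, 12, 15, 5, 7, 2, 3, 2, 5, 1, 2, 15]

def build_cdf(shares):
    """Convert per-duration shares into a cumulative probability function."""
    total = sum(shares)
    cdf = [running / total for running in accumulate(shares)]
    cdf[-1] = 1.0  # guard against floating-point shortfall in the last step
    return cdf

def draw_duration(cdf, rng):
    """Draw a duration of receipt in months (1..12) by inverting the CDF."""
    return bisect_right(cdf, rng.random()) + 1

def allocate_other_income(annual_amount, cdf, rng):
    """Return 12 monthly amounts: a random duration, a random start month,
    and the annual amount spread evenly over the imputed spell.
    The contiguous spell is an assumption of this sketch."""
    duration = draw_duration(cdf, rng)
    start = rng.randrange(12 - duration + 1)
    monthly = [0.0] * 12
    for month in range(start, start + duration):
        monthly[month] = annual_amount / duration
    return monthly

rng = random.Random(1979)  # fixed seed so the example is repeatable
CDF = build_cdf(DURATION_SHARES)
example = allocate_other_income(1200.0, CDF, rng)
```

Whatever duration is drawn, the monthly amounts in `example` sum back to the reported annual amount, which is the property the even allocation over the imputed period is meant to preserve.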
Also, workmen's compensation will be included with regular income. Amounts received will be evenly allocated over the imputed period of recipiency.

FOOTNOTES
The food stamp eligibility limits were those in effect July 1, 1979. These reflect the OMB poverty guidelines for mid-1978 updated by the CPI.
As is true for other unearned income sources, the CPS does not measure duration of receipt.
As is true for all unearned income sources, the CPS does not measure duration of receipt of other income.

REFERENCES
Burtless, Gary. "Why is Insured Unemployment So Low?" In Brookings Papers on Economic Activity. (Washington, D.C.: The Brookings Institution, 1983)
Doyle, Pat. An Analysis of Intra-Year Income Flows. Final Report to USDA. (Washington, D.C.: Mathematica Policy Research, July 1984)
Moore, J. C. and Kasprzyk, D. "Month to Month Income Recipiency Changes in the ISDP." Paper prepared for the American Statistical Association Meetings, August 1984.

TABLE 14
                                        Recipients
Work Status         Receipt Period      Count (1000)        %
Full Year Workers   Work only                 778.27     17.6
Nonworkers          Nonwork only              198.31      4.5
Part Year Workers   Both                     1679.63     37.9
                    Nonwork only             1595.64     36.0
                    Work only                 182.46      4.1
Total                                        4434.31    100.0
SOURCE: Prepared by Mathematica Policy Research using the ISDP/RAMIS II system.
NOTE: This table is based on 371 observations who reported unemployment compensation and who were present in the sample the full year.

TABLE 15
Number of Months Receiving UI While Working; Number of Months Worked; Recipients: Count (1000), %
3-6 7-11 126.5 846.8 338.4 1,437.6 4.8 32.1 12.8 54.4 144.3 367.4 283.2 5.5 13.9 10.7 794.9 30.1 36.1 215.1 156.6 1.4 8.1 5.9 406.9 15.4 2640.4 100.0
SOURCE: Prepared by Mathematica Policy Research using the ISDP/RAMIS II system.
NOTE: This table is based on 210 observations who were interviewed over the entire calendar year. There were an additional 27 observations reporting receipt of UI while working who were not interviewed during at least one wave.

TABLE 16
AVERAGE UNEMPLOYMENT COMPENSATION FOR PART-YEAR WORKERS BY WORK STATUS
Period of Receipt: Work only; Nonwork only; Both
SOURCE: Prepared by Mathematica Policy Research using the ISDP/RAMIS II system.
NOTE: This table is based on 334 observations reporting UI receipt who also reported amounts; that is, cases of nonresponse have been screened out.

TABLE 17
DURATION OF RECEIPT OF ASSET INCOME
Number of    Elderly                Nonelderly             Total
Months       Count (1000)     %     Count (1000)     %     Count (1000)     %
 1                176.1       1         1096.7       2         1272.8
 2                338.6       2          746.3       2         1084.9
 3                290.0       2         2305.6       5         2595.6
 4                308.1       2         1060.2       2         1368.3
 5                264.8       1          968.4       2         1233.2
 6                961.7       5         5991.9      13         6953.6
 7                648.6       4         1831.6       4         2480.2
 8                371.7       2         2597.4       6         2969.1
 9                848.8       5         4034.3       9         4883.1
10                639.0       3         1191.9       3         1830.9
11                237.0       1         1309.9       3         1546.9
12              13187.7      72        21902.4      49        35090.1      55
Total           18272.1     100        45036.6     100        63308.7     100
SOURCE: Prepared by Mathematica Policy Research using the ISDP/RAMIS II system.
NOTE: This table is based on 5992 observations, 1664 of which are elderly. Persons not present in the sample the full year have not been screened.

TABLE 18
Average Monthly Asset Income
                                                   Assets < $3000
Age            Months     Months Not      Months     Months Not
               Working    Working         Working    Working
Elderly           125         70             10          10
Not Elderly        22         27              5           4
Total              33         32              6           5
Unweighted Number of Persons upon which Averages are Based
                                                   Assets < $3000
Age            Months     Months Not      Months     Months Not
               Working    Working         Working    Working
Elderly           138        107             75          55
Not Elderly     1,158        929            939         753
Total           1,296      1,036          1,014         808
SOURCE: Prepared by Mathematica Policy Research using the ISDP/RAMIS II system.
NOTE: This table consists of part-year workers who reported receiving asset income in both periods of work and periods of non-work. Persons not interviewed the full year have not been screened.

TABLE 21
DISTRIBUTION OF NON-ELDERLY RECIPIENTS OF OTHER INCOME BY DURATION OF RECEIPT BY TYPE OF OTHER INCOME RECEIVED
Number of    Irregular              Regular
Months       Count (1000)     %     Count (1000)     %
Receiving
 1               4931.5      32          954.3      13
 2               1825.1      12          547.3       7
 3               2316.3      15          788.1      11
 4                765.1       5          177.9       2
 5               1040.9       7          327.0       4
 6                384.1       2          524.2       7
 7                456.8       3          122.9       2
 8                259.0       2          161.4       2
 9                835.0       5          476.4       6
10                129.3       1          156.6       2
11                355.8       2          311.7       4
12               2346.1      15         2801.5      38
TOTAL           15645.0     100         7349.2     100
SOURCE: Prepared by Mathematica Policy Research using the ISDP/RAMIS II system.

TABLE 22
CHANGE IN AVERAGE MONTHLY OTHER INCOME ACROSS PERIODS OF WORK AND NONWORK
Average Monthly Other Income
                                                   Assets < $3000
Age            Months     Months Not      Months     Months Not
               Working    Working         Working    Working
Elderly           330        325            302         321
Not Elderly       452        348            418         346
Total             407        339            381         338
Unweighted Number of Persons upon which Averages are Based
                                                   Assets < $3000
Age            Months     Months Not      Months     Months Not
               Working    Working         Working    Working
Elderly           204        201            137         133
Not Elderly       349        332            306         291
Total             553        533            443         424
SOURCE: Prepared by Mathematica Policy Research using the ISDP/RAMIS II system.

AN ANALYSIS OF TURNOVER IN THE FOOD STAMP PROGRAM
Irene Smith Lubitz, Mathematica Policy Research
Timothy J. Carr, U.S. General Accounting Office
The findings reported in this paper are based on research carried out by Tim Carr, Pat Doyle and Irene Smith Lubitz as a part of MPR's analysis of participation in the Food Stamp Program, pursuant to contract No. 53-3198-0-101 with the Food and Nutrition Service, USDA. The authors are indebted to Steven Carlson, Harold Beebout, Jim Ohls and others for helpful comments on the research on which this paper is based. The conclusions presented here are solely those of the authors and do not necessarily reflect the opinions of the Food and Nutrition Service or Mathematica Policy Research.
INTRODUCTION
Researchers and policymakers have long had an interest in analyzing the factors associated with the decision of eligible individuals and households to participate (or not to participate) in income maintenance programs. Accordingly, a large and growing body of research on this topic has appeared in recent years.
There has also been considerable interest in patterns of program participation over time (i.e., entry into, and exit from, these programs). However, there have been relatively few studies of the longitudinal patterns of participation (which we hereafter refer to as "turnover"), because of stringent data requirements. The studies that have appeared have usually been based on peculiar samples (e.g., participants in negative income tax experiments).

In this paper we report the findings of an analysis of turnover in the Food Stamp Program based on data from the 1979 Income Survey Development Program Research Panel (hereafter referred to as the ISDP data). The ISDP data base was a precursor of the forthcoming Survey of Income and Program Participation, and displays many of the same attributes that will make SIPP especially advantageous for research on turnover. As such, our research illustrates the potential uses of SIPP for research in this area.

THE DATA

A successful analysis of food stamp turnover requires data possessing several essential characteristics. To our knowledge, the ISDP is the first data base to exhibit all of these desirable features simultaneously. These characteristics are as follows:

o The ISDP is nationally representative, and it yields enough observations (about 7,500 households) to permit meaningful analysis.

o The ISDP is longitudinal; retrospective interviews were conducted quarterly for a period of fifteen months.

o Data on the income that a sample household receives from various sources (including the receipt of food stamps) are ascertained on a month-by-month basis. Furthermore, the exact timing of all changes in household composition between interviews is pinpointed.
These data requirements are especially crucial to the analysis of turnover in the Food Stamp Program because many households enter and exit the program in the space of less than a year, and because program participation may be triggered by events, such as a temporary drop in household income, that could not be picked up by a survey that measures only a household's annual income.

o The data on income, household composition, and assets permit us to simulate eligibility for the program, using the eligibility criteria actually used in 1979.

o There is a wealth of explanatory variables, such as household and individual characteristics, types of income, and program participation, available to the analyst for both tabular and multivariate analyses of turnover.

METHODOLOGY

We focus primarily on three measures of program turnover:

o The entry rate; that is, the probability that a household that was not receiving food stamps in a given month received food stamps the next month.

o The exit rate; that is, the probability that a household that received food stamps in a given month did not receive food stamps in the next month.

o The annual/monthly ratio; that is, the ratio of the probability that a household receives food stamps at some time during the course of a year to the probability that it receives food stamps in a given month.

Both tabular and multivariate techniques were used in the analysis. The multivariate analysis used the RATE model for the analysis of event histories that has been developed by Nancy Tuma and her colleagues (Tuma et al., 1979), and applied in studies of marital instability, unemployment duration, and other socioeconomic phenomena. Although the tabular analysis of exit and entrance rates focused on one characteristic at a time, making it difficult to sort out compositional effects, the multivariate analysis in general yielded largely similar results.
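The three measures defined above can be illustrated with a short sketch. It assumes a hypothetical, unweighted panel in which every household has a complete 12-month record of food stamp receipt; the function name `turnover_measures` and the toy data are illustrative only, and the sketch ignores the sample weights and partial-year households that tabulations from the ISDP would have to handle.

```python
from typing import Dict, List


def turnover_measures(panel: List[List[bool]]) -> Dict[str, float]:
    """Entry rate, exit rate, and annual/monthly ratio for a toy panel.

    `panel` holds one list per household, with one flag per month
    (True = received food stamps). ASSUMPTION: unweighted data and
    complete monthly records for every household.
    """
    entries = exits = at_risk_of_entry = at_risk_of_exit = 0
    for months in panel:
        # Compare each month with the month that follows it.
        for now, nxt in zip(months, months[1:]):
            if now:
                at_risk_of_exit += 1
                exits += int(not nxt)
            else:
                at_risk_of_entry += 1
                entries += int(nxt)

    n_households = len(panel)
    n_months = len(panel[0])
    # Share of households receiving food stamps in at least one month.
    annual_rate = sum(any(months) for months in panel) / n_households
    # Average share of households receiving food stamps in a given month.
    monthly_rate = sum(sum(months) for months in panel) / (n_households * n_months)

    return {
        "entry_rate": entries / at_risk_of_entry,
        "exit_rate": exits / at_risk_of_exit,
        "annual_monthly_ratio": annual_rate / monthly_rate,
    }


# Hypothetical data: one full-year participant, one three-month spell,
# and two households that never participate.
toy_panel = [
    [True] * 12,
    [False] * 12,
    [False, False, True, True, True] + [False] * 7,
    [False] * 12,
]
measures = turnover_measures(toy_panel)
print(measures)
```

In this toy panel half of the households participate at some point in the year, although fewer than a third participate in an average month, which is the kind of gap the annual/monthly ratio is meant to capture.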
EMPIRICAL FINDINGS

Tabular Analysis

The evidence from the ISDP panel, from both the tabular and the multivariate analysis, is that turnover in food stamp participation is high. We estimate the ratio of annual to monthly participation at 1.74, indicating that the number of households served by the program over the course of a year is about 74 percent greater than the caseload in an average month (Table 1). To illustrate the implications of this finding, we note that program data indicate that the average monthly caseload in 1979 was 6.5 million households (USDA, 1979). The annual-monthly ratio therefore implies that 11.3 million households, about 14 percent of all households, received food stamps at some time during 1979.

Most of the food stamp households observed in our data received food stamps for only part of the year. About two thirds of the sample households who received food stamps during 1979 participated in the program for less than the full year, and nearly a third of the participants received food stamps for 3 months or less in 1979. Only about one-third of all food stamp recipient households observed received food stamps "continuously" (that is, for all months present in the sample). In other words, a truly "long-term caseload" may account for only about a third of the households who receive food stamps in a given year.

The 1979 sample period is rather short for observing individual households' food stamp spells over time. However, even during this relatively short observation period, about 12 percent of food stamp households experienced more than one spell of participation. This would seem to indicate that recidivism in the Food Stamp Program (households returning to the program after not participating for some interval) may be high.

The average monthly rate of exit from the Food Stamp Program is estimated at over seven percent. That is, in a given month, over seven percent of the caseload may be expected to leave the program by the next month.
The exit rate in a given month is the proportion of the previous month's caseload (or of a caseload subset) that has now left the program, and is estimated from caseload counts and exits from the program in each month of 1979. In the aggregate, then, a substantial share of the caseload "turns over" each month, with perhaps 500 thousand households leaving the program and being replaced, in a steady state of no program growth, by a similar number of new entrants. (When the program is expanding, entrances will exceed exits; when it is contracting, exits will exceed entrances.) In fact, there was significant expansion in 1979, due to programmatic changes.

The program entrance rates measure inflows to food stamp participation. These rates, expressed relative to total population, are much lower than exit rates but in fact represent flows into the program that approximately equal outflows (exits). The average monthly entrance rate as shown in Table 1 is 0.5 percent per month; that is, the average probability of a nonparticipant in a given month becoming a participant in the next month is about half of one percent.

Turnover rates in the Food Stamp Program, however measured, appear to be quite different for different kinds of households. The various measures of turnover presented for different population subgroups indicate that the more "permanent" part of the food stamp caseload includes households participating in AFDC and other welfare programs, and elderly households. A more transient group of participants includes younger non-welfare households with more labor force attachment and education.

Multivariate Analysis

The multivariate analysis of turnover in program participation, using the RATE model, provides estimates of the independent association of household characteristics with different turnover rates. The results of estimating our basic model of transitions to and from participation in the Food Stamp Program are presented in Table 2.
The precise interpretation of these coefficients is not entirely straightforward, as entry and exit rates are complicated functions of the coefficients. Note, however, that the qualitative effect of an explanatory variable on entry and exit rates is indicated by the sign of its coefficient, just as would be the case in the more familiar linear regression model. For instance, the coefficient of the elderly/disabled dummy variable is positive in the entry model, and negative in the exit model. This indicates that households containing elderly or disabled persons are more likely to enter the program, and less likely to exit from it, ceteris paribus.

In general, the results are consistent with the results of the tabular analysis presented above: the household characteristics that appear to be associated with high entry and exit rates in the tabular analysis are also those associated with high entry and exit rates in the multivariate analysis. In particular, the following findings are both statistically and substantively significant.

o Nonwhite households who are not in the program are far more likely to enter the program in any given month than otherwise similar white households; furthermore, nonwhite households that are receiving food stamps in any given month are likely to stay on the program longer (i.e., have lower exit rates) than otherwise similar white households.

o Households within which there is no currently employed person are both more likely to enter the program and less likely to exit the program, ceteris paribus.

o Households with one head and households with an elderly or disabled person tend to stay on the program longer than other households, all other things being equal.

o Households that receive AFDC are both more likely to enter the program, and less likely to leave, than otherwise similar households.
This last finding, especially the higher entry rate of AFDC households, is especially interesting because it has been hypothesized by some previous researchers that there is a "stigma" effect that acts as a sort of psychological barrier to participation in income maintenance programs (e.g., see Czajka, 1981). These researchers have found that participation in one program is generally correlated with participation in other programs, and our findings tend to confirm theirs. This behavior can be explained in two ways. First, it may be the case that there are households whose members are psychologically less averse to receiving welfare than others, and hence are more likely to apply for benefits from all programs. Second, a household that already receives benefits from one program may perceive little or no additional stigma from applying for and receiving benefits from other programs. Of course, these explanations are not mutually exclusive, and it is difficult, if not impossible, to disentangle them with the data available to us.

The estimated coefficients of the RATE model can be used to predict monthly entry and exit rates, annual participation rates, the ratio of annual to monthly participation rates, and the expected duration of spells of participation for a hypothetical household with any combination of characteristics. In order to make the implications of our estimated models more transparent, we have calculated the values of these functions for certain combinations of characteristics.

Specifically, our approach to this presentation is as follows. First, we define a "baseline" household that has characteristics that are fairly typical: a white household with two heads, at least one of whom is employed, and no children. Furthermore, this hypothetical household does not receive AFDC, nor does it contain an elderly or disabled person.
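The chain of calculations described above, from estimated coefficients to monthly probabilities to the other turnover measures, can be sketched as follows. This is only a sketch under stated assumptions and not the authors' procedure, which is documented in Carr et al. (1984, Appendix C): it assumes that the RATE model has a log-linear form (rate = exp of the sum of the applicable coefficients), that a monthly probability is 1 - exp(-rate), and that participation follows a stationary two-state Markov chain. The dictionary keys and function names are hypothetical; the coefficients are those printed in Table 2. Under these assumptions the results come out close to, but not always identical with, the values printed in Table 3.

```python
import math

# Coefficients as printed in Table 2; the keys are hypothetical labels.
ENTRY = {
    "constant": -5.374, "elderly_disabled": 0.132, "nonwhite": 1.601,
    "single_head": 0.212, "child_under_6": 0.793, "child_6_18": 0.378,
    "afdc": 1.223, "earner_present": -1.353, "single_head_child": 0.743,
}
EXIT = {
    "constant": -2.841, "elderly_disabled": -0.683, "nonwhite": -0.357,
    "single_head": -0.438, "child_under_6": -0.067, "child_6_18": -0.037,
    "afdc": -0.349, "earner_present": 0.901, "single_head_child": -0.333,
}


def monthly_probability(coefs, traits):
    """Monthly transition probability for a household with `traits`.

    ASSUMPTION: log-linear hazard, held constant within the month.
    """
    rate = math.exp(coefs["constant"] + sum(coefs[t] for t in traits))
    return 1.0 - math.exp(-rate)


def turnover_summary(p_entry, p_exit, months=12):
    """Turnover measures implied by monthly entry and exit probabilities.

    ASSUMPTION: stationary two-state, first-order Markov chain.
    """
    monthly = p_entry / (p_entry + p_exit)  # steady-state share participating
    # Not participating in the first month and not entering in any later month.
    never = (1.0 - monthly) * (1.0 - p_entry) ** (months - 1)
    annual = 1.0 - never
    return {
        "monthly_rate": monthly,
        "annual_rate": annual,
        "ratio": annual / monthly,
        "duration_months": 1.0 / p_exit,  # mean length of a spell
    }


# Baseline household of the text: an employed member, no other trait flagged.
baseline = ["earner_present"]
p_entry = monthly_probability(ENTRY, baseline)
p_exit = monthly_probability(EXIT, baseline)
print(round(100 * p_entry, 2), round(100 * p_exit, 1))
print(turnover_summary(p_entry, p_exit))
```

Feeding the rounded baseline probabilities printed in Table 3 (0.11 and 13.4 percent) into `turnover_summary` gives a ratio and an expected duration close to the printed 2.46 and 7.5 months, which suggests, without proving, that the published figures rest on a calculation of this general kind.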
We have calculated predicted monthly entry and exit rates and other measures of turnover for the baseline household; these results are presented in the first row of Table 3. The numbers in the other rows of Table 3 are derived by altering the assumed values of the explanatory variables one by one. For instance, the row labeled "elderly/disabled" pertains to a hypothetical household that contains an elderly or disabled person, but is otherwise similar to the baseline household defined above, and so forth.

As one would expect based on the results in Table 1, there are certain identifiable types of "low-turnover" households, such as households with an elderly or disabled person and households with no person who is currently employed, that are characterized by a low ratio of annual to monthly participation rates and a high predicted duration on the program.

The last three rows illustrate the effect on a household of having two or more characteristics that are associated with low turnover. The first two of these rows simulate the case of a household headed by a single person and containing a child who is under 6 (in the first case) or 6 to 18 (in the second case). The last row describes a hypothetical household consisting of a retired elderly person who lives alone. Our results imply that if he or she receives food stamps, he or she will be on the program for an average of over four years before exiting, several times longer than the expected duration of participation for the population as a whole.

FOOTNOTES

1. These estimates are based on households present in sample for the full calendar year. However, when households present for only part of the year are included, the results are similar. Note that these estimates, although illustrative of caseload composition in a given year, do not imply estimates of average duration of spell length due to the restricted sample period.
Households with fewer than 12 food stamp months in 1979 may be observed during spells that began before or ended after that year.

2. Estimate based on "true spells": spells of participation separated by an interval in which the household was present in the sample but not receiving food stamps.

3. This calculation is based on "true exits" only, where a true exit is, generally speaking, one where the unit remains in the sample but is observed to be no longer receiving food stamps, as opposed to a unit who leaves the sample following some period receiving food stamps.

4. This estimate is based on an average monthly caseload of 6.5 million households in 1979, as indicated by program data.

5. The weighted ISDP counts show about 3.7 million entrances and 3.2 million exits over the course of 1979, consistent with the observed increase in the sample caseload for that period.

6. The manner in which these functions were calculated is described in Carr et al. (1984, Appendix C).

7. A detailed description of the manner in which these numbers are calculated is provided in Carr et al. (1984, Appendix C). These calculations assume that the conditions underlying the simple Markov model are satisfied; in particular that the explanatory variables account for all or most systematic variation in entry and exit rates.

REFERENCES

Carr, Timothy J.; Doyle, Pat; and Lubitz, Irene Smith. Turnover in Food Stamp Participation: A Preliminary Analysis. Mathematica Policy Research report submitted to the U.S. Food and Nutrition Service, July 1984.

Czajka, John L. Determinants of Participation in the Food Stamp Program: Spring 1979. Mathematica Policy Research, Draft Final Report submitted to the Food and Nutrition Service, U.S. Department of Agriculture, November 5, 1981.

Tuma, Nancy Brandon; Hannan, Michael T.; and Groeneveld, Lyle P. "Dynamic Analysis of Event Histories." American Journal of Sociology 84, no. 4 (January 1979), pp. 820-854.

U.S.
Department of Agriculture, Food and Nutrition Service, Accounting and Reporting Division. Food Stamp Program Statistical Summary of Operations. January-December 1979.

TABLE 1

Annual/Monthly Participation Ratio   Months of Participation   27.5% 13.8% 25.4% 33.4%

Under 25   25-44   45-59   60-64   65+
Family Status: Married w/children   Single w/children   Married, no children   Single, no children
Race
Household Size
Children Under 19
Children Under 6
Highest Grade Completed: Less than 9th   9th-11th
Presence of Earners: Present   Not Present
Elderly or Disabled Persons: Elderly   Disabled

0.47 2.51 0.16 0.52 0.48 0.32 0.48 1.25 0.34 0.59 0.92 31.9 10.4 19.3 38.4 27.2 14.0 25.8 33.0 8.6 20.1 12.2 32.3 35.4 12.5 26.9 16.0 24.2 32.9 13.0 33.5 18.1 18.6 29.9 11.8 28.9 8.1 22.6 40.0 9.5 26.5 6.6 31.2 35.6 15.1 34.9 24.8 19.3 21.1 11.1 21.5 11.2 26.9 40.3 8.0 32.7 10.4 26.9 30.2 9.5 25.1 6.1 29.6 39.4 11.1 33.4 13.6 21.8 31.2 12.2 16.3 14.2 32.1 37.5 10.1 19.3 3.9 37.2 39.6 8.4 23.0 19.4 17.3 36.2 12.7 40.2 13.8 19.4 26.6 13.5 18.2 16.9 29.4 34.9 9.2 27.0 7.2 28.8 37.0 13.6 25.8 25.2 21.1 27.9 12.7 28.6 14.3 24.3 32.8 10.9 27.4 12.9 24.2 35.5 14.8 21.8 17.0 34.1 27.1 7.6 43.1 11.9 11.1 33.8 10.1 24.2 9.3 26.2 40.3 10.8 28.6 11.2 21.2 38.9 14.3 22.0 21.1 31.7 25.2 10.6 51.8 17.9 16.4 13.7 1979 ISDP Panel 12.2 35.3 15.7 25.3 10.7 18.7 11.6 25.4 6.9 25.5 5.5 30.4 11.7 23.6 13.6 12.3 18.3 25.8 8.4 22.9 12.7 27.5 16.4 26.6 text for details of particular

TABLE 2
ESTIMATED COEFFICIENTS OF A MODEL OF TURNOVER IN FOOD STAMP PARTICIPATION

Independent Variable           Entry Model            Exit Model
Constant                       -5.374                 -2.841
Elderly/Disabled                 .132 (0.92)           -.683 (-3.59)***
Nonwhite                        1.601 (14.89)***       -.357 (-2.42)***
Single head                      .212 (1.51)*          -.438 (-1.98)**
Youngest child under 6           .793 (4.37)***        -.067 (-0.27)
Youngest child 6-18              .378 (2.13)**         -.037 (-0.14)
AFDC recipient                  1.223 (4.26)***        -.349 (-1.62)*
Earner present                 -1.353 (-9.71)***        .901 (5.59)***
Single head, child present       .743 (3.48)***        -.333 (-1.14)
Chi-square                     454.24***              116.79***
Number of observations          7,276                   667

Source: Calculated by Mathematica Policy Research from 1979 ISDP Panel.
Note: Asymptotic t statistics are in parentheses.
*   Significant at .10 level (one-tailed test).
**  Significant at .05 level (one-tailed test).
*** Significant at .01 level (one-tailed test).

TABLE 3

                                                   Monthly     Annual      Annual/   Predicted
                                                   Partici-    Partici-    Monthly   Duration
                             P(Entry)   P(Exit)    pation      pation      Ratio     (months)
Household Type               (percent)  (percent)  Rate (%)    Rate (%)
Baseline                       0.11      13.4        0.9         2.1       2.46        7.5
Elderly/disabled               0.13       7.0        1.8         3.3       1.76       14.3
Nonwhite                       0.56       9.5        5.6        11.3       2.02       10.5
Single head                    0.14       8.9        1.6         3.1       1.97       11.3
Youngest child under 6         0.25      12.6        1.9         4.6       2.36        8.0
Youngest child 6-18            0.16      12.9        1.2         3.0       2.41        7.7
AFDC recipient                 0.39       9.6        3.9         7.9       2.04       10.4
No earner present              0.45       5.7        7.4        11.8       1.61       17.7
Single head, child under 6     0.67       6.0       10.0        16.3       1.64       16.6
Single head, child 6-18        0.44       6.2        6.6        11.0       1.67       16.2
Single head, elderly,
  no earner present            0.65       1.9       25.6        30.7       1.20       53.2

Source: Calculated by Mathematica Policy Research from 1979 ISDP Panel.

THE MEASUREMENT OF HOUSEHOLD WEALTH IN THE SURVEY OF INCOME AND PROGRAM PARTICIPATION

Enrique J. Lamas and John M. McNeil, U.S. Bureau of the Census

I. Introduction

The Survey of Income and Program Participation (SIPP) is designed to obtain current estimates of income, labor force activity, and participation in government transfer programs. Currently available survey data have important limitations which have been well recognized and documented [Ycas, 1982; Ycas and Lininger, 1982; Nelson, McMillen, and Kasprzyk, 1983]. The March supplement to the Current Population Survey (CPS) is presently the major source of data on the distribution of income. In the March CPS, household members at the time of the interview are asked to recall their income for the previous calendar year.
While the CPS does well in estimating wage and salary income, it has serious underreporting problems with respect to property income (such as interest, dividend, and rental income) and several other income types (such as Supplemental Security Income (SSI) and worker's compensation).* The CPS has other limitations: subannual income estimates are not available; annual household income estimates cannot be adjusted for changes in household composition or size during the year (for example, as a result of a death, marriage, or divorce); and information important for assessing the economic well-being of the population and for policy analysis, such as assets, taxes, and other characteristics, is not covered.

SIPP is designed to overcome many of these limitations. Income data from a wide variety of sources, including wages and salaries and government transfer programs, are collected on a monthly basis. Changes in household composition are also identified on a monthly basis. In addition to improving measures of income and program participation, SIPP will also obtain detailed data on other important topics, including assets and liabilities, and taxes. Asset and liability data are important in determining program eligibility and assessing the economic situation of families. Information on the distribution of household wealth is important in its own right, and changes in wealth provide data on consumer savings. SIPP will be unique among surveys in providing a recurring series on household wealth. The focus of this paper is to describe the effort of SIPP to provide estimates of household wealth holdings in the United States and to present some preliminary results.

Data to study the composition and distribution of household and personal wealth have come from three sources: estate tax returns, synthetic databases, and surveys [Wolff, 1983]. Each source, however, has its limitations.
Estate tax returns consist of records filed with the Internal Revenue Service (IRS) for estate tax assessments. Coverage of the population is a major problem for this data source. Estate returns are required only for decedents with substantial wealth holdings ($300,000 or more in 1981 and $500,000 or more in 1982). Under certain assumptions about mortality, an "estate multiplier technique" has been used to estimate wealth for living individuals who would have been required to file estate tax returns [Schwartz, 1983]. However, this technique has limited population coverage, and provides estimates only for top wealth holders.

Synthetic databases, such as the Measurement of Economic and Social Performance file (MESP) [Wolff, 1983] and the 1973 Office of Tax Analysis file [Greenwood, 1983], merge data from several sources in order to get appropriate population and asset coverage. The MESP file consists of a statistical match of the 1970 census 1/1,000 public use sample to IRS tax returns. All asset values were imputed from the IRS tax information for financial assets and from the Consumer Expenditure Survey (CES) for consumer durables. The Office of Tax Analysis file consists of a statistical match of the 1973 CPS to tax records from the 1973 Individual Income Tax Model. The IRS data on dividends and interest are used to estimate the market value of financial assets and the IRS data on property taxes are used to estimate the value of real estate holdings. Net wealth estimates are then imputed using a regression of net wealth on asset values based on IRS estate tax data. These databases are limited by the various assumptions which underlie the matching procedures, the procedure of income capitalization, and the extension of estimates to the whole population [Smith, 1983].

A third source of data is surveys covering household assets and liabilities.
Major previous surveys include the Survey of Consumer Finances (SCF), the Survey of Financial Characteristics of Consumers (SFCC), and the Income Survey Development Program (ISDP). The SCF are periodic surveys (the latest conducted in 1983) with sample sizes of 3,500 to 4,000 consumer units. The SFCC, conducted in 1962 and 1963, canvassed 2,557 units. Information on the size and components of wealth was collected as of December 31, 1962 [Projector and Weiss, 1966]. Both the SFCC and the 1983 SCF used IRS records in order to oversample high income households. The ISDP, conducted in 1979 and 1980, collected information on assets and liabilities from approximately 7,000 households. The ISDP was a research panel designed to prepare for SIPP. The ISDP interviewed respondents on a quarterly basis on six occasions. In the fifth interview, a supplement on assets and liabilities was administered which collected information as of December 31, 1979.

Surveys have the advantage that their samples and survey instruments can be specifically designed to gather the detail and information necessary to estimate wealth holdings. Surveys, however, have suffered from limitations in population and asset coverage.

In the next section of this paper, significant features of the SIPP design with respect to the measurement of wealth are discussed. The final section presents results from the first wave of SIPP.

II. SIPP Design Features

SIPP is a panel survey consisting of approximately 20,000 households which are interviewed every four months for a period of 2 1/2 years. As SIPP progresses, new panels will be started every year, which will allow cross-sectional analyses based on a total sample of approximately 35,000 households. At each interview, information on income, program participation, and other characteristics for each of the previous four months is obtained for each person.
Persons who move during the life of the panel are followed and interviewed at their new addresses.2 Questionnaire items include a "core" set of questions which are repeated in each wave of interviewing. These items cover labor force participation, detailed income recipiency, and participation in government programs. For waves 2 through 8, the core items are updated and the questionnaire is expanded with additional questions on items not covered in the core. Detailed questions concerning the amounts of personal and household assets and liabilities are included in wave 4, which is to be conducted in September through December 1984. These items will be updated one year later in wave 7. Asset and liability coverage is comprehensive.

The SIPP design is expected to have a positive effect on the ability to measure wealth. Research has found that a major source of bias in survey estimates of wealth is the nonreporting of asset ownership [Ferber, 1982]. Additionally, there is some nonreporting of asset values. Several features of SIPP are expected to have positive effects on the reporting of ownership and value of assets [Radner and Vaughan, 1984].

First, asset ownership questions precede questions on asset values. Separating ownership and amount questions helps focus on identification of asset holdings. In addition, the relatively sensitive amount questions do not adversely affect the reporting of asset ownership.

Second, the longitudinal nature of SIPP helps identify asset ownership. Asset ownership information is collected in each wave. In the initial interview, a set of detailed questions designed to identify ownership of income earning assets is asked for each person in the household. An asset roster is created and recorded in the control card. In subsequent interviews, the respondent's asset roster for the previous wave is pretranscribed to the current questionnaire.
During an interview, the pretranscribed asset roster is checked for accuracy. Then questions are asked to determine whether any assets have been liquidated or whether any new ones have been acquired. With this procedure, relatively accurate asset ownership information is obtained before respondents are asked about asset values and amounts of liabilities in wave 4.

In longitudinal surveys, attrition (that is, dropping out of the sample) is of concern. While some respondents leave the sample, there is evidence to suggest that the cooperation or rapport obtained in repeated interviews increases the reliability of financial data [Ferber and Frankel, 1978]. Furthermore, the longitudinal nature of the survey also provides the opportunity to gather information missed in a previous interview. If one interview is missed for the household or an individual respondent, a "Missing Wave" section is completed. In this section, a limited set of key questions is asked concerning labor force participation, income recipiency, and asset ownership during the missed wave.

Third, ownership, income, and value of assets are asked for each individual by type of ownership. Information is gathered for assets held jointly with spouse and for assets held individually. There is some evidence that nonreporting of ownership is higher for assets held by one individual than for assets held jointly with others [Ferber and Frankel, 1978]. Asking respondents directly by type of ownership may tend to reduce differential nonresponse between assets held in own name and jointly with others. In addition, collecting income and asset values by type of ownership rather than as one total amount for an asset may tend to give more accurate income and asset value amounts. Because both asset income and asset value are collected, information about asset income can be used for assessing the reasonableness of the asset value data and for imputing missing values.
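One conceivable way of using asset income to check and impute asset values is sketched below. Nothing here describes the Census Bureau's actual edit or imputation procedures, which the text does not specify: the function names, the assumed 8 percent yield used for imputation, and the 30 percent ceiling used to flag implausible reports are all hypothetical, and the only figure taken from the text is the four-month reference period.

```python
from typing import Optional, Tuple

REFERENCE_MONTHS = 4  # length of the SIPP reference period described in the text


def implied_annual_yield(income_in_period: float, value: float) -> float:
    """Annualized asset income as a share of the reported asset value."""
    return income_in_period * (12 / REFERENCE_MONTHS) / value


def review_asset_report(
    income_in_period: float,
    value: Optional[float],
    assumed_yield: float = 0.08,        # HYPOTHETICAL rate used for imputation
    max_plausible_yield: float = 0.30,  # HYPOTHETICAL reasonableness ceiling
) -> Tuple[Optional[float], str]:
    """Return (asset value, status) for one reported asset.

    A missing value is imputed by capitalizing annualized income at the
    assumed yield; a reported value is sent for review when it is not
    positive or when the income reported against it implies a yield
    above the ceiling.
    """
    annual_income = income_in_period * (12 / REFERENCE_MONTHS)
    if value is None:
        return annual_income / assumed_yield, "imputed"
    if value <= 0 or implied_annual_yield(income_in_period, value) > max_plausible_yield:
        return value, "review"
    return value, "ok"


# Hypothetical reports: $100 of interest received over the reference period.
print(review_asset_report(100.0, 6000.0))  # income consistent with the value
print(review_asset_report(100.0, 500.0))   # income looks too high for the value
print(review_asset_report(100.0, None))    # value missing, so it is imputed
```

A production procedure would presumably vary the assumed yield by asset type and period; a single rate is used here only to keep the sketch short.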
Finally, "callback items" have been introduced for critical questions concerning asset values. Callback items are designed to reduce nonreporting of income and asset amounts. For selected items, when a respondent answers "don't know," the interviewer reads a statement on the importance of the information requested and the respondent is asked whether it would be possible to call back later for an estimate of the amount. Respondents who agree to be called back are given a "reminder card" with the requested information checked. The interviewer telephones the respondent at an agreed upon time to obtain the missing information, which is entered in a section of the questionnaire reserved for callback amounts. The impact of the callback system is to reduce the incidence of missing information.

Since callbacks involve added respondent and interviewer burden, only a limited number of items can be identified as callback items. To supplement the callback procedure, special instructions have been included for other important questions which instruct interviewers to probe for an estimate before accepting a "don't know" response. In this way, interviewers are alerted to key items for which they should make a special effort to obtain estimates.

An important issue for surveys which measure wealth is population coverage. Evidence suggests that wealth holdings are concentrated. Studies have estimated that the top 5 percent of households hold approximately 30 to 50 percent of net wealth [Wolff, 1983; Greenwood, 1983; and Smith, 1983]. In addition, holdings of certain assets such as stocks are even more highly concentrated, with the top 5 percent holding approximately 70 percent of this asset. As a result, the normal SIPP area frame sample has limited coverage of the top wealthholders.
This problem was noted by the Census Advisory Committee on Population Statistics which recom- mended that "the Bureau explore the opportunity to augment the SIPP sample design every 5 years to oversample the upper end of the income and wealth distribution, where special effort would be directed to producing a reliable indication of the entire distribution of wealth" [U.S. Bureau of the Census, 1983]. The Census Bureau has done some preliminary work to explore the methods which might be used to obtain data files in which top wealth- holders are adequately represented. One possi- bility is to use IRS data files to develop a list frame sample of top income holders. An alternative approach is to use estate tax data to estimate wealth distribution of top wealthholders. Using estate tax data in con- junction with SIPP data has the potential to improve population coverage. At this time, however, the work in this area is very preli- minary. In any case, the probable limitations of SIPP wealth estimates and the need to im- prove population coverage have been recognized by the Census Bureau and the work to improve the estimates continues. III. Data and Results A. Nonresponse Rates~for Asset Ownership Ownership of assets is established in the first wave of SIPP and an asset roster is constructed for each respondent. This roster is verified and updated in subsequent waves. The amounts of assets held are not systemati- cally covered until the fourth wave topical module. In Table 1, the level of missing information on asset ownership are shown. Respondents can answer that they "don't know" if they own a specific asset type or they can refuse to answer. The nonresponse rate for all asset types is low at 1.4 percent of all persons asked about asset ownership. For specific asset types, the rates differ and range from 0.9 percent for rental property and royalties to 2.2 percent for certificates of deposit. 
(See footnote 4.) When the nonresponse rates are decomposed, the refusal rates are generally higher than the "don't know" rates. This is true in particular of self respondents. However, as would be expected, "don't know" responses on asset ownership are more frequent for proxy respondents (0.3 to 1.7 percent) than for self respondents (0.1 to 0.3 percent). The refusal rates for both types of respondents were similar (ranging from 0.7 to 1.5 percent). As a result of higher "don't know" rates, proxy respondents had somewhat higher overall nonresponse rates. For both types of respondents, however, the absolute level of the nonresponse rates is low.

To examine the asset ownership information further, nonresponse rates by demographic and socioeconomic characteristics are shown in Table 2. Several patterns emerge. Nonresponse rates for each asset type are approximately the same across sex and race groups. Some of the rates, however, differ significantly by age and education level. The nonresponse rates for each asset type increase with the age of the respondent. For respondents in the less than 25 and 25 to 34 age groups, nonresponse rates range from 0.3 to 1.2 percent; for the 35 to 44 age group, from 0.9 to 2.0 percent; for the 45 to 54 and 55 to 64 age groups, from 1.1 to 2.8 percent; and for the 65 and older age group, from 1.3 to 3.5 percent. While the rates for the oldest age group are significantly higher than for the youngest group for each asset type, it should be noted that even the highest rates (3.4 and 3.5 percent for money market deposit accounts and certificates of deposit, respectively) are relatively low. Nonresponse rates also differ by education level.
The nonresponse rates for ownership of several asset types are higher for college graduates than for respondents who did not complete high school or who completed some college. Nonresponse rates for college graduates range from 1.3 to 2.6 percent, while the rates for respondents with less than high school or some college education range from 0.5 to 1.8 percent. Differences in rates between these groups are significant for each detailed asset type, except savings accounts and interest earning checking accounts.

B. Asset Ownership Patterns

Several patterns emerge from the SIPP asset ownership results. The frequency of asset ownership is shown in Table 3. The most frequently held assets were savings accounts and homes, with 65.7 and 64.1 percent of households reporting ownership, respectively. Ownership of the remaining assets was reported by approximately 20 percent of households, with rental properties and royalties reported by 10 percent or less of households. Of interest are the asset types which became newly available after 1982 as a result of deregulation in the banking industry. Approximately 23 percent of the households reported ownership of one or more interest earning checking accounts, while 17 percent reported ownership of money market deposit accounts.

The SIPP wave 1 results for selected assets are compared to asset ownership data from other surveys in Table 4. In general, the ownership estimates obtained in wave 1 of SIPP are similar to estimates derived from other sources. The only major difference occurs in savings accounts. The Consumer Credit Survey and the ISDP results show approximately 75 percent of households reporting ownership of savings accounts, while SIPP results show approximately 66 percent of households owning such accounts. A plausible explanation is that savings accounts are highly substitutable with the newly available assets.
Individuals have an incentive to switch from savings accounts to the new accounts for the greater liquidity available with interest earning checking accounts and for the higher interest rates available with money market deposit accounts. The result is a negative impact on the percent of households owning savings accounts.

C. Asset Amount Response

An important feature of the SIPP design is that income recipiency and asset ownership questions precede income and asset amount questions. Once income recipiency is established for a respondent, questions on amount of income are asked by income type. Income from assets, e.g., interest, dividends, and rental income, is covered in each wave. This section focuses on the reporting of property income and asset amounts in the first wave of SIPP.

The questions on asset income are divided by type of ownership, that is, assets held jointly with spouse, and assets in own name or held jointly with others. Persons identified as holding an asset are asked questions concerning the amount of income received. If respondents cannot provide the amount of interest earned, they are asked the average balance of the accounts so that the interest income received can be imputed. A callback is provided for the balance amount if the person does not know the amount at the time of the interview but can provide an estimate later. Unlike the ownership items, questions on interest amounts refer to grouped assets. In particular, assets held with financial institutions (savings accounts, money market deposit accounts, certificates of deposit, and interest earning checking accounts) are grouped together and the total interest earned is asked. Similarly, other interest earning assets (money market funds, U.S. Government securities, municipal or corporate bonds, and other interest earning assets) are grouped and interest income from those sources is covered.

The reporting of interest income is summarized in Table 5.
On the average, two-thirds of respondents reported an amount of interest earned. Approximately 15 to 20 percent of respondents did not know the interest income amount, but provided an estimate of the total balance in the accounts. In general, over 80 percent of respondents gave the amount of interest or the balance in the accounts. Less than 11 percent did not know the amount of interest and did not provide a balance amount. Only 5 to 8 percent of respondents refused to report the interest income. These patterns of reporting did not differ by type of ownership, that is, joint with spouse versus in own name or with others. However, the frequency of persons who refused or did not provide an interest/balance amount was lower on average for assets at financial institutions (11.7 percent) than for other interest earning assets (17.1 percent).

Of interest are the cases in which respondents did not know the amount of interest earned and were asked the average balance held for imputation purposes. Over 5,900 respondents were asked the balance of accounts held with financial institutions, and 455 were asked the balance of other interest earning accounts. For assets held with financial institutions, approximately 78 percent of the respondents who reported they did not know the interest income earned were resolved using the average balance reported; for other interest earning assets, 67 percent of the "don't know" cases reported a balance amount. Only 6 percent of the respondents refused to report an amount.

IV. Conclusion

The SIPP will provide annual estimates of wealth. Information on assets and liabilities is useful for many types of analyses, including program eligibility simulation studies and measurements of the distribution of wealth. In this paper, significant features of the survey were presented along with some preliminary results from SIPP.
Several design features of SIPP, including the longitudinal nature of the survey design, the separation of asset ownership and asset value questions, the updating of the asset roster at each interview, callbacks, probe instructions, and a missing wave section, are likely to have a positive effect on the reporting of asset ownership. In addition, asset and liability coverage is comprehensive. Results from the first wave of SIPP show that nonresponse rates for asset ownership are low. In addition, the patterns in frequency of ownership are reasonable and the results are comparable to findings from other surveys. Preliminary results also show some indication of improvements in nonresponse rates for items on asset amounts.

FOOTNOTES

1 When comparing CPS income aggregates to independent benchmark estimates, CPS captures 97.4 percent of independent wage and salary totals. However, the CPS only covers 41.5 and 44.1 percent of interest and dividend totals, respectively, 69.4 percent of SSI totals, and 42.3 percent of worker's compensation totals.

2 Persons who move individually or in groups are followed if they relocate within 100 miles of any SIPP PSU. Persons who move into institutions are not interviewed while they are institutionalized.

3 Home equity, automobiles, and life insurance information were collected in wave 2, primarily to evaluate government program eligibility.

4 Significance tests were performed on the differences between rates using the formula

$$\frac{P_1 - P_2}{\sqrt{F_1\,\dfrac{P_1(1-P_1)}{N_1} + F_2\,\dfrac{P_2(1-P_2)}{N_2}}}$$

where $P_1$ and $P_2$ are the proportions of nonresponses, $F_1$ and $F_2$ are sample design factors, and $N_1$ and $N_2$ are the numbers of sample cases for each proportion. Differences noted in the text are significant at the 95 percent confidence level.

REFERENCES

[1] David, Martin H., "Technical, Conceptual, and Administrative Lessons of the Income Survey Development Program (ISDP)-Papers Presented at a Conference," Social Science Research Council, Washington, D.C., 1982.
[2] Ferber, Robert, John Forsythe, Harold Guthrie, and E.
Scott Maynes, "Validation of Consumer Financial Characteristics: Common Stock," Journal of the American Statistical Association, Volume 64, Number 326, June 1969, pp. 415-432.
[3] Ferber, Robert and Matilda Frankel, "Evaluation of the Reliability of the Net Worth Data in the 1979 Panel: Asset Ownership on Wave 1," Survey Research Laboratory, University of Illinois, August 1981.
[4] Ferber, Robert and Matilda Frankel, "The Collection, Measurement, and Evaluation of Savings Account Reports," Survey Research Laboratory, University of Illinois, March 1978.
[5] Greenwood, Daphne, "An Estimation of U.S. Family Wealth and Its Distribution From Microdata, 1973," The Review of Income and Wealth, Series 29, Number 1, March 1983.
[6] Nelson, Dawn, David McMillen, and Daniel Kasprzyk, "Overview of the Survey of Income and Program Participation," U.S. Bureau of the Census, December 1983.
[7] Pearl, Robert and Matilda Frankel, "Composition of the Personal Wealth of American Households at the Start of the Eighties," Survey Research Laboratory, University of Illinois, presented at the American Economic Association meetings, December 1981.
[8] Pearl, Robert, Matilda Frankel, and Richard Williams, "The Effects of Missing Information on the Reliability of Net Worth Data From the 1979 ISDP Research Panel," Survey Research Laboratory, University of Illinois, May 1982.
[9] Projector, Dorothy S. and Gertrude S. Weiss, "Survey of Financial Characteristics of Consumers," Board of Governors of the Federal Reserve System, August 1966.
[10] Radner, Daniel B. and Denton R. Vaughan, "The Joint Distribution of Wealth and Income For Age Groups, 1979," Social Security Administration, Office of Research and Statistics, Working Paper 33, March 1984.
[11] Schwartz, Marvin, "Trends in Personal Wealth, 1976-1981," Statistics of Income Bulletin, Volume 3, Number 1, Summer 1983.
[12] Smith, James D., "Recent Trends in the Distribution of Wealth: Data, Research Problems and Prospects," presented at C.V. Starr Center Conference on International Comparisons of the Distribution of Household Wealth, New York University, November 1983.
[13] Smith, James D. and Stanton Calvert, "Estimating the Wealth of Top Wealth-Holders From Estate Tax Returns," 1965 Proceedings of the Business and Economic Statistics Section, American Statistical Association, pp. 248-265.
[14] U.S. Bureau of the Census, "Minutes and Report of Committee Recommendations, Census Advisory Committee on Population Statistics," April 8, 1983.
[15] U.S. Bureau of the Census, "Money Income of Families and Persons in the United States: 1980," Current Population Reports, Series P-60, No. 132, U.S. Government Printing Office, Washington, D.C., 1982.
[16] Vaughan, Denton R., T. Cameron Whiteman, and Charles A. Lininger, "The Quality of Income and Program Data in the 1979 ISDP Research Panel: Some Preliminary Findings," presented at the Social Science Workshop on the Survey of Income and Program Participation, Baltimore, Maryland, December 1983.
[17] Wolff, Edward N., "The Size Distribution of Household Disposable Wealth in the United States," The Review of Income and Wealth, Series 29, Number 2, June 1983, pp. 125-146.
[18] Ycas, Martynas A. and Charles A. Lininger, "The Income Survey Development Program: Design Features and Initial Findings - Paper 1," presented at Social Science Research Council Conference, Washington, D.C., October 1982.
USING SUBJECTIVE ASSESSMENTS OF INCOME TO ESTIMATE FAMILY EQUIVALENCE SCALES: A REPORT ON WORK IN PROGRESS*

Denton R. Vaughan
Social Security Administration

This paper presents a short discussion of the estimation of family equivalence scales based on individuals' assessments of their own current income level or their subjective judgments about the income required to provide specified levels of satisfaction or to sustain a given standard of living.
(See footnote 1.) A scale estimated on the basis of individuals' subjective assessments of their current income is presented, and the results are discussed in terms of scales obtained by "traditional" methods and other "subjective-based" scales.

Techniques for establishing equivalence.

Traditional approaches. A number of techniques have been used to estimate family size equivalence scales. Until recently, most approaches have relied either on some manifestation of consumer behavior that is interpreted as a welfare indicator or on scientific or quasi-scientific dietary standards. The general class of Engel curve analyses falls into the first category, and the most frequently used welfare indicator has been the percentage of income or expenditures devoted to food as affected by family size. As food has become a less dominant element of family budgets in Western Europe and the United States, and therefore less representative of the overall material standard of living of the general population, Engel curve analyses have been extended to broader sets of consumption goods such as necessities (e.g., Sydenstricker and King 1921; Prais and Houthakker 1955; Watts 1967; Seneca and Taussig 1971). Other Engel curve applications have made use of adult consumption goods such as alcohol and tobacco (Nicholson 1949, 1976) or have analyzed variation in the percentage of income devoted to savings (U.S. Department of Labor 1948). More recently, economists have attempted to define equivalence on the basis of demand equations estimated for the full set of consumption goods. In this approach, equivalence estimates are derived from variations in demand that are associated with family size and composition differences (Kakwani 1977; van der Gaag and Smolensky 1982; Danziger, van der Gaag, Smolensky and Taussig 1984).

Equivalence scales based on dietary adequacy take as a point of departure a recognized nutritional standard, such as provided by the U.S.
National Research Council (NRC), and translate the standard into a per capita food plan appropriate to a given level of living. Since nutritional standards such as the one developed by the NRC incorporate variations by age and sex, their translation into food plans necessarily captures family composition effects (Peterkin 1976). By taking into account the interaction between per capita income, per capita food plan costs and the nutritional quality of diet (by family size), such dietary equivalence scales are said to capture economies of scale as well as compositional effects (e.g., Kerr and Peterkin 1976, pp. 72-73). The equivalence scale incorporated in the official Federal poverty line is based in large part on an earlier version of the U.S. Department of Agriculture scale described by Peterkin. However, ad hoc adjustments were made to the ratios for families of one and two to account for the supposed higher fixed costs of operating small households (Orshansky 1965, p. 9).

Direct subjective measures. Beginning in the early 1970's, the economist van Praag and his colleagues at Leyden University in the Netherlands (van Praag 1971; van Praag and Kapteyn 1973; Kapteyn and van Praag 1976; Goedhart, Halberstadt, Kapteyn and van Praag 1977; and van Herwaarden, Kapteyn and van Praag 1977) and sociologists in the United States (Rainwater 1973 and 1974; Vaughan and Lancaster 1979; Dubnoff, Vaughan and Lancaster 1981) initiated the use of individuals' perceived income needs to estimate family equivalence scales. By the early 1980's a few additional U.S. social scientists had begun to exploit the "subjective" approach using newly available U.S. data bases (Colasanto, Kapteyn and van der Gaag 1984; Danziger, van der Gaag, Taussig and Smolensky 1984) in conjunction with Dutch economists then resident in the United States.
Actually, from a theoretical perspective the use of the term "subjective" to differentiate this class of measures from more orthodox approaches based on consumer behavior, and most particularly consumption choices, is somewhat misleading. In economics, behavior (consumer choice) is generally considered to be revelatory of subjective states, i.e., preferences are "revealed" by behavior, and much of the theorizing underlying the study of consumer behavior and welfare economics relies on primitive notions of subjective states. In the context of constraints, it is these states which essentially motivate, if not determine, behavior. So in an important sense what is novel about using subjective assessments to make inferences about preferences, utility, or welfare is not that in being subjective they are somehow set apart from the concerns of economists. Rather it is that the orthodox economist is more familiar, and therefore more comfortable, with employing behavior as an indicator of the underlying construct of welfare than with using the direct verbal representations of those constructs or states (Muellbauer 1980, p. 153; Pollak and Wales 1979, p. 219). Weighing in on the other side of the issue, however, Sen (1982, pp. 71-72) has noted that the "idea that behavior is the one real source of information [about preferences] is extremely limiting for empirical work and is not easy to justify in terms of the methodological requirements of our discipline [viz. economics]." In any case, in the present context I will use the phrase direct subjective measures to at once highlight the theoretical relationship between this class of measures and those employed within a "revealed preference" framework while correctly stressing that they are based on consumers' perceptions or subjective assessments rather than on consumer behavior.

Types of direct subjective measures.
There are three basic types of direct subjective measures that have been used to estimate family equivalence scales. They are:

1) The income-subjective welfare metric. This approach was pioneered and systematically exploited by van Praag and his associates (van Praag 1968, 1971; van Praag and Kapteyn 1973), but was also briefly experimented with by Andrews and Withey (1976, pp. 228-229, 378). Respondents are asked a series of questions which permit estimation of the function which relates income to the welfare or utility derived from income as perceived by the individual. The parameters of the estimated function may then be used in a variety of applications, such as the construction of family size equivalence scales.

2) The income-living level approach. This technique relies on respondents' estimates of the money income amounts required to attain a given level of living (getting along, living comfortably, etc.). Systematic variation in the subjective income requirements associated with the various levels of living is used to model differences in underlying needs and to construct equivalence scales. Rainwater (1974) was the first to obtain estimates for a range of alternative levels simultaneously. More recently the Expert Committee on Family Budget Revisions, convened by the Bureau of Labor Statistics, recommended a program of research to explore the feasibility of estimating a set of living level standards at the national level (Watts 1980). Research funded independently by the U.S. National Science Foundation (see Dubnoff and Strate 1984) is pursuing these and related issues.

3) Income rating scales. This method relies on obtaining the respondent's rating of his or her current income using a set of ranked categories (such as from "delighted" to "terrible") believed to extend over most, if not all, of the continuum of satisfaction or welfare.
The sensitivity of such responses to family size was first demonstrated in 1979 (Vaughan and Lancaster), and the first equivalence scales per se were estimated using current income rating measures in 1981 (Dubnoff, Vaughan and Lancaster). This class of measures may be thought of as a subset of the income/welfare metric approach (Vaughan and Lancaster 1980).

Estimation of equivalence scales using income rating measures: An example. This section of the paper reports on the preliminary stages of a project undertaken to evaluate the usefulness of subjective assessments for comparing differences in the level of economic welfare experienced by significant subgroups of the population. Developing equivalence estimates constitutes a major segment of this project. A number of the questions which will be addressed in the course of this work cannot be dealt with using the data set currently available, so the purpose of the current exercise is to give the reader the flavor of how one might address such issues on the basis of income rating scales.

The scale estimation procedure. The general approach is similar to that first employed by Dubnoff, Lancaster and the current author in an earlier paper (Dubnoff, Vaughan and Lancaster 1981). Essentially, a measure of satisfaction with current income is regressed on income, family size, and a set of additional variables considered to be significant. The estimating equation is of the form:

$$S_y = \beta_0 + \beta_1 \ln FS + \beta_2 \ln Y + \beta_3 C_1 + \cdots + \beta_k C_{k-2} + \varepsilon \qquad (1)$$

where $S_y$ = satisfaction with income, recoded to represent a zero to one welfare continuum; $FS$ = family size; $Y$ = monthly family income gross of tax; $C$ = a set of (k - 2) control variables; $\beta_0$ through $\beta_k$ are the regression coefficients; and $\varepsilon$ is the disturbance term, assumed to have the required characteristics.
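As a rough illustration of how an equation of the form (1) could be fitted, the sketch below runs ordinary least squares on synthetic data. Everything in it is an assumption made for illustration: the coefficient values in `TRUE_B`, the simulated incomes and family sizes, and the single `aged` dummy standing in for the full set of control variables. None of these are the ISDP data or the paper's estimates, and the helper `ols` is a hypothetical name.

```python
import math
import random


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical coefficients, chosen only for illustration:
# intercept, ln(family size), ln(income), aged dummy.
TRUE_B = [-0.20, -0.04, 0.10, 0.07]

random.seed(0)
rows, satisfaction = [], []
for _ in range(2000):
    family_size = random.randint(1, 8)
    income = math.exp(random.gauss(7.0, 0.6))  # synthetic monthly income
    aged = 1.0 if random.random() < 0.2 else 0.0
    x = [1.0, math.log(family_size), math.log(income), aged]
    noise = random.gauss(0.0, 0.05)
    rows.append(x)
    satisfaction.append(sum(b * v for b, v in zip(TRUE_B, x)) + noise)

b_hat = ols(rows, satisfaction)
# Income-unit size elasticity implied by the fitted coefficients.
elasticity = -b_hat[1] / b_hat[2]
print([round(b, 3) for b in b_hat], round(elasticity, 2))
```

With a negative family size coefficient and a positive income coefficient, the ratio `-b_hat[1] / b_hat[2]` is positive: it is the percentage increase in income needed to offset a one percent increase in family size under this specification.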
The control variables in the present context are entered as dummy variables and include, in this initial specification (note variable abbreviations in parentheses; see footnote 2):

* Financial situation in 1979 judged better than in 1978 (TB),
* Financial situation in 1979 judged worse than in 1978 (TW),
* Financial situation in 1980 expected to be better than in 1979 (NB),
* Financial situation in 1980 expected to be worse than in 1979 (NW), and
* Family reference person is age 65 or over (AGE).

Depending on theoretical considerations and the reasonableness of the results, these variables may be directly included in the scale, for example, differentiating the scale by age of head, or may be used to control for effects that one does not want the scale to reflect. More information about the satisfaction measure and the income and the control variables is available from the author on request.

Estimation results. Based on data from the fifth and sixth waves of the 1979 Research Panel of the Income Survey Development Program (ISDP) (see Ycas and Lininger 1981), (1) is estimated as

$$S_y = \cdots\; \underset{(21.01)}{0.1139\,TW} - \underset{(-4.23)}{0.0276\,NB} - \underset{(-9.53)}{0.0508\,NW} + \underset{(11.83)}{0.0763\,AGE}$$

(R² = .263, N = 5,067)

This rather simple model explains a little more than 25 percent of the variation in satisfaction with income. All variables in the model are statistically significant at conventional levels (t-values are in parentheses). As expected, income is positively related to satisfaction with income and the effect of family size is negative; holding income constant, satisfaction with income decreases as family size increases. The age coefficient has the expected sign and indicates that, controlling for the other variables in the model, the aged require less income to reach a given level of satisfaction than the non-aged. However, as we shall see, the magnitude of the age effect is much larger than seems reasonable.
Other things being equal, individuals who felt that their financial situation in 1979 was better than in 1978 are more satisfied than others, while those who felt that their situation in 1979 was worse than in 1978 derive a great deal less satisfaction from a given level of income. Those who expected their financial situation to change in 1980, whether for better or worse, are considerably less satisfied than others at similar income levels. In fact, respondents' subjective assessments of financial change (retrospective for 1979 compared to 1978 and prospective for 1980 compared to 1979) are more important in accounting for variation in satisfaction with income than current income and family size.

Deriving the scales. The purpose of an equivalence scale is to establish the relative incomes necessary to provide equal "satisfaction", "welfare", or "utility" for families in differing situations. In the present context we wish to construct a scale taking into account family size and age of head and will use non-aged families of size four as the reference family. Restricting ourselves for the moment to simply family size effects among the non-aged, we need to know the ratios of income such that satisfaction with income ($S_y$) is the same for families of size one to three and five or more as for families of size four, that is:

$$S_{y,j} = S_{y,4} \qquad (2)$$

Using equation (1), less the term representing the set of control variables which will be reintroduced below (see (7)),

$$S_{y,j} = \beta_0 + \beta_1 \ln FS_j + \beta_2 \ln Y_j \qquad (3)$$

$$S_{y,4} = \beta_0 + \beta_1 \ln FS_4 + \beta_2 \ln Y_4 \qquad (4)$$

Combining (3) and (4) as suggested by (2), simplifying and arranging the terms for income and family size on opposing sides, and dividing through by the coefficient of income gives

$$\ln Y_j - \ln Y_4 = \frac{\beta_1}{\beta_2}\,(\ln FS_4 - \ln FS_j) \qquad (5)$$

Finally, taking the antilog of each side gives

$$\frac{Y_j}{Y_4} = \left(\frac{FS_4}{FS_j}\right)^{\beta_1/\beta_2} \qquad (6)$$

or the ratios of income necessary to leave families of size "j" as satisfied with their incomes as are families of size four. Introduction of an additional variable or variables can be accomplished by inserting the terms $\beta_3 C_1 + \cdots + \beta_k C_{k-2}$ from (1) in (3). The result, after the simplification and rearrangement represented by steps (5) and (6), with the control variables measured relative to the reference family, is

$$\frac{Y_j}{Y_4} = \left(\frac{FS_4}{FS_j}\right)^{\beta_1/\beta_2} \Big/ \exp\!\left(\frac{\beta_3 C_1 + \cdots + \beta_k C_{k-2}}{\beta_2}\right) \qquad (7)$$

the equivalence ratio net of the effect of a given control variable(s) divided by the effect of that variable (the exponentiated product of the variable and its coefficient divided by the coefficient of income).

The equivalence scales for non-aged and aged families, taken directly from equations (6) and (7), are shown below. Note that the values for the aged are too low to be given much credence. This point will be elaborated on further in the later sections of the paper.

Equivalence scale by family size and age of reference person

Family size          Under 65
One                      61
Two                      78
Three                    90
Four                    100
Five                    108
Six                     116
Seven                   122
Eight                   128
Nine                    134
Nine or more (a)        143

(a) Based on an average family size of 10.9 for the nine and over group.
[The "65 or over" column, including the entries marked "(-) Not calculated," is not legible.]

Results for the non-aged scale compared with other approaches.

Studies using direct subjective measures. Eight scales estimated using direct subjective measures are shown in table 1, together with the scale first presented in this paper. Six different data sets are involved, and all three types of direct subjective measures are represented: living level measures (6 scales), income metric measures (1 scale), and current income rating measures (2 scales). Of course in some sense the scales are not fully comparable. Important differences in study design and implementation cannot be controlled for. Thus, it would be unrealistic to expect complete uniformity in the various scales. However, there are important commonalities among the studies.
For example, all scales but [7], for which age effects are controlled, pertain specifically to units headed by an individual under age 65, and all but [8], which employed family-size dummies, used a natural log specification for unit size. The key attribute of these scales is their steepness. The formal way to assess their "steepness" is to compare their income-unit size elasticities, that is, the percentage change in income required to compensate for a given percentage difference in family size. The elasticities for all nine of the scales are given at the bottom of table 1. Despite all the differences among the studies, there does seem to be considerable uniformity among the scales, as the elasticities for eight of the nine range in a relatively narrow band of .31 to .39. The elasticity for the "steepest" scale is .47, indicating that it is on the order of 20 to 50 percent "steeper" than the others. To give the flavor of what these income-unit size elasticities mean in practical terms, the eight "flatter" scales indicate that roughly 25 to 30 percent more income is required to maintain equivalent well-being as family size doubles (as from 2 to 4 or 4 to 8 persons). For the BNS minimum income scale, about forty percent more income is required to yield an equivalent level of welfare across such family size differences.

In sum, this preliminary version of an ISDP income satisfaction scale appears to be quite similar to other "subjective" scales estimated to date for the U.S. (See footnote 7.) There also appears to be at least some regularity among the scales estimated so far on the basis of direct subjective assessments. How do these "subjective" scales compare to those derived using traditional approaches?
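The link between an income-unit size elasticity and the scale values quoted above can be checked with a short calculation. The sketch below assumes a constant-elasticity (log-log) scale, so that the income ratio between two family sizes is the size ratio raised to the elasticity. The value 0.357 is an assumption backed out from the printed non-aged scale values (61, 78, 90, and so on); it is not a coefficient reported in the text, and the function names are hypothetical.

```python
def equivalence_scale(family_size, elasticity, reference_size=4):
    """Income needed as a percent of the reference family's income,
    under an assumed constant-elasticity scale."""
    return 100.0 * (family_size / reference_size) ** elasticity


def extra_income_when_size_doubles(elasticity):
    """Percent more income needed when family size doubles."""
    return 100.0 * (2.0 ** elasticity - 1.0)


# Assumed elasticity, inferred from the printed scale (not reported in the text).
ASSUMED_ELASTICITY = 0.357

scale = {n: round(equivalence_scale(n, ASSUMED_ELASTICITY)) for n in range(1, 9)}
print(scale)

# The band of elasticities cited in the text (.31 to .39) and the steepest (.47).
for e in (0.31, 0.39, 0.47):
    print(e, round(extra_income_when_size_doubles(e), 1))
```

Under these assumptions the elasticities of .31 and .39 correspond to about 24 and 31 percent more income when family size doubles, and .47 to a little under 40 percent, which is in line with the "roughly 25 to 30 percent" and "about forty percent" figures in the text.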
Comparisons with scales based on dietary need and consumer behavior.-- Seven different "objective" scales are shown in table 2. Three widely used scales are shown in the first three columns of the table. The Thrifty Food Plan, developed by the USDA (col. 2), is a dietary adequacy scale and represents the ratio of food plan costs necessary to maintain equal nutritional adequacy by family size (Kerr and Peterkin 1976; Peterkin 1976). The Federal Poverty Line scale (col. 1) is largely based on an earlier version of the USDA Thrifty Food Plan, the Economy Food Plan (Orshansky 1965). As reflected in the poverty line scale values for units of size five and above, it was slightly less steep than the Thrifty Plan. However, the most pronounced differences between the food plan scale and the poverty scale are for two-person and, more especially, one-person units. As noted earlier, these differences stem from intuitive adjustments made to the poverty scale in order to account for supposed "diseconomies" experienced by small units in the consumption of nonfood items. The BLS scale (col. 3) was derived by estimating the total expenditure levels at which the percent of total expenditures devoted to food remained constant by family size, i.e., it is a classic Engel type scale (Jackson 1968). Interestingly, it is very similar to the dietary adequacy scale up through families of size five. The Iso-prop scales shown in columns 4 and 5 are also Engel scales, i.e., they were estimated by holding constant expenditures for food or necessities as a percentage of total consumption expenditures for families of different size (Watts 1967). The Iso-prop food scale is quite similar to the Thrifty Food Plan and BLS Family Budget scales. The necessities scale differs from the other four scales in that it is noticeably less steep. The food-based scales indicate that a family of eight would need about seventy percent more income than a four-person family to be equally well-off.
The broader based necessities scale suggests that the family of eight would need only about forty percent more. At family sizes below four, of course, the necessities scale is appreciably more compressed than the three food-based scales. Because of the ad hoc adjustments incorporated into the poverty line scale for units of size one and two, at the lower end the Iso-prop necessities scale and the poverty scale are roughly similar to each other, suggesting, perhaps, the essential reasonableness of those adjustments. The final two scales, estimated using the extended linear expenditure system (ELES) on the basis of all categories of current consumption expenditures, are flatter even than the Iso-prop necessities scale (van der Gaag and Smolensky 1982; Danziger, van der Gaag, Taussig and Smolensky 1984). The contrast between the ELES scales and the food-based scales is, quite naturally, even more marked.

Table 2.--Selected non-aged equivalence scales estimated using dietary standards, food share, or more broadly based definitions of consumption
[Column heads: Federal Poverty Line [1], Thrifty Food Plan [2], BLS Family Budget [3], Iso-prop Index, and further columns. Rows: unit sizes One through Eight and the income-unit size elasticity. The cell values are not legible in this copy.]
Notes: (a) Scale is for poverty level incomes and is net of age effects rather than for the non-aged per se. (b) Weighted using ISDP population counts. (c) Six or more. (d) Not available. (e) See note (c), table 1. (f) Estimated using scale values for units of size 1-5 only.

The comparison, then, of these different types of objectively based scales suggests that as one moves away from scales based on food (whether nominally grounded in dietary adequacy or an Engel technique) to scales based on more inclusive categories of consumption, family equivalence scales tend to flatten considerably; perhaps to a degree, represented by the ELES scales, that some observers would find hard to accept (Deaton and Muellbauer 1980, pp. 201-206).
Nonetheless, the pattern presented is rather marked and unambiguous. Turning now to a comparison of the direct subjective scales to this set of "objective" scales, it is immediately obvious that the subjective scales bear little resemblance to the food-based scales shown in table 2. In fact, with the exception of the BNS minimum income scale, the subjective scales look quite similar to the ELES scales, the flattest in the objective category. The BNS scale, on the other hand, is virtually indistinguishable from the Iso-prop necessities scale. However, since there doesn't seem to be any readily identifiable substantive basis for its greater "steepness", this doesn't seem to be especially noteworthy. 8

Equivalence and the aged.

One of the main issues which we will be addressing in this project concerns the economic needs of the aged, both in comparison to the non-aged and among major subgroups of the aged population itself. Table 3 contains information from ten different scales which distinguish units with heads age 65 or older. Four of the scales were derived using objective measures (columns 1-4) and six, including estimates from the sixth wave ISDP income satisfaction item, are based on direct subjective measures (columns 5-10). There are three important questions in the area of aged equivalence which are addressed by information in the table: 1) Are the needs of the aged systematically different from those of the non-aged? 2) How do the needs of aged one and two-person units compare? 3) When living alone, i.e., in single-person units, are the needs of aged men and women different? The table is divided into two parts. The first three rows of the table deal with the relative needs of one and two-person units and males and females indexed to the needs of a non-aged unit of size 2 = 100. Rows 2 and 3 show the equivalence ratios for males and females living alone expressed as a percentage of a non-aged two-person unit.
Row 1 gives the average for single-person aged units weighted on the basis of the numbers of aged men and women living alone in the noninstitutional population. The information in rows one to three is then re-expressed in part II of the table (the last three rows) using aged units of size two = 100 as a base. Finally, since the base for the percentages in part I of the table is non-aged units of size two = 100, the row for units of size two (row 4) isolates the overall "age" effect. 9 Turning to the first issue, what do these scales imply about the needs of the aged compared to the non-aged? It is often asserted that the aged require less than the non-aged to sustain a comparable level of living (see, for example, Danziger, van der Gaag, Smolensky and Taussig 1984). They are able to maintain equivalent nutritional status at lower levels of food consumption, they often live in fully paid for owner-occupied housing, they no longer experience the expenses associated with working and, if comparisons are made on the basis of before-tax income, generally a smaller portion of the income of the aged is devoted to taxes. On the other hand, they may experience greater out-of-pocket health expenses, and concerns about economic security could produce a shift in the focus of economic preoccupations from consumption to savings. What do these ten equivalence scales indicate regarding the relative needs of the aged and non-aged? All but the BNS minimum income scale indicate that, controlling for family size, the aged require less than the non-aged to attain an equivalent level of economic well-being. Since the standard error for the age coefficient underlying the BNS scale is virtually as large as the coefficient (Colasanto, Kapteyn and van der Gaag 1984, p. 134), no substantive importance need be attached to its failure to conform to the general picture of relatively lower material needs for aged families.
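The two bases used in table 3 are related by simple rescaling, which can be sketched as follows. This is an illustration only: the helper names are hypothetical, and the input pairs are the aged/non-aged ratios and one-person/two-person ratios quoted in the discussion of table 3 for three of the scales, not a transcription of the table itself.

```python
def to_nonaged_base(one_to_two_ratio, aged_two_person):
    """Part II -> part I: aged one-person needs with non-aged two-person units = 100."""
    return one_to_two_ratio * aged_two_person / 100.0


def to_aged_base(part_one_value, aged_two_person):
    """Part I -> part II: the same value with aged two-person units = 100."""
    return 100.0 * part_one_value / aged_two_person


# (one-person/two-person ratio among the aged, aged two-person unit relative to
# non-aged two-person unit), both as quoted in the text for these scales.
quoted = {
    "Federal poverty line": (79, 90),
    "ELES I": (48, 73),
    "ELES II": (61, 84),
}

for name, (ratio, age_effect) in quoted.items():
    part_one = to_nonaged_base(ratio, age_effect)
    print(f"{name}: {part_one:.1f} on the non-aged base, "
          f"{to_aged_base(part_one, age_effect):.1f} on the aged base")
```

The point of the sketch is only that row 4 (the "age" effect) is the conversion factor between the two parts of the table; the derived part I values are not claimed to match the published entries.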
Apart from this rather commonplace finding that the aged's "needs" are less than those of the non-aged, the diversity in the estimates is rather striking. It is probably noteworthy that the two "food-based" measures yield the highest aged/non-aged ratios (about 90). The two variants of the ELES scale, while lower than the food-based scales, differ somewhat from each other. The version denoted as ELES I (van der Gaag and Smolensky 1982) indicates the aged require only 73 percent as much as the non-aged to be equally well-off, while ELES II (Danziger, van der Gaag, Smolensky and Taussig 1984) indicates that the aged's needs are 84 percent of those of the non-aged. With the exception of the BNS scale, which can be discounted because of the unreliability of the age dimension of the scale, the direct subjective scales are all at or below 75 for the aged. Three of the five are in the 64 to 75 range, while the other two, the 1972 SRC and 1979 ISDP income satisfaction scales, are considerably lower (50 and 27 respectively). In fact, the ISDP income satisfaction scale is too low to be taken seriously. Thus it would seem that the wide range presented by the estimates precludes any useful general statement about the relative needs of the aged and non-aged based on these studies. Apparently a good deal more care is needed to lay out the nature of the problem, to ensure that variables are defined in a theoretically relevant manner, and to take care that the models are correctly specified. The two remaining issues addressed in table 3, the possible impact of gender on needs and the comparative needs of one and two-person aged units, are closely related and central to aged size equivalence issues. This is because the vast majority of aged individuals live alone or with just one other person, and approximately four-fifths of aged individuals living alone are women.

Gender effects.-- Five of the ten scales provide distinctions by sex of head as well as age.
Four indicate that the needs of aged women living alone are less than those of aged men. In these four studies, estimates of women's needs vary from only very slightly less than men's (viz. the U.S. poverty thresholds as constituted prior to 1981) to substantially less (the two ELES scales and the ISDP minimum income scale). While the Wisconsin BNS minimum income scale indicates that women's needs are greater, the standard error for the sex of head coefficient is very large in comparison with the coefficient (Colasanto, Kapteyn and van der Gaag 1984, p. 134). Some limited experimentation with universe and variable specifications with the ISDP data has also produced coefficients suggesting greater needs for female heads, and in some cases the effect was even statistically significant. However, given the universe definition employed in estimating the present version of the scale, sex of head was not statistically significant and, as noted earlier, was dropped from the model. Thus, notwithstanding the large gender effects shown in some of the scales, it seems we are some way from being able to demonstrate a consistent sex effect, much less reliably gauge its magnitude. And even if its presence could be firmly established, its precise significance would be far from clear because of the very low incomes of many aged women.

One versus two-person units.-- Given the problematic nature of gender effects, the question of one-person/two-person unit equivalence among the aged is necessarily somewhat clouded. The one to two-person ratios for the ten scales are distributed over a considerable range (see row 5 of the table). Six of the scales indicate that the needs of an aged person living alone are between 70 and 79 percent of an aged couple's needs. Indeed, five of the ten scales yield one-person/two-person ratios in the range of 76-79. The value for the Federal poverty line (79) places it in this group of five.
However, the ELES I and BLS scales suggest a nearly per capita relationship (48 and 55). The ELES II and ISDP minimum income scales are also noticeably below the group of six (61 and 63 respectively). The sex effect is evident especially in the two ELES scales and the ISDP minimum income scale and would appear to be largely responsible for the fact that the overall one-person/two-person ratios for these scales are so low. Again, a good deal more work is needed to understand the basis of such widely varying estimates.

Some elaborations of the initial model.

Further exploration of the age effect and the specification of the family size variables is obviously called for. While in no sense definitive, a minimal effort was possible after return from the meetings. Some interesting results were obtained and are reported here. Four basic modifications were made to the naive model that we estimated initially: 1) an interaction term was added for family size and income, 2) a dummy variable was introduced for one-person aged units, 3) a substantially expanded set of control variables thought to be related to aged/non-aged differences was added, and 4) a crude approximation of after-tax income was substituted for the before-tax variable. The family size/income interaction term was added principally to test whether family size effects vary by income level. This possibility has been commonly noted and has received some empirical support (e.g., Watts 1967, Seneca and Taussig 1971, and Deaton 1982). Both theory and empirical evidence suggest that the addition of family members in better off families requires a smaller proportional increase in expenditures (e.g., Deaton 1982, p. 43). Thus, we might expect a size equivalence scale for low income families to be "steeper" than a scale for families with incomes in the middle or upper part of the distribution. The other modifications were made principally to find out something more about the nature of the age effect.
The dummy variable for aged single-person units was added because aged singles confront very different conditions from aged couples. They tend to be older and have notably lower incomes. If their reduced circumstances were to be systematically related to lowered aspiration levels, specifications assuming a constant age effect by unit size could prove distortive and might, in part, account for the excessively large age effect estimated from the original simple model. The remaining modifications were undertaken in order to see to what extent the age effect in the initial model could be explained by taking into account some of the factors which might contribute to lower before-tax money income needs for the aged. Given the variables at hand on the analysis file, home tenure, number of earners, value of durables, and amount of other fungible assets (excluding own home) could be added to the initial model. As a matter of general interest, a dummy for work disability among the non-aged was also incorporated. Finally, a few runs were made to experiment with the impact of using after-tax as opposed to before-tax income as a predictor variable. The results are given in table 4. 10

Results for the non-aged.-- The interaction term was included in two extensions of the naive model: as a sole addition to the initial variable set (Model II), and together with the dummy for aged units of size one (Model IV). The result is given in table 4 (Models II and IV). The coefficient for the interaction term has the expected sign in each instance. It is positive, indicating that the lower the income level, the higher the elasticity of income with respect to family size and the steeper the resulting equivalence scale. However, the magnitude of the coefficient is affected by the other variables which were added to explore the age effect.
For example, when the dummy variable for aged units of size one and the family size interaction term are both present (Model IV), the interaction effect is halved and the t-value drops from the margin of significance at the .05 level (single-tailed test) to well below conventional levels of significance (Model IV compared to Model II). This would seem to indicate that a good deal of the gross interaction effect is attributable to the special circumstances of one-person aged units. The impact of the interaction term on equivalence scales per se for the non-aged is shown in table 5. Coefficients from Model II (the interaction term only added to the initial set of variables) and Model IV (the interaction term and the dummy for aged units of size one both added to the initial variable set) were used to construct scales at four different reference income levels keyed to four-person families (the average income of poor families, the weighted poverty threshold income, median income and twice the median, all in terms of four-person family incomes). 11 The scales derived from the initial model and Model III (the initial model plus the term for aged units of size one) are included in the table for purposes of comparison. Clearly the elasticity of income with respect to family size varies noticeably by income level. The elasticity estimated from the initial model was .36. Using Model II, the average elasticity ranges from .49 for families living in poverty to .19 for those living at satisfaction levels experienced by four-person families with incomes at twice the median for four-person families. Estimates based on Model IV also reveal important differences in elasticity by income level.

Table 4.--Coefficients for the initial model and selected alternative specifications

                                                        Model (a)
Selected coefficients              Initial    II (b)     III (c)    IV (d)     V (e)      VI (f)     VII (g)

Intercept                           .1401      .1825      .1278      .1499      .1954      .1734      .1583
                                   (6.24)     (5.27)     (5.39)     (4.06)     (7.70)     (6.71)     (5.69)
lnFS ($\beta_1$)                   -.0224     -.0791     -.0189     -.0474     -.0230     -.0283     -.0252
                                  (-5.01)    (-2.03)    (-4.08)    (-1.26)    (-4.77)    (-6.04)    (-5.23)
lnY ($\beta_2$)                     .0628      .0565      .0640      .0608      .0481      .0578      .0596
                                  (19.59)    (11.71)    (19.80)    (11.39)    (12.07)    (15.08)    (15.34)
lnFS*lnY ($\delta$)                  ...       .0082       ...       .0041       ...        ...        ...
                                              (1.61)                (0.76)
Perceived financial change (1979 vs. 1978):
  Better ($\beta_3$)                .0715      .0715      .0711      .0711      .0752      .0755      n.l.
                                  (11.08)    (11.08)    (11.00)    (11.01)    (11.85)    (11.63)
  Worse ($\beta_4$)                -.1139     -.1138     -.1137     -.1136     -.1116     -.1141      n.l.
                                 (-21.01)   (-21.00)   (-20.96)   (-20.96)   (-20.92)   (-20.88)
Expected financial change (1980 vs. 1979):
  Better ($\beta_5$)               -.0286     -.0291     -.0283     -.0286     -.0231     -.0253      n.l.
                                  (-4.23)    (-4.30)    (-4.18)    (-4.21)    (-3.48)    (-3.70)
  Worse ($\beta_6$)                -.0508     -.0507     -.0508     -.0508     -.0522     -.0499      n.l.
                                  (-9.53)    (-9.52)    (-9.53)    (-9.52)    (-9.98)    (-9.29)
Aged ($\beta_7$)                    .0763      .0751      .0642      .0646      .0416      .0722      n.l.
                                  (11.83)    (11.59)     (8.35)     (8.38)     (2.84)    (11.12)
Aged, 1-person unit ($\beta_8$)      ...        ...       .0320      .0293      .0278       ...       n.l.
                                                         (2.90)     (2.53)     (2.39)
R2                                  .263       .263       .264       .264       .297       .253       n.l.

Number of cases = 5,067
Note: t-values in parentheses. n.l. = entry not legible in this copy.
(a) Coefficients for the additional control variables incorporated in Model V not included in the table. See text and text footnote 10.
(b) Initial model plus interaction term for income and family size.
(c) Initial model plus a dummy variable for aged single-person units.
(d) Initial model plus interaction term for income and family size and a dummy variable for aged single-person units.
(e) Initial model plus a dummy variable for aged single-person units, plus the additional set of control variables.
(f) Initial model substituting after-tax for before-tax income.
(g) Initial model substituting after-tax for before-tax income plus a dummy variable for aged single-person units.
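How the interaction term turns a single scale into a family of scales by income level can be sketched numerically. The code below is a rough illustration, not the author's program: it assumes that satisfaction is linear in ln(family size), ln(income) and their product, that income enters in dollars per month with natural logs, and that the scale comes from equating the satisfaction of a family of size j with that of a four-person family at the reference income. The function names are hypothetical; the coefficients are the Model II values from table 4.

```python
import math

# Model II coefficients as reported in table 4.
MODEL_II = {"beta_fs": -0.0791, "beta_y": 0.0565, "delta": 0.0082}


def equivalent_income(size, ref_income, beta_fs, beta_y, delta, ref_size=4):
    """Income giving a family of `size` the satisfaction of the reference family.

    Solves beta_fs*lnFS_j + beta_y*lnY_j + delta*lnFS_j*lnY_j
         = beta_fs*lnFS_ref + beta_y*lnY_ref + delta*lnFS_ref*lnY_ref  for Y_j.
    """
    ln_ref = math.log(ref_size)
    ln_size = math.log(size)
    numerator = (beta_fs * (ln_ref - ln_size)
                 + math.log(ref_income) * (beta_y + delta * ln_ref))
    return math.exp(numerator / (beta_y + delta * ln_size))


def scale_row(ref_income, coefficients=MODEL_II, sizes=range(1, 9)):
    """Equivalence scale (reference family = 100) at a given reference income."""
    return [round(100 * equivalent_income(s, ref_income, **coefficients) / ref_income)
            for s in sizes]


# Reference incomes per month for four-person families, as labeled in table 5.
for label, income in (("AP", 392.0), ("PT", 616.0), ("M", 1882.0)):
    print(label, scale_row(income))
```

Under these assumptions the AP row comes out within a point of the Model II values published in table 5, and the scale flattens as the reference income rises, which is the pattern the positive interaction coefficient implies. Small discrepancies elsewhere are to be expected because the coefficients are used here as rounded in table 4.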
While the interaction coefficient is not significant statistically, it does produce marked differences in the scale. Comparing the extremes, for example, the family size elasticity for poor families is about 1.6 times higher than for families at the highest level. In sum, the data set does offer some support for the notion of interaction between income and family size and illustrates that the effect could be of considerable practical significance. In fact, the contrast noted earlier in steepness between food-based scales and those estimated on the basis of broader consumption sets might be attributable, in part, to the failure to appropriately account for such interaction effects. However, a great deal of the interaction seems to arise from the special circumstances of aged one-person units. After removing this effect, the interaction term still yields scales with noticeably different elasticities by income level, but the interaction term is not statistically significant. Thus, in future work it would seem advisable to explicitly test for interaction effects of this type. Other more sophisticated ways of pursuing possible interaction effects are available as well, and should be considered (see, for example, Deaton 1982, p. 41).

Impact on estimates for the aged.-- The most consistent and important impact on age effects and, consequently, aged equivalent income, is attributable to the addition of the dummy variable for aged units of size one (see tables 4 and 6). The sign is in the expected direction, and the coefficient is statistically significant in each of the four models which include the variable (III, IV, V and VII). When the dummy is present, it leads to three results. The estimate for the aged equivalent income is lower for one-person than two-person units. One-person unit equivalent income is also lower than estimated on the basis of the initial model.
Finally, the ratio for two-person units increases somewhat over the initial model, but not dramatically (from about 30 to about 36 to 37 when scaled to non-aged incomes = 100). These results suggest a difference in the income-needs relationship (needs, of course, as measured by income satisfaction) between one and two-person aged units, but not one large enough to explain away the very large age effect uncovered in the initial model. The attempt to net out the effects of work activity and assets (including owner-occupied housing) had some slight additional impact. All additional variables were of the expected sign, i.e., the more assets, the less income required to reach a given level of income satisfaction; earnings activity and nonwork due to disability among the non-aged were associated with less satisfaction at a given level of before-tax income. 12 However, the combined effect of the additions was not marked and was most evident for two-person units, where aged equivalent income increased from 37 (Model III) to 42 (Model V). Allowing for family size/income interaction had slight, if any, effect when taken alone (Model II), and had no visible impact when present in conjunction with the dummy for aged units of size one (Model IV vs. Model III). Use of the approximation of after-tax income also had very little net effect (initial model vs. Model VI and Model III vs. Model VII), but the results were somewhat mixed. The coefficient for the age 65 plus dummy ($\beta_7$) did decline slightly (by about 5-6 percent), but the coefficient for income ($\beta_2$) dropped somewhat more (about 10-11 percent). As a result, and contrary to expectations, aged equivalent income on an after-tax basis is essentially the same as on a before-tax basis. Based on the supposition that after-tax income would be more closely tied to satisfaction than before-tax income, one would have supposed that the income coefficient would be larger on an after-tax as opposed to before-tax basis.
This contrary finding is somewhat puzzling and may arise from the somewhat crude simulation of after-tax income employed. The main point raised earlier in the discussion of table 3 is little affected. The aged/non-aged equivalence ratios stemming from the sixth wave ISDP income satisfaction measure remain the lowest of any considered and do not appear to be credible. Almost certainly an explanation, if one is to be found, lies along different avenues than those touched on here. At the same time, and not surprisingly, the dummy variable for aged units of size one did have a marked effect on the one-person/two-person equivalence ratios for the aged. Whereas the initial model yielded a one vs. two-person equivalence ratio of 78, the models including the extra dummy for aged one-person units (again Models III, IV, V and VII) yielded ratios of no more than 50. Of the ten scales considered earlier, only the one denoted ELES I yielded such a low value.

Summary and conclusions.

This paper has provided a general discussion of the use of direct subjective assessments of income in equivalence scale applications and has presented a heuristic empirical example of the estimation of an equivalence scale using such measures. The scale estimated for this paper was compared with scales obtained by others using direct subjective measures and more traditional approaches. The comparisons focused on four aspects of equivalence: 1) the impact of family size on needs among non-aged families, 2) the relative needs of aged and non-aged families, 3) gender effects among the aged, and 4) the needs of one and two-person aged units. The family size issue was reviewed in terms of the steepness of the scales.
As a group, the scales based on direct subjective measures appeared to be noticeably less steep than scales based on dietary needs or food consumption and rather similar to scales based on more broadly defined consumption sets such as necessities and especially total current consumption, i.e., the two ELES scales. However, some limited experimentation with alternative specifications of the model estimated in this paper suggested that this contrast may be due in part to a failure to take proper or full account of possible interaction between family size and income. Review of the findings concerning equivalence issues for the aged was inconclusive. While there was a clear tendency for the needs of the aged, as defined in these studies, to be less than the needs of the non-aged, estimates of the magnitude of the needs differential varied widely. No consistent pattern was discernible in regard to the other two issues central to the question of aged equivalence (gender effects and the needs of one vs. two-person aged units). It would seem that before a reasonably orderly set of empirical results can be expected to emerge for the aged, much more work will have to be done to refine existing independent variables and introduce new ones and to develop more appropriate models. Likewise, considerable additional work remains to be done in order to develop the equivalence scale presented here beyond the frankly preliminary stage.

Table 5.--Non-aged unit size equivalence scales based on selected alternative specifications of the initial model

                                        Unit size
Model (a)              1     2     3     4     5     6     7     8    IFSE (b)

Initial model         61    78    90   100   108   116   122   128    .36
Model II, income level:
  AP ($392/m)         48    71    88   100   110   119   126   133    .49 (c)
  PT ($616/m)         52    74    89   100   109   116   123   128    .43 (c)
  M ($1,882/m)        65    82    93   100   106   110   114   118    .28 (c)
  2M ($3,763/m)       77    88    95   100   104   107   109   112    .19 (c)
Model III             66    81    92   100   107   113   118   123    .30
Model IV, income level:
  AP ($392/m)         59    78    90   100   108   115   121   126    .36 (c)
  PT ($616/m)         62    79    91   100   107   113   119   124    .33 (c)
  M ($1,882/m)        68    83    93   100   106   110   115   118    .26 (c)
  2M ($3,763/m)       73    86    94   100   105   109   111   115    .22 (c)

INCOME LEVEL KEY:
AP - One-twelfth the average annual income of poor 4-person families, 1979.
PT - One-twelfth the annual poverty threshold for 4-person families, 1979.
M - One-twelfth the median income of 4-person families, 1979.
2M - One-twelfth twice the median income of 4-person families, 1979.
(a) For description of models see notes to table 4 and text.
(b) Income-family size elasticity.
(c) Average elasticity. Elasticity for the lower part of the scale is higher than for the upper part of the scale.

[Table 6: only the row labels (One-person, Two-person) and the notes are legible in this copy.]
(a) See notes to table 4 and text for description of models.
(b) At a welfare level equivalent to that of four-person families with incomes at the median for four-person families.

Footnotes.

*Dan Radner, Ben Bridges, Clarise Lancaster, Arie Kapteyn and Steve Dubnoff provided a number of very helpful suggestions and criticisms. My thanks to Sharon Johnson for working through the complicated fifth and sixth wave ISDP files to produce the matched analysis extract. Ray Becker and Mike Rozdilski also assisted with the data processing, and Weltha Logan helped with editorial details.

The algorithm used to simulate after-tax income for the 1984 Proceedings version of this paper contained keying errors which had a substantial effect on some of the parameter values (namely the intercept and the coefficients for family size and income) for the models which incorporate after-tax income. Fortunately, the errors tended to offset each other and so their net impact on aged/non-aged equivalence was slight. These errors have been corrected in the current version of the paper.

1 The author encourages interested parties to contact him at the following address: Social Security Administration, 1875 Connecticut Ave., N.W., Rm. 320-N, Washington, D.C. 20009.

2 An alternative specification included female head as a control variable. The sign was negative, but not significant (t = -0.81).
The model was reestimated with the five control variables listed.

3 Based on comparison of the sums of squares uniquely associated with the model variables (as gauged by successively adding each variable as the last in the model), the sums of squares due to current income and family size amount to only about 60 percent of the sum of squares associated with the four subjective change variables. The powerful effect of perceived financial change on current satisfaction with income, especially change for the worse, has been documented earlier by Vaughan and Lancaster (1980) and Dubnoff, Vaughan and Lancaster (1981).

4 Sources for the scales in table 1 are as follows: [1] Rainwater (1974), tables 5-2 and 5-4, pp. 102, 105; [2] & [3] Dubnoff and Strate (1984), table 2, p. 18; [4] & [7] Colasanto, Kapteyn and van der Gaag (1984), pp. 127-138; [5] Danziger, van der Gaag, Taussig and Smolensky (1984), table 2, p. 53; [6] Dubnoff (1982), table 1, p. 10; [8] Dubnoff, Vaughan and Lancaster (1981), table 2, p. 351; [9] This paper.

5 The exponent on the right hand side of equation (6) is in fact the negative elasticity. The elasticities appearing at the foot of tables 1 and 2 function in the same fashion, but because they are positive, they operate on the ratio $FS_j/FS_4$ instead of $FS_4/FS_j$ as in (6).

6 While the labeling of the BNS minimum income scale as "steeper" than the others is somewhat arbitrary, its elasticity is more than two standard deviations above the mean elasticity of the nine scales presented in the table.

7 The fact that this estimate is preliminary is to be stressed. Limited experimentation with alternative models and variable definitions, some of which is reported on in the last section of the paper, indicates that a range of elasticities may be obtained. At this early stage of our research we have not developed a clear basis for selecting one of these alternatives over another.
8 In light of the findings reported in the last section of the paper, perhaps this statement is too unequivocal. Unlike most of the other scales, the BNS minimum income scale pertains to a poverty level income, and my own analysis yields an average elasticity of .43 for a poverty threshold welfare level when the dummy for aged units of size one is left out of the model (the model for the BNS scale contained no such dummy). Clearly, the Dubnoff-Strate results indicate that subjective scales estimated for poverty income welfare levels are not necessarily "steeper". One could also protest that the ISDP minimum income scale is not "steeper" than the others, but it is also true that the income levels associated with the scale are very close to the median rather than at the poverty level. In any case, the difference between the BNS minimum income scale and the BNS WFI scale, essentially estimated at median income, has also been encountered in Dutch studies (Goedhart, et al., 1977), although the difference is not as pronounced. Until recently, members of the original Leyden group had not made much of the difference, but some attention has now been given to this matter (personal communication, Arie Kapteyn).

9 Strictly speaking this statement is only true for scales in which age effects do not vary by unit size. This condition holds for all scales estimated with a dummy specification for age, as is the case for all but two of the scales. While the poverty line age effect was not explicitly derived on the basis of a dummy specification, the aged/non-aged ratios for one and two-person units are the same (.90). However, for the BLS scale the aged/non-aged equivalence ratio for 1-person units is .72, as opposed to the ratio of .93 based on a comparison of two-person units as shown in the table.
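The condition described in this footnote can be illustrated with the age coefficients in table 4. The sketch below assumes that satisfaction is linear in ln(family size), ln(income) and the age dummies, so that the income an aged unit needs relative to a non-aged unit of the same size is the exponential of minus the age effect divided by the income coefficient. The function names are hypothetical and the coefficients are used as rounded in the table.

```python
import math


def aged_to_nonaged(beta_aged, beta_y, beta_single=0.0, one_person=False):
    """Aged equivalent income for a unit of given size, non-aged same-size unit = 100."""
    age_effect = beta_aged + (beta_single if one_person else 0.0)
    return 100.0 * math.exp(-age_effect / beta_y)


def aged_one_to_two(beta_fs, beta_y, beta_single=0.0):
    """Needs of an aged one-person unit with aged two-person units = 100."""
    size_effect = 2.0 ** (beta_fs / beta_y)          # size one relative to size two
    return 100.0 * size_effect * math.exp(-beta_single / beta_y)


# Initial model: a single age dummy, so the ratio is the same at every unit size.
print(round(aged_to_nonaged(0.0763, 0.0628)))                 # two-person units
print(round(aged_to_nonaged(0.0763, 0.0628, one_person=True)))  # one-person units

# Model III: the extra dummy makes the age effect differ for one-person units.
print(round(aged_to_nonaged(0.0642, 0.0640)))
print(round(aged_to_nonaged(0.0642, 0.0640, 0.0320, one_person=True)))

# One-person/two-person ratios among the aged, initial model vs. Model III.
print(round(aged_one_to_two(-0.0224, 0.0628)))
print(round(aged_one_to_two(-0.0189, 0.0640, 0.0320)))
```

With a single age dummy the aged/non-aged ratio does not depend on unit size, which is why the two-person row can stand for the overall age effect; once a separate dummy for aged one-person units is added, the ratio differs by size. Under the stated assumptions the figures are in line with those cited in the text (about 30 for the initial model, 37 for Model III, 42 for Model V, and one-person/two-person ratios of 78 and no more than 50).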
10. An after-tax specification of income is clearly the variable of choice for most applications, but the available variable assumes use of the standard deduction only for the Federal income tax and thus underestimates after-tax income levels for about the upper third of the distribution. As this project advances we hope to develop a better measure of after-tax income and employ it more generally in our analyses.
11. The income (Y_j) for a family of size "j" equivalent to that of a given reference level income for a family of four (Y_4) may be derived from the exponentiated value of an expression in lnFS_4, lnFS_j and the model coefficients [expression garbled in the source; the legible terms are a coefficient multiplying (lnFS_4 - lnFS_j) and a further term in lnFS], where δ is the coefficient for the family size/income interaction term. Details on the derivation of this formula are available from the author.
12. The t-values for all but one of the additional variables (living in means-tested public or publicly subsidized housing, with a t-value of 1.41) range between 1.8 and 7.8. While the magnitude of a number of the coefficients seems rather large, this is probably attributable, at least in part, to the fact that the income definition employed in Model V is before-tax and so variables for homeownership and work presumably reflect tax effects in addition to whatever independent effects they may exert.
References
Andrews, F.M. and S.B. Withey. (1976). Social Indicators of Well-Being. New York: Plenum Press.
Colasanto, D., A. Kapteyn and J. van der Gaag. (1984). "Two Subjective Definitions of Poverty: Results from the Wisconsin Basic Needs Study." Journal of Human Resources, vol. 28, no. 1, pp. 127-138.
Danziger, S., J. van der Gaag, E. Smolensky and M. Taussig. (1984). "Income Transfers and the Economic Status of the Elderly." Chapter 7 in M. Moon, ed. Economic Transfers in the United States. Chicago: University of Chicago Press.
Danziger, S., J. van der Gaag, M. Taussig and E. Smolensky. (1984). "The Direct Measurement of Welfare Levels: How Much Does It Cost to Make Ends Meet?" Review of Economics and Statistics, vol. 66, no. 3, pp. 500-505.
Deaton, A. (1982). "Inequality and Needs: Some Experimental Results from Sri Lanka." In Y. Ben-Porath, ed. Income Distribution and the Family. A supplement to volume 8, Population and Development Review, pp. 35-49.
Deaton, A. and J. Muellbauer. (1980). Economics and Consumer Behavior. Cambridge: Cambridge University Press.
Dubnoff, S.J. (1982). "Validating Family Size Equivalence Scales." Boston: Center for Survey Research, University of Massachusetts, mimeo.
Dubnoff, S.J. and J.M. Strate. (1984). "How Much Income is Enough? Measuring the Income Adequacy of Retired Persons Using a Survey Based Approach." Boston: Center for Survey Research, University of Massachusetts.
Dubnoff, S.J., D.R. Vaughan and C. Lancaster. (1981). "Income Satisfaction Measures in Equivalence Scale Applications." In The 1981 Proceedings of the Business and Economic Statistics Section. American Statistical Association: Washington, D.C. pp. 348-352.
Goedhart, T., V. Halberstadt, A. Kapteyn and B.M.S. van Praag. (1977). "The Poverty Line: Concept and Measurement." The Journal of Human Resources, vol. 12, no. 4, pp. 503-520.
Jackson, C. (1968). Revised equivalence scale for estimating income or budget costs by family type. Bulletin No. 1570-2. U.S. Department of Labor, Bureau of Labor Statistics.
Kakwani, N. (1977). "On the Estimation of Consumer Unit Scales." The Review of Economics and Statistics, vol. 59, no. 4, pp. 507-510.
Kapteyn, A. and B.M.S. van Praag. (1976). "A New Approach to the Construction of Family Equivalence Scales." European Economic Review, vol. 7, pp. 313-335.
Kerr, R. and B. Peterkin. (1976). "The Effect of Household Size on the Cost of Diets that are Nutritionally Equivalent." In The Measure of Poverty. Technical Paper XII. Food Plans for Poverty Measurement. Washington, D.C.: Department of Health and Human Services, pp. 64-87.
Muellbauer, J. (1980). "The Estimation of the Prais-Houthakker Model of Equivalence Scales." Econometrica, vol. 48, no. 1, pp. 153-175.
Nicholson, J. (1976). "Appraisal of Different Methods of Estimating Equivalence Scales and their Results." Review of Income and Wealth, series 22, no. 1, pp. 1-11.
Nicholson, J. (1949). "Variations in Working Class Family Expenditure." Journal of the Royal Statistical Society A, 112, pp. 359-411.
Orshansky, M. (1965). "Counting the Poor: Another Look at the Poverty Profile." Social Security Bulletin, vol. 28, no. 1, pp. 3-29.
Peterkin, B. (1976). "USDA Food Plans, 1974." In The Measure of Poverty. Technical Paper XII. Food Plans for Poverty Measurement. Washington, D.C.: Department of Health and Human Services, pp. 3-31.
Pollak, R. and T. Wales. (1979). "Welfare Comparisons and Equivalence Scales." American Economic Review, vol. 69, no. 2, pp. 216-221.
Prais, S.J. and H.S. Houthakker. (1955, 2nd edit. 1971). The Analysis of Family Budgets. Cambridge: Cambridge University Press.
Rainwater, L. (1974). What Money Buys: Inequality and the Social Meanings of Income. New York: Basic Books.
Rainwater, L. (1973). "Poverty, Living Standards and Family Well-Being." Studies in Public Welfare. Paper No. 12. Sub-Committee on Fiscal Policy, Joint Economic Committee of the Congress.
Sen, A. (1982). Choice, Welfare and Measurement. Cambridge, Mass.: The MIT Press.
Seneca, J. and M. Taussig. (1971). "Family Equivalence Scales and Personal Income Tax Exemptions for Children." The Review of Economics and Statistics, vol. 53, no. 3, pp. 253-262.
Sydenstricker, E. and W. King. (1921). "The Measurement of the Relative Economic Status of Families." Quarterly Publication of the American Statistical Association, vol. 17, pp. 842-857.
United States Department of Agriculture. (1975). "The Thrifty Food Plan." In The Measure of Poverty. Technical Paper XII. Food Plans for Poverty Measurement. Washington, D.C.: Department of Health and Human Services.
U.S. Department of Labor, Bureau of Labor Statistics. (1948). Workers Budgets in the United States: City Families and Single Persons. BLS Bulletin 927.
van der Gaag, J. and E. Smolensky. (1982). "True Household Equivalence Scales and Characteristics of the Poor in the United States." Review of Income and Wealth, series 28, no. 1, pp. 17-28.
van Herwaarden, F.G., A. Kapteyn and B.M.S. van Praag. (1977). "Twelve Thousand Individual Welfare Functions of Income." European Economic Review, vol. 9, pp. 283-300.
van Praag, B.M.S. (1971). "The Welfare Function of Income in Belgium: An Empirical Investigation." European Economic Review, vol. 2, pp. 337-369.
van Praag, B.M.S. and A. Kapteyn. (1973). "Further Evidence on the Individual Welfare Function of Income: An Empirical Investigation in the Netherlands." European Economic Review, vol. 4, pp. 33-62.
Vaughan, D. and C. Lancaster. (1980). "Applying a Cardinal Measurement Model to Normative Assessments of Income: Synopsis of a Preliminary Look." In The 1980 Proceedings of the Survey Research Methods Section. American Statistical Association: Washington, D.C.
Vaughan, D. and C. Lancaster. (1979). "Income Levels and Their Impact on Two Subjective Measures of Well-being: Some Early Speculations from Work in Progress." In The 1979 Proceedings of the Social Statistics Section. American Statistical Association: Washington, D.C. pp. 271-276.
Watts, H.W. (1980). "Special Panel Suggests Changes in BLS Family Budget Program." Monthly Labor Review, vol. 103, no. 12 (Dec.), pp. 3-10.
Watts, H.W. (1967). "The Iso-Prop Index: An Approach to the Determination of Differential Poverty Income Thresholds." The Journal of Human Resources, vol. 2, no. 1, pp. 3-18.
Ycas, M. and C. Lininger. (1981). "The Income Survey Development Program: Design Features and Initial Findings." Social Security Bulletin, vol. 44, no. 20, pp. 13-19.
Courtenay Slater, CEC Associates
The number and variety of SIPP papers being presented at these meetings, in advance of actually having seen any data from SIPP, certainly whet one's appetite for what is to come once data actually become available. The three papers on which I will be commenting raise two general points to which I want to draw attention. One is the special difficulty of measuring the wealth of the wealthy. The other concerns the complexities of defining income and the importance of taking noncash income into account. A category of noncash income deserving particular attention is the implicit income derived from owning and occupying one's own home.
Measuring the Wealth of the Wealthy
The Lamas and McNeil paper presents evidence of excellent response rates being obtained in the first few rounds of SIPP data collection. It also illustrates the first of the general points I want to make, the one about measuring the wealth of the wealthy. SIPP will provide better data on the assets of lower and middle income groups than on those held by the wealthy. This is not a criticism of SIPP; a single survey cannot do absolutely everything. Rather, it is a caution to be borne in mind about the need for careful studies to evaluate the quality of the asset data. And it may even be an argument for the occasional use of enhanced samples of upper income households.
There are several reasons for anticipating diminishing accuracy of the asset data as one moves up the income scale. The assets of the wealthy are more varied and held in more complicated forms. The chances of forgetting some assets increase and so do the incentives for concealment. The wealthy are habituated to not telling the government any more than they must. In addition, the variety of forms in which the assets can be held and the rapidity with which new investment vehicles evolve make it impossible for the SIPP questionnaire to specifically cover them all.
The SIPP questions are detailed and comprehensive, but there will always be some new forms of commodity futures, stock options, or other esoteric investments which won't get picked up, or which won't be accurately valued.
The pattern of response rates described in the Lamas-McNeil paper reinforces one's prior expectation of poorer response rates among the wealthy. Overall, response rates promise to be very good. I say "promise" because the Lamas-McNeil data on response rates are based in part on a pretest, together with an assumption that, if callbacks had been completed in the pretest (which they were not), the information would have been supplied in a high percentage of cases. Let us suppose, however, that actual response rates equal the Lamas-McNeil projections. Nonresponse among those ages 45 to 64 will be about three times as great as for those under age 34. Nonresponse for college graduates will be twice that of those with less than a high-school education. In other words, the age and education groups known to have the highest average incomes have the highest nonresponse rates.
These response rates pertain to asset ownership. When the questions about asset amounts were asked, nonresponse was especially high regarding stocks and bonds, types of assets held predominantly by upper income households. To sum up, the percentage of total wealth missed by the SIPP survey is likely to be substantially greater than the percentage of nonrespondents to the survey. There will be a need for studies which can quantify this gap.
A further problem with respect to valuing and interpreting assets is that asset values fluctuate. Values of some assets, such as common stocks, can fluctuate suddenly and by large amounts. The Dow Jones Industrial Average increased 52 percent from June 1982 to June 1983. This, of course, was a genuine increase in wealth for stock holders and one of which they generally were aware.
But the 8 percent rise in the Dow between July 30 and August 2, 1984 may or may not turn out to have any meaningful degree of permanence. Individual stocks can fluctuate by far more than the market averages. How accurately will survey respondents be able (and willing) to value assets such as common stocks? And how meaningful would an accurate valuation as of a specific date be anyway? Changes in asset values represent changes in wealth, but for assets whose value fluctuates widely and frequently around the underlying trend, some smoothing of short run fluctuations will be desirable for many analytic purposes. This question will be of importance both for cross-sectional comparisons of families who hold their assets in different forms and for analysis of changes in wealth over time. The message for the Census Bureau is that the SIPP data should be made available in a way which allows the user the flexibility of using alternative methods of asset valuation.
Owner-Occupied Housing
For the majority of middle-income families, the family home represents their most substantial investment. In comparing the income and wealth of families which do and do not own their own homes, failure to include the implicit income received from living in an owned home produces misleading comparisons. This can be illustrated by comparing the incomes of two hypothetical retired couples. Both have identical incomes from pensions and social security. The A's live in their own fully-paid-for home, which has a market value of $100,000. The B's have sold their similar home for $100,000 and invested the proceeds in tax-exempt bonds which provide them with income of $10,000 a year. On a simple comparison of cash income the B's are $10,000 better off. However, they pay $10,000 per year in rent on their Florida apartment, an expense the A's do not have.
In fact, the A's and the B's have equivalent incomes and equivalent wealth, which they have chosen to utilize in different ways, and a comparison based only on money income is quite misleading.
My example, of course, is greatly oversimplified. I have left out the taxes, maintenance expenses, and so forth that the A's incur living in their owned home. Introduction of these and other complexities would be required in any real life statistical presentation, but this would not alter the need to include the value (income) derived from home ownership in any valid comparison of the relative well-being of renters and home owners.
The Bureau of Economic Analysis and the Bureau of Labor Statistics have wrestled with this problem for years, and have found ways of handling it. BEA includes the imputed value of the shelter provided by owner-occupied homes in the personal consumption expenditure component of GNP and the equivalent imputed income in national income. BLS now uses rental equivalence to estimate home ownership costs for the Consumer Price Index. With the new emphasis on noncash income, the Census Bureau also needs to develop ways of including income from home ownership in its income tables.
This issue was brought to mind by both the Radner and the Vaughan papers. Radner isolates that group of elderly who have both low incomes and limited assets and compares them to the universe of all elderly households. This is a very useful cross-classification, and one would like to see how it would look with an expanded income definition which includes noncash income. Looking only at their cash incomes, the financial picture for the low income, low net worth group is grim indeed. The gap between this group and other elderly households would be narrowed by the inclusion in income of food stamps, Medicaid, and other means-tested noncash assistance, but it would be widened by including imputed income from home ownership.
If the income concept is widened, as it should be, it should be done comprehensively to include the noncash income received from all major sources, not just from government assistance.
The role of noncash income also needs to be considered in interpreting the Vaughan paper. Vaughan asserts that "the aged require less income than the nonaged to sustain a comparable level of living. For example, they are able to maintain equivalent nutritional status at lower levels of food consumption, they often live in fully paid for owner occupied housing, no longer experience the expenses associated with working and generally a smaller portion of their incomes is devoted to taxes." If this statement is correct at all, it is correct only because the income definition on which it is based is pre-tax money income. This is the concept for which data are most readily available, but it is not the most appropriate concept for making comparisons of income needs (or, as Vaughan terms them, "family equivalence scales"). The better concept would be after-tax, after-transfer money and non-money income, including imputed income from owner-occupied housing. I suspect such a comparison would show that it costs the elderly more, not less, to achieve any given standard of living. Many of these costs are met by Medicare and Medicaid, however, and others by the in-kind return realized by continuing to utilize the house, furnishings, and other consumer durables purchased earlier in the life cycle. It is only because a smaller proportion of their living costs are met from current money income that the elderly may be able to achieve a given standard of living with less "income," narrowly defined.
These three papers on SIPP are tantalizing in their promise of the value of the data becoming available. They also serve to remind us of the conceptual and analytic work still to be done if SIPP data are to be given the most useful and meaningful presentations possible.
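The comparison of the A's and the B's in the owner-occupied housing discussion can be worked through in a few lines. The sketch below is illustrative only: the class and field names are hypothetical, the pension figure is an assumed placeholder (the text says only that both couples receive the same amount), and the imputed rent of the A's home is assumed to equal the $10,000 the B's pay in rent. Taxes and maintenance are left out, as in the example itself.

```python
from dataclasses import dataclass


@dataclass
class Couple:
    name: str
    pension_income: float      # pensions plus social security (assumed figure)
    investment_income: float   # cash return on financial assets
    imputed_rent: float        # noncash rental value of an owned home
    rent_paid: float           # cash rent actually paid

    def money_income(self) -> float:
        # Cash income only: the concept criticized in the discussion.
        return self.pension_income + self.investment_income

    def expanded_income(self) -> float:
        # Cash income plus the implicit income from living in an owned home.
        return self.money_income() + self.imputed_rent


PENSION = 15_000  # hypothetical; identical for both couples

couple_a = Couple("A", PENSION, investment_income=0,
                  imputed_rent=10_000, rent_paid=0)
couple_b = Couple("B", PENSION, investment_income=10_000,
                  imputed_rent=0, rent_paid=10_000)

for c in (couple_a, couple_b):
    print(c.name, c.money_income(), c.expanded_income())
```

On money income alone the B's appear $10,000 ahead; once the imputed rent is counted, the two couples come out even, which is the point of the example.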
SURVEY OF INCOME AND PROGRAM PARTICIPATION: SESSION II
This section comprises five papers presented in this session, which was sponsored by the Section on Social Statistics.
TOWARD A LONGITUDINAL DEFINITION OF HOUSEHOLDS
David Bryon McMillen and Roger A. Herriot, Bureau of the Census
Introduction and Background
Data collection and analysis in the social sciences generally focus on cross-sectional surveys such as the Current Population Survey (CPS). Consequently, most of our concepts and data analysis tools are structured around point estimates of some phenomenon or characteristic. To the extent that we try to develop longitudinal concepts and measures of social phenomena, that is to say viewing events across time rather than at one point in time, we conflict with these cross-sectional structures. It is the goal of this paper to confront that cross-sectional/longitudinal conflict and attempt some reconciliation. More specifically, this paper attempts to develop longitudinal definitions of households and families which are useful for observing these units across time and for constructing aggregate characteristics across that time period, while not creating serious conflict with our cross-sectional constructs of household type and composition.
We begin this exercise by examining cross-sectional household concepts from the CPS and recounting the deficiencies of that perspective. Next we will examine several types of longitudinal definitions, identify the type that is usually cited as most useful, and describe three definitions within that framework. In the third section of this paper we will evaluate the definition options available in terms of utility as well as what is possible given the data at hand to implement such a definition. Next we will illustrate how this definition might be used in calculating aggregate household characteristics and in tabulations of the number of households, household types, and household characteristics.
Point Estimates and Longitudinal Measures
The household definitions used in the CPS serve as adequate measures for the intended cross-sectional purposes. Indeed, few people argue that these definitions create a problem when estimating the number of households by type at the time of the survey. However, when those definitions are used in conjunction with other variables, problems begin to develop.
Discussions on measuring annual household income from the CPS center around the retrospective nature of the measure. Household members as of March are asked to recall their income for the previous calendar year, and the incomes of all members are aggregated to create a household income. The problem centers on the varying lengths of household membership and the unvarying annual period over which income is measured. If a person is a member of the household for part of the year and, thus, contributes income for only part of the year, that person's entire annual income is included in the household income. Similarly, persons who are not members of the household at the time of the survey are not included in calculating the annual household income, even though they may have contributed income for most of the year. This type of criticism is often used to question the adequacy of the CPS income measures; however, it is better viewed as an example of the problems created by combining a point estimate of household composition with a longitudinal (annual) measure of income. Inevitably, the compromises necessary to combine such cross-sectional and longitudinal constructs produce a less than ideal measure.
Similar criticism of the CPS household data can be made. If we examine consecutive March measures of the distribution of households by type we observe little change. The CPS measure of households masks most of the interesting change in the distribution of households.
For example, in recent years the number of married couple households has changed at a rate of less than 1 percent a year, or about 200,000 households. Concurrent with that indistinguishable change are over 3 million marriages and divorces, not to mention changes in household type as a result of the death of one member. The small net change creates the appearance of stability, while masking considerable activity. Again, the problem is not so much the inadequacy of the data, but rather the difficulty of measuring longitudinal events with point estimates.
When criticism is leveled against a particular measure, the problem often is not the measure but rather the incongruity between the measuring instrument and the time frame being considered. The examples used above are annual measures, but the same problem exists regardless of the length of time. Most social measurement is discrete while time is continuous. The goal of course is to get to the point where the difference between the two is trivial and can be easily ignored.
In summary, much of the criticism of CPS measures can be attributed to this discrepancy between the time reference of the social measurement and the cross-sectional survey instrument. One solution to the problem is to decrease that difference by repeatedly measuring the phenomenon in question during the year. Those observations can then be aggregated to produce measures which cover a number of time intervals. It is from this perspective that the design for the Survey of Income and Program Participation (SIPP) has developed. The design of this survey is to interview the household every 4 months over a 2 1/2-year period, and to collect in those interviews monthly data on household composition, income, labor force participation, and a number of other characteristics. Those monthly data can then be aggregated to larger temporal units such as quarterly or annual measures.
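As a rough illustration of the contrast between the CPS-style annual income measure and an aggregation of monthly data of the kind SIPP collects, consider the sketch below. Everything in it is hypothetical: the three persons, their incomes, and their months of membership are invented for illustration, and the two functions are simplified stand-ins rather than actual Census Bureau procedures.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    monthly_income: list        # 12 values, one per calendar month
    months_in_household: set    # months (1-12) spent in this household
    present_at_interview: bool  # member at the time of the survey


def cps_style_income(people):
    # Static measure: the full annual income of everyone present at the
    # interview, regardless of how long they were members.
    return sum(sum(p.monthly_income) for p in people if p.present_at_interview)


def monthly_aggregated_income(people):
    # Monthly measure: each person's income counts only for the months
    # in which that person was actually a member of the household.
    return sum(p.monthly_income[m - 1]
               for p in people
               for m in p.months_in_household)


household = [
    Person("A", [1000] * 12, set(range(1, 13)), True),   # member all year
    Person("B", [2000] * 12, {11, 12}, True),            # joined in November
    Person("C", [1500] * 12, set(range(1, 11)), False),  # left after October
]

print(cps_style_income(household))           # A and B, full-year incomes
print(monthly_aggregated_income(household))  # income of members month by month
```

The static measure credits the household with all of B's annual income and none of C's, while the monthly aggregation counts two months of B's income and ten months of C's.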
However, with the idea of aggregating monthly units comes the problem of defining which units should be aggregated across time and which should not. That is to say, which households are the same over the period, which exist at the beginning of the period but not at the end, and which exist at the end but not at the beginning. Without such a definition, aggregating above the person level is impossible.
Defining households across time is an issue that has been debated for several years without resolution; however, it is necessary that the Census Bureau decide which of the many proposed methods will be used for the publication series from the Survey of Income and Program Participation (SIPP). This paper moves one step nearer to that decision by summarizing the proposals on how longitudinal households should be defined and recommending a system to be used. In addition, this paper will begin to identify what conceptual and processing problems remain unsolved given the definition chosen.
Several proposals have been offered for defining longitudinal households. Griffith (1978) outlines six measures, one of which is the traditional Current Population Survey (CPS) definition. Griffith also proposes that the Census Bureau use several definitions in tabulating households from SIPP. Others who have proposed variant measures include Davey (1980), who offers two, Crosby (1979), Lane (1978), and Smith-Ycas (1981). Ycas (1981), in summarizing past work, identifies four keys for labeling definitional methods: static; dynamic; static-dynamic hybrids; and attribute methods. In the following section we will discuss several types of longitudinal definitions for households.
Types of Longitudinal Household Definitions
Static definitions of households fix the household composition and characteristics at a given point in time and calculate other attributes from that point.
These definitions are the standard cross-sectional perspective on households common to the CPS and other similar surveys. Using a point estimate of household composition, other attributes are calculated assuming that the composition chosen existed for the full period. Thus, some estimate of annual income for each member is aggregated to produce an annual household income, regardless of whether members were there for the full period or joined the day before the interview. This type of household definition is the logical outgrowth of cross-sectional surveys where interviews are conducted at one point in time and aggregates of past events are a function of respondent recall. This type of definition coincides with the instantaneous conception of a household which we use from day to day.
Static definitions are both useful and familiar for cross-sectional surveys; however, they serve little purpose in longitudinal surveys other than to provide familiarity. Static definitions, for a number of reasons, ignore the dynamic activity common to households: households are formed by marriages and dissolved by divorces, children leave home and set up their own household, or move in with relatives, and so on. It is difficult to justify the expense and complexities necessary to measure these dynamics if we then propose to ignore them in defining households. It is useful to portray static definitions here, however, for they represent one end of the definition spectrum.
Dynamic definitions of households occupy the other end of the spectrum. These definitions recognize change as inherent in observing households across time, and attempt to incorporate that change into the definition. Thus, household characteristics and attributes become variables to be measured as households change, are created, and dissolve during the period of observation. In other words, these definitions attempt to minimize the extent to which dynamic concepts are forced into static categories.
Needless to say, dynamic definitions are better suited to a longitudinal survey such as SIPP; however, such definitions are difficult to devise and to carry out.
One of the first difficulties encountered with dynamic definitions is that they produce measures which are not readily familiar to many of those who use census data. The most common illustration of this point uses household size. Static definitions produce measures of household size such as 2 or 3, which are intuitively meaningful. That is, they fit with our instantaneous image of households because they represent the household size at one point in time, the survey date. Dynamic definitions produce measures of household size which look more like averages across a number of households: 2.4 or 3.2 members in the household. These measures are summary statistics of the household experience, summarizing across time. In other words, dynamic definitions force us out of that instantaneous view of households and into thinking about them as something which changes across time; our statistics produce a summary of that change. Ycas (1981) suggests several ways of handling the problem of household size (rounding, using modal size, etc.); however, it may be best to reeducate the reader to think of annual household characteristics as the aggregate of a number of discrete experiences.
There are other more troublesome problems to be dealt with in developing dynamic household definitions. I will deal here only with definitional problems, acknowledging that there are also measurement problems to be considered later in this paper. Unlike many demographic variables, there are several aspects of dynamic households for which there is no definition or consensus as to what constitutes a change in type. For example, if a husband and wife divorce, there are several ways we can account for this on our household ledger.
We could count this as the dissolution of the husband/wife household and the formation of two new households. This results in a net increase of one household in existence at that point in time and an increase of two households when counting the number of households existing during the period. Alternatively, we could allow one household to be the continuation of the husband/wife household. Again we have a net increase of one household in existence, but because of the continuing household, we increase only by one the number of households existing during the period.
To generalize, a household may experience a number of changes across time and we can converse easily about the discrete events. However, we do not have a clear concept of when those changes result in the formation of a new household and the dissolution of an old household. One extreme is to say that any change to the household composition results in a new household. At the other extreme are those who say that this is an issue without resolution and suggest that we abandon the measurement of household characteristics except as they pertain to individuals. In other words, before we can implement a dynamic definition of households, we must first develop a set of continuity rules or accounting principles which identify cases of household dissolution, household formation, and cases where two households at two points in time are identified as the same household.
Most longitudinal household definitions that have been proposed fall somewhere between the static and dynamic extremes. Each acknowledges the difficulty of developing continuity rules, and proposes some static-dynamic blend to finesse those problems. A number of cross-sectional/dynamic hybrid definitions have been proposed. One set of these definitions is quasi-dynamic and acknowledges that a set of continuity rules has yet to be developed. Another set is basically a static system designed to avoid the continuity dilemma.
Neither of these alternatives is particularly attractive. In the latter case, most of the alternatives create as many problems as they solve. In the former case, if we are going to develop a set of continuity rules, then there is little need for a hybrid definition.
Attribute-type definitions are drawn from the work done on the Panel Study of Income Dynamics (PSID). The goal in these definitions is somewhat different than in the previous discussion. Rather than attempting a longitudinal definition of households, this system calculates a series of cross-sectional households at some smaller time interval, and then ascribes the characteristics of the household to each individual. Measures for some longer time interval are then calculated by aggregating the series of point estimates across each individual to represent that person's household experience during the period. This system will yield the number of persons who lived in "households" with a monthly income of $1000 to $1500 during the year; however, it is more difficult to derive the number of households with an annual household income of $12,000 to $18,000. In fact, without an additional set of assumptions, this system does not produce an accounting of households across time. In order to develop household statistics within the attribute system it is necessary to assume, for example, that the householder at the end of the period will represent the household experience. Then the household attributes ascribed to that person are aggregated to produce household characteristics. Those aggregated attributes represent the householder's experience during the period, but not necessarily the experience of the other persons in the household at that time. As can be seen, an assumption such as this contains many of the weaknesses of using a static or cross-sectional definition of households, with few obvious advantages at the household level.
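The mechanics of the attribute system can be sketched in a few lines. The example below is a minimal illustration under invented assumptions: the four-month period, the household identifiers, the persons H, S, and K, and all income figures are hypothetical, and the choice of H as householder at the end of the period is simply stipulated.

```python
from collections import defaultdict

# month -> {household_id: (members, household income that month)}
# Hypothetical data: S leaves household X after month 2; K joins it in month 3.
MONTHS = {
    1: {"X": ({"H", "S"}, 1400), "Y": ({"K"}, 600)},
    2: {"X": ({"H", "S"}, 1400), "Y": ({"K"}, 600)},
    3: {"X": ({"H", "K"}, 1200), "Z": ({"S"}, 900)},
    4: {"X": ({"H", "K"}, 1200), "Z": ({"S"}, 900)},
}


def person_experience(months):
    # Ascribe each month's household income to every member of that household.
    experience = defaultdict(list)
    for month in sorted(months):
        for members, income in months[month].values():
            for person in members:
                experience[person].append(income)
    return dict(experience)


def persons_in_bracket(experience, low, high):
    # Persons who lived, in at least one month, in a household whose
    # monthly income fell in [low, high).
    return {p for p, incomes in experience.items()
            if any(low <= y < high for y in incomes)}


experience = person_experience(MONTHS)
period_totals = {p: sum(v) for p, v in experience.items()}

print(sorted(persons_in_bracket(experience, 1000, 1500)))
print(period_totals["H"], period_totals["K"])
```

Person-level counts fall out directly, but a household-level figure for household X requires the extra assumption discussed in the text: taking H's aggregated experience as the household's omits the fact that K, a member at the end of the period, spent half of it in a much poorer household.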
In summary, there are four types of household definitions which have been proposed for use with longitudinal surveys: cross-sectional; dynamic; cross-sectional/dynamic hybrids; and the attribute system. The cross-sectional approach is clearly inappropriate since it ignores the dynamic nature of the data. The attribute system incorporates the dynamic aspects of the data but dodges the issue of developing continuity rules for households. Consequently, this system, by ignoring the social structure of households, produces many of the same problems raised in criticism of the CPS measure of annual household income. It is clear that a dynamic definition is the most desirable alternative, but agreement on just how that definition ought to be formed is elusive.

In the following section, I will discuss dynamic definitions of households and present three sets of continuity rules. The first, proposed by Norton (1982), defines change, rather than continuity, as movement between major types of households. The second is a reciprocal majority rule system developed by Dicker and Casady (1982). The third, developed by Siegel (1982), sets out a set of demographic principles to which continuity rules should conform and then develops a continuity rule within that framework.

A Dynamic Definition of Households

A dynamic definition of households is much easier to describe than to execute. Dynamic household definitions do little more than acknowledge change in household composition or type and determine that the change must be incorporated. In other words, dynamic definitions acknowledge a set of accounts and to some extent set up the framework for those accounts, but usually do not explicate the principles by which the ledger is filled.

For SIPP, it is suggested that we tabulate the changes that occur in household composition and type during the period covered by a panel. That is to say, we must acknowledge that households change, are formed, dissolve, and sometimes stay the same.
In the simplest form then, our accounts will record the formation and dissolution of households from our original sample as well as changes in size and type. These tabulations of change will use as often as possible the standard descriptors, such as family and nonfamily households, and the categories associated with those types. Our dynamic definition begins with the static CPS definition defined at the beginning of the panel and then traces the changes that occur to those households across the duration of the panel.

Norton (1982) suggests that we define a household as changed when its membership changes in such a manner as to classify it as a different type of household. He proposes that we acknowledge change between the following types of households: 1) married-couple household; 2) male family-household; 3) female family-household; 4) male nonfamily-household; and 5) female nonfamily-household. Thus any change which results in a household falling into a different category results in the dissolution of one household and the formation of another. To illustrate this system consider a husband-wife household which experiences a divorce. Norton's scheme would consider the husband/wife household dissolved and two new households formed. The two new households would be family households if there were children present and nonfamily households otherwise.

A second longitudinal definition has been proposed by Dicker and Casady (1982) for use with the National Medical Care Utilization and Expenditure Survey (NMCUES). In a slight departure from the definitions discussed here, Dicker and Casady focus on families rather than households; however, that does not pose any serious problems. Like others who have approached this problem, Dicker and Casady begin with the realization that there is not consensus on when families begin or cease to exist; rather, such transitions are in part a function of the problem being investigated.
The NMCUES model for defining longitudinal families requires that antecedent and descendent families or, in their terms, predecessor and successor families be defined reciprocally. That is to say, any rules defining relationships across time must be applied to both families simultaneously. Dicker and Casady next demonstrate that when applying these rules you wind up with links between a number of households. That is to say, any family is likely to have more than one predecessor and more than one successor family. Thus, the problem lies in defining which of the possible pairs will be defined as the longitudinal family. As with most longitudinal definitions, the system eventually reduces the decision to what will be defined as the same and what will be defined as change. Dicker and Casady chose to define sameness by a majority rule. The successor family which receives the majority of members from the predecessor family is identified as the "principal predecessor." These two families then form the linked or longitudinal family unit. Finally, in cases where families split evenly, it should be noted that the NMCUES model does not define a longitudinal unit, but rather dissolves the predecessor family and considers all successor families as newly formed.

Five rules of relationships focus Siegel's (1982) development of household demography. The first two state that a household can have only one descendent and one antecedent that is identified as the same household. That is to say, when a household splits, only one of the subsequent households can be identified as "the same" household. The third and fourth rules identify households which are not the same as some preceding or succeeding household. Households which are not the same as any antecedent household are newly formed; a household not the same as any descendent household is dissolved.
The final rule, one of transitivity, states that if A is the same as B and B is the same as C, then A must be the same as C. All that remains to complete this set of accounting principles is a definition of sameness. The rule is offered that two households separate in time and having the same householder are the same household.

Continuity based on following the householder has been criticized because of the somewhat arbitrary way in which the householder is defined, and because it creates what some consider unreasonable change within a continuing household. The most frequently cited example of such change is following the male after a divorce when the children remain with the female. An alternative to Siegel's householder rule which is consistent with his demography of households is to follow the principal person. The principal person is the female in the married-couple household and the householder in all other households: this is the concept used in developing household weights in CPS. By following the principal person, we alleviate the problem cited above. Of course the problem now occurs when the children stay with the male following a divorce, a much less frequent event.

FIGURE 1

          Norton        Siegel        Dicker
Time 1    ABcd*         ABcd*         ABcd*
Time 2    A*   Bcd*     A    Bcd*     A*   Bcd
Time 3    Ac*  Bd       Ac   Bd       Ac   Bd
Time 4    Acd  B*       Acd  B        Acd  B

*New household

Let us consider briefly the strengths and weaknesses of these three systems focusing on two issues: 1) the number of households created across time, and 2) the extent to which the definition promotes or discourages longitudinal analysis. Norton's system comes the closest to maximizing change and, as a result, creates more households than the others. Consider the divorce example from above, but two children remain with the female. Both Siegel and Dicker and Casady would produce a total of two households resulting from the divorce.
Norton's system produces three households: 1) the original married-couple household; 2) a male nonfamily-household; and 3) a female family-household. Let us continue following these people and assume that the children leave the female one at a time and join the male (see figure 1). In Norton's scheme, the first move by a child would produce the dissolution of the male nonfamily-household and the creation of a male family-household. Our longitudinal count of households now stands at four. Neither Dicker and Casady nor Siegel would produce new households as a result of the children moving. When the second child moves, the female family-household is dissolved and a female nonfamily-household is created. The male family-household remains unchanged. Over these four observations, Norton's system produces five households; both Siegel and Dicker and Casady produce only two by allowing the continuation of a household across these changes.

Let us then look at those continuing households. The continuous household for Siegel's householder rule starts as a four-member married-couple household, dwindles to one member, the male, and increases to two with the addition of one child and then to three with the addition of the second child. On the other hand, Dicker and Casady's continuous household begins as the four-person married-couple household and is transformed by the divorce to a three-person female-headed household, then to a two-person and, subsequently, a one-person nonfamily-household. It should be noted that these two continuous households follow opposite courses after the divorce. The continuous household under the principal-person rule would be identical to Dicker and Casady's continuous household.

We should stop at this point and examine this simple yet realistic example of household change.
First, if we are interested in counting households (the number or percent of households in poverty during the year, for example), then a continuity system such as Norton's, which allows for continuity in only the most trivial cases, creates a much larger number of households. Suppose the female half of our mythical household was in poverty after the divorce. By Norton's count, during that year we would have 20 percent of the households in poverty. Dicker and Casady and Siegel would show 50 percent of their households in poverty.

A second observation to be made here is that, regardless of what sort of continuity rule we adopt, we will observe households which contain a wide variety of change. The question we must ask is whether we accept that households undergo such change and remain intact.

As noted above, each of the three systems has its constituency and its detractors. Siegel's system is criticized because of the disjunctures that can occur following a divorce. For example, the continuous household will follow a male householder who divorces his wife even though the wife and children remain in the housing unit as a group. Similarly, Norton's scheme is criticized because of the lack of attention paid to continuity. Dicker and Casady are criticized for the mechanical nature of the majority rule. Why, it is asked, should one person make all the difference in whether a family is designated new or continuous?

None of the definitions of longitudinal households offered in the literature has proved viable. However, in the process of discussing this issue with several demographers and economists, it became clear that any definition which labels a transition from a family household to a nonfamily household as continuous causes problems; although there are some cases where the transition from nonfamily to family household occurs within a continuous unit. The most obvious of these is the marriage of two persons who have been living together.
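The household counts quoted for the four observations in figure 1 can be reproduced with a small sketch. It covers only Norton's type-change rule and Siegel's householder rule; the data layout and function names are hypothetical, and two assumptions are made for illustration: the male (A) is the householder of the original married-couple household, and under Norton's rule households are linked through the householder and continue only while the household type is unchanged.

```python
# Figure 1 example: one list per observation, each household given as
# (members, householder, household type). Types follow Norton's five categories.
FIGURE_1 = [
    [("ABcd", "A", "married-couple")],
    [("A", "A", "male nonfamily"), ("Bcd", "B", "female family")],
    [("Ac", "A", "male family"), ("Bd", "B", "female family")],
    [("Acd", "A", "male family"), ("B", "B", "female nonfamily")],
]

def count_households(observations, is_same):
    """Count households existing during the period: a household at one
    observation is new unless is_same() links it to a household at the
    previous observation."""
    total = len(observations[0])
    for previous, current in zip(observations, observations[1:]):
        for household in current:
            if not any(is_same(old, household) for old in previous):
                total += 1
    return total

def siegel_same(old, new):
    # Siegel's householder rule: the same householder means the same household.
    return old[1] == new[1]

def norton_same(old, new):
    # Assumption: link through the householder, and treat any change of
    # household type as a dissolution followed by a formation.
    return old[1] == new[1] and old[2] == new[2]

print("Norton:", count_households(FIGURE_1, norton_same))
print("Siegel:", count_households(FIGURE_1, siegel_same))
```

Under these assumptions the sketch yields five households for Norton's scheme and two for Siegel's, matching the counts given in the text. The Dicker and Casady majority rule is left out because its treatment of persons who join an existing family is not spelled out here.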
Drawing on that experience, we determined that we should develop a longitudinal definition of families separate from that for nonfamily households. We begin with the CPS definition of a family as two or more persons, one of whom is the householder, related by birth, marriage, or adoption, and residing together. To make this cross-sectional definition dynamic, we must add the time dimension or develop a continuity rule. Thus, a longitudinal family is defined as two or more related persons, at least one of whom is the householder or spouse of the householder, who had the same household experience over two or more consecutive months. We further stipulate that no more than one core family unit with children can continue from a previous-month family.

Three levels of criteria are offered to distinguish cases where both parents and children split into two or more households. The first-level criterion, for continuity, is that the family with the most child-months is identified as continuous. The second level, to distinguish between families with the same number of child-months, is the family with the most family-months. In the third level, if two potential continuing units tie on both of the above criteria, then the continuing unit will be assigned randomly. Two elements have been added to the CPS definition: 1) the time dimension, and 2) the inclusion of spouse as part of the continuity criteria.

Let us examine this definition more carefully. Consider again our four-step example of divorce and then the movement of two children one at a time from one parent to another. Following the separation, the Bcd family would be the continuing family because it contains two or more members of the initial family, one of whom is the householder or spouse. The A household would be new because of the transition from family to nonfamily status.
Following the movement of the first child, c, the Ac family is considered newly formed because of the transition from nonfamily to family status. Finally, the movement of d from the Bd family to the Ac family results in a new nonfamily household, B. Using the notation from figure 1, we have:

1. ABcd*
2. A*   Bcd
3. Ac*  Bd
4. Acd  B*

Next, we must confront defining continuity for nonfamily households. A nonfamily household is a householder living with nonrelatives only. For these cases, we have adopted a 50-percent rule. As long as the householder and 50 percent or more of the household is the same at two points in time, the household is considered a continuous household. The distinction between this and the majority rule is that, rather than creating new households for even splits, this rule provides for continuity.

This definition of family and nonfamily households provides the basic parameters. What remains is to develop a set of programming specifications to implement this definition.

Other possible longitudinal units or groups exist in relation to federally funded support programs. For example, food-stamp units and AFDC units are defined independently of the household and, in fact, households may contain more than one of these units. Longitudinal units for these programs will be defined on the basis of the person in whose name the program application is filed. For example, in a husband-wife, two-child family, the male is defined as the food-stamp recipient. If he leaves, that food-stamp unit is dissolved; a new one is formed if the female reapplies and is found eligible.

Perspectives on Household Characteristics

In this section, I will address the uses of this longitudinal definition of households and argue that we need to tabulate household data from SIPP using at least two types of longitudinal definitions. The need for two types of definitions is a function of the kind of household information needed.
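Returning briefly to the 50-percent rule for nonfamily households stated above, the contrast with a strict majority rule can be sketched as follows. The function names and the four-person household are hypothetical, and one assumption is made because the text does not say which household the 50 percent refers to: the share is measured against the membership of the earlier household.

```python
def continues_50_percent(earlier, later, earlier_householder, later_householder):
    """50-percent rule: the householder is unchanged and at least half of the
    earlier household's members (an assumed base) are still present."""
    shared = len(set(earlier) & set(later))
    return earlier_householder == later_householder and 2 * shared >= len(set(earlier))

def continues_majority(earlier, later):
    """Strict majority rule, for comparison: the later unit must receive more
    than half of the earlier unit's members, so an even split links nothing."""
    shared = len(set(earlier) & set(later))
    return 2 * shared > len(set(earlier))

# Hypothetical nonfamily household of four unrelated persons, householder P,
# which splits evenly into two households of two.
earlier = {"P", "Q", "R", "S"}
stayers, leavers = {"P", "Q"}, {"R", "S"}

print(continues_50_percent(earlier, stayers, "P", "P"))  # householder plus 2 of 4
print(continues_50_percent(earlier, leavers, "P", "R"))  # householder changed
print(continues_majority(earlier, stayers), continues_majority(earlier, leavers))
```

In this even split the 50-percent rule keeps the householder's half as the continuing household, while the strict majority rule would treat both halves as new.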
I am arguing that we need to use both a dynamic longitudinal and an attribute-type household definition, because we are interested in both the experience of households and of individuals in households. There are some characteristics like annual household income which we need to examine both as a characteristic of the household and of the individual. This is only to say that there are multiple meanings associated with the concept of income. In CPS, where we have only one way of obtaining income data, we attach all of those meanings to that single measure. SIPP allows us to decompose that measure and look at the components more carefully.

To summarize, I have argued that to fully appreciate the household dynamics we observe in SIPP and to portray that activity over a year, we should provide two types of tabulations. The first tabulates household characteristics using a longitudinal definition and examines how changes in some characteristics result in changes in others. The second type of tabulation examines how household characteristics affect individuals across time.

REFERENCES

J. E. Griffith, Impact of a Longitudinal Survey on the Data Base. (ISDP Working Paper #1, 1978.)
E. Davey, Memorandum for the record. 12/24/80.
B. Crosby, Memorandum to Lane, Ycas, and Mahoney. 7/79.
J. P. Lane, Four Alternative Ways of Defining "Annual" Analysis Units. (ISDP Working Paper #6, 1978.)
Smith-Ycas, Developed by Ycas based on reports of proposal by J. Smith. 1981.
M. A. Ycas, Income Analysis Units for Longitudinal Files. 1981.
A. J. Norton, Notes prepared for 11/18/82 meeting of SIPP Household Task Force.
P. Siegel, Notes prepared for the 10/81 meeting of the Census Advisory Committee on Population Statistics.
M. Dicker & R. J. Casady, A Reciprocal Rule Model for A Longitudinal Family Unit. Paper presented at the 1982 meetings of the American Statistical Association.
LIFETIME WORK EXPERIENCE AND ITS EFFECTS ON EARNINGS: DATA FROM THE INCOME SURVEY DEVELOPMENT PROGRAM

John M. McNeil, U.S. Bureau of the Census, and Joseph J. Salvo, NY Department of City Planning

The extent to which persons remain attached to the labor force over the course of their working-age years has important economic and social implications. Differences in labor force attachment between men and women have been cited as one major reason why women earn less than men. This study presents data from the 1979 Income Survey Development Program (ISDP) on lifetime work interruptions and examines the relationship between work interruptions and earnings. Descriptive data showing the extent to which men and women have experienced work interruptions are presented, followed by an analysis of the impact of work interruptions on earnings. The study concludes that work interruptions explain only a small proportion of the earnings differential between men and women.

The 1979 ISDP was a panel survey of approximately 9,000 households that were visited at 3-month intervals over a period of a year and a half beginning in February 1979. The survey, part of the development stage of the new income survey called the Survey of Income and Program Participation (SIPP), was a joint effort of the U.S. Department of Health and Human Services and the U.S. Bureau of the Census. The third wave questionnaire contained a section on personal history and within that section were questions on lifetime work interruptions. The questions (reproduced in Figure A) asked whether the person had ever been away from work for 6 months or longer for each of three reasons: (1) because he or she was not able to find work, (2) because he or she was taking care of home or family, and (3) because he or she was ill or disabled. Beginning and ending dates were recorded for each interruption. A maximum of four interruption periods were identified for each of the three possible reasons for interrupting.

A major reason for the interest in data on lifetime work experience is the desire to use such data in the analysis of male-female earnings differentials. The tenets of human capital research have traditionally stressed the importance of work experience patterns as a determinant of earnings. The descriptive data presented in the first part of this report confirm that the lifetime labor force attachment of women is weaker than that of men. Because of interruptions for familial reasons, women have a much higher overall rate of work interruptions than men and they spend a much higher proportion of their potential work years out of the labor force. Such findings have led at least some social scientists to posit that traditional familial responsibilities are one major reason why women earn less than men. This section will describe selected studies of the relationship between work interruptions and earnings and will present an analysis based on the 1979 ISDP data.

Previous Research

A major constraint in early efforts to examine the relationship between experience and earnings was the lack of data on lifetime work experience. More recently, however, a number of studies have been published which exploit the important data which has been made available from the National Longitudinal Surveys of Labor Market Experience (NLS) and the Michigan Panel Study of Income Dynamics (PSID).

Suter and Miller (1973) were among the first to analyze the retrospective work history data from the NLS. They studied a cohort of women who were 30 to 44 years of age in 1967. Work experience was based on a question which asked about the total number of years in which the person had worked at least 6 months. Suter and Miller concluded that there was a close association between earnings and length of work experience among this cohort of women.

Mincer and Polachek (1974) extended the analysis of the NLS retrospective data. They specified two reasons why discontinuous work history patterns might lead to lower earnings. First, interruptions in market work lead to lower levels of accumulated human capital. Second, interruptions cause a depreciation in existing human capital. That is, time spent away from market work has a cost beyond the effect of foregone experience. In their analysis, Mincer and Polachek found that the amount of time spent at home had a negative impact on earnings even when experience was also included in the earnings equation. They concluded from their analysis that a depreciation effect does, in fact, exist.

This finding was challenged by Sandell and Shapiro (1978) on the grounds that the NLS data used by Mincer and Polachek were subject to various coding errors. They replicated certain of the Mincer-Polachek research using a corrected NLS file and concluded that the original study had overestimated the depreciation effect.

Corcoran (1979) conducted an analysis of the effect of experience and interruptions on earnings using retrospective data from the PSID. One of the major advantages of the PSID data set was that the sample, in contrast to the NLS samples, was representative of the female population 18 to 64 years of age. Corcoran found very little evidence of a depreciation effect. There was no effect for White women and only a minor effect for Black women. Corcoran also argued that restricting the analysis group to women 30 to 44 years of age is likely to overestimate depreciation because many women in this group have recently reentered the labor market and are likely to be affected by misinformation about job opportunities.

More recently, Mincer and Ofek (1982) used NLS data for 30- to 44-year-old married women to reaffirm the depreciation hypothesis. In an analysis of longitudinal (rather than retrospective) data from the NLS, they found that reentry wage rates were lower than wage rates at the time of labor force withdrawal. Furthermore, longer interruptions carried greater wage penalties. They also found, however, that wage rates tended to grow rapidly upon return to work. The observed amount of depreciation, they concluded, is dependent upon the length of the interruption and the length of time spent back in the labor force.

ISDP Data

The effect of work interruptions on earnings was examined by using the data described earlier to construct variables representing interruptions and experience. These variables were included in regressions which related hourly earnings to a set of explanatory variables. The universe for this part of the study consisted of all persons 21 to 64 years of age with wage and salary income during the quarter preceding the interview. Separate regressions were run for men and women, with the log of hourly earnings as the dependent variable.¹ The interruption and experience variables used in the regressions include the following:

UNEMP = 1 if person had ever experienced an interruption due to an inability to find a job; 0 otherwise.
DISAB = 1 if person had ever experienced an interruption due to illness or disability; 0 otherwise.
TIME-AWAY = Duration of all interruptions² as proportion of potential work years.³
EXPER = Number of potential work years minus duration of all interruptions.⁴
EXPERSQ = The square of EXPER.
FT = 1 if the jobs the person has worked at have usually or always been full-time jobs; 0 otherwise.

The interruption variables were specified in the above form because it was hypothesized that earnings could be affected by the existence of an interruption as well as by the length of an interruption. Because interruptions due to unemployment or disability had a relatively small effect on the proportion of potential work years spent away from work, they were entered as zero-one dummy variables. Because interruptions for familial reasons had a very strong effect on the amount of time spent away from work, they were allowed to enter the equation through their effect on the TIME-AWAY variable.
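As a rough illustration of how the interruption and experience variables above could be derived for one person, consider the sketch below. It uses the definitions given in the variable list and the footnotes (potential work years as age minus years of school completed minus 6; hourly earnings as 3-month earnings divided by hours worked). The function names, the reason labels, and the example person are hypothetical, and the guard for nonpositive potential work years is an assumption added here, not part of the original specification.

```python
import math

def work_history_variables(age, years_of_school, interruptions):
    """Build the interruption and experience variables for one person.

    interruptions: list of (reason, duration_in_years) pairs, with reason one
    of "unemployment", "family", "disability" (hypothetical labels).
    """
    potential = age - years_of_school - 6          # potential work years
    away = sum(years for _reason, years in interruptions)
    exper = potential - away
    return {
        "UNEMP": int(any(reason == "unemployment" for reason, _ in interruptions)),
        "DISAB": int(any(reason == "disability" for reason, _ in interruptions)),
        # Assumption: report 0.0 when potential work years are not positive.
        "TIME-AWAY": away / potential if potential > 0 else 0.0,
        "EXPER": exper,
        "EXPERSQ": exper ** 2,
    }

def log_hourly_earnings(quarter_earnings, quarter_hours):
    """Dependent variable: log of 3-month earnings divided by hours worked."""
    return math.log(quarter_earnings / quarter_hours)

# Hypothetical person: age 40, 12 years of school, one 5-year familial
# interruption and one half-year spell of unemployment.
example = work_history_variables(40, 12, [("family", 5.0), ("unemployment", 0.5)])
print(example)
print(log_hourly_earnings(2600.0, 520.0))
```

The FT indicator is omitted because it comes directly from a questionnaire item rather than from the interruption dates.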
The general experience variable, EXPER, was entered in its own form as well as in its squared form, EXPERSQ. The inclusion of the squared form was intended to capture the nonlinear effect of experience on earnings. (The returns to experience tend to flatten after some point.)

The education variables included in the regression were designed to take advantage of the ISDP personal history questions on highest degree obtained, vocational training, and types of courses taken in high school. They included the following:

EDUC1 = With an advanced degree
EDUC2 = With a bachelor's degree
EDUC3 = High school graduate (reference group)
EDUC4 = Not a high school graduate, with a vocational training certificate
EDUC5 = Not a high school graduate, no vocational training certificate
COURSES = Number of selected academic courses completed in high school

Finally, a set of variables representing marital history were included:

MARR1 = Married, no marital disruption (reference group)
MARR2 = With a marital disruption (ever widowed, divorced or separated)
MARR3 = Never married

The basic results of the survey are presented in tables A and B, the means for all variables are shown in table C and the regression results are shown in table D. Results are shown for White women and men as well as for all women and men in order to facilitate comparisons with previous studies. Results are also shown for men and women 30 years of age and over with no familial interruptions as an alternative method of examining the influence of work interruptions.

The large differences between the sexes in the degree of work attachment are highly visible in table C. Men had, on the average, about 19 years of work experience and had spent only about 2 percent of their potential work years away from work. Women, on the other hand, had 14 years of work experience and had spent about 20 percent of their potential work years away from work.
There were small or insignificant differences between men and women in the mean values of the other experience and interruption variables and in the mean values of most of the education and marital history variables. Men, however, were more likely than women to have received advanced degrees and a larger proportion of women than men experienced a marital disruption. The average hourly earnings of all women were $4.38, about 63 percent as high as the average hourly earnings of $6.92 for all men.

The regression results confirm the importance of experience as a determinant of earnings. The general experience variables EXPER and EXPERSQ are highly significant for both men and women and are important relative to other variables in the determination of hourly earnings. Attachment to full-time work also has a significant effect on earnings. The coefficients of the experience variables show that the returns to experience are greater for men than for women.

The interruption variables, in general, have a negative effect on earnings, but the effect is not particularly strong or consistent. The coefficient of TIME-AWAY is significant for both men and women in the equation for persons of all races, but is significant for women only in the equations for White men and women. Interruptions due to illness or disability (DISAB) have a significant negative effect on earnings in five of the equations, but interruptions due to inability to find work have a significant negative effect in only two of the equations.

That an earnings equation contains both experience and interruption variables that are significant is evidence that a depreciation effect does exist. In the equation for men of all races, the experience variables EXPER, EXPERSQ, and FT are highly significant and the interruption variables UNEMP and DISAB are also significant. In the equations for women of all races, the experience variables and the interruption variable TIME-AWAY have highly significant effects on earnings.
The conclusion is that a depreciation effect does exist and information about work interruptions will improve those models which attempt to explain earnings.

The coefficients of the education and marital history variables are in line with expectations, but two findings should be noted. First, the coefficient of EDUC4 for men is less negative than the coefficient of EDUC5. This finding suggests that a vocational training certificate has a positive effect on earnings. Second, the coefficient of COURSES is highly significant even though other measures of educational attainment are also present in the equation. That is, for the purpose of explaining earnings, it is important to know about the types of courses taken in high school even when we already have information about years of school completed and highest degree obtained.

Table C shows that the mean earnings of women are only about 62 percent of the earnings of men even when the group under study is comprised of persons 30 years of age and over with no familial interruptions. This differential exists even though women in this universe have approximately the same mean years of experience as men. Table D shows why the large differential exists even when the mean values of experience are so close. Among the men in this group, the coefficient of EXPER is highly significant, but among women, the coefficient is not significant.

In general, standardized regression coefficients reveal that the work interruption variables are less important than either the general experience or education variables as determinants of earnings. This holds true for both men and women. So, while the work interruption variables do show that a depreciation effect exists, general work experience and education are more critical determinants of earnings.
The earnings equations which have been developed for this report can be used to examine the extent to which differences in work history (experience and interruptions) are related to the earnings gap between men and women. That is, given the coefficients of their own equation, what would the earnings of women be if they had the same mean values as men for the variables measuring experience and interruptions? Table E shows that the effect of assigning to women the mean experience and interruption values of men is to reduce the earnings gap by only 12 percent.

Problems in retrospective measures of work experience and work interruptions

One of the goals of SIPP is to develop a data base that can be used to investigate the relationships among income, program participation, and personal history including work history. A certain amount of work history will be obtained as persons are followed during their time in the panel, but persons spend only 2 1/2 years in the panel. Some work history data may be obtained by matching survey records with Social Security earnings records, but matching records takes time and the amount of work history data that can be obtained from Social Security records is limited. Until 1978, the Social Security record contained information on earnings during a quarter which were subject to the Social Security tax. Therefore, if a person's earnings met the Social Security tax limit in the first quarter of the year, no earnings data would appear for the remaining quarters. Since 1978, the record contains annual data on covered and noncovered earnings.

When the personal history supplement was designed for the third wave of the 1979 ISDP, the problem was to develop a set of questions that could be completed in 2 or 3 minutes and that would provide an indication of lifetime work attachment. The approach adopted was to attempt to identify periods lasting 6 months or longer when the person did not work.
The ISDP work history questions are reproduced in Figure A.

There are obviously very great problems in trying to measure lifetime work experience in a brief set of questions. The data from these questions do have a considerable amount of face validity, but it seems reasonable to suppose that the data are also characterized by response problems. One way of identifying possible problem areas is to cross-classify current age by age at first reason-specific interruption. If there is no significant memory loss, then one would expect that the proportion of persons reporting that a first reason-specific interruption took place while they were in a particular age interval would be independent of their current age. Such a cross-classification shows that memory loss is a significant factor in the reporting of first interruptions due to an inability to find work. Persons 21 to 29 were much more likely than older persons to report that such an interruption occurred before their 25th birthday. There is some evidence of memory loss in the reporting of first-time interruptions due to disability, but not to the same degree as for interruptions due to an inability to find work. There is no evidence of memory loss in the reporting by women of first-time interruptions for familial reasons. (The above conclusions are based on the assumption that the age groups had similar experiences.)

The ISDP results were taken into consideration when the time came to design the SIPP questions on work history. In an effort to reduce the problem of memory loss, respondents were asked to begin with the earliest 6-month interruption and work forward. The sequence also attempted to determine the total number of interruptions and then, for each period of interruption, the duration of and reason for the interruption.
Because the SIPP sequence asks for the total number of interruptions and contains a "Don't Know" box for duration of interruption, we expect to be able to do a better job of imputing for item nonresponse.

Future work

We have finished the field collection operation for the third wave of SIPP, the wave containing the work history data, and are in the process of designing processing specifications so that a file can be prepared which contains no item nonresponse.

There are some differences between the work history data collected in ISDP and the SIPP work history data. First, the SIPP sample size is 20,000 households, about twice the size of ISDP. Second, the SIPP data on beginning and ending dates of interruptions should be more complete than similar data from ISDP. Third, unlike the ISDP third wave file, the SIPP file will have data on job and occupation tenure. The SIPP file should be somewhat more useful than the ISDP file, and should allow users to expand the analysis by considering other variables (e.g., job and occupation tenure) and by considering the timing of work interruptions, not just their existence and duration.

FOOTNOTES

1 Hourly earnings were calculated by dividing total earnings for the 3-month period by the total number of hours worked.
2 A maximum of four interruption periods could be identified for each of three possible reasons for interrupting.
3 Potential work years were defined as age minus years of school completed minus 6.
4 The ISDP data on employer-specific or job-specific measures of work experience (e.g., tenure with most recent employer/at most recent job) were collected in the fifth wave of the survey and were not available for this study.

REFERENCES

[1] Corcoran, Mary. "Work Experience, Labor Force Withdrawals and Women's Wages: Empirical Results Using the 1976 Panel of Income Dynamics." Women in the Labor Market, pp. 216-245. Edited by Cynthia B. Lloyd, Emily S. Andrews, and Curtis L. Gilroy.
New York: Columbia University Press, 1979.
[2] Mincer, Jacob, and Ofek, Haim. "Interrupted Work Careers: Depreciation and Restoration of Human Capital." The Journal of Human Resources 17 (Spring 1982): 3-24.
[3] Mincer, Jacob, and Polachek, Solomon. "Family Investments in Human Capital: Earnings of Women." Journal of Political Economy 82 (March/April 1974): S76-S108.
[4] Polachek, Solomon William. "Potential Biases in Measuring Male-Female Discrimination." Journal of Human Resources 10 (Spring 1975): 205-229.
[5] Sandell, Steven H., and Shapiro, David. "An Exchange: The Theory of Human Capital and the Earnings of Women: A Reexamination of the Evidence." Journal of Human Resources 13 (Winter 1978): 103-117.
[6] Suter, Larry E., and Miller, Herman P. "Income Differences Between Men and Women." American Journal of Sociology 78 (January 1973): 962-974.

Table A. Work Interruption History by Race, Spanish Origin, and Selected Characteristics: Males

Column headings: Characteristic; Total (in thousands); Percent with one or more interruptions lasting 6 months or more due to all reasons surveyed, inability to find work, and illness or disability; Mean percent of potential work years spent away from work for reasons surveyed.

Row stubs: Males 21 to 64 years who ever worked. RACE AND SPANISH ORIGIN 1/: White; Black; Spanish origin. YEARS OF SCHOOL COMPLETED: Less than 12; 12 to 15; 16 and over. AGE BY YEARS OF SCHOOL COMPLETED: 21 to 29 years (Less than 12; 12 to 15; 16 and over); 30 to 44 years (Less than 12; 12 to 15; 16 and over); 45 to 64 years (Less than 12; 12 to 15; 16 and over). OCCUPATION GROUP OF USUAL JOB: Professional, technical or managerial; Sales or clerical; Craftsmen; Operatives; Laborers; Service.

Cell values as scanned, column by column: 49,381 5,627 3,220 14,171 29,761 11,896 16,048 2,314 10,104 3,630 19,106 3,809 10,278 5,019 20,674 8,049 9,378 3,247 15,040 6,621 12,825 10,254 5,832 3,457 24.2 40.2 34.9 40.1 24.7 11.0 20.5 40.7 20.8 6.9 23.4 36.7 25.2 9.6 11.9 41.6 28.5 17.8 14.7 20.6 28.8 32.5 37.9 25.5 15.2 35.0 22.7 24.9 17.3 7.9 18.0 35.5 18.5 5.5 16.2 24.5 17.8 6.5 17.7 22.1 15.6
12.7 10.2 13.8 18.7 20.8 27.6 14.8 2.1 1.5 1.5 1.7 .8 3.0 10.7 10.7 15.8 20.3 9.3 3.4 7.4 8.5 18.2 8.0 2.0 18.2 25.1 17.1 4.3 13.6 11.1 2.3 2.3 2.9 3.9 5.6 4.1 0.1 0.3 0.3 0.4 0.2 0.2 0.2 0.2 0.2

1/ Persons of Spanish origin may be of any race.

Table B. Work Interruption History by Race, Spanish Origin, and Selected Characteristics: Females

Percent columns give the percent with one or more interruptions lasting 6 months or more due to the stated reason. The last two columns give the mean percent of potential work years spent away from work for reasons surveyed (value and standard error).

Characteristic                              Total     All       Inability Family    Illness   Mean      Standard
                                            (thou-    reasons   to find   reasons   or dis-   percent   error
                                            sands)    surveyed  work                ability   (value)
Females 21 to 64 years who ever worked      57,258    71.9      14.2      64.1       9.2      30.9      0.2
RACE AND SPANISH ORIGIN 1/
  White                                     49,812    73.0      12.4      66.8       8.3      32.7      0.2
  Black                                      6,402    63.1      27.4      43.8      17.5      17.6      0.4
  Spanish origin                             3,014    75.0      23.6      62.4      12.9      27.6      0.6
YEARS OF SCHOOL COMPLETED
  Less than 12                              13,740    79.5      21.7      68.5      20.1      33.5      0.3
  12 to 15                                  34,805    73.3      12.7      66.3       6.6      31.5      0.2
  16 and over                                8,713    54.3       8.6      48.6       2.6      24.2      0.4
AGE BY YEARS OF SCHOOL COMPLETED
  21 to 29 years                            16,804    53.1      17.0      42.5       3.5      20.7      0.3
    Less than 12                             1,948    70.6      23.2      61.7       5.9      30.9      0.8
    12 to 15                                11,650    56.8      18.4      44.9       4.1      22.3      0.3
    16 and over                              3,206    29.1       8.3      22.2       0.1       8.9      0.5
  30 to 44 years                            19,445    77.5      12.3      72.3       6.6      34.3      0.3
    Less than 12                             4,060    79.8      20.4      73.6      12.3      34.2      0.6
    12 to 15                                12,366    79.8       9.9      75.3       5.7      34.8      0.3
    16 and over                              3,018    65.1      11.4      58.5       2.9      32.2      0.7
  45 to 64 years                            21,011    81.7      13.8      73.8      16.1      35.8      0.3
    Less than 12                             7,733    81.5      22.0      67.6      27.8      33.7      0.4
    12 to 15                                10,789    83.7       9.8      79.0      10.3      37.7      0.4
    16 and over                              2,489    73.8       5.6      70.6       5.4      34.3      0.8
OCCUPATION GROUP OF USUAL JOB
  Professional, technical, or managerial    11,723    61.0       9.5      55.4       5.4      24.4      0.3
  Sales or clerical                         23,782    75.2      10.7      69.4       5.9      33.8      0.2
  [row label illegible in source]            8,447    74.8      22.2      62.9      14.5      29.4      0.4
  [row label illegible in source]              950    78.3      21.9      67.8      15.1      39.7      1.2
  [row label illegible in source]           10,543    74.2      19.7      63.4      16.5      32.1      0.4

1/ Persons of Spanish origin may be of any race.

[Table C: contents illegible in the scanned source.]

Table D.
Coefficients of Regression of Log of Hourly Earnings on Specified Explanatory Variables
(Standard errors in parentheses)

Variable   Employed          Employed          Employed          Employed          Males 30 and      Females 30 and
           males             females           White males       White females     over with no      over with no
                                                                                   familial          familial
                                                                                   interruptions     interruptions
UNEMP      -.039 (.018)      .002 (.018)       -.028 (.021)      .002 (.021)       -.078 (.018)      .014 (.041)
DISAB      -.125 (.023)      -.040 (.028)      -.144 (.025)      -.088 (.032)      -.143 (.025)      -.183 (.044)
TIME-AWAY  -.312 (.122)      -.128 (.025)      -.068 (.145)      -.155 (.028)      (X)               (X)
EXPER      .03515 (.00175)   .02278 (.00184)   .03791 (.00189)   .02495 (.00200)   .03382 (.00306)   .00937 (.00600)
EXPERSq    -.00058 (.00005)  -.00042 (.00005)  -.00065 (.00005)  -.00046 (.00005)  -.00056 (.00005)  -.00014 (.00012)
FT         .216 (.032)       .112 (.016)       .254 (.035)       .099 (.018)       .363 (.064)       .372 (.048)
EDUC1      .336 (.023)       .358 (.028)       .338 (.023)       .322 (.030)       .327 (.028)       .301 (.053)
EDUC2      .179 (.016)       .218 (.018)       .181 (.018)       .209 (.021)       .231 (.021)       .260 (.046)
EDUC4      -.069 (.039)      -.146 (.048)      -.002 (.044)      -.120 (.067)      -.026 (.044)      -.415 (.092)
EDUC5      -.195 (.016)      -.190 (.018)      -.173 (.016)      -.179 (.018)      -.185 (.018)      -.244 (.035)
COURSES    .038 (.005)       .044 (.005)       .034 (.005)       .052 (.005)       .045 (.005)       .070 (.009)
MARR2      -.023 (.014)      .016 (.014)       -.038 (.014)      .038 (.014)       -.009 (.016)      -.035 (.030)
MARR3      -.192 (.016)      -.009 (.018)      -.141 (.018)      -.008 (.018)      -.279 (.030)      .029 (.035)
Constant   1.318             1.112             1.282             1.098             1.172             .993
R2         .24               .18               .22               .19               .20               .28

- Represents zero. X Not applicable.

[Remaining tabular material on this page is illegible in the scanned source.]
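The decomposition described in the text, in which women are assigned the men's mean experience and interruption values while keeping the coefficients of their own equation, can be sketched as follows. This is a minimal illustration, not the authors' computation. The coefficients are taken from the "Employed females" column of Table D, but the group means and the log earnings gap are hypothetical placeholders invented for the example, because the published means are not legible here. The function names are also hypothetical.

```python
# Coefficients from the "Employed females" column of Table D
# (work history variables only).
FEMALE_COEF = {
    "UNEMP": 0.002,
    "DISAB": -0.040,
    "TIME_AWAY": -0.128,
    "EXPER": 0.02278,
    "EXPERSq": -0.00042,
}

# HYPOTHETICAL group means, invented purely for illustration.
# They are NOT the values reported in the paper.
MEANS_MEN = {"UNEMP": 0.15, "DISAB": 0.10, "TIME_AWAY": 0.00,
             "EXPER": 20.0, "EXPERSq": 520.0}
MEANS_WOMEN = {"UNEMP": 0.14, "DISAB": 0.09, "TIME_AWAY": 0.30,
               "EXPER": 14.0, "EXPERSq": 280.0}

# HYPOTHETICAL difference in mean log hourly earnings (men minus women).
HYPOTHETICAL_LOG_GAP = 0.45


def predicted_log_change(coef, means_from, means_to):
    """Change in predicted log earnings when the means move from
    `means_from` to `means_to`, holding the coefficients fixed."""
    return sum(b * (means_to[name] - means_from[name])
               for name, b in coef.items())


def share_of_gap_closed(log_change, log_gap):
    """Fraction of the log earnings gap accounted for by the change."""
    return log_change / log_gap


if __name__ == "__main__":
    change = predicted_log_change(FEMALE_COEF, MEANS_WOMEN, MEANS_MEN)
    share = share_of_gap_closed(change, HYPOTHETICAL_LOG_GAP)
    print(f"predicted change in log earnings: {change:.4f}")
    print(f"share of hypothetical gap closed: {share:.3f}")
```

Because the inputs are placeholders, the printed share has no substantive meaning. The point is only the mechanics: each coefficient is multiplied by the difference in group means, the products are summed, and the sum is compared with the observed gap.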
PANEL SURVEYS AS A SOURCE OF MIGRATION DATA

Donald C. Dahmann, U.S. Bureau of the Census

PREVIOUS USES OF PANEL SURVEYS FOR GEOGRAPHICAL MOBILITY RESEARCH

Panel surveys, the technique of using repeated interviews with a sample of individuals that remains constant, have been employed for at least half a century as a research strategy to assist in understanding processes with an inherent capacity for change. Virtually all of the earliest panel analyses investigated change in opinions, and particularly in preferences for political candidates (Levenson, 1978; Rice, 1928; Lazarsfeld and Fiske, 1938; and Berelson, Lazarsfeld, and McPhee, 1954). From these beginnings, various forms of panel analysis, and more generally longitudinal analysis, have come to be utilized in a wide variety of social science research applications, and recently were singled out by the Office of Federal Statistical Policy and Standards as an approach to data collection and analysis to which "much attention should be devoted... in the development of [federal] statistical programs for the 1980's" (U.S. Department of Commerce, 1978; p. 321).

The real blossoming of panel surveys, both in terms of numbers of surveys and their size, occurred during the past two decades (and principally during the 1970s). They were employed as a means of monitoring and evaluating the effects of a variety of federally sponsored large-scale experiments and programs, and to enhance our understanding of the factors involved in a variety of social and economic processes, e.g., educational attainment, socialization, and labor force participation.

The principal federally sponsored experiments including panel analysis components were the several Income Maintenance Experiments, the various Experimental Housing Allowance programs, and the Urban Homesteading Demonstration.
Uses of panel analysis in each of these programs for geographical mobility research are discussed in the next section.

Several large-scale panel surveys were also initiated during this period in response to the need for basic information on educational attainment, labor force participation, and social stratification processes (Borus, 1982). Major surveys in this second group include the National Longitudinal Survey of Labor Market Practices, begun in 1966 and sponsored by the U.S. Department of Labor with distribution through Ohio State University's Center for Human Resource Research (popularly referred to as the Parnes data; Parnes, 1974; Center for Human Resources Research, 1974; Bielby, Hawley, and Bills, 1977; Parnes and Rich, 1980; Leigh, 1982; Daymont, 1983); the Panel Study of Income Dynamics, conducted by the University of Michigan's Institute for Social Research for the U.S. Department of Health and Human Services since 1968 (Duncan and Morgan, 1982); Project TALENT, conducted by the American Institutes for Research (Wise and Steel, 1980); the Youth in Transition Project, conducted by the University of Michigan's Institute for Social Research (Bachman and O'Malley, 1980); the National Longitudinal Survey of High School Seniors, conducted by Research Triangle Institute for the National Center for Education Statistics (Eckland and Alexander, 1980); the Wisconsin Longitudinal Study (Sewell and Hauser, 1980); and Explorations in Equality of Opportunity, initiated by the Educational Testing Service (Alexander and Eckland, 1980).

Questions that these latter surveys were designed to address rarely included geographical mobility as a major topical area. Much attention to questionnaire design or to the questions asked was not required, however, because geographical mobility data flowed as a natural outcome of the follow-up of panel members.
Thus, general information on geographical mobility derived from the panel design was sufficient for relating movement to labor force participation, educational attainment, social and occupational stratification processes, household changes with passage through the life course, shifts in housing consumption, etc.

The Experimental Housing Allowance Programs and the Income Maintenance Experiments, on the other hand, were both specifically interested in geographical mobility as an integral element of evaluating the effectiveness of the programs. Both programs, for instance, were concerned with the role of residential mobility in, and its effect upon, patterns of housing consumption, and the Income Maintenance Experiments were further concerned with the role of migration as a response to income guarantees among lower-income households. The fact that information on the various forms of geographical mobility flowed directly from the panel design of these surveys, without specific geographical mobility questionnaire items, demonstrates the capacity of panel surveys to assist our understanding of the role of geographical mobility in a variety of social and economic circumstances.

Experimental Housing Allowance Program

The Experimental Housing Allowance Program, initiated by the Housing and Community Development Act of 1970, was undertaken to establish empirical evidence of the effects of housing allowances, and of the transfer of small amounts of unrestricted funds to lower-income households, on housing consumption. The following questions highlight some of the experiment's major research goals. Would households spend the money on housing? Would the money be used to improve conditions of their current dwelling? Would households move to other neighborhoods? What would be the local housing market impact of such an infusion of funds?
To answer such questions, three programs (the Housing Allowance Supply Experiment, the Housing Allowance Demand Experiment, and the Administrative Agency Experiment) eventually enrolled more than 30,000 households at twelve sites throughout the country at a cost in excess of $160 million (Friedman and Weinberg, 1982; 1983; Bradbury and Downs, 1981; Struyk and Bendick, 1981).

Evaluation of these individual programs produced several longitudinal analyses, but only one that utilized individual households as its unit of observation. Other longitudinal analyses took housing units and neighborhoods as their units of observation (Hillestad and McDonald, 1983), or obtained retrospective geographical mobility data for individuals (McCarthy, 1983). Individuals in both control and test groups of the Housing Allowance Demand Experiment were traced for three years to observe, among other things, actual patterns of residential mobility and what changes, in terms of housing consumption and residential dispersion (and therefore desegregation), may have resulted from such moves (Rossi, 1981; Hamilton, 1983).

Similar panel analyses were utilized in evaluating the effects of one other important housing experiment initiated at about the same time, the U.S. Department of Housing and Urban Development's Urban Homesteading Demonstration. This effort, begun as a demonstration and later established as a regular program, transferred HUD-owned properties to local control in 23 cities (U.S. Department of Housing and Urban Development, 1977). Its research agenda included inquiry into the effects of residential mobility on patterns of local housing consumption, this time with specific reference to the displacement of low-income households from housing they could no longer afford as the result of the HUD-owned properties being returned to the market (Schnare, 1979).
Income Maintenance Experiments

Another of the recent massive social experiments that included panel analysis as a research component is the series of income maintenance experiments, begun in 1967 and the first large-scale social experiments to be conducted in the United States: the New Jersey Income Maintenance Experiment (Kershaw and Fair, 1976; Watts and Rees, 1977; Pechman and Timpane, 1975), the Rural Income Maintenance Experiment (Bawden and Harrar, 1976; Palmer and Pechman, 1978), the Gary Income Maintenance Experiment (Journal of Human Resources, 1979), and the Seattle and Denver Income Maintenance Experiment (Journal of Human Resources, 1980). Each of these four programs addressed various aspects of one basic issue: how much would a nationwide guaranteed income cost, and to what extent would families reduce their labor force participation (and therefore earnings) in response to such payments?

As with the massive housing allowance experiments, research agendas of the income maintenance experiments did not include geographical mobility as a specific focus. Nonetheless, the panel designs employed in the evaluations, which traced families (households) over a three-year period, produced their own mobility data. Analyses were undertaken of both of the two major forms of geographical mobility: (1) migration, specifically rates of movement from the experimental site to other labor markets, and (2) residential mobility, change of one's dwelling to consume a different bundle of housing services (quality of dwelling, neighborhood services, etc.). Data to investigate both of these extremely important outcomes of the decision to move were readily derivable from the panel design of the surveys used to evaluate the experiments.
Findings with regard to migration (spatial adjustments in labor force participation) may be summarized as follows: (1) migration out of the experimental site's labor market was significantly increased for married white males and females but not for married black males and females, and (2) outmigration was to locales with generally lower wage rates and with better living environments. Work hours in the new locations were generally less than previously, suggesting either that persons worked fewer hours in their new locations because of their additional income or that their search for a "satisfactory" job in the new locale took some time (Keeley, 1980).

With regard to residential mobility, it was discovered that (1) households moving to a new address generally improved their housing situation (Wooldridge, 1977; Kaluzny, 1979), and (2) the effects of income assistance as a means of enabling renter households to move into a home of their own were mixed (Wooldridge, 1977; Poirier, 1977).

Panel Study of Income Dynamics

A more archetypical panel study, at least in traditional terms, is the Panel Study of Income Dynamics (PSID) conducted by the Institute for Social Research of the University of Michigan for the U.S. Department of Health and Human Services. Initiated during the same period as the Income Maintenance Experiments (the Great Society era of the 1960s), this panel survey is now in its 17th year of collecting annual information from a representative national sample of about 6,000 families and 15,000 individuals (Morgan and Smith, 1969). The survey has produced a massive body of data and a massive array of findings (Duncan and Morgan, 1982; Morgan, 1974; Duncan and Morgan, 1975-1980; and Hill, Hill, and Morgan, 1981), and has even outlived its original sponsoring federal agency, the Office of Economic Opportunity.

Panel Study of Income Dynamics data have been utilized to address questions in both of the two major realms of geographical mobility research.
They have also been utilized to consider more basic geographic mobility research questions, such as the timing of moves through the life course, relationships between desires or expectations to move and actual movement, and other similar questions that are intrinsic to the geographical mobility processes.

The rich set of personal attribute and attitudinal variables in the Michigan panel has enabled residential mobility research to be framed in behavioral terms, whereby households are seen as possessing specific desires and preferences with respect to moving. Structural elements of the participation system (income levels and purchasing power, housing costs, forced relocation, etc.) in this frame of reference then serve to assist or hinder actual patterns of mobility, and therefore preference fulfillment (Roistacher, 1974; 1975; Goodman, 1974; Duncan and Newman, 1975; 1976; Newman and Duncan, 1979; and Newman and Owen, 1982).

Use of these panel data has enabled researchers to examine the characteristics of migrants in various interregional migration systems during the 1970s (Kim, 1980) and the patterns and consequences of repeat moves (Newman and Ponza, 1981), and to initiate structuring of general models of mobility (Morgan, 1977). Migration as an act resulting in the readjustment of labor markets has also been explored both in terms of causes, such as the effects of unemployment on movement (DaVanzo, 1978), and of consequences for individuals, in terms of income and occupational change (Harris, 1981).

THE SURVEY OF INCOME AND PROGRAM PARTICIPATION AS A SOURCE OF MIGRATION DATA

The Survey of Income and Program Participation (SIPP), a general purpose, large-scale (25,000 household), nationally representative sample survey, has been undertaken by the U.S. Bureau of the Census primarily to provide: (1) improved data on the economic situation of individuals and households and (2) information on federal and state income transfer and social program participation.
Individuals are interviewed every four months for the life of a panel. In the case of the first (1984) panel, this will result in a total of 9 waves of interviews for three-quarters of the panel and 8 waves for the remaining quarter. Initial interviews for the 1984 panel were conducted in October 1983 (with a reference period of July-September 1983); the final wave of interviews for this panel will be conducted in May 1986 (with a reference period of January-April 1986). Current plans for a second (1985) panel call for a somewhat smaller sample size (20,000 households) and eight waves of interviews, which will begin in January 1985.

The earlier review of geographical mobility research topics explored with data from panel surveys presages the types of research we might expect from SIPP. In terms of duration of the panel, SIPP data will be much like those that were derived from the several large-scale experiments: both cover a period not exceeding three years. In terms of geographical mobility research, therefore, data from both sources may be used to explore change over only a relatively short period of time. Several SIPP characteristics make it a close relative of the Panel Study of Income Dynamics as well: (1) its sample is national (though larger) rather than being limited to selected settlements, and (2) more waves of interviews are being conducted, which will provide better data for establishing joint incidence of movement with other events (life course, employment, etc.).

The geographical mobility research derived from the panel surveys discussed earlier suggests that attention must be given to two specific aspects of such surveys: first, the periodicity of waves and overall duration of the panel; and second, the substantive nature of data collected during each wave. SIPP is unique in the short span of time between waves: four months.
This design characteristic makes it particularly valuable as a means of matching residential shifts and migration with other life events such as marriage, divorce, expansion and contraction of household size in general, loss of job, change in job, and the like. No previous national survey has provided such a fine temporal scale for establishing the joint incidence of geographical movement with important employment and life events.

The fact that each panel collects data for 2 1/2 years presents both advantages and disadvantages. Because SIPP panels will not be maintained for years or even decades, as have the several major panel surveys focusing on changes in labor force participation, educational attainment, and social mobility through the life course, analysis of the role, consequences, and duration of effects of geographical mobility through major stages of the life course is not feasible. Nonetheless, 2 1/2 years (plus the fact that a large number of waves will be conducted) is quite sufficient to establish both immediate and some intermediate-term effects associated with geographical mobility on topics of concern such as the spatial restructuring of labor markets. The duration of SIPP panels also provides a reasonable amount of time to relate the expectations of individuals toward mobility to actual patterns of movement.

Once we have accustomed ourselves to the fact that the act of tracking those who move in a panel survey provides migration data, then what must be considered in addressing geographical mobility questions is the basic substance of the questionnaire administered prior to and following the move. In the case of SIPP we are provided with a wealth of relevant migration-related information: labor force participation and employment, industry and occupation, work history, education, health conditions and disability, household composition, and, of course, income.
As the same questionnaires are administered at the same times to nonmovers, the opportunity exists for comparing the situations of movers and nonmovers directly.

In consideration of these several points, what should we be looking to SIPP for in terms of geographical mobility research? First, I think that we can expect better data. For decades the Current Population Survey (CPS) has served as our national metric establishing levels of movement among the various components of the nation's settlement system, among regions, and among subpopulations of the nation's peoples. All CPS geographical mobility data are collected retrospectively, sometimes asking respondents to refer to an event that occurred one year ago, sometimes five years ago. These data, like all retrospective data, are subject to biases introduced by the distorting effects of memory loss, dissonance reduction (rationalization), and the like. How do SIPP data compare with CPS data? Will SIPP's multiple waves of data collection enable us to specify the overall effects of repeat movers on mobility statistics in a way that cross-sectional data do not? What will differences between the two surveys' geographical mobility data tell us? The fact that information on movement (and non-movement) and a wide array of life events will be collected almost as they occur (specific to within four months) is one of SIPP's best features from the perspective of geographical mobility research. Our capacity to specify the relationships between movement and such events as the loss of a job, termination of the receipt of unemployment benefits, marriage, divorce, etc. has never been better.

A set of supplemental migration questions, which should be administered to all individuals for at least one (preferably early) wave of interviewing, should also be considered.
First, respondents should be asked a set of mobility preference questions, to relate desires and expectations of movement to patterns of actual mobility. Second, retrospective questions on one's general residential history should be asked so that subsequent moves may be related to previous patterns of mobility and locations.

One further aspect of SIPP's design that should not be overlooked when thinking about geographical mobility research is its capacity to provide information on the locales of origin and destination of movers (and nonmovers as well). The ability of SIPP to provide information on conditions in both the labor markets that migrants leave and those to which they move is of particular concern when wishing to understand the spatial differentiation of labor markets and to ascertain the causes of subnational (regional) patterns of employment growth and decline. What, for instance, are the relationships between sending and receiving markets in terms of unemployment rates, wage levels, etc.? Are these structural situations consequential in terms of employment? Are different mechanisms operating for blue collar and white collar migrants that such differences articulate? These are some of the questions that should guide attempts to maximize the utilization of SIPP data for geographical mobility research.

CONCLUSION

The Survey of Income and Program Participation enables us to explore new questions concerning both of the major forms of geographical mobility: residential mobility and migration. With particularly good income and public program participation data, good specification of the timing of movement with significant life events, and (potentially) good market characteristics data, SIPP is ideally suited to address a multitude of housing consumption questions.
With good information on participation in federal and state sponsored programs, exceptionally good income data, and (potentially) good information on the characteristics of labor markets, SIPP promises to be an incomparable research tool for questions that have heretofore simply gone unasked regarding migration, particularly as it relates to readjustments of the spatial dimension of labor markets.

We must also be prepared to take advantage of the serendipitous benefits of timing. In this regard, the availability of SIPP data and recent advances in analytical techniques provide us with opportunities that were nonexistent even a decade ago. With regard to analytical techniques I am thinking particularly of those developed during the 1970s for analyzing categorical data (Goodman, 1978; Bishop, Fienberg, and Holland, 1975), and their specific application to the analysis of change in mobility and panel data (Hauser, 1979; Duncan, 1981; Fienberg, 1980). The richness of SIPP data provides a wonderful opportunity to fully utilize the analytical advances brought about by these techniques to answer a myriad of geographical mobility questions.

The Great Society programs of the 1960s pushed social scientists as never before to ask questions about American society and its economy. In response to these demands, new and better data were collected, new analytical techniques were developed, and new research agendas established. Much was learned from these efforts about the causes of, the roles performed by, and the effects of geographical mobility on the nation's economic and social structure. SIPP represents a logical outcome of advances in social science data collection that began in the 1960s and an important new opportunity for geographical mobility research.

I invite your comments on ways that we at the Census Bureau can enhance this new survey's utility for answering your geographical mobility questions.

REFERENCES

Abt Associates. 1983.
Annual Housing Survey National Longitudinal File [Codebook]. Cambridge, Mass.: Abt Associates.
Alexander, Karl L. and Bruce K. Eckland. 1980. "The 'Explorations in Equality of Educational Opportunity' Sample of 1955 High School Sophomores." In Alan C. Kerckhoff, ed. Research in Sociology of Education and Socialization. Vol. 1. Greenwich: JAI Press. Pp. 31-58.
Bachman, Jerald G. and Patrick M. O'Malley. 1980. "The Youth in Transition Series: A Study of Change and Stability in Young Men." In Alan C. Kerckhoff, ed. Research in Sociology of Education and Socialization. Vol. 1. Greenwich: JAI Press. Pp. 127-160.
Berelson, Bernard, Paul F. Lazarsfeld, and William N. McPhee. 1954. Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.
Bilsborrow, Richard E. and John S. Akin. 1982. "Data Availability versus Data Needs for Analyzing the Determinants and Consequences of Internal Migration." Review of Public Data Use 10 (December): 261-284.
Bishop, Yvonne M. M., Stephen E. Fienberg, and Paul Holland. 1975. Discrete Multivariate Analysis: Theory and Practice. Cambridge: MIT Press.
Bawden, Lee and William Harrar, eds. 1976. Rural Income Maintenance Experiment: Final Report. Madison: Institute for Research on Poverty, University of Wisconsin-Madison.
Bielby, William T., Clifford B. Hawley, and David Bills. 1977. "Research Uses of the National Longitudinal Surveys." Special Report No. 18. Madison: Institute for Research on Poverty, University of Wisconsin.
Borus, Michael E. 1982. "An Inventory of Longitudinal Data Sets of Interest to Economists." Review of Public Data Use 10 (May): 113-126.
Bradbury, Katharine and Anthony Downs, eds. 1981. Do Housing Allowances Work? Washington, D.C.: Brookings Institution.
Center for Human Resources Research. 1974. National Longitudinal Surveys Handbook. Columbus: Center for Human Resources Research, Ohio State University.
DaVanzo, Julie. 1978. "Does Unemployment Affect Migration?
-- Evidence from Microdata." Review of Economics and Statistics 60: 504-514.
Daymont, Thomas and Paul Andrisani. 1983. "The Research Uses of the National Longitudinal Surveys: An Update." Review of Public Data Use 11: 203-310.
Duncan, Greg J. and James N. Morgan. 1975-1980. Five Thousand American Families -- Patterns of Economic Progress. Volumes 3 to 8. Ann Arbor: Institute for Social Research, University of Michigan.
Duncan, Greg J. and James N. Morgan. 1980. "The Incidence and Some Consequences of Major Life Events." In Greg J. Duncan and James N. Morgan, eds. Five Thousand American Families -- Patterns of Economic Progress. Ann Arbor: Institute for Social Research, University of Michigan. Pp. 183-240.
Duncan, Greg J. and James N. Morgan. 1982. "Longitudinal Lessons from the Panel Study of Income Dynamics." Review of Public Data Use 10: 179-184.
Duncan, Greg J. and Sandra J. Newman. 1975. "People as Planners: The Fulfillment of Residential Mobility Expectations." In Greg J. Duncan and James N. Morgan, eds. Five Thousand American Families -- Patterns of Economic Progress. Volume 3. Ann Arbor: Institute for Social Research, University of Michigan. Pp. 279-318.
Duncan, Greg J. and Sandra J. Newman. 1976. "Expected and Actual Residential Mobility." Journal of the American Institute of Planners 42: 174-186.
Duncan, Otis Dudley. 1981. "Two Faces of Panel Analysis: Parallels With Comparative Cross-Sectional Analysis and Time-Lagged Association." In Samuel Leinhardt, ed. Sociological Methodology 1981. San Francisco: Jossey Bass. Pp. 281-318.
Eckland, Bruce K. and Karl L. Alexander. 1980. "The National Longitudinal Study of the High School Senior Class of 1972." In Alan C. Kerckhoff, ed. Research in Sociology of Education and Socialization. Vol. 1. Greenwich: JAI Press. Pp. 189-222.
Ferber, Robert and Werner Z. Hirsch. 1978. "Social Experimentation and Economic Policy: A Survey." Journal of Economic Literature 16 (December 1978): 1379-1414.
Fienberg, Stephen E. 1980.
"The Measurement of Crime Victimization: Prospects for Panel Analysis of a Panel Survey." The Statistician 29: 313-350. Friedman, Joseph and Daniel H. Weinberg. 1982. The Economics of Housing Vouchers . New York: Academic Press. Friedman, Joseph and Daniel H. Weinberg, eds. 1983. The Great Housing Experiment . Urban Affairs Annual Review, Volume 24. Beverly Hills: Sage. Goodman, John L. 1974. "Local Residential Mobi- lity and Family Housing adjustments." In James N. Morgan, ed. Five Thousand American Fami- lies — Patterns of Economic Progress . Volume 2. Ann Arbor: Institute for Social Research, University of Michigan. Pp. 79-105. Goodman, Leo. 1978. Analyzing Qualitative- Categorical Data . Cambridge, Mass.: Abt Books. Hamilton, William L. 1983. "Economic and Racial /Ethnic Concentration." In Joseph Friedman and Daniel H. Weinberg, eds. The Great Housing Experiment . Urban Affairs Annual Review, Vol. 24. Beverly Hills: Sage. Harris, Richard J. 1981. "Rewards of Migration for Income Change and Income Attainment: 1968- 1973." Social Science Quarterly 62: 275-293. Hauser, Robert M. 1979. "Some Exploratory Methods for Modeling Mobility Tables and Other Cross-Classified Data." In Karl F. Schussler. ed. Sociological Methodology 1980 . San Francisco: Jossey Bass. Pp. 413-458. Hill, Martha S., Daniel H. Hill, and James N. Morgan . 1981. Five Thousand American Fami- lies — Patterns of Economic Progress . Volume 9. Ann Arbor: Institute for Social Research, University of Michigan. Hillstad, Carol E. and James L. McDowell. 1983. "Neighborhood Change." In Joseph Friedman and Daniel H. Weinberg, eds. The Great Housing Experiment . Beverly Hills: Sage. Journal of Human Resources . 1979. Issue devoted to "The Gary Income Maintenance Experiment." 14: 431-506. . 1980. Issue devoted to "The Seattle and Denver Income Maintenance Experi- ments." 15: 463-722. Kaluzny, Richard L. 1979. "Changes in the Consumption of Housing Services: The Gary Experiment." 
Journal of Human Resources 14: 496-506.
Keeley, Michael C. 1980. "The Effect of a Negative Income Tax on Migration." Journal of Human Resources 15: 695-706.
Kershaw, David and Jerilyn Fair, eds. 1976. The New Jersey Income-Maintenance Experiment. Vol. 1. New York: Academic Press.
Kim, Joachul. 1980. "Characteristics of Migrants Within the Framework of Current Migration Direction in the United States: Some Evidence from Micro-Data Analysis." Policy Sciences 12: 355-378.
Lazarsfeld, Paul F. and Marjorie Fiske. 1938. "The 'Panel' as a New Tool for Measuring Opinion." Public Opinion Quarterly 2: 596-612.
Leigh, Duane E. 1982. "The National Longitudinal Surveys: A Selective Survey of Recent Evidence." Review of Public Data Use 10: 185-201.
Levenson, Bernard. 1978. "Panel Studies." In William H. Kruskal and Judith M. Tanur, eds. International Encyclopedia of Statistics. New York: Free Press. Pp. 683-691.
McCarthy, Kevin F. 1983. "Housing Search and Residential Mobility." In Joseph Friedman and Daniel H. Weinberg, eds. The Great Housing Experiment. Beverly Hills: Sage.
Morgan, James N., ed. 1974. Five Thousand American Families -- Patterns of Economic Progress. Vols. 1 and 2. Ann Arbor: Institute for Social Research, University of Michigan.
Morgan, James N. 1977. "Some Preliminary Investigations for a Model of Mobility." In Greg J. Duncan and James N. Morgan, eds. Five Thousand American Families -- Patterns of Economic Progress. Volume 5. Ann Arbor: Institute for Social Research, University of Michigan. Pp. 349-368.
Morgan, James N. and James D. Smith. 1969. A Panel Study of Income Dynamics: Study Design, Procedures, and Forms -- 1968 Interviewing Year (Wave I). Vol. 1. Ann Arbor: Institute for Social Research, University of Michigan.
Nelson, Dawn, David McMillan, and Daniel Kasprzyk. 1984. An Overview of the Survey of Income and Program Participation. SIPP Working Paper Series No. 8401. Washington, D.C.: U.S. Bureau of the Census.
Newman, Sandra J.
and Greg J. Duncan. 1979. "Residential Problems, Dissatisfaction, and Mobility." Journal of the American Planning Association 45: 154-166.
Newman, Sandra J. and Michael S. Owen. 1982. Residential Displacement in the U.S., 1970-1977. Ann Arbor: Institute for Social Research, University of Michigan.
Newman, Sandra J. and Michael Ponza. 1981. "The Characteristics of Housing Demand in the 1970s: A Research Note." In Martha S. Hill, Daniel H. Hill, and James N. Morgan, eds. Five Thousand American Families -- Patterns of Economic Progress. Ann Arbor: Institute for Social Research, University of Michigan.
Palmer, John L. and Joseph A. Pechman, eds. 1978. Welfare in Rural Areas: The North Carolina-Iowa Income Maintenance Experiment. Washington, D.C.: Brookings Institution.
Parnes, Herbert S. 1974. "The National Longitudinal Surveys: New Vistas for Labor Market Research." American Economic Review 65: 244-249.
Parnes, Herbert S. and Malcolm C. Rich. 1980. "Perspectives on Educational Attainment from National Longitudinal Surveys of Labor Market Behavior." In Alan C. Kerckhoff, ed. Research in Sociology of Education and Socialization. Vol. 1. Greenwich: JAI Press. Pp. 161-188.
Pechman, Joseph A. and P. Michael Timpane, eds. 1975. Work Incentives and Income Guarantees: The New Jersey Negative Income Tax Experiment. Washington, D.C.: Brookings Institution.
Poirier, Dale J. 1977. "The Determinants of Home Buying." In Harold W. Watts and Albert Rees, eds. The New Jersey Income-Maintenance Experiment. Vol. 3. New York: Academic Press. Pp. 73-91.
Quantitative Methods in Knopf. 1974. "Residential Mobility: Planners, Movers and Multiple Movers." In Greg J. Duncan and James N. Morgan, eds. Five Thousand American Families -- Patterns of Economic Progress. Vol. 3. Ann Arbor: Institute for Social Research, University of Michigan.
Rossi, Peter H. 1981. "Residential Mobility." In Katharine Bradbury and Anthony Downs, eds. Do Housing Allowances Work? Washington, D.C.: Brookings Institution. Pp. 147-183.
Rossi, Peter H. and Katherine C. Lyall.
1976. Reforming Public Welfare: A Critique of the Negative Income Tax Experiment. New York: Russell Sage Foundation.
Schnare, Ann B. 1979. Household Mobility in Urban Homesteading Neighborhoods: Implications for Displacement. Washington, D.C.: U.S. Government Printing Office.
Sewell, William H. and Robert M. Hauser. 1980. "The Wisconsin Longitudinal Study of Social and Psychological Factors in Aspirations and Achievements." In Alan C. Kerckhoff, ed. Research in Sociology of Education and Socialization. Vol. 1. Greenwich: JAI Press. Pp. 59-99.
Struyk, Raymond J. and Marc Bendick, Jr. 1981. Housing Vouchers for the Poor: Lessons from a National Experiment. Washington, D.C.: Urban Institute.
U.S. Department of Commerce, Office of Federal Statistical Policy and Standards. 1978. "Longitudinal Surveys." In A Framework for Planning U.S. Federal Statistics for the 1980's. Washington, D.C.: U.S. Government Printing Office. Pp. 321-325.
U.S. Department of Housing and Urban Development. 1977. The Urban Homesteading Catalogue. 3 volumes. Washington, D.C.: U.S. Government Printing Office.
U.S. Department of Labor, Bureau of Labor Statistics. 1980. Using the Current Population Survey as a Longitudinal Data Base. Bureau of Labor Statistics Report No. 608. Washington, D.C.: U.S. Government Printing Office.
Watts, Harold and Albert Rees, eds. 1977. The New Jersey Income-Maintenance Experiment. Vols. 2 and 3. New York: Academic Press.
Wise, Lauress L. and Lauri Steel. 1980. "Educational Attainment of the High School Classes of 1960 Through 1963: Findings from Project TALENT." In Alan C. Kerckhoff, ed. Research in Sociology of Education and Socialization. Vol. 1. Greenwich: JAI Press. Pp. 101-126.
Wooldridge, Judith. 1977. "Housing Consumption." In Harold W. Watts and Albert Rees, eds. The New Jersey Income-Maintenance Experiment. Vol. 3. New York: Academic Press.
Roistacher, Elizabeth. nomic Progress. Vol. 2.
Ann Arbor: Institute for Social Research, University of Michigan.
Roistacher, Elizabeth. 1975. "Residential

SIPP AND CPS LABOR FORCE CONCEPTS: A COMPARISON

by Paul Ryscavage, Bureau of the Census

Background

The Survey of Income and Program Participation (SIPP) is a new Census Bureau survey designed to give policymakers, researchers, and the public an in-depth look at the economic situation of persons and households in the United States. Its primary purpose is to collect data on the kinds and amounts of income received by persons and the extent of their participation in government income transfer programs, such as Social Security and Aid to Families with Dependent Children. The full scope of SIPP as a source of information on the well-being of our society, however, is still being realized.

One important byproduct of SIPP is information on the labor force activity of individuals. Working or not working is frequently associated with one's economic situation and also one's participation or nonparticipation in social welfare programs. An obvious illustration is the relationship between job loss and the receipt of unemployment insurance payments.

In the development of the SIPP labor force questions, an effort was made to make them conceptually similar to those in the Current Population Survey (CPS), which is the survey used to collect the Federal government's official labor force statistics. The CPS, in operation since 1940, was developed for the sole purpose of estimating the numbers of employed and unemployed persons in the country. At the core of the CPS labor force data is the "activity concept." 1/ Basically, the concept amounts to identifying persons' activities in relation to the labor market during a specific period of time. In the CPS the period of time is one week.
Persons in the adult population can then be sorted into three mutually exclusive groups depending on their activity during the week: working, not working but seeking work, and neither working nor seeking work.

While many refinements have been made to the activity concept and the operation of the CPS through the years, the keystone of the Nation's employment and unemployment estimates--activity during a specific reference week--has not been changed. The concept and the CPS have been reviewed periodically by Presidentially appointed commissions to ensure their soundness. The most recent review was by the National Commission on Employment and Unemployment Statistics (NCEUS) in the late 1970's. 2/ Although the Commission recommended some modifications of definitions used in the CPS, it pronounced the basic activity concept as sound. 3/

Compared to the CPS, SIPP is in its infancy. Its genesis was the Income Survey Development Program begun in the mid-1970's by the Department of Health, Education, and Welfare. 4/ Despite its newness, SIPP has great potential for casting light not only on the nature of income dynamics, but also on how labor force activity is related to it. Indeed, the NCEUS suggested there was a need "to link labor force experience with income data" so as to add a qualitative dimension to labor force statistics. 5/ SIPP data will show on a regular basis how well the labor market is providing for the economic well-being of workers and their households.

An obvious question among labor force analysts is how the SIPP and CPS labor force data will compare. Although we cannot answer that question at this time because SIPP labor force data are still being processed, we can compare SIPP and CPS labor force concepts. 6/ More specifically, we can examine how the activity concept is applied in both surveys.
We begin by briefly reviewing some of the survey design characteristics of the SIPP and CPS and then compare specific SIPP and CPS labor force definitions. A concluding section of the paper discusses potential uses of SIPP labor force data.

Survey Design Characteristics of SIPP and CPS

Labor force analyses (as well as other kinds of analyses) are frequently limited because the data being analyzed come from surveys with unique survey design characteristics. For example, small sample size often creates difficulties for analysts. Three survey design features of SIPP and CPS which are important from an analytical standpoint are discussed below.

Samples. Significant differences exist in the sample size and design of SIPP and CPS. SIPP is a longitudinal panel survey comprised originally of 26,000 households located in 174 areas around the country. The sample is divided into four rotation groups and households in each group are interviewed every four months for approximately two and one-half years. The first rotation group was interviewed in October 1983, and interviews were conducted in the second, third, and fourth rotation groups in November, December, and January, respectively. This staggered sample design produces a cycle or wave of interviewing that takes four months to complete, after which the rotation groups are reinterviewed in the same sequence. The Census Bureau plans to introduce another panel of approximately 20,000 households in January 1985 and another 20,000 household panel in January 1986. Consequently, SIPP's sample size will grow as panels are overlapped, increasing the reliability of the estimates.

The CPS is basically a cross-sectional survey, but it also has a longitudinal dimension. 7/ It is a much larger survey, being composed of 60,000 households located in 629 areas across the country.
The CPS sample is divided into eight rotation groups but unlike the staggered sample design of SIPP, all rotation groups are in operation in a single month. The longitudinal aspect of CPS results from the rotation group pattern in which a household is in the sample for four consecutive months, out for eight and then back in for four more months. This pattern allows three-quarters of the households to be the same from month-to-month and one half to be the same over the year. This is important because labor force analyses of CPS data conducted by the Bureau of Labor Statistics (BLS) concentrate on month-to-month and year-to-year changes.

Two problems for labor force analysts who use household survey data are biases resulting from the sample's design and from interview nonresponse. Rotation group bias has always been a problem in the CPS and it has received much attention over the years. 8/ Theoretically, each CPS rotation group should produce the same estimates, except for random differences due to sampling variability. The estimate of unemployment from the first rotation group, however, is usually greater than the estimate based on all rotation groups. 9/ (Recently, the difference has averaged about six percent.) The reason for the difference has never been isolated. Because all SIPP rotation groups in a SIPP panel have been in the sample for the same amount of time, this type of bias will not be immediately observable. It will be possible to observe after the introduction of the 1985 SIPP panel in January 1985, however, since rotation groups of different sample ages will then be in operation at the same time.

A second bias problem involves survey nonresponse--unit or total nonresponse and item nonresponse. In the CPS, the unit noninterview rate has hovered around the four to five percent mark in recent years; item nonresponse rates vary by item, but in the March CPS income questions generally have the highest nonresponse rate.
10/ The Census Bureau has developed noninterview adjustments and imputation schemes for dealing with these problems. While the first panel in SIPP is less than a year old, it appears that the unit noninterview rate for the first SIPP interviews is about the same as in the CPS. (A cumulative noninterview rate will be available from SIPP as subsequent waves of interviewing are completed.) Item nonresponse in SIPP is presently being investigated at the Census Bureau. 11/ Because both labor force and income questions are asked at the same time, the quality of the SIPP labor force data may be affected.

Survey eligibility and coverage. Respondent eligibility and coverage are somewhat different in SIPP and CPS. In SIPP all household members 15 years of age and over are eligible to be interviewed and all eligible persons are interviewed if present at the time of the interview. If an eligible person is not home, a "proxy" interview is obtained from a knowledgeable person; otherwise a return visit is scheduled. In the CPS the age of eligibility is 16 years and over (data are also collected for 14 and 15 year olds); one adult household respondent may answer the questions for all household members.

Telephone interviewing is also handled differently in the two surveys. Telephone interviews in SIPP must have prior regional office approval, except in the case of information not obtained in the course of the interview. In the CPS, telephone interviews are permitted in the second, third, fourth, sixth, seventh, and eighth month in which the households are in sample.

Another difference concerns the treatment of the Armed Forces. In the monthly CPS, members of the Armed Forces living in households are not eligible for interview. In SIPP, however, such individuals are interviewed as long as they are stationed in the area and usually reside at the address visited. (Both surveys exclude inmates of institutions, such as persons in prisons or convalescent homes.)
Lastly, and most significant for many analyses, members of households in the SIPP sample who move between interviews are followed and further interviews attempted. Sample persons, however, are not followed when they have been institutionalized, become a member of the Armed Forces, move outside the United States, or move more than 100 miles from a SIPP sampling area. In the CPS, movers are not followed and this has been a constraint on many longitudinal labor force analyses. 12/

Reference periods. A fundamental difference between SIPP and CPS--one that will probably account for differences in labor force estimates between surveys--is the length of the reference period. CPS interviews are conducted in all rotation groups each month in the week containing the 19th and all questions about labor force activity are asked in reference to the previous week which contains the 12th, the survey week. (As will be discussed, this one week reference period is extended to four weeks in the case of jobseeking.) Depending on the respondent's answers to the questions, household members are classified into one of three mutually exclusive groups: employed, unemployed, or not in the labor force.

SIPP interviews are conducted in one of the four rotation groups each month during the first two weeks of the month. The labor force, income, and program participation questions relate to the four previous months. Indeed, the labor force questions actually refer to individual weeks during the four month period. During this time a person could have worked, looked for work, and been outside the labor force at different times. In other words, labor force classification in SIPP is not necessarily mutually exclusive as it is in the CPS. 13/

Recall problems are potentially a greater problem in SIPP than in the monthly CPS since respondents are recalling activities over a much longer period.
For persons with a marginal attachment to the labor market, for example, teenagers, it may be very difficult to remember job market activities. Despite the long recall period in SIPP, it is not inordinately long. In the supplement to the March CPS, persons are asked about their labor market activities in the previous calendar year--a reference period extending back 3 to 15 months. 14/ The annual work experience statistics have been published by the BLS and Census Bureau for years.

SIPP and CPS Labor Force Definitions

Because the reference periods in SIPP and CPS are of different lengths, the activity concept is applied differently in both surveys. In the CPS, persons are asked a specific activity-type question relating to the week containing the 12th of the month. In sorting out the possible labor market-related activities into mutually exclusive groups, a priority scheme is necessary since some individuals may have been involved in more than one activity. The first or highest priority is assigned to working. As long as a person worked for pay or profit for one hour or more (or 15 hours or more without pay in a family operated business), the person is considered employed even though he or she may also have looked for work or gone to school or done something else during that week.

The next highest priority is given to those persons who had a job during the survey week, but were temporarily absent from it. Although this relaxes the activity concept slightly, it permits a more accurate counting of the numbers of persons with actual job commitments. These persons may have been on strike or ill or on vacation or absent for some other personal reason, but since they had a job to return to they are classified as employed.

The third priority is assigned to persons whose activity was looking for work.
If a person in the survey week neither worked nor had a job but looked for one at some time within the past four weeks (and was currently available to take one) he or she is considered unemployed. Once again the activity concept is relaxed to cover persons who may not have looked for work continuously because they were waiting to be recalled from layoff or were waiting to start a new wage or salary job within 30 days. These persons too would be classified as unemployed. All other individuals not fitting into this classification scheme are considered to be not in the labor force. Accordingly, the Nation's civilian noninstitutional population age 16 and over can be sorted into the familiar labor force categories shown below.

CPS Labor Force Categories

Civilian noninstitutional population age 16 and over
    Civilian labor force
        Employed
        Unemployed
    Not in the labor force

In SIPP persons are not asked a specific activity question relating to the previous four months because it is very possible (even more so than in the CPS) that an individual may have worked, looked for work, or done something else during the period. Instead, the initial question on the SIPP questionnaire concerns whether or not an individual had a job or business at any time during the previous four months. In other words, the activity concept is tied to an individual's either having or not having a job in the reference period. For those who had jobs, subsequent questions are asked about how long persons had their jobs in the reference period, whether they had been absent from them and why, and whether they looked for work or were on layoff when they did not have jobs. For persons who did not have jobs during the entire period, questions are asked if they looked for work or had been on layoff and if so, for how long.
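The CPS priority scheme described above can be illustrated with a short sketch. This is only an illustration under stated assumptions: the record layout, field names, and function name are hypothetical and do not come from the CPS questionnaire or its processing system; only the ordering of the tests and the one-hour, 15-hour, four-week, and 30-day thresholds are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class WeekActivity:
    """Hypothetical summary of one person's answers for the survey week."""
    paid_hours: float = 0.0                  # hours worked for pay or profit
    unpaid_family_hours: float = 0.0         # unpaid hours in a family operated business
    absent_from_job: bool = False            # had a job but was temporarily absent (strike, illness, vacation)
    looked_past_4_weeks: bool = False        # looked for work at some time in the past four weeks
    available_for_work: bool = False         # currently available to take a job
    awaiting_recall_from_layoff: bool = False
    new_job_within_30_days: bool = False     # waiting to start a new wage or salary job


def classify_cps(a: WeekActivity) -> str:
    """Apply the priority scheme: work first, then job absence, then jobseeking."""
    # First priority: any work for pay or profit, or 15+ unpaid family hours.
    if a.paid_hours >= 1 or a.unpaid_family_hours >= 15:
        return "employed"
    # Second priority: with a job but temporarily absent from it.
    if a.absent_from_job:
        return "employed"
    # Third priority: looking for work, or the two relaxed cases
    # (layoff awaiting recall, new job within 30 days), all subject to availability.
    seeking = (a.looked_past_4_weeks
               or a.awaiting_recall_from_layoff
               or a.new_job_within_30_days)
    if seeking and a.available_for_work:
        return "unemployed"
    # Everyone else is outside the labor force.
    return "not in the labor force"
```

Because the tests are applied in order, a person who both worked one hour and looked for work is counted as employed, which is the sense in which working takes precedence over jobseeking.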
Unlike the CPS where a priority scheme is required to classify individuals into mutually exclusive labor force categories, in SIPP individuals may have experienced more than one labor force status in the four month reference period. For example, a person may have had a job for the entire period but was temporarily laid off for one month of the period; or a person may have had a job for two months and then quit to look for another one; or a person may have been outside the labor force for a month, looked for a job for a month and then, having found one, worked for two months. Consequently, the SIPP labor force categories shown below reflect multiple labor force statuses.

SIPP Labor Force Categories

Noninstitutional population age 16 and over
    Persons with some labor force activity in period
        With a job the entire period
            Worked all weeks
            Missed some weeks
            Spent time on layoff
        With a job during part of the period
            Spent time looking for work or on layoff
        No job during the entire period
            Spent time looking for work or on layoff entire period
            Spent time looking for work or on layoff some weeks
    Persons with no labor force activity in period

A closer look at specific SIPP and CPS labor definitions is presented below. (The definitions are discussed under headings common in everyday usage and should not be construed as CPS-specific labor force terminology.)

Employment. In both surveys, employment is generally defined as working at a job or business for pay or profit at some time in the reference period. A job is considered to be an arrangement for regular work for pay where payment is in cash wages or salaries, at piece rates, in tips, by commission, or in-kind (meals, living quarters, supplies received). A business is defined as an activity which involves the use of machinery or equipment in which money has been invested or an activity requiring an office or "place of business" or an activity which requires advertising.
Payment may be in the form of profits or fees. Both surveys also consider persons to be employed when they have been absent from their jobs because of illness, vacation, bad weather, labor dispute, and various personal reasons. Unpaid family work is considered employment when it contributes to the operation of a farm or business run by a member of the same household. In the CPS, unpaid family work must have lasted for 15 hours or more during the reference week, but in SIPP there is no hours restriction.

Unemployment. The definitions of unemployment in both surveys are also very similar. In CPS, persons must have been without a job during the reference week and in SIPP they must have been without a job for all or part of the reference period; in addition, they must have been available for work and have undertaken some specific jobseeking activity. Jobseeking activity in CPS may have occurred any time in the previous four weeks, while in SIPP it may have occurred any time during the four months. If, in either survey, jobseeking occurred when the person was working, working would take precedence and the person would be considered employed.

Two exceptions to the above rule must be noted. The first is the case of the person who has a job but was laid off and the second is the person who is to begin a new wage or salary job within 30 days. Both persons are considered unemployed. In the CPS these persons must have been available for work, but in SIPP no availability test is applied.

Because the CPS is basically a labor force survey, it collects more information about the spell of unemployment than SIPP. For example, CPS gathers information on reasons for unemployment whereas SIPP does not. One can tell from CPS data whether a jobless person has become unemployed because of job loss, such as a layoff; quitting a job to search for another; entering the labor force for the first time; or re-entering the labor force.
In SIPP, the only group for whom the reason for being unemployed is known are those persons who report themselves as having jobs from which they are absent because of layoff. The CPS also asks about the method of job search and how long one has been searching or on layoff.

Labor force. The civilian labor force in the CPS is derived by adding the number of persons classified as employed during the reference week to the number who were classified as looking for work or on layoff. The "total" labor force is derived by adding to the civilian labor force an independent estimate of the Armed Forces stationed in the United States.

The labor force in SIPP (which includes members of the Armed Forces living in households but not in installations of the Armed Forces) is referred to as "Persons with some labor force activity." This represents the sum of persons who, during the four month reference period, may have been: employed during all weeks; unemployed during all weeks; employed and unemployed during all weeks; employed and outside the labor force during all weeks; unemployed and outside the labor force during all weeks; and employed, unemployed, and outside the labor force during all weeks. In other words, it includes anyone with some contact with the labor market in the four month reference period.

Unemployment rate. The unemployment rate from the CPS is one of the most well known statistics in the Nation. It is derived by dividing the number of unemployed persons by the civilian labor force (or total labor force). In SIPP a similar rate, or proportion, could be calculated. Unlike the CPS unemployment rate definition, however, the numerator in the SIPP definition is composed of persons who may have been unemployed during all weeks; employed and unemployed during all weeks; unemployed and outside the labor force during all weeks; and employed, unemployed, and outside the labor force during all weeks.
In other words, the numerator is composed of "Persons with some unemployment." Dividing the sum of these groups by persons with some labor force activity (the denominator) will yield the proportion with some unemployment.

Not in the labor force. In both the CPS and SIPP, persons who have had no association with the job market during the reference period (in SIPP, for all or part of the reference period) are considered outside the labor force. The CPS further identifies their major activity as in school, keeping house, unable to work, and so on. This is not done in SIPP.

The CPS inquires in the fourth and eighth rotation groups about previous work experience, intentions to seek work again, desire for a job, and reasons for not looking. This makes it possible to estimate the number of "discouraged workers." Discouraged workers in the CPS are defined as persons who want a job but are not seeking work currently because: 1) they believe no work is available in their line of work or area; 2) they could not find any work; 3) they lack the necessary schooling or training, skills, or experience; 4) employers think they are too young or old; and 5) they have other personal handicaps in finding a job, such as transportation problems.

An effort is made to identify discouraged workers in SIPP also, even though it is difficult to recall a state of mind. For those persons who did not work or look for work in at least part of the four month reference period but said they wanted a job and could have taken one, a question is asked as to why they were not looking. The reasons for not looking are very similar to those in the CPS questionnaire.

Hours of work. In the CPS a question is asked about the number of hours someone worked during the reference week at all jobs. This question is asked of all rotation groups and includes workers who have more than one job. In addition, in two of the eight rotation groups a question is asked about the hours "usually" worked at the worker's main job.
This information is part of the CPS data collected on workers' earnings. A similar set of questions is found in SIPP. Everyone who worked is asked about their usual weekly hours on all jobs during the four month period. Subsequent questions inquire about usual weekly hours for the primary job and any others.

Full-time and part-time employment. Full-time employment in both surveys is defined as employment of 35 hours a week or more while part-time employment is anything less than 35 hours. Both surveys seek the reasons for part-time employment, that is, whether it was due to economic reasons or other factors. Economic reasons include slack work, material shortages, repairs to plant or equipment, start or termination of a job during the week, and the inability to find full-time work. "Other" reasons include labor disputes, bad weather, one's own illness, vacation, keeping house, no desire for full-time work, and full-time worker during only part of the season. In the SIPP questionnaire the reasons for part-time employment are not as numerous but it is still possible to identify some economic reasons for part-time employment.

Uses of SIPP Labor Force Data

SIPP was designed primarily as an income survey and the data from it will be used to address issues related to income security and social welfare programs. With the inclusion of questions on labor force activity, however, this survey has potential for labor force analysis and topics related to it. In addition, because of SIPP's sample design both cross-sectional and longitudinal data can be obtained from the survey, providing analysts with more flexibility in their analyses. For example, it is possible to calculate monthly averages of the labor force data from SIPP waves since labor force activity is tracked (week-by-week) over a four month period; on the other hand, by linking all the SIPP waves it is possible to follow the labor force activity of individuals over two and one-half years.
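The week-by-week tracking described above, together with the "persons with some unemployment" proportion defined earlier, can be illustrated with a small sketch. It is only an illustration under stated assumptions: the status codes "E", "U", and "N" and the function names are hypothetical and do not correspond to actual SIPP file codes, and the calculation is unweighted, whereas published estimates would apply sample weights.

```python
# Hypothetical weekly status codes (not the codes used on SIPP files):
# "E" = employed, "U" = unemployed, "N" = outside the labor force.
EMPLOYED, UNEMPLOYED, OUTSIDE = "E", "U", "N"


def summarize(weekly_status):
    """Return (has_labor_force_activity, has_unemployment) for one person.

    weekly_status is a sequence with one status code per week of the
    four month reference period.
    """
    statuses = set(weekly_status)
    has_activity = bool(statuses & {EMPLOYED, UNEMPLOYED})
    has_unemployment = UNEMPLOYED in statuses
    return has_activity, has_unemployment


def proportion_with_some_unemployment(people):
    """Persons with some unemployment divided by persons with some
    labor force activity (unweighted). Returns None if nobody had any
    labor force activity."""
    with_activity = 0
    with_unemployment = 0
    for weeks in people:
        activity, unemployment = summarize(weeks)
        with_activity += activity
        with_unemployment += unemployment
    if with_activity == 0:
        return None
    return with_unemployment / with_activity


# Illustrative, made-up records: 17 weekly codes per person.
sample = [
    "E" * 17,             # employed during all weeks
    "E" * 10 + "U" * 7,   # employed and unemployed
    "N" * 17,             # outside the labor force during all weeks
    "U" * 5 + "N" * 12,   # unemployed and outside the labor force
]
rate = proportion_with_some_unemployment(sample)
```

In this made-up sample three persons have some labor force activity and two of them have some unemployment, so the person who was outside the labor force for the whole period drops out of both numerator and denominator.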
While the CPS will continue to be the primary source of information on the country's labor supply and the current unemployment situation, SIPP labor force data will complement the basic CPS information in many ways. The following is a discussion of some of the applications of SIPP cross-sectional and longitudinal labor force data.

Labor market related economic hardship. For many years economists have tried to measure the economic hardship caused by labor market problems, whether they be demand oriented (unemployment due to insufficient jobs) or supply oriented (low wages because of insufficient skills and education). The economic literature contains many references to subemployment indices, employment and earnings inadequacy indices, and labor market hardship measures of one variety or another. 16/ The NCEUS in 1979 examined this subject and recommended that the BLS publish an annual report "... containing measures of different types of labor market related economic hardship resulting from low wages, unemployment, and insufficient participation in the labor force." 17/ Using data from the March CPS, the BLS has produced such reports but they are not as comprehensive as they might be because of data limitations (for example, neither the hourly earnings for part-year workers nor the problems of discouraged workers are discussed). 18/

SIPP labor force and income data should be able to fill the gap. For example, one cross-sectional table specification might show employment problems incurred by individuals cross-classified by their position in the household income distribution. Problems of unemployment, low hourly wages (below the Federal minimum), discouragement, and involuntary part-time employment could be isolated to help in formulating applicable policies. This information, in combination with income information, is available on a current basis only from SIPP.

Labor mobility and turnover.
Given the longitudinal feature of SIPP's sample design, not only can the income flows and program participation activities of individuals be monitored for two and one-half years (and periods of shorter duration), but so can their labor force activities. At the time of each SIPP interview, information is obtained on the labor force activity of each household member age 15 and older during the prior four months. Any changes in labor force status during this period are reflected in the data. Stitching together the data collected in each of the eight or nine interviews will provide data users with a profile of labor market activity for a two and one-half year period.

One change in labor force status that labor economists have been interested in recently is the one which occurs after a spell of unemployment. Some have argued that many outcomes of spells of unemployment are withdrawals from the labor force and not reemployment. For example, two economists, using CPS gross flow data, estimated that 45 to 50 percent of all unemployment spells end by labor force withdrawal. 19/ Other economists have argued that the relative shortness of the average unemployment duration shows that persons can quite easily find their usual type of employment in a short period of time. 20/ With SIPP labor force data it will be possible to identify job terminations, observe spells of unemployment, and determine not only their durations, but their outcomes.

SIPP labor force data should also be useful in calculating rates of job separation and accession. The measurement of the amount of job separation and accession is an important element in understanding our basic employment and unemployment statistics.
Since the discontinuance of the BLS's labor turnover series, researchers have been hard pressed to find other data sources which would shed light on the dynamics of the labor market. 21/ While it will not be possible from the SIPP labor force data to identify the precise nature of the separation (layoff, quit, discharge) or accession (new hire, recall), aggregate separation and accession rates could be calculated. These rates could be monitored over the business cycle.

Summary

SIPP is principally an income survey, but it contains questions on labor force activity as well. SIPP labor force data will supplement the labor force information from the CPS, the Federal government's official source of labor force statistics. Like the CPS, SIPP uses an activity concept for sorting the Nation's population into those persons involved in the job market and those who are not. A major difference between the two surveys is the length of the reference periods; in the CPS it is one week and in SIPP it is four months. The different length of time for which labor market activities are surveyed will be an important factor in SIPP and CPS labor force comparisons. Nevertheless, while the CPS will continue to tell us how many persons are employed and unemployed, SIPP will be able to tell us how well the labor market is providing for these workers and their households.

NOTE: In addition to the references footnoted, the SIPP Interviewer's Manual and CPS Interviewers Reference Manual were used in the preparation of this paper.

1/ For those interested in the origins of the activity concept, see John N. Webb, "Concepts Used in Unemployment Surveys," Journal of the American Statistical Association, March 1939, pp. 49-61. For a history of the CPS, see John E. Bregger, "The Current Population Survey: a historical perspective and BLS' role," Monthly Labor Review, June 1984, pp. 8-14.
2/ See Counting the Labor Force, National Commission on Employment and Unemployment Statistics (U.S. Government Printing Office), Labor Day, 1979. An earlier review was made in the early 1960's. See Measuring Employment and Unemployment, President's Committee to Appraise Employment and Unemployment Statistics (U.S. Government Printing Office), 1962.

3/ See Counting the Labor Force, p. 2.

4/ See Martynas A. Ycas and Charles Lininger, "The Income Survey Development Program: Design Features and Initial Findings," Social Security Bulletin, November 1981, pp. 13-19.

5/ See Counting the Labor Force, p. 1.

6/ Labor force estimates from the ISDP were compared to the CPS by Bruce Klein. He found that employment estimates in ISDP were slightly higher than in CPS and unemployment estimates in ISDP were considerably lower than in CPS. One reason for the latter was that persons on layoff were not counted in the ISDP. See Bruce W. Klein, "Comparing Labor Force Measures in ISDP with CPS," Technical, Conceptual, and Administrative Lessons of the Income Survey Development Program (ISDP), Papers presented at a Conference (Martin H. David, ed.), Social Science Research Council, New York, New York, 1983, pp. 229-239.

7/ For a discussion of the longitudinal nature of the CPS see Using the Current Population Survey as a Longitudinal Data Base, Report 608, Bureau of Labor Statistics, August 1980.

8/ See Barbara Bailar, "The Effects of Rotation Group Bias on Estimates from Panel Surveys," Journal of the American Statistical Association, Vol. 70, March 1975.

9/ See The Current Population Survey: Design and Methodology, Technical Paper 40, Bureau of the Census, January 1978, p. 83.

10/ Ibid., p. 87.

11/ See John Coder and Angela Feldman, "Early Indications of Item Nonresponse on the Survey of Income and Program Participation," a paper to be presented at the 1984 Joint Statistical Meetings, Philadelphia, Pa., August 1984.
12/ See, for example, Francis W. Horvath, "Tracking Individual Earnings Mobility With the Current Population Survey," Monthly Labor Review, May 1980, pp. 43-46.

13/ A similar situation prevails in the March supplement to the CPS where persons are asked about their work experience in the previous calendar year.

14/ The retrospective bias in the March CPS work experience data has been the topic of research in recent years. For example, see Richard D. Morgenstern and Nancy S. Barrett, "The Retrospective Bias in Unemployment Reporting by Sex, Race, and Age," Journal of the American Statistical Association, June 1974, pp. 355-357. For more recent research see Francis W. Horvath, "Forgotten Unemployment: Recall Bias in Retrospective Data," Monthly Labor Review, March 1982, pp. 40-43.

15/ The Bureau of Labor Statistics calculates a similar rate from the annual work experience data collected in the supplement to the March CPS. It is referred to as "the percent with unemployment."

16/ For example, see Herman P. Miller, "Subemployment in Poverty Areas of Large U.S. Cities," Monthly Labor Review, October 1973, pp. 10-17; Sar A. Levitan and Robert Taggart, Employment and Earnings Inadequacy: A New Social Indicator (Baltimore: The Johns Hopkins University Press, 1974); T. Vietorisz, R. Mier, and J. Giblin, "Subemployment: Exclusion and Inadequacy Indexes," Monthly Labor Review, May 1975, pp. 3-12; and Robert Taggart, Hardship--The Welfare Consequences of Labor Market Problems: A Policy Discussion Paper, The W.E. Upjohn Institute for Employment Research, 1982.

17/ See Counting the Labor Force, p. 60.

18/ The latest BLS report is Linking Employment Problems to Economic Status, Bulletin 2201, U.S. Department of Labor, Bureau of Labor Statistics, June 1984.

19/ See Kim Clark and Lawrence Summers, "Labor Market Dynamics and Unemployment: A Reconsideration," Brookings Papers on Economic Activity, No. 1, 1979, pp. 13-72.
20/ See Martin Feldstein, "The Importance of Temporary Layoffs: An Empirical Analysis," Brookings Papers on Economic Activity, No. 3, 1975, pp. 725-744.

21/ For a statement of the need for labor turnover data, see Robert E. Hall and David M. Lilien, "The Measurement and Significance of Labor Turnover," in Counting the Labor Force, Appendix Vol. 1 (Concepts and Data Needs), National Commission on Employment and Unemployment Statistics (Washington, D.C., 1979), pp. 577-600. For an example of separation data created from the CPS see S. Haber, E. Lamas, and G. Green, "A New Method for Estimating Job Separations by Sex and Race," Monthly Labor Review, June 1983, pp. 20-27, and Alan Eck, "New Occupational Data Improve Replacement Estimates," Monthly Labor Review, March 1984, pp. 3-10.

MATCHING ECONOMIC DATA TO THE SURVEY OF INCOME AND PROGRAM PARTICIPATION: A PILOT STUDY

Sheldon E. Haber, The George Washington University
Paul M. Ryscavage, Douglas K. Sater, Victor M. Valdisera, The Bureau of the Census

The new Survey of Income and Program Participation (SIPP) will undoubtedly become a major source of data on a wide variety of aspects of the well-being of our nation's households, families, and individuals. The very richness of SIPP suggests the desirability of augmenting it with micro-level establishment and enterprise data from the economic censuses and other data files maintained by the Bureau of the Census, since the marginal cost of merging these data with SIPP is relatively small and the potential gain in knowledge is very large. One area where the payoff relative to cost of enhancing SIPP is sure to be substantial and significant is that pertaining to the behavior of labor markets.
A list of some of the areas in which a SIPP-economic data file can yield new insights includes the following topics:

  The relationship between capital and wage rates
  Labor mobility
  Low wage workers and low wage firms
  Measuring the effects of minimum wage legislation
  Structural unemployment
  Identifying high tech workers and high tech firms
  Implications of the transition from a goods to a service economy
  Unions and the labor market
  The substitutability of capital and labor
  Productivity analysis

Besides the substantive knowledge to be gained by merging SIPP demographic and economic data, there are externalities associated with merging these data sets. First, it will be possible to verify the accuracy of the size of firm estimates given by respondents in survey data. An additional, indirect benefit of linking SIPP and economic data stems from the fact that the former is a representative sample of the working population. Matching on work place will yield a stratified sample of firms where the probability of selection is inversely proportional to firm size. By weighting the number of firms in each size group, estimates for the entire population of firms can be derived. The sample of employers would be contained in a single data set, with the same format across employers, versus the diversity of data sets in which the economic data are now found. These advantages plus the manageable size of the sample should provide valuable insights into the structure of production within and across sectors of the economy at a point of time and over time.

1. SIPP and the Economic Data Files

In merging demographic and economic data, it is necessary to know the information contained in the various files to be linked and how each file is constructed. In this section, we briefly describe four data sets which might be incorporated into a SIPP-economic data file. SIPP contains demographic and program related data.
Economic data are found in the Standard Statistical Establishment List (SSEL), the Longitudinal Establishment Data (LED) file, and the enterprise statistics (ES). The SSEL covers all establishments and companies with employees and yields current information on employment and payroll. The LED, as its name implies, contains longitudinal data but is restricted to manufacturing establishments. The ES, on the other hand, covers companies in the construction, mineral, manufacturing, wholesale trade, and retail trade industries, and most service industries.

The SSEL is a complete directory of establishments in single and multi-establishment enterprises with one or more employees, irrespective of industry. The SSEL links parent companies, subsidiaries, and their establishments. It contains information on approximately 4.7 million enterprises and 5.7 million establishments.

The importance of the SSEL is that it is a current file containing a complete list of establishments and companies with paid employees. While the SSEL contains a narrow range of economic data, these data impart valuable information. For example, the SSEL contains the address of the physical location of establishments, which is useful for merging the demographic and economic data, since it is a primary link in identifying an individual's place of work. Employment and payroll figures yield an estimate of average annual earnings, thereby indicating whether an employer is a low or high wage employer. 1/ Sales and employment figures provide a proxy measure of productivity. Operational status information can be utilized to identify establishments which have become inactive. Additionally, the SSEL contains longitudinal information. Currently, establishment and company data are carried for two years in the SSEL.

The LED is a longitudinal micro-data base containing data at the establishment level from the Annual Survey of Manufactures and the Census of Manufactures.
The LED provides a much broader range of information about establishments than the SSEL. For each manufacturing establishment, value added per production worker, which is a measure of labor productivity, can be calculated. For the larger establishments with 250 or more workers, information is available on depreciable assets and rented machinery so that capital/labor ratios can be computed. Also, a better measure of labor compensation, including fringe benefits, can be obtained.

Like the Census of Manufactures, the enterprise statistics (ES) are collected every five years. These data cover enterprises whose primary activity is in an in-scope industry. For each enterprise, the data are consolidated over all operating units. The information contained in the ES is similar to that in the Census of Manufactures, except that fringe benefit, asset, and related data are available only for companies with 500 or more workers.

2. Some Applications of Micro-Demographic and Economic Data

In this section, three applications of a SIPP-economic data file are discussed to illustrate how this data set can help explain policy issues relating to earnings and employment.

A. Low Wage Workers and Low Wage Firms

While survey data such as the CPS provide insights into the characteristics of low wage workers, they provide no information about low wage firms. Despite this lack of information, an a priori description of low and high wage firms can be formulated. All else being the same, low wage firms will be labor intensive and, hence, tend to be smaller than high wage firms. And because recruitment and hiring costs relative to the level of wages will tend to be high, low wage firms will also advertise less for labor and employ fewer screening devices to weed out unsuitable workers; thus, their work force will be of lesser quality than that of their high wage counterparts.
Less qualified workers, on the other hand, e.g., younger workers and those who are less educated, will be attracted to low wage firms because their marginal product is less than that required to gain employment in high wage firms. More generally, workers with given characteristics and tastes sort themselves among firms with similar requirements for labor.

Corresponding to the greater prevalence of low quality workers in low wage firms, one might expect that in these firms (vis-a-vis high wage firms) a higher proportion of capital expenditures is for used rather than new machinery and equipment; likewise, the proportion of depreciable assets retired each year is likely to be smaller in such firms. Furthermore, given that labor is of lesser quality and capital is of an older vintage, it would not be surprising if value added per worker were relatively low in low wage firms.

Other characteristics are more easily seen by focusing on high wage firms. To the extent that high wage firms are capital intensive, their need for trained workers is likely to be greater than that of low wage firms. Capital intensiveness suggests greater use of resources to monitor output; hence, a higher proportion of the work force may be needed in supervisory positions. To reduce turnover, which disrupts the production process, high wage firms will substitute future benefits in the form of pensions for current benefits in the form of wages. A SIPP-economic data file would permit verification of these hypotheses.

Information about low and high paying firms is important for another reason besides the light it sheds on how production is organized in these two types of firms. Since low paying firms are a source of employment for workers with relatively low productivity, it is of some interest to inquire into the extent to which low pay among workers is attributable to their employment in such firms.
In approaching the question of why some workers are paid less than others in this manner, low wage employers can be viewed as providing employment opportunities with attendant low earnings, not because they discriminate against certain groups of individuals, but because the production processes that are most efficient for their mode of operation do not require high quality labor and, furthermore, inhibit the payment of high wages.

A procedure for verifying this view would be to sector firms according to whether they are low paying or high paying. With this sectoring of firms, one would expect, as indicated above, that the mix of workers and capital is dissimilar between the two sectors. Assuming this is so, to what extent are differences in individual earnings in low and high paying firms due to the characteristics of the workers and capital employed in each type of firm? Also, to what extent are workers with similar characteristics remunerated in the same way in each type of firm? The answers to these questions can be obtained from a SIPP-economic data file.

B. Structural Unemployment

An issue of long standing is what happens to workers who are displaced from their job as a result of structural disequilibria. How long do they remain unemployed vis-a-vis other workers who separate from an employer? What sources of income, including cash and noncash government transfers, do they draw on when they are unable to find work? When they find a job, how do earnings in the new job compare to earnings in the old one? If there is an earnings loss, how much of this loss is recouped, say, after 2 years?

A major problem in answering these questions is that workers do not know if they are structurally unemployed. One way of identifying such workers is to ascertain what has happened to the firms in which they were last employed.
If the firm has undergone a substantial decline in employment or has closed down for a relatively long period of time, say, longer than the typical recession, one may presume that it has undergone a shock which is typical of the shocks experienced by firms subject to structural disequilibria. It also can be presumed that the employees of these firms experience the aftereffects of such shocks. A SIPP-economic data file would enable one to determine the extent to which firms are subject to severe, long-term shocks as evidenced by plant closures and substantial reductions in employment, and how such shocks affect their work force.

C. High Tech Workers and High Tech Firms

Despite the importance of new technologies for improving productivity, there is no widely accepted definition of a high tech industry. Based on a definition which includes industries with a ratio of technology-oriented workers 2/ to all workers of at least 1.5 times the industry-wide average, Riche, Hecker, and Burgan (1983) estimate that 13.4 percent of all wage and salary workers were employed in high tech industries in 1982.

High tech industries have been cited as having a large group of high and low wage workers whereas other industries are comprised of workers who are concentrated in the middle of the earnings distribution. While it is useful to know how workers in high tech and other industries differ and the differential growth of employment in the two kinds of industries, it is equally important to know the characteristics which differentiate high tech from other firms and the differential in the rate of growth of the two types of firms.

As is self-evident, not all firms in high tech industries utilize the latest technology, and new techniques of production are utilized by firms in industries besides those labeled as high tech.
One approach to distinguishing between the two types of firms would be to compare the characteristics of the industries denoted on a priori grounds as high tech with other industries and then to use this information to identify high tech firms. To illustrate this approach, assume that the a priori criterion used to denote high tech industries is the one noted above, namely, that the ratio of high tech to all workers in a given industry, relative to the similar ratio for all industries, is higher than some minimum value. Assume also that the high tech industries exhibit high values of the following ratios: capital expenditures for new computers to all capital expenditures, capital expenditures to asset value, and capital to labor. Given a set of characteristics which permit the bifurcation of industries, the multivariate technique of cluster analysis can then be applied to identify high tech firms within both high tech and other industries.

The outcome of the cluster analysis is a partitioning of firms into categories, i.e., high tech and nonhigh tech firms, as determined by the data, where each cluster of firms represents a homogeneous set of observations. An advantage of applying the aforementioned two-stage procedure using a SIPP-economic data file is that it provides an independent test of how well the procedure works. For if the approach is successful, the proportion of workers who are technology-oriented among the firms classified as high tech (taken as a group) will be higher than the similar proportion for firms classified as nonhigh tech (again, taken as a group), and the difference in proportions will be greater than the corresponding difference when industries are classified as high tech and nonhigh tech.

Having identified high tech firms, in contrast to high tech industries, insights can then be obtained as to how production processes in these firms differ from their nonhigh tech counterparts.
At the same time, it will enable one to better define high tech occupations and how workers in these (and other) occupations in high tech firms differ from similar workers in nonhigh tech firms.

3. The Pilot Study

A principal part of the pilot study is designed to assess the availability, sources, coverage, and content of the various economic data files maintained by the Bureau of the Census and to explore study areas and issues to which a data set combining micro-worker and firm data would be applied. In the course of this study, specific demographic and economic variables have been identified which should be incorporated into such a data set. Additionally, it was anticipated that methodological problems inherent in this undertaking would be revealed; indeed, this has been the case.

A second phase of the pilot study is to investigate the efficiency of four alternative methods of identifying an individual's employer. Each method is based on different information for searching the SSEL and identifying the employer's census file number (CFN). The first utilizes information on employer name, the state of residence and/or zip code of the employee, and census industry code. The same information is used in the second method; however, additional reference materials, e.g., 1980 Census Company Name and Place of Work lists, Dun and Bradstreet reference books, Standard and Poor directories, and telephone books, will be used to obtain the exact address of an individual's employer. The third method uses the employer's name and exact address if known. In the last method, if the employer's identification number (EIN) is known, it is used in conjunction with the information available in the first three methods to identify the employer's CFN. For each method, match rates and cost information will be developed for a small sample of workers.
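The first matching method and the match rate it yields can be sketched as follows. This is a minimal illustration, not the Bureau's procedure: the record layouts, field names (`name`, `state`, `industry`, `cfn`, `employer_name`), the suffix list used to normalize employer names, and the sample values are all assumptions made for the example, and a worker is counted as matched only when exactly one SSEL candidate is found.

```python
import re

# Hypothetical list of corporate suffixes ignored when comparing names.
SUFFIXES = {"INC", "CO", "CORP", "COMPANY", "LTD"}


def normalize_name(name):
    """Uppercase, strip punctuation, and drop common corporate suffixes."""
    cleaned = re.sub(r"[^A-Z0-9 ]", "", name.upper())
    tokens = [t for t in cleaned.split() if t not in SUFFIXES]
    return " ".join(tokens)


def build_index(ssel_records):
    """Index SSEL-like records by (normalized name, state, industry code)."""
    index = {}
    for rec in ssel_records:
        key = (normalize_name(rec["name"]), rec["state"], rec["industry"])
        index.setdefault(key, []).append(rec["cfn"])
    return index


def match_worker(worker, index):
    """Return the CFN if exactly one candidate matches, otherwise None."""
    key = (
        normalize_name(worker["employer_name"]),
        worker["state"],
        worker["industry"],
    )
    candidates = index.get(key, [])
    return candidates[0] if len(candidates) == 1 else None


def match_rate(workers, index):
    """Share of workers for whom a unique employer was identified."""
    if not workers:
        return 0.0
    matched = sum(1 for w in workers if match_worker(w, index) is not None)
    return matched / len(workers)


# Made-up records for illustration only.
ssel = [
    {"name": "Acme Bottle Company, Inc.", "state": "PA", "industry": "261", "cfn": "0001"},
    {"name": "Quick Burger", "state": "PA", "industry": "641", "cfn": "0002"},
    {"name": "Quick Burger", "state": "PA", "industry": "641", "cfn": "0003"},
]
workers = [
    {"employer_name": "ACME BOTTLE CO.", "state": "PA", "industry": "261"},
    {"employer_name": "Quick Burger", "state": "PA", "industry": "641"},
    {"employer_name": "Unknown Mills", "state": "PA", "industry": "142"},
]
index = build_index(ssel)
rate = match_rate(workers, index)
```

The second worker illustrates why the later methods add an exact address or EIN: name, state, and industry code alone leave two candidate establishments, so no unique CFN can be assigned.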
A third phase of the study is the construction of a pilot SIPP-economic data file in which the SIPP portion of the file would be restricted to full-time workers in large manufacturing establishments; the source of the economic data would be the LED. The objective in this phase is to calculate match rates between workers in SIPP and their establishments in the LED.

Given the importance of the wage determination process, one of the areas noted above, e.g., low wage workers and low wage firms, would be studied when the pilot work file is completed. Demonstration of the utility of this research endeavor in terms of its contribution to the economic literature would constitute the final phase of the pilot study.

4. Methodological Problems in Matching Demographic and Economic Data

In this section, attention is focused on two methodological problems. The first problem deals with procedures for tying workers to their establishment and company. The second relates to the estimation of data, in particular, asset and fringe benefit data, which although available for large establishments and companies, are generally not collected for small ones.

Central to the creation of a SIPP-economic data file is the ability to determine the establishment and/or company in which a person is employed. The most promising and least expensive way of doing this is to match on firm name and physical address of an individual's place of work. This information will be available in SIPP and is available in the SSEL. Although the physical address is not necessary for identification of an individual's work place, its availability greatly facilitates such identification since a firm may have more than one establishment in a local area. For employers with only one establishment in an area, the firm name and employee's address will typically be sufficient to determine where a person is employed.
As noted, for companies with more than one establishment in an area, the firm name and employer's address should be sufficient to identify the place of work.

If an employer has more than one establishment in an area and the place of work cannot be determined using the employer's physical address, or no address is available, other information in SIPP can be utilized. Firm name, respondent's address, census industry code, and respondent's estimate of size of establishment and company can be used to identify a person's work place. For example, it is unlikely that a firm manufacturing bottles will have more than one large plant in a local area.

Another aid in identifying an individual's work place is the EIN. While a company may have a number of establishments in a local area, its subsidiaries, when identified by their own EIN, may have only one establishment in the area. Thus, the EIN of the employer for whom an individual works can be sufficient to uniquely determine the establishment in which that person is employed.

In the event that a unique work place cannot be determined for a multi-establishment firm, the employer's characteristics can be imputed. Data from the SSEL on number of employees and payroll can be averaged over a company's establishments in a local area. From the ES file, average values can be computed for variables not contained in the SSEL. For example, the average capital/labor ratio for a company with a chain of fast-food stores can be used as an estimate of the capital/labor ratio for each store in the chain.

Where it is not possible to identify a worker's firm by name in the SSEL, imputations can be made by averaging over establishments in the same local area and with the same census industry code as that of the given employer. Additionally, it may be possible to refine the imputation process by considering information contained in SIPP, e.g., the size of establishment in which an individual works and whether the firm has one or more than one establishment.

As indicated, information on assets and fringe benefits is not generally available for small establishments, but such information is available for a large sample of small establishments in manufacturing. Despite the fact that asset information is not collected for many of the firms in which individuals work, the use of an economic model, including industry, firm size, and other variables, may enable one to obtain reasonably accurate estimates of capital for small establishments.

Economic theory suggests a number of relationships which influence the amount of capital that a firm employs in its production process. In particular, since capital intensity varies with establishment size in closely related industries, it seems reasonable to assume that information about the number of employees in an establishment can be used to further refine estimates of its capital assets. All else being the same, one would expect the smaller an establishment, the lower would be its capital/labor ratio. 3/ Additionally, holding everything else constant, including establishment size, low wage establishments will substitute labor for capital in order to economize on the use of the relatively expensive factor, i.e., capital. Thus, low wage establishments will tend to have a lower capital/labor ratio than high wage establishments. Even among establishments of the same size whose wage rate is also the same, one would expect a lower capital/labor ratio the higher the proportion of production workers among all workers.
When the proportion of production workers among all workers is high, or conversely, when the percentage of workers who supervise production is low, this comes about because a firm has few assets, relative to labor, to monitor.

Additional relationships between assets and other variables may exist. For example, it may be that newer establishments in an industry are more capital intensive than older ones; likewise, regional variations in entrepreneurial ability may give rise to corresponding variations in capital intensity. Besides economic relationships, engineering relationships also may be useful in estimating capital intensity. For example, it is plausible that an establishment's capital/labor ratio is positively related to purchased electricity per employee.

Finally, an economic model can also be utilized to estimate fringe benefits for small establishments and small companies. It is plausible to assume that fringe benefits in a firm are related to its size, average wage level, legal form of organization, industry, and region where it is located. With a SIPP-economic data file more refined estimates of fringe benefits per employee can then be obtained by taking account of the percentage of employees who are covered by life and medical insurance and a private pension plan in a given group of firms, say, (small) high paying establishments in manufacturing. Given this information, the average value of these benefits per covered and noncovered worker can be calculated for each establishment in the group. An economic model can also be developed to estimate fringe benefits per covered and noncovered worker in small establishments and companies. With appropriate information in SIPP, such estimates could provide a basis for imputing an important component of private noncash benefits to individual workers.
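The cell-averaging imputations discussed in this section can be sketched in a few lines. The sketch below is illustrative only: the record layout, the field names (area, industry, kl_ratio, employees), and the function names are assumptions made for the example and are not features of the SSEL, the ES file, or SIPP. It averages the capital/labor ratio over donor establishments in the same local area and census industry code and then, following footnote 3, multiplies the imputed ratio by the number of workers to estimate assets.

```python
from collections import defaultdict


def cell_means(donors, keys, value):
    """Average `value` over donor records within each cell defined by `keys`."""
    totals = defaultdict(lambda: [0.0, 0])  # cell -> [running sum, count]
    for d in donors:
        cell = tuple(d[k] for k in keys)
        totals[cell][0] += d[value]
        totals[cell][1] += 1
    return {cell: s / n for cell, (s, n) in totals.items()}


def impute_assets(establishment, means, keys):
    """Imputed assets = cell-mean capital/labor ratio x number of employees.

    Returns None when the establishment's cell has no donors, so the caller
    can fall back to a coarser cell or leave the item missing.
    """
    cell = tuple(establishment[k] for k in keys)
    ratio = means.get(cell)
    if ratio is None:
        return None
    return ratio * establishment["employees"]
```

The same cell means could be refined by adding a size class or a one-versus-many-establishments indicator to `keys`, in the spirit of the refinements suggested in the text.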
Although it should be evident from the discussion in this paper, this last illustration, the imputation of private noncash benefits to individual workers, is indicative of the benefits to be derived from a SIPP-economic data file.

FOOTNOTES

1/ The data referenced in this section as well as the remainder of the paper are available in the economic and (where applicable) in the SIPP data collected by the Bureau of the Census.

2/ Defined as engineers, life and physical scientists, mathematical scientists, engineering and science technicians, and computer specialists.

3/ An estimate of a firm's assets can then be obtained by multiplying the capital/labor ratio estimate by the number of workers in its employ.

Riche, Richard; Hecker, Daniel; and Burgan, John, "High Technology Today and Tomorrow: A Small Slice of the Employment Pie," Monthly Labor Review, November 1983, pp. 50-58.

COMMENTS BY MARTIN DAVID, University of Wisconsin-Madison

1. Some Problems Common to All Three Papers

Selection

The SIPP is subject to a different kind of selectivity than earlier cross section surveys with which we are familiar. Understanding that selectivity is an important part of the research that must be carried out on SIPP before we can be comfortable with its results. The selectivity that affects migration and labor force measurement comes from the rules for following movers, attrition, rules for handling first wave household non-response, and treatment of the sample universe. The selectivity that affects matching comes largely from the design of economic data samples. Each of these problems will be discussed below.

Truncation of Measures

A number of devices in the questionnaire result in logical exclusion of certain kinds of data. The respondents report detail only on two employers; student status is revealed only for some persons; income is fully reported only for some children. This truncation is highly pertinent to migration and labor force measures; less important for matching.
Attitudinal measures

The SIPP is almost entirely lacking in attitudinal measures. We need to ask to what degree subjective measures are important for the three areas covered by the papers above.

Left censoring

Although the SIPP records much current information, the duration of status at Wave 1 is conspicuously lacking. We do not know for how long the respondent was (un)employed prior to the first interview; for how long (s)he has held the current position; or for how long (s)he has resided in the current residence. Each of these items must be known to understand episodes or events in the lives of real people and will be important conditioning information for analyses of SIPP.

2. Issues pertaining to Migration

Dahmann correctly stresses the unique potentialities of SIPP for migration statistics. Unlike earlier devices for measuring migration, the SIPP gives high resolution to the timing of migration events, and associated economic and demographic circumstances before and after the migration event.

By contrast retrospective reports of migration are severely biased, because measures of migration are only obtained for those individuals who remain in the household universe. Persons who emigrate and persons who die are excluded. If their migration experiences are different from the remaining household population, retrospective measures are distorted.

Unfortunately SIPP is not a perfect instrument for recording migration because the sample systematically censors certain types of migration. Firstly, the sample is not updated with new addresses until after 12 months of operation. Thus immigrants to the household population are not represented, unless they move into existing households. This contrasts with the design of the CPS where the sample is renewed (a) by the inclusion of a new rotation in each month and (b) by the turnover of families in dwellings which results in new households being sampled at old addresses.
One hopes that the Bureau will estimate the extent of this problem, which affects both immigrants to the U.S. population and immigrants to the household universe in the 12-month period between additions to the panel addresses.

A second problem results from following rules. Movers are followed to locations within 100 miles of the sample PSUs. This rule covers 95% of the population, but clearly selects particular types of moves. The city dwellers who tire of urban life are less well represented than persons relocating because of new employment possibilities. The Bureau needs to assess the impact of alternative following rules on the types of migration represented. CPS excludes all migration events; ISDP follows migration to the 50-mile radius; SIPP follows more extensively. These rules should be simulated on the retrospective reports of migration so that differences between the measures can be understood as differences in the population sampled and differences in response for identical populations. We need a careful evaluation.

The following rules also preclude the following of sample members into the institutional population. This rule is at odds with the principle that SIPP is a longitudinal panel of individuals living at selected addresses in Wave 1. It is understandable to question whether the instrument designed for the household population can be applied in other settings. However, some of the most important economic transitions may accompany shifts out of the household universe. The design of SIPP needs to be alert to that possibility. It must also permit measurement of persons re-entering the household universe during the life of the panel.

I heartily support Dahmann's plea for a special module of questions that is triggered by observed migration. We need to know the proximate causes of migration, the costs associated with moves, and planning for moves. No vehicle could be more appropriate for such questions than SIPP.
It has a sufficient sample size so that the migrating population is large enough to study, and it can measure attitudes about migration at a time when people can easily remember.

3. Labor Force Measurements

Selection

The SIPP will be less selective of longitudinal labor force measures than matched CPS samples precisely because the CPS does not follow any movers. (In addition, CPS does not maintain adequate control over the identities of individuals so that matching is inferential, rather than positively based on an identifier such as the Social Security number.) Nonetheless selection remains a problem for SIPP. The failure to sample immigrants mentioned above, and the failure to follow household non-response from the first wave both cause systematic selection from the household population at later points in time.

The designers of the PSID express as their greatest regret that they did not attempt to follow refusals and obtain conversions. Since we do not have any data on conversion rates following a four-month lapse of time, it is hard to judge the success of such an effort. It is clear, however, that persons who refuse at any time are a portion of the population that requires diligence to sort out because they are likely to differ from cooperative respondents.

Truncation

In order to understand the implications of the complex skip structure of the labor force questions I tested the sequence on the assumption that the respondent is a student interviewed in May at the end of the school year. If he was not in the labor force during the reference period, there is no place to record student status. If the student had any work during the school term and was not underemployed, school status is not revealed. Only if additional work is desired and unavailable do we discover student status as a rationalization for not working and not looking for work. This asymmetry will make it difficult to interpret information on youth.
Another instance of truncation is revealed by the interviewer instructions. Reports of persons engaged in unpaid work are recorded only when that effort is expended in connection with the business of a family member. Work for relatives outside the household and voluntarism (for any organization) are not captured.

These problems are mentioned because they point to a set of conventions established in connection with CPS data that need not be blindly applied to an innovative survey such as SIPP. The purpose of SIPP is to understand broadly the economic roles of individuals and the relationship of those roles to Federal and State programs for income protection. It is widely recognized that a student role is vital to protecting the capacity of our economy to produce in the future, and that individuals investing in training are most often contributing to their own future economic well-being. It does not make sense to construe SIPP so narrowly that student roles cannot be understood both as complements to work roles and as an exclusive status.

Unpaid work roles are also nontrivial. Persons engaged in those roles obtain skills and satisfaction and contribute to the production of the economy. We should not construct a notion of socially productive roles so narrowly that these activities are not properly recorded. SIPP could do better, without a major investment in additional measurements. I urge the designers of SIPP to reconsider the labor force sequence in this broader setting. We want to know the contributors and the dependents of the society, not just the narrow answer to the question: "Did you receive pay for employment last week?"

Two other truncation issues should be noted. Omission of employer related data for the third employer during the reference period distorts the picture of the employment problems of a group of high turnover employees that we need to know more about. The Bureau needs to justify this arbitrary truncation of the data set.
Secondly, and we have already mentioned its importance, the failure to determine the duration of labor force status at the time of the first wave of interviewing results in left-censoring of all periods of unemployment. This is most unfortunate for the study of spells of unemployment, which are of great interest to the program agencies providing unemployment compensation and other income support.

Response bias

Ryscavage stresses the fact that comparison of the 1984 and 1985 panels will expose bias due to months in sample. This is far less important than the recall biases that can be studied within the rotations of each panel. Ryscavage points to work suggesting little problem with recall bias in the annual labor market experience measures. That fact may not be relevant to the cognitive task facing individuals in the SIPP.

The year is a well-understood measure of time. Many activities relate to an annual cycle. We have many assists from nature, tax records, employment rating cycles, and other events that help us structure an answer to a question like "How many weeks did you work last year?". Most of these assists are lacking for the SIPP reference period. That fact implies that each respondent must laboriously reconstruct the last 18 weeks to answer the labor force questions. For persons with intermittent employment this may be difficult. I would argue that SIPP places a much larger cognitive burden on the respondent than the CPS reports. This will be reflected in a poorer report of events more distant in the past, and that response bias needs to be estimated and reported. I do not believe that we can interpolate the extent of response errors from past observations of annual experience in relation to the CPS measure of labor force from the monthly survey.
The Four-month Report

While it is true that data are collected for a four-month reference period, it is not necessary to constrain the report from SIPP to a four-month report of labor market experience, as Ryscavage suggests. Indeed the likely recall problems with dating events four months ago make it imperative to study reports for the most recent month. Moreover, for each interview there is one report of activity during the CPS reference week of the prior month. A first task for SIPP is to validate its measures against CPS for that week.

4. Matching to Economic Data

Haber and his colleagues make it clear that a limited set of economic variables for each establishment can be matched to every worker, and a large set of variables match to those workers who are employed in large firms and particular industries. The latter data are updated annually, while data on small establishments come from periodic Censuses (selection in time) or samples (selection of firms). There appear to be five classes of potential information in Table 1. Overlaid on this censoring of the available data is a problem of incomplete linking data in Table 2.

Haber appears to be on sound ground in advocating analyses of labor supply using information on firm payrolls, size of work force, and fringe benefits. However, his suggestion that assets and capital should be imputed for small firms where only SSEL data are available is methodologically unsound. No amount of imputation will establish the covariances between worker and firm characteristics that are censored by the lack of information in lines 2.-6. in Table 1 for some establishments. The only possible procedure is to study the nature of the selection to determine probability of inclusion for firms with different characteristics (Little et al. 1983).

Table 1

      Characteristic of employer      Data (updating
      Industry       Size             cycle, in years)
  1.  All            All              SSEL(1)
  2.  Mfg.           N≥250            ASM(1)
  3.                 250>N≥a          CM(5)
  4.                 a>N              none
  5.  In-scope       All              EC(5)
                     N≥500            Enterprise questionnaire

(N is the number of employees; a, an unknown parameter.)

Table 2

Linking data in SIPP (EIN): yes, yes, no
Aggregation required to impute data: none, none, enterprise level, 2-digit SIC X N D.

Access to linked data

One aspect of the linked file that deserves careful thought is access by researchers outside the Census Bureau. It is clear that there is such broad scope for research using a linked file that many students of industrial organization and labor economics will wish to study the available data. Monahan (1983) has set out a model for public access using the LED. That model should be extended to all SIPP and ISDP administrative record linkages. The model for general access should include, in addition:

A. On-line access by the public to a test sample (which could be transformed from real data by the addition of noise and/or the reorganization of data from real firms into simulated observations obtained by splicing vectors from several cases).

B. Use of a widely available statistical package as an interface that users can manipulate to frame their questions.

5. Optimizing the SIPP Resource

SIPP is still developing as a data resource. These papers indicate that conceptual effort expended on exploiting the resource will pay off handsomely. The suggestion that SIPP be enhanced with additional questions that are conditional on change is extremely important. Not only questions that are asked following migration, but also questions that are asked following change in employers, change in family structure, and entry into the universe should be incorporated into the standard core questions. The Bureau is considering an "exit" interview to be administered to persons moving out of the sample universe; this will be extremely important both for following movers and for understanding the population on whom we have data.
I should also imagine that agencies would be interested in questions that are conditional on movement on or off social welfare programs because that information would provide better clues as to the incentive effects of those programs and the processes by which individuals cope with crises in their own lives.

I strongly believe that the Bureau should consider extending the SIPP panel beyond 8 waves of interviewing. A probability sample of the individuals in each panel could be selected to be continued for an additional thirty months. If selections are made to include all the poor, persons in families undergoing demographic change, and persons with unemployment, then the continuing panel would be substantially enriched with persons undergoing changes that are of interest to the social welfare and tax programs. The extension of the sample to 60 months provides a sufficient set of data that meaningful before-after studies can be made in relation to "rare events" such as divorce, movement off the AFDC rolls, or formation of a new household.

It seems most desirable to continue discussions of the structure of SIPP and propose conditional questions, panel extension, and linkage to Census and administrative data as substantial enhancements to the current design.

References

R. J. Little, M. H. David, M. Samuhel, and R. Triest. 1983. Imputation models based on the propensity to respond, in PROCEEDINGS OF THE BUSINESS AND ECONOMICS SECTION OF THE AMERICAN STATISTICAL ASSOCIATION.

J. Monahan. 1983. Procedures for using the longitudinal establishment file. (Center for Economic Research, Bureau of the Census, Washington, D.C.)

DISCUSSION

Harold Watts, Columbia University

The McMillen-Herriot paper presents a very valuable review and explorer's guide for a new and relatively uncharted area.
It is true that panel data have been available for some time for household samples (15 years or more) and that a literature on the topic, which is cited by the authors, goes back some 5 or 6 years. But the problem is difficult; and statistical, cross-sectional ways of thinking so dominate our approaches that a recognized solution to the longitudinal household problem has yet to be found.

Half of the two-part solution proposed here for SIPP has been implemented by the Panel Study of Income Dynamics (PSID). Both logic and experience have supported the usefulness of the "attribute" treatment of household variables. Individuals do migrate through a succession of households and treating the characteristics of the current household as an individual time-varying attribute provides a necessary perspective for analyzing persons longitudinally. But it does not solve the problem of continuity for a household. The tentative solution offered here for that task distinguishes between family and nonfamily households, imposes discontinuity whenever a unit changes from family to nonfamily status, and recognizes at most one continuation of a fissioning household. This definition seems plausible and viable, but clearly it will take some time until we have enough experience with it to make a judgement about how useful it is, or even about how useful a longitudinal household is once we fully digest how essentially ephemeral it is.

In this regard, I would like to reiterate a plea that we more carefully distinguish between "household" or "co-resident family household" and "family." The first two are ephemeral relative to individual lifetimes, at least in our society. But a family, considered as a web of kinship and socioeconomic ties and obligations, spans generations and lifetimes. I am not yet arguing that we should try to measure longitudinal family units in this dynastic sense, but let's not call something else a "family" just because we can observe it.
In terms of defining income for longitudinal household units, I am relatively confident that satisfactory solutions can be found as long as we recognize that the income picture is a lot more complicated than any "Annual Income" can begin to describe. Income is a flow concept, and we can express any measure of flow as an average annual rate no matter how long or short the period over which it has been observed. In order to aggregate properly, we have to integrate the rates with the durations, and avoid double counting, but that seems as tractable as other things we now do.

The more challenging problem is one of describing aspects of the income flow that have been totally inaccessible to the CPS: trends and instability in the rate of flow are quite important for the behavior and welfare of both individuals and households but are totally obscured in an annual income figure. Consequently, I do not feel that problems related to mimicking that very limited description of income that we have been stuck with for many years should have much impact on the way we choose to define households and describe their income experience over time.

Turning to the McNeil-Salvo paper we find a new and interesting installment of the estimated-earnings-function saga. It provides us with a new estimate of the amount of the male/female earnings differential that can be accounted for by the shorter experience and skill-depreciation that results from intervals out of the labor force. As usual, a relatively small part of the total variation in earning rates is accounted for but experience and schooling seem to have substantial and fairly stable effects for both males and females.

As measured by the authors only a small part of the male/female earnings differential can be attributed to the difference in work interruptions and consequent reduction in experience that characterizes the female work force.
They estimate this by applying the male averages to the female earnings function and find only about 12 percent of the gap accounted for by experience and interruption. If one goes the other way and puts the female averages into the male earnings model, 34 percent is accounted for ($.87). I'm not sure I know which way is right, but they don't give the same answer. Still, two-thirds of the difference is left unaccounted for and only a small part of that can be attributed to the included education variables. The explanation for the difference lies in the different experience pay-off functions estimated for men and women. Especially for women with no interruption in work, the pay-off for experience is extremely small.

I would like to comment more generally on the potential use of SIPP for this kind of study. This particular study does not take advantage of the longitudinal features in the ISDP, and I feel strongly that future work of this kind should exploit the power of these data more fully.

One of the hypotheses related to the interruption of work phenomenon has to do with rate of recovery of "depreciated" earning rates. ISDP and SIPP both provide several readings, over time, on earnings rates, and hypotheses about rates of change can be confronted directly. A longitudinal data base with historical recall questions can also do much more than indicate current marital status. Duration and cumulative effect of statuses like marriage and recency of change can be brought to bear on more highly articulated models for testing implications of one or another theory about the earnings/experience relationship.

It is unfortunate for these particular issues that the SIPP measure of the length of interruptions remains somewhat crude even in the latest questionnaire. An interruption "from" 1962 "to" 1962 can be anything from 6 to 12 months in duration; from 1962 to 1963, the range is 6 to 24 months; from 1962 to 1964, it is 14 to 36 months.
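The ranges just quoted follow from simple calendar arithmetic when only the beginning and ending years of an interruption are recorded. The following is a minimal sketch, assuming (as the quoted ranges imply) that only interruptions of at least 6 months are reported; the function name and the min_spell_months parameter are hypothetical.

```python
def interruption_bounds(from_year, to_year, min_spell_months=6):
    """Shortest and longest duration, in months, consistent with year-only dates.

    Longest: the spell runs from January of `from_year` through December
    of `to_year`. Shortest: it runs from December of `from_year` through
    January of `to_year`, but never less than the assumed reporting
    threshold `min_spell_months`.
    """
    if to_year < from_year:
        raise ValueError("to_year must not precede from_year")
    span = to_year - from_year
    longest = 12 * (span + 1)
    shortest = max(min_spell_months, 12 * (span - 1) + 2)
    return shortest, longest
```

For the three cases cited, the function returns (6, 12), (6, 24), and (14, 36) respectively; the width of the interval grows to nearly two years as soon as the spell crosses a calendar year boundary.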
Such imprecision dilutes whatever effect this variable may have. It would seem that a question aimed at getting a start date and duration or an end-point and duration would be equally feasible and provide a more definite interval of time.

SURVEY OF INCOME AND PROGRAM PARTICIPATION: SESSION III

This section is composed of five papers presented in this session, which was sponsored by the Section on Survey Research Methods.

OBTAINING CROSS-SECTIONAL ESTIMATES FROM A LONGITUDINAL SURVEY: EXPERIENCES OF THE INCOME SURVEY DEVELOPMENT PROGRAM

Hertz Huang, Bureau of the Census

I. INTRODUCTION

In 1975 the Secretary of the Department of Health, Education and Welfare (the predecessor agency of the Department of Health and Human Services (HHS)) authorized a program, the Income Survey Development Program (ISDP), to resolve technical and operational issues for a major new survey, the Survey of Income and Program Participation (SIPP). Much of the work of the ISDP centered around four experimental field tests that were conducted in collaboration with the Bureau of the Census to examine different concepts, procedures, questionnaires, recall periods, etc. Two of the tests were restricted to a small number of geographic sites; the other two were nationwide. Of the two nationwide tests, the more important data collection was called the 1979 Research Panel. This panel consisted of nationally representative samples which provided a vehicle for feasibility tests and controlled experiments of alternative design features. Information concerning the ISDP may be found in Ycas and Lininger (1981), David (1983), and the survey documentation now available through the National Technical Information Service (1983).
The 1979 Research Panel was a multiple frame sample consisting of a general population (area) sample of 9300 initially designated addresses drawn from the 1976 Survey of Income and Education (SIE) and some of the Census Bureau's current survey reserve measures and new construction update, and two list frame samples of (a) eligible applicants for the Basic Educational Opportunity Grant (BEOG) Program and (b) blind and disabled Supplemental Security Income (SSI) recipients.

The 1979 Research Panel was a longitudinal survey consisting of six waves of interviewing. The sample was divided into three interviewing panels. The first panel was first interviewed in February 1979; the second panel was first interviewed in March; and the third panel was first interviewed in April. Each sample unit was subsequently interviewed every three months. A sample of addresses was chosen and persons living in the sample units (addresses) during the first wave of interviews were defined as original sample persons. For interviews subsequent to the first, the sample of addresses became a sample of persons; accordingly, original sample people were followed to their new addresses in subsequent interviews (with reasonable geographic constraints: within 50 miles of any ISDP Primary Sampling Unit).

Personal interviews were conducted in Wave 1 with all adults (persons sixteen years old and over) at the sampled address. These became the original sample persons. During Waves 2-6 all persons currently residing with an original sample person were interviewed. This means, for example, that if an original sample person moved to a new address with four other adults, then questionnaires were administered to everyone at the original sample person's new address. If any original sample person remained at the first wave address, anyone who moved into that address with the original sample person was also interviewed.
Thus, interviews were conducted with all adults at an address as long as at least one of the adults present was an original sample person.

Because of the ISDP rules, persons can be lost from the sample because they move beyond the survey's boundaries; in addition, people were added to the sample because they became part of the housing unit in which the original sample person resides. Obviously, the universe changes continuously through the life of the survey. A great deal of interest exists, however, in developing cross-sectional estimates at the time of each interview wave. In the absence of drawing a new sample at each interview, any cross-sectional estimates developed for Waves 2-6 are subject to a population coverage bias. This paper will focus only on the covered population and present some unbiased base weights for cross-sectional estimators for the noninstitutionalized U.S. population represented by the longitudinal sample (the population coverage bias will remain, however, since units containing no persons who were in the universe at the time of Wave 1 cannot come into sample). Since the methodology for treating both area sample and list frame samples was needed for the ISDP 1979 Research Panel, both will be described below.

The estimation methods described here are directly applicable to the Survey of Income and Program Participation (SIPP), an overall description of which is found in Nelson, McMillen, and Kasprzyk (1984) and Herriot and Kasprzyk (1984).

II. THE POPULATION FOR CROSS-SECTIONAL ESTIMATES

We begin by defining the general population for which estimates are required. All households existing during the first wave of interviews (February through April 1979) are considered the initial population. Based on the rules adopted for following individuals who move, we have essentially a longitudinal sample of persons as well as households for the initial population.
Since no new sample was drawn at any subsequent interview, the sample does not completely represent the noninstitutionalized U.S. population after the first quarter of interviews. There were persons who were in the following categories at the initial interview time but became part of the noninstitutional population at a subsequent wave of interviewing: 1) U.S. citizens living abroad, 2) citizens of other countries who subsequently moved to the U.S., 3) persons in institutions or armed forces barracks. These persons will be called the group R subpopulation, which did not have a chance to be selected as original sample persons. At a subsequent wave of interviews, the longitudinal sample did not include any household in which all current members were in the group R subpopulation. However, persons in the group R subpopulation who later joined households that included original persons eligible for sampling in the first wave were added to the cross-sectional universe. These persons, along with newborns, will be called "additions" in subsequent waves. In general, "additions" are defined as persons moving into eligible households after the first wave who were not eligible for sampling in the first wave.

III. GENERAL CONCEPT OF CROSS-SECTIONAL ESTIMATION

Due to the procedures adopted for following movers in the 1979 Research Panel, at subsequent interviews a household could consist of members from more than one household in the universe at the time of the first wave. The inclusion probability of such a household would depend on the inclusion probabilities of the households which the members of the current household were part of at the time of the first interview. The inverse of the inclusion probability is usually used as the weight of a sample household in estimation.
However, because of the sample design of the 1979 Research Panel, the inclusion probability of a household is a function of its primary sampling unit, the type of sampling frame, and the 1975 income of the household which occupied the housing unit during the SIE interview. Only the inclusion probability of an original sample household was feasible to calculate. The inclusion probability of an original nonsample household is almost impossible to evaluate, but such households can come into sample on later waves. Therefore, some alternative weighting procedures needed to be explored.

The idea to be presented in this discussion is very simple. We will associate observations at any given point in time with the known inclusion probabilities of the original sample households. We will split up observations belonging to a household when current household members come from more than one original household. A portion of the observation is then associated with each original household. The following example will illustrate the idea:

Assume that A and B are two original households with inclusion probabilities $\pi_A$ and $\pi_B$ respectively. At the first wave of interviews, household A consists of five members, a, b, c, d, and e, and household B consists of three members, f, g, and h. During the second wave of interviews we find that d, e, and f are living together and form a new household, called household C, while a, b, and c are still in household A and g and h are still in household B.

[Diagram: Household A and Household B.]

Two alternatives are proposed, both involving the division of household C into two parts; one part is associated with household A and the other with household B.
a) Multiplicity Approach: Based on the number of ways (called multiplicity) that the new household C can be included in the sample, the observation (additive, such as counts, income or values) of household C (called $X_C$) is divided by the number of original households involved (two in this case) and each portion is added to the corresponding observation of household A (called $X_A$) and household B (called $X_B$). Therefore, if both households A and B are original sample households, the cross-sectional estimate, $x$, for the total at the second wave based on these three households can be expressed as:

$$x = \frac{1}{\pi_A}\left(X_A + \frac{X_C}{2}\right) + \frac{1}{\pi_B}\left(X_B + \frac{X_C}{2}\right)$$

[...]

$$E(x_M) = E\left(\sum_{i=1}^{N} W_{Mi} X_i\right) = \sum_{i=1}^{N} \frac{X_i}{r_i} \sum_{j=1}^{r_i} \frac{E(\alpha_j)}{\pi_j} = \sum_{i=1}^{N} \frac{X_i}{r_i} \sum_{j=1}^{r_i} 1 = \sum_{i=1}^{N} X_i = X$$

Therefore $x_M$ is an unbiased estimator of $X$. Note also that if $\alpha_j = 0$ it is not necessary to know $\pi_j$, so that $W_{Mi}$ can be calculated based on the selection probability only for sample units.

Estimator II (Fair Share Estimator): This estimator is motivated by the assumption that all current household members contribute equally to the household in which they reside for the major household characteristic values, such as earnings and welfare benefits.

$$x_F = \sum_{i=1}^{N} W_{Fi} X_i$$

As in the multiplicity estimator, $\alpha_j$ and $\pi_j$ are associated with original households but are reindexed within each current household $i$. One can see that $x_F$ is also an unbiased estimator of $X$ as follows:

$$E(x_F) = E\left(\sum_{i=1}^{N} W_{Fi} X_i\right) = \sum_{i=1}^{N} \frac{X_i}{S_{i0}} \sum_{j} S_{ij} \frac{E(\alpha_j)}{\pi_j} = \sum_{i=1}^{N} X_i = X$$

Note that if household $j$ was not in sample at time $t_0$, it is unnecessary to know the number of current residents from original household $j$, $S_{ij}$, in $x_F$ since $\alpha_j = 0$. Also note that because "additions" are not included in the weight calculations, they must be identified and excluded when using either estimator.

Comparison of Estimator I and Estimator II

Both Estimator I, $x_M$, and Estimator II, $x_F$, are feasible to compute. We now compare them with respect to both operational convenience and reliability.
In order to compute $x_M$, the number of original households eligible for sampling from which the current residents came is needed. This information is particularly difficult to obtain at each successive wave of the survey. However, to compute $x_F$, one only needs to know the number of current residents in the household (excluding new additions) and the number of residents from each original sample household. This information could be obtained from the 1979 Research Panel person identifier without collecting additional information.

The equal contribution from the members of a household is a natural assumption. It better reflects the actual shares among the household members in the absence of knowledge of the actual contribution from each member. For example, without knowledge of each person's age, employment status and other needed information, it is more logical to assume that earnings and welfare benefits are equally contributed by household members than to adopt any arbitrary way of defining household members' shares. And, as will be seen below, $x_F$ can be justified heuristically as the approximate minimum variance unbiased estimator under what seem to be natural assumptions given a state of ignorance about the actual shares of the household members.

Assume that at a subsequent wave time $t$ three households are generated from two original households of the first wave of interview (time $t_0$) as follows:

[Diagram not recoverable: the two original households at time $t_0$ and the three households generated from them at time $t$.]

Let $X_j(t_0)$ be the value of the characteristic for household $j$ at time $t_0$ and $X_j$, $j = 1, \ldots, N$, be that value for household $j$ at time $t$. Using Section III we divide up $X_3$ in two parts, $fX_3$ and $(1-f)X_3$, and then associate $fX_3$ with $X_1$ and $(1-f)X_3$ with $X_2$. Without loss of generality, assume that a sample size of 1 was selected at the first wave, $t_0$, with probability $\pi_k$, $k = 1, \ldots, N(t_0)$. An unbiased estimator of the total, $X$, at time $t$ can be written as

$$x = \frac{\alpha_1}{\pi_1} X_1 + \frac{\alpha_2}{\pi_2} X_2 + \left[\frac{f\alpha_1}{\pi_1} + \frac{(1-f)\alpha_2}{\pi_2}\right] X_3$$

where $\alpha_i$, $i = 1, \ldots, N(t_0)$, is defined at the beginning of this section. Notice that both $x_M$ and $x_F$ are special cases of $x$.
The variance of $x$ is

$$\mathrm{Var}(x) = \pi_1\left\{\frac{1}{\pi_1}\left(X_1 + fX_3\right) - X\right\}^2 + \pi_2\left\{\frac{1}{\pi_2}\left(X_2 + (1-f)X_3\right) - X\right\}^2 + \cdots$$

The remaining terms are not explicitly given here since they are not functions of $f$. $\mathrm{Var}(x)$ is minimized if

$$f = \frac{\pi_1 (X_2 + X_3) - \pi_2 X_1}{(\pi_1 + \pi_2) X_3}$$

Since usually not both $\pi_1$ and $\pi_2$ are known, and in most of the surveys conducted by the Bureau of the Census the inclusion probabilities, $\pi_k$, are about the same for all ultimate sampling units (even though they are unequal in the 1979 ISDP), one may simplify $f$ to

$$f = (X_2 + X_3 - X_1)/2X_3$$

Obviously, a weight defined as a function of survey observations is not easy to implement. To further simplify $f$, we assume the percentage growth of $X$ from $t_0$ to $t$ is constant for all units involved and define

$$X_1 = aX_1(t_0) - X_{31}, \qquad X_2 = aX_2(t_0) - X_{32}, \qquad X_3 = X_{31} + X_{32}$$

where $X_{3i}$ is the share of $X_3$ belonging to household members from original household $i$, $i = 1, 2$. Without knowledge of both $X_1(t_0)$ and $X_2(t_0)$, one might naturally assume that the two initial households are about the same, i.e., $X_1(t_0) = X_2(t_0)$, and reduce $f$ to $X_{31}/(X_{31} + X_{32})$. Now if the contribution to $X_3$ is proportional to the number of persons from each original household, then $f = S_{31}/S_{30}$, as defined in $W_F$. This result can be extended to any sample size as well as to the case that the new household members are from more than two original households. Therefore, without knowledge of the actual contribution from each household member, $\mathrm{Var}(x_F)$ is smaller than $\mathrm{Var}(x_M)$ under these assumptions.

V. PROPOSED ESTIMATORS FOR LIST FRAMES

Since persons are the list frame sampling units, we can divide all persons in the general population into three groups based on their relationship with the list frame under consideration.

I) Persons who are included in the list frame (called list frame persons).
For the SSI list frame, this group includes all the (under 65) recipients of Federal Supplemental Security Income in December 1978; while for the BEOG list frame, this group includes all the eligible applicants for the Basic Educational Opportunity Grant as of September 1978 for school year 1978-79.

II) Persons who are not included in the list frame but live with a list frame person(s) during the first wave of interview (February through April 1979).

III) Persons who are not included in the list frame and do not live with a list frame person(s) during the first wave of interview.

Both Groups I and II had some chance to be included in the list frame sample, but Group III did not. The original (first quarter) households which consist of Group I and/or Group II persons will be called list frame households. As time went on, some members of Group III moved in and lived with person(s) belonging to Group I or II. Such members of Group III will be "additions" for the list frame, since they were not initially eligible for sampling in the list frame. Note that the type of persons already described as "additions" for the general population (as defined in Section II) will also be "additions" for the list frame. For the following discussion, we now define two types of "additions" for the list frames: the "additions" that come from Group III will be called "Group III additions" and the type of "additions" as defined for the area frame will be called "area frame additions."

If a list of recipients of a government assistance program is used as a list frame, then Group III is usually fairly large. If we construct our estimators the same way as we did for the area frame, we will include many Group III persons in our estimates at time t of subsequent interviews. Consequently, we wouldn't really know what "subpopulation" we were estimating. In our opinion, it is not feasible to define such a subpopulation at time t.
Without a new sample drawn each wave from the updated list, a proper cross-sectional estimate for a list frame subpopulation at time t is not likely, especially if the turnover rate of the list frame members is high. Therefore, we will restrict our cross-sectional estimates to be based on only the original list frame sample persons (that is, the list frame persons selected for the list frame sample plus all the persons who resided with them during the first quarter of interview) and the "area frame additions." In so doing we know that at any time t, the target population we are estimating consists of the original list frame subpopulation (that is, Groups I and II) and the type of "additions" as defined in the area frame. Note that the original list frame subpopulation is determined by persons who were on the list at the time of sample selection. They may not be on the list by the time of the initial interview.

In the 1979 ISDP panel, a household may have a multiple chance of being selected for the list frame sample if more than one list frame person lives in that household at the first wave of interview. (Some effort was made to reduce multiple chances of selection for those households in the SSI frame.) Therefore, the concept of the base weight for the first wave of interview is no longer trivial.

Similar to the area frame, we define

$$X(t_0) = \sum_{i=1}^{N_L(t_0)} X_i(t_0)$$

as the population parameter to be estimated from a list frame sample at time $t_0$, where $X_i(t_0)$ is the value of the characteristic for the i-th unit in the list frame subpopulation, which includes both Groups I and II defined at the beginning of this section.
Let

$\alpha_i$ = 1 if list frame person i is in the sample, = 0 otherwise (note that $\alpha_i = 0$ for all non-list frame persons);

$\pi_i$ = the probability that list frame person i is selected for the list frame sample for the first wave of interview (time $t_0$) = $\Pr(\alpha_i = 1) = E(\alpha_i)$;

$\theta_j$ = the number of list frame persons (indexed by i) in the j-th household at time $t_0$.

Then the base weight at time $t_0$ for the j-th household and its residents is defined as

[...] = the total number of list frame persons living in the original (time $t_0$) list frame households which the current residents of the k-th household come from.

[...] = the total number of current residents at time t;

[...]

[...] months is then defined to be one which is continuous for each of the n-1 corresponding pairs of consecutive months. (It has not yet been decided if this approach will actually be used in SIPP.)

For each of the definitions below, the conditions for which household B at month t+1 is the continuation of household A at month t are stated. One condition that we require all the definitions to share is that A and B are either both family households or both nonfamily households. The other conditions are:

No Change Definition (NC). A and B have the same household members.

Same Householder Definition (SH). A and B have the same householder. As an alternative, householder could be replaced by principal person in this definition without altering any of the statements made about it in subsequent sections, provided the final estimation procedure in Section 3 is also modified accordingly. (The householder of a household is, roughly, the person who owns or rents the housing unit. The principal person is the wife in a married-couple household, and the householder in all other households.)

Reciprocal Majority Definition (RM).
The majority of individuals who are both household members of A at time t and in the universe at time t+1 are members of B at time t+1, and the majority of individuals who are both household members of B at time t+1 and in the universe at time t are members of A at time t. (This type of longitudinal definition was originally developed by Dicker and Casady (1982) for use in the National Medical Care Utilization and Expenditure Survey (NMCUES).)

We will now clarify several other terms. A household is said to be in existence over a time interval of $n \geq 2$ months if it is longitudinal over that time interval. Its period of existence is the longest such time interval. In the case of a household which is defined cross-sectionally for a month t, but is not longitudinal over either of the two-month intervals containing t, the period of existence of the household is defined to be one month.

If $t_1$ and $t_2$ are any pair of months, and longitudinal estimates are to be made over the interval $[t_1, t_2]$, then the following two possibilities will be considered in subsequent sections for the universe of households for which estimates will be produced.

Restricted Universe (R). The set of all households in existence over the entire interval $[t_1, t_2]$.

Unrestricted Universe (U). The set of all households in existence for one or more months in $[t_1, t_2]$.

Each sample panel is interviewed eight times. Each of the eight rounds of interviews takes four consecutive months to complete and is known as a wave.

Finally, we define an original sample person to be a person who was in sample during the first wave and will be at least 15 years of age by the end of the panel.

3.
UNBIASED WEIGHTING PROCEDURES

In this section we present five weighting procedures for computing estimates of totals or proportions for longitudinal households that would be unbiased in the sense that the expected value of the estimator over all possible samples is the parameter of interest, assuming no data are missing or in error and perfect frame coverage. Modifications and adjustments of these estimation procedures made necessary by the unrealistic nature of these assumptions are considered in the original paper, but are omitted here due to lack of space.

Except for the Continuous Household Members procedure, which will only be applied to the restricted universe, all the procedures will be stated for the unrestricted universe. To apply them to the restricted universe, simply zero weight each household which is not in continuous existence over the time interval of interest. Furthermore, unless otherwise stated, all the procedures will be applied to all four longitudinal definitions defined in Section 2.

First we will explain why a common method of estimation, weighting by the reciprocal of the probability of selection, is not feasible for our purposes, and hence the need to consider alternative procedures. Let

$$X = \sum_{i=1}^{N} x_i$$

be a parameter of interest, where $x_i$ is the value of the characteristic for the i-th unit in a population of size N. Typically in survey work, to estimate X a sample would be drawn in such a manner that the i-th unit has a known positive probability $p_i$ of being selected, and X would be estimated by

$$\hat{X} = \sum_{i=1}^{N} w_i x_i, \qquad (3.1)$$

where

$$w_i = \begin{cases} 1/p_i & \text{if the i-th unit is in sample,} \\ 0 & \text{otherwise.} \end{cases} \qquad (3.2)$$

Unfortunately for household and family estimation in SIPP, both cross-sectionally and longitudinally, such an estimation approach is not practical. For example, cross-sectionally a household is interviewed and used in the estimation process for a given month if and only if at least one household member is an original sample person.
Consequently, to use (3.1) and (3.2) as an estimator it would be necessary to determine the probability that at least one member of the current household is an original sample person. It would be operationally impossible to determine this probability, since it would first be necessary to determine the first wave households for all current household members and then compute the probability that at least one of these first wave households was selected.

Fortunately though, it is not necessary that $w_i$ satisfy (3.2) in order that (3.1) be unbiased. In fact, if $w_i$ is any random variable associated with the i-th unit in the population satisfying

$$E(w_i) = 1, \qquad (3.3)$$

then (3.1) is unbiased, that is, $E(\hat{X}) = X$. Thus, defining unbiased longitudinal household and family weighting procedures reduces to defining random variables $w_i$ satisfying (3.3).

Before we present the longitudinal weighting procedures we will state what, for purposes of this paper, a cross-sectional household weight is, since most of the longitudinal weighting procedures will be defined in terms of cross-sectional weights. The first wave cross-sectional weight for a sample household is taken here to be the reciprocal of the probability of selection. For all nonsample households in the universe this weight is defined to be zero. For any month after the first wave a different definition is necessary because of possible changes in household composition. So, the cross-sectional household weight for any such month is defined to be the mean of the first wave cross-sectional household weights for all persons in the household that month who will be at least 15 years of age by the end of the panel and who were in the universe during the first wave. This type of weighting procedure is currently being used in SIPP to produce cross-sectional estimates, hence the name. It is readily verifiable that the weights satisfy (3.3).
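The cross-sectional household weight just described can be illustrated with a minimal Python sketch. The `Person` record, its field names, and the choice to return zero when no member qualifies are assumptions made for illustration only; they are not part of the SIPP processing system.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Person:
    # Hypothetical record. first_wave_weight is the first wave cross-sectional
    # weight of the person's first wave household: 1 / (selection probability)
    # for sample households, 0.0 for nonsample households in the universe,
    # and None if the person was not in the universe during the first wave.
    first_wave_weight: Optional[float]
    age_at_panel_end: int


def cross_sectional_household_weight(members: Sequence[Person]) -> float:
    """Mean first wave weight over members who were in the first wave universe
    and who will be at least 15 years of age by the end of the panel."""
    eligible = [
        p for p in members
        if p.first_wave_weight is not None and p.age_at_panel_end >= 15
    ]
    if not eligible:
        # Assumption: the text does not cover this case; zero is used here.
        return 0.0
    return sum(p.first_wave_weight for p in eligible) / len(eligible)


# Two original sample persons (first wave weight 1000), one adult from a
# nonsample first wave household (weight 0), and an infant born after the
# first wave (not in the first wave universe, so excluded from the mean).
household = [
    Person(1000.0, 40),
    Person(1000.0, 38),
    Person(0.0, 25),
    Person(None, 1),
]
weight = cross_sectional_household_weight(household)
print(round(weight, 2))
```

In this made-up household the mean is taken over three persons, so the weight is two thirds of the original sample persons' first wave weight.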
We will also leave it to the reader to verify that the weights for each of the longitudinal procedures to be presented satisfy (3.3) and hence lead to unbiased estimators.

Beginning Date of Household Procedure (BH). Each longitudinal household receives a single weight valid for any time interval that contains at least part of the period for which the household existed, namely the cross-sectional weight for the household at the beginning date of the household. In particular, if there were no original sample persons in a household at its beginning date then its longitudinal weight would be zero. This approach to longitudinal household estimation was previously used in the NMCUES (Whitmore, Cox and Folsom 1982).

Beginning Date of Time Interval Procedure (BI). Each longitudinal household receives a longitudinal weight valid for all time intervals with the same beginning date, namely the cross-sectional weight for the household at the beginning date of the time interval. Longitudinal households that form during the time interval are assigned the cross-sectional weight for the household at its beginning date, as in the preceding procedure.

Continuous Household Members Procedure (CM). The following procedure will only be applied to the restricted universe, as defined in Section 2. For any time interval for which the household is in existence, the longitudinal weight to be assigned is determined by the set of persons that are members of the household throughout the time interval. The longitudinal household weight is the cross-sectional weight that would be assigned to a household consisting of this set of persons; that is, the average of the first wave weights of these people. A longitudinal weight of zero is assigned to the household if there are no original sample persons who are members throughout the time interval.
The procedure is slightly biased because a longitudinal household with no members continuously present throughout a time interval has no chance of receiving a positive weight, thereby making satisfaction of (3.3) impossible. Since we believe this situation will rarely occur, at least for the longitudinal household definitions considered here, we expect this bias to be very small.

Average Cross-Sectional Household Weight Procedure (AW). Each longitudinal household receives a longitudinal weight valid for a specific time interval, namely the average of the monthly cross-sectional weights for the household over the intersection of the life of the household and the specified time interval. Note that there are many procedures, like AW, that entail the averaging of weights, both household cross-sectional weights and person longitudinal weights. We will examine only one of these procedures here, as an example of this type of longitudinal household weighting procedure.

Householder Weight Procedure (HW). The following procedure will be applied only to the No Change and Same Householder Definitions, since it is appropriate only for definitions that allow for a single householder during the household's existence. (Generalizations of this procedure which are not so restricted in their applicability exist but will not be considered here.) The procedure assigns a single weight valid for any time interval that contains at least part of the period for which the household existed, namely the first wave cross-sectional household weight of the householder's first wave household. A longitudinal weight of zero is assigned to the household if the householder was not an original sample person. As will be seen in Section 5, this procedure is clearly the one of choice when the Same Householder Definition is used.
If that type of definition is used with householder replaced by principal person, then a similar modification of this estimation procedure with householder replaced by principal person would be appropriate.

4. POTENTIAL ADVANTAGES AND DISADVANTAGES

The ideal unbiased weighting procedure would provide a single set of weights applicable to any time interval, require no more data than were collected, and possess the minimum variance among all unbiased procedures. Unfortunately, no such procedure exists. The procedures described in Section 3 all fail one or more of these three criteria to various degrees. In this section, we explain the nature of the failures without explicitly comparing the procedures. That is done in Section 5.

Multiplicity of Weights. Some procedures have the advantage of assigning to each household a single weight which depends only on conditions as of the first reference month for the household and which is valid for every interval that the household is in the universe. Other procedures have the disadvantage of sometimes producing different weights for the same household for different time intervals. (Procedures with this disadvantage could be modified so that only a single weight applies to any time interval, by computing for each household the weight appropriate for that procedure for the unrestricted universe and the 2 1/2 year time interval corresponding to the life of the panel. The weight obtained would also be used for any smaller subinterval for which the household is in the universe. However, weights obtained in this manner might not be determinable until the end of the life of the panel. This would make them difficult to use because we would have to wait until the last data from the panel were processed before estimates could be produced for any earlier time period.
In any case, such weights would often lead to higher variances for short time intervals than weights developed specifically for the short time intervals.)

Unavailable Data Requirements. Most definition and procedure combinations require data from some households for time periods when the household is in existence but not in sample, that is, for time periods for which interviews are not conducted for the household because no original sample people are members of the household. These needed data could be information for determining proper longitudinal weights or subject-matter information for use in tabulating the estimates. Some of this information is not collected for the 1984 panel of SIPP because of the current operational procedures. This is a consequence of the fact that agreement has not been reached on the longitudinal household definition to be used in SIPP. In this vacuum, operational procedures were determined mainly by considerations of difficulty and cost. Once a definition has been agreed on, depending on the nature of the unavailable data, it might be possible to change operational procedures for future SIPP panels so that the required data are collected.

To understand the problem with current operational procedures, consider the following situation. A household is longitudinal from month $t_g$ to $t_f$. Original sample people are part of the longitudinal household only from month $t_1$ to $t_2$. If [...]

[...] Prob(X_1 = a, X_2 = b) + Prob(X_1 = b, X_2 = b)] + n Prob(X_1 = a) Prob(Y_1 = 1, Y_2 = 0) · [Prob(X_1 = a, X_2 = b) / (Prob(X_1 = a, X_2 = a) + Prob(X_1 = a, X_2 = b))] + n Prob(Y_1 = 0, Y_2 = 0) Prob(X_1 = a, X_2 = b) = n Prob(X_1 = a, X_2 = b)

The theorem is extended to longitudinal records of any length by adding the appropriate terms to equation (3).

THE EXPECTED NUMBER OF INCORRECT IMPUTATIONS

Longitudinal data by itself may not always be sufficient to accurately impute missing data.
The amount of information available longitudinally can be measured by estimating the expected number of incorrect imputations. Consider the longitudinal record for the monthly receipt of wages and salaries,

(6) X = (0,0,0,0,0,0,0,0,0,0,0,2),

where $X_t = 0$ indicates receipt and $X_t = 2$ (t = 1, ..., 12) indicates missing data. The probability Prob [...]

[Table fragment; row and column structure not recoverable: ... received. Dividends credited. 9.4 30.7 10.3 28.2 8.3 33.8 9.8 30.1 9.3 30.5. Number of persons: Total, Rotation One, Two, Three, Four. Total One 43 29 44 57 70 83 98 113 48 33 50 64 76 90 105 114 44 30 45 57 72 81 111 (B) 42 26 42 55 67 84 101 120 41 26 41 55 66 77 71 94. Four, Five, Six, Seven or more. (B) Less than 10 sample households.]

DISCUSSION

Roy Whitmore, Research Triangle Institute

1. Introduction

The Bureau of the Census is to be commended for presenting papers dealing with proposed methodology during the planning stages of the Survey of Income and Program Participation (SIPP). Presentation of these papers is sure to stimulate constructive suggestions from the scientific community. On the other hand, the SIPP has been in progress since October 1983 and a second panel is to be fielded in January 1985. Hence, it is important for methodological issues that impact directly upon data collection techniques to be resolved as quickly as possible.

The first of the three papers being reviewed discusses person-level and household-level cross-sectional weighting procedures for the 1979 Research Panel of the Income Survey Development Program (ISDP), a nationwide field test for the SIPP. The next two papers discuss person-level and household- or family-level longitudinal weighting methods being considered for the SIPP. Each paper will be discussed individually, although it will be noted that some comments pertain to all three papers.

2. Cross-Sectional Estimates for the ISDP by Huang

Cross-sectional household weights are presented for both the area frame and the list frame samples of the 1979 Research Panel.
The proposed weighting procedures are discussed below or each sampling frame. 2.2 Area Frame Sample Weights The population of inferential interest for the 1979 Research Panel was defined to be the 1979 civilian, noninstitutionalized United States adult population. Standard area frame household sampling procedures were used to select a sample of members of this population in the Wave 1 sample, which was fielded early in 1979. However, only adults (aged > 16) in the Wave 1 sample were followed when they moved to new addresses during 1979. Thus, "additional" people who entered the target population during 1979 (by birth, by entering the United States, or by leaving the military or an institution) where only interviewed while living in a house- hold that contained at least one Wave 1 sample ■ember. As a result, the sample fails to ade- quately reflect "additional" people in the target population. This issue will arise again and be discussed more fully with regard to the two SIPP methodology papers. Two unbiased cross-sectional time t estimators were discussed for estimation of the population total, X . I found it instructive to reformu- late these estimators as follows: The following motivational statement is found early in Huang's paper: "There is a great deal of interest in developing cross-sectional weights at the time of each interview wave." Due to the use of three rotation groups within each wave, I question the use of wave-specific cross-sectional weights for direct data analy- sis. The inferential population would be diffi- cult to define due to the sequence of referece periods applicable to the rotation groups (see Figure 1). Wave-specific cross-sectional weights are important for defining longitudinal weights, as is apparent from the other two papers being reviewed. Huang presents his weight formulas in the context of weights for cross-sectional estimates that are time-specific rather than wave-specific, which is probably more useful for data analysis. 
The formulas presented in the paper can actually be con- sidered to be either wave-specific or time- specific weights. The proposed weighting procedures are dis- cussed in terms of estimation of the population total l - x t,M = .!; w t,M (i)x t (i >' (2) where I (r. n.)" 1 for the i-th [jeS l J household in the time t sample, for households not in the time t sample, S = {Wave 1 (time t )sample households}, r. = Number of households in the Wave 1 universe contributing members to the i-th time t household, and n. = Selection probability for the j-th Wave J 1 sample household. X. = V X.(i), * i=l (1) where i=l,...,N indexes the "units (persons or households)" in the target population at time t. The weighting formulas presented are for cross- sectional household weights applicable as of either time t or wave w. Since all "adult" ■embers of sample households are interviewed, the cross-sectional household weights can be assigned to all household ■embers for cross- sectional person-level analyses. X S, .(S. n.) for the i-th jeS 1J 10 J household in the time t sample for households not in the time t sample, 123 S ■ Number of members of the j-tb Wave 1 household who belong to the i-th tine t household, and S io = Number of members of the Wave 1 uni- verse who belong to the i-th time t household. In Huang's paper the estimator (2) is referred to as the multiplicity estimator and (3) is referred to as the fair share estimator. In fact, both (2) and (3) are multiplicity esti- mators. The difference is that the weight W M is based upon household-level multiplicity and VL _ is based upon person-level multiplicity. The paper shows that both estimators provide unbiased estimates of the population total, X , invoking the "fair share assumption" for the estimator (3). 
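As an illustration of how the two reformulated weights differ, the following minimal Python sketch computes both for a single time t household. The function names and the example figures are hypothetical and are not taken from Huang's paper; the sketch assumes the caller supplies only those Wave 1 sample households that contribute members to the time t household.

```python
def multiplicity_weight(selection_probs, r_i):
    """Household-level multiplicity weight, following estimator (2).

    selection_probs: selection probabilities (pi_j) of the Wave 1 *sample*
        households that contribute members to the time t household.
    r_i: number of Wave 1 *universe* households contributing members.
    """
    return sum(1.0 / (r_i * pi_j) for pi_j in selection_probs)


def fair_share_weight(contributions, s_i0):
    """Person-level ("fair share") weight, following estimator (3).

    contributions: (S_ij, pi_j) pairs, one per contributing Wave 1 sample
        household, where S_ij is the number of members it contributes.
    s_i0: number of Wave 1 universe members in the time t household.
    """
    return sum(s_ij / (s_i0 * pi_j) for s_ij, pi_j in contributions)


# Hypothetical household: three Wave 1 universe members drawn from two
# Wave 1 universe households (r_i = 2, S_i0 = 3). Only the household that
# contributes two members was sampled, with selection probability 0.01.
w_m = multiplicity_weight([0.01], r_i=2)
w_f = fair_share_weight([(2, 0.01)], s_i0=3)
print(w_m, w_f)
```

In this hypothetical setting the two weights agree whenever every contributing Wave 1 household is in the sample with the same selection probability, and they differ when only some contributing households were sampled.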
In fact, the weights $W_{t,M}$ and $W_{t,F}$ are identical to the initial family weights for the national household survey component of the National Medical Care Utilization and Expenditure Survey (NMCUES) [see Whitmore et al. (1982a)]. In the NMCUES report, it is shown that both of these estimators provide unbiased estimates, even without the "fair share assumption" for the estimator (3). Huang's conclusion that the estimator (3) is preferable, mainly because it produces less variable weights and hence smaller sampling variances, is also supported.

2.3 List Frame Sample Weights

Huang's paper defines the population of inferential interest for the sample based upon SSI and BEOG lists as follows: "At any time t, the target population consists of the original list frame subpopulation (Groups I and II) and the type of 'additions' defined for the area frame." Hence, the time t target population is the Wave 1 universe plus "additions." Additions for the area frame sample were civilian, noninstitutionalized United States adults who joined this group by birth, by entering the United States, or by leaving the military or an institution. I expect that the author does not intend to include all such additions in the target population since the Wave 1 universe does not include all civilian, noninstitutionalized United States adults. Maybe only those additions that simultaneously enter the universe and enter a household containing a member of the Wave 1 universe are intended to belong to the target population. In any case, the field procedures did not provide adequate coverage of additional target population members because only adults (aged ≥ 16) in the Wave 1 sample were followed when they moved to new addresses during 1979, as was true for the area frame sample.

Two cross-sectional time t estimators of the population total, $X_t$, were presented. I found it instructive to reformulate these estimators as follows:

1. $\hat{X}^*_{t,M} = \sum_k W^*_{t,M}(k) X_t(k)$, (4)

where

$W^*_{t,M}(k) = \sum_{i \in S_0} [(\beta_k + U_k) \pi_i]^{-1}$ for the k-th household in the time t sample containing a member from $S_0$ (the sum being taken over the members of $S_0$ in that household), and $W^*_{t,M}(k) = 0$ for households in the time t sample containing no members from $S_0$,

$S_0$ = {Wave 1 list frame (Group I) sample persons},

$\beta_k$ = Number of Wave 1 list frame (Group I) persons in the k-th time t household,

$U_k$ = 1 if the k-th time t household contains any additions (Group III persons), and 0 otherwise, and

$\pi_i$ = Selection probability for the i-th Wave 1 list frame (Group I) sample person.

2. $\hat{X}^*_{t,F} = \sum_k W^*_{t,F}(k) X_t(k)$, (5)

where

$W^*_{t,F}(k) = \sum_{j \in S_1} \omega_j S_{kj} / S_{k0}$ for the k-th household in the time t sample, and $W^*_{t,F}(k) = 0$ for households not in the time t sample,

$S_1$ = {Wave 1 (time $t_1$) sample households},

$S_{kj}$ = Number of members (Groups I and II) of the j-th Wave 1 sample household who belong to the k-th time t household,

$S_{k0}$ = (Total number of members of Wave 1 sample households who belong to the k-th time t household) + (Number of additions (Group III persons) in the k-th time t household), and

$\omega_j$ = Unbiased multiplicity-adjusted weight for the j-th Wave 1 household.

The paper notes that the estimators (4) and (5) do not provide unbiased estimates of the population total, $X_t$. Part of the problem may be that additional (Group III) sample members explicitly enter the weight computations. Since households in the sample must, by definition, contain at least one Group I or II sample member, Group III persons need not explicitly enter the weight computations.

It should be noted that the alternative weights $W^*_{t,M}$ and $W^*_{t,F}$ do not give positive weights to identically the same households. Time t households that contain Group II people, but no Group I people, are given a weight of zero by $W^*_{t,M}$, whereas $W^*_{t,F}$ is positive for these households.

Consideration should be given to defining the person-level target population as simply the original list frame (Group I) persons. Weights similar in definition to (2) and (3) can then be defined that provide unbiased estimates of population totals for this target population.
These weights would be essentially the same as the initial family weights used for defining longitudinal family weights for the state Medicaid household survey component of the NMCUES [see Whitmore et al. (1982b)].

3. Person-Level Longitudinal Weights for the SIPP by Judkins, et al

Early in the Judkins paper, it is stated that when an interview is missing for a wave and is bracketed by good interviews, imputation will probably be used for the missing wave. Why not use a longer reference period for the next wave interview and collect the data directly, as was done in the NMCUES?

The SIPP universe at any fixed point in time is defined as the persons aged 15 or older who are members of the civilian, noninstitutional United States population, as well as members of the military living on bases with family or living off bases. Dynamic longitudinal features of this universe are:

1. "Additions" - Individuals who were not members of the Wave 1 universe but became members of the SIPP universe during the panel's 2 2/3 year reference period.

2. "Exits" - Individuals who left the SIPP universe during the 2 2/3 year reference period due to death, moving out of the United States, or going into the military or an institution.

The Wave 1 interview should probe for the occurrence of such events during the Wave 1 reference period. As was true for the ISDP, only Wave 1 sample members are followed to new addresses when they move, and current SIPP survey procedures do not provide adequate coverage of the "additional" target population members. Methods for improving coverage of the "additional" target population members will be discussed later in this section.

The Judkins paper indicates that the ideal annual longitudinal universe is the union of 12 monthly universes. Either this universe or the union of 366 daily universes should be the target population.
The problem of analysis of annual statistics when some population members are survey-eligible for less than the full year is noted as one difficulty with this target population definition. I believe that methods exist or can be developed to adequately address this problem. For example, estimation of an annual mean can be based upon the following statistics:

$Y_a(i)$ = Annual income of the i-th sample member while survey-eligible,

$P_a(i)$ = Proportion of the days in the year that the i-th sample member was survey-eligible, and

$W(i)$ = Longitudinal analysis weight for the i-th sample member.

The population totals for $Y_a$ and $P_a$ would be estimated unbiasedly as follows:

$N(a) = \sum_i W(i) Y_a(i)$, (6)

$D(a) = \sum_i W(i) P_a(i)$. (7)

These estimators would have the following interpretation:

$N(a)$ = Unbiased estimate of total annual personal income for the target population, and

$D(a)$ = Unbiased estimate of the average daily number of members in the target population.

Hence, the ratio estimator,

$R(a) = N(a) / D(a)$, (8)

would provide a consistent estimate of the average annual personal income.

Estimation of the population distribution of annual statistics, such as total annual personal income, is somewhat more difficult. The income of a sample member who was survey-eligible only part of the year requires special treatment. The NMCUES defined a time-adjusted income for each sample member as

$TAY(i) = Y_a(i) / P_a(i)$, (9)

and produced the distribution of these time-adjusted values. Another possibility is to produce separate distributions of annual income for individuals who were survey-eligible for 12 months, 11 months, 10 months, etc. A third possibility might be to simply estimate the annual average monthly income based upon all sample members who were survey-eligible for one month or more, instead of the average annual income.

Four longitudinal weighting procedures are discussed in Judkins' paper.
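The ratio estimator (8) and the time-adjusted income (9) discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the two-person sample are hypothetical, and the weights are taken as given rather than derived from any SIPP weighting procedure.

```python
def ratio_estimate(weights, incomes, proportions):
    """R(a) = N(a) / D(a), with N(a) and D(a) as weighted sample totals.

    weights:     longitudinal analysis weights W(i)
    incomes:     annual income while survey-eligible, Y_a(i)
    proportions: share of the year each member was survey-eligible, P_a(i)
    """
    n_a = sum(w * y for w, y in zip(weights, incomes))      # estimator (6)
    d_a = sum(w * p for w, p in zip(weights, proportions))  # estimator (7)
    return n_a / d_a


def time_adjusted_income(income, proportion):
    """TAY(i) = Y_a(i) / P_a(i), the annualized income in (9)."""
    return income / proportion


# Hypothetical sample: one member eligible all year, one eligible half the year.
weights = [100.0, 100.0]
incomes = [20000.0, 5000.0]
proportions = [1.0, 0.5]

r_a = ratio_estimate(weights, incomes, proportions)
tay = [time_adjusted_income(y, p) for y, p in zip(incomes, proportions)]
print(r_a, tay)
```

Note that the denominator counts the half-year member as half a person, which is why the ratio differs from a simple weighted mean of the reported incomes.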
The first procedure defines a longitudinal weight applicable for all longitudinal analyses of an individual's data, irrespective of the analysis time period. A weight of this type is definitely needed for each sample member to facilitate all types of longitudinal analyses. This first procedure gives zero-valued weights to all "associated" sample members. These data are collected mainly to enable family and household analyses.

The other procedures attempt to make greater use of the data for "associated" sample members by giving some of them positive weights for particular analysis time periods. Since these "associated" sample members had a chance of inclusion in the Wave 1 sample and were not selected, the bias and variance reduction properties of these procedures would have to be investigated carefully before these procedures could be recommended. Empirical studies based upon the longitudinal data collected by the ISDP, NMCUES, and/or National Medical Care Expenditure Survey (NMCES) could provide the basis for resolving this issue.

A weighting procedure similar to the first procedure in Judkins' paper can provide improved coverage of the target population with some modification of SIPP field procedures. The changes in field procedure would be the following:

1. Each "additional" sample member becomes a "key addition" (i.e., to be followed to the end of the 2 2/3 year panel and receive positive longitudinal weights) if the first household that the person belongs to after entering the universe is a sample household.

2. The Wave 1 sample housing units (and the half-open intervals between sample housing units and next listed housing units) are to be monitored throughout the 2 2/3 year panel for entry of "additional" universe members. If such "additional" people move into one of these housing units and establish their own independent household as their first household after re-entry into the universe, they are also "key additions."
Using this data collection protocol, all longitudinal weights can be based upon selection probabilities for Wave 1 sample households as follows:

1. For each member of a Wave 1 sample household, the longitudinal weight is the reciprocal of the selection probability for that household.

2. Every "key additional" sample member can be linked uniquely to either a Wave 1 sample household or a Time t (time of entry into the universe) sample household. Hence, the longitudinal weight for such a person is either the reciprocal of the selection probability for the uniquely linked Wave 1 household or the Time t cross-sectional weight of the uniquely linked sample household.

3. All "associated" sample members and other "additional" sample members get a weight of zero because they could have been selected into the sample, but were not.

The person-level longitudinal weight adjustment procedures discussed in Judkins' paper seem reasonable. I would, however, only recommend the cross-sectional consistency adjustments to monthly totals and equalization within marriage groups if the adjustments to the post-stratified weights were minor.

4. Household- and Family-Level Longitudinal Weights for the SIPP by Ernst, et al

Ernst presents four longitudinal household definitions for consideration. Preference is indicated for a "Shared Experiences" definition. What is the justification for choosing this definition? More consideration should be given to the question: "What longitudinal household or family definitions are most useful for addressing analysis issues?"

Ernst suggests that longitudinal families not be identified as such but rather that longitudinal households be classified as family and non-family households. The desirability of this approach is questionable. Families that exist either long-term or short-term as multi-family households are potentially important for family-level analyses.
Based upon the NMCES and NMCUES experience, it is not especially difficult to divide households into family reporting units for data collection.

Consideration should be given to identifying the properties that one would like all longitudinal households or families to satisfy. Such properties might include the following:

1. Since cross-sectional families are well-defined at any fixed point in time, it may be desirable for the longitudinal families in existence at any fixed point in time to be identical to the cross-sectional families in existence at that same point in time.

2. It may be desirable for changes in household composition that strongly affect family income or program participation to trigger the beginning and ending of SIPP longitudinal families.

Some questions like "What longitudinal family definition is most useful for assessing the effect of divorce on family income?" should be addressed in detail before adopting a SIPP longitudinal family definition. In fact, consideration of how to best address analysis issues may suggest that multiple longitudinal family definitions are needed to satisfy multiple analysis objectives.

The Ernst paper presents five longitudinal household weighting procedures. Each weighting procedure is based on cross-sectional household weights that are equivalent to Huang's "fair share" weight. This appears to be the proper basis for longitudinal household weights. The need for data for time periods when the longitudinal family is not in the sample is investigated. The need for this additional retrospective or prospective data depends upon both the family definition and the weighting procedure.

Use of longitudinal weights applicable only to specific time periods is discussed as a means for making use of more of the data collected for specific time periods. As noted in the paper, these procedures also tend to require the greatest amount of data for time periods when the family is not in the sample.
The variance/bias tradeoff would have to be carefully investigated for these procedures before they could be recommended. Empirical investigations based upon the ISDP, NMCUES, and/or NMCES databases may be useful in this regard. In any case, it is important to have a longitudinal weight applicable for all time periods to enable longitudinal family analyses of all kinds.

One shortcoming of all family weighting procedures suggested by Ernst is that the families spawned by "additional" sample members all get zero weights. The paper states that the first procedure discussed is the procedure used by the NMCUES. This is not exactly true because the NMCUES traced certain types of "key additional" sample members and assigned positive weights to the families spawned by them. The procedures discussed with regard to the Judkins paper are recommended for identifying and tracing "key additional" people. Given these survey procedures, an unbiased "beginning date" type of longitudinal family weighting procedure is presented in Horvitz and Folsom (1980). Review of this paper is highly recommended to everyone interested in longitudinal surveys.

The weight adjustment procedures discussed in the Ernst paper appear to be appropriate and satisfactory for the most part. However, weight adjustment is discussed as a method for compensating for lack of data for specific time intervals, e.g., prior to the first interview or following the last interview. In order to adjust for this type of nonresponse, the NMCUES used attrition imputation procedures. I feel that attrition imputation is a more satisfactory solution because it can address all data missing due to attrition at once and the resulting database is more amenable to analysis. Finally, I would only recommend the final adjustment of longitudinal family weights to monthly controls if the adjustments were minor.

REFERENCES

Horvitz, D. G. and R. E. Folsom (1980).
Methodological Issues in Medical Care Expenditure Surveys. Proceedings of the Section on Survey Research Methods of the American Statistical Association, pp. 21-29.

Whitmore, R. W., B. G. Cox, and R. E. Folsom (1982a). Family Unit Weighting Methodology for the National Household Survey Component of the National Medical Care Utilization and Expenditure Surveys. RTI/1898/06-03F.

Whitmore, R. W., B. G. Cox, and R. E. Folsom (1982b). Family Unit Weighting Methodology for the State Medicaid Household Survey Components of the National Medical Care Utilization and Expenditure Survey. RTI/1898/06-04F.

[Figure 1 (sequence of reference periods applicable to the rotation groups): content not recoverable from the source.]

2 and 2->3 (within survey wave n), 3->4 (the last month of wave n and the first month of wave n+1), and 4->5 and 5->6 (within wave n+1). For each income type in each month-pair, a turnover rate ($P_{i(i+1)}$) was calculated as the number of adult sample persons who changed recipiency status with regard to income source X (i.e., who received income of type X in the first month of the pair but not in the second, or vice versa) divided by the total number of adult sample persons. The between-wave rate, $P_{34}$, was then compared to the average of the within-wave rates, $\bar{P} = \tfrac{1}{4}(P_{12} + P_{23} + P_{45} + P_{56})$. The difference between these two values, $P_{diff} = P_{34} - \bar{P}$, comprises the major variable of interest for this paper.

Table 1 summarizes the results of a simple test of significance carried out on each $P_{diff}$ for the 17 income types across all sets of linked survey waves. The message of Table 1 is unmistakable. There is a strong and consistent tendency toward greater turnover in recipiency between survey waves than between months within a wave. Of the 85 $P_{diff}$ observations in Table 1, 78 are positive (i.e., $P_{34} > \bar{P}$). Sixty-nine of the differences are significantly positive; 51 are significant at the p<.01 level or beyond. In contrast, only one difference is significant in the opposite direction.

(Presented at the ASA Meetings, Philadelphia, PA, August 1984)

Almost as obvious as the general trend in Table 1 are its two apparent exceptions. Six of the seven negative difference scores (including the only significantly negative value) are concentrated in two closely related income sources: educational benefits and Basic Educational Opportunity Grants (BEOG). The only explanation we have for these outliers follows from the fact that they involve one-time payments at the beginning of school terms.
Thus, their receipt may be more easily "dateable" than other income sources, and the single payment means that accurate reporting can never produce more between-wave than within-wave turnover. Aside from these relatively weak exceptions, however, it is clear that the great majority of income sources display an exaggerated turnover rate between survey waves. The important question then becomes: Why is this the case?

Discussion

Although it is perhaps the most commonly assumed explanation, response error is by no means the only possible source of the effects observed in this paper, nor is it necessarily the most likely source. In this final section, we briefly examine four potential contributors to greater between-wave than within-wave recipiency turnover: real underlying trends, edit and imputation procedures, person mismatches in linking data from successive survey waves, and response error.

Real underlying trends: Since this investigation is without the benefit of external validating information, we cannot demonstrate conclusively that the observed results indicate "error" as opposed to reflecting accurately real underlying trends in the events being measured. Two facts, however, render the latter hypothesis untenable: 1) a change in economic conditions or eligibility rules could produce an increase in recipiency turnover at a particular point in time, but it is difficult to imagine this happening periodically for a wide range of income types over an extended period of time; 2) the staggered interviewing schedule for the 1979 ISDP Panel (see Ycas and Lininger, 1981) further reduces this likelihood, since each calendar month over the life of the panel served as the first reference month of a wave for one set of respondents, the second reference month for another set, and the third month for a third set.
In other words, each reference month in a survey wave combines data from three calendar months, so that any real change effects are present only in diluted form in three reference months.

Edit and imputation procedures: Three processing procedures possibly contributed to greater recipiency turnover between waves than within waves: reformatting edits to simplify and make consistent various data fields, imputation for person nonresponse, and imputation for item nonresponse.

The only known problem with the reformatting edits is that they were carried out independently for each wave; incorrect resolutions in the name of consistency thus may have artificially reduced turnover within waves, while reporting inconsistencies between waves were ignored. Another edit decision which may have contributed to the phenomenon of less turnover within waves than between waves was the following: if at least one "yes" was reported for an income type, and/or if at least one monthly amount was a valid nonzero amount, then any blank monthly recipiency indicators were set to "yes" and any blank monthly amounts were imputed using the average of the amounts reported in other months. The obvious effect of such a procedure is to reduce the apparent amount of change within a wave. Unfortunately, these edits were not identified on the data file. As a result, the extent to which they affected the results presented here is not known, although their combined impact is likely to be small.

Another possible contributor to the observed effect is the treatment of person noninterviews within interviewed households. Because there were, in fact, few such cases (only 298 in Wave 1), an imputation procedure was developed to substitute complete person records for the otherwise missing data. The procedure used reported demographic data as matching variables in a hot-deck assignment.
Since each wave's data were processed independently, it is highly unlikely that an individual who was a nonrespondent in each of two consecutive waves would receive the same imputation donor for both waves. Consequently, some spurious wave-to-wave change could occur solely as an artifact of the independent processing.

The same argument applies to the case of item nonresponse within a person's record. The presence of valid data in one wave and the absence of valid data in the next (or vice versa) suggests possible problems for between-wave analyses because the ISDP imputation system did not take previous (subsequent) reporting patterns into account. In addition, if a respondent did not provide information for a specific item on two successive waves of interviewing, it is likely that different imputation donors provided the missing data in each wave.

Mismatches: Technically, of course, although respondents do report month-to-month turnover within a survey wave, it is incorrect to refer to respondents' "reports" of between-wave turnover. These events are created by the computerized process which links together the data for specific individuals across survey waves. To the extent that people are incorrectly linked, a certain amount of artifactual turnover may appear in the month-pair which connects the two waves. Preliminary simulation work suggests that mismatching need not be extensive to produce within-wave versus between-wave differences of the magnitudes observed in Table 1. In fact, for most of the income types in this paper, a mismatch rate of 3 percent or less would produce an apparent increase in turnover quite comparable to the observed increase from within-wave month-pairs to between-wave pairs. It is impossible after the fact to determine the impact of person mismatches on the estimates of between-wave turnover in the 1979 panel.
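To see why even a small mismatch rate could matter, consider the following minimal Python sketch. It is not the authors' simulation, whose details are not given here; it assumes, purely for illustration, that a mislinked record belongs to an unrelated person whose recipiency status is independent of the original person's and who has the same overall recipiency rate. The function name and all numeric inputs are hypothetical.

```python
def apparent_between_wave_turnover(true_turnover, recipiency_rate, mismatch_rate):
    """Expected between-wave turnover when a share of links are wrong.

    Assumption (illustrative only): a mismatched pair joins two unrelated
    persons, so their statuses differ with probability 2 * p * (1 - p),
    where p is the overall recipiency rate.
    """
    spurious = 2.0 * recipiency_rate * (1.0 - recipiency_rate)
    return (1.0 - mismatch_rate) * true_turnover + mismatch_rate * spurious


# Hypothetical inputs: 10 percent of persons receive the income type,
# true month-to-month turnover is 0.5 percent, and 3 percent of links are wrong.
observed = apparent_between_wave_turnover(0.005, 0.10, 0.03)
print(observed)
```

Under these assumed inputs the apparent between-wave rate is roughly double the true rate, because the spurious term depends on the recipiency rate rather than on the (much smaller) true turnover rate.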
Returning to the discrepancy between the early Lepkowski and Kalton data and the subsequent refined file, one intriguing possibility is that although the former produced fewer matches than the latter, the matches that were completed may have been relatively error-free. If this were the case (that is, if the Michigan group somehow skimmed off the definite matches), then the appearance of heightened between-wave turnover in the later data file may simply reflect increased match errors. Clearly, evaluating the impact of match errors in turnover estimates from the SIPP will require maintaining data on the quality of the match for each person, perhaps in the form of a scale showing the number of variables which were identical across the linked waves.

Response error: Perhaps the most common explanation for the effects observed in this paper involves some form of recall bias. This was certainly Czajka's (1982) assumption. Presumably, a gestalt-like process operates in response to imperfect recall, leading respondents to report receipt for the entire 3-month period of a single wave as having been more stable than it really was. Such a process would work in two ways to produce more reports of between-wave than within-wave turnover: first, by reducing the number of within-wave turnover episodes (see Example 1); and second, by shifting the occurrence of turnover episodes to the between-wave period (Example 2).

                        wave n           wave n+1
Example 1
  actual receipt:       yes  no   yes    no   yes  no
  reported receipt:     yes  yes  yes    no   no   no
Example 2
  actual receipt:       yes  yes  yes    yes  no   no
  reported receipt:     yes  yes  yes    no   no   no

Although it is impossible with the available data to evaluate these notions directly, other research has demonstrated effects which appear to be related to the processes hypothesized to be at work here.
Goudreau, Oberheu, and Vaughan (1984) report two results of interest from a survey of known AFDC recipients. First, those who failed to report receipt were likely to have received AFDC income for only part of the reference period of the survey. And second, the most common error in reporting income amounts was the tendency to report "the most recent payment for all three months of the reference period when payments actually varied" (p. 184).

A second, related response error possibility can be examined using the present data. According to this explanation, misreports of the type described above, while perhaps representing a general human tendency, are even more likely to occur when the respondent and the subject of the report are not the same person, and especially when different respondents provide the data for two consecutive survey waves.

Table 2 summarizes the data regarding the role of proxy response in general, and changing respondents specifically, on elevated between-wave turnover. The results do not present a simple picture, but there is no evidence that self-response in consecutive waves erases the general effect observed in this paper. Note that with only one exception, all differences in column (c) are positive; that is, between-wave turnover is consistently greater than within-wave turnover even when attention is restricted to the constant self-response group.

Nor, in fact, is there consistent support for the weaker argument that self-response might at least reduce between-wave/within-wave turnover discrepancies. As shown in columns (j) and (m), the weight of the evidence is in the opposite direction. Only for the two earned income categories does proxy involvement strongly and consistently produce greater differences as compared to constant self-response. Why the two general income types produce such disparate results is not clear.
A plausible partial explanation, at least for the both-self/mixed-self-and-proxy comparison, is that a true change in recipiency for earned income also changes a person's availability for interview. For example, those who are not employed may be more readily available to be interviewed for self than those who are employed. Receipt of unearned income, on the other hand, is not associated with the likelihood of finding a person at home; thus, recipiency turnover for unearned income is not associated with a corresponding change in response status.

Conclusion

This paper has demonstrated the existence of some data quality problems in the 1979 Panel of the ISDP, at least when data are examined from more than one survey wave at a time. We have as yet no definitive explanation for these problems, but only a list of possible causes: edit, imputation, and processing procedures; matching difficulties; and response errors. It is likely, of course, that all contributed to the observed effects.

Although modelled in many ways on the 1979 Panel, the SIPP has adopted several modifications which may reduce the problem of heightened turnover in income recipiency between survey waves. First, the SIPP questionnaire includes procedures by which information brought forward from the previous interview can be verified and corrected, if necessary, at the time of interview. The identification and correction of incorrect information was not systematically addressed in the ISDP. Second, the SIPP exercises much tighter control on the sample than did the ISDP, through an improved control numbering system and improved check-in procedures in Census Regional Offices. These new procedures should help keep mismatches to a minimum in linking consecutive survey waves.

In the future, as SIPP data become available we will monitor them closely for evidence of the type of problem we have demonstrated here.
In addition, we will seek to ensure that data which might help pinpoint the cause of the problem (for example, match certainty indicators and edit and imputation flags) are systematically gathered and maintained. We are also planning a more active program of investigation: a record check study matching selected SIPP income receipt and amount data with existing administrative records. Such a study will contribute greatly to our understanding of the quality of SIPP responses, and will provide valuable direction to the development of any ameliorative actions to improve the quality of the SIPP.

Technical Note on Significance Testing Procedures: The following assumptions guided procedures for testing the significance of the between-wave versus within-wave difference in turnover rates: Suppose five observations have common variance $\sigma^2$ and common correlation $\rho$. Then the variance of the average of four

[Table 1 (tests of significance on $P_{diff}$ for the 17 income types across linked survey waves) and Table 2 (between-wave versus within-wave turnover by self and proxy response status): table contents not recoverable from the source.]
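The turnover comparison at the center of this paper can be sketched in a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, each person's record is assumed to be a list of six monthly recipiency indicators (months 1 to 3 in wave n, months 4 to 6 in wave n+1), and the sample data are simply the two hypothetical response patterns given as Example 1 and Example 2 in the discussion of response error. No significance test is attempted.

```python
def turnover_rates(records):
    """Return [P12, P23, P34, P45, P56] for a list of 6-month records.

    Each rate is the share of persons whose recipiency status differs
    between the two months of the pair.
    """
    n = len(records)
    return [sum(r[m] != r[m + 1] for r in records) / n for m in range(5)]


def p_diff(rates):
    """Between-wave rate P34 minus the average of the within-wave rates."""
    p12, p23, p34, p45, p56 = rates
    p_bar = (p12 + p23 + p45 + p56) / 4
    return p34 - p_bar


Y, N = True, False
actual = [[Y, N, Y, N, Y, N],    # Example 1, actual receipt
          [Y, Y, Y, Y, N, N]]    # Example 2, actual receipt
reported = [[Y, Y, Y, N, N, N],  # Example 1, reported receipt
            [Y, Y, Y, N, N, N]]  # Example 2, reported receipt

diff_actual = p_diff(turnover_rates(actual))
diff_reported = p_diff(turnover_rates(reported))
print(diff_actual, diff_reported)
```

With these two illustrative records, the reported data place all turnover at the seam between waves, while the actual data do not, which is the pattern the response error explanation predicts.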
The Student Follow-up Investigation of the [...]

Anthony M. Roman, Diane V. O'Brien

I. Background

The Income Survey Development Program (ISDP) was a research and development program established in the mid-1970's by the Department of Health, Education and Welfare (HEW) in conjunction with the U.S. Census Bureau to prepare for the upcoming Survey of Income and Program Participation (SIPP). The SIPP is the new survey conducted by the Census Bureau designed to satisfy a wide variety of data needs concerning the economic situation of persons and families living in the United States. Data collection for the first SIPP survey, the 1984 Panel, began in October 1983. The major purposes of the ISDP were the same as the goals set out for the SIPP: to improve current estimates of income and income change; to extend the scope and precision of policy analyses for a wide range of Federal and State tax and social welfare programs; and to broadly assess the economic well-being of the population.¹

The ISDP conducted four field tests. All were experimental in nature as different concepts, procedures, questionnaires and recall periods were tested. The 1979 Research Panel was the [...] effort conducted by the ISDP. The 1979 Panel was a nationwide household survey with a total sample of 11,800 households drawn from 130 Census primary sampling units (PSUs).
Of this total, approximately 9,300 cases were selected from an area sample and 2,500 cases were drawn from list samples. Data collection began in February 1979 and ran through June 1980. One-third of the sample households were interviewed each month during the interview period. Information was obtained on household composition, labor force participation, various sources of money and nonmoney income, taxes, assets and liabilities, and other related topics.

The 1979 Panel included many controlled experiments which tested alternatives for basic survey design. The major tests conducted were: household versus individual questionnaire format; self versus proxy respondent rules; and 3-month versus 6-month respondent recall.

As part of the research effort to test respondent rules, one unresolved issue concerned proxy interviews taken for college students not living at their parents' address. In order to test the validity of information collected for this type of proxy interview, an experiment was conducted during the November and December interviews of the 1979 ISDP Panel. This experiment was called the Student Followup Investigation. This paper discusses the objectives, design, and field procedures used for the investigation, and some preliminary results of this experiment.

II. Purpose

Respondent rules during the 1979 Research Panel were to conduct a personal interview for each adult household member 16 years or older. If a self-response interview could not be obtained, the procedure was to accept a proxy interview from another household member who was knowledgeable about the absent person. In this survey, as in other Census surveys, students were considered as members of their parents' households until they established a permanent residence elsewhere.
Therefore, the usual procedure for students living away from home while attending school was to treat them as household members who were temporarily absent and obtain proxy interviews from other members of their parents' household.

The fourth interview questionnaire (Wave 4) used during the 1979 Panel contained a special set of questions concerning postsecondary educational enrollment and expenses. This interview seemed especially appropriate for studying the quality of proxy interviews for students, as compared to the student's self interview. In order to measure the accuracy of information taken from proxy interviews for students living away from home, the fourth interview was first obtained by proxy at the parents' household, and then by self interview at the student's school residence. This self-response interview is referred to as the student followup interview.

There were two basic purposes for conducting the Student Followup Investigation:

1) To obtain the most complete and accurate information possible for items in the Education Expenses section of the Wave 4 questionnaire (such as school enrollment, tuition, fees, and living expenses), and

2) To determine whether proxy respondents at the sampled address are able to provide reliable information on labor force participation, income, education expenses and enrollment for students living away from home.

This experiment was conducted by comparing the information obtained from the proxy interview at the student's home to the self-interview taken at the student's school address.

III. Procedures

The fourth interview was designed to identify, during November and December, students living away from home while attending school. Only students who were actually staying at their school residence (either a dormitory, fraternity house, apartment, etc.) during the time of the November or December interview were eligible for followup. Census interviewers were instructed to call their regional office within 24 hours of a household interview in which a proxy interview had been administered for a student who was absent and living away at school. The interviewer would then provide the office with the student's school address.

Census regional offices were responsible for the control and assignment of the student followup cases. The rules for assigning the cases were essentially the same as the ISDP rules for movers. If the student's school address was within 50 miles of an ISDP PSU, an interviewer visited the student for an interview. Regional offices were instructed to always employ a different interviewer for the student's interview in order to eliminate any interviewer bias. Additionally, interviewers were instructed to accept only self-response interviews at the student's school address; no proxy responses from roommates or friends were allowed.

IV. Field Results

The analytic universe for the study was the totality of students in the 4th Wave of the 1979 Panel who usually lived away from home and were attending postsecondary schools.² There were 443 such students identified. Of these, 117 (26.4 percent) were not eligible for interview since the school residence was more than 50 miles from an ISDP PSU and 54 (12.2 percent) were not eligible because the student was staying at home during the time of the 4th Wave interview. Of the 272 cases assigned, 202 student followup interviews were obtained, yielding a response rate of 74.3 percent. Of the 70 noninterviews, 6 were cases in which the parents refused permission for the interviewer to contact the student.

The major reason for the noninterviews was that many students were not staying at their school address (because of Thanksgiving, Christmas and semester breaks) by the time the interviewer received the followup assignment. Although interviewers were allowed until the first week of December to obtain the followup interviews for students identified in November and until the second week of January for students identified in December, many students remained on some type of break later into December and January.
This proved to be an inappropriate time of year for conducting interviews with students at their school address. However, in the case of the 1979 Panel, we overlooked this factor in the survey design in order to conduct the experiment in conjunction with the Education Expenses questions, which were set beforehand for the Wave 4 interview. A recommendation for future studies involving students interviewed at their school address is to obtain the school address in a previous wave's interview. This would allow interviewers more time to contact the student.

V. Preliminary Findings

A. Data Set Creation

The first task in analyzing these data was the creation of a data set of matched responses from the student followup questionnaire and the proxy questionnaire administered during Wave 4 of the ISDP. During the matching process, 35 students (17.3 percent) could not be matched to the Wave 4 ISDP File. Attempts to reconcile the mismatches were unsuccessful. In all but one instance, the most basic identifiers for these 35 students did not exist on the Wave 4 ISDP File. Due to the time elapsed from the initiation of this followup study to the creation of the analysis data set, it has been extremely difficult to find out why these mismatches occurred. Future studies should be aware of these problems and try to prevent them. Omitting the 35 mismatches leaves a data set of 167 matched responses, and these data are analyzed in this report.

In all but two instances, the variables analyzed are direct responses to questions on the ISDP form (i.e., they are not in any way computed). The only exceptions are "usual hours worked per week at all jobs" and "total pay before deductions from all jobs last month". These two variables are computed by summing the response from each reported job.

B. Relationship of Student to Proxy Respondent

The relationship of the student to the respondent serving as his/her proxy can be determined in most cases through their relationships to the household reference person.
The reference person is that household member who is stated as owning or renting the residence. Table 1 indicates that in 84.4 percent of the cases, the proxy was a parent of the student. This follows the expected pattern.

C. Wage and Salary Comparisons

The ISDP questionnaire was divided into several sections. One section was designed to identify receipt of income types while other sections obtained amounts. A person was asked a series of wage and salary questions if they indicated in the recipiency section of the questionnaire that they worked at a job or business. One wage and salary record was created, containing responses to the set of wage and salary questions, for each job named. Thus, if a student had only one employer, a wage and salary record should have been created with the student's responses while another wage and salary record should have been created with the proxy's responses. The reference period used in the ISDP was the previous 3 months, but the wage and salary records were created on a job basis. Therefore, a reported job could have been held at any time during the 3-month reference period.

In examining the 167 matched cases of self and proxy responses, the following breakdown of wages and salaries was observed:

    83 had at least one self and one proxy record
    53 had neither a self nor a proxy record
    27 had a self but no proxy record
     4 had a proxy but no self record

If one assumes that the self response is correct, then the proxy failed to identify a job held by the student in 27 cases (24.5 percent). This appears to be rather substantial and indicates a potential source of underreporting of wages and salaries with proxy response. The 4 cases in which a proxy record exists while no self record exists may be interpreted as a potential source of misreporting wages and salaries under proxy response.

In attempting to analyze particular wage and salary questions of interest, several conditions must be kept in mind.
While 83 matched cases exist with both a self and proxy wage and salary record, the number of cases available for making comparisons for any particular question may be less. There are two primary reasons for this: 1) one interview may have proceeded in a fashion which asked the question of interest, so that a response was coded, while the other proceeded in a fashion which did not ask the question (i.e., due to skip patterns within the questionnaire), and 2) even though the question of interest may have been asked during both interviews, one may have resulted in a valued response³ while the other did not.

Valued responses are important in evaluating the quality of data obtained in a survey. They indicate both knowledge by the respondent of the investigated subject matter and willingness to cooperate in the survey. With this in mind, the percentages of coded responses which were valued (i.e., given that a question was asked, the number of times it resulted in a valued response) are presented in Table 2. It is seen that for several wage and salary questions, it is more likely that a self respondent will give a valued response. This is particularly evident with the "usual hours worked per week" and "hourly rate of pay" variables. In all but one instance, when a valued response was not given, a "don't know" was the recorded response.

Table 2 also presents the mean value of self responses for seven wage and salary variables for three particular categories: 1) the proxy could not identify that the student had a job (i.e., no proxy wage and salary record existed but a self wage and salary record did exist); 2) the self response was valued while the proxy response was not (e.g., the proxy most likely responded "don't know"); and 3) both self and proxy responses were valued. This table demonstrates that a pattern appears to exist in which proxies best identify jobs at which students earn the most money or work the most hours.
The smaller the earnings or hours worked, the more likely the proxy will either not be able to identify the job or not be able to answer detailed questions about the job.

Several points should be noted concerning Table 2. The usual hours worked per week may seem rather high for student jobs. This is due to the reference period for these questions extending back into the summer months. Therefore, summer jobs in which the student may have worked 40 or more hours per week will be included in these summaries. This also explains the decreases in total monthly pay from three months ago to last month. Also, it is impossible to compute total monthly pay by using the usual hours worked per week and regular hourly rate of pay. This is because the values presented in these tables are means and concern the student's primary job. One student's primary job may have been three months ago while another's may have been last month.

The final table, Table 3, presents comparisons of the self and proxy valued responses. It should be noted that the estimated variances used in computing these confidence intervals do not take into account any sample design effects. The reason is that this analysis is considered preliminary and will be used to decide if a more lengthy detailed analysis is warranted. The net result is that Table 3 should be [...] while conclusions of significant differences should be considered liberal. Computation of design effects may add a small degree of accuracy to results from this study, but it should be noted for future studies that increased emphasis on obtaining responses from all sample students and their proxies would greatly enhance the accuracy of results.

Of the seven wage and salary variables analyzed, two showed a significant difference at the .05 level. These were "usual hours worked per week" and "regular hourly rate of pay", both for the student's primary job. In both instances, the proxy gave the larger valued mean response.
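The caution about design effects can be made concrete with a minimal sketch. It assumes a paired comparison of proxy and self responses (the source does not say how the intervals in Table 3 were computed), a normal-approximation multiplier of 1.96, and a hypothetical design effect `deff`; the function name and the sample values are illustrative only and are not data from the study.

```python
import math
from statistics import mean, stdev


def paired_diff_ci(self_vals, proxy_vals, deff=1.0, z=1.96):
    """Mean proxy-minus-self difference with an approximate 95% interval.

    deff is a hypothetical design effect: the ratio of the true sampling
    variance to the variance computed as if the sample were a simple
    random sample. deff=1.0 reproduces the no-design-effect assumption.
    """
    if len(self_vals) != len(proxy_vals) or len(self_vals) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    diffs = [p - s for s, p in zip(self_vals, proxy_vals)]
    se_srs = stdev(diffs) / math.sqrt(len(diffs))  # simple-random-sample SE
    se = math.sqrt(deff) * se_srs                  # inflate for the design
    d_bar = mean(diffs)
    return d_bar, d_bar - z * se, d_bar + z * se


# Made-up illustrative values (NOT data from the study): usual weekly hours.
self_hours = [20, 25, 30, 15, 40]
proxy_hours = [22, 30, 30, 20, 45]

srs = paired_diff_ci(self_hours, proxy_hours)             # deff = 1
clustered = paired_diff_ci(self_hours, proxy_hours, 2.0)  # assumed deff = 2
print("no design effect:", srs)
print("assumed deff = 2:", clustered)
```

The interval widens by the square root of the design effect, which is why differences judged significant without design effects are described in the text as liberal.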
It is interesting to note that for "usual hours worked per week at all jobs", the mean self and proxy responses are not significantly different. This raises the question of the proxy and student possibly identifying different jobs as being primary.

D. Education Expenditure Comparisons

All 167 matched cases had both a self and a proxy education expenditures record, but 61 of these records were unavailable for this preliminary analysis. This was due to a flaw discovered in the manner in which the Wave 4 ISDP data were processed. Only rekeying of the questionnaires could retrieve the data and this was deemed unwarranted at the present time. Therefore, 106 matched records were available for analyzing education expenditures.

Table 2 again presents the percentage of responses which were valued. It is obvious that a valued response is much more likely from a self respondent than a proxy respondent. This seems understandable for all variables except "amount paid by family on tuition and fees", since the other variables involve expenditures most likely handled directly by the student. In every instance that a valued response was not given, "don't know" was the recorded response.

Table 2 displays the mean value of self responses both when the proxy has a valued response and also when the proxy response is "don't know". Three of the four variables considered do not appear to differ substantially between these two categories. Only the "amount paid by family on tuition and fees" exhibits a rather large difference, with the mean self response being greater if the proxy has a valued response. This is consistent with the wage and salary results in that the more expensive the tuition, the more the proxy is likely to know about the amount. It may also help explain why so many "don't knows" were given by proxies in response to this question. Perhaps when the amount of tuition is low, the student is more likely to be directly involved in its payment (e.g., the student may pay the tuition from support supplied by the parent).

Table 3 again presents results of comparisons of self and proxy valued responses. Two of the four variables showed a significant difference at the .05 level. They were "academic credit hours taken this term" and "cost of course materials". In both instances, the mean proxy response was larger. The mean proxy response was also larger for "amount paid by family on tuition and fees", but with a large estimated variance no statistically significant difference could be detected.

E. Other Comparisons

Two additional areas were examined in this preliminary analysis. The first concerned types of educational assistance received; only two had enough cases to analyze. These were: Basic Educational Opportunity Grants (31 cases) and Government Scholarships, Fellowships, Etc. (11 cases). The results of comparisons of self and proxy valued responses are shown in Table 3. No significant differences in mean amount received were found for any of the assistance variables.

The last area investigated was receipt of interest income. Reporting of interest was handled in the ISDP questionnaire in the same manner as wages and salaries. That is, a person was asked a series of questions regarding amounts of interest if they indicated receipt of interest income in the recipiency portion of the questionnaire. For the 167 matched cases, the breakdown of reported interest was:

    104 cases had both a self and proxy report
     30 cases had neither a self nor proxy report
     27 cases had a self but no proxy report
      6 cases had a proxy but no self report

Assuming the self response is correct, the proxy failed to identify that the student would have interest income in 27 cases (20.6 percent). Although this appears to be a large problem, interest income is poorly reported for all people. For example, in the 104 cases in which both the self and proxy respondent reported receipt of interest earned on the student's own accounts, 61.0 percent of the coded self responses were "don't know" while 81.5 percent of the coded proxy responses were "don't know".
Considering the question on interest earned on the student's shared accounts, 69.4 percent of the coded self responses and 80.0 percent of the coded proxy responses were "don't know". It appears that the quality of interest data for students is suspect regardless of whether a self or proxy interview is conducted.

VI. Conclusions

The aim of this preliminary analysis was to examine the self and proxy student data in order to decide if a more extensive investigation (e.g., effects of accepting proxy responses on overall survey estimates) seemed warranted. Any inferences drawn from these data should keep in mind that the estimated variances did not reflect any sample design effects and that the size of the data set is quite small. Indeed, most comparisons were based on fewer than 100 observations. Still, this study is unique and, although somewhat flawed in administration and implementation, it is possible to make certain general remarks.

When valued responses are available from both the self and proxy interviews, the quality of the proxy responses appears to be generally quite good. Substantially more data would be needed to derive better estimates of the difference between self and proxy response and to narrow the confidence intervals around these estimates.

A problem that does appear to exist is in obtaining a valued proxy response. Quite often, a proxy cannot identify a particular source of student income (e.g., wages and salaries) and even if they can identify it, they are more likely to respond "don't know" to the particulars about that source. A trend does seem to exist that the larger the income or expense, the better the proxy response becomes. Still, this implies that by using proxy responses, the lower range of income or expense amounts is more likely to be missed.
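The sample-flow and proxy-miss percentages quoted in Sections IV and V can be re-derived from the counts given in the text. The short sketch below does only that arithmetic; the helper names `pct` and `proxy_miss_rate` are illustrative and not part of the study, and the miss rate treats the self response as correct, as the text assumes.

```python
def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def proxy_miss_rate(both, self_only):
    """Share of self-reported sources the proxy did not report,
    treating the self response as correct (the text's assumption)."""
    return pct(self_only, both + self_only)


# Sample flow, using the counts reported in Sections IV and V.A.
identified = 443                   # students living away at school
too_far = 117                      # school more than 50 miles from a PSU
at_home = 54                       # staying at home at Wave 4
assigned = identified - too_far - at_home
interviewed = 202
unmatched = 35
matched = interviewed - unmatched

# Four-way breakdowns of the matched cases (Sections V.C and V.E).
wage = {"both": 83, "neither": 53, "self_only": 27, "proxy_only": 4}
interest = {"both": 104, "neither": 30, "self_only": 27, "proxy_only": 6}

print("assigned:", assigned, "response rate:", pct(interviewed, assigned))
print("unmatched share:", pct(unmatched, interviewed), "matched:", matched)
print("proxy missed job:", proxy_miss_rate(wage["both"], wage["self_only"]))
print("proxy missed interest:",
      proxy_miss_rate(interest["both"], interest["self_only"]))
```

Both breakdowns sum to the 167 matched cases, and the computed rates agree with the 74.3, 17.3, 24.5 and 20.6 percent figures given in the text.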
Finally, the main issues involved in interviewing students away from home are the impact of accepting proxies on overall survey estimates and the differential costs involved in obtaining self responses. Since no cost data are available from this study, an estimate of the additional amount required in obtaining self responses cannot be computed. It may be possible to make some very general comments about the potential impact of accepting proxies on overall survey estimates. Students living away from home make up less than 3 percent of the overall ISDP sample. With this in mind and the fact that results from this study indicate that proxies are more likely to miss only the smaller expense and income amounts, it may appear unlikely that overall survey estimates will be strongly affected. Still, the limitations of the sample involved in this study must be considered in any statement of results. For instance, students living more than 50 miles from an ISDP PSU were omitted from consideration. Also, problems were encountered in matching students to proxies and in losing some survey data due to a processing flaw. The effect that these students could have had on results from this study is unknown.

In concluding, further detailed investigation of this particular data set is not recommended due to the limitations in the size and composition of the sample. Future study may lead to stronger results but, based upon this preliminary investigation, it is recommended that while the self-proxy student issue should not be forgotten, it should not occupy a high place on the SIPP research agenda.

FOOTNOTES

1 Research Triangle Institute. 1983. The 1979 ISDP Research Panel Documentation. National Technical Information Service, Washington, D.C.

2 Since Wave 4 of the 1979 Panel was administered over a two month period, only two-thirds of the 11,800 household sample was interviewed, making the Wave 4 sample size approximately 8,100 households.
3 Throughout this report, the term valued response is used to imply any response with a legitimate value for the question asked. Valued responses do not include refusals, don't knows, or responses whose value is considered out of range or otherwise [...]

[Table 1, relationship of student to proxy respondent: contents illegible in the source scan.]

Table 2: Results for Wage & Salary and Education Expenditures [only the first row is legible in the source scan]

Columns: percent of coded cases which were valued (self response; proxy response); mean value of self responses when the proxy could not identify that the student had a job, when the proxy did not give a valued response, and when both self and proxy had valued responses.

Wage and Salary
Usual hours worked per week, at primary job: 93.7%; 76.4%; 22.30 hrs; 21.11 hrs; 35.60 hrs. Legible cell counts, in source order: (n=76), (n=55), (n=20), (n=38).

[The remainder of Table 2, Table 3, and the opening pages of the following paper are missing from the source scan. The text resumes in mid-discussion of the 1979 Research Panel experiments.]

[...] involved a controlled [...] respondent rules. [...] interviewed using [...] rules except in special [...] interviewed under "standard" respondent rules, whereby proxy interviews were accepted for absent persons from other household members when convenient.

A number of different analyses have been conducted to date in an effort to study the effects of these proxy respondent rules and self-respondent rules on data quality, noninterview rates, and costs of data collection (e.g., Coder, 1980; Kaluzny, 1981, 1982; Kulka, 1983). In general, while the use of self-response rules results in approximately 20 percent more self-response (85 vs. 65 percent) and 4-6 percent higher interviewing costs than standard respondent rules, results on nonresponse and data quality are mixed. While the proxy treatment had a positive effect on household and person interview rates, self-respondent rules apparently resulted in somewhat better data (as implied, for example, by the greater use of records, lower item nonresponse for certain key items, less rounding, and less variance in non-zero amounts), although some of these effects appeared to be somewhat smaller by Wave 2 (Kaluzny, 1982).
Unlike the "forms" experiment, the self-proxy experiment continued throughout all waves of data collection, so that any potential influences on data quality or increases in the variance of key variables due to this experimental factor are likely to be found throughout the database. Yet, with few exceptions, the longitudinal implications of these alternate respondent rules have not yet been investigated. Changes in the proportions of proxy respondents or in the characteristics of proxy vs. self-responders under the two conditions may vary over time, thereby confounding somewhat longitudinal analyses of variables especially sensitive to respondent rules. More generally, suppose that a comparison of the 1979 Research Panel to other survey data suggested that the former provided more accurate data relative to an independent source of information. Suppose further that this improvement was directly attributable to the use of a maximum self-response rule (i.e., under regular proxy rules estimates were similar). Without making such an assessment, however, one might assume that in general the ISDP design results in better data and generalize this assumption to the SIPP, an erroneous presumption, of course, unless the SIPP were to employ rules maximizing self-response. Moreover, it is not difficult to see how such a methods "artifact" might similarly influence important relationships among variables of major policy [...]

The third experiment compared property or asset income amounts reported using a three-month reference period with that reported for a six-month recall period. The basic objective of this experiment was to determine if information collected every six months would be as accurate as that collected quarterly. Results of this experiment would provide evidence on the magnitude of loss with the longer recall period (a critical ingredient in justifying the current four-month recall design for the SIPP), but very little of this analysis has been done to date (cf. Czajka, 1983).
The very reason for conducting this experiment, however, implies increased variation in reported amounts of asset or property income due to differences in length of recall period. Since the preceding three months are reported with an identical recall period by both groups every other wave, the influence is not constant for all months. Thus, substantive analyses of a "common" three-month period may yield different results than that of a similar period where recall is three months longer for one group than the other. Similarly, quarter-to-quarter variation in asset income reporting may be greater within the six-month reporting group than within the three-month subsample. Moreover, since asset income recipiency is reported quarterly, the expected influences would likely be on asset income amounts, but a longer reporting period for "amounts" could also have an indirect adverse effect on reports of recipiency as well. In addition, if a recall effect of either type is present, such effects may either dissipate or increase in magnitude over the life of the panel (through Wave 5).

A fourth experiment, afforded by the use of a "staggered" interview design in which each quarter's interviewing was spread over three months with a variable three-month reference period (see Table 1), provided for a systematic comparison of income and other information reported for several months during the year using a one-, two-, or three-month reference period. Although the staggered design was not adopted for this reason, it provides a "natural" experimental design for the assessment of potential monthly recall bias by length of reporting period for virtually all income types and a wide variety of other variables.
To date, however, only preliminary analyses of this natural recall experiment have been conducted (Kaluzny, 1981, 1982; Czajka, 1982), none of which have provided consistent evidence of a systematic recall effect. Nevertheless, the implications of this monthly interview design for substantive analyses are considerable. On the positive side, the staggered interview procedure provides an ongoing measure of monthly recall bias, and to the extent that such bias exists, the varied recall period tends to minimize its effect (relative to more typical quarterly interviewing) when making comparisons of monthly changes, since income and other monthly data were always collected with the same average length of recall. On the other hand, the staggered approach introduces some substantial problems with regard to missing data and response variance for monthly [...]. Point [...] for [...] month [...] are made with higher variance, and the staggered approach requires that calendar quarter estimates for two thirds of the sample be derived from data collected in two separate interviews, resulting in greater levels of missing data, linkage problems, and increased month-to-month variation within quarters. For example, recent analyses of data from the 1979 Research Panel indicate a degree of variation in quarterly earnings greater than seems reasonable, and month-to-month changes in income recipiency generally tend to be greater between [...] in the reference period reported within each interview (David, 1983:11; [...] and Kasprzyk, 1984).

A final area of controlled experimentation involved a comparison of reports by self-respondents of their subjective economic well-being as assessed by using either a seven- or ten-point attitude scale.
Preliminary analyses of these data (Vaughan and Lancaster, 1980) suggested that the ten-point scale achieved the desired effect of spreading out their distributions and reducing the positive skew associated with the seven-point scale, and additional analyses with a few variables suggested that this was primarily valid variance. In light of these findings, it would appear that these economic well-being measures may indeed vary significantly according to which response scale version is used, and substantive analysis involving these variables must take such variation into account, either by conducting separate analyses [...] or by including in the [...] a variable indicating which of these [...]

Feasibility Tests

In addition to the formal experimental comparison of self-respondent versus proxy respondent rules, two other more specialized respondent tests were carried out. One examined the accuracy of information collected for students living away at school during the interview period by administering the fourth wave questionnaire twice for absent students: once by proxy at the parents' address and a second time in person at the school address. The basic objective of this study was to evaluate both differences in reporting and the additional burden imposed on field staff when students were followed to their temporary addresses. With regard to the latter, over one-fourth of the students identified lived at a school address outside the sample area, and of those assigned for follow-up, only 74 percent were interviewed, with most nonresponse due to inability to contact respondents at their school addresses. Preliminary analyses of data from the students interviewed indicate that when "amounts" or details are available from both the self and proxy interviews the quality of proxy responses is generally quite good, but proxy respondents are frequently unable to provide a "valued response" at all (cf. Roman and O'Brien, 1984).
In general, then, proxy data obtained for college students are clearly somewhat incomplete, but most analyses of data from the 1979 Research Panel should not be greatly influenced by these deficiencies, with the possible exception of those which rely on special subsamples containing a large proportion of college students and focus on variables especially prone to such proxy reporting error.

The second respondent test examined the feasibility of using off-line mail-back surveys for obtaining quarterly estimates of nonfarm self-employment income from respondents owning a business or professional practice. Because of poor response rates, this particular effort to measure subannual self-employment income was abandoned after the second quarter. Although some substantive analyses have been conducted using these data (e.g., Whiteman, 1983), methodological analysis took the form of additional experimentation with alternative procedures in an effort to improve this performance, none of which were very successful. The major implication of this feasibility test for social or policy analysts is that data on subannual self-employment income collected in the 1979 Research Panel are generally regarded as deficient.

The staggered interview design (mentioned earlier), which roughly tripled each interviewer's experience with a form, was itself a feasibility study. In addition to routine quality control interviews, an expanded reinterview program was initiated to determine whether such increased interviewer experience with the questionnaire and with the survey in [general results] in [...]. Research conducted to date provides little support for the proposition that monthly interviewing resulted in substantially improved field performance or data quality. Should such differ[ences] [...] would not be [...] than if all [interviews were] conducted the first week of the calendar quarter, for example.
Two other feasibility tests incorporated in the 1979 Research Panel were designed to explore issues related to linkage of survey responses with data in administrative records systems. First, since the Social Security Number (SSN) is the identifier in most general use, a project to determine valid SSN's for sample households was conducted using two rounds of validation, both including a computer match and manual search of Social Security Administration (SSA) administrative records. Through the use of those procedures and exploiting the panel design to obtain corrected SSN's from the field in later interview waves, valid SSN's were determined for 95.5 percent of the cases included in the project, a rate that might be improved with minor modifications in the future (Kasprzyk, 1983b). As a result, should access to administrative records systems be granted, substantive analysis using survey information linked to records data would be possible for a high proportion of persons sampled in the 1979 Research Panel.

Second, two distinct projects were undertaken to examine the feasibility of linking 1979 Research Panel data to benefit records of the Supplemental Security Income (SSI) program. The first involved a match of survey and administrative records using the 1979 Research Panel SSI subsample and SSI administrative tapes in order to validate information common to both sources and enhance the survey database. Overall, 3,950 sample persons in the 1979 Research Panel were matched with the SSI data sets, yielding a final match rate of 99 percent. However, analyses of data quality on this survey-administrative data match have not yet been conducted, and, since these list frame sample cases are not included in the public use microdata files (NTIS, 1983), this project is of [little] particular [interest to] data analysts.
Similarly, the second linking project, an SSI "domain match", was designed to determine the number of persons included in the panel through the area and Basic Educational Opportunity Grants (BEOG) subsamples who were also in the frame used to select the SSI subsample. Employing a match indicator code algorithm using validated SSN's, in combination with name, race, and date of birth, a reasonable match [rate was achieved], albeit [over a] longer time period than would be required to support multiple frame estimation (Kasprzyk, 1983b). Since only the area frame cases are included in the public use files, however, multiple frame estimation is not required for substantive analyses of these data.

Finally, in an effort to determine the incremental costs of following movers (an integral feature of the survey design for the 1979 Research Panel and the SIPP), interviewers were asked to keep a systematic record of their mileage and time spent in discovering, locating, and following up persons or households that moved. A detailed analysis of this Mover's Cost Study is presented by White and Huang (1982), who (among other things) reported a mover household follow-up rate of 76 percent (with an eligible person interview rate of 92 percent in interviewed households) and a cost increase of approximately 8 percent attributable to following movers. Of particular interest to potential policy analysts of these data is that nearly 78 percent of the 1979 Research Panel Wave 6 sample households had never moved. Relative to other longitudinal databases where movers are not followed, then, sample attrition due to this factor is clearly lower in the 1979 Research Panel, and estimates involving variables related to residential mobility are less subject to such nonresponse bias.
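As a rough illustration of what the two follow-up rates quoted above imply when taken together, the short Python sketch below multiplies them. This is only back-of-the-envelope arithmetic: it assumes (an assumption not stated in the source) that eligible persons are spread evenly across mover households, and the function name is hypothetical.

```python
def approx_person_followup_rate(household_rate: float, person_rate: float) -> float:
    """Approximate share of eligible persons in mover households who were interviewed.

    Assumption (not from the source): eligible persons are evenly
    distributed across followed and unfollowed mover households.
    """
    return household_rate * person_rate


# Rates quoted from White and Huang (1982) in the text above.
HOUSEHOLD_FOLLOWUP_RATE = 0.76   # mover households successfully followed
PERSON_INTERVIEW_RATE = 0.92     # eligible persons interviewed in those households

rate = approx_person_followup_rate(HOUSEHOLD_FOLLOWUP_RATE, PERSON_INTERVIEW_RATE)
print(f"Approximate person-level follow-up rate among movers: {rate:.1%}")
```

Under that assumption, roughly seven in ten eligible persons in mover households would have been interviewed; the actual figure depends on how household size varies between followed and unfollowed movers.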
CONCLUSION

In conclusion, it should be clear that, aside from their great analytic potential, some of these tests and experiments may also have a deleterious or confounding impact on certain substantive analyses that might be conducted using data from the 1979 Research Panel. And, in a few cases, these methodological [tests] have some positive implications for such analyses as well. Such implications range from some obvious deficiencies in some of these data highlighted by these field tests to more subtle influences on data quality and variances due to the experimental treatments imposed on the survey design. Especially with regard to the latter, the positive benefit of including such methodological tests in the survey design is that the potential influence of such factors on substantive results from this survey may be directly assessed in data analyses. It is important to note, however, that if these factors are not so examined their influence may lead to distorted or spurious conclusions. By describing some possible roadblocks that these tests and experiments may throw in the way of substantive analysis and interpretation, this paper has sought to illustrate the need for both consumers and analysts of these data to keep their methodological nature clearly in mind and, where possible, to assess directly the potential influence of these factors on research results.

REFERENCES

Czajka, John L. 1982. "How complete was food stamp reporting in Wave II of the ISDP Panel." Manuscript. Washington, D.C.: Mathematica Policy Research, Inc.
Czajka, John L. 1983. "Subannual income estimation." Pp. 87-97 in Martin H. David (Ed.), Technical, Conceptual, and Administrative Lessons of the Income Survey Development Program (ISDP). New York: Social Science Research Council.
Coder, John F. 1980. "Some results from the 1979 Income Survey Development Program Research Panel." Pp. 540-545 in Proceedings of the Section on Survey Research Methods, American Statistical Association.
David, Martin H. 1983. "Measuring income and program participation." Pp. 3-20 in Martin H. David (Ed.), Technical, Conceptual, and Administrative Lessons of the Income Survey Development Program (ISDP). New York: Social Science Research Council.
Kaluzny, Richard. 1982. Evaluation of Experimental Effects, 1979 Research Panel, Wave 1. Final Report. Princeton, NJ: Mathematica Policy Research, Inc.
Kaluzny, Richard. 1982. Evaluation of Experimental Effects, 1979 Research Panel, Wave 2. Draft Final Report. Princeton, NJ: Mathematica Policy Research, Inc.
Kasprzyk, Daniel. 1983a. "Some research issues for the Survey of Income and Program Participation." Pp. 686-691 in Proceedings of the Section on Survey Research Methods, American Statistical Association.
Kasprzyk, Daniel. 1983b. "Social security number reporting, the use of administrative records, and the multiple frame design in the Income Survey Development Program." Pp. 123-141 in Martin H. David (Ed.), Technical, Conceptual, and Administrative Lessons of the Income Survey Development Program (ISDP). New York: Social Science Research Council.
Kulka, Richard A. 1983. "Tests and Experiments." Pp. 4-1 to 4-39 in P. Nileen Hunt, et al., ISDP 1979 Research Panel Documentation. Springfield, VA: National Technical Information Service.
Moore, Jeffrey C., and Daniel Kasprzyk. 1984. "Month-to-month income recipiency changes in the ISDP." Paper presented at the annual meetings of the American Statistical Association, Section on Survey Research Methods, Philadelphia, August 13-16.
Roman, Anthony M., and Diane V. O'Brien. 1984. "Findings from the student follow-up investigation of the 1979 Income Survey Development Program." Paper presented at the annual meetings of the American Statistical Association, Section on Survey Research Methods, Philadelphia, August 13-16.
Vaughan, D.R., and C.G. Lancaster. 1980. "Applying a cardinal measurement model [...] Synopsis of a preliminary look." Pp.
546-551 in Proceedings of the Section on Survey Research Methods, American Statistical Association.
White, Glenn D., Jr., and Hertz Huang. 1982. "Mover follow-up costs for the Income Survey Development Program." Pp. 376-381 in Proceedings of the Section on Survey Research Methods, American Statistical Association.
Whiteman, T. Cameron. 1983. "Lessons to be learned from ISDP: The measurement of nonfarm self-employment income." Pp. 73-82 in Martin H. David (Ed.), Technical, Conceptual, and Administrative Lessons of the Income Survey Development Program (ISDP). New York: Social Science Research Council.
Ycas, Martynas, and Charles A. Lininger. 1981. "The Income Survey Development Program: Design features and initial findings." Social Security Bulletin 44 (November): 13-19.

SOME DATA COLLECTION ISSUES FOR PANEL SURVEYS WITH APPLICATION TO THE SURVEY OF INCOME AND PROGRAM PARTICIPATION

Anne C. Jean and Edith K. McArthur, Bureau of the Census

Introduction: Need for a Longitudinal Survey.

The Survey of Income and Program Participation is designed to collect data which will improve our understanding of the income distribution, wealth, and poverty in this country. Information collected in the survey will be useful for planners and program administrators in areas such as income support programs and health care.

The survey is longitudinal in the sense that the same persons are interviewed periodically over an approximately 2 1/2 year period. This implies following persons and updating information that reflects changes in their lives and in the composition of the households of which they are members, before, during, and after these changes occur. Persons in SIPP are interviewed every four months. At each interview, household members 15 years old or over are asked to report on income sources, amounts and employment for each of the previous four months.
With SIPP data we will be able to observe the effects over time of changes in receipt of different types of income upon the total income of a household; we will also see the effects of household composition change, such as the birth of a child or a marital separation, on participation in Federal transfer programs. In the past, analysts have often relied upon the income data collected in cross-sectional surveys, such as the March supplement of the Current Population Survey (CPS). The CPS describes household membership at a point in time, while obtaining income data for the entire previous calendar year. These data are consequently dependent on the household respondent's recall of events over the whole previous year. Thus many assumptions are made and monthly data cannot be collected accurately.

Implementation of a Longitudinal Survey.

The 1984 panel is the first panel of the SIPP. During the four months constituting Wave 1, that is October 1983 through January 1984, Census interviewers visited approximately 26,000 addresses located in 174 primary sampling units (PSUs) nationwide. The addresses were evenly distributed among four rotation groups, and each month one rotation group is assigned for interview. Nine interviews at four month intervals were scheduled for three rotations; the fourth rotation was scheduled for eight interviews.

The shift from an address sample for the first visit to a person sample in subsequent visits presented unique challenges to the planning staff, regional office staffs, and interviewers. Updating procedures for the address listings, noninterview classifications, interviewing procedures, and many other activities required for surveys maintaining an address sample were not appropriate. New controls and follow-up procedures, some requiring interregional office cooperation, were implemented. Interviewers received extensive training on new noninterview classifications and movers' procedures.
Office staff maintained extensive clerical controls to guarantee the receipt of control cards and questionnaires from interviewers and to monitor the processing of over 40,000 person records that were uniquely identified.

The remainder of this paper describes the Wave 1 field procedures associated with the implementation of the address sample and the follow-up procedures for subsequent waves. Included is an explanation of the SIPP identification system and those field operations designed specifically for sample maintenance and control. Some preliminary results of the 1984 panel follow-up are given and finally proposals for improving the follow-up system in future panels are discussed.

Wave 1 Address Sample Procedures.

Field activities for the first SIPP interview were similar to operations undertaken for other major surveys that are basically cross-sectional, such as CPS and the National Crime Survey (NCS). Interviewers listed specific addresses of living quarters either prior to or at the time of the interview visit. Reasons for differences between the number of expected units based on census address lists and the number of units listed by the interviewer were researched by the office staffs. During the first interview, the address was verified, and the unit was classified as a housing unit or OTHER unit according to census definitions. Coverage questions were asked to determine if EXTRA or additional units were located at the address, and the interview status of the address was recorded.

The interview status distinguished interviewed households from noninterviews. Noninterviews were further classified by type. For example, Type A noninterviews include all eligible households for which interviews were not obtained, such as refusals or cases where no one was home each time the interviewer visited.
Types B and C noninterviews were recorded for addresses containing no eligible household such as vacant addresses, or units under construction or being demolished.

In an interviewed household the interviewer listed all persons currently living or staying at the address, and applied a set of household membership rules to classify each person. Listed persons were classified as household members if the sample address was their usual place of residence as of the date of interview. The specific rules for household membership in SIPP are identical to those used in CPS. All household members listed in Wave 1 were designated as sample persons. After listing all household members, demographic information, such as age, sex, and relationship, was obtained for each household member and a SIPP questionnaire was completed for each household member who was 15 years old or over.

Development of Special Procedures for the Longitudinal Survey.

The procedural differences between SIPP and most other major household surveys conducted by the Census Bureau begin with the second interview. While other major surveys such as the CPS and NCS return to the same address for each subsequent visit regardless of whether the occupants of that address change, the SIPP interviewer returns to interview the same sample persons, that is, persons listed during the first interview. If persons move to a new address, they are followed and interviews are obtained at the new address. Between March 1981 and March 1982 almost 17 percent of the population of the United States moved. 1/ If SIPP did not follow movers from the original sample households, we would lose the capability of observing the effects of many major changes in the original sample households, and the person sample would be biased, since it would not include movers.
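The Wave 1 roster rules described above (usual residence determines household membership, every Wave 1 member becomes a sample person, and members 15 or over receive a questionnaire) can be sketched in Python as follows. This is a minimal illustration only: the class, field, and function names are hypothetical, and the single `usual_residence_here` flag stands in for the full CPS membership rules, which the text does not spell out.

```python
from dataclasses import dataclass
from typing import List, Tuple

MIN_INTERVIEW_AGE = 15  # from the text: questionnaire for members 15 years old or over


@dataclass
class ListedPerson:
    name: str
    age: int
    usual_residence_here: bool  # simplification of the CPS-style membership rules


def wave1_roster(listed: List[ListedPerson]) -> Tuple[List[ListedPerson], List[ListedPerson]]:
    """Return (sample_persons, persons_to_interview) for a Wave 1 household."""
    # Household members are listed persons whose usual residence is the sample address.
    members = [p for p in listed if p.usual_residence_here]
    # All Wave 1 household members are designated sample persons, whatever their age.
    sample_persons = members
    # Only members aged 15 or over complete a SIPP questionnaire.
    to_interview = [p for p in members if p.age >= MIN_INTERVIEW_AGE]
    return sample_persons, to_interview


listed = [
    ListedPerson("father", 45, True),
    ListedPerson("mother", 43, True),
    ListedPerson("son", 16, True),
    ListedPerson("daughter", 12, True),
    ListedPerson("visitor", 30, False),  # staying at the address, usual residence elsewhere
]
sample_persons, to_interview = wave1_roster(listed)
```

In this sketch the 12-year-old is a sample person (and so would be followed in later waves when accompanied by an adult sample person) but does not receive a questionnaire, while the visitor is neither.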
Interviewers who discover that a Wave 1 sample person has moved (usually while updating the household roster) are instructed to inquire for new addresses at the original address and if further inquiry is necessary they are to contact mail carriers, rental agents, real estate companies and postal supervisors. Other sources may be used, such as an employer or a contact person (this is a person identified by the respondent during the initial interview as one who would usually know where the respondent was, such as a relative or a close personal friend). Occasionally, interviewers contact other persons with the same last name listed in local telephone books, although this procedure is not specified in their follow-up instructions. Beginning with the third interview visit, change of address notification forms are left with respondents and respondents are encouraged to mail these to the census regional offices (the address of the appropriate census regional office is preprinted on the form). In addition, advance letters are mailed to respondents before each interview; if the respondent no longer lives at that address the post office is requested to provide a forwarding address.

The regional office staff determines whether a new address will be assigned for a personal visit. Personal visits are required for all new addresses located in SIPP PSUs or within 100 miles of a SIPP PSU. Telephone interviews are encouraged for all sample persons who have moved to an address located more than 100 miles from a SIPP PSU within the United States.

The following persons are excluded from mover follow-up:
(1) Persons who join Wave 1 sample persons in later waves are not followed to new addresses unless these additional persons remain with Wave 1 sample persons who are 15 years old or over;
(2) Persons who move out of the sample universe are not followed.
These are persons who become institutionalized, move outside of the United States or live in an Armed Forces barracks;
(3) Children under 15 who move and are not accompanied by a sample person who is 15 years old or over are not followed.

The geographic area covered by personal visit follow-up is extensive. Based upon the 1980 census population distribution, about 130 million persons live in areas within SIPP PSUs; another 87 million persons live within 100 miles of the outer boundary of a SIPP PSU. We counted 226 million persons in 1980; 217 million are within our currently covered areas, or 96 percent of the population.

Of the 17 percent of the population that moved between March 1981 and March 1982, the great majority moved only a short distance: about two-thirds of the movers stayed in the same county (10 percent of the total population). 2/ If persons or households move within the same county (or to a nearby county), the new address is usually assigned to the same interviewer for follow-up. The remaining third who move outside of their original county are usually transferred to another interviewer. This occasionally involves a transfer between two census regional offices. 3/

Of the 26,024 addresses included in the original SIPP address sample, 19,878 addresses were interviewed households in Wave 1 and were reassigned for a second visit. The 6,146 addresses reported as noninterview at the time of the first visit were not reassigned. Of these noninterviews, 1,019 were eligible households whose members refused to participate in the survey, or were temporarily absent, unable to be located or not interviewed for other reasons. Survey planners were reluctant to reassign in Wave 2 those Wave 1 eligible noninterviews because of the added complexity for both the interviewers and the processing system.
4/

Interviewers visited sample addresses for the second interview during February through May 1984 and attempted to locate and interview the approximately 40,000 sample persons interviewed during the first visit. New persons not present initially were added to the household rosters, provided original sample persons were still included on the roster. Any new persons who were household members 15 years old or older were also eligible for interview. If no sample person remained at an address, no interviews were conducted at that address, but interviewers were required to follow the sample persons to their new addresses.

The SIPP Identification System.

The SIPP Identification System is a numbering system designed to provide a unique unchanging identifier for each person in an interviewed household. The person identifier is used to link data from more than one interview for the same individual regardless of what moves have taken place or what changes in household membership have occurred since Wave 1. In addition, the ID system provides the means for grouping individuals into unique households in each wave. This is an important attribute, which allows for the tracking and identification of changing household membership: persons moving away can be linked to each household of which they have been a member since their first interview. However, no attempt is made during the field operations to define or number each "different" household for longitudinal analysis.

The components of the operational SIPP identification system are:

PSU number - 3 digits
Segment number - 4 digits
Serial number - 2 digits
Address I.D. - 2 digits
Entry address I.D. - 2 digits
Person number - 3 digits

The PSU and segment numbers are assigned by Washington staff during sample selection. The 3-digit PSU number identifies a county or group of counties and is the same number used by other census surveys, such as the CPS and the NCS.
As a sample of segments, that is, clusters of housing units, is drawn from a PSU, the segments are uniquely numbered within each PSU, using a 4-digit number. The clusters generally range in size from two to four housing units.

Office staff in the 12 regional offices are responsible for assigning the 2-digit serial number. The 2-digit serial number is assigned sequentially in Wave 1 to each SIPP living quarters within a segment. The 9-digit combination of PSU, segment, and serial number uniquely identifies each sample address for the first interview. As a result, SIPP households interviewed during Wave 1 (October 1983-January 1984) can be uniquely identified with these three components: PSU, segment, and serial number. The PSU, segment, and serial numbers never change, regardless of movers and new household formations.

For SIPP, a 2-digit address ID code is added by office staff to provide a means for identifying more than one unique household associated with the same PSU, segment, and serial number. This situation occurs after Wave 1, when an original Wave 1 household splits up to form more than one household. The first digit of the address ID code indicates the wave a new address is first assigned for interview. The second digit sequentially numbers households originating from the same PSU, segment, and serial number. While not essential for Wave 1, an address ID code of 11 was assigned to all Wave 1 sample addresses. In later waves, as SIPP sample persons move to new addresses, the office staff assigns new address ID codes to each new address brought into the survey by movers. Address ID codes assigned during a previous wave are deleted from the processing system for the current and successive waves if no SIPP sample persons remain at the address. Thus, the combination of PSU, segment, serial number, and address ID code uniquely identifies each sample address at each given wave.
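The household-level part of the numbering scheme described above can be sketched in Python as follows. This is an illustrative sketch, not the Bureau's processing code: the class and method names are hypothetical, the example PSU, segment, and serial values are invented, and zero-padded concatenation is assumed as the way the digits are combined.

```python
from typing import NamedTuple


class HouseholdKey(NamedTuple):
    psu: int         # 3 digits, assigned during sample selection
    segment: int     # 4 digits, unique within the PSU
    serial: int      # 2 digits, assigned sequentially within the segment in Wave 1
    address_id: int  # 2 digits: wave first assigned, then sequence number

    def control_number(self) -> str:
        """9-digit PSU + segment + serial; never changes after Wave 1."""
        return f"{self.psu:03d}{self.segment:04d}{self.serial:02d}"

    def household_id(self) -> str:
        """Control number plus address ID code; unique within a given wave."""
        return self.control_number() + f"{self.address_id:02d}"

    def wave_first_assigned(self) -> int:
        """First digit of the address ID code."""
        return self.address_id // 10

    def sequence(self) -> int:
        """Second digit of the address ID code."""
        return self.address_id % 10


# Hypothetical Wave 1 address (address ID code 11) and a Wave 2 split-off (code 21).
original = HouseholdKey(psu=12, segment=345, serial=6, address_id=11)
split_off = original._replace(address_id=21)
```

Because a split-off household keeps the original control number and changes only the address ID code, comparing `control_number()` values is enough to trace any later household back to its Wave 1 address.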
As only one sample household is associated with a sample address, this combination provides unique household identifiers for a given wave.

The person identification number is a 5-digit number consisting of an entry address ID code and a person number. It is assigned by the interviewer as each person is initially listed on the household roster. As the interviewer lists the name of each person in the household, he/she transcribes the current 2-digit address ID code to each person's record. The 2-digit number is the entry address ID. Next, the interviewer assigns a 3-digit person number to each person. Numbers 101, 102, and so on, are assigned to persons at the sample address in Wave 1; the numbers 201, 202, and so on, are assigned to persons added to the roster in Wave 2; and so forth. The first digit indicates the wave the person enters the survey. This 5-digit number consisting of entry address ID and person number is not changed or updated, except in rare instances of merged households which are described later.

Thus, the 14-digit combination of PSU, segment, serial, entry address ID, and person number uniquely identifies each person in the SIPP survey and can be used to link data for persons across waves. The PSU, segment, serial, and address ID code uniquely identifies each household in a given wave; and the PSU, segment, and serial number can link all households in subsequent waves back to the original Wave 1 household.

An example of the numbering scheme may help to clarify it further. Consider a Wave 1 household. There is a basic control number consisting of PSU, segment and serial number, along with the address ID code. At the time of the first visit, four persons are listed: a father, mother, son and daughter. Each is assigned the current address ID code 11, along with a three digit person number: 101, 102, 103, and 104. The interviewer returns four months later and finds that the mother and father remain at the original address.
The two children have moved to separate new addresses and both have married. The separate new addresses retain the basic control number (PSU, segment and serial number). One new address receives address ID code 21, the other receives 22. A new person, the son's wife, is added. She is added at an address coded 21 in Wave 2, so she receives an entry address ID of 21 and person number 201. The daughter's husband is added at an address coded 22, so his person ID is 22-201. The original persons, the son and daughter, do not change their person ID's.

In Wave 3, the mother and father retire and move to Florida. No one lives at the original address. The mother and father moved in Wave 3, so their new address ID code is 31. Their person ID's remain the same. The son and his wife haven't moved in Wave 3. Their address ID's do not change. The daughter is still at the same address, so her address ID doesn't change. However, she has split up with her husband and he has moved out. Since her husband is not an original Wave 1 sample person, he is not followed to his new address.

As mentioned previously, the operational phase makes no attempt to apply longitudinal household definitions to the changing relationships, nor to number households longitudinally. However, as analysts develop longitudinal definitions, the current data base must be able to provide the information required to support these definitions. Further refinements in the questions asked at each interview may be implemented as the needs of a longitudinal household definition become more precisely specified.

The SIPP numbering system has several advantages over alternative schemes that have been considered:
(1) The portion of the control number consisting of PSU, segment, and serial number is similar to the numbering system used by other major surveys conducted by the Bureau.
(2) Interviewers are able to assign person numbers during the course of the interview.
The person number is used in various parts of the questionnaire during the interview. This number is also transcribed to several other survey documents during the interview and immediately afterward during clerical coding operations. A person number assigned after the time of interview does not provide this immediate linkage.
(3) The person number itself has relatively few digits, reducing the possibility of transcription errors.

Several disadvantages have been noted:
(1) Duplications of person numbers for additional persons (persons added after Wave 1) can conceivably occur in situations where households have split and are in different regional office jurisdictions. The computer processing system identifies these duplicates and the regional office staff corrects them during processing.
(2) Mergers between two separate sample households require special procedures. If this situation occurs, one set of controls is retained for the merged household. New person numbers are assigned to those persons who lose their original identifiers. Interviewers record both the old and new ID numbers on the control card to provide a means for linking the two ID's. By the end of the second wave, this had occurred once.

Monthly Cross-Sectional Households.

While the ID system provides identifiers for each household in a given wave, it does not identify households for a given month. Monthly cross-sectional households are not constructed in the field; rather they are constructed during processing using information obtained during each wave. During each visit, demographic characteristics such as changes in marital status, changes in reference person (householder) status, and changes in household relationships are recorded on a control card. The same control card is used for each visit to the same address.
If a sample person moves to a new address, the interviewer prepares a new control card for the new address and transcribes any information that is not expected to change. Date entered (month and day) and date left (month and day) are recorded on the control card for every entry and exit from an address. Reasons for entries and exits are coded:

Entry
1 - birth
2 - marriage
3 - other
4 - 5/

Exit
5 - deceased
6 - institutionalized
7 - living in Armed Forces barracks
8 - moved outside of country
9 - separation or divorce
10 - person who joined a household during Wave 2 or later and who is no longer living with any sample person
11 - other
99 - listed in error

The dates entered and left are used during processing to group persons into households for a given month. A person entering a household before mid-month is considered to be a member for the entire month; a person entering after mid-month is considered not to be a household member for that month. A similar mid-month cutoff date is used for persons leaving households. As this monthly household determination is done during processing, it does not affect field operations, short of obtaining month and day of entries and exits.

Clerical Field Controls.

The SIPP movers' procedures have long been recognized as ambitious, requiring a system of field controls that are more extensive than those in effect for other major surveys conducted by the Bureau. Two standard forms are used for controlling interviewer assignments, and a third control was developed specifically for SIPP. All three forms are used during a clerical check-in at the regional offices. An Interviewer's Assignment and Control form is completed for each interviewer, listing every case in a given interviewer's assignment. A copy is sent to the interviewer and a control copy is kept in the office. As completed questionnaires are returned to the office, they are checked in against this form.
A second control form lists all interviewers and the number of assigned cases for each interviewer. Tallies are kept as material is returned. This form gives supervisors a summary of the number of outstanding cases for a given month. The third control, developed specifically for SIPP, is a computer-generated listing of all persons listed as household members in Wave 1. It includes names, person numbers, interviewer codes, and interview status. The regional offices update the listing during each wave and account for every interviewed person as documents are received from interviewers.

These three forms provide the basis of the clerical check-in and control. They must be updated to account for assignments that are transferred between interviewers and between regional offices, and they must be updated to include new persons entering SIPP after Wave 1.

Two other control forms are used by the offices to facilitate the movers' operation. One form is used to list the original address of a sample household along with all subsequent addresses. It is used primarily to control the assignment of address ID codes. A second form, a worksheet, is used for transferring cases from one interviewer to another by telephone. Because of time constraints, transfers are done by telephone, and required control card information (such as new address, names of persons, and demographic information for the movers) must be obtained from the original interviewer and passed on to the new interviewer.

While the scope of this paper concerns field operations, some mention must be made of two major features in the computerized processing system designed for check-in and control.

(1) During the keying operation all persons listed on the control card who are 15 or over and are current household members must have an accompanying questionnaire. This check is done automatically at each keying station.
Keying is done in the regional offices and immediate resolution of missing questionnaires is required.

(2) At the end of each of the four months of each wave, a centralized check-in is completed in Washington. A control card record must be transmitted for every person showing an active status on a master file, maintained in Washington, of all active records. Offices cannot close out an interview month until every active status person is accounted for and some demographic data (age, race, and sex) are verified to make sure that we are not checking in the wrong person. Each missing case is referred to the appropriate regional office for resolution.

Experience with Following Movers. Available data for follow-up interviews conducted during February-May 1984 give an initial indication of the success rates for the SIPP follow-up.

(1) Percentage of movers found: about 80%.
(2) Percentage of movers lost: about 20%; this represents 0.9 percent of all eligible SIPP households.

When sample persons move from the address at which they were contacted in the previous wave (four months before), interviewers are instructed to go through a series of steps to locate the new address. If all the steps are "dead ends," they fill in a form which describes what they know about the mover situation for those sample persons.

A review of the forms for Wave 2 available at the time this paper was written (they are submitted on a flow basis and a form was not submitted for some of the cases) illustrated the kinds of events that took place leading to the sample person's moving without leaving a trace. In about half of the cases, all household members moved leaving no forwarding address. For another quarter of the cases, one or more persons had left the household leaving other members behind, but those other persons had no information about the departee's whereabouts.
In an additional 15 percent of the cases, the spouse (usually the husband) left the rest of the family and the remaining spouse could not or would not give a forwarding address. The remaining cases showed a variety of events; for example, the person had moved and had no permanent new address but was just staying with various friends, and the interviewer had no success in contacting him. The interviewers' comments showed considerable efforts in attempting to track these movers.

Recommendations for Future SIPP Panels. Improvements in the processing system and the expansion of follow-up procedures are envisioned for future panels. These recommended changes are intended to improve sample coverage in a number of areas.

In the 1984 panel, persons who leave the sample universe (become institutionalized, leave the country, or live in an Armed Forces barracks) are dropped from the sample. 6/ As of the 1980 census, about 2.5 million persons were inmates of institutions such as mental hospitals, homes for the aged, and correctional institutions. Another 613,000 persons were living in military barracks. Demographers estimate that about 160,000 persons emigrate from the United States each year. 7/ As average stays in nursing homes are less than 60 days and live discharges account for about 75 percent of the discharges, a sample person who goes into a nursing home is likely to come out before the end of the SIPP panel. According to current procedures, members of each of these groups are reinstated only if they rejoin a SIPP household.

For the SIPP panel beginning in January 1985, planning is underway to track sample persons who become institutionalized. Interviewers will obtain the name of the institution in which the person is residing. At each subsequent interview they will determine whether the person is still there and, if the person has been discharged, they will obtain a new address.
It will then be possible to follow sample persons leaving institutions even if they do not rejoin active SIPP households. There are no current plans to track sample persons who move outside of the country or to an Armed Forces barracks.

Interviewers may return to an address in the 1984 panel and find that all original Wave 1 sample persons have left but one or more additional persons (who joined households with sample persons after Wave 1) remain. In the 1984 panel no interviews are conducted at that address even though persons currently at the address lived with sample persons during at least part of the reference period. For future panels a final interview will be conducted for the additional persons remaining at the address. As in the 1984 panel, no subsequent follow-up is planned for these persons.

As described earlier, in the 1984 SIPP, only persons who are 15 or over are followed to new addresses; sample persons who are under 15 years old are not followed unless they move with a sample person who is 15 or over. However, once they become 15 they are eligible for interview along with other members of their households. They are missed in the 1984 panel if they move before turning 15 and are not accompanied by a sample person who is 15 years old or older. Their absence may result in some bias in the survey data. In future SIPP panels, all sample persons who are 12 years old or older at the time of the first interview will be eligible for follow-up. When a person who was 12 years old at the time of the first interview moves by him- or herself to a new address, occupants of the new household will be interviewed according to standard procedures; that is, persons 15 years old and over will be administered a questionnaire. When the sample person turns 15, that person will also be administered a questionnaire.

A number of other recommendations have been made for future SIPP panels. These include:

(1) Reassigning Wave 1 eligible noninterviews in Wave 2.
Interviewers will be provided with instructions for obtaining household rosters and assigning person numbers retrospectively, i.e., as of a date approximately four months prior to the date of the second interview.

(2) Adjusting the computerized check-in system to allow for new serial numbers (representing persons or addresses) to be introduced in Wave 2. This will provide flexibility for including missed Wave 1 housing units.

(3) Developing a questionnaire that is appropriate for telephone interviews. This could be administered to persons who are not followed for a personal visit.

(4) Increasing automation over the next few years to eliminate many of the time-consuming clerical operations associated with the check-in, control, and monitoring of assignments.

In summary, SIPP has attempted an ambitious undertaking by implementing and attempting to improve an extensive follow-up program. Data users will be the ultimate beneficiaries and judges of the program's success.

1/ U.S. Bureau of the Census, "Geographical Mobility: March 1981 to March 1982." Current Population Reports, Series P-20, No. 3BT. Issued February 1984, U.S.G.P.O.

2/ U.S. Bureau of the Census, op. cit.

3/ The United States is administratively divided into 12 geographic areas. Each area consists of a group of states under the jurisdiction of a census regional office.

4/ Wave 2 interviews for households not originally interviewed in Wave 1 require special procedures for constructing household rosters. For example, interviewers would need to obtain the names of persons living at the address as of a reference date four months prior to the Wave 2 interview. An appropriate Wave 1 person number would be assigned (see the SIPP Identification System explained later in this paper). However, the 1984 computerized check-in system was designed to reject any Wave 1 person number that appeared for the first time in later waves.
5/ Code 4 is used in circumstances where a sample person moves to an address already occupied by persons not previously in SIPP. The persons not previously in SIPP are added to the roster and are coded "4."

6/ It was decided not to obtain proxy information for sample persons (as well as other members of a household that has at least one resident sample person) who die while they are in a SIPP panel.

7/ Robert Warren and Jennifer Marks Peck, "Foreign-Born Emigration from the United States: 1960 to 1970," Demography, Vol. 17, No. 1 (February 1980), pp. 71-84.

MANAGING THE DATA FROM THE 1979 INCOME SURVEY DEVELOPMENT PROGRAM

Pat Doyle, Mathematica Policy Research
Constance F. Citro, National Academy of Sciences

During 1979 and 1980 the Department of Health and Human Services and the Bureau of the Census, with support from other federal government agencies including the Food and Nutrition Service, USDA, administered a panel study of households representative of the civilian noninstitutionalized population in the United States, called the 1979 Income Survey Development Program (ISDP) Research Panel. The survey was designed as the final pretest for the Survey of Income and Program Participation (SIPP), which had been under development since 1975 and was fully implemented in late 1983. The 1979 panel study was extremely complex due to the efforts put forth to improve the measurement of income, net worth, and program participation and to increase the information available on behavior, attitudes, expenses, and disposable income of the population. The complexity of the 1979 ISDP survey design led to the production of public use files which are cumbersome to use, thus making it difficult to access the newly available data for research.
This paper describes a project conducted by Mathematica Policy Research (MPR) under contract to the Food and Nutrition Service, USDA, to solve the data access problems through the use of data base management system (DBMS) technology. The DBMS chosen for this work was RAMIS II, developed and distributed by Mathematica Products Group. The system developed by MPR is referred to as the ISDP/RAMIS II system.

It is important to note that a number of problems that were confronted in designing the access system described in this paper have been resolved in the release of the public use ISDP files (in fact, data from the ISDP/RAMIS II system were the source of some of these improvements). Furthermore, some, but by no means all, of these access problems have been explicitly taken care of in the design of the SIPP. Consequently, designing an access system for the new survey should be easier than for the ISDP. It is also true that the best design for a SIPP access system is likely not to be the design chosen for the ISDP system.

In the subsequent section, an overview of the panel study with emphasis on the contents and problems of the data files is provided. The report concludes with an overview of the newly created system with a summary of the data problems solved in the course of this work. For detailed information on the contents and use of the ISDP system, the reader is referred to Doyle and Citro (1984).

Overview of the ISDP and Its Applications

Figure 1 gives a graphic representation of the key features of the ISDP design. Briefly, note that:

- There were 6 waves of interviewing providing 12 to 15 months of data for each household.
- Interviewing was staggered; one-third of the sample was interviewed each month, so that each rotation group had a different 3-month reference period.
- This pattern was regular, except that the third rotation group, for various reasons, was skipped over Wave 4.
- Each wave asked a core set of items, including monthly income and employment, plus one-time supplemental items.

The SIPP design for the first panel is very similar, including skipping one wave for part of the sample.

The ISDP, by virtue of gathering detailed month-by-month data over a span of at least a year, offered the potential for exciting research that simply could not be carried out before. But, to make it possible for the researchers at MPR to realize that potential, we had to design an access system that would do the following:

- Generate reports and analysis files from individual waves, undoubtedly the easiest way of using the data
- Let researchers apply different rules to identify households and families across waves for longitudinal analysis
- Link supplemental data collected in one wave to core data in other waves
- Make it possible to carry out sophisticated statistical as well as tabular analysis of the data
- Make it possible to use the ISDP data with data from other sources, for example, 1980 census summary data.

All of these access requirements apply equally well to the SIPP.

Problems for Access Posed by the ISDP

Various design features of the ISDP posed more or less serious problems for developing an access system that would satisfy the requirements just listed. These are summarized below.

o Staggered interviewing. The use of a staggered interviewing schedule results in a situation where data from more than one interview must be accessed to study a common calendar period for the entire sample (except where the user can make do with the single calendar month that is common to all rotation groups within a wave).

o Skipping Wave 4. The alteration of the interviewing schedule to have the third rotation group skip over the Wave 4 interview means that, although two-thirds of the sample cases have a full 15 months of data (from the five regular waves if they did not attrite), the other third has only 12 months.
Moreover, the third rotation group does not have responses to any of the topical supplemental items asked at Wave 4.

o Different reference periods for wave-specific information. For any one interview, there is a potential mismatch between the wave-specific data and the monthly data, given that monthly data for the month of an interview were actually asked at the subsequent interview.

o Identifier problems. The Census Bureau encountered problems in uniquely identifying individuals across the survey waves, necessitating creation of a new unique person identifier, called the link index, as a separate file from the interview data files. It also turned out that the Bureau erroneously included some persons on the cross-section interview files who were not in fact present and vice versa.

o One-time, wave-specific supplemental data. The fact that important data were asked on a supplemental one-time basis creates problems for using these items together with the monthly and quarterly data.

o School lunch data problems. The ISDP files include valid data only for the last child in a family, and these data were erroneously written into the records for all other children.

o Lack of editing on Wave 6. In the case of Waves 1-5, the Census Bureau performed edits on demographic variables and also edited income recipiency flags. No editing was performed on the Wave 6 data, which were collected in an entirely different format.

o Asset income reporting experiment. This experiment creates practical problems of associating asset income data with other data for each month of the panel.

o Incomplete determination of monthly unit composition. The design of the cross-section files, coupled with a high level of noise in the data on arrival and departure dates, made it very difficult to assemble a stream of monthly unit composition indicators consistent with reported monthly economic data.

o Absence of longitudinal weights and imputations for missing data.
The cross-section interview files contain weights and also imputations for missing income and employment data that were constructed strictly on a cross-section basis and are not suitable for longitudinal studies.

o Absence of longitudinal editing. With the exception of editing age and sex in the construction of the unique identifiers, no longitudinal edits were performed on the demographic variables.

These characteristics of the ISDP survey make retrieval of the information for analysis cumbersome and expensive. This is particularly true for longitudinal applications of the data such as the study of turnover in the Food Stamp Program.

The difficulty in using the ISDP for research was compounded by the structure of the available data files. At the time this project was carried out, the most suitable input file was a concatenation of cross-section files from all five waves. The format for each cross-section was similar to the public working files currently available (NTIS, 1982) except that the family level had not been fully developed. The records from all five waves were grouped by PSUSERIAL and a level 1 record was created which recorded information common to each group such as rotation. In addition to inserting the level 1 record, the Bureau also merged the link index (constructed unique person identifier) and longitudinally edited values of age and sex to this file. However, the Bureau deleted from this file the results of the cross-sectional imputations for income and employment data. The rationale for this omission was the unsuitability of these imputations for longitudinal analysis, the purpose of the concatenated file.

This file was extremely cumbersome to access due to the lack of a true hierarchical structure, the large number of different record types (data from each topical module were recorded on a separate record with a distinct record length and layout), and the fact that some of the newly created person identifiers were erroneous.
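The staggered-interviewing problem listed in this section can be made concrete with a little arithmetic. The following Python sketch is illustrative only: the function names and the month numbering are hypothetical, and the schedule is an assumption (rotation group r is interviewed one month after group r-1, and each interview covers the three months before the interview month). It ignores the third rotation group's skipped Wave 4.

```python
def reference_months(wave, rotation, period=3):
    """Calendar-month indices covered by one interview.

    Assumed schedule (not taken from ISDP documentation): each interview
    covers the `period` months immediately preceding the interview month,
    and rotation groups are interviewed one month apart.
    """
    interview_month = period * wave + rotation
    return list(range(interview_month - period, interview_month))


def common_months(wave, rotations=(1, 2, 3)):
    """Calendar months covered by every rotation group within one wave."""
    covered = [set(reference_months(wave, r)) for r in rotations]
    return sorted(set.intersection(*covered))


def waves_needed(months, rotation, n_waves=5):
    """Waves whose reference period overlaps the requested calendar months."""
    wanted = set(months)
    return [w for w in range(1, n_waves + 1)
            if wanted & set(reference_months(w, rotation))]


# Under the assumed schedule, only one calendar month per wave is shared
# by all three rotation groups.
shared = common_months(1)

# A common calendar quarter needs one interview for the first rotation
# group but two interviews for the second.
first_group = waves_needed([4, 5, 6], rotation=1)
second_group = waves_needed([4, 5, 6], rotation=2)
```

Under these assumptions a fixed calendar quarter lines up with a single wave for only one rotation group; for the others the analyst has to stitch together months from two adjacent interviews, which is the access burden the text describes.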
Overview of the ISDP/RAMIS II System

The objective of this data base development effort, as noted above, was to take the information available on the series of cross-section files described above and array it in a manner that would facilitate longitudinal as well as cross-sectional analysis. The results of this effort were two RAMIS II data bases, one called SIPPMASTER and one called MH for monthly households. SIPPMASTER is the main file in that all of the data collected during each wave are stored there. This file is used for all cross-section applications as well as longitudinal applications which do not involve the formation of longitudinal households or other groupings of individuals. The MH file is the data base designed to support the construction of longitudinal units. It essentially provides information on monthly household, family, and food stamp unit composition. The data in MH are arrayed to permit a user to develop a definition of longitudinality and apply that in the construction of a longitudinal unit file. Once the longitudinal unit itself is determined, the user can employ the data stored in SIPPMASTER to derive variables like total household monthly income which reflect the longitudinal unit characteristics.

The remainder of this section provides an overview of the contents of the ISDP/RAMIS II system. A detailed discussion of the motivation for choosing this file design and the procedures required to develop this data base is given in Doyle and Citro (1982).

SIPPMASTER. Figure 2 displays the logical organization of SIPPMASTER. It has a hierarchical structure with fifteen levels, five of which are real and ten of which are virtual. The five real levels are wave, household, family, person, and month. Some relevant comments on each of these levels follow:

Wave. (Level 1) Indicators for Waves 1 through 5 are contained in SIPPMASTER on level 1.
The data from Wave 6 are treated as supplemental and are therefore stored in the virtual level PM (level 7). SIPPMASTER is physically separated into 5 data bases, one for each wave. They are linked together with RAMIS II USE commands to logically form one data base.

Household. (Level 2) This reflects household composition at the time of the interview. The household identifier (HHID) uniquely identifies households within wave. It cannot be used to identify households longitudinally. Non-interview households in each wave have entries at this level; however, data for all other levels are zero. The contents of the household level consist of the data found in the household record in the cross-section files prepared by the Census Bureau.

Family. (Level 4) The family level simply identifies family units within households as they existed at the time of each interview. Primary individuals, secondary individuals, and outmovers are treated as one-person families.

Person. (Level 5) This contains interview-specific data for each individual and retrospective data that were not collected for specific calendar months, such as total weeks unemployed. The identifier for level 5 is the link index (called PERID in RAMIS II) so that each person sampled is identified in the same way across all waves. The data for the person level were derived from record type 5 of the cross-section files.
Some relevant points: outmovers in a given wave are included for that wave but have [illegible] in the weight fields; the weights are cross-sectional; all person identifiers with values exceeding 200000 should be deleted for longitudinal analysis but not for cross-sectional analysis; corrected age (CORAGE) should be used instead of edited age (AGEED), except that corrected age is [illegible] on Wave 2. Income recipiency flags on level 5 are not to be used to determine item nonresponse, as they were retained here for other reasons (for example, if the interest flag in Wave 1 is 1 on level 5 but there is no entry for that income type in the WU or MU associated files, then the person was reported to have had an interest-producing asset but did not actually receive interest income during the Wave 1 reference period).

Month. (Level 12) This represents the reference period for each wave. All months in the survey have been numbered longitudinally so that, for example, the 3 months pertaining to Wave 2 are 4, 5, and 6. Aside from identifying the longitudinal reference months, this level contains numerous fillers intended to support the construction of longitudinal household (or other aggregate unit) files.

The remaining data available through SIPPMASTER are stored in associated files which can be accessed directly if desired. A summary of the contents of each can be found in Doyle and Citro (1984).

MH. Figure 3 describes the logical organization of MH. It is a relatively simple hierarchical file with five real levels and one virtual level. This file reflects the outcome of a complicated procedure designed to determine monthly household and food stamp unit composition from the data collected in the 1979 ISDP Research Panel. Documentation on the methodology employed in the development of this file is included in Doyle and Citro (1984). The contents of this file are described below, followed by a section summarizing how it is used to develop longitudinal units.
Unlike SIPPMASTER, MH contains a limited number of variables. It is comprised mostly of pointers detailing who lived with whom during each month covered by the first five waves of the survey. The remaining variables provide descriptive characteristics, such as age and relationship to reference person, which are necessary to effectively determine longitudinal units. Each of the levels of MH is described below.

PSUSERIAL. (Level 1) This level contains the scrambled values of PSUSERIAL as well as the rotation group identifier. For the ISDP all persons who ever resided together have common values of PSUSERIAL, so this level was created to increase the efficiency of data retrieval and to minimize storage costs.

MONTH. (Level 2) This level simply identifies the month. Longitudinal reference months as described for SIPPMASTER were used. For rotation groups 1 and 2, the months range from 1 to 16 and for rotation group 3 they range from 1 to 13. Note that household composition can be described for one more month than is covered by the retrospective data collected in the ISDP. This extra time period is the month of the final interview.

Household. (Level 3) This level describes who lived with whom during each month and the Food Stamp Program participation and benefits of that group. The contents are the monthly household identifier and food stamp recipiency and amount variables for up to two food stamp units.

Family. (Level 4) This is an indication of family groupings within monthly households. The contents are family identifier, family type, and family kind.

Person. (Level 5) This level contains an entry for each person for every month he or she was present in the sample. The key to this level is PERID, the same identifier used in SIPPMASTER. The other variables stored in this level are age, relationship to reference person, marital status, food stamp unit membership, and variables necessary to link to SIPPMASTER.

PP. (Level 6) This is a virtual level in MH.
The associated file is called PD and it is the same PD file accessed through level 6 of SIPPMASTER. It contains presence-in-sample indicators as well as constant demographic data such as sex.

The intended use of the MH data base is to determine longitudinal units. In developing the ISDP/RAMIS II system, one objective was to allow researchers flexibility in the development of the definition of what constitutes the same unit when viewed over time. For some applications it may be appropriate to define a unit as being the same from one month to the next if all adults remain the same. For another application it may be sufficient to only require that the reference person (household head) be the same. More complicated definitions may be required in other situations. An example might be that units are the same if the composition changes from one month to the next are restricted to birth of a child, loss of a peripheral adult (e.g., an older daughter leaves for college), or a death of one spouse in a husband-wife primary family.

Each of these three definitions can be specified with the ISDP/RAMIS II system, as can many others. The procedure is as follows. Using the preferred definition, an algorithm for uniquely identifying each unit each month is developed. In the second example above, this would simply involve assigning the PERID of the reference person to the monthly unit as the identifier. Next, a comparison across months within PSUSERIAL groups is made. All monthly units with common values of the newly created identifier constitute one longitudinal unit. Finally, an extract is created which records the available information organized by the longitudinal unit identifier.

The available data from MH are primarily demographic, the exception being Food Stamp Program characteristics. The user will of course also desire economic data to support the analysis of the longitudinal units. This can be achieved through the extraction of data from SIPPMASTER.
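The three-step procedure above (assign an identifier to each monthly unit, compare across months within PSUSERIAL groups, extract by longitudinal unit) can be sketched in a few lines of Python. This is a minimal illustration and not the RAMIS II implementation: the record layout, the function names, and the age cutoff used to define an adult are all hypothetical.

```python
from collections import defaultdict


def reference_person_id(members):
    """Second definition in the text: a unit is identified by the PERID
    of its reference person."""
    return next(m["perid"] for m in members if m["is_reference"])


def all_adults_id(members, adult_age=18):
    """First definition in the text: a unit is identified by its set of
    adults. The age cutoff of 18 is an assumption made for illustration."""
    return frozenset(m["perid"] for m in members if m["age"] >= adult_age)


def build_longitudinal_units(monthly_units, unit_id=reference_person_id):
    """Group monthly units that share an identifier within a PSUSERIAL.

    Returns a mapping {(psuserial, identifier): sorted list of months}.
    """
    units = defaultdict(list)
    for unit in monthly_units:
        key = (unit["psuserial"], unit_id(unit["members"]))
        units[key].append(unit["month"])
    return {key: sorted(months) for key, months in units.items()}


# Hypothetical data: a two-adult household in month 1 that splits into
# two one-adult households in month 2.
monthly_units = [
    {"psuserial": "A1", "month": 1, "members": [
        {"perid": 101, "age": 40, "is_reference": True},
        {"perid": 102, "age": 38, "is_reference": False}]},
    {"psuserial": "A1", "month": 2, "members": [
        {"perid": 101, "age": 40, "is_reference": True}]},
    {"psuserial": "A1", "month": 2, "members": [
        {"perid": 102, "age": 38, "is_reference": True}]},
]

by_reference = build_longitudinal_units(monthly_units)
by_adults = build_longitudinal_units(monthly_units, all_adults_id)
```

With the reference-person rule the original household continues through month 2 and the departing adult starts a new unit; with the all-adults rule the split produces three distinct units. Swapping the `unit_id` function is the only change needed, which mirrors the flexibility the text attributes to the MH design.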
Conclusion

This paper describes a system to access data from a complicated longitudinal survey of households when the survey itself was in its development stages. It represents a successful attempt to apply modern DBMS technology to solve access problems posed by complex social science data collection efforts. Some of its features are:

o Procedural language interface to allow the use of FORTRAN or PL/1 to conduct complex applications
o SAS interface to permit more sophisticated statistical analysis.

The system is, of course, not without drawbacks. For example, the hierarchy imposed in the primary file, SIPPMASTER, is cumbersome and, with recent developments in relational data base technology, unnecessary. This structure could easily be simplified today. Furthermore, the system is on-line and therefore requires large amounts of disk storage. As the cost of mass storage goes down with improved hardware now being developed, this will become less of a problem.

In spite of these imperfections the ISDP/RAMIS II system works. It represents the first truly integrated ISDP data base available to the public for research. With this system users can and indeed have carried out analyses that truly exploit the longitudinal nature of the data.

FOOTNOTES

On the publicly available data bases, PSUSERIAL is a nine-character field which uniquely identifies all households in Wave 1. Together with person number it was originally intended to uniquely identify persons followed in the panel.

A virtual level is a level for which the data are not physically stored in the file. Instead there is an internal record of the location of another file which contains the information. With a DBMS, this other (or associated) file is accessed automatically when data from it are requested.

REFERENCES

Carr, Timothy; Doyle, Pat; and Lubitz, Irene. "An Analysis of Turnover in the Food Stamp Program." Draft Report. Washington, D.C.: Mathematica Policy Research, 1983.
Doyle, Pat and Citro, Constance F. "The ISDP/RAMIS II System and Its Development." Draft Report. Washington, D.C.: Mathematica Policy Research, 1984.

Doyle, Pat and Citro, Constance F. "Proposed Design Strategy for Storing and Accessing Data from the 1979 Income Survey Development Program Research Panel." Draft Report. Washington, D.C.: Mathematica Policy Research, 1982.

National Technical Information Service. Income Survey Development Program: 1979 Research Panel Documentation. Springfield, VA: U.S. Department of Commerce, 1982.

FIGURE 2. FILE STRUCTURE FOR SIPPMASTER
[Diagram not recoverable. Legible labels: host file SIPPMASTER with associated files. HOUSEHOLD level: H.E. (household; wave; eligibility). PERSON level: PD (person, constant), PM (person, miscellaneous; topical modules), WS (person; wave; wage and salary job data), WB (person; wave; business data), WF (person; wave; farm data), WU (person; wave; unearned income data). MONTH level: MWS (person; month; wage and salary job data), MSE (person; month; business and farm data), and a third segment (person; month; unearned [...]). An associated file is keyed on PSUSERIAL and MONTH, with household and food [...], family, and person (PD) levels.]

[...] Research Center

The papers of Jean and McArthur and of Doyle and Citro focus on [...] of data collection [...] the Survey of Income and Program Participation (SIPP) and its elaborate pretest, the 1979 Income Survey Development Program (ISDP). Both must address issues arising from the basic design of longitudinal surveys of individuals and households, and it is worth beginning with a brief review of the sampling theory behind the SIPP design. Since this design is so similar to the one used in the longitudinal study with which I am most familiar, the Panel Study of Income Dynamics (PSID), I will draw heavily from the 17-year history of that study.

Many cross-sectional surveys obtain their samples of individuals and households by sampling dwellings [...] establishing contact with [...]. Longitudinal surveys can do this as well, as evidenced by the procedures of both the SIPP and PSID. Representative samples of dwellings provide representative samples of subunits within those dwellings: households, families, Food Stamp recipiency units, AFDC recipiency units and individuals. The selection probabilities of each of these subunits are identical to the selection probabilities of the dwelling itself. With a properly specified set of rules regarding the definition of units and the tracking of those units over time, a longitudinal study such as the SIPP or the PSID can maintain a representative sample of each of the various subunits over time.

This requires that newly formed subunits of interest (families, AFDC recipiency units, etc.) enter into the sample with known selection probabilities in order to reflect corresponding changes that are taking place in the population at large. [...] requires that individuals be classified as either "sample" or "nonsample" and that explicit rules be followed consistently in the event of dramatic changes in the composition of units. In the SIPP, as in the PSID, for example, individuals who join the sample through marriage are followed only as long as they [...] household containing [...]. Once they regain their independence from all sample members, they are no longer followed.

[...] sampling considerations require that the study [...] require [...] good systems for (1) tracking sample individuals, regardless of where they go, (2) allowing individuals to join the sample to provide accurate information about the household in which sample individuals reside, (3) having a fool-proof system of identification numbers for all individuals, sample and nonsample, and (4) storing the data for the individuals and aggregations of [...] individuals so that an analyst can perform a variety of analyses [...] individuals in an efficient [...]. [...] outlined by Jean and McArthur are quite similar to the ones that have been used successfully by the PSID for 17 years. [...] about [...] the PSID.

1. Not all individuals who are institutionalized appear to be carried along as a part of SIPP households. [In the PSID,] individuals [who are institutionalized] and cannot be interviewed are associated with a sample family for as long as they [are] institutionalized. Of course they are not considered part of the family for most purposes, but the family provides [...] with the means of tracking the [individuals] if they leave the institution. It may be tempting to drop institutionalized individuals from the sample, but there are a substantial number of them, especially at younger adult ages. A strategy of dropping institutionalized individuals in a country with a compulsory, universal military service would result in all young people being dropped from the sample! Not keeping track of young children who move into institutionalized housing of various types or with relatives who are not sample members means that the SIPP will be unable to inform analysts about such children. (The PSID does not follow these young children either.) They may be too expensive to follow, but the decision of not following them should be based on an appreciation of the consequences.

2. Model-based statisticians may not appreciate the distinction between sample and nonsample individuals and will lament the fact that nonsample individuals are dropped by SIPP once they leave sample households. The PSID does not follow nonsample individuals either, but perhaps this is a mistake. Some methodological [...] conducted by Finis Welch and his colleagues on the PSID has detected no significant differences [...] behavioral models estimated for sample and nonsample individuals (Becketti, et al. 1983).

3. The nonresponse rules for the SIPP are not entirely clear from the Jean and McArthur paper, especially the rules regarding attempts to contact nonrespondents to waves subsequent to the first one. The PSID does not attempt to recontact these nonrespondents and I [...] PSID design. Evidence from the new youth cohorts of the NLS indicates that nonrespondents in one wave are often quite willing to respond to subsequent waves. One gets the impression that refusals or contact difficulties are often quite transitory in nature.

4. The Jean and McArthur paper mentions but does not emphasize the importance of obtaining the name, address and phone number of a contact person who might know the whereabouts of sample households if they move. More conventional means of following individuals, such as forwarding addresses, sometimes do not work precisely because the individuals do not wish to be followed easily. In our experience, the contact information is invaluable.

5. Telephone interviewing is mentioned as a possible way of preserving high response rates. The PSID experience suggests that this is indeed true and that data quality does not suffer unduly from switching interviewing modes. Indeed, a substantial number of recontact calls are made to PSID respondents to clean up unclear interview information. Telephones also provide a means of reaching not only geographically remote respondents but also respondents whose time schedules make telephone interviews much more likely to succeed.

Before turning to the subject of the Doyle and Citro paper I would like to make a comment on the interaction between data collection and data management. Too often we compartmentalize the two without realizing how intimately they are related. As illustrated in the Doyle and Citro paper, data analysts often discover apparent inconsistencies or outright errors and are in the worst position to make an informed judgement about the problems. Data collectors ought to anticipate problems of this sort and have significant resources allocated to solving them. Most of the problems must be resolved by returning to the original protocols, at least briefly, to understand the nature of the problem.
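The following rule reviewed earlier in this discussion (sample members are always followed; persons who join, for example through marriage, are followed only while they live with a sample member) can be written down compactly. The Python sketch below is illustrative only: `Person`, `is_followed`, and the household-roster layout are hypothetical names invented for this example and are not part of any SIPP or PSID processing system.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    person_id: str   # hypothetical identifier, unique across the panel
    is_sample: bool  # True for original sample members, False for persons who joined later


def is_followed(person: Person, household: list[Person]) -> bool:
    """Return True if `person` stays in the panel given their current household roster."""
    if person.is_sample:
        # Sample members are followed wherever they go.
        return True
    # Nonsample members are followed only while they live with a sample member.
    return any(
        other.is_sample
        for other in household
        if other.person_id != person.person_id
    )
```

Under these assumptions a spouse who joined through marriage is retained while sharing a household with the sample member and dropped once living apart from all sample members.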
What now of the data structure and methods proposed in the Doyle and Citro paper? Several basic questions come to mind.

1. The most basic question to be asked of any proposed data structure is "Is it feasible?" That the proposed structure has been used with success for several ISDP projects suggests an affirmative answer to this question.

2. The second question, more difficult to answer from the information contained in the paper, is "Is it efficient?" or, more properly stated, "Under what circumstances is it efficient?" Does one need a dedicated machine capable of grinding away throughout the night to select an abstract from the data set with this [...] in which CPU is priced at its marginal cost? I suspect that the proposed system is not very efficient in the latter type of computing environment, but I could not tell from the information contained in the paper.

3. Since most "computing" costs are the labor costs of the programmers and other analysts rather than the machine charges, the third question is "Is it easy to use?" Apparently once one has acquired a great deal of specific training about the proposed system, it is fairly straightforward. But outside analysts are encouraged to consider avoiding the data abstracting complications by delegating that work to those who are more familiar with the system.

The data structure that is proposed is modelled after the exceedingly complex file structure used by the Census Bureau. Surely there is a simpler method than an eight-level hierarchy for each wave, four files each with a fifteen-level hierarchy, and a completely separate six-level hierarchy that can be used to sort out different aggregations of individuals. The PSID files are more complicated in that they have more waves of data but are simpler in that they are in only one aggregation, the family. It [has] but [two levels]: the family history and the individual.
The term "family history" is chosen with care because a major insight, obvious now but not during the first twelve years of the study, is the set of data structure implications that stem from the fact that not all individuals in a given family in the most recent wave share the same "family history". In fact, we have about seven thousand current families but over nine thousand family histories. The first level of the hierarchy, then, is the family history; the second level consists of the individual histories of all of the individuals who share that same family history. One could also construct "household histories", "Food Stamp recipiency histories", etc. as additional hierarchical levels or as separate records in a networking data structure. These simpler hierarchies require that some of the information from the individual data record be aggregated into the family or household record, and this work is probably best done at the Census Bureau rather than by outside analysts working with the information they have at their disposal.

A final comment concerns a limitation again attributable to the way in which the Census Bureau processes its data rather than to organizations such as Mathematica Policy Research that attempt to make sense out of it. Implicit in the file structure is the assumed need to aggregate individuals into households or other sensible units, but not the possible need to relate individuals to one another. One could think of a file in which all sample individuals had data records that contained information on all individuals who had been or were about to be related to them in some way (by blood, marriage, adoption, or sharing the same dwelling). Information on the related individuals would include wave by wave (or, in the case of SIPP and ISDP, month by month) information on how the individuals were related and whether they shared the same dwelling, family, household, Food Stamp recipiency unit, etc.
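One way to picture the person-to-person file described above is as a set of pairwise records, one per pair of individuals per period. The Python sketch below is purely illustrative: the record layout, the field names, and the relationship codes are assumptions made for this example and do not correspond to any Census Bureau or Mathematica Policy Research file.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RelationRecord:
    person_id: str       # the sample individual (hypothetical identifier)
    other_id: str        # an individual related to them in some way
    period: int          # wave number (or month number for SIPP and ISDP)
    relationship: str    # hypothetical code, e.g. "spouse", "stepchild", "natural_child"
    same_dwelling: bool  # whether the two shared a dwelling in this period


def relatives_of(records, person_id, period, relationship=None):
    """List the ids related to `person_id` in `period`, optionally filtered by relationship code."""
    return sorted(
        r.other_id
        for r in records
        if r.person_id == person_id
        and r.period == period
        and (relationship is None or r.relationship == relationship)
    )


def stopped_sharing_dwelling(records, person_id, other_id):
    """Return the first period in which the pair no longer shares a dwelling
    after having shared one, or None if that never happens."""
    shared_before = False
    for r in sorted(records, key=lambda rec: rec.period):
        if r.person_id != person_id or r.other_id != other_id:
            continue
        if r.same_dwelling:
            shared_before = True
        elif shared_before:
            return r.period
    return None
```

With records of this kind, distinguishing stepchildren from natural children is a filter on the relationship code, and the timing of a separation is a scan over one pair's records. The cost is one record per related pair per period rather than one "relation to head" code per person.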
For most purposes this would be the most general file structure for SIPP, enabling analysts to distinguish stepchildren from natural children, ex-spouses, and other relatives so that one could analyze the economic consequences of divorce, etc. This would, of course, require a great deal more information than is currently provided in the Census Bureau's "relation to head" coding. But the added detail would enable the construction of a file structure that would be of greatest use.

References

Becketti, Sean; Gould, William; Lillard, Lee; and Welch, Finis. Attrition from the PSID. Santa Monica, CA: Unicon Research Corporation, November 1983.

The papers by J. C. Moore and D. Kasprzyk, A. M. Roman and D. V. O'Brien, and R. A. Kulka are important efforts to examine sources of error other than that due to sampling. I would hope that this concern with nonsampling sources of variance continues throughout the program of evaluation and research on the new SIPP, and that the results are reflected in reports coming out of the SIPP. There is ample reason to suspect that, in a survey as complex and difficult as the SIPP, error due to sampling will be swamped by error due to miscommunication between respondent and interviewer, respondent dissembling, respondent ignorance, etc.

The Moore and Kasprzyk paper tackles the problem that the ISDP (the pilot study for the SIPP) can be seen as having measured more change between waves than within waves. The authors argue persuasively that this cannot be reflective of true conditions, citing a number of reasons for thinking that not only did respondents make errors that resulted in the appearance of change between waves but also that post-interview processing may have contributed significantly to this phenomenon. Their analysis heavily depends on the correct matching of persons across waves.
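Why matching matters so much can be seen in a small simulation. The Python sketch below is not the simulation reported by Moore and Kasprzyk; it is a minimal illustration under assumed parameters (a binary monthly status, three months per wave, a rare true change rate, and a small share of wave 2 records linked to the wrong person), and every name and default value in it is hypothetical.

```python
import random


def simulate(n_persons=10_000, months_per_wave=3, p_start=0.2,
             true_change=0.005, mismatch_rate=0.05, seed=1):
    """Return (mean within-wave change rate, change rate at the seam between two waves)."""
    rng = random.Random(seed)
    n_months = 2 * months_per_wave

    # True monthly status histories (e.g. program recipiency) with rare true changes.
    histories = []
    for _ in range(n_persons):
        status = rng.random() < p_start
        months = []
        for _ in range(n_months):
            if rng.random() < true_change:
                status = not status
            months.append(status)
        histories.append(months)

    # Link wave 1 and wave 2 records; a small share of wave 2 records
    # are attached to a randomly chosen (usually different) person.
    linked = []
    for i in range(n_persons):
        j = rng.randrange(n_persons) if rng.random() < mismatch_rate else i
        linked.append(histories[i][:months_per_wave] + histories[j][months_per_wave:])

    # Share of persons whose status differs from the previous month, month by month.
    rates = [
        sum(h[m] != h[m - 1] for h in linked) / n_persons
        for m in range(1, n_months)
    ]
    seam = months_per_wave  # the transition into the first month of wave 2
    within = [rate for m, rate in enumerate(rates, start=1) if m != seam]
    return sum(within) / len(within), rates[seam - 1]
```

Calling `simulate()` and then `simulate(mismatch_rate=0.0)` contrasts the two cases: with no mismatching the seam behaves like any other month-to-month transition, while even a modest mismatch rate adds apparent change at the seam only, because within-wave months always come from a single person's record.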
An earlier analysis, by Graham Kalton and James Lepkowski, had depended on a file that contained many identifiers with erroneous codes, making matches very prone to error. Moore and Kasprzyk used instead a "definitive" data file produced by Mathematica, which linked data for the first five of the six waves. The authors assert that this "apparently" corrected the matching problems by correcting person identifiers. This is a curiously ambiguous way of describing what must be one of the central assumptions of the analysis. If it is not now possible for the Bureau of the Census to construct its own matched file, then I suggest that a second-best alternative is to document the matching process used by Mathematica and to obtain an independent validation of that process. This is particularly important given the research now beginning on the ISDP, with support from NSF and other agencies.

The authors do point out the intriguing possibility that the matched file processed by Kalton and Lepkowski contained a higher proportion of correct matches, although fewer matches in all, than does the Mathematica file. Reporting, on the basis of a computer simulation, that even a low rate of mismatching can produce the level of between-wave change observed, Moore and Kasprzyk raise the possibility that post-interview processing may have played a large role in the results obtained.

The paper points indirectly to one of the big methodological questions facing the SIPP: when data are not only collected longitudinally (as in the CPS and the SIPP), but are also to be analyzed longitudinally (as in the SIPP), it becomes necessary to examine and perhaps rethink established procedures: editing for consistency within records, imputation for item nonresponse, substitution of persons for missing responses, sample weighting across time, etc. Despite years of experience with the PSID and the NLS, for example, there are not yet widely accepted solutions to such problems.
With the SIPP demanding solutions it is imperative to undertake research now.

The Roman and O'Brien paper focuses on one experiment of the ISDP, the comparison of data obtained from college students living away from home with data obtained from proxies, usually their parents. The experiment was conducted during November and December, certainly not the best months to find students resident at school. Facts, fate, and perhaps a few gremlins took whacks at the sample size. Over one-quarter of the students who were identified were not interviewed, because their school was more than 50 miles distant. Not all parents gave permission for the interview of their students. Not all students were at home. Not all of the completed interviews could be matched with parent interviews. From a potential sample of 443 students identified as usually living away from home, the result is a sample of only 167 matched proxy-student records. One could argue that the failure to match data is the most fundamental form of discrepancy in parent-student comparisons, and in that event the true sample size is somewhat larger, but even with this increase in sample size it is difficult to generalize from the results, with so large a proportion of the sample lost.

The results seem intuitively right: better jobs come to the attention of parents more than low-paying ones; jobs with fewer hours also attract their attention less. So the proxy data are more like the data provided by the students for the big-ticket items. I wonder whether this result is more general: is the income of any lower-earning member of a household, whether a student, an aged relative, or a spouse, less well reported by the principal earner in the household than is the principal income? Is there a general tendency to underestimate or otherwise misreport the less crucial items in a family budget?
Kulka makes the important point that the 1979 Research Panel of the ISDP was not primarily designed as a substantive data collection instrument but instead as a flexible vehicle for a number of experiments in the technical and operational problems of an income survey. This means not only that the ISDP is a rich data resource for methodological research but also that substantive research must take account of a variety of design effects.

Because funding for ISDP research was terminated in 1982, just when the data sets were becoming available for research, a number of the experiments described by Kulka have been underanalyzed. As a consequence, Kulka's paper raises more questions than can be answered.

The ISDP results that offer some confidence in the data were based on the particular design adopted in the ISDP. As that design was not transplanted to the SIPP, we must not read Kulka's paper as indicative of the quality of data to be derived from the SIPP. The SIPP may be better, or it may not. The same sort of research agenda planned for the ISDP is needed for the SIPP, so that we can have the confidence in the SIPP data that we can now have in the ISDP.

CASE STUDIES IN PANEL SURVEY DESIGN: THE INTERNATIONAL EXPERIENCE

SESSION V

This section is comprised of four papers presented in this session, which was sponsored by the Section on Social Statistics.

THE SURVEY OF INCOME AND PROGRAM PARTICIPATION

Roger A. Herriot and Daniel Kasprzyk, Bureau of the Census

Introduction

In October 1983, the Bureau of the Census conducted the first interviews of the Survey of Income and Program Participation (SIPP). The SIPP is a nationally representative household survey intended to provide information on all sources of cash and noncash income, eligibility for and participation in various government transfer programs, disability, labor force status, assets and liabilities, pension coverage, taxes, and many other items.
Data from the survey will provide a multiyear perspective on changes in income and their relationship to participation in government programs, changes in household composition, and so forth.

The purpose of this paper is to review the need for a new survey and to describe briefly the research and development work leading up to the SIPP and the design features, procedures, and content of the survey. Data products and current survey and research activities will also be described.

The Need for a New Survey

The development of SIPP arose in response to a recognition that the principal source of information on the distribution of household and personal income in the United States, the March Income Supplement of the Current Population Survey (CPS), had limitations which could only be rectified by making substantial changes in the survey instrument and procedures. For example, the CPS:

1) does not measure monthly income flows and month-to-month changes;
2) provides annual income estimates, whereas eligibility for most Federal programs is based on a monthly accounting period;
3) produces estimates of last year's income based on current household membership;
4) does not measure asset holdings and liabilities and does not provide enough measures of categorical information to produce sound estimates of program eligibility; and
5) underestimates income from transfer programs, retirement and disability income, unemployment compensation, and property income.

Because of these limitations, the Income Survey Development Program (ISDP) began. The purpose of the ISDP, authorized in 1975, was to design and prepare for a major new survey, the Survey of Income and Program Participation (SIPP). The development effort was directed by the Office of the Assistant Secretary for Planning and Evaluation in the Department of Health and Human Services and was carried out jointly with the Bureau of the Census, which assisted in