Part II – Introduction to SILC Data Structure and Documentation
DwB Training Course on EU-SILC Longitudinal data
Paris, 19-21 February 2014
Heike Wirth

Aims of this session
• Introduce the rotational design
• Explain the concept of the selected respondent
• Explain the organisation of the data
• Point out some reading: Documents of priority

Illustration of the rotational design

Rotational design – Illustration
[Figure: initial sample, 2006]

Rotational design – Illustration cross-sectional
[Figure: cross-sectional data, 2006]

Rotational design – Illustration longitudinal
[Figure: 2006; e.g. longitudinal data 2011]

Rotational design – empirical
Not equivalent to the number of years of participation

Rotational design – empirical
[Tables: tab DB075 HHYNR; tab HHYNR YEAR; tab HHYCOUNT HHYNR]
• HHYNR (= number of household-year) is not included in the data, must be created
• HHYCOUNT (= count of household-years) is not included in the data, must be created
Source: UDB_l11D_ver 2011-1 from 01-08-2013.dta; own calculations

Observation units: concept of the selected respondent

Selected respondent

Type of information: Social exclusion, housing, childcare …
  Observation unit: Household (HH)
  Collection unit/data source, survey countries: HH-Respondent
  Collection unit/data source, register countries: Registers/HH-R

Type of information: Basic demographic personal data
  Observation unit: All HH-members
  Collection unit/data source, survey countries: HH-Respondent
  Collection unit/data source, register countries: Registers/HH-R

Type of information: Basic personal data on education, labour information, income …
  Observation unit: All HH-members aged 16+
  Collection unit/data source, survey countries: All HH-members aged 16+
  Collection unit/data source, register countries: Registers/HH-R

Type of information: Detailed personal data on health, access to health care, labour market activity …
  Observation unit: All HH-members aged 16+ or Selected respondent
  Collection unit/data source, survey countries: All HH-members aged 16+
  Collection unit/data source, register countries: Selected respondent (one person 16+ per household)

Example: PH030 – Limitation in activities because of health problems
[Figure annotation: (register countries) (mainly) not selected respondents (see PH030_F)]
Source: UDB_l11P_ver 2011-1 from 01-08-2013.dta

Organisation of the data

EU-SILC consists of 4 separate files for the cross-sectional data … and of 4 separate data files for the longitudinal data:
• Household Register file
• Household Data file
• Personal Register file
• Personal Data file

Household files – longitudinal

Household Register (D-File)
• Includes every selected household (also those where the address could not be contacted or which could not be interviewed)
• > 19 variables: household identifier, sampling design information, region
• UDB_l11D_ver 2011-1 from 01-08-2013: N = 542,942 households

Household Data (H-File)
• Only households which have been contacted and completed a hh interview and in which at least one hh member has complete data in the personal data file
• > 180 variables (incl. flag variables & imputation factors): basic data, social exclusion, income, housing
• UDB_l11H_ver 2011-1 from 01-08-2013: N = 411,189 households

Personal files – longitudinal

Personal Register (R-File)
• Every person currently living in the hh or temporarily absent. Longitudinal file: also persons registered in the R-File of the previous year or living at least 3 months in the hh during the income reference period.
• > 50 variables (incl. flag variables): basic information, e.g. relationship between household members
• UDB_l11R_ver 2011-1 from 01-08-2013: N = 1,079,261 persons

Personal Data (P-File)
• Only reference population (persons aged 16 and over) and only persons for whom the information could be completed by interview (personal/proxy) and/or register
• > 190 variables (incl. flag variables & imputation factors): e.g. demographic, income, work and unemployment
• UDB_l11P_ver 2011-1 from 01-08-2013: N = 879,720 persons

Depending on the research question: use of separate datasets … or a combination of different datasets
(Household Register, Personal Register, Household Data, Personal Data)

Organisation of the data
Within both the cross-sectional (c-s) and the longitudinal data, all 4 files are linkable among each other; c-s and longitudinal data are not linkable to each other.
… likewise, cross-sectional data are not linkable over time, from t to t+1 (HH-ID and related identification variables are randomized).

Organisation of the data … combine different datasets – Key Variables
• In order to link (combine) the four files D, H, R and P among each other, all observations must have a unique link to the respective three other files
• This link is achieved by the following 4 key variables:
  (1) Year of Survey
  (2) Country
  (3) Household ID
  (4) Personal ID
[Figure: the four files and their linking keys. Links at household level use Year of Survey, Country and Household ID; the link at person level uses Year of Survey, Country, Household ID and Personal ID.]

Organisation of the data: Household ID – Personal ID
• Household ID
  • Cross-sectional (max. 6 digits) = hh number 1-999999
  • Longitudinal (max. 8 digits) = hh number 1-999999 + split number; default split number = 00
• Personal ID
  • Cross-sectional = hh-id + personal number (max. 2 digits)
  • Longitudinal = hh number + default split number (00) + personal number
In the longitudinal survey the Personal ID never changes, even if the person moves to a different household.
In the cross-sectional survey, the Household ID and Personal ID may change from year to year.

The 4 key variables – illustration (longitudinal data)

year  country  hh_id     pers_id     year of birth
2010  A        40017100  4001710001  1937
2010  A        40017100  4001710002  1939
2011  A        40017100  4001710001  1937
2011  A        40017100  4001710002  1939
2009  B        40017100  4001710001  1953
2009  B        40017100  4001710002  1956
2009  B        40017100  4001710003  1982
2009  B        40017100  4001710004  1984
2009  B        40017100  4001710005  1985
2010  B        40017100  4001710001  1953
2010  B        40017100  4001710002  1956
2010  B        40017100  4001710003  1982
2010  B        40017100  4001710004  1984
2010  B        40017100  4001710005  1985
2010  B        40017101  4001710003  1982
2010  B        40017101  4001710004  1984
2011  B        40017100  4001710001  1953
2011  B        40017100  4001710002  1956
2011  B        40017100  4001710005  1985
2011  B        40017101  4001710002  1956
2011  B        40017101  4001710003  1982
2011  B        40017101  4001710004  1984

Combining information from two separate files at a 1:1 level
[Figure: two separate files; combined data]

Combining information from two separate files at a 1:n level
[Figure: two separate files; combined data]

Use of separate sub datasets
Create household level variables from personal level data, e.g.
• number of current household members
• persons < 18 in household
• age of the youngest child in household
• number of unemployed hh-members
• highest educational level in household
• …

Create new household level summary variables from person level information, e.g. household size, number of children, age of youngest child (< 18 years)

                                       new hh-level variables     added from hh-data
year  country  hh_id  pers_id  RX010   hhsize  numchild  ychild   HX080
2010  a        6800   680001   36      3       1         17       0
2010  a        6800   680002   35      3       1         17       0
2010  a        6800   680003   17      3       1         17       0
2011  a        6800   680001   36      3       0         .        0
2011  a        6800   680002   36      3       0         .        0
2011  a        6800   680003   18      3       0         .        0
2011  b        6800   680001   69      2       0         .        0
2011  b        6800   680002   73      2       0         .        0
2010  b        7000   700001   80      2       0         .        0
2010  b        7000   700002   80      2       0         .        0
2008  c        7000   700001   42      3       1         2        1
2008  c        7000   700002   34      3       1         2        0
2008  c        7000   700003   2       3       1         2        0
2009  c        7000   700001   43      3       1         3        0
2009  c        7000   700002   35      3       1         3        0
2009  c        7000   700003   3       3       1         3        0
2010  c        7000   700001   44      3       1         4        1
2010  c        7000   700002   36      3       1         4        1
2010  c        7000   700003   4       3       1         4        1
2011  c        7000   700001   45      4       2         0        1
2011  c        7000   700002   37      4       2         0        1
2011  c        7000   700003   5       4       2         0        1
2011  c        7000   700004   0       4       2         0        1

Some reading – Documents of priority

Guidelines_Doc65_2011.pdf
• General technical information on sample design, weights, etc.
• List of all variables included in the original EU-SILC data base
• Description of (cross-sectional and longitudinal) variables

DIFFERENCES BETWEEN DATA COLLECTED AND UDB.doc
• List of variables removed or added to Userdata Base (UDB)
• Methods of anonymisation

SILC L-2011 UDB PROBLEMS AND MODIFICATIONS.xls

National and EU Quality reports
• http://epp.eurostat.ec.europa.eu/portal/page/portal/income_social_inclusion_living_conditions/quality

Some reading – Documents of priority: Guidelines_Doc65_2011.pdf
[Screenshots from the guidelines; source: Guidelines_Doc65_2011.pdf]
• Flag variable HH020_F
• Flag variable HH021_F
• Example: variable included in the cross-sectional and longitudinal data
• Example: variable included in the cross-sectional only
• Example: variable included in longitudinal data only
• Example: selected respondent

Some reading – Documents of priority: HH020 and HH021 in the data
• Cross-sectional data 2011 (Source: UDB_c11H_ver 2011-2 from 01-08-13.dta)
• Longitudinal data 2011: old (HH020), new (HH021) (Source: UDB_l11H_ver 2011-1 from 01-08-2013.dta)

Some reading – Documents of priority: Differences between data collected and Userdata Base
• Cross-sectional file
• Longitudinal file
Source: L2011 DIFFERENCES BETWEEN DATA COLLECTED AND UDB.doc

Some reading – Documents of priority: SILC L-2011 UDB PROBLEMS AND MODIFICATIONS.xls
Source: SILC L-2011 UDB PROBLEMS AND MODIFICATIONS.xls

Some reading – Documents of priority: Quality reports
• National quality reports
• E.g. Austria: Final Quality Report Relating to the EU-SILC Operation 2007-2010
Source: Austria, Final Quality Report Relating to the EU-SILC Operation 2007-2010, p. 7

THANK YOU
[email protected]
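
Supplementary sketch: derived variables that must be created

The slides state that HHYNR and HHYCOUNT are not included in the data and must be created, and they show household-level summaries (hhsize, numchild, ychild) built from person-level rows. The following is a minimal Python sketch of both derivations, not the course's own code. It assumes that HHYNR is the running number of a household's observation years within a country and HHYCOUNT the total number of observed years, which is one reading of the slide labels. The function names and the tuple layout are hypothetical; the sample rows are copied from the slide table (countries a and c only).

```python
from collections import defaultdict

# Person-level rows (year, country, hh_id, age), copied from the slide example.
PERSONS = [
    (2010, "a", 6800, 36), (2010, "a", 6800, 35), (2010, "a", 6800, 17),
    (2011, "a", 6800, 36), (2011, "a", 6800, 36), (2011, "a", 6800, 18),
    (2008, "c", 7000, 42), (2008, "c", 7000, 34), (2008, "c", 7000, 2),
    (2009, "c", 7000, 43), (2009, "c", 7000, 35), (2009, "c", 7000, 3),
    (2010, "c", 7000, 44), (2010, "c", 7000, 36), (2010, "c", 7000, 4),
    (2011, "c", 7000, 45), (2011, "c", 7000, 37), (2011, "c", 7000, 5),
    (2011, "c", 7000, 0),
]


def household_summaries(persons, child_age=18):
    """Return {(year, country, hh_id): {"hhsize", "numchild", "ychild"}}.

    Rows are grouped by the three household-level key variables
    (year of survey, country, household ID).
    """
    ages = defaultdict(list)
    for year, country, hh_id, age in persons:
        ages[(year, country, hh_id)].append(age)

    summaries = {}
    for key, hh_ages in ages.items():
        children = [a for a in hh_ages if a < child_age]
        summaries[key] = {
            "hhsize": len(hh_ages),
            "numchild": len(children),
            # None plays the role of the missing value "." in the slide table.
            "ychild": min(children) if children else None,
        }
    return summaries


def household_year_counters(household_years):
    """Return {(year, country, hh_id): (HHYNR, HHYCOUNT)}.

    Assumption: HHYNR numbers a household's observed years in
    chronological order; HHYCOUNT is how many years it is observed.
    """
    years = defaultdict(set)
    for year, country, hh_id in household_years:
        years[(country, hh_id)].add(year)

    counters = {}
    for (country, hh_id), observed in years.items():
        ordered = sorted(observed)
        for nr, year in enumerate(ordered, start=1):
            counters[(year, country, hh_id)] = (nr, len(ordered))
    return counters


summaries = household_summaries(PERSONS)
# The keys of `summaries` are exactly the household-years in the sample.
counters = household_year_counters(summaries.keys())

print(summaries[(2011, "c", 7000)])
print(counters[(2011, "c", 7000)])
```

With real UDB files the same grouping would be applied to the household register (one row per household-year) rather than to keys derived from person rows; that choice here is only a convenience for a self-contained example.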