get file 'original datafile' .
* Unspecified workers (hiscodes 99900 and 99920) cannot be assigned to a specific HISCLASS, because
* it is unknown whether they are farm workers (62105) or unskilled laborers (99910). It is therefore
* estimated which of these two is more likely, using information on the place of residence. If there
* is no information on place of residence, then the place of marriage (of groom and bride) is used instead.
* Method:
* - If there are at least 50 observations in a certain place of residence (groom, groom's father and bride's father
* combined) and the percentage working in the primary sector (hisco major group 6 as a percentage of all
* valid hiscodes) is lower than 30%, all unspecified workers in this place become 99910.
* - If there are at least 50 observations in a certain place of residence and the percentage working in the
* primary sector is at least 30%, all unspecified workers in this place become 62105.
* - All places with fewer than 50 observations are combined; if there are at least 50 observations together
* and if the percentage working in the primary sector is lower than 30%, all unspecified workers in these
* places become 99910;
* - and if there are at least 50 observations together and if the percentage working in the primary sector is
* at least 30%, all unspecified workers in these places become 62105;
* - and if there are fewer than 50 observations together, all unspecified workers in these places become 62105.
* - All observations with missing place of residence are combined; if there are at least 50 observations
* together and if the percentage working in the primary sector is lower than 30%, all unspecified workers
* among these observations become 99910;
* - and if there are at least 50 observations together and if the percentage working in the primary sector is
* at least 30%, all unspecified workers among these observations become 62105;
* - and if there are fewer than 50 observations together, all unspecified workers among these observations become 62105.
* Variable names in this job:
* ghisco = hisco of groom.
* fghisco = hisco of groom's father.
* fbhisco = hisco of bride's father.
* gplace = place of residence groom.
* fgplace = place of residence groom's father.
* fbplace = place of residence bride's father.
* plmar = place of marriage.
* Flag unspecified workers (1 = hiscode 99900 or 99920, 0 = otherwise).
compute gworker = 0.
if (ghisco eq 99900 or ghisco eq 99920) gworker = 1.
compute fgworker = 0.
if (fghisco eq 99900 or fghisco eq 99920) fgworker = 1.
compute fbworker = 0.
if (fbhisco eq 99900 or fbhisco eq 99920) fbworker = 1.
freq gworker fgworker fbworker.
* N.B. In this dataset xx% of the grooms, xx% of the grooms' fathers and xx% of the brides' fathers
* are unspecified workers (xx cases in total).
* Missing values on place of residence are substituted by place of marriage.
if missing(gplace) gplace = plmar.
if missing(fgplace) fgplace = plmar.
if missing(fbplace) fbplace = plmar.
* Note here whether any places of residence are still missing after this substitution.
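* Optional check, added as a sketch and not part of the original job: count, per case, how many
* of the three place variables are still missing after the substitution. The variable name
* nplmiss is hypothetical, and the MISSING keyword assumes numeric place codes.
count nplmiss = gplace fgplace fbplace (missing).
freq nplmiss.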
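* Derive the HISCO major group (first digit of the five-digit hiscode); unspecified workers are set to missing.
compute gsector = trunc(ghisco/10000).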
if (ghisco eq 99900 or ghisco eq 99920) gsector = -1.
compute fgsector = trunc(fghisco/10000).
if (fghisco eq 99900 or fghisco eq 99920) fgsector = -1.
compute fbsector = trunc(fbhisco/10000).
if (fbhisco eq 99900 or fbhisco eq 99920) fbsector = -1.
missing values gsector fgsector fbsector (-1).
freq gsector fgsector fbsector.
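* Flag the primary sector (HISCO major group 6): 1 = primary sector, 0 = other sector, -1 = no valid sector.
compute gfarm = 0.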
if (gsector eq 6) gfarm = 1.
if missing(gsector) gfarm = -1.
compute fgfarm = 0.
if (fgsector eq 6) fgfarm = 1.
if missing(fgsector) fgfarm = -1.
compute fbfarm = 0.
if (fbsector eq 6) fbfarm = 1.
if missing(fbsector) fbfarm = -1.
missing values gfarm fgfarm fbfarm (-1).
freq gfarm fgfarm fbfarm.
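* Aggregate per place of residence of the groom: number of observations (gaantal; aantal is Dutch for
* number), number without a valid sector (gmissfarm) and percentage in the primary sector (gfarm).
* The same is then done for the groom's father (fg) and the bride's father (fb).
sort cases by gplace.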
compute place = gplace.
aggregate outfile = 'xxxg.sav'
/ break = place / gaantal = n / gmissfarm = nmiss(gfarm) / gfarm = pin(gfarm,1,1).
sort cases by fgplace.
compute place = fgplace.
aggregate outfile = 'xxxfg.sav'
/ break = place / fgaantal = n / fgmissfarm = nmiss(fgfarm) / fgfarm = pin(fgfarm,1,1).
sort cases by fbplace.
compute place = fbplace.
aggregate outfile = 'xxxfb.sav'
/ break = place / fbaantal = n / fbmissfarm = nmiss(fbfarm) / fbfarm = pin(fbfarm,1,1).
* Combine the three aggregated files into one record per place.
match files file = 'xxxg.sav'
/ file = 'xxxfg.sav'
/ file = 'xxxfb.sav'
/ by place.
if missing (gaantal) gaantal = 0.
if missing (fgaantal) fgaantal = 0.
if missing (fbaantal) fbaantal = 0.
compute aantal = gaantal + fgaantal + fbaantal.
freq aantal.
* In xx% of all places there are at least 50 observations.
if missing (gfarm) gfarm = 0.
if missing (fgfarm) fgfarm = 0.
if missing (fbfarm) fbfarm = 0.
if missing (gmissfarm) gmissfarm = 0.
if missing (fgmissfarm) fgmissfarm = 0.
if missing (fbmissfarm) fbmissfarm = 0.
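* Calculation of the proportion (0-1) working in the primary sector. The aggregated gfarm, fgfarm and
* fbfarm are percentages of the valid cases, so each term below is a count of primary-sector workers.
* N.B. The denominator aantal counts all observations, including those without a valid hiscode.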
compute rural = (gfarm/100*(gaantal-gmissfarm) + fgfarm/100*(fgaantal-fgmissfarm)
+ fbfarm/100*(fbaantal-fbmissfarm) ) / aantal.
freq rural.
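* Worked example with hypothetical numbers (an illustration only, not taken from any dataset):
* a place has gaantal = 30, gmissfarm = 5, gfarm = 40; fgaantal = 20, fgmissfarm = 0, fgfarm = 50;
* fbaantal = 10, fbmissfarm = 0, fbfarm = 20. The primary-sector counts are 0.40*25 = 10,
* 0.50*20 = 10 and 0.20*10 = 2, so rural = 22/60 = 0.367. With aantal = 60 (at least 50) and
* rural at least 0.30, unspecified workers in this place would become 62105.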
* Calculation of the proportion in the primary sector in all places with fewer than
* 50 observations, combined and weighted by the number of valid observations (n) per place.
compute n = aantal - gmissfarm - fgmissfarm - fbmissfarm.
weight by n.
temp.
select if (aantal lt 50).
desc rural.
weight off.
* xx% (N=xx).
compute workcode = 0.
* Depends on the calculation above: replace xxxxx with 62105 or 99910.
if (aantal lt 50) workcode = xxxxx.
if (aantal ge 50 and rural ge 0.30) workcode = 62105.
if (aantal ge 50 and rural lt 0.30) workcode = 99910.
freq workcode.
* In xx% of all places unspecified workers become 99910.
save outfile = 'place.sav'
/ keep place workcode.
get file 'original datafile' .
* Missing values on place of residence are substituted by place of marriage.
if missing(gplace) gplace = plmar.
if missing(fgplace) fgplace = plmar.
if missing(fbplace) fbplace = plmar.
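* Attach to each person the workcode of his place of residence: first the groom, then the groom's
* father, then the bride's father.
sort cases by gplace.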
compute place = gplace.
match files file = * / table = 'place.sav'
/ by place.
rename var (workcode = gworkcode).
sort cases by fgplace.
compute place = fgplace.
match files file = * / table = 'place.sav'
/ by place.
rename var (workcode = fgworkcode).
sort cases by fbplace.
compute place = fbplace.
match files file = * / table = 'place.sav'
/ by place.
rename var (workcode = fbworkcode).
freq gworkcode fgworkcode fbworkcode.
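* Recode unspecified workers to the code chosen for their place; all other hiscodes are kept unchanged.
compute ghisco2 = ghisco.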
compute fghisco2 = fghisco.
compute fbhisco2 = fbhisco.
if ((ghisco eq 99900 or ghisco eq 99920) and gworkcode eq 62105) ghisco2 = 62105.
if ((ghisco eq 99900 or ghisco eq 99920) and gworkcode eq 99910) ghisco2 = 99910.
if ((fghisco eq 99900 or fghisco eq 99920) and fgworkcode eq 62105) fghisco2 = 62105.
if ((fghisco eq 99900 or fghisco eq 99920) and fgworkcode eq 99910) fghisco2 = 99910.
if ((fbhisco eq 99900 or fbhisco eq 99920) and fbworkcode eq 62105) fbhisco2 = 62105.
if ((fbhisco eq 99900 or fbhisco eq 99920) and fbworkcode eq 99910) fbhisco2 = 99910.
temp.
select if ghisco eq 99900 or ghisco eq 99920.
freq ghisco2.
temp.
select if fghisco eq 99900 or fghisco eq 99920.
freq fghisco2.
temp.
select if fbhisco eq 99900 or fbhisco eq 99920.
freq fbhisco2.
* Of the xx unspecified workers among the grooms xx become 62105 and xx become 99910.
* Of the xx unspecified workers among the groom's fathers xx become 62105 and xx become 99910.
* Of the xx unspecified workers among the bride's fathers xx become 62105 and xx become 99910.
save outfile = 'xxx'
/ drop gworkcode fgworkcode fbworkcode gplace fgplace fbplace place.