Case-control study

The dataset “data_assessment2_ccstudy.dta” provides data from 560 patients admitted to hospital (in a region with malaria) who are part of a hypothetical nested case-control study. Of these, 140 patients died within 1 year of hospital admission (cases) and 420 are controls. The cases and controls were selected from a larger cohort study of 13,000 patients, of whom 140 died within 1 year of follow-up.

Sex and age were routinely recorded in the hospital admission records. For the case-control study, further information (haemoglobin level and malaria infection status on admission) was extracted from laboratory data records.

The variables in this dataset are:

Variable name   Description
id              Unique identifier
dead            Died within 1 year of hospital admission (0 = control, 1 = case)
age             Age at baseline (years)
haemoglobin     Haemoglobin level at baseline (g/dL)
malaria         Malaria at baseline (0 = no malaria, 1 = malaria)
male            Sex of patient (0 = female, 1 = male)

We will use multivariable logistic regression to investigate the evidence for an association between haemoglobin and death, controlling for the possibility that this association is confounded by the other exposure variables in the dataset.

1. (Description of study sample; 16 marks)

a) Present histograms of age and haemoglobin, and describe the distribution of each variable in terms of approximate normality and appropriate measures of centrality and spread. (6 marks)

b) Provide a table that summarises the distribution of age, sex, haemoglobin, and malaria, with separate columns for those who died (cases) and the controls (remember that this is a case-control study). (5 marks)

c) Using the number of patients who died during the 1-year follow-up period of the cohort study, estimate the odds of death for a patient in the cohort study. (2 marks)

d) Calculate the estimated odds of death in the case-control study. Why is this estimate not equal to the odds of death in the cohort study (calculated in 1c)? (3 marks)
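The arithmetic behind parts c) and d) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a model answer: it assumes that all 140 cohort deaths were taken as cases and that the 420 controls were sampled from the 12,860 cohort members who survived 1 year. The names `odds`, `case_fraction` and `control_fraction` are illustrative only. Exact fractions are used so that the relationship between the two odds can be checked without rounding error.

```python
from fractions import Fraction

# Figures stated in the study description.
COHORT_SIZE = 13_000
COHORT_DEATHS = 140
CC_CASES = 140
CC_CONTROLS = 420


def odds(events: int, non_events: int) -> Fraction:
    """Odds = number with the outcome / number without the outcome."""
    return Fraction(events, non_events)


# Part c): odds of death in the full cohort (deaths vs. 1-year survivors).
cohort_survivors = COHORT_SIZE - COHORT_DEATHS  # 12,860
cohort_odds = odds(COHORT_DEATHS, cohort_survivors)

# Part d): odds of death among the 560 patients in the case-control sample.
cc_odds = odds(CC_CASES, CC_CONTROLS)

# ASSUMPTION: every cohort death became a case, and controls were drawn
# only from the cohort survivors.
case_fraction = Fraction(CC_CASES, COHORT_DEATHS)             # = 1
control_fraction = Fraction(CC_CONTROLS, cohort_survivors)    # = 420/12860

# The case-control odds equal the cohort odds scaled by the ratio of the
# two sampling fractions, which is fixed by the study design.
assert cc_odds == cohort_odds * case_fraction / control_fraction

print(f"Cohort odds of death:       {float(cohort_odds):.4f}")
print(f"Case-control odds of death: {float(cc_odds):.4f}")
```

Run as a script, it prints about 0.0109 for the cohort and 0.3333 for the case-control sample. The assertion shows why the two differ: the case-control odds are driven by the chosen ratio of controls to cases (here 3:1), so they reflect the sampling scheme rather than the frequency of death in the source cohort.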