MAST90102 Assessment 5

Guidelines for the length and structure of responses are given in the questions. Please ensure your submission is as polished and professionally presented as possible; please see journal articles as a guide.

Part 1 (28 marks)

The dataset chol_riskf.dta provides data from 239 unrelated individuals from the Victorian Family Heart Study. Total cholesterol (totchol) is the outcome measure.

The variables in this dataset are:

Variable name   Description
proxyid         Unique identifier
male            0 = Female, 1 = Male
age             Age (years)
hgt             Height (cm)
wgt             Weight (kg)
bmi             Body mass index (kg/m²)
smoke           Smoking status (0 = non-smoker, 1 = ex-smoker,
                2 = current smoker (≤ 20 cigarettes per day),
                3 = current smoker (> 20 cigarettes per day))
totchol         Total cholesterol level (mmol/litre)

If you perform some quick exploratory analysis, you will see that the total cholesterol variable is reasonably well behaved: please analyse this variable on its original scale. (Do not spend time in your responses considering whether or not transformation of the outcome is necessary.)

You are asked to consider the following exposure measures, which are both of potential interest because they relate to total cholesterol:

wgt             Weight (kg)
bmi             Body mass index (kg/m²)

The other variables (age, male sex, and smoking status) are all to be considered as potential covariates in multiple regression analysis, as described in the questions below.

[Note that the usual caveats apply: these data have been sampled and modified from an original study and no substantive conclusions should be drawn from these analyses.]

The overall aim of your analysis is to examine the evidence for an association between total cholesterol and the two exposure measures using regression methods, following the outline below.

For parts 1a and 1b, written explanations and interpretations, tables and graphs should be provided. Computer output or code should be provided in an Appendix.

Perform the following steps (marks indicated):

(1a) [6 marks – 1-page limit]
Use a multiple linear regression model to obtain estimates of association between total cholesterol and the two exposure measures simultaneously. (For part (a), ignore all of the other covariates.) Would you recommend omitting one of the exposure measures from the model? Is there a collinearity problem?

(1b) [10 marks – 2-page limit]
Your collaborator is concerned about potential confounding effects and effect modification. They have stated in a Statistical Analysis Plan (SAP) that (i) age, sex and smoking status are confounders, and that (ii) a further analysis will investigate whether sex modifies the association between the exposures of interest and total cholesterol.

Perform the analyses for (i) and (ii) above and provide appropriate tables, figures and text that interpret the findings.

For the potential confounding effects of sex, age and smoking status, comment on whether adjustment for these variables affects the associations found in part (a). Can you explain why, for the major effects that you observe? (This should be in general statistical terms, without needing to be an expert in the subject matter.)

[N.B. For all of these analyses, including part (a), you should investigate whether the association between continuous covariates and the outcome is linear. Note that you do not need to go into extensive detail with respect to investigating particular influential points, unless you identify major issues that would affect the overall interpretations.]

(1c) [6 marks – 200-word limit]
Provide a statistical analysis paragraph (as commonly given in the methods sections of medical research articles; see the British Medical Journal (BMJ; www.bmj.com/theBMJ) for examples) that describes your analysis.

(1d) [6 marks – 200-word limit]
Conclude with a general summary that describes the findings for the associations between the two exposures (body weight and body mass index) and the outcome, total cholesterol. This should take the form of a single paragraph that summarises the main results and attempts to interpret them.

Part 2 (14 marks)

A sexual health researcher has asked you for some statistical help in interpreting the results of their study. In this study, the researcher randomised 100 people into 4 different education interventions, and measured their knowledge of sexually transmitted infections (STIs) one month later. The knowledge score is measured on a scale from 0 to 25, and the education groups are as follows:

Group A: Control group
Group B: A one-on-one discussion with a nurse about STIs
Group C: A fact sheet / brochure
Group D: A group presentation

The data are provided in the dataset "knowledge.dta".

The researcher has previously completed an introductory statistics course, and analysed the scores across groups using the Stata code below, where variables B, C, and D represent indicator variables for education groups B, C and D respectively, and 'score' represents the knowledge score.

regress score B C D

      Source |       SS           df       MS      Number of obs   =       100
-------------+----------------------------------   F(3, 96)        =      2.70
       Model |      214.16         3  71.3866667   Prob > F        =    0.0497
    Residual |      2534.4        96        26.4   R-squared       =    0.0779
-------------+----------------------------------   Adj R-squared   =    0.0491
       Total |     2748.56        99  27.7632323   Root MSE        =    5.1381

       score |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           B |       2.36   1.453272     1.62   0.108    -.5247225    5.244722
           C |       2.32   1.453272     1.60   0.114    -.5647225    5.204722
           D |       4.12   1.453272     2.83   0.006     1.235278    7.004722
       _cons |      14.68   1.027619    14.29   0.000     12.64019    16.71981
------------------------------------------------------------------------------

(2a) [4 marks]
The researcher interprets the results as telling him that group D (group presentation) is the only one that produces a higher knowledge score than the control group. Why does he make this conclusion and what is wrong with it?

(2b) [6 marks]
Following the conclusion that he reached in part 2a, the researcher decided to leave the "non-significant" indicator variables out of the regression model and obtained the following results:

regress score D

      Source |       SS           df       MS      Number of obs   =       100
-------------+----------------------------------   F(1, 98)        =      4.59
       Model |      122.88         1      122.88   Prob > F        =    0.0347
    Residual |     2625.68        98  26.7926531   R-squared       =    0.0447
-------------+----------------------------------   Adj R-squared   =    0.0350
       Total |     2748.56        99  27.7632323   Root MSE        =    5.1762

       score |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           D |       2.56   1.195383     2.14   0.035     .1878005    4.932199
       _cons |      16.24   .5976917    27.17   0.000      15.0539     17.4261
------------------------------------------------------------------------------

He suggests that this provides the simplest summary result, and asks you to explain why the coefficient estimate has decreased (and the P-value increased) compared with the previous model. Would you recommend that this estimate be reported? Explain why or why not, and provide a detailed explanation of what it means, using a little algebra to make this clear.

(2c) [4 marks]
Having persuaded the investigator that the initial approach above was not addressing his specific questions of interest, you find after further discussion that he is primarily interested in the following comparisons among the education interventions:

(i) Control group versus all other education interventions combined together
(ii) The fact sheet / brochure (C) compared with the more interactive interventions combined together (B and D)

Express these comparisons in terms of the regression coefficients (β's) of a model with binary group indicators for interventions B, C, and D, and estimate them using the data and the appropriate Stata command.

Part 3 (18 marks)

For this question you will need to analyse the dataset of 500 patients who were randomised to either a new treatment or the standard treatment.

The variables in the dataset were simulated for this question and are coded as:

treatment       New treatment (coded as 1) & standard treatment (coded as 0)
fscore_0        Foot score at baseline (measuring pain in the foot on a scale of 0 to 100,
                where a higher score indicates less pain)
fscore_12       Foot score at 12 months

(3a) [4 marks]

Provide a table of the distribution of foot score at baseline and 12 months by treatment group and describe this table in a single paragraph.

(3b) [4 marks]
In question 3a you would have noticed that there are missing data for foot score at 12 months for some of the trial participants. Provide a table of the distribution of treatment and foot score at baseline for those with and without foot score measurements at 12 months and describe this table in a single paragraph.

(3c) [4 marks]
Perform a linear regression, adjusting for baseline foot score, to estimate the association between treatment and foot score at 12 months. This analysis is known as a complete-case analysis, as only those with complete data on all variables in the regression model are included. Comment on the potential limitations of this analysis in terms of bias and precision.

(3d) [6 marks]
Sometimes researchers use an ad hoc approach, known as last observation carried forward (LOCF), to handle missing data in the outcome of the trial. Here the missing values for the foot score at 12 months are replaced by the participant's foot score at baseline. Perform this analysis and comment on how the estimate and standard error for treatment have changed compared with the complete-case estimate and standard error in part 3c. Comment on the major assumption that this approach makes and why it is therefore not recommended.
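The data handling that distinguishes the complete-case analysis in part 3c from LOCF in part 3d can be illustrated with a minimal Python sketch. It assumes a hypothetical list of records (treatment, fscore_0, fscore_12), with None standing in for a missing 12-month score; the values are invented and are not taken from the trial dataset, and the helper names complete_cases and locf are illustrative only. The sketch shows which rows each approach would pass to the regression; it does not fit the model.

```python
from typing import Optional

# Hypothetical record layout: (treatment, fscore_0, fscore_12).
# None marks a missing 12-month foot score.
Record = tuple[int, float, Optional[float]]

# Invented toy values for illustration only; NOT the assessment data.
toy: list[Record] = [
    (1, 40.0, 65.0),
    (1, 55.0, None),
    (0, 50.0, 52.0),
    (0, 35.0, None),
    (1, 60.0, 80.0),
]


def complete_cases(records: list[Record]) -> list[Record]:
    """Complete-case analysis: drop any record with a missing 12-month score."""
    return [r for r in records if r[2] is not None]


def locf(records: list[Record]) -> list[Record]:
    """LOCF: replace a missing 12-month score with that record's baseline score."""
    return [(t, f0, f0 if f12 is None else f12) for t, f0, f12 in records]


cc = complete_cases(toy)
lo = locf(toy)
print(len(cc), len(lo))  # 3 5
print(lo[1], lo[3])      # (1, 55.0, 55.0) (0, 35.0, 35.0)
```

Here complete-case filtering keeps three of the five toy records, whereas LOCF keeps all five, filling the second and fourth records' 12-month scores with their baseline values.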
