MINITAB Statistical Analysis of Data
Deadline: 5pm Monday 16th October 2017
Introduction and dataset
The aim of this coursework is to investigate and predict the onset of diabetes based on
various diagnostic measurements.
The dataset was originally compiled by researchers at the Johns Hopkins University
School of Medicine, from a larger database owned by the National Institute of Diabetes
and Digestive and Kidney Diseases. All patients were females at least 21 years old of
Pima Indian heritage. Note that Pima Indians have one of the highest rates of diabetes
in the world.
This dataset includes 392 observations, taken at the individual level, and is available from the
diabetes_dataset.xlsx file in the Statistical Data Analysis Coursework folder on NOW.
The key indicator of diabetes (response variable), as defined by the World Health
Organization, is a plasma glucose concentration greater than 200 mg/dl two hours
following ingestion of a 75 g carbohydrate solution (variable Glucose).
The explanatory variables (or predictors) are known risk factors for diabetes: number of
pregnancies, diastolic blood pressure, triceps skinfold thickness (an indicator of
body fat), 2-hour serum insulin, body mass index, age, and diabetes pedigree function
(see Table).
Table. Measurements recorded in the dataset
Variable                   Description
Glucose                    plasma glucose concentration at 2 hours in an oral glucose tolerance test
Pregnancies                number of times pregnant
BloodPressure              diastolic blood pressure (mm Hg)
SkinThickness              triceps skinfold thickness (mm)
Insulin                    2-hour serum insulin (μU/ml)
BMI                        body mass index (weight in kg/(height in m)²)
DiabetesPedigreeFunction   diabetes pedigree function*
Age                        age (years)
Outcome                    class variable (0 or 1)**
* a synthesis of the history of diabetes in an individual’s relatives
** negative (0) / positive (1) diabetes test
Creating your unique dataset
Copy the data from this file into MINITAB so that Glucose is recorded in column C1,
Pregnancies in C2, etc.
(1) Generate two random integers between 2 and 7 and provide the MINITAB output.
(1 mark)
(2) Using MINITAB, erase the columns corresponding to your generated numbers (e.g. if
one of the generated numbers is 5, erase column C5). Describe how you did this
and provide the sequence of actions (e.g. Calc->Descriptive Stats->….)
(2 marks)
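As a cross-check on steps (1) and (2), the same logic can be sketched in Python. This is an illustration only, not the MINITAB procedure the brief asks for: the helper names `pick_columns` and `drop_columns` are hypothetical, the seed is arbitrary, and it assumes that "between 2 and 7" is inclusive and that the two numbers must differ (otherwise only one column would be erased and eight would remain, not seven). Column numbers follow MINITAB's 1-based C1, C2, ... convention.

```python
import random

def pick_columns(rng: random.Random) -> list:
    """Draw two distinct integers from 2..7 inclusive (columns C2..C7)."""
    return sorted(rng.sample(range(2, 8), 2))

def drop_columns(header, rows, cols):
    """Remove the columns given as 1-based numbers (C1 is list index 0)."""
    keep = [i for i in range(len(header)) if (i + 1) not in cols]
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_header, new_rows

header = ["Glucose", "Pregnancies", "BloodPressure", "SkinThickness",
          "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]
# First two rows of the dataset, copied from the table in this brief.
rows = [[56, 2, 56, 28, 45, 24.2, 0.332, 22, 0],
        [68, 2, 62, 13, 15, 20.1, 0.257, 23, 0]]

rng = random.Random(2017)  # arbitrary seed, used only for reproducibility
cols = pick_columns(rng)
new_header, new_rows = drop_columns(header, rows, cols)
print("Erased columns:", ["C%d" % c for c in cols])
print(new_header)
```

Because Glucose (C1), Age (C8) and Outcome (C9) lie outside the range 2..7, they always survive the deletion.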
(3) Using MINITAB, select a random sample of 300 observations (n = 300) from your
dataset. Provide the sequence of actions you used.
(1 mark)
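Step (3) calls for a simple random sample drawn without replacement, so each observation can appear at most once. The sketch below is illustrative and assumes the rows are held in a Python list; `sample_rows` is a hypothetical helper, the seed is arbitrary, and the row numbers 1..392 stand in for the actual observations.

```python
import random

def sample_rows(rows, n, seed=None):
    """Simple random sample of n rows, drawn without replacement."""
    if n > len(rows):
        raise ValueError("sample size exceeds the number of rows")
    rng = random.Random(seed)
    return rng.sample(rows, n)

# Stand-in for the 392 observations: each "row" is just its row number.
all_rows = list(range(1, 393))
unique = sample_rows(all_rows, 300, seed=16)
print(len(unique), "rows sampled;", len(set(unique)), "distinct")
```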
Your unique dataset will now consist of 300 rows and seven columns including
Glucose, Age and Outcome.
Investigating your unique dataset
(4) For your unique dataset, summarise your observations and present graphically
the frequency distributions of all variables left in your unique dataset, including
Glucose but excluding Outcome. Comment on unusual observations and decide how
to deal with them.
(6 marks)
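One common way to make "unusual observations" concrete is the boxplot rule: values more than 1.5 × IQR below Q1 or above Q3 are flagged. The sketch below applies that rule with Python's `statistics` module (Python 3.8+). It is an assumption that this matches MINITAB's quartiles exactly: the `exclusive` method interpolates at position (n + 1)p, which is believed to be MINITAB's convention, but check against your own MINITAB output. The Insulin values are a small excerpt of the table (the first ten rows plus the value 846), chosen only to show the mechanics; a flagged value is not automatically an error to be removed.

```python
import statistics

def summarise(values):
    """Descriptive statistics plus 1.5*IQR fences for flagging unusual values."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "min": min(values), "Q1": q1, "median": med, "Q3": q3, "max": max(values),
        "outliers": sorted(v for v in values if v < lower or v > upper),
    }

# Insulin values from the first ten rows of the table, plus the value 846.
insulin = [45, 15, 66, 49, 76, 45, 36, 45, 49, 55, 846]
stats = summarise(insulin)
for key, value in stats.items():
    print(key, value)
```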
(5) Using MINITAB, define a new variable, Age_Group, by combining observations
for participants younger than 30 into group 1 and all others (aged 30 and older) into
group 2. Provide either a description or a screenshot of how you did this.
(3 marks)
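The recoding rule in step (5) is a single threshold. A minimal sketch, assuming ages are available as a list of integers (`age_group` is a hypothetical name); note that a participant aged exactly 30 falls into group 2.

```python
def age_group(age):
    """Return 1 for participants younger than 30, otherwise 2 (age 30 and older)."""
    return 1 if age < 30 else 2

# Age for the first ten rows of the table.
ages = [22, 23, 25, 47, 22, 21, 22, 23, 39, 33]
groups = [age_group(a) for a in ages]
print(groups)  # prints [1, 1, 1, 2, 1, 1, 1, 1, 2, 2]
```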
(6) Investigate whether there is a significant difference in mean/median Glucose
concentration between age groups. Formulate the null and alternative hypotheses;
choose, justify and perform an appropriate statistical test using MINITAB; provide all
MINITAB outputs; write your conclusions.
(10 marks)
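If the Glucose distributions in the two age groups look clearly non-normal, a Mann-Whitney test of H0: equal medians is one defensible choice; a two-sample t-test of equal means is the usual alternative when normality is plausible. The sketch below computes the rank sum W of the first sample, U = W - n1(n1 + 1)/2, and a two-sided p-value from the normal approximation with a tie-corrected variance. It is a standard-library illustration with hypothetical function names; MINITAB reports W and may apply a continuity correction, so its p-value can differ slightly.

```python
from statistics import NormalDist

def midranks(values):
    """Rank values 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1          # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def mann_whitney(x, y):
    """Two-sided Mann-Whitney test via the normal approximation (tie-corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    w = sum(ranks[:n1])                     # rank sum of the first sample
    u = w - n1 * (n1 + 1) / 2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - n1 * n2 / 2) / variance ** 0.5
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, u, z, p

# Glucose for the first ten rows of the table, split by Age_Group (under 30 vs 30+).
g1 = [56, 68, 68, 71, 71, 74, 74]
g2 = [68, 74, 75]
w, u, z, p = mann_whitney(g1, g2)
print(f"W = {w}, U = {u}, z = {z:.3f}, p = {p:.3f}")
```

With the full unique dataset, `g1` and `g2` would hold all Glucose values for Age_Group 1 and 2; ten rows are far too few for the normal approximation and are used here only to show the call.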
(7) Show whether the proportion of participants with Glucose concentration greater
than 100 mg/dl differs between the age groups you defined previously. Formulate
the null and alternative hypotheses; choose, justify and perform an appropriate
statistical test using MINITAB; provide all MINITAB outputs; write your conclusions.
(10 marks)
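Step (7) compares two proportions: the share of participants with Glucose > 100 mg/dl in each age group. A pooled two-proportion z-test (equivalent to a 2 × 2 chi-square test) is one standard choice, provided each group has a reasonable number of participants above and below the cut-off. The sketch assumes the counts have already been tabulated; `two_proportion_z` is a hypothetical name, and the counts in the usage line are placeholders, not results from the dataset. MINITAB's two-proportion test may use an unpooled standard error unless the pooled option is selected, so its z can differ slightly.

```python
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test of H0: p1 = p2 against H1: p1 != p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)              # common proportion under H0
    se = (pooled * (1 - pooled) * (1 / n1 + 1 / n2)) ** 0.5
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p1, p2, z, p_value

# Placeholder counts, NOT from the dataset: x = number with Glucose > 100 in each group.
p1, p2, z, p_value = two_proportion_z(x1=60, n1=100, x2=40, n2=100)
print(f"p1 = {p1:.3f}, p2 = {p2:.3f}, z = {z:.3f}, p-value = {p_value:.4f}")
```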
(8) Using MINITAB, produce a table of correlation coefficients. Justify your choice of
correlation coefficient, investigate the resulting table and comment on the most
interesting relationships between the chosen variables. Do not use the Glucose or
Outcome variables in this analysis.
(4 marks)
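Several of the remaining variables are skewed or contain outliers (Insulin, for example, has a long right tail in the table) and Pregnancies is a count, so Spearman's rank correlation, which assumes only a monotonic relationship, is one justifiable choice; Pearson's coefficient assumes a linear relationship and is sensitive to outliers. The sketch computes Spearman's rho as Pearson's correlation of the mid-ranks using only the standard library. Function names are hypothetical, and the columns are the first ten rows of the table.

```python
def midranks(values):
    """Rank values 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # mean of 1-based positions
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the mid-ranks."""
    return pearson(midranks(x), midranks(y))

columns = {
    "Pregnancies": [2, 2, 2, 10, 1, 1, 0, 3, 8, 2],
    "BMI": [24.2, 20.1, 25, 35.5, 20.4, 33.2, 27.8, 29.7, 35.3, 29.7],
    "Age": [22, 23, 25, 47, 22, 21, 22, 23, 39, 33],
}
names = list(columns)
print(" " * 12 + "".join(f"{name:>12}" for name in names))
for a in names:
    print(f"{a:<12}" + "".join(f"{spearman(columns[a], columns[b]):>12.3f}" for b in names))
```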
(9) Using simple linear regression, model Glucose concentration using one of the
variables of your choice available in your unique dataset. Comment on the
significance of the intercept and slope.
(4 marks)
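For step (9), the intercept and slope are each tested against zero with t = estimate / standard error on n - 2 degrees of freedom. The sketch below computes the least-squares estimates and their standard errors from the textbook formulas; `simple_regression` is a hypothetical name, and p-values are left to MINITAB because the standard library has no t distribution. With n = 300, |t| above roughly 1.97 corresponds to a two-sided p < 0.05.

```python
def simple_regression(x, y):
    """Least-squares fit y = b0 + b1*x, with standard errors and t-statistics."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx                          # slope
    b0 = my - b1 * mx                       # intercept
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    s2 = sse / (n - 2)                      # residual variance
    se_b1 = (s2 / sxx) ** 0.5
    se_b0 = (s2 * (1 / n + mx ** 2 / sxx)) ** 0.5
    return {"b0": b0, "b1": b1, "se_b0": se_b0, "se_b1": se_b1,
            "t_b0": b0 / se_b0, "t_b1": b1 / se_b1}

# BMI (x) and Glucose (y) for the first ten rows of the table.
bmi = [24.2, 20.1, 25, 35.5, 20.4, 33.2, 27.8, 29.7, 35.3, 29.7]
glucose = [56, 68, 68, 68, 71, 71, 74, 74, 74, 75]
fit = simple_regression(bmi, glucose)
for name, value in fit.items():
    print(f"{name:6s} {value:8.3f}")
```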
(10) Fit a multiple regression model with Glucose as the response variable and the
other five variables (excluding Outcome) as predictors. Treat the variable Pregnancies
as interval-scale data. Identify insignificant predictors in the model and explain why
they are insignificant.
(4 marks)
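Step (10) extends the same idea to several predictors: each coefficient's t-statistic is its estimate divided by its standard error, taken from the diagonal of s²(XᵀX)⁻¹. The standard-library sketch below solves the normal equations with a Gauss-Jordan inverse; function names are hypothetical, and it omits diagnostics such as variance inflation factors. A predictor can come out insignificant because it is only weakly related to Glucose, or because the information it carries overlaps with correlated predictors already in the model. The usage example uses three predictors from the first ten rows purely to show the call; your model will have five.

```python
def invert(m):
    """Gauss-Jordan inverse of a square matrix given as a list of lists."""
    n = len(m)
    a = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (predictors are collinear)")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]

def multiple_regression(X, y):
    """OLS with an intercept: returns coefficients, standard errors, t-statistics."""
    rows = [[1.0] + list(r) for r in X]          # prepend the intercept column
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    s2 = sum(e * e for e in resid) / (n - k)     # residual variance
    se = [max(0.0, s2 * inv[i][i]) ** 0.5 for i in range(k)]
    t = [b / s if s > 0 else float("inf") for b, s in zip(beta, se)]
    return beta, se, t

# Pregnancies, BMI, Age (predictors) and Glucose (response), first ten rows of the table.
X = [[2, 24.2, 22], [2, 20.1, 23], [2, 25, 25], [10, 35.5, 47], [1, 20.4, 22],
     [1, 33.2, 21], [0, 27.8, 22], [3, 29.7, 23], [8, 35.3, 39], [2, 29.7, 33]]
y = [56, 68, 68, 68, 71, 71, 74, 74, 74, 75]
beta, se, t = multiple_regression(X, y)
for name, b, s, tv in zip(["Constant", "Pregnancies", "BMI", "Age"], beta, se, t):
    print(f"{name:12s} coef {b:8.3f}  SE {s:7.3f}  t {tv:6.2f}")
```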
(11) Cluster your 300 observations into 10 groups using one of the linkage methods and
one of the similarity measures from the corresponding drop-down menus. Give a brief
(half a page) description of the chosen linkage method and similarity measure. Show a
dendrogram with cases labelled by Outcome. Comment on the results obtained. Provide all
MINITAB outputs.
(6 marks)
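The sketch below illustrates agglomerative (hierarchical) clustering with complete linkage and Euclidean distance, one possible pairing of linkage method and distance measure. Every observation starts as its own cluster, and the two clusters whose farthest-apart members are closest are merged repeatedly until k clusters remain. Variables are standardised first so that large-scale variables such as Insulin do not dominate the distance; whether to standardise is your choice, not something the brief prescribes. Function names are hypothetical, and the naive loop is adequate for a few hundred points but is not meant to reproduce MINITAB's implementation.

```python
import math
import statistics

def standardise(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def agglomerative(points, k, linkage="complete"):
    """Merge the closest pair of clusters until k remain; return a label per point."""
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(c1, c2):
        d = [math.dist(points[i], points[j]) for i in c1 for j in c2]
        if linkage == "single":
            return min(d)                       # nearest members
        if linkage == "average":
            return sum(d) / len(d)              # mean over all pairs
        return max(d)                           # complete: farthest members

    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = cluster_distance(clusters[a], clusters[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a (b > a)
        del clusters[b]

    labels = [0] * len(points)
    for label, members in enumerate(clusters, start=1):
        for i in members:
            labels[i] = label
    return labels

# BMI, Age and Insulin for the first ten rows of the table (illustrative only).
rows = [[24.2, 22, 45], [20.1, 23, 15], [25, 25, 66], [35.5, 47, 49], [20.4, 22, 76],
        [33.2, 21, 45], [27.8, 22, 36], [29.7, 23, 45], [35.3, 39, 49], [29.7, 33, 55]]
print(agglomerative(standardise(rows), k=3))
```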
(12) It is known that the incidence of diabetes in the UK is 0.6. In a small northern
village of 100 people, isolated from the mainland for six months per year, the pharmacy
wants to know how many insulin shots to order. We want to know the probability
that between A and B people will develop the disease during this period. To
perform the analysis, generate two random numbers between 0 and 100 using MINITAB
and paste the outputs into your report. Denote by A the smaller and by B the
larger of these two generated numbers. Calculate the probability that between A
and B people develop the disease, and how many shots should be ordered.
(9 marks)
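Step (12) can be modelled by treating the number of villagers who develop the disease as X ~ Binomial(n = 100, p), with p = 0.6 taken exactly as stated in the brief. The probability of between A and B cases (both endpoints included, which is an assumption) is then a sum of binomial probabilities. The brief does not say how to turn this into an order size; `shots_needed` shows one assumed rule, stocking one shot per case so that demand is covered with at least 95% probability. All function names and the 95% level are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def prob_between(a, b, n, p):
    """P(A <= X <= B) with both endpoints included; argument order does not matter."""
    lo, hi = min(a, b), max(a, b)
    return sum(binom_pmf(k, n, p) for k in range(lo, hi + 1))

def shots_needed(n, p, level=0.95):
    """Smallest m with P(X <= m) >= level (an assumed ordering rule)."""
    cumulative = 0.0
    for m in range(n + 1):
        cumulative += binom_pmf(m, n, p)
        if cumulative >= level:
            return m
    return n

n, p = 100, 0.6           # village size and incidence, as stated in the brief
A, B = 55, 65             # placeholders: replace with your MINITAB-generated numbers
print(f"P({A} <= X <= {B}) = {prob_between(A, B, n, p):.4f}")
print(f"Expected cases n*p = {n * p:.0f}")
print(f"Shots for 95% coverage: {shots_needed(n, p)}")
```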

Glucose Pregnancies BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age Outcome
56 2 56 28 45 24.2 0.332 22 0
68 2 62 13 15 20.1 0.257 23 0
68 2 70 32 66 25 0.187 25 0
68 10 106 23 49 35.5 0.285 47 0
71 1 48 18 76 20.4 0.323 22 0
71 1 78 50 45 33.2 0.422 21 0
74 0 52 10 36 27.8 0.269 22 0
74 3 68 28 45 29.7 0.293 23 0
74 8 70 40 49 35.3 0.705 39 0
75 2 64 24 55 29.7 0.37 33 0
77 1 56 30 56 33.3 1.251 24 0
77 5 82 41 42 35.8 0.156 35 0
78 3 50 32 88 31 0.248 26 1
78 0 88 29 40 36.9 0.434 21 0
79 1 80 25 37 25.4 0.583 22 0
79 1 60 42 48 43.5 0.678 23 0
80 1 74 11 60 30 0.527 22 0
80 3 82 31 70 34.2 1.292 27 1
81 1 72 18 40 26.6 0.283 24 0
81 3 86 16 66 27.5 0.306 22 0
81 2 72 15 76 30.1 0.547 25 0
81 1 74 41 57 46.3 1.096 32 0
81 7 78 40 48 46.7 0.261 42 0
82 1 64 13 95 21.2 0.415 23 0
82 2 52 22 115 28.5 1.699 25 0
83 7 78 26 71 29.3 0.767 36 0
83 2 66 23 50 32.2 0.497 22 0
83 3 58 31 18 34.3 0.336 25 0
83 2 65 28 66 36.8 0.629 24 0
84 2 50 23 76 30.4 0.968 21 0
84 3 68 30 106 31.9 0.591 25 0
84 0 64 22 66 35.8 0.545 21 0
84 1 64 23 115 36.9 0.471 28 0
84 0 82 31 125 38.2 0.233 23 0
84 4 90 23 56 39.5 0.159 25 0
85 4 58 22 49 27.8 0.306 28 0
86 5 68 28 71 30.2 0.364 24 0
86 1 66 52 65 41.3 0.917 29 0
87 2 58 16 52 32.7 0.166 25 0
87 1 78 27 32 34.6 0.101 22 0
87 1 60 37 75 37.2 0.509 22 0
87 1 68 34 77 37.6 0.401 24 0
88 5 66 21 23 24.4 0.342 30 0
88 3 58 11 54 24.8 0.267 22 0
88 2 58 26 16 28.4 0.766 22 0
88 2 74 19 53 29 0.229 22 0
88 1 62 24 44 29.9 0.422 23 0
88 1 78 29 76 32 0.365 29 0
88 12 74 40 54 35.3 0.378 48 0
88 1 30 42 99 55 0.496 26 1
89 1 24 19 25 27.8 0.559 21 0
89 1 66 23 94 28.1 0.167 21 0
89 3 74 16 85 30.4 0.551 38 0
89 1 76 34 37 31.2 0.192 23 0
90 2 80 14 55 24.4 0.249 24 0
90 1 62 18 59 25.1 1.268 25 0
90 1 62 12 43 27.2 0.58 24 0
90 4 88 47 54 37.7 0.362 29 0
91 1 54 25 100 25.2 0.234 23 0
91 4 70 32 88 33.1 0.446 22 0
91 0 68 32 210 39.9 0.381 25 0
92 1 62 25 41 19.5 0.482 25 0
92 12 62 7 258 27.6 0.926 44 1
92 6 62 32 126 32 0.085 46 0
93 0 60 25 92 28.7 0.532 22 0
93 6 50 30 64 28.7 0.356 23 0
93 2 64 32 160 38 0.674 23 1
93 0 100 39 72 43.4 1.021 35 0
94 2 68 18 76 26 0.561 21 0
94 2 76 18 66 31.6 0.649 23 0
94 7 64 25 79 33.3 0.738 41 0
94 0 70 27 115 43.5 0.347 21 0
95 1 66 13 38 19.6 0.334 25 0
95 1 60 18 58 23.9 0.26 22 0
95 1 74 21 73 25.9 0.673 36 0
95 2 54 14 88 26.1 0.748 22 0
95 1 82 25 180 35 0.233 43 1
95 0 80 45 92 36.5 0.33 26 0
95 0 85 25 36 37.4 0.247 24 1
95 0 64 39 105 44.6 0.366 22 0
96 4 56 17 49 20.8 0.34 26 0
96 2 68 13 49 21.1 0.647 26 0
96 3 56 34 115 24.7 0.944 39 0
96 1 64 27 87 33.2 0.289 21 0
96 5 74 18 67 33.6 0.997 43 0
97 1 64 19 82 18.2 0.299 21 0
97 1 66 15 140 23.2 0.487 22 0
97 0 64 36 100 36.8 0.6 25 0
97 7 76 32 91 40.9 0.871 32 1
98 0 82 15 84 25.2 0.299 22 0
98 6 58 33 190 34 0.43 43 0
98 2 60 17 120 34.7 0.198 22 0
99 3 80 11 64 19.3 0.284 30 0
99 2 70 16 44 20.4 0.235 27 0
99 3 62 19 74 21.8 0.279 26 0
99 4 76 15 51 23.2 0.223 21 0
99 2 52 15 94 24.6 0.637 21 0
99 3 54 19 86 25.6 0.154 24 0
99 6 60 19 54 26.9 0.497 32 0
99 5 54 28 83 34 0.499 30 0
99 2 60 17 160 36.6 0.453 21 0
99 1 72 30 18 38.6 0.412 21 0
100 1 74 12 46 19.5 0.149 28 0
100 1 66 15 56 23.6 0.666 26 0
100 1 72 12 70 25.3 0.658 28 0
100 12 84 33 105 30 0.488 46 0
100 0 70 26 50 30.8 0.597 21 0
100 3 68 23 81 31.6 0.949 28 0
100 1 66 29 196 32 0.444 42 0
100 2 66 20 90 32.9 0.867 28 1
100 14 78 25 184 36.6 0.412 46 1
100 2 54 28 105 37.8 0.498 24 0
100 2 68 25 71 38.5 0.324 26 0
100 8 74 40 215 39.4 0.661 43 1
100 2 70 52 57 40.5 0.677 25 0
100 0 88 60 110 46.8 0.962 31 0
101 2 58 35 90 21.8 0.155 22 0
101 2 58 17 265 24.2 0.614 23 0
101 1 50 15 36 24.2 0.526 26 0
101 10 76 48 180 32.9 0.171 63 0
102 0 86 17 105 29.3 0.695 27 0
102 3 44 20 94 30.8 0.4 26 0
102 0 78 40 90 34.5 0.238 24 0
102 7 74 40 105 37.2 0.204 45 0
102 0 64 46 78 40.6 0.496 21 0
102 2 86 36 120 45.5 0.127 23 1
103 1 80 11 82 19.4 0.491 22 0
103 4 60 33 192 24 0.966 33 0
103 3 72 30 152 27.6 0.73 27 0
103 6 72 32 190 37.7 0.324 55 0
103 1 30 38 83 43.3 0.183 33 0
104 0 64 23 116 27.8 0.454 23 0
104 6 74 18 156 29.9 0.722 41 1
104 0 64 37 64 33.6 0.51 22 1
105 6 70 32 68 30.8 0.122 37 0
105 2 80 45 191 33.7 0.711 29 1
105 2 58 40 94 34.9 0.225 25 0
105 5 72 29 325 36.9 0.159 28 0
105 0 64 41 142 41.5 0.173 22 0
106 2 56 27 165 29 0.426 22 0
106 2 64 35 119 30.5 1.4 34 0
106 3 54 21 158 30.9 0.292 24 0
106 1 70 28 135 34.2 0.142 22 0
106 0 70 37 148 39.4 0.605 22 0
107 3 62 13 48 22.9 0.678 23 1
107 1 72 30 82 30.8 0.821 24 0
107 2 74 30 100 33.6 0.404 23 0
107 0 62 30 74 36.6 0.757 25 1
108 6 44 20 130 24 0.813 35 0
108 2 62 32 56 25.2 0.128 21 0
108 2 62 10 278 25.3 0.881 22 0
108 2 52 26 63 32.5 0.318 22 0
108 1 60 46 178 35.5 0.415 24 0
108 5 72 43 75 36.1 0.263 33 0
109 1 38 18 120 23.1 0.407 26 0
109 1 56 21 135 25.2 0.833 23 0
109 1 60 8 182 25.4 0.947 21 0
109 8 76 39 114 27.9 0.64 31 1
109 1 58 18 116 28.5 0.219 22 0
109 4 64 44 99 34.8 0.905 26 1
109 5 62 41 129 35.8 0.514 25 1
110 4 76 20 100 28.4 0.118 27 0
110 2 74 29 125 32.4 0.698 27 0
111 1 62 13 182 24 0.138 23 0
111 3 90 12 78 28.4 0.495 29 0
111 3 58 31 44 29.5 0.43 22 0
111 4 72 47 207 37.1 1.39 56 1
112 2 68 22 94 34.1 0.315 26 0
112 9 82 32 175 34.2 0.26 36 1
112 1 72 30 176 34.4 0.528 25 0
112 1 80 45 132 34.8 0.217 24 0
112 2 86 42 160 38.4 0.246 28 0
112 2 78 50 140 39.4 0.175 24 0
113 3 50 10 85 29.5 0.626 25 0
114 7 76 17 110 23.8 0.466 31 0
114 1 66 36 200 38.1 0.289 21 0
114 0 80 34 285 44.2 0.167 27 0
115 1 70 30 96 34.6 0.529 32 1
115 3 66 39 140 38.1 0.15 28 0
116 4 72 12 87 22.1 0.463 37 0
116 3 74 15 105 26.3 0.107 24 0
116 1 78 29 180 36.1 0.496 25 0
117 2 90 19 71 25.2 0.313 21 0
117 0 66 31 188 30.8 0.493 22 0
117 4 64 27 120 33.2 0.23 24 0
117 1 60 23 106 33.8 0.466 27 0
117 1 88 24 145 34.5 0.403 40 1
117 5 86 30 105 39.1 0.251 42 0
117 0 80 31 53 45.2 0.089 24 0
118 1 58 36 94 33.3 0.261 23 0
118 0 84 47 230 45.8 0.551 31 1
119 1 54 13 50 22.3 0.205 24 0
119 6 50 22 176 27.1 1.318 33 1
119 0 64 18 92 34.9 0.725 23 0
119 1 44 47 63 35.5 0.28 25 0
119 1 88 41 170 45.3 0.507 26 0
119 1 86 39 220 45.6 0.808 29 1
120 9 72 22 56 20.8 0.733 48 0
120 0 74 18 63 30.5 0.285 26 0
120 1 80 48 200 38.9 1.162 41 0
120 2 76 37 105 39.7 0.215 29 0
120 11 80 37 150 42.3 0.785 48 1
120 3 70 30 135 42.9 0.452 30 0
121 5 72 23 112 26.2 0.245 30 0
121 0 66 30 165 34.3 0.203 33 1
121 1 78 39 74 39 0.261 28 0
121 2 70 32 95 39.1 0.886 23 0
122 2 60 18 106 29.8 0.717 22 0
122 1 64 32 156 35.1 0.692 30 1
122 2 76 27 200 35.9 0.483 26 0
122 2 52 43 158 36.2 0.816 28 0
122 1 90 51 220 49.7 0.325 31 1
123 4 80 15 176 32 0.443 34 0
123 9 70 44 94 33.1 0.374 40 0
123 6 72 45 230 33.6 0.733 34 0
123 5 74 40 77 34.1 0.269 28 0
123 2 48 32 165 42.1 0.52 26 0
123 3 100 35 240 57.3 0.88 22 0
124 0 56 13 105 21.8 0.452 21 0
124 7 70 33 215 25.5 0.161 37 0
124 8 76 24 600 28.7 0.687 52 1
124 2 68 28 205 32.9 0.875 30 1
124 3 80 33 130 33.2 0.305 26 0
124 9 70 33 402 35.4 0.282 34 0
125 1 70 24 110 24.3 0.221 25 0
125 4 70 18 122 28.9 1.144 45 1
125 6 68 30 120 30 0.464 32 0
125 10 70 26 115 31.1 0.205 41 1
125 1 50 40 167 33.3 0.962 28 1
125 2 60 20 140 33.8 0.088 31 0
126 8 74 38 75 25.9 0.162 39 0
126 0 86 27 120 27.4 0.515 21 0
126 1 56 29 152 28.7 0.801 21 0
126 5 78 27 22 29.6 0.439 40 0
126 0 84 29 215 30.7 0.52 24 0
126 8 88 36 108 38.5 0.349 49 0
126 3 88 41 235 39.3 0.704 27 0
127 2 58 24 275 27.7 1.6 25 0
127 2 46 21 335 34.4 0.176 22 0
127 4 88 11 155 34.5 0.598 28 0
127 0 80 37 210 36.3 0.804 23 0
128 1 82 17 183 27.5 0.115 22 0
128 0 68 19 180 30.5 1.391 25 1
128 1 98 41 58 32 1.321 33 1
128 3 72 25 190 32.4 0.549 27 1
128 1 88 39 110 36.5 1.057 37 1
128 1 48 45 194 40.5 0.613 24 1
128 2 78 37 182 43.3 1.224 31 1
129 6 90 7 326 19.6 0.582 60 0
129 3 64 29 115 26.4 0.219 28 1
129 4 60 12 231 27.5 0.527 31 0
129 2 74 26 205 33.2 0.591 25 0
129 4 86 20 270 35.1 0.231 23 0
129 10 76 28 122 35.9 0.28 39 0
129 3 92 49 155 36.4 0.968 32 1
129 7 68 49 125 38.5 0.439 43 1
129 0 110 46 130 67.1 0.319 26 1
130 1 70 13 105 25.9 0.472 22 0
130 3 78 23 79 28.4 0.323 34 1
130 1 60 23 170 28.6 0.692 21 0
131 1 64 14 415 23.7 0.389 21 0
131 4 68 21 166 33.1 0.16 28 0
133 7 88 15 155 32.4 0.262 37 0
133 1 102 28 140 32.8 0.234 45 1
134 9 74 33 60 25.9 0.46 81 0
134 0 58 20 291 26.4 0.352 21 0
134 6 70 23 130 35.4 0.542 29 1
134 6 80 37 370 46.2 0.238 46 1
135 0 94 46 145 40.6 0.284 26 0
135 0 68 42 250 42.3 0.365 24 1
136 7 74 26 135 26 0.647 51 0
136 11 84 35 130 28.3 0.26 42 1
136 5 84 41 88 35 0.286 35 1
136 15 70 32 110 37.1 0.153 43 1
136 1 74 50 204 37.4 0.399 24 0
137 0 68 14 148 24.8 0.143 21 0
137 0 40 35 168 43.1 2.288 33 1
138 0 60 35 167 34.6 0.534 21 1
138 11 74 26 144 36.1 0.557 50 1
139 0 62 17 210 22.1 0.207 21 0
139 5 64 35 140 28.6 0.411 26 0
139 1 46 19 83 28.7 0.654 22 0
139 5 80 35 160 31.6 0.361 25 1
139 1 62 41 480 40.7 0.536 21 0
140 1 74 26 180 24.1 0.828 23 0
140 12 82 43 325 39.2 0.528 58 1
140 0 65 26 130 42.6 0.431 24 1
141 2 58 34 128 25.4 0.699 24 0
142 2 82 18 64 24.7 0.761 21 0
142 7 60 33 190 28.8 0.687 61 0
142 7 90 24 480 30.4 0.128 43 1
143 1 74 22 61 26.2 0.256 21 0
143 1 86 30 330 30.1 0.892 23 0
143 11 94 33 146 36.6 0.254 51 1
143 1 84 23 310 42.4 1.076 22 0
144 4 58 28 140 29.5 0.287 37 0
144 2 58 33 135 31.6 0.422 25 1
144 5 82 26 285 32 0.452 58 1
144 6 72 27 228 33.9 0.255 40 0
144 1 82 46 180 46.1 0.335 46 1
145 13 82 19 110 22.2 0.245 57 0
145 9 88 34 165 30.3 0.771 53 1
145 9 80 46 130 37.9 0.637 40 1
146 2 70 38 360 28 0.337 29 1
146 4 85 27 100 28.9 0.189 27 0
146 2 76 35 194 38.2 0.329 29 0
147 4 74 25 293 34.9 0.385 30 0
148 4 60 27 318 30.9 0.15 29 1
148 10 84 48 237 37.6 1.001 51 1
149 1 68 29 127 29.3 0.349 42 1
150 7 66 42 342 34.7 0.718 42 0
150 7 78 29 126 35.2 0.692 54 1
151 6 62 31 120 35.5 0.692 28 0
151 12 70 40 271 41.8 0.742 38 1
151 8 78 32 210 42.9 0.516 36 1
152 13 90 33 29 26.8 0.731 43 1
152 9 78 34 171 34.2 0.893 33 1
152 0 82 39 272 41.5 0.27 27 0
153 1 82 42 485 40.6 0.687 23 0
153 13 88 37 140 40.6 1.174 39 0
154 6 74 32 193 29.3 0.839 39 0
154 9 78 30 100 30.9 0.164 45 0
154 4 72 29 126 31.3 0.338 37 0
154 4 62 31 284 32.8 0.237 23 0
154 6 78 41 140 46.1 0.571 27 0
155 2 74 17 96 26.6 0.433 27 1
155 11 76 28 150 33.3 1.353 51 1
155 8 62 26 495 34 0.543 46 1
155 2 52 27 540 38.7 0.24 25 1
155 5 84 44 545 38.7 0.619 34 0
156 9 86 28 155 34.3 1.189 42 1
157 1 72 21 168 25.6 0.123 24 0
157 2 74 35 440 39.4 0.134 30 0
158 3 64 13 387 31.2 0.295 24 0
158 3 76 36 245 31.6 0.851 28 1
158 3 70 30 328 35.5 0.344 35 1
158 5 84 41 210 39.4 0.395 29 1
160 7 54 32 175 30.5 0.588 39 1
161 10 68 23 132 25.5 0.326 47 1
162 0 76 56 100 53.2 0.759 25 1
163 3 70 18 105 31.6 0.268 28 1
163 17 72 41 114 40.9 0.817 47 1
164 1 82 43 67 32.8 0.341 50 0
165 6 68 26 168 33.6 0.631 49 0
165 0 76 43 255 47.9 0.259 26 0
165 0 90 33 680 52.3 0.427 23 0
166 5 72 19 175 25.8 0.587 51 1
167 1 74 17 144 23.4 0.447 33 1
167 8 106 46 231 37.6 0.165 43 1
168 7 88 42 321 38.2 0.787 40 1
169 3 74 19 125 29.9 0.268 31 1
170 3 64 37 225 34.5 0.356 30 1
171 3 72 33 135 33.3 0.199 24 1
171 9 110 24 240 45.4 0.721 54 1
172 1 68 49 579 42.4 0.702 28 1
173 4 70 14 168 29.7 0.361 33 1
173 3 78 39 185 33.8 0.97 31 1
173 3 84 33 474 35.7 0.258 22 1
173 3 82 48 465 38.4 2.137 25 1
173 0 78 32 265 46.5 1.159 58 0
174 3 58 22 194 32.9 0.593 36 1
174 2 88 37 120 44.5 0.646 24 1
176 3 86 27 156 33.3 1.154 52 1
176 8 90 34 300 33.7 0.467 58 1
177 0 60 29 478 34.6 1.072 21 1
179 8 72 42 130 32.7 0.719 36 1
179 0 50 36 159 37.8 0.455 22 1
180 3 64 25 70 34 0.271 26 0
180 0 90 26 90 36.5 0.314 35 1
180 0 78 63 14 59.4 2.42 25 1
181 8 68 36 495 30.1 0.615 60 1
181 1 64 30 180 34.1 0.328 38 1
181 7 84 21 192 35.9 0.586 51 1
181 1 78 42 293 40 1.258 22 1
181 0 88 44 510 43.3 0.222 26 1
184 4 78 39 277 37 0.264 31 1
186 8 90 35 225 34.5 0.423 37 1
187 7 50 33 392 33.9 0.826 34 1
187 3 70 22 200 36.4 0.408 36 1
187 7 68 39 304 37.7 0.254 41 1
187 5 76 27 207 43.6 1.034 53 1
188 0 82 14 185 32 0.682 22 1
189 1 60 23 846 30.1 0.398 59 1
189 5 64 33 325 31.2 0.583 29 1
191 3 68 15 130 30.9 0.299 34 0
193 1 50 16 375 25.9 0.655 24 0
195 7 70 33 145 25.1 0.163 55 1
196 1 76 36 249 36.5 0.875 29 1
196 8 76 29 280 37.5 0.605 57 1
197 2 70 45 543 30.5 0.158 53 1
197 4 70 39 744 36.7 2.329 31 0
198 0 66 32 274 41.3 0.502 28 1

Statistical Analysis of Data Using SPSS
Introduction and dataset
The aim of this coursework is to investigate and predict the onset of diabetes based on
various diagnostic measurements.
The dataset was originally compiled by researchers at the Johns Hopkins University
School of Medicine, from a larger database owned by the National Institute of Diabetes
and Digestive and Kidney Diseases. All patients were females at least 21 years old of
Pima Indian heritage. Note that Pima Indians have one of the highest rates of diabetes
in the world.
This dataset includes 392 observations, taken at the individual level, and is available from the
diabetes_dataset.xlsx file in the Statistical Data Analysis Coursework folder on NOW.
The key indicator of diabetes (response variable), as defined by the World Health
Organization, is a plasma glucose concentration greater than 200 mg/dl two hours
following ingestion of a 75 g carbohydrate solution (variable Glucose).

Glucose Pregnancies BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age Outcome
56 2 56 28 45 24.2 0.332 22 0
68 2 62 13 15 20.1 0.257 23 0
68 2 70 32 66 25 0.187 25 0
68 10 106 23 49 35.5 0.285 47 0
71 1 48 18 76 20.4 0.323 22 0
71 1 78 50 45 33.2 0.422 21 0
74 0 52 10 36 27.8 0.269 22 0
74 3 68 28 45 29.7 0.293 23 0
74 8 70 40 49 35.3 0.705 39 0
75 2 64 24 55 29.7 0.37 33 0
77 1 56 30 56 33.3 1.251 24 0
77 5 82 41 42 35.8 0.156 35 0
78 3 50 32 88 31 0.248 26 1
78 0 88 29 40 36.9 0.434 21 0
79 1 80 25 37 25.4 0.583 22 0
79 1 60 42 48 43.5 0.678 23 0
80 1 74 11 60 30 0.527 22 0
80 3 82 31 70 34.2 1.292 27 1
81 1 72 18 40 26.6 0.283 24 0
81 3 86 16 66 27.5 0.306 22 0
81 2 72 15 76 30.1 0.547 25 0
81 1 74 41 57 46.3 1.096 32 0
81 7 78 40 48 46.7 0.261 42 0
82 1 64 13 95 21.2 0.415 23 0
82 2 52 22 115 28.5 1.699 25 0
83 7 78 26 71 29.3 0.767 36 0
83 2 66 23 50 32.2 0.497 22 0
83 3 58 31 18 34.3 0.336 25 0
83 2 65 28 66 36.8 0.629 24 0
84 2 50 23 76 30.4 0.968 21 0
84 3 68 30 106 31.9 0.591 25 0
84 0 64 22 66 35.8 0.545 21 0
84 1 64 23 115 36.9 0.471 28 0
84 0 82 31 125 38.2 0.233 23 0
84 4 90 23 56 39.5 0.159 25 0
85 4 58 22 49 27.8 0.306 28 0
86 5 68 28 71 30.2 0.364 24 0
86 1 66 52 65 41.3 0.917 29 0
87 2 58 16 52 32.7 0.166 25 0
87 1 78 27 32 34.6 0.101 22 0
87 1 60 37 75 37.2 0.509 22 0
87 1 68 34 77 37.6 0.401 24 0
88 5 66 21 23 24.4 0.342 30 0
88 3 58 11 54 24.8 0.267 22 0
88 2 58 26 16 28.4 0.766 22 0
88 2 74 19 53 29 0.229 22 0
88 1 62 24 44 29.9 0.422 23 0
88 1 78 29 76 32 0.365 29 0
88 12 74 40 54 35.3 0.378 48 0
88 1 30 42 99 55 0.496 26 1
89 1 24 19 25 27.8 0.559 21 0
89 1 66 23 94 28.1 0.167 21 0
89 3 74 16 85 30.4 0.551 38 0
89 1 76 34 37 31.2 0.192 23 0
90 2 80 14 55 24.4 0.249 24 0
90 1 62 18 59 25.1 1.268 25 0
90 1 62 12 43 27.2 0.58 24 0
90 4 88 47 54 37.7 0.362 29 0
91 1 54 25 100 25.2 0.234 23 0
91 4 70 32 88 33.1 0.446 22 0
91 0 68 32 210 39.9 0.381 25 0
92 1 62 25 41 19.5 0.482 25 0
92 12 62 7 258 27.6 0.926 44 1
92 6 62 32 126 32 0.085 46 0
93 0 60 25 92 28.7 0.532 22 0
93 6 50 30 64 28.7 0.356 23 0
93 2 64 32 160 38 0.674 23 1
93 0 100 39 72 43.4 1.021 35 0
94 2 68 18 76 26 0.561 21 0
94 2 76 18 66 31.6 0.649 23 0
94 7 64 25 79 33.3 0.738 41 0
94 0 70 27 115 43.5 0.347 21 0
95 1 66 13 38 19.6 0.334 25 0
95 1 60 18 58 23.9 0.26 22 0
95 1 74 21 73 25.9 0.673 36 0
95 2 54 14 88 26.1 0.748 22 0
95 1 82 25 180 35 0.233 43 1
95 0 80 45 92 36.5 0.33 26 0
95 0 85 25 36 37.4 0.247 24 1
95 0 64 39 105 44.6 0.366 22 0
96 4 56 17 49 20.8 0.34 26 0
96 2 68 13 49 21.1 0.647 26 0
96 3 56 34 115 24.7 0.944 39 0
96 1 64 27 87 33.2 0.289 21 0
96 5 74 18 67 33.6 0.997 43 0
97 1 64 19 82 18.2 0.299 21 0
97 1 66 15 140 23.2 0.487 22 0
97 0 64 36 100 36.8 0.6 25 0
97 7 76 32 91 40.9 0.871 32 1
98 0 82 15 84 25.2 0.299 22 0
98 6 58 33 190 34 0.43 43 0
98 2 60 17 120 34.7 0.198 22 0
99 3 80 11 64 19.3 0.284 30 0
99 2 70 16 44 20.4 0.235 27 0
99 3 62 19 74 21.8 0.279 26 0
99 4 76 15 51 23.2 0.223 21 0
99 2 52 15 94 24.6 0.637 21 0
99 3 54 19 86 25.6 0.154 24 0
99 6 60 19 54 26.9 0.497 32 0
99 5 54 28 83 34 0.499 30 0
99 2 60 17 160 36.6 0.453 21 0
99 1 72 30 18 38.6 0.412 21 0
100 1 74 12 46 19.5 0.149 28 0
100 1 66 15 56 23.6 0.666 26 0
100 1 72 12 70 25.3 0.658 28 0
100 12 84 33 105 30 0.488 46 0
100 0 70 26 50 30.8 0.597 21 0
100 3 68 23 81 31.6 0.949 28 0
100 1 66 29 196 32 0.444 42 0
100 2 66 20 90 32.9 0.867 28 1
100 14 78 25 184 36.6 0.412 46 1
100 2 54 28 105 37.8 0.498 24 0
100 2 68 25 71 38.5 0.324 26 0
100 8 74 40 215 39.4 0.661 43 1
100 2 70 52 57 40.5 0.677 25 0
100 0 88 60 110 46.8 0.962 31 0
101 2 58 35 90 21.8 0.155 22 0
101 2 58 17 265 24.2 0.614 23 0
101 1 50 15 36 24.2 0.526 26 0
101 10 76 48 180 32.9 0.171 63 0
102 0 86 17 105 29.3 0.695 27 0
102 3 44 20 94 30.8 0.4 26 0
102 0 78 40 90 34.5 0.238 24 0
102 7 74 40 105 37.2 0.204 45 0
102 0 64 46 78 40.6 0.496 21 0
102 2 86 36 120 45.5 0.127 23 1
103 1 80 11 82 19.4 0.491 22 0
103 4 60 33 192 24 0.966 33 0
103 3 72 30 152 27.6 0.73 27 0
103 6 72 32 190 37.7 0.324 55 0
103 1 30 38 83 43.3 0.183 33 0
104 0 64 23 116 27.8 0.454 23 0
104 6 74 18 156 29.9 0.722 41 1
104 0 64 37 64 33.6 0.51 22 1
105 6 70 32 68 30.8 0.122 37 0
105 2 80 45 191 33.7 0.711 29 1
105 2 58 40 94 34.9 0.225 25 0
105 5 72 29 325 36.9 0.159 28 0
105 0 64 41 142 41.5 0.173 22 0
106 2 56 27 165 29 0.426 22 0
106 2 64 35 119 30.5 1.4 34 0
106 3 54 21 158 30.9 0.292 24 0
106 1 70 28 135 34.2 0.142 22 0
106 0 70 37 148 39.4 0.605 22 0
107 3 62 13 48 22.9 0.678 23 1
107 1 72 30 82 30.8 0.821 24 0
107 2 74 30 100 33.6 0.404 23 0
107 0 62 30 74 36.6 0.757 25 1
108 6 44 20 130 24 0.813 35 0
108 2 62 32 56 25.2 0.128 21 0
108 2 62 10 278 25.3 0.881 22 0
108 2 52 26 63 32.5 0.318 22 0
108 1 60 46 178 35.5 0.415 24 0
108 5 72 43 75 36.1 0.263 33 0
109 1 38 18 120 23.1 0.407 26 0
109 1 56 21 135 25.2 0.833 23 0
109 1 60 8 182 25.4 0.947 21 0
109 8 76 39 114 27.9 0.64 31 1
109 1 58 18 116 28.5 0.219 22 0
109 4 64 44 99 34.8 0.905 26 1
109 5 62 41 129 35.8 0.514 25 1
110 4 76 20 100 28.4 0.118 27 0
110 2 74 29 125 32.4 0.698 27 0
111 1 62 13 182 24 0.138 23 0
111 3 90 12 78 28.4 0.495 29 0
111 3 58 31 44 29.5 0.43 22 0
111 4 72 47 207 37.1 1.39 56 1
112 2 68 22 94 34.1 0.315 26 0
112 9 82 32 175 34.2 0.26 36 1
112 1 72 30 176 34.4 0.528 25 0
112 1 80 45 132 34.8 0.217 24 0
112 2 86 42 160 38.4 0.246 28 0
112 2 78 50 140 39.4 0.175 24 0
113 3 50 10 85 29.5 0.626 25 0
114 7 76 17 110 23.8 0.466 31 0
114 1 66 36 200 38.1 0.289 21 0
114 0 80 34 285 44.2 0.167 27 0
115 1 70 30 96 34.6 0.529 32 1
115 3 66 39 140 38.1 0.15 28 0
116 4 72 12 87 22.1 0.463 37 0
116 3 74 15 105 26.3 0.107 24 0
116 1 78 29 180 36.1 0.496 25 0
117 2 90 19 71 25.2 0.313 21 0
117 0 66 31 188 30.8 0.493 22 0
117 4 64 27 120 33.2 0.23 24 0
117 1 60 23 106 33.8 0.466 27 0
117 1 88 24 145 34.5 0.403 40 1
117 5 86 30 105 39.1 0.251 42 0
117 0 80 31 53 45.2 0.089 24 0
118 1 58 36 94 33.3 0.261 23 0
118 0 84 47 230 45.8 0.551 31 1
119 1 54 13 50 22.3 0.205 24 0
119 6 50 22 176 27.1 1.318 33 1
119 0 64 18 92 34.9 0.725 23 0
119 1 44 47 63 35.5 0.28 25 0
119 1 88 41 170 45.3 0.507 26 0
119 1 86 39 220 45.6 0.808 29 1
120 9 72 22 56 20.8 0.733 48 0
120 0 74 18 63 30.5 0.285 26 0
120 1 80 48 200 38.9 1.162 41 0
120 2 76 37 105 39.7 0.215 29 0
120 11 80 37 150 42.3 0.785 48 1
120 3 70 30 135 42.9 0.452 30 0
121 5 72 23 112 26.2 0.245 30 0
121 0 66 30 165 34.3 0.203 33 1
121 1 78 39 74 39 0.261 28 0
121 2 70 32 95 39.1 0.886 23 0
122 2 60 18 106 29.8 0.717 22 0
122 1 64 32 156 35.1 0.692 30 1
122 2 76 27 200 35.9 0.483 26 0
122 2 52 43 158 36.2 0.816 28 0
122 1 90 51 220 49.7 0.325 31 1
123 4 80 15 176 32 0.443 34 0
123 9 70 44 94 33.1 0.374 40 0
123 6 72 45 230 33.6 0.733 34 0
123 5 74 40 77 34.1 0.269 28 0
123 2 48 32 165 42.1 0.52 26 0
123 3 100 35 240 57.3 0.88 22 0
124 0 56 13 105 21.8 0.452 21 0
124 7 70 33 215 25.5 0.161 37 0
124 8 76 24 600 28.7 0.687 52 1
124 2 68 28 205 32.9 0.875 30 1
124 3 80 33 130 33.2 0.305 26 0
124 9 70 33 402 35.4 0.282 34 0
125 1 70 24 110 24.3 0.221 25 0
125 4 70 18 122 28.9 1.144 45 1
125 6 68 30 120 30 0.464 32 0
125 10 70 26 115 31.1 0.205 41 1
125 1 50 40 167 33.3 0.962 28 1
125 2 60 20 140 33.8 0.088 31 0
126 8 74 38 75 25.9 0.162 39 0
126 0 86 27 120 27.4 0.515 21 0
126 1 56 29 152 28.7 0.801 21 0
126 5 78 27 22 29.6 0.439 40 0
126 0 84 29 215 30.7 0.52 24 0
126 8 88 36 108 38.5 0.349 49 0
126 3 88 41 235 39.3 0.704 27 0
127 2 58 24 275 27.7 1.6 25 0
127 2 46 21 335 34.4 0.176 22 0
127 4 88 11 155 34.5 0.598 28 0
127 0 80 37 210 36.3 0.804 23 0
128 1 82 17 183 27.5 0.115 22 0
128 0 68 19 180 30.5 1.391 25 1
128 1 98 41 58 32 1.321 33 1
128 3 72 25 190 32.4 0.549 27 1
128 1 88 39 110 36.5 1.057 37 1
128 1 48 45 194 40.5 0.613 24 1
128 2 78 37 182 43.3 1.224 31 1
129 6 90 7 326 19.6 0.582 60 0
129 3 64 29 115 26.4 0.219 28 1
129 4 60 12 231 27.5 0.527 31 0
129 2 74 26 205 33.2 0.591 25 0
129 4 86 20 270 35.1 0.231 23 0
129 10 76 28 122 35.9 0.28 39 0
129 3 92 49 155 36.4 0.968 32 1
129 7 68 49 125 38.5 0.439 43 1
129 0 110 46 130 67.1 0.319 26 1
130 1 70 13 105 25.9 0.472 22 0
130 3 78 23 79 28.4 0.323 34 1
130 1 60 23 170 28.6 0.692 21 0
131 1 64 14 415 23.7 0.389 21 0
131 4 68 21 166 33.1 0.16 28 0
133 7 88 15 155 32.4 0.262 37 0
133 1 102 28 140 32.8 0.234 45 1
134 9 74 33 60 25.9 0.46 81 0
134 0 58 20 291 26.4 0.352 21 0
134 6 70 23 130 35.4 0.542 29 1
134 6 80 37 370 46.2 0.238 46 1
135 0 94 46 145 40.6 0.284 26 0
135 0 68 42 250 42.3 0.365 24 1
136 7 74 26 135 26 0.647 51 0
136 11 84 35 130 28.3 0.26 42 1
136 5 84 41 88 35 0.286 35 1
136 15 70 32 110 37.1 0.153 43 1
136 1 74 50 204 37.4 0.399 24 0
137 0 68 14 148 24.8 0.143 21 0
137 0 40 35 168 43.1 2.288 33 1
138 0 60 35 167 34.6 0.534 21 1
138 11 74 26 144 36.1 0.557 50 1
139 0 62 17 210 22.1 0.207 21 0
139 5 64 35 140 28.6 0.411 26 0
139 1 46 19 83 28.7 0.654 22 0
139 5 80 35 160 31.6 0.361 25 1
139 1 62 41 480 40.7 0.536 21 0
140 1 74 26 180 24.1 0.828 23 0
140 12 82 43 325 39.2 0.528 58 1
140 0 65 26 130 42.6 0.431 24 1
141 2 58 34 128 25.4 0.699 24 0
142 2 82 18 64 24.7 0.761 21 0
142 7 60 33 190 28.8 0.687 61 0
142 7 90 24 480 30.4 0.128 43 1
143 1 74 22 61 26.2 0.256 21 0
143 1 86 30 330 30.1 0.892 23 0
143 11 94 33 146 36.6 0.254 51 1
143 1 84 23 310 42.4 1.076 22 0
144 4 58 28 140 29.5 0.287 37 0
144 2 58 33 135 31.6 0.422 25 1
144 5 82 26 285 32 0.452 58 1
144 6 72 27 228 33.9 0.255 40 0
144 1 82 46 180 46.1 0.335 46 1
145 13 82 19 110 22.2 0.245 57 0
145 9 88 34 165 30.3 0.771 53 1
145 9 80 46 130 37.9 0.637 40 1
146 2 70 38 360 28 0.337 29 1
146 4 85 27 100 28.9 0.189 27 0
146 2 76 35 194 38.2 0.329 29 0
147 4 74 25 293 34.9 0.385 30 0
148 4 60 27 318 30.9 0.15 29 1
148 10 84 48 237 37.6 1.001 51 1
149 1 68 29 127 29.3 0.349 42 1
150 7 66 42 342 34.7 0.718 42 0
150 7 78 29 126 35.2 0.692 54 1
151 6 62 31 120 35.5 0.692 28 0
151 12 70 40 271 41.8 0.742 38 1
151 8 78 32 210 42.9 0.516 36 1
152 13 90 33 29 26.8 0.731 43 1
152 9 78 34 171 34.2 0.893 33 1
152 0 82 39 272 41.5 0.27 27 0
153 1 82 42 485 40.6 0.687 23 0
153 13 88 37 140 40.6 1.174 39 0
154 6 74 32 193 29.3 0.839 39 0
154 9 78 30 100 30.9 0.164 45 0
154 4 72 29 126 31.3 0.338 37 0
154 4 62 31 284 32.8 0.237 23 0
154 6 78 41 140 46.1 0.571 27 0
155 2 74 17 96 26.6 0.433 27 1
155 11 76 28 150 33.3 1.353 51 1
155 8 62 26 495 34 0.543 46 1
155 2 52 27 540 38.7 0.24 25 1
155 5 84 44 545 38.7 0.619 34 0
156 9 86 28 155 34.3 1.189 42 1
157 1 72 21 168 25.6 0.123 24 0
157 2 74 35 440 39.4 0.134 30 0
158 3 64 13 387 31.2 0.295 24 0
158 3 76 36 245 31.6 0.851 28 1
158 3 70 30 328 35.5 0.344 35 1
158 5 84 41 210 39.4 0.395 29 1
160 7 54 32 175 30.5 0.588 39 1
161 10 68 23 132 25.5 0.326 47 1
162 0 76 56 100 53.2 0.759 25 1
163 3 70 18 105 31.6 0.268 28 1
163 17 72 41 114 40.9 0.817 47 1
164 1 82 43 67 32.8 0.341 50 0
165 6 68 26 168 33.6 0.631 49 0
165 0 76 43 255 47.9 0.259 26 0
165 0 90 33 680 52.3 0.427 23 0
166 5 72 19 175 25.8 0.587 51 1
167 1 74 17 144 23.4 0.447 33 1
167 8 106 46 231 37.6 0.165 43 1
168 7 88 42 321 38.2 0.787 40 1
169 3 74 19 125 29.9 0.268 31 1
170 3 64 37 225 34.5 0.356 30 1
171 3 72 33 135 33.3 0.199 24 1
171 9 110 24 240 45.4 0.721 54 1
172 1 68 49 579 42.4 0.702 28 1
173 4 70 14 168 29.7 0.361 33 1
173 3 78 39 185 33.8 0.97 31 1
173 3 84 33 474 35.7 0.258 22 1
173 3 82 48 465 38.4 2.137 25 1
173 0 78 32 265 46.5 1.159 58 0
174 3 58 22 194 32.9 0.593 36 1
174 2 88 37 120 44.5 0.646 24 1
176 3 86 27 156 33.3 1.154 52 1
176 8 90 34 300 33.7 0.467 58 1
177 0 60 29 478 34.6 1.072 21 1
179 8 72 42 130 32.7 0.719 36 1
179 0 50 36 159 37.8 0.455 22 1
180 3 64 25 70 34 0.271 26 0
180 0 90 26 90 36.5 0.314 35 1
180 0 78 63 14 59.4 2.42 25 1
181 8 68 36 495 30.1 0.615 60 1
181 1 64 30 180 34.1 0.328 38 1
181 7 84 21 192 35.9 0.586 51 1
181 1 78 42 293 40 1.258 22 1
181 0 88 44 510 43.3 0.222 26 1
184 4 78 39 277 37 0.264 31 1
186 8 90 35 225 34.5 0.423 37 1
187 7 50 33 392 33.9 0.826 34 1
187 3 70 22 200 36.4 0.408 36 1
187 7 68 39 304 37.7 0.254 41 1
187 5 76 27 207 43.6 1.034 53 1
188 0 82 14 185 32 0.682 22 1
189 1 60 23 846 30.1 0.398 59 1
189 5 64 33 325 31.2 0.583 29 1
191 3 68 15 130 30.9 0.299 34 0
193 1 50 16 375 25.9 0.655 24 0
195 7 70 33 145 25.1 0.163 55 1
196 1 76 36 249 36.5 0.875 29 1
196 8 76 29 280 37.5 0.605 57 1
197 2 70 45 543 30.5 0.158 53 1
197 4 70 39 744 36.7 2.329 31 0
198 0 66 32 274 41.3 0.502 28 1

From the tabulated figures:
(1) Generate two random integers between 2 and 7 and provide the SPSS output.
(1 mark)
(2) Using SPSS, erase the columns corresponding to your generated numbers (e.g. if
one of the generated numbers is 5, erase column C5). Describe how you did this
and provide the sequence of actions (e.g. Calc->Descriptive Stats->….)
(2 marks)
(3) Using SPSS, select a random sample of 300 observations (n = 300) from your
dataset. Provide the sequence of actions you used.
(1 mark)
Your unique dataset will now consist of 300 rows and seven columns including
Glucose, Age and Outcome.
Investigating your unique dataset
(4) For your unique dataset, summarise your observations and present graphically
the frequency distributions of all variables left in your unique dataset, including
Glucose but excluding Outcome. Comment on unusual observations and decide how
to deal with them.
(6 marks)
(5) Using SPSS, define a new variable, Age_Group, by combining observations
for participants younger than 30 into group 1 and all others (aged 30 and older) into
group 2. Provide either a description or a screenshot of how you did this.
(3 marks)
(6) Investigate whether there is a significant difference in mean/median Glucose
concentration between age groups. Formulate the null and alternative hypotheses;
choose, justify and perform an appropriate statistical test using SPSS; provide all
SPSS outputs; write your conclusions.
(10 marks)
(7) Show whether the proportion of participants with Glucose concentration greater
than 100 mg/dl differs between the age groups you defined previously. Formulate
the null and alternative hypotheses; choose, justify and perform an appropriate
statistical test using SPSS; provide all SPSS outputs; write your conclusions.
(10 marks)
(8) Using SPSS, produce a table of correlation coefficients. Justify the choice of
correlation coefficient, examine the resulting table and comment on the most interesting
relationships between the chosen variables. Do not use the Glucose and Outcome variables
in this analysis.
(4 marks)
(9) Using simple linear regression, model Glucose concentration using one variable of
your choice from your unique dataset. Comment on the significance of the intercept and
slope.
(4 marks)
(10) Fit a multiple regression model with Glucose as the response variable and the other
five variables (excluding Outcome) as predictors. Treat the variable Pregnancies as
interval-scale data. Identify insignificant predictors in the model and explain why they
are insignificant.
(4 marks)
(11) Cluster your 300 observations into 10 groups using one of the linkage methods and
similarity measures from the corresponding drop-down menus. Give a brief (half a page)
description of the chosen linkage method and similarity measure. Show a dendrogram
with cases labelled by Outcome. Comment on the results obtained. Provide all
SPSS outputs.
(6 marks)
(12) It is known that the incidence of diabetes in the UK is 0.6. In a small northern
village of 100 people, isolated from the mainland for six months per year, the pharmacy
wants to know how many insulin shots to order. We want to know the probability that
between A and B people will develop the disease during this period. To perform the
analysis, generate two random numbers between 0 and 100 using SPSS and paste the
outputs into your report. Denote by A the smaller and by B the larger of these two
generated numbers. Calculate the probability that between A and B people develop the
disease, and how many shots should be ordered.
(9 marks)
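The recoding in task (5) and the comparison of proportions in task (7) can be sketched in Python as a cross-check on the SPSS results. The sketch assumes a pooled two-proportion z-test of H0: p1 = p2 against a two-sided H1; for a two-sided test this is equivalent to the Pearson chi-square test on the 2×2 table without continuity correction. The helper names are hypothetical, and the six (Glucose, Age) pairs are copied from the table only to show the calling pattern; a sample that small is far too small for a z-test.

```python
from math import erfc, sqrt


def age_group(age: float) -> int:
    """Task (5): group 1 if younger than 30, otherwise (30 and older) group 2."""
    return 1 if age < 30 else 2


def glucose_counts(rows, threshold: float = 100) -> dict[int, tuple[int, int]]:
    """Per age group, count rows with Glucose > threshold.

    rows: iterable of (glucose, age) pairs.
    Returns {group: (number_above_threshold, group_size)}.
    """
    counts = {1: [0, 0], 2: [0, 0]}
    for glucose, age in rows:
        group = age_group(age)
        counts[group][1] += 1
        if glucose > threshold:
            counts[group][0] += 1
    return {group: (above, total) for group, (above, total) in counts.items()}


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled z-test of H0: p1 = p2 against H1: p1 != p2.

    Returns (z, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("all or none of the observations exceed the threshold")
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


if __name__ == "__main__":
    # Six (Glucose, Age) pairs from the table, only to show the calling pattern;
    # use all 300 rows of your own sample in practice.
    rows = [(56, 22), (68, 47), (100, 28), (173, 58), (191, 34), (129, 60)]
    counts = glucose_counts(rows)
    (x1, n1), (x2, n2) = counts[1], counts[2]
    z, p = two_proportion_z_test(x1, n1, x2, n2)
    print(f"group 1: {x1}/{n1}, group 2: {x2}/{n2}, z = {z:.3f}, p = {p:.3f}")
```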

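The probability in task (12) can be sketched as a binomial calculation, assuming each of the 100 villagers independently develops the disease with probability p = 0.6 (the incidence as stated in the task) and that "between A and B" includes both A and B. Both are modelling assumptions; adjust the bounds if your interpretation differs. A and B in the usage section are placeholders for your own generated numbers.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability that exactly k of n people develop the disease."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_between(a: int, b: int, n: int = 100, p: float = 0.6) -> float:
    """P(a <= X <= b) for X ~ Binomial(n, p); both bounds are inclusive."""
    lo, hi = min(a, b), max(a, b)  # A is the smaller, B the larger number
    return sum(binomial_pmf(k, n, p) for k in range(lo, hi + 1))


if __name__ == "__main__":
    # Placeholder values; replace with the two numbers generated in SPSS.
    A, B = 55, 65
    print(f"P({A} <= X <= {B}) = {prob_between(A, B):.4f}")
```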
Categorical (Nominal) Dependent Variables – Logit (Logistic Regression)

  • Here is an introductory/survey video of Logit Analysis, which allows us to analyze nominal dependent variables. Linear regression only allows us to work with continuous dependent variables.

Video: Introduction to Logit Analysis:
https://youtu.be/ANi_PpkTSJA
Note: This Extra Credit Assignment is a bit tougher than the other ones, so it is worth a bonus of up to 10% of the final grade if you get everything right. The other assignments are worth 7% each.

  • After watching the video, try this extra credit assignment:
    Prompt:
    Answer Part 1, Part 2, and Part 3. Given the following coefficients from a logit analysis, and the sample data values given for two respondents, calculate the probability of a person liking a dark-colored imported car over a light-colored imported car. Your answers are probabilities. Show your work. Use Word or PDF format for submission to Turnitin.com (link below). You may need to hand-write the formula and show your work on paper, then photograph or scan it into a file. That’s OK, but typing it into Word is preferred, if you can figure it out.

The Dependent Variable (DV) is “Prefers dark-colored imported car.” This measure is labeled “PrefDark” in the data:
= 0 if preference is for a light-colored car,
= 1 if preference is for a dark-colored car.
Here are the Independent Variables (IVs):
Age in years (no intervals – labeled “Age” in the data)
Gender (measure is labeled “Gender” in the data)
= 0 if male,
= 1 if female.
Education level (measure is labeled EducLevel in the data)
= 0 if completed high school only
= 1 if completed Associate’s degree (Community College)
= 2 if completed Undergraduate degree (BA or BS)
= 3 if completed a Graduate degree
Income per year (in Euros; measure is labeled Income)
Consider, also, these coefficients for each measure (data point), calculated by running a Logit analysis on the data sample for the DV, PrefDark:
Coefficients and Constant
Age          0.101
Gender       0.34
EducLevel   -5.1
Income       0.000142
Constant     3.22
Assume all coefficients and the constant are statistically significant (you can’t ignore them).
Part 1 (4 points):
Now consider this person, Respondent 1:
Age = 24
Gender = 1 (female)
EducLevel = 2 (Undergraduate degree)
Income/year = Euros 38000
What is the probability this person prefers a dark-colored imported car?
Part 2 (4 points):
Additionally, consider this other person, Respondent 2:
A 54-year-old male with a graduate degree, earning Euros 58000 per year.
What is the probability this person prefers a dark-colored imported car?
Hint: Use the formula given in the video for calculating P(Yi=yi).
Show your work, please.
Part 3 (2 points)
Which Respondent has a higher probability of preferring a dark-colored car?
This is quite straightforward if you have Parts 1 and 2 correct.
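The calculation for Parts 1 and 2 can be sketched in Python. The sketch assumes the formula in the video is the standard logit form, in which the log-odds are z = Constant + Σ(coefficient × value) and P(PrefDark = 1) = 1 / (1 + e^(−z)); check this against the video before relying on it. The dictionary keys reuse the variable labels from the data, and EducLevel enters as a single numeric score, as the coefficient table implies.

```python
from math import exp

# Coefficients and constant from the logit output above.
COEFFICIENTS = {"Age": 0.101, "Gender": 0.34, "EducLevel": -5.1, "Income": 0.000142}
CONSTANT = 3.22


def linear_predictor(person: dict[str, float]) -> float:
    """Log-odds z = constant + sum of coefficient * value."""
    return CONSTANT + sum(COEFFICIENTS[name] * person[name] for name in COEFFICIENTS)


def prob_prefers_dark(person: dict[str, float]) -> float:
    """Logistic transform of the log-odds: P(PrefDark = 1) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + exp(-linear_predictor(person)))


respondent_1 = {"Age": 24, "Gender": 1, "EducLevel": 2, "Income": 38000}
respondent_2 = {"Age": 54, "Gender": 0, "EducLevel": 3, "Income": 58000}

for label, person in [("Respondent 1", respondent_1), ("Respondent 2", respondent_2)]:
    print(f"{label}: z = {linear_predictor(person):.3f}, "
          f"P = {prob_prefers_dark(person):.4f}")
```

Writing out z term by term before applying the logistic transform mirrors the "show your work" requirement.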
 

Regression Modeling

Assignment Content

  1. Purpose 
    This assignment provides an opportunity to develop, evaluate, and apply bivariate and multivariate linear regression models.
    Resources: Microsoft Excel®, DAT565_v3_Wk5_Data_File
    Instructions:
    The Excel file for this assignment contains a database with information about the tax assessment value assigned to medical office buildings in a city. The following is a list of the variables in the database:

    • FloorArea: square feet of floor space
    • Offices: number of offices in the building
    • Entrances: number of customer entrances
    • Age: age of the building (years)
    • AssessedValue: tax assessment value (thousands of dollars)
    • Use the data to construct a model that predicts the tax assessment value assigned to medical office buildings with specific characteristics.
    • Construct a scatter plot in Excel with FloorArea as the independent variable and AssessedValue as the dependent variable. Insert the bivariate linear regression equation and r^2 in your graph. Do you observe a linear relationship between the 2 variables?
    • Use Excel’s Analysis ToolPak to conduct a regression analysis of FloorArea and AssessedValue. Is FloorArea a significant predictor of AssessedValue?
    • Construct a scatter plot in Excel with Age as the independent variable and AssessedValue as the dependent variable. Insert the bivariate linear regression equation and r^2 in your graph. Do you observe a linear relationship between the 2 variables?
    • Use Excel’s Analysis ToolPak to conduct a regression analysis of Age and AssessedValue. Is Age a significant predictor of AssessedValue?
    • Construct a multiple regression model.
    • Use Excel’s Analysis ToolPak to conduct a regression analysis with AssessedValue as the dependent variable and FloorArea, Offices, Entrances, and Age as independent variables. What is the overall fit r^2? What is the adjusted r^2?
    • Which predictors are considered significant if we work with α=0.05? Which predictors can be eliminated?
    • What is the final model if we only use FloorArea and Offices as predictors?
    • Suppose our final model is:
    • AssessedValue = 115.9 + 0.26 × FloorArea + 78.34 × Offices
    • What would be the assessed value of a medical office building with a floor area of 3500 sq. ft. and 2 offices that was built 15 years ago? Is this assessed value consistent with what appears in the database?
    • Submit your assignment.
      Resources
    • Center for Writing Excellence
    • Reference and Citation Generator
    • Grammar and Writing Guides
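The least-squares quantities behind the bivariate steps and the final prediction can be sketched in Python as a cross-check on the Excel output. For one predictor, ordinary least squares gives slope = Sxy/Sxx, intercept = ȳ − slope·x̄ and r^2 = Sxy^2/(Sxx·Syy), which should agree with the trendline equation and R² that Excel displays. The prediction uses the final model stated above; because Age is not in that model, the building's age of 15 years does not change the predicted value. The function names are illustrative.

```python
def fit_bivariate(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Ordinary least squares y = intercept + slope * x; returns (intercept, slope, r^2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


def assessed_value(floor_area: float, offices: int) -> float:
    """Final model from the prompt; result is in thousands of dollars."""
    return 115.9 + 0.26 * floor_area + 78.34 * offices


if __name__ == "__main__":
    # 3500 sq. ft. and 2 offices; building age is not a predictor in the final model.
    print(f"Predicted AssessedValue: {assessed_value(3500, 2):.2f} thousand dollars")
```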

Correlational Analysis

Before beginning this assignment, please watch the following videos on correlation:
Correlation – The Basic Idea Explained
Correlation Basics
Study Description: A school educator is interested in determining the potential relationship
between grade point average (GPA) and IQ scores among ninth graders. The educator takes a
random sample of 30 ninth graders aged 14 years old and administers the Wechsler Intelligence
Scale for Children-Fourth Edition (WISC-IV). The WISC-IV includes a Full Scale IQ (FSIQ;
however, for this assignment we will just call it IQ).
Output file: See the Week 5 SPSS Output.pdf file.
Answer the following Questions:
* Hypotheses – Formulate null and alternative hypotheses. What do you think is the relationship
between IQ scores and GPA?
* Variables – Describe the scale of measurement (nominal, ordinal, interval, or ratio) for each of
the variables.
* Correlation – Write an overview of the results of the correlation (at least two paragraphs),
including the appropriate and necessary statistical results within sentences and in proper APA
formatting. Be sure to provide sufficient explanation for any numbers presented. Consider the
following in your overview and conclusions:
* Is there a significant correlation between IQ scores and GPA? If so, what does a significant
correlation mean?
* Using the correlation table and scatterplot, explain whether the relationship is positive,
negative, or no correlation.
* Describe the strength of the relationship (e.g. very strong, moderate, weak, etc.).
* What do the results tell us about your hypotheses?
* What conclusions can we draw from these results? What conclusions can we NOT make using
these results?
Write a total of 550 words in response to these questions.
APA format
Size 12 font
Times New Roman
Double Spaced
At least 550 words – not counting the title, abstract and reference page
Other scholarly sources can be used if needed.
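If you want to check the SPSS correlation table by hand, Pearson's r and the t statistic for testing H0: ρ = 0 can be sketched in Python as below, using t = r·√(n − 2)/√(1 − r²) with n − 2 degrees of freedom (28 for the 30 students in this study). The sketch assumes Pearson's r is the coefficient reported in the output file; the IQ and GPA values in the usage section are invented and are not the study's data.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, compared against t with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


if __name__ == "__main__":
    # Invented values for illustration only (not the study's data).
    iq = [92, 101, 105, 110, 118, 125]
    gpa = [2.4, 2.9, 2.7, 3.2, 3.5, 3.6]
    r = pearson_r(iq, gpa)
    print(f"r = {r:.3f}, t({len(iq) - 2}) = {t_statistic(r, len(iq)):.3f}")
```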
Recommended References:
Benedict, K. (2014, April 11). Correlation – The basic idea explained [Video file].
Diem, K. G. (2002). A step-by-step guide to developing effective questionnaires and survey
procedures for program evaluation & research. Available at
http://njaes.rutgers.edu/pubs/publication.asp?pid=FS995
Mariampolski, H. (2001). Qualitative vs. quantitative. Qualitative Market Research, 22-25. SAGE
Publications Ltd. doi: 10.4135/9781412985529.n13
Rice, G. T. (2005). Developing high quality multiple-choice test questions. Available at
http://circle.adventist.org/files/jae/en/jae200567043006.pdf
Smith, L. (2013, November 18). Correlation basics [Video file].

Statistics Project

The first thing you have to do is choose a topic, collect data, and then estimate relationships using the techniques from the chapters on multiple regression and regression analysis: model building.

  1. Do not use any of the data sets that come with textbooks. Too many other people have used those data sets. The data source should be one that you have actually found and verified.
  1. Make sure you explain what you are doing clearly, with references. The project/paper must be properly written. Follow proper citation rules (APA style only) for any information you got from another source.
  1. Your data should include at least five independent variables. The independent variables should not be part of the dependent variable by construction.
  1. The project should involve applying the regression analysis in excel only.
  1. Somewhere the project you should have information on relevant descriptive statistics.  Relevant does not mean everything Excel can print out.  At a minimum it should include the mean and standard deviation of the variables involved.  However, you may decide that other statistics might provide useful information (e.g. in some cases, the range of the variables might be relevant).  This information could either be on a separate table of descriptive statistics or combined with regression tables.
  1. For the regression tables, as mentioned in the information posted on Moodle, you should not just copy from Excel but instead type proper tables.  You should have a regression table that has both your original model (before you eliminate variables) and your final Model.  The sample projects contain some examples.  Note, you do not need separate tables for the two models.
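The minimum descriptive statistics in item 5 (mean, standard deviation, and optionally the range) can be cross-checked with a short Python sketch. It assumes each variable is available as a list of numbers; the variable names and values below are hypothetical placeholders, not project data. The standard deviation uses the sample (n - 1) formula, which matches Excel's STDEV.S.

```python
import statistics

def describe(name, values):
    """Return the summary statistics requested for the project as a dict."""
    return {
        "variable": name,
        "n": len(values),
        "mean": statistics.mean(values),
        # Sample standard deviation (n - 1 denominator), like Excel's STDEV.S
        "sd": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }

# Hypothetical toy values, for illustration only
data = {
    "y": [10.0, 12.0, 9.0, 15.0, 14.0],
    "x1": [1.0, 2.0, 1.5, 3.0, 2.5],
}

# One line per variable, ready to be typed into a proper table
for name, values in data.items():
    row = describe(name, values)
    print(f"{row['variable']:>4}  n={row['n']}  mean={row['mean']:.3f}  "
          f"sd={row['sd']:.3f}  range={row['range']:.3f}")
```

Each printed line corresponds to one row of a descriptive statistics table; only the columns you judge relevant need to appear in the paper.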

 
NOTE: REGRESSIONS SHOULD BE RUN IN EXCEL ONLY
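The regressions themselves must be run in Excel. Purely to illustrate what the coefficients and R-squared in Excel's regression output represent, the following sketch fits a multiple regression by ordinary least squares, solving the normal equations (X'X)b = X'y with Gauss-Jordan elimination. The function name `fit_ols` and the toy data are hypothetical; the data are constructed so that y = 1 + 2*x1 + 3*x2 exactly. The sketch does not compute standard errors or p values, which Excel reports and which are needed to decide which variables to eliminate.

```python
def fit_ols(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    X is a list of rows (one row per observation, k predictor values each);
    an intercept column of 1s is added here. Returns
    (coefficients, r_squared, adjusted_r_squared).
    """
    n = len(y)
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])  # number of parameters, including the intercept
    # Augmented matrix [X'X | X'y] for the normal equations
    A = [[sum(rows[i][a] * rows[i][b] for i in range(n)) for b in range(p)]
         + [sum(rows[i][a] * y[i] for i in range(n))]
         for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: check for perfectly collinear predictors")
        A[col], A[pivot] = A[pivot], A[col]
        piv = A[col][col]
        A[col] = [v / piv for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    coef = [A[r][p] for r in range(p)]
    # Goodness of fit: R-squared and adjusted R-squared
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in rows]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    k = p - 1  # number of predictors
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return coef, r2, adj_r2

# Hypothetical toy data with two predictors (the project needs at least five)
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
y = [1, 3, 4, 6, 8, 12]
coef, r2, adj_r2 = fit_ols(X, y)
print([round(c, 6) for c in coef], round(r2, 6), round(adj_r2, 6))
```

The same function accepts any number of predictors per row, so fitting once with all variables and again after dropping some yields the two models that go side by side in the regression table.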

Correlation vs Causation

Please answer all the questions step by step.
1. In your own words, define what a correlation is. How is CORRELATION DIFFERENT FROM CAUSATION?
2. Explain why it seems logical to check whether there is an association between NUMBER OF CASES and ADVERSE EVENTS.
3. Describe the reason why we selected these two variables for correlation evaluation.
4. Explain why it does not make a difference which data go into which column.
5. Define and report the R-squared (Rsqr) value and the p value.
Interpret the Rsqr coefficient.
Write an informative interpretation of the outcome.
Use references (such as the calculator site, or others you may find online or on the class site).
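Questions 4 and 5 can be made concrete with a small Python sketch, assuming the two variables (for example, number of cases and adverse events) are supplied as equal-length lists. The values below are hypothetical placeholders, not the class data. For two variables, R-squared is simply the square of Pearson's r. The p value for r comes from a t distribution with n - 2 degrees of freedom, which Python's standard library does not provide, so the sketch stops at the t statistic; in Excel, =T.DIST.2T(ABS(t), n-2) would give the two-sided p value.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two samples of equal length with n >= 3")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: no linear correlation, with n - 2 degrees of freedom.

    Undefined when r is exactly +1 or -1.
    """
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical placeholder values, for illustration only
cases = [1, 2, 3, 4, 5]
adverse_events = [2, 4, 5, 4, 5]

r = pearson_r(cases, adverse_events)
print("r =", round(r, 4), "R-squared =", round(r ** 2, 4),
      "t =", round(t_statistic(r, len(cases)), 4))
# Swapping the columns leaves r unchanged (question 4)
print("r (columns swapped) =", round(pearson_r(adverse_events, cases), 4))
```

Because the formula for r treats x and y symmetrically, swapping the columns gives the same r and R-squared. A large R-squared or a small p value shows only that the two variables move together; it does not show that one causes the other.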