Hw 5 Solutions
1a)
/* One-way ANOVA that treats each plant as an independent observation */
proc glm data=one;
class trt;
model weight=trt;
output out=two residual=ehat predicted=yhat;
run;
The GLM Procedure
Dependent Variable: weight

                            Sum of
Source            DF       Squares    Mean Square   F Value   Pr > F
Model              1    115.200000     115.200000      2.15   0.1596
Error             18    963.600000      53.533333
Corrected Total   19   1078.800000

These results provide no significant evidence of a treatment effect
(F = 2.15 with 1 and 18 df, p-value = 0.1596).
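The F value and p-value quoted above can be checked by hand from the sums of squares in the table. The DATA step below is only a sketch of that arithmetic; it is not part of the original solution, and the variable names are made up for illustration.

```sas
/* Sketch: recompute F and its p-value from the ANOVA table in 1a */
data _null_;
  ms_model = 115.2 / 1;        /* model SS / model df            */
  ms_error = 963.6 / 18;       /* error SS / error df            */
  f = ms_model / ms_error;     /* F statistic                    */
  p = 1 - probf(f, 1, 18);     /* upper-tail area of F(1, 18)    */
  put f= 8.2 p= 8.4;           /* written to the SAS log         */
run;
```

The values written to the log should agree with the F Value and Pr > F columns of the table.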
1b)
i) Independence: This assumption is not satisfied. Each pot contains two
plants, and it is usually not safe to assume that two plants within a single
pot are independent. Treatments were assigned to pots rather than to plants,
so pots are the experimental units. When there are multiple observations
corresponding to any one experimental unit, it is usually not appropriate to
assume that the observations are independent.
ii) Constant variance: This assumption is roughly satisfied.
The vertical spread of the residuals is roughly the same for both
treatments.
iii) Normal distribution of errors: This is also roughly satisfied. The
residuals have a rather uniform pattern, but there are no outliers and most
points lie along the line in the normal probability plot.
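The residual plot and normal probability plot referred to in ii) and iii) can be produced from the residuals saved in part 1a. The code below is a sketch, assuming the data set two still holds the PROC GLM output (ehat and yhat); it must be run before part 1e, which reuses the name two for the pot means.

```sas
/* Sketch: diagnostic plots from the residuals saved in part 1a */
proc plot data=two;
  plot ehat*yhat;     /* residuals vs. predicted values: compare vertical spread */
run;

proc univariate data=two normal plot;
  var ehat;           /* NORMAL adds normality tests; PLOT adds a normal probability plot */
run;
```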
1c) The experimental unit is the pot.
1d) The observational unit is the plant.
1e) This problem asks you to compute one mean for each experimental unit
(pot) and conduct an analysis of those means as if they were the original
data. You could have easily computed the means by hand and typed them into
SAS. It is also possible to get SAS to compute the means for you using
the following code.
/* The BY statement requires data set one to be sorted by trt and pot */
proc means data=one;
var weight;
by trt pot;
output out=two mean=mean;
run;
proc print data=two;
run;

Obs   trt   pot   _TYPE_   _FREQ_   mean
  1     1     1        0        2   15.0
  2     1     2        0        2   15.5
  3     1     3        0        2   15.0
  4     1     4        0        2   15.5
  5     1     5        0        2   15.0
  6     2     6        0        2   20.0
  7     2     7        0        2   20.0
  8     2     8        0        2   20.5
  9     2     9        0        2   20.0
 10     2    10        0        2   19.5

proc glm data=two;
class trt;
model mean=trt;
run;
The GLM Procedure
Dependent Variable: mean

                            Sum of
Source            DF       Squares    Mean Square   F Value   Pr > F
Model              1   57.60000000    57.60000000    576.00   <.0001
Error              8    0.80000000     0.10000000
Corrected Total    9   58.40000000

F = 576 with 1 and 8 df.
p-value < 0.0001
We can conclude that the plants treated with the placebo had significantly
greater weights than the plants treated with the fungal pathogen.
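An equivalent analysis can be run on the plant-level data by nesting pots within treatments and testing treatment against the pot-to-pot mean square. This is a sketch rather than the method the problem asks for. It assumes data set one has one row per plant with the variables trt, pot, and weight used above, and a balanced design (2 plants in every pot).

```sas
/* Sketch: test trt against pot(trt) instead of against plant-level error */
proc glm data=one;
  class trt pot;
  model weight = trt pot(trt);   /* pots are nested within treatments          */
  test h=trt e=pot(trt);         /* F = MS(trt) / MS(pot(trt)), on 1 and 8 df  */
run;
quit;
```

With balanced data the TEST statement gives the same F statistic as the analysis of pot means. The default F test for trt in the main ANOVA table uses the within-pot mean square as its error term and should not be used to judge the treatment effect.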
1f) The errors in the first analysis were negatively correlated within pots;
thus the first analysis overestimates the error variance that is relevant for
comparing treatments. We don't see a significant treatment effect initially
because, although the true variation among experimental units treated alike
is quite low, it seems high because of the negative correlation between
plants within pots. The negative correlation makes it seem like there is more
variation in response to treatment than there really is. The negative
correlation can be observed by examining the residual plot with the points
labeled by pot number: the higher the residual for one plant in a pot, the
lower the residual for the other plant in the same pot. The better one plant
does, the worse its partner does. This can sometimes occur in field
experiments where fast-growing varieties shade neighboring varieties.
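A rough idea of how strong this negative correlation is can be backed out of the two ANOVA tables. The DATA step below is a sketch, not part of the original solution. It assumes the balanced design shown above (2 plants in each of 10 pots) and uses the usual ANOVA estimator of the intraclass correlation; the variable names are made up for illustration.

```sas
/* Sketch: within-pot correlation implied by the ANOVA tables in 1a and 1e */
data _null_;
  n = 2;                                  /* plants per pot                       */
  sse_plants = 963.6;  df_plants = 18;    /* error line of the plant-level ANOVA  */
  sse_means  = 0.8;    df_pots   = 8;     /* error line of the pot-mean ANOVA     */
  ss_pots   = n * sse_means;              /* pots within trt, on the plant scale  */
  ss_within = sse_plants - ss_pots;       /* plants within pots                   */
  ms_pots   = ss_pots / df_pots;
  ms_within = ss_within / (df_plants - df_pots);
  icc = (ms_pots - ms_within) / (ms_pots + (n - 1) * ms_within);
  put ms_pots= ms_within= icc= 8.4;       /* written to the SAS log               */
run;
```

The pot mean square is about 0.2, the within-pot mean square is about 96.2, and the estimated correlation is about -0.996. Nearly all of the plant-level error variation therefore comes from differences between the two plants in the same pot, which largely cancel when each pot is averaged.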
2a)
i) Independence: The problem gives no information about this, so we will
assume that independence holds.
ii) Constant variance: This is violated. The variation of the residuals
increases with dose, which suggests that the variance of the error terms is
not constant.
iii) Normality: This is also somewhat violated. There are outliers present in
the lower tail.
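The conclusion in ii) rests on the residual plot. As an optional supplement, PROC GLM can also report a formal test of equal variances for a one-way model. The code below is a sketch that assumes the data set one and the variables dose and count used in part 2b; the solution itself relies on the plots, not on this test.

```sas
/* Sketch: Levene's test of equal error variances across doses */
proc glm data=one;
  class dose;
  model count = dose;
  means dose / hovtest=levene;   /* a small p-value points to unequal variances */
run;
quit;
```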
2b)
The best transformation is the cube root. The log transformation
overcorrects, leaving the low-dose observations more variable than the
high-dose observations. The square root is better than the log, but it
undercorrects: there still appears to be more variability at high doses than
at low doses. The cube root transformation is a good compromise between the
square root and the log.
The SAS code is below:
/* Original scale: fit the model and save residuals for diagnostics */
proc glm data=one;
class dose;
model count=dose;
output out=two residual=ehat predicted=yhat;
run;
proc univariate plot data=two;
var ehat;
run;
proc plot data=two;
plot ehat*yhat;
run;
/* Log transformation */
data three; set one;
logct=log(count);
run;
proc glm data=three;
class dose;
model logct=dose;
output out=four residual=ehatlog predicted=yhatlog;
run;
proc univariate plot data=four;
var ehatlog;
run;
proc plot data=four;
plot ehatlog*yhatlog;
run;
/* Square root transformation */
data five; set one;
sqrtct=count**(1/2);
run;
proc glm data=five;
class dose;
model sqrtct=dose;
output out=six residual=ehatsqrt predicted=yhatsqrt;
run;
proc univariate plot data=six;
var ehatsqrt;
run;
proc plot data=six;
plot ehatsqrt*yhatsqrt;
run;
/* Cube root transformation */
data seven; set one;
cubertct=count**(1/3);
run;
proc glm data=seven;
class dose;
model cubertct=dose;
output out=eight residual=ehatcubert predicted=yhatcubert;
run;
proc univariate plot data=eight;
var ehatcubert;
run;
proc plot data=eight;
plot ehatcubert*yhatcubert;
run;
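The four blocks above repeat the same three steps on different response variables, so they could also be written once as a macro. The code below is a sketch of that idea, not the original solution. The macro name resid_check and the data set names all_scales and res_... are hypothetical; the variables dose and count come from data set one as above.

```sas
/* Sketch: one data set holding every scale, and one macro for the diagnostics */
data all_scales;
  set one;
  logct    = log(count);      /* set to missing if count is 0 */
  sqrtct   = count**(1/2);
  cubertct = count**(1/3);
run;

%macro resid_check(y);
  /* Fit the one-way model for response &y and save its residuals */
  proc glm data=all_scales;
    class dose;
    model &y = dose;
    output out=res_&y residual=ehat predicted=yhat;
  run;
  quit;
  /* Stem-and-leaf, box, and normal probability plots of the residuals */
  proc univariate plot data=res_&y;
    var ehat;
  run;
  /* Residuals vs. predicted values */
  proc plot data=res_&y;
    plot ehat*yhat;
  run;
%mend resid_check;

%resid_check(count)
%resid_check(logct)
%resid_check(sqrtct)
%resid_check(cubertct)
```

Each call writes its residuals to a separate data set (res_count, res_logct, and so on), so the plots for the four scales can be compared side by side.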