Regression Analysis

InVivoStat User Guides – Regression Analysis
Version 1.0
May 2015
InVivoStat Regression Analysis Module Tipsheet
The Regression Analysis module in InVivoStat is available within the Additional Analyses sub-menu of the Statistics drop-down menu and is entitled ‘Regression Analysis’. The user interface is shown below.
[Figure: Regression Analysis module user interface]
The Regression Analysis module performs linear regression and multiple linear
regression. The user can fit a model that includes continuous factors, multiple
treatment (factorial) factors, other design (blocks) factors and a single covariate. All
interactions involving the continuous and treatment factors are included in the
statistical model but none of the interactions involving the blocking factors are
included. The user can also check the interactions involving the covariate by choosing
the ‘Assess covariate interactions’ option in the Output Options window.
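To make the term structure concrete, the following minimal Python sketch (an illustration, not InVivoStat's implementation) lists the terms such a model would contain. The function name `model_terms` is hypothetical. The ordering (covariate first, then main effects, then interactions of increasing order) is inferred from the sample ANCOVA table later in this tipsheet; where the blocks enter the sequence is an assumption. As described above, blocks are kept as main effects only and never appear in an interaction.

```python
from itertools import combinations


def model_terms(continuous, treatments, blocks=(), covariate=None):
    """List model terms in a plausible sequential (Type I) order.

    Hypothetical ordering: covariate, then blocks (main effects only),
    then all main effects and interactions of the continuous and
    treatment factors, in increasing order of interaction.
    """
    terms = []
    if covariate is not None:
        terms.append(covariate)
    terms.extend(blocks)  # blocking factors never enter an interaction
    factors = list(continuous) + list(treatments)
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append(" * ".join(combo))
    return terms


terms = model_terms(["Bodyweight"], ["Gender", "Treatment group"],
                    covariate="Baseline observation")
for term in terms:
    print(term)
```

With the factors used in the sample output, this reproduces the row order of the sample ANCOVA table.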
1 Setting up the model
Once the dataset has been opened, the user can select the variables for the analysis by
dragging and dropping them from the ‘Available variables’ list into the ‘Response’,
‘Continuous factors’, ‘Treatments (factorial)’, ‘Other design (blocks)’ and ‘Covariate’
boxes.
Once the variables are selected, the user has the option of applying a transformation to the response variable and the continuous factors: log10, loge, square root, arcsine or rank. If a covariate is selected, it will be transformed using the same transformation unless the user specifies otherwise.
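The available transformations can be sketched in Python as follows. This is an illustration only: treating ‘arcsine’ as the arcsine square-root transformation of proportions in [0, 1] is an assumption, as is giving tied values their average rank in the rank transformation. The helper names `rank` and `transform` are hypothetical. Note that the log transformations require strictly positive values.

```python
import math


def rank(values):
    """Ranks starting at 1; tied values receive their average rank (assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # sorted positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


TRANSFORMS = {
    "log10": lambda v: [math.log10(x) for x in v],
    "loge": lambda v: [math.log(x) for x in v],
    "square root": lambda v: [math.sqrt(x) for x in v],
    # Assumption: arcsine square-root transformation for proportions.
    "arcsine": lambda v: [math.asin(math.sqrt(x)) for x in v],
    "rank": rank,
}


def transform(values, name):
    """Apply one named transformation to a list of numbers."""
    return TRANSFORMS[name](values)


print(transform([1, 10, 100], "log10"))
print(transform([3, 1, 3, 2], "rank"))
```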
If a covariate is selected, then the user has the option of selecting the ‘Primary factor’.
This factor is used to categorise the scatterplot (produced in the output). The Primary
factor should be one of the factors of interest to the experimenter.
2 Selecting the analysis options
There are several results from the regression analysis that are available to the user.
These are selected before running the analysis.
These include:
1) ANOVA table
Produces overall tests of the effect of the terms in the statistical model.
2) Coefficients
If only one continuous factor is selected, the model coefficients (slope and intercept)
are automatically generated. These are calculated separately for the combinations of
any treatment factors. If more than one continuous factor is selected, then the
coefficients of the model parameters can be generated by selecting this option.
3) Adjusted R-squared
Allows the user to check how successful the model fit is in explaining the variation in
the data. Both the R-squared and Adjusted R-squared statistics are given.
4) Significance level
The default is 5%, although this can be changed.
Diagnostic plots:
5) Residuals vs. predicted plot
Allows the user to check the variance assumption of the parametric analysis.
6) Normal probability plot
Allows the user to check the normality assumption of the parametric analysis.
7) Cook’s distance plot
Allows the user to check for outliers in the dataset.
8) Leverage plot
Allows the user to check the effect of the individual observations on the model.
3 Output Results
Response and covariate
InVivoStat identifies the response being analysed and also the covariate (if one is
selected). This section also describes any transformations that have been applied.
Scatterplots of the raw data, including best-fit regression lines
InVivoStat produces scatterplots of the raw data, which can be used to identify possible outliers. On each plot the X-axis corresponds to the levels of the continuous factors and the Y-axis corresponds to the response. The plots are categorised by the combinations of the treatment factors. Note that the best-fit lines are not related to the statistical model fitted in this module; they are simply the best-fit lines through the data shown on the plot, i.e. they are not adjusted for any covariate, blocks or unequal sample sizes.
Estimates of the coefficients of the best-fit regression lines
If a single continuous factor is selected, then a table of the coefficients (slope and
intercept) of the best-fit regression lines is given. This table is categorised by the
levels of the treatment factors where appropriate.
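Because each best-fit line is fitted to one treatment combination at a time, it is an ordinary least-squares line through that group's points only. The hypothetical Python sketch below shows this calculation on made-up data (`fit_line` and `lines_by_group` are illustrative names). It ignores any covariate, blocks or unequal group sizes, which is why such estimates can differ from the model-based coefficients.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def lines_by_group(rows):
    """rows: (group, x, y) tuples; returns {group: (intercept, slope)}.

    Each group is fitted separately, so the lines are not adjusted for
    any covariate, blocks or unequal group sizes.
    """
    grouped = defaultdict(lambda: ([], []))
    for group, x, y in rows:
        grouped[group][0].append(x)
        grouped[group][1].append(y)
    return {g: fit_line(xs, ys) for g, (xs, ys) in grouped.items()}


# Made-up data: y = 1 + 2x in one group and y = 5 - x in another.
rows = [(("F", "Control"), x, 1 + 2 * x) for x in (1, 2, 3)]
rows += [(("M", "Treatment"), x, 5 - x) for x in (1, 2, 3)]
for group, (intercept, slope) in lines_by_group(rows).items():
    print(group, round(intercept, 4), round(slope, 4))
```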
Categorised scatterplot of the raw data (ANCOVA only)
When fitting a covariate in a statistical analysis, certain assumptions are made. This plot allows the user to check these assumptions. Underneath the plot is a list of the assumptions, together with advice on how the plot can be used to assess them.
ANOVA/ANCOVA table
The ANOVA/ANCOVA table gives tests of the overall effect of the model terms.
InVivoStat uses a Type I (sequential) model fit within this module, so each term is tested after adjusting for the terms listed above it in the table. Any statistically significant effects are listed below the table.
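In a Type I fit, the sum of squares for each term is the reduction in the residual sum of squares when that term is added to a model already containing the terms listed above it. When terms are correlated (for example, with a covariate or unequal group sizes), the order of entry therefore affects the results. The Python sketch below illustrates the idea on made-up data. It assumes every term has one degree of freedom (true of each term in the sample ANCOVA table) and solves the normal equations by plain Gauss-Jordan elimination, which is adequate for illustration but less numerically robust than the QR-based methods statistical software normally uses. All function names are hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def residual_ss(columns, y):
    """Residual sum of squares after a least-squares fit of y on the columns."""
    n, k = len(y), len(columns)
    xtx = [[sum(columns[i][r] * columns[j][r] for r in range(n)) for j in range(k)]
           for i in range(k)]
    xty = [sum(columns[i][r] * y[r] for r in range(n)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[i] * columns[i][r] for i in range(k)) for r in range(n)]
    return sum((y[r] - fitted[r]) ** 2 for r in range(n))


def type_one_ss(terms, y):
    """terms: ordered (name, column) pairs, one column (1 df) per term."""
    columns = [[1.0] * len(y)]  # start from the intercept-only model
    previous = residual_ss(columns, y)
    table = {}
    for name, column in terms:
        columns.append(column)
        current = residual_ss(columns, y)
        table[name] = previous - current  # reduction due to this term
        previous = current
    return table, previous  # previous is now the full-model residual SS


x = [1, 2, 3, 4, 5, 6] * 2
g = [0] * 6 + [1] * 6  # made-up 0/1 treatment indicator
y = [2.1, 2.9, 4.2, 4.8, 6.1, 7.0, 3.0, 3.4, 3.9, 4.1, 4.8, 5.0]
terms = [("x", x), ("g", g), ("x * g", [a * b for a, b in zip(x, g)])]
table, rss = type_one_ss(terms, y)
for name, ss in table.items():
    print(f"{name:6s} {ss:8.4f}")
print(f"Resid  {rss:8.4f}")
```

By construction, the Type I sums of squares plus the residual sum of squares add up to the total (corrected) sum of squares.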
Table of model coefficients
If requested, this table contains the model coefficients. By adding together these
coefficients, where appropriate, the user can identify the regression equations.
R-squared and Adjusted R-squared statistics
If requested, the R-squared and Adjusted R-squared statistics are given.
Diagnostic plots
If requested, InVivoStat produces the residuals vs. predicted plot, the normal probability plot, the Cook’s distance plot and the Leverage plot. The residuals shown on the residuals vs. predicted plot are standardized residuals, because these can be used to identify outliers: any observation with a standardized residual greater than 3 or less than -3 could be considered a possible outlier.
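One common definition of the standardized residual (used, for example, by R's `rstandard()`) divides each residual e_i by s * sqrt(1 - h_i), where s is the residual standard deviation and h_i is the leverage of observation i. Whether InVivoStat uses exactly this definition is an assumption. The Python sketch below applies it to a straight-line fit with a single continuous factor, using made-up data that lie exactly on a line except for one corrupted value, and flags residuals outside ±3. The function name is hypothetical.

```python
import math


def standardized_residuals(xs, ys):
    """Internally standardized residuals e_i / (s * sqrt(1 - h_i)) for y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))  # residual SD, n - 2 df
    leverage = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]  # straight-line hat values
    return [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]


# Made-up data: the exact line y = 2x + 1 with one corrupted value at x = 8.
xs = list(range(1, 16))
ys = [2 * x + 1 for x in xs]
ys[7] += 10
r = standardized_residuals(xs, ys)
flagged = [i for i, value in enumerate(r) if abs(value) > 3]
print(flagged)  # → [7]
```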
Analysis description and references
A description of the analysis performed is given, followed by a list of references for the methods applied in the analysis.
4 Sample output
[Figure: Analysis options selected for the sample output]
InVivoStat Linear Regression Analysis
Response and covariate
The Observation response is currently being analysed by the Linear Regression
Analysis module, with Baseline observation fitted as a covariate.
Scatterplots of the raw data, including best-fit regression lines
Note: The best-fit regression lines included on the plot are not adjusted for the
covariate.
Estimates of the coefficients of the best-fit regression lines
Categorisation factor level combinations   Intercept estimate   Slope estimate
F Control                                  10.6405              -0.4881
F Treatment                                -11.3019             0.5872
M Control                                  6.7366               -0.2998
M Treatment                                -10.3445             0.5310
Note: The estimates of the regression coefficients are not adjusted for the covariate.
Covariate plot of the raw data (ignoring continuous factor)
Tip: Is it worth fitting the covariate? You should consider the following:
a) Is there a relationship between the response and the covariate?... It is only worth
fitting the covariate if there is a strong positive (or negative) relationship between
them. The lines on the plot should not be horizontal.
b) Is the relationship similar for all treatments?... The lines on the plot should be
approximately parallel.
c) Is the covariate influenced by the treatment?... We assume the covariate is not
influenced by the treatment so there should be no separation of the treatment groups
along the x-axis on the plot.
These issues are discussed in more detail in Morris (1999).
Analysis of Covariance (ANCOVA) table
Term                                    Sums of squares   Degrees of freedom   Mean square   F value   p-value
Baseline observation                    0.05              1                    0.049         0.48      0.4935
Bodyweight                              0.01              1                    0.015         0.15      0.7056
Gender                                  0.03              1                    0.025         0.25      0.6213
Treatment group                         0.05              1                    0.051         0.50      0.4873
Bodyweight * Gender                     0.09              1                    0.090         0.89      0.3555
Bodyweight * Treatment group            0.53              1                    0.530         5.23      0.0318
Gender * Treatment group                0.01              1                    0.008         0.08      0.7843
Bodyweight * Gender * Treatment group   0.00              1                    0.000         0.00      0.9767
Residuals                               2.33              23                   0.101
Comment: ANCOVA table calculated using a Type I model fit, see Armitage et al.
(2001).
Conclusion: There is a statistically significant effect of Bodyweight * Treatment
group.
Table of model coefficients
Term                                     Estimate   Lower 95% CI   Upper 95% CI
(Intercept)                              11.652     -7.503         30.807
Baseline observation                     0.220      -0.230         0.670
Bodyweight                               -0.543     -1.476         0.390
GenderM                                  -3.428     -28.259        21.402
Treatment group                          -20.657    -53.472        12.158
Bodyweight * GenderM                     0.165      -1.046         1.376
Bodyweight * Treatment group             1.010      -0.601         2.622
GenderM * Treatment group                0.502      -39.658        40.662
Bodyweight * GenderM * Treatment group   -0.028     -2.001         1.945
Note: These model coefficients can be added together to obtain the model-based
estimates of the relationships between the factors and the response, see Chambers and
Hastie (1992).
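As an illustration of adding coefficients together, the Python sketch below combines the rounded estimates from the table above to obtain model-based intercepts and Bodyweight slopes for each Gender and Treatment group combination. It assumes that GenderM and Treatment group are 0/1 indicators with F and Control as the reference levels (R's default treatment contrasts); the table does not state the coding explicitly, so treat the results as indicative. Each intercept applies with Baseline observation set to zero; for other baseline values, add 0.220 times the Baseline observation. The function name `line_for` is hypothetical.

```python
# Estimates copied from the table of model coefficients (3 d.p.).
# Assumption: GenderM and Treatment group are 0/1 indicators with
# F and Control as reference levels.
coef = {
    "(Intercept)": 11.652,
    "Baseline observation": 0.220,
    "Bodyweight": -0.543,
    "GenderM": -3.428,
    "Treatment group": -20.657,
    "Bodyweight * GenderM": 0.165,
    "Bodyweight * Treatment group": 1.010,
    "GenderM * Treatment group": 0.502,
    "Bodyweight * GenderM * Treatment group": -0.028,
}


def line_for(male, treated):
    """Intercept and Bodyweight slope for one Gender x Treatment combination.

    The intercept is evaluated at Baseline observation = 0.
    """
    m, t = int(male), int(treated)
    intercept = (coef["(Intercept)"] + m * coef["GenderM"]
                 + t * coef["Treatment group"]
                 + m * t * coef["GenderM * Treatment group"])
    slope = (coef["Bodyweight"] + m * coef["Bodyweight * GenderM"]
             + t * coef["Bodyweight * Treatment group"]
             + m * t * coef["Bodyweight * GenderM * Treatment group"])
    return round(intercept, 3), round(slope, 3)


for label, male, treated in [("F Control", 0, 0), ("F Treatment", 0, 1),
                             ("M Control", 1, 0), ("M Treatment", 1, 1)]:
    print(label, line_for(male, treated))
```

Under these assumptions the model-based slopes have the same signs as the unadjusted best-fit slopes shown earlier in the sample output, although their values differ because of the covariate adjustment.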
R-squared and Adjusted R-squared statistics
           R-squared   Adjusted R-sq
Estimate   0.2477      -0.0140
The R-squared is the fraction of the variance explained by the model. A value close to 1 implies the statistical model fits the data well. Unfortunately, adding further variables to the statistical model never decreases R-sq, regardless of their importance. The Adjusted R-sq adjusts for the number of terms in the model and may decrease if over-fitting has occurred. If there is a large difference between R-sq and Adjusted R-sq, then non-significant terms may have been included in the statistical model.
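The Adjusted R-sq in the sample output can be reproduced from the R-squared using the standard definition 1 - (1 - R-sq)(n - 1)/(n - p - 1), where n is the number of observations and p the number of model degrees of freedom excluding the intercept. From the ANCOVA table, p = 8 and the residual degrees of freedom are 23, so n = 8 + 23 + 1 = 32. A small Python sketch (the function name is hypothetical):

```python
def adjusted_r_squared(r2, n_obs, model_df):
    """1 - (1 - R^2) * (n - 1) / residual df, with residual df = n - model_df - 1."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - model_df - 1)


# Values read from the sample output: R-squared 0.2477, 8 model df, 32 observations.
adj = adjusted_r_squared(0.2477, n_obs=32, model_df=8)
print(round(adj, 4))  # → -0.014
```

Here the factor (n - 1)/(n - p - 1) = 31/23 is large enough to push the Adjusted R-sq below zero, which is consistent with most terms in the sample model being non-significant.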
Diagnostic plots
Residuals vs. predicted plot
Tip: On this plot, look to see whether the spread of the points increases as the predicted values increase. If so, the response may need transforming.
Tip: Any observation with a residual less than -3 or greater than 3 (SD) should be
investigated as a possible outlier.
Normal probability plot
Tip: Check that the points lie along the dotted line. If not, the data may be non-normally distributed.
Cook's distance plot
This plot should be used to assess whether there are any potential outliers in the dataset. Observations whose Cook's distance is above the cut-off line should be investigated further. Note that the cut-off line is calculated using the 4/n rule, where n is the number of observations in the dataset.
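Cook's distance combines the size of an observation's residual with its leverage. The Python sketch below computes it for a straight-line fit using the standard formula D_i = e_i^2 h_i / (p s^2 (1 - h_i)^2), with p = 2 model parameters, and applies the 4/n cut-off. Restricting the example to a single continuous factor is a simplification, the data are made up, and the function name is hypothetical.

```python
def cooks_distance(xs, ys):
    """Cook's distance D_i = e_i**2 * h_i / (p * s2 * (1 - h_i)**2) for y = a + b*x."""
    n, p = len(xs), 2  # p = number of fitted parameters (intercept and slope)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual mean square
    leverage = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]
    return [e * e * h / (p * s2 * (1 - h) ** 2) for e, h in zip(residuals, leverage)]


xs = list(range(1, 16))
ys = [10 - 0.5 * x for x in xs]  # made-up exact line
ys[7] += 8                       # one corrupted value at x = 8
d = cooks_distance(xs, ys)
cutoff = 4 / len(xs)             # the 4/n rule
flagged = [i for i, value in enumerate(d) if value > cutoff]
print(round(cutoff, 4), flagged)
```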
Leverage plot
This plot indicates the relative influence of the observations. Observations with a high
leverage may be unduly influencing the statistical model.
Analysis description
The data were analysed using an ANCOVA approach, with continuous factor Bodyweight, treatment factors Gender and Treatment group, and Baseline observation as the covariate.
For more information on the theoretical approaches that are implemented within this
module, see Bate and Clark (2014).
Statistical references
Bate ST and Clark RA. (2014). The Design and Statistical Analysis of Animal
Experiments. Cambridge University Press.
Armitage P, Matthews JNS and Berry G. (2001). Statistical Methods in Medical
Research. 4th edition; John Wiley & Sons. New York.
Chambers JM and Hastie TJ. (1992). Statistical Models in S. Wadsworth and Brooks/Cole Advanced Books and Software.
Morris TR. (1999). Experimental Design and Analysis in Animal Sciences. CABI
publishing. Wallingford, Oxon (UK).
R references
R Development Core Team. (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org.
Schloerke B, Crowley J, Cook D, Hofmann H, Wickham H, Briatte F, Marbach M and Thoen E. (2014). GGally: Extension to ggplot2. R package version 0.4.5. http://CRAN.R-project.org/package=GGally
Neuwirth E. (2011). RColorBrewer: ColorBrewer palettes. R package version 1.05. http://CRAN.R-project.org/package=RColorBrewer
Wickham H. (2009). ggplot2: Elegant Graphics for Data Analysis. Springer, New York.
Wickham H. (2007). Reshaping data with the reshape package. Journal of Statistical Software, 21(12).
Wickham H. (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. URL http://www.jstatsoft.org/v40/i01/.
Wickham H. (2012). scales: Scale functions for graphics. R package version 0.2.3. http://CRAN.R-project.org/package=scales
Fox J and Weisberg S. (2011). An R Companion to Applied Regression, Second Edition. Sage, Thousand Oaks, CA. URL http://socserv.socsci.mcmaster.ca/jfox/Books/Companion
Lecoutre E. (2003). The R2HTML Package. R News, 3(3). Vienna, Austria.
Kates L and Petzoldt T. (2012). proto: Prototype object-based programming. R package version 0.3-10. http://CRAN.R-project.org/package=proto