Lorelei Howard and Nick Wright MfD 2008 t-tests, anova and regression


Overview

• Why do we need statistics?
• P values
• T-tests
• ANOVA

Why do we need statistics?

• To enable us to test experimental hypotheses
• H0 = null hypothesis
• H1 = experimental hypothesis
• In terms of fMRI
• Null = no difference in brain activation between these 2 conditions
• Exp = there is a difference in brain activation between these 2 conditions

2 types of statistics

• Descriptive Stats
• e.g., mean and standard deviation (S.D.)
• Inferential statistics

So how do we know whether the effect observed in our sample was genuine?

• We don’t
• Instead we use p values to quantify how surprising our results would be if there were no genuine effect in the whole population

P values

• P value = the probability of obtaining a result at least as extreme as the one observed purely by chance
• i.e. when the null hypothesis is true
• α level is set a priori (usually 0.05)
• If p < α level then we reject the null hypothesis in favour of the experimental hypothesis
• i.e. a result this extreme would occur less than 5% of the time if the null hypothesis were true
• If, however, p ≥ α level then we fail to reject the null hypothesis (this does not prove that the null hypothesis is true)
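
One way to make the definition of a p value concrete is a permutation test. This is a swapped-in illustration, not the method used in these slides: if the null hypothesis is true, the condition labels are arbitrary, so we can shuffle them and count how often chance alone gives a difference at least as large as the observed one. The data values, the function name `permutation_p_value` and the number of permutations are all made up for illustration.

```python
import random
from statistics import mean

def permutation_p_value(group1, group2, n_permutations=5000, seed=0):
    """Estimate a two-tailed p value by shuffling the group labels."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group1) - mean(group2))
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # under H0 the labels are interchangeable
        diff = abs(mean(pooled[:n1]) - mean(pooled[n1:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations

# Hypothetical activation values for the two conditions
simpson = [2.1, 2.5, 1.9, 2.8, 2.4, 2.6]
griffin = [1.2, 1.6, 1.1, 1.8, 1.4, 1.5]

alpha = 0.05  # set a priori
p = permutation_p_value(simpson, griffin)
reject_null = p < alpha
print(f"p = {p:.4f}, reject H0: {reject_null}")
```

Because the two made-up groups do not overlap at all, very few shuffles reproduce a difference this large, so p falls well below α.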

Two types of errors

• Type I error = false positive
• α level of 0.05 means that, when the null hypothesis is true, there is a 5% risk of making a type I error
• Type II error = false negative

Hypothetical experiment

• Q – does viewing pictures of the Simpson and the Griffin family activate the same brain regions?
• Condition 1 = Simpson family faces
• Condition 2 = Griffin family faces

Calculating T

• Difference between the means of Group 1 and Group 2 divided by the pooled standard error of the mean:
• $t = \frac{\bar{x}_1 - \bar{x}_2}{SE_{pooled}}$
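
The calculation can be sketched in a few lines of Python. This assumes two independent groups with roughly equal variances, so that the pooled standard error is $\sqrt{s_p^2 (1/n_1 + 1/n_2)}$; the slides do not spell out which pooling formula was used, and the function name `pooled_t` and the data are invented for illustration.

```python
import math
from statistics import mean, variance

def pooled_t(group1, group2):
    """Two-sample t statistic with a pooled variance estimate."""
    n1, n2 = len(group1), len(group2)
    df = (n1 - 1) + (n2 - 1)  # each group loses one degree of freedom
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(group1) - mean(group2)) / se_pooled
    return t, df

# Made-up data: group means 7 and 3, both sample variances 2.5
t, df = pooled_t([5, 6, 7, 8, 9], [1, 2, 3, 4, 5])
print(f"t({df}) = {t:.2f}")
```

For these numbers the pooled standard error is 1, so t equals the difference between the means (4) on 8 degrees of freedom. The two-tailed critical value at α = 0.05 for 8 degrees of freedom is about 2.31, so this t would be significant.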


Degrees of freedom

• = number of unconstrained data points
• Which in this case = number of data points – 1.
• Can use t value and df to find the associated p value
• Then compare to the α level

Different types of t-test

• 2 sample t tests
• One sample t tests
• compare the mean of one sample to a given value

Another approach to group differences

• Analysis Of VAriance (ANOVA)
• Variances not means
• Multiple groups
• e.g. Different facial expressions
• H0 = no differences between groups
• H1 = differences between groups

Calculating F

• F = the between group variance divided by the within group variance
• the model variance/error variance
• for F to be significant the between group variance should be considerably larger than the within group variance
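
As a minimal sketch of the ratio described above, the following computes F for a one-way design by dividing the between-group mean square by the within-group mean square. The function name `one_way_anova_f` and the three small groups are hypothetical.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F = between-group variance / within-group variance."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n_total = len(groups), len(all_values)
    # Between-group (model) sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group (error) sum of squares: values around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Made-up scores for three groups (e.g. three facial expressions)
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f, df_between, df_within = one_way_anova_f(groups)
print(f"F({df_between}, {df_within}) = {f:.2f}")
```

Here the between-group sum of squares is 26 on 2 degrees of freedom and the within-group sum of squares is 6 on 6 degrees of freedom, giving F(2, 6) = 13.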

What can be concluded from a significant ANOVA?

• There is a significant difference between the groups
• NOT where this difference lies
• Finding exactly where the differences lie requires further statistical analyses

Different types of ANOVA

• One-way ANOVA
• One factor with more than 2 levels
• Factorial ANOVAs
• More than 1 factor
• Mixed design ANOVAs
• Some factors independent, others related

Conclusions

• T-tests assess if two group means differ significantly
• Can compare two samples or one sample to a given value
• ANOVAs compare more than two groups or more complicated scenarios
• They use variances instead of means

• Howell. Statistical methods for psychologists
• Howitt and Cramer. An introduction to statistics in psychology
• Huettel. Functional magnetic resonance imaging (especially chapter 12)
• Acknowledgements
• MfD Slides 2005 – 2007

PART 2

• Correlation
• Regression
• Relevance to GLM and SPM

Correlation

• Strength and direction of the relationship between variables
• Scattergrams
• [Figure: three scatter plots of Y against X showing positive correlation, negative correlation and no correlation]

Describe correlation: covariance

• A statistic representing the degree to which 2 variables vary together
• Covariance formula
• cf. variance formula
• but…
• the absolute value of cov(x,y) is also a function of the standard deviations of x and y.

Describe correlation: Pearson correlation coefficient (r)

• Equation: $r_{xy} = \frac{\mathrm{cov}(x,y)}{s_x s_y}$, where s = standard deviation of the sample
• r = -1 (max. negative correlation); r = 0 (no linear relationship); r = 1 (max. positive correlation)
• Limitations:
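
The sketch below illustrates why covariance alone is hard to interpret and what dividing by the standard deviations achieves. It assumes the sample (n - 1) denominator for the covariance, which the slides do not specify; r comes out the same with either denominator as long as it is used consistently. The data and function names are made up.

```python
import math
from statistics import mean

def covariance(x, y):
    """Sample covariance (n - 1 denominator, an assumption)."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)

def pearson_r(x, y):
    """Covariance standardised by the two standard deviations."""
    sx = math.sqrt(covariance(x, x))  # cov(x, x) is the variance of x
    sy = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sx * sy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_rescaled = [100 * v for v in y]  # same relationship, different units

cov_xy = covariance(x, y)
cov_rescaled = covariance(x, y_rescaled)
r_xy = pearson_r(x, y)
r_rescaled = pearson_r(x, y_rescaled)
print(cov_xy, cov_rescaled)  # covariance changes with the units
print(r_xy, r_rescaled)      # r does not
```

Rescaling y multiplies the covariance by 100 but leaves r unchanged, which is why r is the more useful description of the strength of the relationship.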

Summary

• Correlation
• Regression
• Relevance to SPM

Regression

• Regression: Prediction of one variable from knowledge of one or more other variables.
• Regression v. correlation: Regression allows you to predict one variable from the other (not just say if there is an association).
• Linear regression aims to fit a straight line to data that for any value of x gives the best prediction of y.

Best fit line, minimising sum of squared errors

• Describing the line as in GCSE maths: y = m x + c
• Here, ŷ = bx + a
• ŷ : predicted value of y
• b: slope of regression line
• a: intercept
• Residual error (ε): Difference between obtained and predicted values of y (i.e. y- ŷ).
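• Best fit line (values of b and a) is the one that minimises the sum of squared errors: $SS_{error} = \sum (y - \hat{y})^2$
• [Figure: scatter plot with the fitted line ŷ = bx + a; ε = residual, the vertical distance between each observed y and its predicted value ŷ]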

How to minimise SSerror

• Minimise (y- ŷ)2 , which is (y-bx+a)2
• Plotting SSerror for each possible regression line gives a parabola.
• Minimum SSerror is at the bottom of the curve where the gradient is zero – and this can found with calculus.
• Take partial derivatives of (y-bx-a)2 and solve for 0 as simultaneous equations, giving:
• Values of a and b
• Sums of squared error (SSerror)
• min SSerror
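
A short sketch of the least-squares solution, using the standard closed-form result that the slope is the sum of cross-products divided by the sum of squares of x (equivalent to $b = r s_y / s_x$) and the intercept is $a = \bar{y} - b\bar{x}$. The data and the function names `fit_line` and `ss_error` are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Return intercept a and slope b that minimise the sum of squared errors."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b

def ss_error(x, y, a, b):
    """Sum of squared residuals for the line y_hat = b*x + a."""
    return sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)
sse = ss_error(x, y, a, b)
print(f"y_hat = {b:.2f}x + {a:.2f}, SSerror = {sse:.2f}")

# Any other line should do worse: nudge the slope or intercept and compare
sse_other_slope = ss_error(x, y, a, b + 0.1)
sse_other_intercept = ss_error(x, y, a + 0.1, b)
```

For these numbers the fitted line is ŷ = 0.6x + 2.2 with SSerror = 2.4, and moving either parameter away from the fitted value increases SSerror, as expected at the bottom of the curve.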

How good is the model?

• We can calculate the regression line for any data, but how well does it fit the data?
• Total variance = predicted variance + error variance
• $s_y^2 = s_{\hat{y}}^2 + s_{er}^2$
• Also, it can be shown that $r^2$ is the proportion of the variance in y that is explained by our regression model
• $r^2 = s_{\hat{y}}^2 / s_y^2$
• Substitute $s_{\hat{y}}^2 = r^2 s_y^2$ into $s_y^2 = s_{\hat{y}}^2 + s_{er}^2$ and rearrange to get:
• $s_{er}^2 = s_y^2 (1 - r^2)$
• From this we can see that the greater the correlation the smaller the error variance, so the better our prediction
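
The variance decomposition can be checked numerically. The sketch below fits a line to made-up data and computes the three variances with the same (n - 1) denominator throughout; that choice is an assumption, and the identities hold for any denominator applied consistently.

```python
from statistics import mean, variance

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Least-squares slope and intercept
mx, my = mean(x), mean(y)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

y_hat = [b * xi + a for xi in x]                   # predicted values
residuals = [yi - yh for yi, yh in zip(y, y_hat)]  # errors

var_y = variance(y)            # total variance
var_pred = variance(y_hat)     # predicted variance
var_err = variance(residuals)  # error variance
r_squared = var_pred / var_y   # proportion of variance explained

print(f"{var_y:.2f} = {var_pred:.2f} + {var_err:.2f}")
print(f"r^2 = {r_squared:.2f}, s_y^2 (1 - r^2) = {var_y * (1 - r_squared):.2f}")
```

With these numbers the total variance of 1.5 splits into 0.9 predicted and 0.6 error, so r² = 0.6 and the error variance equals 1.5 × (1 − 0.6).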

Is the model significant?

• i.e. do we get a significantly better prediction of y from our regression equation than by just predicting the mean?
• F-statistic:
• $F_{(df_{\hat{y}},\, df_{er})} = \frac{s_{\hat{y}}^2}{s_{er}^2} = \dots = \frac{r^2 (n - 2)}{1 - r^2}$ (after some complicated rearranging)
• And it follows that:
• $t_{(n-2)} = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}$
• So all we need to know are r and n!
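
A small sketch showing that the test statistics really do depend only on r and n. It uses the standard identities $F = r^2 (n - 2) / (1 - r^2)$ and $t = r\sqrt{n - 2} / \sqrt{1 - r^2}$ for simple linear regression; the function name and the example values of r and n are made up.

```python
import math

def regression_f_and_t(r, n):
    """F(1, n-2) and t(n-2) for a simple regression, from r and n alone."""
    f = r ** 2 * (n - 2) / (1 - r ** 2)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return f, t

# Hypothetical study: correlation of 0.5 observed in 29 subjects
f, t = regression_f_and_t(r=0.5, n=29)
print(f"F(1, 27) = {f:.2f}, t(27) = {t:.2f}")
```

For r = 0.5 and n = 29 this gives F = 9 and t = 3. With a single predictor F is simply t squared, so the two tests always agree.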

Summary

• Correlation
• Regression
• Relevance to SPM

General Linear Model

• Linear regression is actually a form of the General Linear Model where the parameters are b, the slope of the line, and a, the intercept.
• y = bx + a + ε
• A General Linear Model is any model that describes the data as a linear combination of explanatory variables plus an error term (in the simplest case, a straight line)
• One voxel: The GLM
• $Y = X\beta + \varepsilon$
• Y = data vector (voxel); X = design matrix; β = parameters; ε = error vector
• Our aim: Solve equation for β – tells us how much BOLD signal is explained by X

Multiple regression

• Multiple regression is used to determine the effect of a number of independent variables, x1, x2, x3 etc., on a single dependent variable, y
• The different x variables are combined in a linear way and each has its own regression coefficient:
• $y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n + \varepsilon$
• The b parameters reflect the independent contribution of each independent variable, x, to the value of the dependent variable, y.
• i.e. the amount of variance in y that is accounted for by each x variable after all the other x variables have been accounted for
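
To connect multiple regression to the design-matrix form of the GLM, the sketch below stacks a column of ones (for the intercept) and the x variables into a design matrix X and solves the normal equations $X^T X \beta = X^T y$ for the parameters. This is one textbook way to obtain the least-squares estimates and is not a description of how SPM estimates them internally. The helper names and the noise-free data are invented so that the recovered parameters can be checked by eye.

```python
def solve_linear_system(matrix, vector):
    """Solve matrix @ beta = vector by Gauss-Jordan elimination with pivoting."""
    n = len(vector)
    aug = [list(row) + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest entry in this column for stability
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_glm(design_matrix, y):
    """Least-squares parameters from the normal equations X'X beta = X'y."""
    n_params = len(design_matrix[0])
    xtx = [[sum(row[i] * row[j] for row in design_matrix) for j in range(n_params)]
           for i in range(n_params)]
    xty = [sum(row[i] * yi for row, yi in zip(design_matrix, y))
           for i in range(n_params)]
    return solve_linear_system(xtx, xty)

# Two made-up predictors and a noise-free response y = 1 + 2*x1 + 3*x2
x1 = [0, 1, 2, 3, 4, 5]
x2 = [1, 0, 1, 0, 1, 0]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]

design = [[1, u, v] for u, v in zip(x1, x2)]  # columns: intercept, x1, x2
beta = fit_glm(design, y)
print([round(bj, 6) for bj in beta])
```

Because the data contain no error term, the estimates recover b0 = 1, b1 = 2 and b2 = 3 up to rounding. With real data the same calculation gives the parameters that minimise the sum of squared errors.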

SPM

• Linear regression is a GLM that models the effect of one independent variable, x, on one dependent variable, y
• Multiple Regression models the effect of several independent variables, x1, x2 etc, on one dependent variable, y
• Both are types of General Linear Model
• This is what SPM does and will be explained soon…

Summary

• Correlation
• Regression
• Relevance to SPM
• Thanks!