t-tests, ANOVA and regression

Lorelei Howard and Nick Wright, MfD 2008




Overview

  • Why do we need statistics?
  • P values
  • T-tests
  • ANOVA

Why do we need statistics?

  • To enable us to test experimental hypotheses
    • H0 = null hypothesis
    • H1 = experimental hypothesis
  • In terms of fMRI
    • Null: no difference in brain activation between two experimental conditions
    • Experimental: there is a difference in brain activation between the two conditions

Two types of statistics

  • Descriptive statistics
    • e.g. mean and standard deviation (SD)
  • Inferential statistics
    • e.g. t-tests and ANOVA, which let us generalise from our sample to the wider population

Issues when making inferences

So how do we know whether the effect observed in our sample was genuine?

  • We don’t
  • Instead we use p values to indicate our level of certainty that our results represent a genuine effect present in the whole population

P values

  • A p value is the probability of obtaining the observed result (or a more extreme one) by chance
    • i.e. when the null hypothesis is true
  • The α level is set a priori (usually 0.05)
  • If p < α, we reject the null hypothesis and accept the experimental hypothesis
    • loosely speaking, we are 95% certain that our experimental effect is genuine
  • If, however, p > α, we retain the null hypothesis (strictly, we fail to reject it rather than "accept" it as true)
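A minimal sketch of this decision rule in Python; the p value here is a made-up stand-in for one produced by a real test:

```python
# Hedged sketch: alpha is set a priori, p comes from some test.
alpha = 0.05   # significance level
p = 0.03       # hypothetical p value from an experiment

if p < alpha:
    print("Reject H0: evidence for the experimental hypothesis")
else:
    print("Retain H0: no evidence for a genuine effect")
```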

Two types of errors

  • Type I error = false positive
    • an α level of 0.05 means there is a 5% risk of making a Type I error
  • Type II error = false negative

t-tests

Hypothetical experiment

  • [Figure: block design alternating between the two conditions over time]
  • Q: does viewing pictures of the Simpson and the Griffin families activate the same brain regions?
  • Condition 1 = Simpson family faces
  • Condition 2 = Griffin family faces

Calculating T

  • [Figure: response distributions for group 1 and group 2]
  • t = the difference between the group means divided by the pooled standard error of the mean (see the sketch below)
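As an illustration, assuming SciPy is available and using made-up response values for the two groups:

```python
# Two-sample t-test: difference between means / pooled standard error.
import numpy as np
from scipy import stats

group1 = np.array([2.1, 2.5, 2.8, 3.0, 2.6])  # hypothetical condition 1 responses
group2 = np.array([1.4, 1.9, 1.7, 2.0, 1.6])  # hypothetical condition 2 responses

t, p = stats.ttest_ind(group1, group2)  # pooled-variance (Student) t-test
print(f"t = {t:.3f}, p = {p:.4f}")
```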

How do we apply this to fMRI data analysis?

  • [Figure: a voxel's signal plotted over time, compared against the on/off condition time course]

Degrees of freedom

  • = the number of unconstrained data points
  • which in this case = the number of data points − 1
  • We can use the t value and df to find the associated p value (see the sketch below)
  • and then compare it to the α level
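A short sketch of that lookup, again assuming SciPy; the t value and df are hypothetical:

```python
# Convert a t value and its df into a two-tailed p value.
from scipy import stats

t, df, alpha = 2.31, 18, 0.05
p = 2 * stats.t.sf(abs(t), df)   # two-tailed p value from the t distribution
print(f"p = {p:.4f}; reject H0: {p < alpha}")
```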

Different types of t-test

  • Two-sample t-tests
    • compare the means of two separate groups
  • One-sample t-tests
    • compare the mean of one sample to a given value (see the sketch below)
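For example, comparing one invented sample against a given value of 0 (a common baseline):

```python
# One-sample t-test against a fixed population mean.
from scipy import stats

sample = [0.8, 1.1, 0.5, 0.9, 1.3, 0.7]       # hypothetical effect sizes
t, p = stats.ttest_1samp(sample, popmean=0)   # test the sample mean against 0
print(f"t = {t:.3f}, p = {p:.4f}")
```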

Another approach to group differences

  • Analysis Of VAriance (ANOVA)
    • Variances not means
  • Multiple groups
    • e.g. Different facial expressions
  • H0 = no differences between groups
  • H1 = differences between groups

Calculating F

  • F = the between-group variance divided by the within-group variance
    • i.e. the model variance / error variance
  • For F to be significant, the between-group variance must be considerably larger than the within-group variance (see the sketch below)
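An illustrative one-way ANOVA with three made-up groups (e.g. three facial expressions), assuming SciPy:

```python
# One-way ANOVA: F = between-group variance / within-group variance.
from scipy import stats

happy   = [3.1, 2.9, 3.4, 3.0]   # hypothetical activation values
sad     = [2.2, 2.5, 2.1, 2.4]
neutral = [2.8, 2.6, 3.0, 2.7]

F, p = stats.f_oneway(happy, sad, neutral)
print(f"F = {F:.3f}, p = {p:.4f}")
```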

What can be concluded from a significant ANOVA?

  • There is a significant difference somewhere between the groups
  • NOT where this difference lies
  • Finding exactly where the differences lie requires further (post hoc) statistical analyses; one simple option is sketched below
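One simple post hoc option (among several) is pairwise t-tests with a Bonferroni-corrected α; a sketch using the same made-up groups as above:

```python
# Pairwise comparisons with a Bonferroni-corrected alpha.
from itertools import combinations
from scipy import stats

groups = {
    "happy":   [3.1, 2.9, 3.4, 3.0],
    "sad":     [2.2, 2.5, 2.1, 2.4],
    "neutral": [2.8, 2.6, 3.0, 2.7],
}
pairs = list(combinations(groups, 2))
alpha = 0.05 / len(pairs)   # corrected for 3 comparisons

for a, b in pairs:
    t, p = stats.ttest_ind(groups[a], groups[b])
    print(f"{a} vs {b}: p = {p:.4f}, significant: {p < alpha}")
```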

Different types of ANOVA

  • One-way ANOVA
    • One factor with more than 2 levels
  • Factorial ANOVAs
    • More than 1 factor
  • Mixed design ANOVAs
    • Some factors independent, others related

Conclusions

  • t-tests assess whether two group means differ significantly
  • They can compare two samples, or one sample to a given value
  • ANOVAs compare more than two groups, or more complicated scenarios
  • They work with variances rather than comparing means directly

Further reading

  • Howell, Statistical Methods for Psychology
  • Howitt and Cramer, Introduction to Statistics in Psychology
  • Huettel et al., Functional Magnetic Resonance Imaging (especially chapter 12)

Acknowledgements

  • MfD slides 2005–2007

PART 2

  • Correlation
  • Regression
  • Relevance to GLM and SPM

Correlation

  • Strength and direction of the relationship between variables
  • Scattergrams
  • [Figure: x–y scatterplots illustrating positive correlation, negative correlation, and no correlation]

Describing correlation: covariance

  • A statistic representing the degree to which 2 variables vary together:
    • cov(x, y) = Σ(xᵢ − x̄)(yᵢ − ȳ) / (n − 1)
    • cf. the variance formula: s² = Σ(xᵢ − x̄)² / (n − 1)
  • but…
  • the absolute value of cov(x, y) is also a function of the standard deviations of x and y
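A quick numeric check of the covariance formula against NumPy's built-in, with invented data:

```python
# Covariance from the definition vs numpy's cov (both divide by n - 1).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

cov_xy = ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)
print(np.isclose(cov_xy, np.cov(x, y)[0, 1]))   # True
```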

Describing correlation: the Pearson correlation coefficient (r)

  • r = cov(x, y) / (s_x · s_y), where s_x and s_y are the sample standard deviations of x and y
    • r = −1 (max. negative correlation); r = 0 (no constant relationship); r = 1 (max. positive correlation)
  • Limitations:
    • r captures only linear relationships and is sensitive to extreme values
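With the same invented data, r computed from the definition matches SciPy's pearsonr:

```python
# Pearson r = cov(x, y) / (s_x * s_y).
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

r_manual = np.cov(x, y)[0, 1] / (x.std(ddof=1) * y.std(ddof=1))
r_scipy, p = stats.pearsonr(x, y)
print(np.isclose(r_manual, r_scipy))   # True
```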

Summary

  • Correlation
  • Regression
  • Relevance to SPM

Regression

  • Regression: prediction of one variable from knowledge of one or more other variables
  • Regression vs. correlation: regression allows you to predict one variable from the other (not just say whether there is an association)
  • Linear regression aims to fit a straight line to the data such that, for any value of x, it gives the best prediction of y

Best fit line, minimising sum of squared errors

  • Describing the line as in GCSE maths: y = mx + c
  • Here, ŷ = bx + a
    • ŷ: predicted value of y
    • b: slope of the regression line
    • a: intercept
    • Residual error (ε): the difference between the obtained and predicted values of y (i.e. ε = y − ŷ)
  • The best-fit line (the values of b and a) is the one that minimises the sum of squared errors: SSerror = Σ(y − ŷ)²
  • [Figure: scatterplot with the regression line ŷ = bx + a; each residual ε = y − ŷ is the vertical distance from a point to the line]
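A minimal least-squares fit on invented data, using NumPy's polyfit:

```python
# Fit y-hat = b*x + a by minimising the sum of squared errors.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b, a = np.polyfit(x, y, deg=1)        # slope b, intercept a
y_hat = b * x + a                     # predicted values
ss_error = ((y - y_hat) ** 2).sum()   # SSerror = sum of squared residuals
```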

How to minimise SSerror

  • Minimise (y- ŷ)2 , which is (y-bx+a)2
  • Plotting SSerror for each possible regression line gives a parabola.
  • Minimum SSerror is at the bottom of the curve where the gradient is zero – and this can found with calculus.
  • Take partial derivatives of (y-bx-a)2 and solve for 0 as simultaneous equations, giving:
  • Values of a and b
  • Sums of squared error (SSerror)
  • Gradient = 0
  • min SSerror
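The calculus solution in code, checked against polyfit on the same invented data:

```python
# Closed-form least-squares estimates of b and a.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
a = y.mean() - b * x.mean()
print(np.allclose([b, a], np.polyfit(x, y, 1)))   # True
```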

How good is the model?

  • We can calculate the regression line for any data, but how well does it fit the data?
  • Total variance = predicted variance + error variance:
    • s_y² = s_ŷ² + s_er²
  • It can also be shown that r² is the proportion of the variance in y explained by our regression model:
    • r² = s_ŷ² / s_y²
  • Substituting s_ŷ² = r² s_y² into s_y² = s_ŷ² + s_er² and rearranging gives:
    • s_er² = s_y² (1 − r²)
  • From this we can see that the greater the correlation, the smaller the error variance, so the better our prediction
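A numeric check of this variance decomposition on the invented data from above:

```python
# s_y^2 = s_yhat^2 + s_er^2, and r^2 = s_yhat^2 / s_y^2.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b, a = np.polyfit(x, y, 1)
y_hat = b * x + a
n = len(y)

s_y2  = ((y - y.mean()) ** 2).sum() / (n - 1)      # total variance
s_yh2 = ((y_hat - y.mean()) ** 2).sum() / (n - 1)  # predicted variance
s_er2 = ((y - y_hat) ** 2).sum() / (n - 1)         # error variance

r, _ = stats.pearsonr(x, y)
print(np.isclose(s_y2, s_yh2 + s_er2))   # True
print(np.isclose(r ** 2, s_yh2 / s_y2))  # True
```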

Is the model significant?

  • i.e. do we get a significantly better prediction of y from our regression equation than by just predicting the mean?
  • F-statistic:
    • F(df_ŷ, df_er) = s_ŷ² / s_er², which after some complicated rearranging = r² (n − 2) / (1 − r²)
  • And it follows that:
    • t(n − 2) = r √(n − 2) / √(1 − r²)
  • So all we need to know are r and n!
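A sketch confirming that r and n are all we need: the p value from the t formula matches the one SciPy reports for the correlation:

```python
# t(n-2) = r * sqrt(n - 2) / sqrt(1 - r^2), turned into a p value.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

r, p_scipy = stats.pearsonr(x, y)
t = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
p_manual = 2 * stats.t.sf(abs(t), n - 2)
print(np.isclose(p_scipy, p_manual))   # True
```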

Summary

  • Correlation
  • Regression
  • Relevance to SPM

General Linear Model

  • Linear regression is actually a form of the General Linear Model, where the parameters are b, the slope of the line, and a, the intercept:
  • y = bx + a + ε
  • A General Linear Model is just any model that describes the data in terms of a straight line
One voxel: the GLM

  • [Figure: the GLM in matrix form, Y = X × β + ε, where Y is the data vector (one voxel's time series), X is the design matrix, β is the parameter vector and ε is the error vector]
  • Our aim: solve the equation for β, which tells us how much of the BOLD signal is explained by X
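A toy version of this in Python: build a made-up design matrix (an on/off regressor plus a constant), simulate one voxel's signal, and solve for β by ordinary least squares. The regressor and noise are invented stand-ins for a real design and BOLD data:

```python
# One-voxel GLM: Y = X @ beta + error, solved by least squares.
import numpy as np

rng = np.random.default_rng(0)
n_scans = 100

boxcar = np.tile([1.0] * 10 + [0.0] * 10, 5)      # toy on/off condition regressor
X = np.column_stack([boxcar, np.ones(n_scans)])   # design matrix: regressor + constant
beta_true = np.array([2.0, 10.0])                 # simulated effect size and baseline
Y = X @ beta_true + rng.normal(0, 1, n_scans)     # simulated voxel time series

beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)  # estimate beta
print(beta_hat)   # close to [2.0, 10.0]
```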

Multiple regression

  • Multiple regression is used to determine the effect of a number of independent variables, x₁, x₂, x₃, etc., on a single dependent variable, y
  • The different x variables are combined in a linear way, and each has its own regression coefficient:
  • y = b₀ + b₁x₁ + b₂x₂ + … + bₙxₙ + ε
  • The b parameters reflect the independent contribution of each independent variable, x, to the value of the dependent variable, y
  • i.e. the amount of variance in y that is accounted for by each x variable after all the other x variables have been accounted for
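A corresponding sketch with two invented regressors:

```python
# Multiple regression: y = b0 + b1*x1 + b2*x2 + error.
import numpy as np

rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(0, 0.3, n)   # simulated data

X = np.column_stack([np.ones(n), x1, x2])   # constant, x1, x2
b, *_ = np.linalg.lstsq(X, y, rcond=None)   # [b0, b1, b2]
print(b)   # close to [1.0, 2.0, -0.5]
```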

SPM

  • Linear regression is a GLM that models the effect of one independent variable, x, on one dependent variable, y
  • Multiple regression models the effect of several independent variables, x₁, x₂, etc., on one dependent variable, y
  • Both are types of General Linear Model
  • This is what SPM does and will be explained soon…

Summary

  • Correlation
  • Regression
  • Relevance to SPM
  • Thanks!

