- Maria Joao Rosa
- SPM Homecoming 2008
- Wellcome Trust Centre for Neuroimaging

**Statistic formulations**
- P(A): probability of event A occurring
- P(A|B): probability of A occurring given that B occurred
- P(B|A): probability of B occurring given that A occurred
- P(A,B): probability of **A and B** occurring simultaneously (joint probability of A and B)
- Joint probability of A and B:
- P(A,B) = P(A|B)*P(B) = P(B|A)*P(A)
- Dividing by P(A): P(B|A) = P(A|B)*P(B)/P(A)
- Which is **Bayes’ Rule**
- Bayes’ Rule is very often referred to as Bayes’ Theorem, but it is not really a theorem and is more properly called Bayes’ Rule (Hacking, 2001).

**Reverend Thomas Bayes (1702 – 1761)**
- Reverend Thomas Bayes was a minister interested in probability. He stated a form of his famous rule in the context of solving a somewhat complex problem involving billiard balls.
- It was first stated by Bayes in his ‘Essay towards solving a problem in the doctrine of chances’, published in the *Philosophical Transactions of the Royal Society of London* in 1764.

**Conditional probability**
- P(A|B): conditional probability of A given B
- Q: When are we considering conditional probabilities?
- A: Almost always!
- Examples:
- Lottery chances
- Dice tossing

**Conditional probability**
- Examples (cont’d):
- P(Brown eyes|Male), i.e. P(A|B) with A := Brown eyes, B := Male
- What is the probability that a person has brown eyes, ignoring everyone who is not male?
- Ratio: (being male with brown eyes)/(being male)
- Probability ratio: the probability that a person is both male and has brown eyes, divided by the probability that a person is male
- P(Male) = P(B) = 0.52
- P(Brown eyes) = P(A) = 0.78
- P(Male with brown eyes) = P(A,B) = 0.38
- P(A|B) = P(B|A)*P(A)/P(B) = P(A,B)/P(B) = 0.38/0.52 ≈ 0.73

**Flipping it around (Bayes’ idea):**
- You can now also calculate the probability of being male given that you have brown eyes: P(B|A) = P(A|B)*P(B)/P(A) = P(A,B)/P(A) = 0.38/0.78 ≈ 0.487
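
The brown-eyes arithmetic can be checked in a few lines of Python. This is only a sketch: the three probabilities are the ones given in the example, while the variable names and the helper `conditional` are made up for illustration.

```python
# Probabilities taken from the brown-eyes example.
p_male = 0.52              # P(B)
p_brown = 0.78             # P(A)
p_male_and_brown = 0.38    # P(A,B)


def conditional(p_joint, p_given):
    """Return P(X|Y) = P(X,Y) / P(Y)."""
    return p_joint / p_given


# P(A|B): brown eyes given male.
p_brown_given_male = conditional(p_male_and_brown, p_male)

# Bayes' Rule: P(B|A) = P(A|B) * P(B) / P(A)
p_male_given_brown = p_brown_given_male * p_male / p_brown

print(f"P(Brown eyes | Male) = {p_brown_given_male:.3f}")
print(f"P(Male | Brown eyes) = {p_male_given_brown:.3f}")
```

Going through Bayes’ Rule gives the same number as dividing the joint probability by P(A) directly, which is a handy sanity check.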

**Statistic terminology**
- P(A) is called the *marginal* or *prior* probability of A (since it is the probability of A *prior* to having any information about B)
- Similarly, P(B) is the *marginal* or *prior* probability of B
- P(A|B), read as a function of B for fixed A, is called **the likelihood** function for B
- P(B|A) is **the posterior probability** of B given A (since it depends on having information about A)

**Bayes Rule**
- P(B|A) = P(A|B)*P(B)/P(A)
- P(B|A): **“posterior”** probability of B given A
- P(A|B): **“likelihood”** function for B (for fixed A)
- P(B), P(A): **prior** probabilities of B, A (“priors”)
- It relates the conditional density of a parameter (the posterior probability) to its unconditional density (the prior, since it depends only on information present before the experiment).
- The likelihood is the probability of the data given the parameter and represents **the data now available**.
- Bayes’ Theorem for a given parameter θ:
- p(θ | data) = p(data | θ) p(θ) / p(data)
- 1/p(data) is basically a normalizing constant
- The prior is the probability of the parameter and represents what was thought **before seeing the data**.
- The posterior represents what is thought **given both prior information and the data just seen**.

**Data and hypotheses…**
- We have hypotheses *H0* (null) and *H1*
- We have data (Y)
- We want to check whether the model that we have (*H1*) fits our data (accept *H1* / reject *H0*) or not (*H0*)
- Inferential statistics:
- What is the probability that we can *reject H0* and *accept H1* at some level of significance (α, p)?
- These are **a-priori decisions**, made even when we don’t know what the data will be or how they will behave.
- Bayes:
- We get some evidence for the model (“likelihood”) and can then even compare the “likelihoods” of different models
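
To make the idea of comparing the “likelihoods” of different models concrete, here is a minimal sketch with a toy binomial problem. Everything in it is an assumption for illustration: the data (8 successes in 10 trials), the two candidate models, the equal priors and the function name `binomial_likelihood` do not come from the talk.

```python
from math import comb


def binomial_likelihood(k, n, p):
    """P(k successes in n trials | success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


# Hypothetical data Y: 8 successes in 10 trials (invented for illustration).
k, n = 8, 10

# Two hypothetical models, each fixing the success probability.
models = {"H0": 0.5, "H1": 0.7}
priors = {"H0": 0.5, "H1": 0.5}   # assumed equal prior belief in each model

likelihoods = {h: binomial_likelihood(k, n, p) for h, p in models.items()}

# p(Y) is the normalising constant: sum of likelihood * prior over the models.
evidence = sum(likelihoods[h] * priors[h] for h in models)
posteriors = {h: likelihoods[h] * priors[h] / evidence for h in models}

for h in models:
    print(f"{h}: likelihood = {likelihoods[h]:.4f}, posterior = {posteriors[h]:.3f}")
print(f"Likelihood ratio H1/H0 = {likelihoods['H1'] / likelihoods['H0']:.2f}")
```

With equal priors the posterior ordering follows the likelihoods directly; unequal priors would shift it.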

**Where does Bayes’ Rule come in handy?**
- In diagnostic cases where we are trying to calculate P(Disease | Symptom), we often know P(Symptom | Disease), the probability of having the symptom given the disease, because these data have been collected from previous confirmed cases.
- In scientific cases where we want to know P(Hypothesis | Result), the probability that a hypothesis is true given some relevant result, we may know P(Result | Hypothesis), the probability of obtaining that result given that the hypothesis is true. This is often statistically calculable, as when we have a p-value.
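
The diagnostic case can be sketched as follows. All three input probabilities are hypothetical and chosen only for illustration, and `posterior_disease` is a made-up helper name. P(Symptom) is not given directly, so it is expanded over the two cases "disease" and "no disease".

```python
def posterior_disease(p_symptom_given_disease, p_disease, p_symptom_given_healthy):
    """Return P(Disease | Symptom) via Bayes' Rule.

    P(Symptom) is obtained from the law of total probability over
    the two cases "disease" and "no disease".
    """
    p_healthy = 1.0 - p_disease
    p_symptom = (p_symptom_given_disease * p_disease
                 + p_symptom_given_healthy * p_healthy)
    return p_symptom_given_disease * p_disease / p_symptom


# All three numbers are hypothetical, chosen only for illustration.
p = posterior_disease(
    p_symptom_given_disease=0.90,   # P(Symptom | Disease)
    p_disease=0.01,                 # P(Disease), the prior
    p_symptom_given_healthy=0.05,   # P(Symptom | no Disease)
)
print(f"P(Disease | Symptom) = {p:.3f}")
```

Under these assumed numbers the posterior stays well below P(Symptom | Disease), because the prior P(Disease) is small.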

**Applicability to (f)MRI**
- Let’s take fMRI as a relevant example
- We have:
- Measured data: Y
- Model: X
- Model estimates: β, ε (error/variance)

**What do we get with inferential statistics?**
- With T-statistics on the betas (β = (β1, β2, …)), taking error into account, for a specific voxel we would ONLY learn that data like ours would be unlikely (e.g. p < 5%) if there were NO effect (e.g. no β1 > β2)
- But what about the **likelihood of the model**?
- What are the chances/likelihood that β1 > β2 at some voxel or region?
- Could we get some quantitative measure of that?

**What do we get with Bayes statistics?**
- Here, the idea (Bayes) is to use our post-hoc knowledge (our data) __to estimate the model__ (also allowing us to compare hypotheses (models) and see which fits our data best)
- P(X|Y) = P(Y|X)*P(X)/P(Y)
- i.e. P(β|Y) = P(Y|β)*P(β)/P(Y)
- P(X|Y): **“posterior”** distribution for X given Y
- P(Y|X): **“likelihood”** of Y given X
- P(Y), P(X): **prior** probabilities of Y, X (“priors”)
- Now to Steve about the practical sides in SPM…

**Bayes for Beginners: Applications**

**SPM uses priors for estimation in…**
- spatial normalization
- segmentation
- EEG source localisation

**and Bayesian inference in…**
- Posterior Probability Maps (PPM)
- Dynamic Causal Modelling (DCM)

**Null hypothesis significance testing**
- Standard approach in science is the null hypothesis significance test (NHST)
- Low *p* value suggests “there is not nothing”
- Assumption is H0 = noise; randomness
- H0 = molecules are randomly arranged in space
- Looking unlikely…
- Krueger (2001) *American Psychologist*

**Something vs nothing**
- …If there is **any** effect…
- Our interpretations ultimately depend on p(H0)
- “Risky” vs “safe” research…
- Better to be explicit – incorporate subjectivity when specifying hypotheses.
- Belief change = *p*(H0) – *p*(H0 | D)
- If the underlying effect δ ≠ 0, no matter how small, the test statistic grows with sample size – is this physiological?

**The case for the defence**
- Law of large numbers means that the test statistic will identify a consistent trend (δ ≠ 0) with a sufficient sample size
- In SPM, we look at images of *statistics*, not effect sizes
- A highly significant statistic *may* reflect a small non-physiological difference, with large N
- **BUT… as long as we are aware of this, classical inference works well for common sample sizes**

**Posterior Probability Distribution**
- $\lambda_{post} = \lambda_d + \lambda_p$
- $M_{post} = \dfrac{\lambda_d M_d + \lambda_p M_p}{\lambda_{post}}$
- Here λ is a precision (inverse variance) and M a mean; the subscripts d, p and post refer to the data, the prior and the posterior.

**(1) Bayesian model comparison**
- **BUT!!!** What is p(H0) for randomness?!
- **Reframe the question** – *compare* alternative hypotheses/models:
- If only one model, then p(y) is a normalising constant…
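
For a Gaussian likelihood and a Gaussian prior, the posterior precision is the sum of the data and prior precisions, and the posterior mean is their precision-weighted average. A minimal sketch of that update, assuming Gaussian distributions throughout; the function name `gaussian_posterior` and the numbers are invented for illustration:

```python
def gaussian_posterior(mean_data, precision_data, mean_prior, precision_prior):
    """Combine a Gaussian likelihood and a Gaussian prior.

    Precision is the inverse variance. The posterior precision is the sum of
    the two precisions, and the posterior mean is the precision-weighted
    average of the data mean and the prior mean.
    """
    precision_post = precision_data + precision_prior
    mean_post = (precision_data * mean_data
                 + precision_prior * mean_prior) / precision_post
    return mean_post, precision_post


# Hypothetical numbers: a noisy data estimate of 2.0 and a zero-mean prior.
mean_post, precision_post = gaussian_posterior(
    mean_data=2.0, precision_data=1.0,
    mean_prior=0.0, precision_prior=3.0,
)
print(f"posterior mean = {mean_post}, posterior precision = {precision_post}")
```

With a zero-mean prior the estimate is pulled towards zero, and the pull gets stronger as the prior precision grows relative to the data precision.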

**Practical example (1)**

**Dynamic causal modelling (DCM)**
- In *“classical”* SPM, no (flat) priors
- In *“full”* Bayes, priors might be from theoretical arguments or from independent data
- In *“empirical”* Bayes, priors derive from the same data, assuming a hierarchical model for generation of the data
- Parameters of one level can be made priors on the distribution of parameters at the lower level
- Parameters and hyperparameters at each level can be estimated using the EM algorithm

**Shrinkage prior**
- *In the absence of evidence to the contrary, parameters will shrink to zero*

**Practical example (2)**

**(2) Posterior Probability Maps**
- Posterior probability distribution p(β|Y)

**(3) Use informative priors (cutting edge!)**
- Spatial constraints on fMRI activity (e.g. grey matter)
- Spatial constraints on EEG sources, e.g. using fMRI blobs

**(4) Tasters – The Bayesian Brain**

**(4a) Taster: Modelling behaviour…**
- Ernst & Banks (2002) *Nature*

**(4b) Taster: Modelling the brain…**
- Friston (2005) *Phil Trans R Soc B*

**Acknowledgements and further reading**
- **Previous MFD talks**
- **Jean & Guillaume’s SPM course slides**
- Krueger (2001) Null hypothesis significance testing. *Am Psychol* 56: 16-26
- Penny et al. (2004) Comparing dynamic causal models. *Neuroimage* 22: 1157-1172
- Friston & Penny (2003) Posterior probability maps and SPMs. *Neuroimage* 19: 1240-1249
- Friston (2005) A theory of cortical responses. *Phil Trans R Soc B*
- www.ualberta.ca/~chrisw/BayesForBeginners.pdf
- www.fil.ion.ucl.ac.uk/spm/doc/books/hbf2/pdfs/Ch17.pdf

**Bayes’ ending**
- Bunhill Fields Burial Ground
- off City Road, EC1