You are expected to write an economic essay in which you demonstrate your basic fluency in applying econometrics on your own. This sentence best summarises what is expected of the essay. To make these expectations more explicit, below you will find:
a proposed order of work, with a justification of why the steps should be undertaken in this order
a structure of the essay
information on where some data and additional materials may be found
Proposed order of work
When attempting any economic study, one first needs to specify a question of interest. Naturally, the question you pose for this essay does not need to be original or groundbreaking for the whole discipline. However, it has to be formulated, and it has to be formulated clearly. You should explain why the question is relevant and why an answer is needed. This, naturally, requires providing a theoretical background to your study. Yes, it is true that in some parts of economics we pursue the GIGO ("garbage in, garbage out") approach (put everything you like on the RHS of your model and see what proves significant), but this is neither good practice nor the ultimate endpoint of this strand of research. Thus, try to (a) formulate a question; (b) explain why it is interesting; (c) root it in economic thinking and, preferably, economic theory. The best questions are the ones that specify a clear alternative, e.g.: if some theory holds, we expect to see this sign/size of a particular estimated coefficient. Obviously, other questions may be formulated as well.
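As a purely illustrative sketch of a question with a clear alternative (the wage equation, the variable names and the direction of the alternative are assumptions made for this example, not a prescribed topic), one could write:

```latex
\ln(\text{wage}_i) = \beta_0 + \beta_1\,\text{female}_i + \beta_2\,\text{age}_i + \varepsilon_i,
\qquad
H_0:\ \beta_1 = 0 \quad \text{vs.} \quad H_1:\ \beta_1 < 0
```

Here a theory predicting a gender wage gap implies a negative $\beta_1$, while the absence of such a gap implies $\beta_1 = 0$. With a log dependent variable, a small estimate of $\beta_1$ can be read approximately as the percentage wage difference, so both the sign and the size of the estimate speak to the theory.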
Once you know your question, it may occur to you that perhaps you are not the first one to ask it. In fact, in the majority of cases you will find that someone did figure this out before you and that many others followed, adding their contributions to the field. This is a good thing. If I were you, I would be really worried if nobody had pursued my idea before: unless you are a genius, you are probably wrong, and this is the reason why nobody did it before. Thus, make use of Google Scholar (http://scholar.google.com). Pick the key words that describe your idea best and browse through the literature. If there are too many results, do not just pick the first one on the list: the Google Scholar ordering is not necessarily informative about which works are seminal or important. A better option is to additionally use the filter to browse only the most recent papers. Each of them has a literature review section, a great source of knowledge about the field. Remember to do this to get to know the field better. Why? Two main reasons. First, you learn about the methods and/or data sources that people used to answer the question you find interesting. You do not have to work everything out on your own; you just follow the path that someone else has already paved. Second, you learn about the caveats of each question. These caveats may range from an adequate set of control variables to an adequate choice of empirical technique. Remember that there is always a catch: data are never perfect and neither are methods. This does not mean that we should not do research; it only means that we have to be aware of the shortcomings (and openly admit them).
Now most of you are probably thinking: “hey, but we only have to run a simple OLS and diagnostics”. Yes, this is true, nothing more is expected of your empirical part. But in your paper I would like you to demonstrate an understanding of the question at hand. You need this. Already now, not next year.
So now you have two elements: your question and knowledge about the field (i.e. what variables you need, where to take the data from, how this is usually done in the literature, etc.). This is the moment when you actually start working. You first try to collect the data. If the data is freely available, download it and compile a dataset. If it is not, you need to be smarter. Frequently you can just Google the author(s) and find the dataset they used on their websites. If there is nothing there, you may write to the author(s), explain simply that you want to replicate their findings as your assignment in econometrics, and request that they make the data available to you. If this does not work, pick another question and return to square one. It is just an econometrics assignment; do not waste your time on data collection, because you are supposed to learn other things from this exercise.
So now you have everything you need: question, knowledge and data. It is high time to start doing econometrics (and not a second earlier!). What does that mean? You will probably need to work with the data:
draw histograms, time-lines and scatterplots to learn about the dataset and what can be expected of it.
Remember that you will not get more information from the data than they contain. That implies that a dummy will not become a continuous variable just because you want it to. That implies that a variable with little variance will not have a lot of explanatory power just because your theory predicts so. And so on. Get to know your data, including the data definitions (is a variable in $ mio? in local currency mio? in % of GDP? a % growth rate? etc.).
try to formulate an econometric model that will be consistent with your expectations
That implies formulating (a) the functional form and (b) the list of variables you want to include in the model.
Most of the time you will go through a process of trial and error to arrive at the final specification.
try to run diagnostics and confirm whether your expectations concerning the statistical quality of your model are substantiated by the data or not
A model that does not pass the diagnostic tests does not disqualify your essay, as long as you interpret the findings adequately. The same applies to a model unable to confirm your theoretical approach. You do not have to do new economics. You are only supposed to prove that you are able to do econometrics for economic purposes and that you understand the basic concepts we covered in class.
In other words, if your model fails on, for example, heteroscedasticity, do not worry that you will fail because of the data. The only things you need to do are to interpret the tests adequately, explain the consequences of heteroscedasticity and check whether there is any way to remove it from your model by changing the functional form or the variables. If not, this is fine. But explain your thinking step by step.
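To make the estimate-then-diagnose sequence above concrete, here is a minimal sketch in Python using only the standard library (in class we work in STATA; this is just an illustration of the logic). Everything in it is an assumption made for the example: the data are simulated with an error variance that grows with x, the helper names `ols_simple` and `breusch_pagan_lm` are hypothetical, and the heteroscedasticity test is the n·R² (Koenker) variant of the Breusch-Pagan test for a single regressor.

```python
import random
import statistics


def ols_simple(x, y):
    """OLS for y = a + b*x. Returns (intercept, slope, residuals, r_squared)."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    ssr = sum(e ** 2 for e in residuals)          # residual sum of squares
    sst = sum((yi - mean_y) ** 2 for yi in y)     # total sum of squares
    return intercept, slope, residuals, 1 - ssr / sst


def breusch_pagan_lm(x, residuals):
    """LM statistic: n * R^2 from regressing squared residuals on x."""
    squared = [e ** 2 for e in residuals]
    _, _, _, r2_aux = ols_simple(x, squared)
    return len(x) * r2_aux


# 5% critical value of the chi-squared distribution with 1 degree of freedom
CHI2_CRIT_5PCT_DF1 = 3.841

# Simulated (not real) data: the error standard deviation is proportional to x,
# so the model is heteroscedastic by construction.
random.seed(0)
x = [random.uniform(1, 10) for _ in range(500)]
y = [2.0 + 0.5 * xi + random.gauss(0, 0.3 * xi) for xi in x]

intercept, slope, residuals, r2 = ols_simple(x, y)
lm = breusch_pagan_lm(x, residuals)

print(f"intercept = {intercept:.3f}, slope = {slope:.3f}, R^2 = {r2:.3f}")
print(f"Breusch-Pagan LM = {lm:.2f} (5% critical value {CHI2_CRIT_5PCT_DF1})")
if lm > CHI2_CRIT_5PCT_DF1:
    print("Reject homoscedasticity: the usual OLS standard errors are unreliable.")
else:
    print("No evidence against homoscedasticity at the 5% level.")
```

If the test rejects, the coefficient estimates remain unbiased but the usual standard errors (and hence t-tests) cannot be trusted; a natural next step in the essay would be to try, say, a logarithmic functional form and rerun the test, explaining each step.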
Naturally, this is the stage where you may require assistance, especially in STATA. If you need assistance, you can always come to the office hours with your particular problem. Taking into account that there are so many of you, we will set additional meetings on the Fridays of the last two weeks of December classes (10th and 17th of December). However, these additional meetings are by appointment only. So you need to send me an e-mail in which you show that you already have the data and have tried to do something on your own.
But be warned: I will not help you with (i) finding a subject or data; (ii) specifying the regression equation; (iii) interpreting the outcomes. I will help you with (i) technically performing analyses in STATA, including graphs; (ii) inputting the data into STATA, if you found it in some other form; (iii) saving your results (both the analysis output and the graphs) so that you can redo your analyses on your own.
Yet another issue concerns your cooperation with one another. It is absolutely acceptable for two or more people to work on related topics. You cannot have the same question or regression, though. That means that if someone is interested in the effect of gender on wages and someone else in the effect of age on wages, you may work together on getting the data and you can come together to work in STATA. But your essays will naturally be different, because your research questions will be different. Your regressions should eventually differ too.
Structure of essay
Your essay has to contain:
(a) introduction - in which you explain your research question, data source and expected findings
(b) literature review - which covers the main works in the field and explains how the field evolved
(c) data section - which covers the data, its sources, shortcomings and properties (e.g. descriptive statistics, histograms, etc.)
(d) OLS analysis section - which covers the specification(s) of your model that are suited to the theoretical and conceptual frameworks discussed in (b)
(e) diagnostics section - where you present the tests that you ran, interpret their findings and explain whether the (potential) problems could be solved and how
(f) conclusion section - where you summarise what you wanted to do, what you did and what came out
(g) bibliography list, according to standards (citations in the text should follow standards as well; it does not matter which standards, as long as you are consistent)
I would appreciate it if your conclusion section included some ideas you have about the literature (where it is right and where, in your view, it is deeply mistaken), but this is not a necessary step.
Your paper does not have to be long. Some of the best empirical papers are at most 15 pages long, even when using very fancy econometrics. Your grade is for common sense and skills, not for the ability to write many words.
It has to be submitted by December 31st, 2010, by e-mail (please, do not send me .docx files, because I cannot open them - .doc, .pdf or any open-source format is fine).
It has to contain your undoubtedly original and individual work. If you had previous assignments in econometrics - although I cannot verify that - you are expected to prepare a new one here.
You are expected to demonstrate your ability to:
run a plain OLS regression (and interpret it)
run diagnostics (and interpret them)
visualise your main points with the use of graphs (histograms, scatterplots and others).
If you want to (and can), you may use more complex econometrics, but please come and discuss this with me in advance. Once again, your model does not have to be the correct econometric method for handling your research question.
Your paper needs to contain the economics of the analysis (what stands behind your regression equation), preferably substantiated by some studies in this domain. Your paper needs to contain an economic, not only statistical, interpretation of your findings. It would be useful if you could also show the limitations of your study and how you would suggest extending your own research.
Sources of data and information
Barbara Gebicka runs a nice introductory course on econometrics at CERGE-EI in Prague. The teaching materials are freely available on her website: http://home.cerge-ei.cz/gebicka/Intro.html
Also Rolf Tschernig & Harry Haupt provide these very nice slides online: http://www-cgi.uni-regensburg.de/TTEC/fileadmin/user_upload/ICIE_2009.pdf
Although the book I gave you is much better, some of you may like this very nice online introduction to STATA: http://rlab.lse.ac.uk/it/it_docs/Introduction_to_stata.pdf
NBER datasets: http://www.nber.org/data/ (you can find literally everything there, though mostly for the US)
World Trade Data
World Trade Data (John F. Helliwell) - http://www.econ.ubc.ca/helliwell/restrict/datasets/wtdata/wtdata.htm
Andrew K. Rose - http://faculty.haas.berkeley.edu/arose/RecRes.htm (his research datasets)
Data for international comparisons
Penn World Tables - http://pwt.econ.upenn.edu/ (the most popular dataset)
Centre for International Development, Harvard - http://www.cid.harvard.edu/ciddata/ciddata.html (datasets for research by Jeffrey Sachs, Robert Feenstra and Andrew Warner)
Barro-Lee dataset (1993) - http://www.nber.org/pub/barro.lee/ZIP/ (the very basis for investigating endogenous growth theories; the whole Xavier Sala-i-Martin textbook is based on this dataset; it includes human capital, financial sector development, etc.)
Barro-Lee dataset (2000) - http://web.korea.ac.kr/~jwlee/ (research on education)
Sachs-Warner dataset (1995) - http://www.bris.ac.uk/Depts/Economics/Growth/sachs.htm
International Comparisons of Output and Productivity Industrial Database - http://www.ggdc.net/icop.html (for researching productivity and economic growth)
Economic Freedom of the World dataset - http://www.freetheworld.com/download.html
Corruption perception index - http://www.transparency.org/
Panel Study of Income Dynamics (PSID) 1968-2003 http://psidonline.isr.umich.edu/
National Longitudinal Surveys http://www.bls.gov/nls/home.htm
Resources for Economists on the Internet - http://www.rfe.org/showCat.php?cat_id=2
Living Standards Measurement Study - http://www.worldbank.org/lsms/