Friday, 15 March 2013

R Statistical Tool Assignment-6

Topic of Discussion - Introduction to Pooled, Fixed and Random effects model estimation of Panel Data


Panel data, also known as longitudinal or cross-sectional time-series data, is a data set in which the behavior of entities is observed across time. These entities could be countries, cities, individuals, companies, etc.
Panel data helps us control for variables that we cannot observe or measure, such as cultural factors or differences in business practices across companies, provided those factors do not change over time.


An example of panel data:


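Gasoline (data set from the plm package; all variables are in logs: lgaspcar = gasoline consumption per car, lincomep = real income per capita, lrpmg = real gasoline price, lcarpcap = cars per capita)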


     country year lgaspcar  lincomep       lrpmg   lcarpcap
1    AUSTRIA 1960 4.173244 -6.474277 -0.33454761  -9.766840
2    AUSTRIA 1961 4.100989 -6.426006 -0.35132761  -9.608622
3    AUSTRIA 1962 4.073177 -6.407308 -0.37951769  -9.457257
4    AUSTRIA 1963 4.059509 -6.370679 -0.41425139  -9.343155
5    AUSTRIA 1964 4.037689 -6.322247 -0.44533536  -9.237739
6    AUSTRIA 1965 4.033983 -6.294668 -0.49706066  -9.123903
7    AUSTRIA 1966 4.047537 -6.252545 -0.46683773  -9.019822
8    AUSTRIA 1967 4.052911 -6.234581 -0.50588340  -8.934403
9    AUSTRIA 1968 4.045507 -6.206894 -0.52241255  -8.847967
10   AUSTRIA 1969 4.046355 -6.153140 -0.55911051  -8.788686
11   AUSTRIA 1970 4.080888 -6.081712 -0.59656122  -8.728200
12   AUSTRIA 1971 4.106720 -6.043626 -0.65445914  -8.635898
13   AUSTRIA 1972 4.128018 -5.981052 -0.59633184  -8.538338
14   AUSTRIA 1973 4.199381 -5.895153 -0.59444681  -8.487289
15   AUSTRIA 1974 4.018495 -5.852381 -0.46602693  -8.430404
16   AUSTRIA 1975 4.029018 -5.869363 -0.45414221  -8.382815
17   AUSTRIA 1976 3.985412 -5.811703 -0.50008372  -8.322232
18   AUSTRIA 1977 3.931676 -5.833288 -0.42191563  -8.249563
19   AUSTRIA 1978 3.922750 -5.762023 -0.46960312  -8.211041
20   BELGIUM 1960 4.164016 -6.215091 -0.16570961  -9.405527
21   BELGIUM 1961 4.124356 -6.176843 -0.17173098  -9.303149
22   BELGIUM 1962 4.075962 -6.129638 -0.22229138  -9.218070
23   BELGIUM 1963 4.001266 -6.094019 -0.25046225  -9.114932
24   BELGIUM 1964 3.994375 -6.036461 -0.27591057  -9.005491
25   BELGIUM 1965 3.951531 -6.007252 -0.34493695  -8.862581

As you can see, each row is indexed by a country (the entity) and a year (the time period), and each country is observed over the same time horizon. Time-series data, by contrast, is indexed by time alone for a single country, state, individual, etc.
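A minimal sketch of that structure in base R, using a six-row excerpt copied from the Gasoline table above (two countries, three years each; the data frame name gas is hypothetical). Each observation is identified by the (country, year) pair, and a panel is balanced when every entity appears exactly once in every period:

```r
# Six rows copied from the Gasoline table: 2 entities x 3 years
gas <- data.frame(
  country  = rep(c("AUSTRIA", "BELGIUM"), each = 3),
  year     = rep(1960:1962, times = 2),
  lgaspcar = c(4.173244, 4.100989, 4.073177,
               4.164016, 4.124356, 4.075962)
)

# In a panel the (entity, time) pair, not time alone, identifies a row
stopifnot(anyDuplicated(gas[, c("country", "year")]) == 0)

n_entities <- length(unique(gas$country))  # cross-section dimension (n)
n_periods  <- length(unique(gas$year))     # time dimension (T)

# Balanced panel: every entity observed exactly once in every period
balanced <- all(table(gas$country, gas$year) == 1)

cat("entities:", n_entities, "periods:", n_periods,
    "rows:", nrow(gas), "balanced:", balanced, "\n")

# A single time series, by contrast, is one entity indexed by time only
austria <- gas[gas$country == "AUSTRIA", c("year", "lgaspcar")]
print(austria)
```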



Class Objective:- Panel Data Analysis 

Panel (data) analysis is a statistical method, widely used in social science, epidemiology and econometrics, which deals with two-dimensional panel data.[1] The data are usually collected over time and over the same individuals, and a regression is then run over these two dimensions. Multidimensional analysis is an econometric method in which data are collected over more than two dimensions (typically time, individuals and some third dimension).

A common panel data regression model looks like $y_{it} = a + b x_{it} + \epsilon_{it}$, where $y$ is the dependent variable, $x$ is the independent variable, $a$ and $b$ are coefficients, and $i$ and $t$ are indices for individuals and time. The error term $\epsilon_{it}$ is very important in this analysis: the assumptions made about it determine whether we speak of fixed effects or random effects. In a fixed effects model, $\epsilon_{it}$ is assumed to vary non-stochastically over $i$ or $t$, making the fixed effects model analogous to a dummy variable model in one dimension. In a random effects model, $\epsilon_{it}$ is assumed to vary stochastically over $i$ or $t$, requiring special treatment of the error variance matrix.
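The distinction can be made concrete by splitting the error term into a time-constant individual part $\mu_i$ and an idiosyncratic part $\nu_{it}$ (the standard one-way decomposition; the symbols $\mu_i$, $\nu_{it}$, $\bar{y}_i$ and $T$ are our notation, not the original's). The fixed effects (within) estimator removes $\mu_i$ by subtracting each individual's time average, while the random effects model treats $\mu_i$ as a random draw uncorrelated with $x_{it}$, which makes the errors of the same individual correlated across periods:

```latex
\begin{aligned}
y_{it} &= a + b\,x_{it} + \epsilon_{it}, \qquad \epsilon_{it} = \mu_i + \nu_{it} \\[4pt]
y_{it} - \bar{y}_i &= b\,(x_{it} - \bar{x}_i) + (\nu_{it} - \bar{\nu}_i),
  \qquad \bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it} \quad \text{(fixed effects, within)} \\[4pt]
\mu_i &\sim \text{iid}(0, \sigma_\mu^2), \quad \nu_{it} \sim \text{iid}(0, \sigma_\nu^2), \quad
  \operatorname{Cov}(\mu_i, x_{it}) = 0 \quad \text{(random effects)} \\[4pt]
\operatorname{Cov}(\epsilon_{it}, \epsilon_{is}) &= \sigma_\mu^2 \quad (t \neq s)
\end{aligned}
```

The last line is why the random effects model needs a GLS-type treatment of the error variance matrix rather than plain OLS.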

Important Facts:
1- The key assumption in the pooled model is that there are no unique attributes of individuals/countries/other units within the measurement set, and no universal effects across time.

2- In the fixed effects model, each individual has a unique attribute that is not random and does not vary across time. This model is suitable if we want to draw inferences about the particular individuals in the sample.

3- In the random effects model, individuals/countries have unique, time-constant attributes that are the result of random variation and are uncorrelated with the individual regressors. This model is suitable if we want to draw inferences about the entire population, not only the sample.
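To see why the choice matters, here is a small simulation sketch in base R (all data are hypothetical, generated with rnorm). Each individual gets a time-constant effect mu that is correlated with the regressor x. Pooled OLS leaves mu in the error term and gives a biased slope, while the fixed effects estimate, computed once as a dummy-variable regression and once by subtracting each individual's mean (the within transformation), should land close to the true slope of 0.5, with the two fixed effects versions agreeing exactly:

```r
set.seed(42)

n_ind  <- 50   # hypothetical number of individuals
n_time <- 10   # hypothetical number of periods
b_true <- 0.5  # true slope we want to recover

idx <- rep(seq_len(n_ind), each = n_time)
mu  <- rnorm(n_ind, sd = 2)                  # unobserved individual effect
x   <- mu[idx] + rnorm(n_ind * n_time)       # regressor correlated with mu
y   <- 1 + b_true * x + mu[idx] + rnorm(n_ind * n_time)
panel <- data.frame(id = factor(idx), x = x, y = y)

# Pooled OLS: mu ends up in the error term, which is correlated with x
pooled <- lm(y ~ x, data = panel)

# Fixed effects as a dummy-variable (LSDV) regression: one intercept per individual
lsdv <- lm(y ~ x + id, data = panel)

# Fixed effects via the within transformation: demean y and x by individual
demean <- function(v, g) v - ave(v, g)
within_fit <- lm(demean(y, id) ~ demean(x, id) - 1, data = panel)

slopes <- c(pooled = unname(coef(pooled)["x"]),
            lsdv   = unname(coef(lsdv)["x"]),
            within = unname(coef(within_fit)[1]))
print(round(slopes, 3))
```

Only the point estimates of the two fixed effects versions coincide; the bare within lm() call does not correct its degrees of freedom for the estimated individual means, so its standard errors should not be trusted.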

The given problem uses the panel data set "Produc", which ships with the plm package.
We have to find which of the models (pooled, fixed or random) is appropriate for this panel data.



Tools: pFtest(fixed,pool) ; plmtest(pool) ; phtest( fixed, random) 

Commands :- 

> library(plm)
> data("Produc", package = "plm")
> head(Produc)
> pool <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp), data = Produc, model = "pooling", index = c("state", "year"))
> summary(pool)
# The pooling model is the ordinary least squares (OLS) regression run on the stacked data.
# log(pcap) is the outcome variable; the others are predictor variables.
# index = c("state", "year") declares the panel structure: "state" is the entity and "year" is the time index.
# Pr(>|t|): the two-tailed p-value for the null hypothesis that a coefficient equals 0. If it is below 0.05 (95% confidence; an alpha of 0.10 can also be chosen), we reject the null and say that the variable has a significant influence on the dependent variable (y).
# The F-statistic at the bottom of the summary tests whether all the slope coefficients are jointly zero; if its p-value is < 0.05, the model as a whole is significant.
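These reading rules are not specific to plm. A base R sketch with a plain lm() fit on hypothetical simulated data (x1 affects y, x2 does not) shows where the two kinds of p-values live and how the 0.05 threshold is applied:

```r
set.seed(1)

# Hypothetical data: x1 affects y, x2 is pure noise
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 0.8 * x1 + rnorm(n)
fit <- lm(y ~ x1 + x2)

alpha <- 0.05

# Per-coefficient two-sided p-values, H0: coefficient = 0
p_coef      <- coef(summary(fit))[, "Pr(>|t|)"]
significant <- p_coef < alpha

# Overall F-test, H0: all slope coefficients are jointly zero
f         <- summary(fit)$fstatistic   # named vector: value, numdf, dendf
p_overall <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

print(round(p_coef, 4))
print(significant)
cat("overall F-test p-value:", format.pval(p_overall), "\n")
```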

> fixed <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp), data = Produc, model = "within", index = c("state", "year"))
# "within" is plm's name for the fixed effects estimator: it removes each state's mean before running OLS.

> summary(fixed)

> random <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp), data = Produc, model = "random", index = c("state", "year"))
> summary(random)

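> pFtest(fixed, pool)
# F test for individual effects: the null hypothesis is that pooled OLS is adequate (no fixed effects). If the p-value is < 0.05, the fixed effects model is preferred over the pooled model.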
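As we understand it, pFtest compares the residual sum of squares of the pooled model (one common intercept) with that of the within model (one intercept per unit), testing whether the extra intercepts are all zero. A base R sketch of the same comparison on hypothetical simulated data; anova() on the two nested lm() fits should reproduce the hand-computed F statistic:

```r
set.seed(7)

n_ind  <- 20   # hypothetical number of units
n_time <- 8    # hypothetical number of periods
idx <- rep(seq_len(n_ind), each = n_time)
id  <- factor(idx)
mu  <- rnorm(n_ind, sd = 1.5)                 # individual effects are present
x   <- rnorm(n_ind * n_time)
y   <- 1 + 0.5 * x + mu[idx] + rnorm(n_ind * n_time)

pooled <- lm(y ~ x)          # restricted model: one common intercept
lsdv   <- lm(y ~ x + id)     # unrestricted model: one intercept per unit

# F = [(SSR_pooled - SSR_lsdv) / df1] / [SSR_lsdv / df2]
ssr_p  <- sum(resid(pooled)^2)
ssr_u  <- sum(resid(lsdv)^2)
df1    <- df.residual(pooled) - df.residual(lsdv)   # n_ind - 1 restrictions
df2    <- df.residual(lsdv)
F_stat <- ((ssr_p - ssr_u) / df1) / (ssr_u / df2)
p_val  <- pf(F_stat, df1, df2, lower.tail = FALSE)

print(anova(pooled, lsdv))
cat("F =", round(F_stat, 3), " p =", format.pval(p_val), "\n")
```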

> plmtest(pool)
# The plmtest, or Lagrange Multiplier (LM) test, helps us decide between a random effects regression and a simple OLS regression. The null hypothesis is that the variance of the unit-specific effects is zero, i.e. there is no significant difference across units and hence no panel effect. Here the p-value comes out as < 2.2e-16 (R's display for a p-value below machine precision), far below our 0.05 threshold. So in this case we reject the null hypothesis and conclude that the random effects model is better suited than pooled OLS.
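The LM statistic needs only the pooled OLS residuals. The sketch below applies the Breusch-Pagan formula for a balanced panel with n units and T periods to hypothetical simulated data. If we read the plm documentation correctly, plmtest() defaults to the one-sided Honda variant (type = "honda") and type = "bp" requests the Breusch-Pagan form, whose statistic is the square of Honda's:

```r
set.seed(3)

n_ind  <- 25   # hypothetical number of units (n)
n_time <- 6    # hypothetical number of periods (T)
idx <- rep(seq_len(n_ind), each = n_time)
mu  <- rnorm(n_ind, sd = 1)                   # random individual effects
x   <- rnorm(n_ind * n_time)
y   <- 1 + 0.5 * x + mu[idx] + rnorm(n_ind * n_time)

e <- resid(lm(y ~ x))          # pooled OLS residuals
N <- n_ind * n_time            # nT

# A = sum_i (sum_t e_it)^2 / sum_i sum_t e_it^2 - 1
A <- sum(tapply(e, idx, sum)^2) / sum(e^2) - 1

# Breusch-Pagan LM: chi-squared with 1 df under H0: var(mu) = 0
lm_bp <- N / (2 * (n_time - 1)) * A^2
p_bp  <- pchisq(lm_bp, df = 1, lower.tail = FALSE)

# Honda version: one-sided, standard normal under H0
lm_honda <- sqrt(N / (2 * (n_time - 1))) * A
p_honda  <- pnorm(lm_honda, lower.tail = FALSE)

cat("BP =", round(lm_bp, 2), " p =", format.pval(p_bp), "\n")
cat("Honda =", round(lm_honda, 2), " p =", format.pval(p_honda), "\n")
```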

> phtest(fixed, random)
# To decide between the fixed and random effects models we run the Hausman test, phtest(fixed, random). Its null hypothesis is that the unit-specific effects are not correlated with the regressors, so the null hypothesis corresponds to the random effects model: under it both estimators are consistent and the random effects estimator is more efficient. If the p-value is significant, i.e. < 0.05, we choose the fixed effects model; otherwise the random effects model.
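The Hausman statistic compares the two sets of slope estimates, scaled by the difference of their covariance matrices: H = (b_FE - b_RE)' [V_FE - V_RE]^(-1) (b_FE - b_RE), chi-squared with k degrees of freedom (k = number of compared coefficients) under the null. A minimal base R sketch; the function name hausman and the numbers in the example are hypothetical, and with real fits one would plug in the matching slope estimates and covariance matrices of the fixed and random models:

```r
# Hausman test statistic for comparing fixed and random effects slopes
hausman <- function(b_fe, b_re, V_fe, V_re) {
  d <- b_fe - b_re                           # difference in slope estimates
  H <- drop(t(d) %*% solve(V_fe - V_re) %*% d)
  k <- length(d)                             # degrees of freedom
  list(statistic = H, df = k,
       p.value = pchisq(H, df = k, lower.tail = FALSE))
}

# Hypothetical estimates for a single slope coefficient
res <- hausman(b_fe = 0.50, b_re = 0.40,
               V_fe = matrix(0.010), V_re = matrix(0.006))

# Decision rule from the text: significant -> fixed, otherwise random
decision <- if (res$p.value < 0.05) "fixed" else "random"
cat("H =", round(res$statistic, 3), " p =", round(res$p.value, 4),
    " ->", decision, "\n")
```

With these made-up numbers H = 0.1^2 / 0.004 = 2.5 on 1 degree of freedom, which is not significant at 0.05, so the rule keeps the random effects model.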











