Model selection using AIC in the presence of one-sided information


Journal of Statistical Planning and Inference 115 (2003) 397–411
www.elsevier.com/locate/jspi

Model selection using AIC in the presence of one-sided information

Anthony W. Hughes (a,*) and Maxwell L. King (b)

(a) School of Economics, University of Adelaide, Adelaide, SA 5005, Australia
(b) Department of Econometrics and Business Statistics, Monash University, Clayton, Vic. 3168, Australia

Received 20 September 2000; received in revised form 28 January 2002; accepted 18 February 2002

Abstract

In recent decades, econometricians and statisticians have become aware of the benefits of using non-sample information when conducting inference. This is most notable in the field of hypothesis testing, where considerable effort has gone into developing tests that utilize non-sample information in the form of inequality constraints; it is now well known that one-sided tests generally have higher power for a given size than the corresponding two-sided tests. In this paper, we extend the principles of one-sided hypothesis testing to the related area of model selection and develop an analogue of Akaike's information criterion that utilizes one-sided information. This criterion is widely applicable in problems where the signs of some or all of the parameters are known or can be inferred on the basis of a priori information. Examples include selecting between variance components models (such as random effects models for panel data), selecting the order of a (G)ARCH model, selecting between random coefficient models, and variable selection in the linear regression framework. Here, we investigate the small-sample performance of the new one-sided criterion relative to existing criteria using Monte Carlo simulation. We include two applications: variable selection in linear regression and selecting between various random effects models for panel data. We find that the new criterion performs consistently well across a wide variety of model selection problems.
© 2002 Elsevier Science B.V. All rights reserved.

MSC: C13; C5; C19

Keywords: Inequality constrained inference; Akaike's information criterion; Kullback–Leibler information; One-sided AIC

* Corresponding author. E-mail address: [email protected] (A.W. Hughes).

1. Introduction

Often when estimating a statistical model, we have access to information which is not contained in the data. Probably the most obvious and frequent examples of this involve knowledge about inequality restrictions on unknown parameter values. These can occur as the natural consequence of a theoretical understanding of the underlying process being modelled, of accumulated statistical evidence, or of functional considerations such as variances always being non-negative. If correctly harnessed, inequality restrictions can help us improve our inferences.

This improvement is most easily seen in the case of hypothesis testing. Here, knowledge of the sign of an unknown parameter under test can allow one to use a more efficient one-sided test in place of the standard two-sided test, with, in some cases, large improvements in power. For example, King and Evans (1984) conducted an empirical comparison of the power of one-sided versus two-sided Lagrange multiplier tests for additive heteroscedasticity in the general linear regression model. They found that, with respect to power, the one-sided test dominates its two-sided counterpart, with power improvements ranging from 39% to 63%. One can imagine that for multivariate one-sided tests, these gains in power could be even more substantial. This point is now well understood, and over the last two decades there has been a blossoming literature on tests designed to exploit these gains; see Wu and King (1994) for a recent review.
In contrast, almost nothing has been written on model selection in the presence of inequality restrictions. This is surprising in light of the close relationship between model selection procedures based on information criteria (IC) and the likelihood ratio test (LRT). As is well known, IC-based model selection procedures involve choosing the model with the largest maximized likelihood minus a penalty which reflects the complexity of the model. As pointed out by Pötscher (1991), this is equivalent to testing each model against all other models by means of a standard LRT and selecting the model which is accepted against all the others, with the critical values of these tests determined by the differences in penalties for each pair of models.

The aim of this paper is to modify Akaike's IC (AIC) procedure so that it appropriately utilizes inequality information on parameters when such information is present. There are two parts to this modification. The first involves the use of inequality constrained estimates when maximizing the likelihood, and the second involves a modification to the penalties. In the hypothesis testing literature, critical values of the LRT come from the chi-squared distribution with k degrees of freedom when k parameters are being tested and there are no inequality restrictions on these parameters. This changes to a weighted mixture of chi-squared distributions with up to k degrees of freedom when there are inequality restrictions on the parameters under test. Consequently, critical values are smaller than in the unconstrained parameter case. By analogy, we therefore expect penalty differences to be smaller for IC model selection procedures in the presence of inequality restrictions. This reasoning is similar to that used by Anraku (1999) for the problem of model selection under simple order restrictions.
There is a well-known duality between the problems of order restricted inference and problems involving inequality constraints; in our view, considerable utility can be gained via an extension of this methodology to problems involving inequality constraints.

We chose AIC because of its popularity, although others have noted that, in some circumstances, it can have a non-zero probability of overfitting asymptotically. However, as observed by Stone (1979), if we allow the true model to become more complex as the sample size increases, AIC will select the true model with certainty asymptotically. More recently, Kwek and King (1997, 1998) have compared a wide range of model selection procedures in the context of selecting different autoregressive conditional heteroscedastic (ARCH) models using samples which range in size from 60 to 500. This is a model selection problem in which there are inequality restrictions on the parameters because variances cannot be negative. In terms of probabilities of correct selection, they found AIC was one of the best performers (particularly for larger sample sizes) and Schwarz's (1978) Bayesian IC (BIC) was clearly the worst performer, having great difficulty selecting larger models when they were the true model.

An outline of the paper is as follows. In the next section, the derivation of one-sided AIC (OSAIC) is presented. This criterion represents an asymptotically unbiased estimate of the Kullback–Leibler discrepancy when one-sided information is available. Section 3 outlines a series of Monte Carlo experiments designed to assess the relative performance of OSAIC and AIC for two model selection problems where one-sided information is commonly available. The first of these is variable selection in the classical linear regression model.
In this case, the signs of the variables included in the model are implied by theory; in a practical sense, we suggest that a modeller may want to impose the restrictions and thus obtain a boost in terms of efficiency. The second case is the problem of selecting between random effects models for panel data; here, the inequality restrictions are due to the fact that variances must always be non-negative. For the case of variable selection, application of the one-sided information carries the risk that the theory is inappropriate, whereas enforcing the rule that variances are non-negative is risk free. One may suggest that a greater statistical return may be derived by taking additional statistical risk, but the Monte Carlo results of Evans and King (1984) and Baltagi et al. (1992) suggest that significant power improvements are possible without additional risk. In this paper, we investigate whether such improvements are also possible for the case of model selection. Section 4 contains a discussion of the results of the Monte Carlo experiments. Some concluding remarks are made in the final section.

2. Inequality constrained model selection

Our interest is in choosing between q statistical models, each with log-likelihood function $L_i(\theta(k_i))$, which depends on the $k_i \times 1$ parameter vector $\theta(k_i)$, $i = 1, \dots, q$. Inference is made using x, an $n \times 1$ vector of observations from the unknown data generating process (DGP). Let $\hat\theta(k_i)$ denote the maximum likelihood estimator of $\theta(k_i)$ based on assuming that model i is true. Akaike's (1973, 1974) model selection procedure is based on an asymptotically unbiased estimator of the Kullback–Leibler discrepancy between the DGP and model i with $\theta(k_i) = \hat\theta(k_i)$, namely

$\Delta(\hat\theta(k_i)) = \text{constant} - E_X[L_i(\hat\theta(k_i))].$   (1)

The aim is to find the model with the smallest $\Delta(\hat\theta(k_i))$ value because this model is closest on average to the DGP.
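To fix ideas, the non-constant part of the discrepancy in (1) can be approximated by simulation. The sketch below is our own illustration, not from the paper (the function name and settings are ours): it estimates $-E_X[\log f(X)]$ for a fitted normal model under a normal DGP, and shows that the discrepancy is smallest when the fitted parameters match the true ones.

```python
import numpy as np

rng = np.random.default_rng(0)

def neg_expected_loglik(mu_hat, sigma_hat, mu_true=0.0, sigma_true=1.0, n_draws=100_000):
    """Monte Carlo estimate of -E_X[log f(X; mu_hat, sigma_hat)] under the true
    DGP X ~ N(mu_true, sigma_true^2): the non-constant part of the
    Kullback-Leibler discrepancy in (1) for a fitted normal model."""
    x = rng.normal(mu_true, sigma_true, n_draws)
    loglik = -0.5 * np.log(2 * np.pi * sigma_hat**2) - (x - mu_hat) ** 2 / (2 * sigma_hat**2)
    return -loglik.mean()

# The discrepancy is smallest when the fitted model matches the DGP:
print(neg_expected_loglik(0.0, 1.0))   # close to 0.5*log(2*pi*e), about 1.419
print(neg_expected_loglik(0.5, 1.0))   # larger: the misspecified mean adds about 0.125
```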
Clearly the constant term in (1) can be ignored. Akaike's (1973) asymptotically unbiased estimator of $-E_X[L_i(\hat\theta(k_i))]$, the non-constant part of the Kullback–Leibler discrepancy, is known as AIC and is (up to a multiplicative constant)

$\mathrm{AIC}_i = -L_i(\hat\theta(k_i)) + k_i.$

The AIC procedure involves selecting the model for which $\mathrm{AIC}_i$ is a minimum.

In this section, we derive an asymptotically unbiased estimate of the Kullback–Leibler information for the case where one-sided inequality constraints apply to some or all of the parameters under consideration. We assume that this information holds with certainty and that there are $p_i$ such constraints on the $k_i$ parameters in model i. Our aim is to develop a model selection procedure analogous to AIC that makes appropriate use of one-sided inequality constraints on model parameters.

2.1. One-sided AIC (OSAIC)

Let $g(x|\theta^*)$ be the joint density of the true DGP, with $\theta^*$ being the true parameter vector. In attempting to estimate the non-constant term in (1), Akaike (1973, 1974) defines the mean expected log-likelihood for model i as

$\bar{\bar L}_i(k_i) = E_X(\bar L_i(\hat\theta(k_i))) = \int \bar L_i(\hat\theta(k_i))\, g(x|\theta^*)\, dx,$   (2)

where $\bar L_i(\hat\theta(k_i))$ is the expected log-likelihood function evaluated at $\hat\theta(k_i)$, the (unconstrained) maximum likelihood estimator of $\theta(k_i)$ in model i. The analogue of this expression which takes into account the one-sided inequality constraints is (2) with $\hat\theta(k_i)$ replaced by $\tilde\theta(k_i)$, the inequality constrained maximum likelihood estimator for model i. For ease of exposition, we will begin by assuming $p_i = k_i$, i.e., all parameters are inequality constrained. Taking a Taylor series expansion of $\bar L_i(\theta)$ about $\theta^*$ in the case where $L_i(\theta)$ is the true log-likelihood function, we obtain

$\bar L_i(\theta) \approx \bar L_i(\theta^*) + (\theta - \theta^*)'\, E_X(q(\theta)|_{\theta=\theta^*}) + \tfrac{1}{2}(\theta - \theta^*)'\, E_X(H(\theta)|_{\theta=\theta^*})\,(\theta - \theta^*),$   (3)

where $q(\theta)$ is the score vector and $H(\theta)$ is the Hessian matrix.
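Before turning to the one-sided case, the unconstrained criterion $\mathrm{AIC}_i = -L_i(\hat\theta(k_i)) + k_i$ is straightforward to compute for Gaussian regression models. The following is our own sketch (the function names are illustrative, not from the paper), using the concentrated Gaussian log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(1)

def gaussian_loglik(y, X):
    """Maximized Gaussian log-likelihood of the linear regression y = X b + u,
    with the ML variance estimate concentrated out."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    n = len(y)
    sigma2 = resid @ resid / n
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)

def aic(y, X):
    """AIC_i = -L_i(theta_hat) + k_i, counting the regression coefficients
    plus the variance parameter; the smallest value wins."""
    k = X.shape[1] + 1
    return -gaussian_loglik(y, X) + k

# Toy use: choose between a 1-regressor and a 2-regressor model.
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)
print(aic(y, X[:, :1]), aic(y, X))   # the 2-regressor model should score lower
```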
As the expected score vector evaluated at the true parameter value is zero, the second term of (3) vanishes, leaving

$\bar L_i(\theta) \approx \bar L_i(\theta^*) - \tfrac{1}{2}(\theta - \theta^*)'\, I|_{\theta=\theta^*}\,(\theta - \theta^*),$   (4)

where I is the information matrix.

Now consider the case where $\theta = \tilde\theta(k_i)$. If the true model occurs at the point in the parameter space where all the inequality constrained parameters lie on the boundary of the feasible set, then the distribution of twice the second term in (4) is asymptotically equivalent to the distribution under $H_0$ of the inequality constrained Wald test statistic as outlined by Gouriéroux and Monfort (1995). This asymptotic distribution is a probability weighted mixture of chi-squared distributions of the form

$\sum_{m=0}^{p_i} w(p_i, m)\, \chi^2(m),$

where the $w(p_i, m)$ are probability weights such that $\sum_m w(p_i, m) = 1$ and $\chi^2(m)$ is the chi-squared distribution with m degrees of freedom. Taking expectations of both sides of (4) for this particular case, we obtain

$\bar{\bar L}_i(k_i) = E_X(\bar L_i(\tilde\theta(k_i))) \approx \bar L_i(\theta^*) - \tfrac{1}{2} \sum_{m=0}^{p_i} w(p_i, m)\, m.$   (5)

If we now consider the Taylor series expansion of the true value of the log-likelihood, $L_i(\theta^*)$, around the inequality constrained maximum likelihood estimate $\tilde\theta(k_i)$, we obtain

$L_i(\theta^*) \approx L_i(\tilde\theta(k_i)) + (\theta^* - \tilde\theta(k_i))'\, q(\theta)|_{\theta=\tilde\theta(k_i)} + \tfrac{1}{2}(\theta^* - \tilde\theta(k_i))'\, H(\theta)|_{\theta=\tilde\theta(k_i)}\,(\theta^* - \tilde\theta(k_i)).$   (6)

Again the second term vanishes, this time because for the jth element of $\theta$ either $\tilde\theta_j(k_i) = \theta^*_j$ or $q_j(\theta)|_{\theta=\tilde\theta(k_i)} = 0$. Thus we have

$L_i(\theta^*) \approx L_i(\tilde\theta(k_i)) - \tfrac{1}{2}(\theta^* - \tilde\theta(k_i))'\, I|_{\theta=\tilde\theta(k_i)}\,(\theta^* - \tilde\theta(k_i)),$   (7)

where the second term of this expression is similar to the second term of (4) evaluated at the inequality constrained maximum likelihood estimates.
Taking expectations as before, we obtain

$E_X(L_i(\theta^*)) \approx E_X(L_i(\tilde\theta(k_i))) - \tfrac{1}{2} \sum_{m=0}^{p_i} w(p_i, m)\, m$

or, equivalently,

$\bar L_i(\theta^*) \approx \bar L_i(\tilde\theta(k_i)) - \tfrac{1}{2} \sum_{m=0}^{p_i} w(p_i, m)\, m.$   (8)

Because the maximized log-likelihood, $L_i(\tilde\theta(k_i))$, is an asymptotically unbiased estimator of $\bar L_i(\tilde\theta(k_i))$, combining (5) and (8) gives

$\bar{\bar L}_i(k_i) \approx L_i(\tilde\theta(k_i)) - \sum_{m=0}^{p_i} w(p_i, m)\, m.$   (9)

We now consider the case of $p_i < k_i$ where the true model occurs at a point in the parameter space at which $r_i$ of the $p_i$ inequality constraints hold exactly. Proceeding as above, we again get (4), but now the distribution of twice the second term in (4) is asymptotically equivalent to the distribution under $H_0$ of the partially inequality constrained Wald test statistic as outlined by Kodde and Palm (1986) and Wu and King (1994). This asymptotic distribution is of the form

$\sum_{m=0}^{r_i} w(r_i, m)\, \chi^2(k_i - r_i + m),$

and implies that (5) now becomes

$\bar{\bar L}_i(k_i) = \bar L_i(\theta^*) - \tfrac{1}{2} \sum_{m=0}^{r_i} w(r_i, m)(k_i - r_i + m).$   (10)

Taking the Taylor series expansion of $L_i(\theta^*)$ around $\tilde\theta(k_i)$, we still obtain (6) and (7), but now (8) becomes

$\bar L_i(\theta^*) \approx \bar L_i(\tilde\theta(k_i)) - \tfrac{1}{2} \sum_{m=0}^{r_i} w(r_i, m)(k_i - r_i + m).$   (11)

Again estimating $\bar L_i(\tilde\theta(k_i))$ by $L_i(\tilde\theta(k_i))$ and combining (10) and (11) gives

$\bar{\bar L}_i(k_i) \approx L_i(\tilde\theta(k_i)) - \sum_{m=0}^{r_i} w(r_i, m)(k_i - r_i + m).$   (12)

When $r_i = 0$, (12) reduces to

$\bar{\bar L}_i(k_i) \approx L_i(\tilde\theta(k_i)) - k_i,$   (13)

which, when $\tilde\theta(k_i)$ is replaced by $\hat\theta(k_i)$, is Akaike's (1973, 1974) result in the unconstrained case. Eq. (12) is similar to the criterion developed by Anraku (1999), although our criterion encompasses the case where there is a combination of unconstrained and inequality constrained parameters in the choice set. If $r_i = k_i$ in (12), this criterion is identical to that developed by Anraku (1999).
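The penalty in (12) is a weighted count of effective parameters. A small sketch of its computation (our own illustration, not the paper's code): in the orthogonal case the weights reduce to binomial probabilities $w(r, m) = \binom{r}{m}/2^r$, while in general they depend on the correlations between the constrained estimators, as discussed in Section 2.2.

```python
from math import comb

def osaic_penalty(k, r, weights=None):
    """Penalty term of (12): sum over m of w(r, m) * (k - r + m).
    If no weights are supplied, assume the orthogonal-constraints case
    w(r, m) = C(r, m) / 2**r; in general the weights depend on the
    correlations between the constrained estimators (Section 2.2)."""
    if weights is None:
        weights = [comb(r, m) / 2**r for m in range(r + 1)]
    return sum(w * (k - r + m) for m, w in enumerate(weights))

# r = 0 recovers the usual AIC penalty k; binding constraints shrink it:
print(osaic_penalty(5, 0))   # 5.0
print(osaic_penalty(5, 3))   # 3.5, i.e. (k - r) + E[m] with E[m] = r/2
```

Each constrained parameter on the boundary therefore contributes only half a degree of freedom on average, which is exactly why the OSAIC penalty sits below the AIC penalty.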
Our results suggest selecting the model for which $\bar{\bar L}_i(k_i)$ is largest, $i = 1, \dots, q$, where $\bar{\bar L}_i(k_i)$ is calculated using the right-hand side of (9), (12) or (13) when $r_i = k_i$, $0 < r_i < k_i$, or $r_i = 0$, respectively. Because $\theta^*$ is unknown, and hence we cannot know $r_i$, the number of inequality constraints that hold exactly, this procedure cannot be applied in practice. Our findings suggest that the AIC penalty function should be decreased as $r_i$ increases. The two extremes are to use (13), which is regular AIC with unconstrained estimates replaced by inequality constrained estimates, and (12) with $r_i = p_i$. We call the former AIC1 and the latter OSAIC.

We recommend the use of OSAIC for the following reasons. Information criteria such as AIC are made up of two constituent parts: the penalty function, which is related to parsimony, and the maximized log-likelihood function, which measures the goodness of fit of the model. We know that the maximized log-likelihood can invariably be increased by removing a binding constraint, so adding extraneous unconstrained parameters will have the effect of increasing the maximized log-likelihood. However, when extraneous inequality constrained parameters are added, there is a good chance that the maximized log-likelihood cannot be increased, because to do so the extra parameter value would have to violate the constraint. The role of the penalty function is to provide a benchmark for judging whether the addition of extra parameters increases the maximized log-likelihood more than would be expected if these parameters were extraneous. Because inequality constraints reduce parameter uncertainty and also reduce the average increase in the maximized log-likelihood function, we should use the penalty which assumes the inequality constraints hold exactly.

2.2.
Determination of the weights

OSAIC, like multivariate one-sided tests, relies on the determination of the weights $w(r_i, m)$, $m = 0, \dots, r_i$; because of the importance of one-sided testing, these weights have received a considerable amount of attention in the statistics literature. Despite this, closed-form expressions for $w(r_i, m)$ are only available for problems of low dimensionality ($r_i < 4$), since the weights depend on the computation of multi-dimensional normal integrals. Early work by authors such as Kudô (1963) and Gupta (1963) involved formulating the problem, describing its complexity and providing the results given below. For a more detailed discussion of the problem and its solutions, see Gouriéroux and Monfort (1995), Shapiro (1988) or Wu and King (1994).

In the case of $r_i = 1$, due to symmetry, we have $w(1,0) = w(1,1) = 0.5$. For $r_i = 2$, however, correlation between the constrained coefficients causes asymmetry, so that $w(2,0) = \cos^{-1}(\rho)/(2\pi)$, $w(2,1) = 0.5$ and $w(2,2) = 0.5 - w(2,0)$, where $\rho$ is the correlation obtained as follows. Let $\Sigma_i$ denote the covariance matrix of $\hat\theta(k_i)$ and let $R_i$ be a $2 \times k_i$ matrix such that the two exact constraints can be written as $R_i \theta(k_i) = 0$. Then $\rho$ is the correlation obtained from treating $R_i \Sigma_i R_i'$ as a $2 \times 2$ covariance matrix. For $r_i = 3$ we have

$w(3,3) = (2\pi - \cos^{-1}\rho_{12} - \cos^{-1}\rho_{13} - \cos^{-1}\rho_{23})/(4\pi),$
$w(3,2) = (3\pi - \cos^{-1}\rho_{12\cdot 3} - \cos^{-1}\rho_{13\cdot 2} - \cos^{-1}\rho_{23\cdot 1})/(4\pi),$
$w(3,1) = (2\pi - \cos^{-1}\rho_{12} - \cos^{-1}(-\rho_{13}) - \cos^{-1}\rho_{23})/(4\pi),$
$w(3,0) = 1 - w(3,3) - w(3,2) - w(3,1),$

where $\rho_{ij}$ and $\rho_{ij\cdot k}$ are correlation and partial correlation coefficients obtained from treating $R_i \Sigma_i R_i'$ as a $3 \times 3$ covariance matrix, $R_i$ now being the appropriate $3 \times k_i$ exact constraint matrix. For higher values of $r_i$, the weights have to be obtained numerically. A computer program for this purpose has been written by Bohrer and Chow (1978), while Wolak (1987) has described how the weights can be generated by Monte Carlo simulation.

3.
The Monte Carlo experiments

In order to investigate and compare the small-sample properties of OSAIC, AIC1 and AIC in the presence of inequality constraints, two Monte Carlo experiments were conducted. The first involved regressor selection in the context of the classical linear regression model, with the inequality constraints being in the form of knowledge of the signs of particular regression coefficients, something that is relatively common, particularly in econometric applications of this model. The second experiment is set in the context of a linear regression model based on panel data, where there is a question of which variance components (individual effects and/or time effects) should be included in the model. Inequality constraints naturally occur in this problem because variances cannot be negative. In the remainder of this paper, AIC2 denotes AIC applied ignoring any inequality constraints.

3.1. Variable selection in the classical linear regression model

In this experiment, we chose q = 8, so the problem is one of choosing between the eight possible classical linear regression models each containing $x_{1j}$ and nested within

$y_j = \beta_0 x_{1j} + \beta_1 x_{2j} + \beta_2 x_{3j} + \beta_3 x_{4j} + u_j, \quad j = 1, \dots, n,$   (14)

where $u_j \sim IN(0, \sigma^2)$, $\beta_0, \dots, \beta_3$ are regression coefficients and $x_{1j}, \dots, x_{4j}$ are observations on four nonstochastic regressors. Because regressor behaviour impacts on the small-sample properties of each of the criteria, we considered five different design matrices for (14), with n = 30 and 60. These are:

X1: A constant and three sets of pseudo-random numbers from the uniform distribution with range (5, 10). These values were generated once and then held fixed.
X2: A constant, Australian quarterly real consumption data and the same series lagged one and two periods. This matrix has a high degree of positive correlation between the constrained regressors.
X3: A constant and three quarterly seasonal dummies.
X4: The same as X2 but with the unlagged consumption multiplied by negative one. This induces negative correlation between the second and third and the second and fourth regressors, while there is positive correlation between the third and fourth regressors.
X5: Four quarterly seasonal dummies. In this case the regressors are all orthogonal.

We assume $\beta_0$ is unconstrained and that $\beta_1$, $\beta_2$ and $\beta_3$ are inequality constrained in the sense that they are known to be non-negative. The true values of $\beta_1$, $\beta_2$ and $\beta_3$ were set to all possible combinations of 0, 0.5, 1.0, 1.5 and 2.0. Our results are invariant to the true value of $\beta_0$, which was held constant. The standard deviation of the errors, $\sigma$, was set to 5, 9000, 2, 9000 and 2 for X1–X5, respectively. Thus the problem of choosing between the 8 models nested in (14) was considered for 125 values of $\theta^*$, for 5 different data sets and 2 sample sizes. Where required, inequality constrained parameter estimates were calculated using the IMSL quadratic programming subroutine DQPROG.

3.2. Model selection in the random effects framework

The purpose of our second experiment is to assess the relative performance of OSAIC and AIC1 when we select between different representations of the variance-covariance matrix, a situation where one-sided information is commonly available. One such framework is that where the variance of the error term can be separated into different components. Possibly the most interesting of these models, particularly from an econometric perspective, is the random effects model for panel data, which has become increasingly popular in recent years.
The two-way random effects model can be expressed as

$y_{it} = x_{it}'\beta + u_{it}, \quad u_{it} = \mu_i + \lambda_t + \nu_{it}, \quad i = 1, \dots, N; \; t = 1, \dots, T,$

with $\mu_i \sim IN(0, \sigma_\mu^2)$, $\lambda_t \sim IN(0, \sigma_\lambda^2)$ and $\nu_{it} \sim IN(0, \sigma_\nu^2)$, where $y_{it}$ is a dependent variable of interest mapping N individuals over T time periods and $x_{it}$ is an associated explanatory variable vector. In estimating this model, we may restrict either the individual or the time random effects, whose variances are $\sigma_\mu^2$ and $\sigma_\lambda^2$, respectively, to be zero, in which case we get a one-way error components model. Conversely, we can restrict both components to be zero, so that the model fits into the classical linear regression framework. Thus there are four potential models depending on which combinations of $\sigma_\mu^2$ and $\sigma_\lambda^2$ are zero or positive. See Baltagi et al. (1992) for further discussion.

This experiment involved simulating samples from each of the models, then estimating each of the models by inequality constrained maximum likelihood (or OLS when $\sigma_\mu^2 = \sigma_\lambda^2 = 0$) and then selecting a model using each of OSAIC and AIC1. AIC2 was not considered in this experiment due to the difficulty of interpreting models with negative variance estimates. We used four sample sizes, namely N = 25, T = 10; N = 15, T = 10; N = 15, T = 15; and N = 100, T = 5. Throughout, only one regressor was used. It was generated as the stationary process

$x_{it} = 0.5 x_{i,t-1} + w_t, \quad w_t \sim IN(0, 1), \quad t = 1, \dots, T,$

independently for each value of i, $i = 1, \dots, N$, and then held constant in the simulations. The values of $\sigma_\mu^2$ and $\sigma_\lambda^2$ were set to all possible combinations of 0, 2 and 4, and $\sigma_\nu^2$ was set to 30 for each of the simulations. The inequality constrained estimation was performed using the IMSL subroutine DNCONF.

4. Results and discussion

4.1. Variable selection in the classical linear regression model

In total, the experimental design involved the estimation of 30 000 probabilities of selecting a model.
This is a large number of probabilities to digest, so we have summarized these results in Tables 1 and 2. Fuller results are available from the authors on request. In Table 1 we present average probabilities of selecting the correct model using AIC2, AIC1 and OSAIC. The probabilities in the ith column (under Mi) are estimated probabilities averaged over all DGPs for model Mi. M1 is (14) with no zero regression coefficients; M2, M3 and M4 are (14) with, respectively, only $\beta_3 = 0$, $\beta_2 = 0$ and $\beta_1 = 0$ and all other coefficients non-zero. Models M5, M6 and M7 have, respectively, only $\beta_3$, $\beta_2$ and $\beta_1$ non-zero out of $\beta_1$, $\beta_2$ and $\beta_3$. M8 is (14) with $\beta_1 = \beta_2 = \beta_3 = 0$. The ninth column in Table 1 provides an unweighted average (UA) of estimated probabilities of correct selection across all 125 DGPs. In recognition of the fact that different models have different numbers of DGPs (e.g., M8 has one while M1 has 64 DGPs), the last column is the weighted average (WA) of estimated probabilities of correct selection that gives equal weight to each model (effectively a simple average of the first 8 columns).

Table 1
Average estimated probabilities of selecting the correct model (regressors) in the linear regression model using AIC2, AIC1 and OSAIC

             M1     M2     M3     M4     M5     M6     M7     M8     UA     WA
X1, n = 30
  AIC2     0.326  0.361  0.394  0.382  0.450  0.413  0.475  0.573  0.360  0.422
  AIC1     0.319  0.398  0.434  0.421  0.546  0.518  0.582  0.756  0.382  0.496
  OSAIC    0.454  0.458  0.516  0.514  0.538  0.519  0.536  0.560  0.479  0.512
X1, n = 60
  AIC2     0.494  0.494  0.513  0.525  0.537  0.536  0.557  0.584  0.506  0.530
  AIC1     0.490  0.543  0.565  0.580  0.636  0.653  0.668  0.768  0.536  0.613
  OSAIC    0.620  0.586  0.622  0.626  0.584  0.608  0.593  0.563  0.614  0.600
X2, n = 30
  AIC2     0.027  0.047  0.064  0.047  0.372  0.335  0.366  0.631  0.073  0.236
  AIC1     0.002  0.025  0.049  0.024  0.499  0.452  0.493  0.864  0.067  0.301
  OSAIC    0.497  0.217  0.000  0.214  0.449  0.414  0.454  0.774  0.358  0.377
X2, n = 60
  AIC2     0.335  0.462  0.460  0.447  0.603  0.608  0.608  0.680  0.410  0.525
  AIC1     0.327  0.507  0.507  0.487  0.764  0.760  0.759  0.897  0.440  0.626
  OSAIC    0.706  0.576  0.574  0.572  0.495  0.480  0.484  0.799  0.635  0.586
X3, n = 30
  AIC2     0.153  0.227  0.216  0.217  0.359  0.351  0.337  0.576  0.201  0.304
  AIC1     0.152  0.272  0.262  0.266  0.447  0.427  0.409  0.733  0.227  0.371
  OSAIC    0.208  0.366  0.355  0.365  0.469  0.467  0.443  0.507  0.293  0.397
X3, n = 60
  AIC2     0.269  0.358  0.355  0.354  0.453  0.472  0.481  0.584  0.324  0.416
  AIC1     0.268  0.420  0.414  0.414  0.549  0.566  0.573  0.742  0.357  0.493
  OSAIC    0.346  0.503  0.499  0.496  0.551  0.556  0.561  0.490  0.426  0.500
X4, n = 30
  AIC2     0.053  0.176  0.195  0.047  0.378  0.336  0.366  0.631  0.120  0.273
  AIC1     0.047  0.197  0.226  0.027  0.682  0.511  0.499  0.753  0.142  0.368
  OSAIC    0.131  0.218  0.253  0.355  0.722  0.486  0.472  0.567  0.231  0.400
X4, n = 60
  AIC2     0.391  0.510  0.504  0.447  0.597  0.608  0.608  0.680  0.451  0.543
  AIC1     0.388  0.566  0.564  0.503  0.856  0.763  0.758  0.787  0.490  0.648
  OSAIC    0.525  0.511  0.507  0.798  0.845  0.543  0.557  0.585  0.568  0.609
X5, n = 30
  AIC2     0.250  0.326  0.306  0.305  0.416  0.410  0.388  0.533  0.291  0.367
  AIC1     0.241  0.356  0.334  0.337  0.510  0.493  0.483  0.759  0.308  0.439
  OSAIC    0.387  0.445  0.424  0.429  0.503  0.487  0.486  0.585  0.416  0.468
X5, n = 60
  AIC2     0.494  0.494  0.513  0.525  0.537  0.535  0.557  0.584  0.505  0.530
  AIC1     0.490  0.543  0.565  0.580  0.636  0.653  0.668  0.768  0.536  0.612
  OSAIC    0.620  0.586  0.622  0.626  0.584  0.608  0.593  0.563  0.614  0.600

Table 2
Average estimated probabilities of selecting the correct linear regression model, an overparameterized linear model, an underparameterized linear model and a misfitted linear model using AIC2, AIC1 and OSAIC

             Correct model   Overparameterized  Underparameterized  Misfitted model
               UA     WA       UA     WA          UA     WA           UA     WA
X1, n = 30
  AIC2       0.360  0.422    0.063  0.180       0.539  0.335        0.038  0.063
  AIC1       0.382  0.496    0.034  0.098       0.566  0.376        0.017  0.029
  OSAIC      0.479  0.512    0.069  0.190       0.426  0.257        0.026  0.041
X1, n = 60
  AIC2       0.506  0.530    0.070  0.188       0.397  0.240        0.027  0.042
  AIC1       0.536  0.613    0.038  0.103       0.415  0.265        0.012  0.019
  OSAIC      0.614  0.600    0.076  0.202       0.293  0.173        0.017  0.025
X2, n = 30
  AIC2       0.073  0.236    0.049  0.154       0.662  0.334        0.063  0.124
  AIC1       0.067  0.301    0.006  0.029       0.694  0.361        0.058  0.128
  OSAIC      0.358  0.377    0.092  0.201       0.334  0.154        0.075  0.123
X2, n = 60
  AIC2       0.410  0.525    0.056  0.156       0.483  0.222        0.051  0.096
  AIC1       0.440  0.626    0.013  0.040       0.498  0.234        0.049  0.100
  OSAIC      0.635  0.586    0.124  0.262       0.188  0.073        0.053  0.079
X3, n = 30
  AIC2       0.201  0.304    0.044  0.148       0.716  0.467        0.040  0.080
  AIC1       0.227  0.371    0.035  0.104       0.727  0.503        0.011  0.022
  OSAIC      0.293  0.397    0.056  0.183       0.629  0.381        0.021  0.039
X3, n = 60
  AIC2       0.324  0.416    0.050  0.158       0.590  0.362        0.036  0.064
  AIC1       0.357  0.493    0.038  0.108       0.599  0.386        0.007  0.013
  OSAIC      0.426  0.500    0.061  0.191       0.499  0.284        0.014  0.024
X4, n = 30
  AIC2       0.120  0.273    0.033  0.134       0.772  0.438        0.074  0.155
  AIC1       0.142  0.368    0.019  0.081       0.800  0.472        0.039  0.079
  OSAIC      0.231  0.400    0.041  0.147       0.688  0.376        0.040  0.077
X4, n = 60
  AIC2       0.451  0.543    0.056  0.156       0.442  0.203        0.052  0.098
  AIC1       0.490  0.648    0.032  0.093       0.456  0.216        0.022  0.043
  OSAIC      0.568  0.609    0.075  0.209       0.336  0.150        0.020  0.031
X5, n = 30
  AIC2       0.291  0.367    0.057  0.175       0.610  0.387        0.042  0.072
  AIC1       0.308  0.439    0.028  0.089       0.642  0.433        0.022  0.038
  OSAIC      0.416  0.468    0.064  0.179       0.487  0.298        0.033  0.055
X5, n = 60
  AIC2       0.505  0.530    0.066  0.184       0.458  0.279        0.030  0.049
  AIC1       0.536  0.612    0.033  0.095       0.477  0.308        0.016  0.027
  OSAIC      0.614  0.600    0.072  0.192       0.338  0.202        0.023  0.037
Table 2 provides a comparison of the UA and WA of the estimated probabilities of each of the procedures choosing (i) the correct model; (ii) an overparameterized model (at least one parameter in the chosen model is zero in the DGP); (iii) an underparameterized model (the chosen model has fewer non-zero parameters than the DGP); and (iv) a misfitted model (there is at least one regressor in the DGP not in the chosen model and vice versa).

When the average probability of correct selection is used to measure the relative efficiency of the criteria, OSAIC outperformed both AIC1 and AIC2 in every experiment considered. OSAIC also performed better than AIC2 in every experiment when the weighted average of correct selection was used as a performance measure. When this measure was used to compare OSAIC with AIC1, we see that for n = 30, OSAIC outperformed AIC1 on every occasion. However, when n = 60, AIC1 outperformed OSAIC for four of the five design matrices considered.

A comparison of OSAIC and AIC1 based on conditional averages reveals that, generally, OSAIC outperformed AIC1 when there are two or three inequality constrained parameters contained in the true model, but where there are one or no such parameters, AIC1 generally performed better than OSAIC. This is to be expected because the only difference between AIC1 and OSAIC is the penalty function, which is smaller for OSAIC, therefore favouring models with more parameters. In comparing OSAIC with AIC2, we see that only when the smallest model is true did AIC2 perform comparably to OSAIC. This indicates that gains in estimator efficiency are able to outweigh the disadvantage of a smaller penalty when attempting to select a small correct model.

The effect of correlation between the regressors on model selection performance is marked. For X2, for instance, there is extreme multicollinearity, which leads to a relatively difficult model selection problem. In this case, the benefit of OSAIC can clearly be seen.
The penalty for OSAIC for the largest model considered in the experiment is 2.709, whereas for AIC the penalty is 5; the penalty for the smallest model considered is 2 for both OSAIC and AIC. This means that OSAIC is less likely than AIC to incorrectly omit variables that are highly correlated with other variables considered in the model selection problem. It can be seen from Table 1 that OSAIC's performance in correctly selecting the larger models far exceeds the performance of AIC1 and AIC2 for this design matrix.

The effect of increasing the sample size with AIC-type criteria is that it leads to a decrease in the probability of underfitting the true process, but does not necessarily lead to an increase in the probability of correct selection. This can clearly be seen in Table 1 for all design matrices under consideration. As OSAIC penalizes additional one-sided parameters less heavily than AIC1, OSAIC has a greater tendency to overparameterize than underparameterize relative to AIC1.

Table 3
Average estimated probabilities of selecting the correct random effects model using AIC1 and OSAIC

                 Time and      Individual   Time       Classical            UA     WA
                 individual    effects      effects    linear regression
                 effects

N = 25, T = 10; 500 replications
  AIC1           0.85          0.45         0.65       0.86                 0.72   0.71
  OSAIC          0.91          0.34         0.71       0.73                 0.72   0.67
N = 15, T = 15; 500 replications
  AIC1           0.80          0.82         0.83       0.84                 0.82   0.83
  OSAIC          0.88          0.78         0.80       0.73                 0.82   0.80
N = 15, T = 10; 500 replications
  AIC1           0.65          0.52         0.58       0.88                 0.63   0.66
  OSAIC          0.77          0.46         0.64       0.75                 0.67   0.66
N = 100, T = 5; 200 replications
  AIC1           0.92          0.91         0.89       0.87                 0.91   0.90
  OSAIC          0.96          0.84         0.84       0.73                 0.88   0.84

This can be seen clearly in Table 2 for every design matrix and sample size. If the aim of the practitioner is to test hypotheses using the selected model, then the relative costs of underparameterization and overparameterization must be considered.
Whilst the cost of overparameterization is a loss of estimator efficiency and a loss of power in subsequent hypothesis testing, the cost of underparameterization is an omitted variable bias which will bring the validity of any hypothesis test that follows into question. Though neither error is desirable, we believe that the cost of overparameterization is relatively less than that of underparameterization. Given this, if the average probability of correct selection is the same for two criteria, the one that underparameterizes least is preferable. The Monte Carlo results suggest that OSAIC's overall performance is generally better than AIC1's in terms of correct selection, and because this is coupled with a lower average probability of underparameterization, OSAIC can be preferred to AIC1.

4.2. Model selection in the random effects framework

The results from the second experiment are contained in Table 3. The output is presented in a similar manner to that for variable selection. Generally, the results for these experiments are similar to those obtained for variable selection in the linear regression model. OSAIC's performance in correctly identifying larger models in the choice set is better than that of AIC1. OSAIC, however, tends to select the correct model less often than AIC1 in cases where the true model is the classical linear regression model. Under these conditions, OSAIC has a higher probability of overfitting the true process.

If the number of time periods is decreased whilst keeping the number of individuals fixed, OSAIC performs better at identifying a time effects model relative to AIC1. In this case, the relative performance of OSAIC and AIC1 is similar when the classical regression model, the individual effects model or the two-way random effects model is the true model.
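The one-sided information in this setting comes from the variance components themselves, which are non-negative by construction. The sketch below is a minimal illustration under stated assumptions, not the paper's experiment: a balanced one-way panel, a choice between the pooled (classical regression) model and a one-way random effects model, and binomial chi-bar-squared weights (so the single sign-restricted variance counts 1/2 under OSAIC rather than AIC's full 1); all function names and the simulation design are ours.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
N, T = 50, 5
mu = rng.standard_normal(N)                        # individual effects with variance 1
y = (0.5 + np.repeat(mu, T) + rng.standard_normal(N * T)).reshape(N, T)

def neg_loglik(log_vars):
    """Exact log-likelihood of a balanced one-way random effects model,
    parameterized by log variances so both components stay positive."""
    s2_e, s2_mu = np.exp(log_vars)
    ybar_i = y.mean(axis=1)
    a = ybar_i.mean()                              # ML intercept in the balanced case
    ssw = ((y - ybar_i[:, None]) ** 2).sum()       # within-individual variation
    lam = s2_e + T * s2_mu                         # eigenvalue of each equicorrelated block
    ll = -0.5 * (N * T * np.log(2 * np.pi)
                 + N * (T - 1) * np.log(s2_e) + N * np.log(lam)
                 + ssw / s2_e + T * ((ybar_i - a) ** 2).sum() / lam)
    return -ll

re_fit = minimize(neg_loglik, x0=[0.0, 0.0], method="Nelder-Mead")
pooled_fit = minimize(lambda p: neg_loglik([p[0], -30.0]),   # effects variance pinned at ~0
                      x0=[0.0], method="Nelder-Mead")

# Effective parameter counts: pooled = intercept + error variance = 2; the
# random effects model adds one variance known to be non-negative, which
# OSAIC counts as 1/2 (weights 1/2, 1/2) instead of AIC's full 1.
aic_re, osaic_re = 2 * re_fit.fun + 2 * 3.0, 2 * re_fit.fun + 2 * 2.5
aic_pool = osaic_pool = 2 * pooled_fit.fun + 2 * 2.0
print("random effects preferred under OSAIC:", osaic_re < osaic_pool)
```

With more variance components (individual, time, or both), the same bookkeeping gives each candidate in the Table 3 choice set its own one-sided parameter count, which is all that separates OSAIC from AIC1 here.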
When we increase the number of individuals whilst keeping the number of time periods fixed, the relative performance of AIC1 and OSAIC remains fairly constant. In the large sample case, where N = 100 and T = 5, we see that the familiar asymptotic properties of AIC-type criteria become apparent. In this case, the probability of underfitting the true process is much lower than for the small sample cases, but there is still a significant probability of overparameterization for both criteria.

5. Concluding remarks

The loss associated with the extraneous inclusion of parameters with reliable one-sided information attached is less than if such information is unavailable. The criterion developed in this paper, called OSAIC, embodies this notion by penalizing the inclusion of one-sided parameters less than AIC, whilst penalizing the inclusion of two-sided parameters at the same rate as AIC. OSAIC is found to be an asymptotically unbiased estimator of the Kullback-Leibler information given certain assumptions about the parameters of the DGP. If these assumptions are valid, AIC is asymptotically biased, implying that OSAIC has the same theoretical justification in the one-sided case as AIC does in the two-sided case.

In Monte Carlo experiments, OSAIC was found to outperform AIC2, thus indicating that one-sided information can be utilized to improve model selection-based inference. This adds to the one-sided hypothesis testing literature, providing further evidence in favour of the use of inequality constrained estimation techniques in the presence of one-sided information. Whilst OSAIC does not uniformly outperform AIC1, it has many good properties. Not least of these is the fact that OSAIC performs better than AIC1 in difficult model selection problems, when the sample is small and where there is a high degree of multicollinearity present in the design matrix. Given these findings, we recommend the use of OSAIC where one-sided information is present concerning the parameters of the model.
Acknowledgements

This research was supported by an ARC grant. Earlier versions of this paper were presented at the Australasian Meeting of the Econometric Society held at Armidale and to seminars at Monash University and the University of New South Wales. We wish to thank Catherine Forbes, Kevin Fox, Clive Granger, David Harris and Ping Wu for their constructive comments. All remaining errors are our responsibility.

