Dealing with Endogeneity (Part 2): Methods for Omitted Variables, Reverse Causality, Measurement Error, and Self-Selection

Academic 2024-10-17 10:00 Shanxi

Author: Guo Jiajia (郭佳佳), Sun Yat-sen University
E-mail: guojj37@mail2.sysu.edu.cn

Source: Hill, A. D., S. G. Johnson, L. M. Greco, E. H. O’Boyle, S. L. Walter, 2021, Endogeneity: A review and agenda for the methodology-practice divide affecting micro and macro research, Journal of Management, 47 (1): 105-143.

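The previous post discussed in detail the definition of endogeneity and its four sources. Building on that, this post summarizes the remedies commonly used in research for each type of endogeneity problem.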

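1. Omitted variables

1.1. Avoiding or minimizing endogeneity risk through design

Laboratory experiments: participants are randomly assigned to a treatment group and a control group; the researcher manipulates the treatment group and leaves the control group unchanged. The researcher must be able to manipulate the predictor and to assign groups at random. Feasibility is low, and results often lack external validity and generalizability, so this design is rarely attainable in practice.

Related literature: Griffin, R., & Kacmar, K. M. 1991. Laboratory research in management: Misconceptions and missed opportunities. Journal of Organizational Behavior, 12: 301-311.

Field experiments: conducted in natural settings to improve external validity. The researcher manipulates the predictor in the treatment group, but the lack of random assignment increases the threat of alternative explanations.

Related literature: Podsakoff, P. M., & Podsakoff, N. P. (2018). Experimental designs in management and leadership research: Strengths, limitations, and recommendations for improving publishability. The Leadership Quarterly.

Natural experiments: look for naturally occurring situations that approximate a randomized trial and use them to form treatment and control groups; assignment is usually not manipulated by the researcher. The control and treatment groups may differ in systematic ways.

Related literature: Grant, A. M., & Wall, T. D. 2009. The neglected science and art of quasi-experimentation: Why-to, when-to, and how-to advice for organizational researchers. Organizational Research Methods, 12: 653-686.

Greenberg, J., & Tomlinson, E. C. 2004. Situated experiments in organizations: Transplanting the lab to the field. Journal of Management, 30: 703-724.

Quasi-experiments: establish causality by analyzing data from before and after an intervention or an unexpected exogenous event.

Related literature: Anderson-Cook, C. M. (2005). Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Journal of the American Statistical Association, 100(470), 708–708.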

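1.2. Adding control or proxy variables

Some omitted variables may be unavailable or unobservable. If the omitted variable $q$ in $y = \beta_0 + \beta_1 x + \beta_2 q + e$ is unavailable, consider replacing it with a proxy variable $p$, so that the estimated model becomes

$$y = \beta_0 + \beta_1 x + \beta_2 p + e$$

For example, employee ability might be proxied by education level, and a firm's public relations capability might be proxied by the size of the media market in which the firm is headquartered. The proxy $p$ must be correlated with $q$, but measuring $q$ with $p$ also involves some measurement error, that is,

$$q = \delta_0 + \delta_1 p + u$$

where $\delta_1$ is the strength of the relationship and $u$ is the unexplained part. Theory and prior research showing that $p$ is related to $q$ must be used to justify the proxy (an unobserved $q$ can even be replaced by several proxies $p_1, p_2, \ldots$). Note that including control variables indiscriminately can itself introduce bias.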
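To make the logic of a proxy concrete, the following is a minimal simulation sketch, not taken from the source article. All variable names (`ability`, `education`) and coefficient values are illustrative assumptions. It compares the estimated effect of `x` when the omitted variable is ignored, when a noisy proxy is controlled for, and when the (normally unobservable) variable itself is controlled for.

```python
import numpy as np

def ols(y, *columns):
    """Return OLS coefficients [intercept, slopes...] via least squares."""
    X = np.column_stack([np.ones(len(y)), *columns])
    return np.linalg.lstsq(X, y, rcond=None)[0]

def simulate_proxy(n=50_000, seed=0):
    """Simulated data (assumed DGP): the true effect of x on y is 1.0."""
    rng = np.random.default_rng(seed)
    ability = rng.normal(size=n)                           # omitted variable (unobserved in practice)
    x = 0.8 * ability + rng.normal(size=n)                 # predictor correlated with ability
    education = 0.9 * ability + 0.5 * rng.normal(size=n)   # noisy proxy for ability
    y = 1.0 * x + 2.0 * ability + rng.normal(size=n)
    return {
        "omitted": ols(y, x)[1],            # ability ignored
        "proxy": ols(y, x, education)[1],   # proxy used as a control
        "oracle": ols(y, x, ability)[1],    # infeasible benchmark
    }

estimates = simulate_proxy()
```

In this setup the proxy reduces the bias but does not remove it, because the proxy measures the omitted variable with error; how much bias remains depends on how strongly the proxy tracks the omitted variable.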

Related literature: Becker, T. E. (2005). Potential Problems in the Statistical Control of Variables in Organizational Research: A Qualitative Analysis With Recommendations. Organizational Research Methods, 8(3), 274–289.

Bernerth, Jeremy B.; Aguinis, Herman (2016). A Critical Review and Best-Practice Recommendations for Control Variable Usage. Personnel Psychology, 69(1), 229–283.

James A. Breaugh (2008). Important considerations in using statistical procedures to control for nuisance variables in non-experimental studies. 18(4), 282–293.

Peter A. Frost (1979). Proxy Variables and Specification Bias. The Review of Economics and Statistics, 61(2), 323–325.

B. T. McCallum (1972). Relative Asymptotic Bias from Errors of Omission and Measurement. Econometrica, 40(4), 757–758.

Spector, P. E.; Brannick, M. T. (2011). Methodological Urban Legends: The Misuse of Statistical Control Variables. Organizational Research Methods, 14(2), 287–305.

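1.3. Fixed effects

If theory or evidence indicates that the omitted variable is constant within a group or does not vary over time, estimating the model with fixed effects (at the group or individual level) accounts for this unobserved heterogeneity and resolves the problem.

For example, if all firms in the same industry share the same perception of that industry, industry fixed effects address the problem. Likewise, if theory or evidence indicates that the omitted variable (for example, firm capability) does not change materially over time, firm fixed effects also address the problem.

Caveats: First, fixed effects do not solve every endogeneity problem, but they do work when the omitted variable is constant across all observations that share the same fixed effect (Antonakis, Bastardoz, & Rönkkö, 2019).

Second, fixed-effects analysis estimates within-group effects, not between-group effects. For example, fixed effects can explain how changes in a firm's reputation affect its performance; they cannot explain why some firms perform differently from others.

Bliese et al. (2020) give a comprehensive review of the limitations and potential of fixed effects. They also note that a random-effects model that controls for group means can replace the fixed-effects model and yields unbiased estimates of both within-group and between-group effects.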
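The within-group logic can be illustrated with a small simulated panel. This is a sketch under assumed names and parameter values (`capability`, `reputation`, `performance` are hypothetical), not the procedure of any cited paper. It compares pooled OLS, the within (fixed-effects) estimator, and a pooled regression that adds the group mean of the predictor. The last is only the OLS core of the "control for group means" idea, not a full random-effects GLS fit.

```python
import numpy as np

def simulate_panel(n_firms=500, n_years=8, seed=1):
    """Balanced panel (assumed DGP); the true effect of reputation is 0.5."""
    rng = np.random.default_rng(seed)
    firm = np.repeat(np.arange(n_firms), n_years)
    capability = rng.normal(size=n_firms)[firm]            # time-invariant, unobserved
    reputation = 0.7 * capability + rng.normal(size=firm.size)
    performance = 0.5 * reputation + 1.5 * capability + rng.normal(size=firm.size)
    return firm, reputation, performance

def group_mean(values, group):
    """Mean of `values` within each group, broadcast back to observations."""
    return (np.bincount(group, weights=values) / np.bincount(group))[group]

def slope(x, y):
    """Bivariate OLS slope of y on x."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (xc @ xc))

def within_slope(x, y, group):
    """Fixed-effects estimator: OLS on firm-demeaned data."""
    return slope(x - group_mean(x, group), y - group_mean(y, group))

def group_mean_control_slope(x, y, group):
    """Pooled OLS of y on x and the firm mean of x; returns the slope on x."""
    X = np.column_stack([np.ones(len(x)), x, group_mean(x, group)])
    return float(np.linalg.lstsq(X, y, rcond=None)[0][1])

firm, x, y = simulate_panel()
pooled = slope(x, y)
within = within_slope(x, y, firm)
with_mean = group_mean_control_slope(x, y, firm)
```

In this simulation pooled OLS absorbs the effect of the unobserved firm capability, while the within estimator and the group-mean specification give the same slope on the predictor.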

Antonakis, John; Bastardoz, Nicolas; Rönkkö, Mikko (2019). On Ignoring the Random Effects Assumption in Multilevel Models: Review, Critique, and Recommendations. Organizational Research Methods, 109442811987745.

Bliese, Paul D.; Schepker, Donald J.; Essman, Spenser M.; Ployhart, Robert E. (2019). Bridging Methodological Divides Between Macro- and Microresearch: Endogeneity and Methods for Panel Data. Journal of Management, 014920631986801.

Related posts: 固定效应还是随机效应？

Stata：固定效应分析新命令-sumhdfe
FE！FE！面板固定效应模型：你用对了吗
Stata：双向固定效应模型中是否要控制公司年龄？

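1.4. Instrumental variables

Using exogenous instruments to fit the endogenous key explanatory variable can, to some extent, mitigate the estimation bias caused by endogeneity.

Instrument specification tests: test the assumptions behind the instruments. An instrument must (1) be correlated with the endogenous variable and (2) be related to Y only indirectly, through the endogenous variable.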

Related literature: Baum, Christopher F.; Schaffer, Mark E.; Stillman, Steven (2003). Instrumental Variables and GMM: Estimation and Testing. The Stata Journal: Promoting communications on statistics and Stata, 3(1), 1–31.

R. L. Basmann (1960). On Finite Sample Distributions of Generalized Classical Linear Identifiability Test Statistics. Journal of the American Statistical Association, 55(292), 650–659.

Lars Peter Hansen (1982). Large Sample Properties of Generalized Method of Moments Estimators. Econometrica, 50(4), 1029–1054.

J. D. Sargan (1958). The Estimation of Economic Relationships using Instrumental Variables. Econometrica, 26(3), 393–415.

Stock, James H; Wright, Jonathan H; Yogo, Motohiro (2002). A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments. Journal of Business & Economic Statistics, 20(4), 518–529.

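Instrumental variable estimation: instrumental variable models can be estimated in several ways, including two-stage least squares (2SLS), three-stage least squares (3SLS), maximum likelihood (ML), and the generalized method of moments (GMM). These estimators differ in their efficiency and in their robustness to various assumptions.

Taking the two-step approach as an example, in the first step the instrument $z$ is used to predict the endogenous variable $x$:

$$x = \pi_0 + \pi_1 z + v$$

In the second step, the predicted value $\hat{x}$ from the first step replaces the endogenous variable:

$$y = \beta_0 + \beta_1 \hat{x} + e$$

For example, the instrument predicts corporate reputation but is unrelated to firm performance, or is related to it only through its effect on reputation.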
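The two steps can be written out by hand on simulated data. The sketch below is illustrative only: the data-generating process, the variable roles (reputation as `x`, performance as `y`), and the coefficient values are assumptions. The manual second stage reproduces the 2SLS point estimate but not valid standard errors, so a dedicated IV routine should be used for inference.

```python
import numpy as np

def add_const(*cols):
    """Design matrix with an intercept column."""
    return np.column_stack([np.ones(len(cols[0])), *cols])

def fit(X, y):
    """OLS coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_stage_least_squares(y, x, z):
    """Manual 2SLS with one endogenous regressor x and one instrument z."""
    Z = add_const(z)
    x_hat = Z @ fit(Z, x)               # stage 1: predict x from the instrument
    return fit(add_const(x_hat), y)     # stage 2: regress y on the prediction

def simulate_iv(n=20_000, seed=2):
    """Assumed DGP: the true effect of x on y is 0.5; u is an unobserved confounder."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)
    z = rng.normal(size=n)                        # instrument, independent of u
    x = 0.6 * z + u + rng.normal(size=n)          # endogenous predictor (e.g. reputation)
    y = 0.5 * x + 2.0 * u + rng.normal(size=n)    # outcome (e.g. performance)
    return y, x, z

y, x, z = simulate_iv()
ols_slope = fit(add_const(x), y)[1]
iv_slope = two_stage_least_squares(y, x, z)[1]
```

Here the OLS slope is pulled away from 0.5 by the confounder, while the 2SLS slope is close to it because the instrument affects the outcome only through `x`.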

Related literature: (2SLS) Angrist, J. D., & Imbens, G. W. 1995. Identification and estimation of local average treatment effects. National Bureau of Economic Research.

(GMM) Hansen, L. P. 1982. Large sample properties of generalized method of moments estimators. Econometrica: Journal of the Econometric Society, 50: 1029-1054.

(GMM) Newey, W. K., & West, K. D. 1987. Hypothesis testing with efficient method of moments estimation. International Economic Review, 28: 777-787.

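Strategies for identifying instruments

(1) Look for a random process that affects the endogenous variable.

(2) Use lagged variables: use lagged values of the endogenous variable as instruments. The lagged variable must be correlated with the endogenous variable and unrelated to the dependent variable. For example, a firm's earlier reputation may affect its current reputation without being directly related to its current performance. Even so, the lagged variable must still satisfy the requirements for a valid instrument.

Related literature: Reed, W. R. 2015. On the practice of lagging variables to avoid simultaneity. Oxford Bulletin of Economics and Statistics, 77: 897-905.

(3) Model-implied instrumental variables (MIIV): within a simultaneous-equation framework, model-implied instruments can be found among, and constructed from, the existing observed variables. Testing relies on stronger model assumptions.

Related literature: Bollen, K. A., & Bauer, D. J. 2004. Automating the selection of model-implied instrumental variables. Sociological Methods & Research, 32: 425-452.

(4) Exotic techniques: sometimes endogeneity is addressed by assuming distributional forms for the variables and the error terms. The identifying assumptions may be harder to satisfy than those required by conventional instruments.

Related literature: Bollen, K. A. 2012. Instrumental variables in sociology and the social sciences. Annual Review of Sociology, 38: 37-72.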

Related posts: Stata新命令-pdslasso：众多控制变量和工具变量如何挑选？

好IV坏控制：一个好的工具变量一定是个坏的控制变量
Stata：使用历史工具变量评估长期效应-esteta
数字经济的工具变量
IV在哪里？奇思妙想的工具变量
twostepweakiv：弱工具变量有多弱？
IV：工具变量不满足外生性怎么办？
IV：可以用内生变量的滞后项做工具变量吗？
Stata: 工具变量法 (IV) 也不难呀！

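2. Reverse causality (simultaneity)

2.1. Instrumental variables

The methods above can also address reverse causality, although instruments may be harder to find when reverse causality is present.

Lagged endogenous variables: use lagged values of the endogenous variable. If the predictor or the dependent variable is serially correlated, this may not resolve the endogeneity.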

Related literature: Fair, R. C. 1970. The estimation of simultaneous equation models with lagged endogenous variables and first order serially correlated errors. Econometrica: Journal of the Econometric Society, 38(3): 507-516.

Bellemare, M. F., Masaki, T., & Pepinsky, T. B. 2017. Lagged explanatory variables and the estimation of causal effects. Journal of Politics, 79: 949-963.

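Dynamic panel models: if the study uses panel data, a dynamic panel data model is a good option. A dynamic panel model includes lags of the dependent variable among the explanatory variables, so the past is allowed to influence the future to some degree. Estimators include first-difference GMM (FD-GMM) and system GMM (SYS-GMM). These estimators require that the error term in the equation not be serially correlated.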
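A full difference or system GMM estimator is beyond a short example, so the sketch below swaps in the simpler Anderson-Hsiao instrumental variable estimator, which rests on the same idea: first-difference the equation to remove the unit effect, then instrument the differenced lag with an earlier level. The data-generating process and parameter values are assumptions for illustration, and the errors are simulated without serial correlation, as the method requires.

```python
import numpy as np

def simulate_dynamic_panel(n_units=5000, n_periods=7, rho=0.5, seed=3):
    """y_it = rho * y_i,t-1 + alpha_i + e_it with serially uncorrelated e_it."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=n_units)                      # unit fixed effect
    y = np.empty((n_units, n_periods))
    # start each unit at its stationary distribution
    y[:, 0] = alpha / (1 - rho) + rng.normal(size=n_units) / np.sqrt(1 - rho ** 2)
    for t in range(1, n_periods):
        y[:, t] = rho * y[:, t - 1] + alpha + rng.normal(size=n_units)
    return y

def within_estimate(y):
    """Fixed-effects estimate of rho; biased when the panel is short."""
    lag, cur = y[:, :-1], y[:, 1:]
    lag_d = lag - lag.mean(axis=1, keepdims=True)
    cur_d = cur - cur.mean(axis=1, keepdims=True)
    return float((lag_d * cur_d).sum() / (lag_d ** 2).sum())

def anderson_hsiao_estimate(y):
    """IV on first differences, using y_{t-2} as the instrument for dy_{t-1}."""
    dy = np.diff(y, axis=1)       # dy[:, k] = y[:, k+1] - y[:, k]
    dep = dy[:, 1:]               # dy_t for t = 2, ..., T-1
    endog = dy[:, :-1]            # dy_{t-1}
    instr = y[:, :-2]             # y_{t-2}, uncorrelated with de_t
    return float((instr * dep).sum() / (instr * endog).sum())

panel = simulate_dynamic_panel()
rho_within = within_estimate(panel)
rho_iv = anderson_hsiao_estimate(panel)
```

With only seven periods the within estimate falls well below the true value of 0.5, while the instrumented first-difference estimate is close to it. GMM estimators such as those of Arellano and Bond (1991) use additional lags as instruments to gain efficiency.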

Related literature: Arellano, M., & Bond, S. 1991. Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. Review of Economic Studies, 58: 277-297.

Ballinger, G. A. 2004. Using generalized estimating equations for longitudinal data analysis. Organizational Research Methods, 7: 127-150.

Bergh, D. D. 1993. Don’t “waste” your time! The effects of time series errors in management research: The case of ownership concentration and research and development spending. Journal of Management, 19: 897-914.

Related posts: panel data

Stata：动态面板数据模型与xtabond2应用
Stata实操陷阱：动态面板数据模型

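2.2. Using exogenous shock events (Exogenous Events)

Use an exogenous shock as a quasi-experiment that establishes the direction of causality. The key identifying assumption is that the event was not anticipated.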

Related literature: Angrist, J. D., & Krueger, A. B. 1999. Empirical strategies in labor economics. In O. Ashenfelter & D. Card (Eds.), Handbook of Labor Economics: 1277-1366. Elsevier.

Angrist, J. D., & Pischke, J.-S. 2010. The credibility revolution in empirical economics: How better research design is taking the con out of econometrics. Journal of Economic Perspectives, 24(2): 3-30.

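4. Measurement error

Endogeneity caused by measurement error is relatively uncommon in empirical work. If the cause of the measurement error is known, the problem can be recast as an omitted variable problem.

4.1. Modeling measurement error

Use latent variable methods (SEM) to account for measurement error. In most cases, the variance of the measurement error must be known and the error must be normally distributed.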

Related literature: Bound, J., Brown, C., & Mathiowetz, N. 2001. Measurement error in survey data. In Handbook of Econometrics: 3705-3843. Elsevier.

Griliches, Z., & Hausman, J. A. 1986. Errors in variables in panel data. Journal of Econometrics, 31: 93-118.

Hausman, J. A. 1977. Errors in variables in simultaneous equation models. Journal of Econometrics, 5: 389-401.

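Instrumental estimation: use one variable measured with error as an instrument for another variable measured with error. This is sometimes also called the indicator variable method.

The systematic errors of the two variables must be uncorrelated. In the marker-variable version, a theoretically unrelated variable is measured with the same or a similar scale, valence, referent, and so on. Because its relationship with the predictor and the outcome is assumed to be zero, any observed covariation is assumed to be a function of common method variance (CMV). Because the marker variable is exogenous, method variance can be handled, or "covaried out". This is functionally the same as the requirement that an instrument be exogenous. As noted above, using multiple measures to offset the limitations of any single measure can provide evidence that, if the measures converge, the estimated relationship is robust to measurement error.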
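A small simulation sketch shows why a second noisy measure can serve as an instrument for the first. The setup is an assumption for illustration: a latent predictor with two indicators whose errors are independent of each other and of the outcome error. Variable names and values are hypothetical.

```python
import numpy as np

def slope(x, y):
    """Bivariate OLS slope of y on x."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (xc @ xc))

def iv_slope(y, x, z):
    """Simple IV slope: cov(z, y) / cov(z, x)."""
    zc = z - z.mean()
    return float(zc @ (y - y.mean()) / (zc @ (x - x.mean())))

def simulate_measurement_error(n=50_000, seed=4):
    """Assumed DGP: the true effect of the latent predictor on y is 1.0."""
    rng = np.random.default_rng(seed)
    true_x = rng.normal(size=n)               # latent, never observed directly
    measure_1 = true_x + rng.normal(size=n)   # first noisy indicator
    measure_2 = true_x + rng.normal(size=n)   # second indicator, independent error
    y = 1.0 * true_x + rng.normal(size=n)
    return y, measure_1, measure_2

y, m1, m2 = simulate_measurement_error()
attenuated = slope(m1, y)           # biased toward zero by measurement error
corrected = iv_slope(y, m1, m2)     # second measure used as the instrument
```

The correction works here only because the two measurement errors are uncorrelated. If both indicators shared a common method component, the instrument would no longer be exogenous.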

Related literature: Griliches, Z. 1977. Estimating the returns to schooling: Some econometric problems. Econometrica: Journal of the Econometric Society, 45: 1-22.

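Addressing CMV: design and statistical techniques aimed at reducing common method variance (CMV), which is one source of endogeneity arising from measurement error. The direction and size of the bias depend on the data collection strategy, the type of analytical model, whether CMV affects the observed variables symmetrically, and the sample size.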

Related literature: Podsakoff, P. M., MacKenzie, S. B., Lee, J.-Y., & Podsakoff, N. P. 2003. Common method biases in behavioral research: a critical review of the literature and recommended remedies. Journal of Applied Psychology, 88: 879.

Podsakoff, P. M., MacKenzie, S. B., & Podsakoff, N. P. 2012. Sources of method bias in social science research and recommendations on how to control it. Annual Review of Psychology, 63: 539-569.

Siemsen, E., Roth, A., & Oliveira, P. 2010. Common method bias in regression models with linear, quadratic, and interaction effects. Organizational Research Methods, 13: 456-476.

Related posts:

第三种内生性：衡量偏误(测量误差)如何解决？-eivreg-sem

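5. Sample selection

5.1. Heckman selection model

The Heckman model has two stages. The first stage fits a probit regression and uses its results to compute the inverse Mills ratio (IMR); the second stage adds the IMR to the outcome regression. Heckman (1979) used the model to estimate female labor supply and wage rates. The model has also been applied frequently in recent research, for example:

Weigelt (2013) studies how client firms obtain performance gains from suppliers' IT capabilities under market arrangements, and how the interaction between supplier IT capability and firm operational capability affects firm performance when IT operations are insourced versus outsourced. In the first stage, a probit model estimates the probability that a firm insources or outsources its IT operations and yields the IMR. In the second stage, the sample is split into an insourcing group and an outsourcing group, which are estimated separately with the IMR included to correct for sample self-selection bias.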
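The two stages can be sketched by hand as follows. This is an illustrative implementation on simulated data, not the code of Heckman (1979) or Weigelt (2013): the data-generating process, the excluded variable `z`, and the hand-rolled probit fitted by Fisher scoring are all assumptions made to keep the example self-contained. Second-stage standard errors from this manual approach are not valid without further correction.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_pdf(v):
    return np.exp(-0.5 * v ** 2) / math.sqrt(2 * math.pi)

def norm_cdf(v):
    return 0.5 * (1.0 + _erf(v / math.sqrt(2)))

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def fit_probit(X, d, n_iter=30):
    """Probit coefficients via Fisher scoring (a Newton-type iteration)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        index = X @ beta
        p = np.clip(norm_cdf(index), 1e-10, 1 - 1e-10)
        pdf = norm_pdf(index)
        score = X.T @ (pdf * (d - p) / (p * (1 - p)))
        info = (X * (pdf ** 2 / (p * (1 - p)))[:, None]).T @ X
        beta = beta + np.linalg.solve(info, score)
    return beta

def simulate_selection(n=20_000, seed=5):
    """Assumed DGP: y = 1 + 2x + e is observed only when `selected` is True."""
    rng = np.random.default_rng(seed)
    x, z, u = rng.normal(size=(3, n))          # z affects selection only
    e = 0.8 * u + 0.6 * rng.normal(size=n)     # outcome error correlated with u
    selected = 0.5 + x + z + u > 0
    return x, z, 1.0 + 2.0 * x + e, selected

def naive_ols(x, y, s):
    """OLS on the selected sample, ignoring selection."""
    return ols(np.column_stack([np.ones(s.sum()), x[s]]), y[s])

def heckman_two_step(x, z, y, s):
    W = np.column_stack([np.ones(len(x)), x, z])
    gamma = fit_probit(W, s.astype(float))          # stage 1: selection equation
    index = W[s] @ gamma
    imr = norm_pdf(index) / norm_cdf(index)         # inverse Mills ratio
    X2 = np.column_stack([np.ones(s.sum()), x[s], imr])
    return ols(X2, y[s])                            # [intercept, slope on x, IMR coefficient]

x, z, y, s = simulate_selection()
naive = naive_ols(x, y, s)
corrected = heckman_two_step(x, z, y, s)
```

In this simulation the naive slope on `x` is biased because selection depends on `x` and on an error correlated with the outcome, and adding the IMR brings the slope back toward the true value of 2. The variable `z` plays the role of an exclusion restriction; without such a variable the correction rests on functional form alone.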

Related literature: Heckman, J. J. 1979. Sample selection bias as a specification error. Econometrica: Journal of the Econometric Society, 47: 153-161.

Clougherty, J. A., Duso, T., & Muck, J. 2016. Correcting for self-selection based endogeneity in management research: Review, recommendations and simulations. Organizational Research Methods, 19: 286-347.

Related posts: panel data

xtheckmanfe：面板Heckman模型的固定效应估计

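6. Treatment selection

6.1. Omitted variable methods

If the endogenous variable is continuous and is "chosen" by the subject or the context, the methods used to address omitted variables apply.

6.2. Heckman treatment estimation

Use a first-stage probit model to predict the likelihood of being "treated". The inverse Mills ratio from that equation is then included as a control variable in the second-stage model to estimate the treatment effect. Several variants of this model are available, but all of them require an instrument or some other identifying assumption.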

Related literature: Hamilton, B. H., & Nickerson, J. A. 2003. Correcting for endogeneity in strategic management research. Strategic Organization, 1: 51-78.

Wolfolds, S. E., & Siegel, J. 2019. Misaccounting for endogeneity: The peril of relying on the Heckman two-step method without a valid instrument. Strategic Management Journal, 40: 432-462.

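6.3. Estimating average treatment effects

Difference-in-differences (DID): for self-selection into a binary treatment, DID compares the change in outcomes of the treated group with the change in outcomes of the control group, which stands in for what the treated group would have experienced without treatment. When some groups receive treatment over time and others do not, panel data methods are applied to group averages. Endogeneity is avoided only if the treatment and control groups follow parallel trends over time.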
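In the simplest two-group, two-period case the DID estimate is just a difference of four cell means. The sketch below uses simulated data in which parallel trends hold by construction; the group gap, time trend, and effect size are illustrative assumptions.

```python
import numpy as np

def simulate_did(n=4000, effect=2.0, seed=6):
    """Assumed DGP with a level gap between groups and a common time trend."""
    rng = np.random.default_rng(seed)
    treated = rng.integers(0, 2, size=n).astype(bool)
    post = rng.integers(0, 2, size=n).astype(bool)
    y = (1.0 + 3.0 * treated + 1.5 * post
         + effect * (treated & post) + rng.normal(size=n))
    return y, treated, post

def did_estimate(y, treated, post):
    """(treated: post - pre) minus (control: post - pre)."""
    def cell_mean(mask):
        return y[mask].mean()
    change_treated = cell_mean(treated & post) - cell_mean(treated & ~post)
    change_control = cell_mean(~treated & post) - cell_mean(~treated & ~post)
    return float(change_treated - change_control)

y, treated, post = simulate_did()
did = did_estimate(y, treated, post)
# comparing groups after treatment only mixes the effect with the group gap
post_only_gap = float(y[treated & post].mean() - y[~treated & post].mean())
```

The post-period comparison alone confounds the treatment effect with the pre-existing difference between groups, whereas differencing out each group's own pre-period level removes it, provided the two groups would otherwise have moved in parallel.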

Related literature: Athey, S., & Imbens, G. W. 2006. Identification and inference in nonlinear difference-in-differences models. Econometrica, 74: 431-497.

Bertrand, M., Duflo, E., & Mullainathan, S. 2004. How much should we trust differences-in-differences estimates? The Quarterly Journal of Economics, 119: 249-275.

Related posts:

Stata：各种DID估计量的比较分析
DID新进展：异质性多期DID估计的新方法-csdid
倍分法DID

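Synthetic control groups: construct a control group through sample matching, coarsened exact matching, or propensity score matching (PSM). Endogeneity is avoided only when the assumption of selection on observables, or of ignorable treatment assignment, holds.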
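As a minimal illustration of building a comparison group by matching, the sketch below performs one-to-one nearest-neighbor matching with replacement on a single observed covariate. This is a simplification chosen for brevity: with one covariate, matching on the covariate is equivalent to matching on a propensity score that is monotone in it, whereas real applications estimate a propensity score from many covariates. The data-generating process is assumed, and selection depends only on the observed covariate.

```python
import numpy as np

def simulate_matching(n=3000, seed=7):
    """Assumed DGP: treatment depends on observed x only; true effect is 1.0."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    propensity = 1.0 / (1.0 + np.exp(-1.5 * x))
    treated = rng.random(n) < propensity
    y = 2.0 * x + 1.0 * treated + rng.normal(size=n)
    return x, treated, y

def matched_att(x, treated, y):
    """Average effect on the treated via nearest-neighbor matching on x."""
    x_t, y_t = x[treated], y[treated]
    x_c, y_c = x[~treated], y[~treated]
    # for each treated unit, index of the closest control (with replacement)
    nearest = np.abs(x_t[:, None] - x_c[None, :]).argmin(axis=1)
    return float((y_t - y_c[nearest]).mean())

x, treated, y = simulate_matching()
naive_gap = float(y[treated].mean() - y[~treated].mean())
att = matched_att(x, treated, y)
```

The raw difference in means overstates the effect because treated units have higher `x`; matching removes that imbalance. If selection also depended on an unobserved variable, matching on observables would not remove the bias.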

Related literature: Caliendo, M., & Kopeinig, S. 2008. Some practical guidance for the implementation of propensity score matching. Journal of Economic Surveys, 22: 31-72.

Li, M. 2013. Using the propensity score method to estimate causal effects: A review and practical guide. Organizational Research Methods, 16: 188-226.

Rubin, Donald B. (2006). Assessing Sensitivity to an Unobserved Binary Covariate in an Observational Study with Binary Outcome. In Matched Sampling for Causal Effects, 185–192. doi:10.1017/CBO9780511810725

Stuart, E. A. 2010. Matching methods for causal inference: A review and a look forward. Statistical Science: A Review Journal of the Institute of Mathematical Statistics, 25: 1.

Related posts: synthetic control method

Stata：合成控制法介绍-synth2
Stata：合成控制法的预测区间-scpi
合成控制法简介
Stata：合成控制法程序分享
合成控制法 (Synthetic Control Method) 及 Stata实现
FAQs答疑-2021寒假-Stata高级班-Day3-连玉君-RDD-合成控制法

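6.4. Regression discontinuity designs (RDDs)

The idea of regression discontinuity is to study the policy effect at a cutoff, because units just on either side of the cutoff can be regarded as essentially similar. RDDs are either sharp or fuzzy, depending on whether the cutoff separates the treatment and control groups completely. Assignment to treatment must be determined by a cutoff or threshold value of a continuous variable, and that continuous variable itself should show no jump around the threshold.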
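For the sharp case, a common estimation approach is to fit a separate local linear regression on each side of the cutoff and take the difference of the two fitted values at the cutoff. The sketch below does this on simulated data. The bandwidth of 0.2, the uniform (unweighted) window, and the data-generating process are illustrative assumptions; applied work typically uses data-driven bandwidths and kernel weights.

```python
import numpy as np

def simulate_rdd(n=20_000, jump=1.5, seed=8):
    """Assumed DGP: treatment switches on when the running variable crosses 0."""
    rng = np.random.default_rng(seed)
    score = rng.uniform(-1, 1, size=n)              # running variable
    treated = score >= 0
    y = (0.5 + 1.2 * score + 0.8 * score ** 2
         + jump * treated + rng.normal(scale=0.5, size=n))
    return score, y

def sharp_rdd(score, y, cutoff=0.0, bandwidth=0.2):
    """Difference of local linear fits at the cutoff (right minus left)."""
    def fitted_at_cutoff(mask):
        X = np.column_stack([np.ones(mask.sum()), score[mask] - cutoff])
        return np.linalg.lstsq(X, y[mask], rcond=None)[0][0]
    near = np.abs(score - cutoff) <= bandwidth
    right = fitted_at_cutoff(near & (score >= cutoff))
    left = fitted_at_cutoff(near & (score < cutoff))
    return float(right - left)

score, y = simulate_rdd()
effect_at_cutoff = sharp_rdd(score, y)
```

The estimate is local: it describes the effect for units near the cutoff and says nothing about units far from it.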

Related literature: Hahn, J., Todd, P., & Van der Klaauw, W. 2001. Identification and estimation of treatment effects with a regression-discontinuity design. Econometrica, 69: 201-209.

Imbens, G. W., & Lemieux, T. 2008. Regression discontinuity designs: A guide to practice. Journal of Econometrics, 142: 615-635.

Lee, D. S., & Lemieux, T. 2010. Regression discontinuity designs in economics. Journal of Economic Literature, 48: 281-355.

Donald L. Thistlewaite; Donald T. Campbell (2017). Regression-Discontinuity Analysis: An Alternative to the Ex-Post Facto Experiment. Observational Studies.

Related posts:

Stata+R：一文读懂精确断点回归-RDD
Stata：断点回归分析-RDD-文献和命令
Stata：RDD-中可以加入控制变量
Stata：时间断点回归RDD的几个要点
Stata：断点回归分析-(RDD)-文献和命令
Stata：断点回归RDD简明教程
Stata: 断点回归 (RDD) 中的平滑性检验
FAQs答疑-2021寒假-Stata高级班-Day3-连玉君-RDD-合成控制法

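7. Summary

If endogeneity is a disease, we hope that those who treat it will:

(1) Provide a clear diagnosis that identifies the specific cause of the endogeneity.

(2) Justify the chosen remedy, which must match the cause. Relying on earlier empirical studies without consulting econometric theory has shortcomings. First, earlier empirical work may itself be flawed. Second, every methodological choice must fit the specific research context.

(3) Be as transparent as possible about how endogeneity was handled, so that readers can see that the method is both appropriate and necessary.

8. Main reference

Hill, A. D., Johnson, S. G., Greco, L. M., et al. 2021. Endogeneity: A review and agenda for the methodology-practice divide affecting micro and macro research. Journal of Management, 47(1): 105-143.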

9. Related posts

Note: the list of posts below was generated with the Stata command:
lianxh 内生因果
To install the latest version of the lianxh command:
ssc install lianxh, replace

Topic: paper writing

论文中因果推断的经典图形

Topic: econometrics

因果推断：哪本教材适合我？
因果推断新书在线读：Causal Inference-The Mixtape

Topic: IV-GMM

IV专题- 内生性检验与过度识别检验
IV专题: 内生性检验与过度识别检验

Topic: endogeneity and causal inference

内生性！内生性！解决方法大集合
IV-面板内生性：严格外生性如何检验？
因果推断：双重机器学习-ddml
Stata：内生性随机边界模型-xtsfkk
一组动图读懂因果推断
第三种内生性：衡量偏误(测量误差)如何解决？-eivreg-sem
因果推断：混杂因素敏感性分析实操(下)-tesensitivity
因果推断：混杂因素敏感性分析理论(上)
Stata因果推断：hettreatreg-用OLS估计异质性处理效应
Stata：因果推断方法综述和Stata操作
fect：基于面板数据的因果推断（上）-T218a
fect：基于面板数据的因果推断（下）-T218b
因果推断：未测量混杂因素的敏感性分析-T249
内生性：来源及处理方法-幻灯片下载
用FE-固定效应模型能做因果推断吗？
locmtest：非线性模型的内生性检验
经典文献回顾：政策评价-因果推断的计量方法
因果推断好书：Causal-Inference-Measuring-the-Effect-of-X-on-y
Stata因果推断新书：The-SAGE-Handbook-of-Regression-Analysis-and-Causal-Inference
第三种内生性：衡量偏误(测量误差)如何检验-dgmtest？
Stata新命令：konfound - 因果推断的稳健性检验

Topic: other

50问-T2：面板数据因果推断常见问题-对话徐轶青老师

Topic: quantile regression

Stata：面板分位数模型估计及内生性初探

http://mp.weixin.qq.com/s?__biz=MzU5MjYxNTgwMg==&mid=2247503083&idx=1&sn=81cd739c75b94f6adc6acd8d26bd4f2c
