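This post walks through selected examples of double machine learning in Stata with the ddml command.
For a full replication of the ddml paper's results, see the following posts:
🔺 DDML: Double Machine Learning (Python setup in Stata) (qq.com)
😆 2024 Stata Summer Course, Highlights: Machine Learning and Regression Discontinuity
😅 2024 Stata Summer Course, Highlights: Machine Learning and the Synthetic Control Method
🌈 2024 Stata Summer Course, Highlights: Double Machine Learning (DDML)
🤣 Machine Learning & Causal Inference, 2024 Stata Summer Course, Highlights (qq.com)
🎈 AER Grand Canal Paper Walkthrough, 2024 Stata Summer Course, Highlights: Heterogeneous DID (qq.com)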
1. Check the command versions
which ddml
which pystacked
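If either `which` call reports that the command is not found, the packages still need to be installed. The lines below are a minimal setup sketch, assuming both packages are installed from SSC and that Stata 16 or later is used, since pystacked relies on Stata's Python integration and scikit-learn. Nothing here comes from the original post; adjust it to your own installation.

```stata
* Install (or refresh) ddml and pystacked from SSC.
ssc install ddml, replace
ssc install pystacked, replace

* Show which Python installation Stata is currently linked to.
python query

* Check whether scikit-learn is visible to that Python installation.
python which sklearn
```

If `python which sklearn` fails, scikit-learn has to be installed into the Python environment that `python query` reports before any pystacked learner can run.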
2. Partially linear model
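For orientation, the partially linear model can be written as below. This is the standard textbook form, not something stated in the original post, and the notation (\theta_0, g_0, m_0, \ell_0) is an assumption chosen to line up with the `E[Y|X]` and `E[D|X]` equations used in the ddml commands. The last line is the residual-on-residual estimator, ignoring the constant term, with hats denoting cross-fitted predictions.

```latex
\begin{aligned}
Y &= \theta_0 D + g_0(X) + U, & E[U \mid X, D] &= 0,\\
D &= m_0(X) + V,              & E[V \mid X]    &= 0,\\
\hat\theta &= \frac{\sum_i \bigl(D_i - \hat m(X_i)\bigr)\bigl(Y_i - \hat\ell(X_i)\bigr)}
                   {\sum_i \bigl(D_i - \hat m(X_i)\bigr)^2},
\qquad \ell_0(X) = E[Y \mid X].
\end{aligned}
```

Here `ddml E[Y|X]` supplies the learners for \ell_0 and `ddml E[D|X]` supplies the learners for m_0.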
2.1 Error when ddml crossfit is skipped
use sipp1991.dta, clear
global Y net_tfa
global D e401
global X tw age inc fsize educ db marr twoearn pira hown
set seed 42
ddml init partial, kfolds(2)
ddml E[Y|X]: reg $Y $X
The following command fails:
ddml estimate, robust
The error message is:
ddml model not cross-fitted; call `ddml crossfit` first
r(198);
Fix: run ddml crossfit before ddml estimate.
To update the ddml command to the latest version, run:
ddml update
help ddml
. use sipp1991.dta, clear
. global Y net_tfa
. global D e401
. global X tw age inc fsize educ db marr twoearn pira hown
. set seed 42
. ddml init partial, kfolds(2)
warning - model m0 already exists
all existing model results and variables will
be dropped and model m0 will be re-initialized
. ddml E[Y|X]: reg $Y $X
Learner Y1_reg added successfully.
. ddml E[Y|X]: pystacked $Y $X, type(reg) method(rf)
Learner Y2_pystacked added successfully.
. ddml E[D|X]: reg $D $X
Learner D1_reg added successfully.
. ddml E[D|X]: pystacked $D $X, type(reg) method(rf)
Learner D2_pystacked added successfully.
. ddml desc
Model: partial, crossfit folds k=2, resamples r=1
Dependent variable (Y): net_tfa
net_tfa learners: Y1_reg Y2_pystacked
D equations (1): e401
e401 learners: D1_reg D2_pystacked
Specifications: 4 possible specs
. ddml estimate, robust
ddml model not cross-fitted; call `ddml crossfit` first
r(198);
. ddml crossfit
Cross-fitting E[y|X] equation: net_tfa
Cross-fitting fold 1 2 ...completed cross-fitting
Cross-fitting E[D|X] equation: e401
Cross-fitting fold 1 2 ...completed cross-fitting
. ddml estimate, robust
DDML estimation results:
  spec  r     Y learner     D learner         b        SE
   opt  1  Y2_pystacked        D1_reg  7044.518 (1126.896)
opt = minimum MSE specification for that resample.
Min MSE DDML model
y-E[y|X]  = Y2_pystacked_1                         Number of obs   =      9915
D-E[D|X,Z]= D1_reg_1
------------------------------------------------------------------------------
             |               Robust
     net_tfa | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        e401 |   7044.518   1126.896     6.25   0.000     4835.843    9253.193
       _cons |  -317.8379   352.8666    -0.90   0.368    -1009.444     373.768
------------------------------------------------------------------------------
. ddml estimate, robust allcombos
DDML estimation results:
  spec  r     Y learner     D learner         b        SE
     1  1        Y1_reg        D1_reg  5397.208 (1130.776)
     2  1        Y1_reg  D2_pystacked  6705.740  (878.656)
*    3  1  Y2_pystacked        D1_reg  7044.518 (1126.896)
     4  1  Y2_pystacked  D2_pystacked  6979.699  (753.471)
* = minimum MSE specification for that resample.
Min MSE DDML model
y-E[y|X]  = Y2_pystacked_1                         Number of obs   =      9915
D-E[D|X,Z]= D1_reg_1
------------------------------------------------------------------------------
             |               Robust
     net_tfa | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        e401 |   7044.518   1126.896     6.25   0.000     4835.843    9253.193
       _cons |  -317.8379   352.8666    -0.90   0.368    -1009.444     373.768
------------------------------------------------------------------------------
. ddml estimate, robust spec(1) replay
DDML estimation results:
  spec  r     Y learner     D learner         b        SE
   opt  1  Y2_pystacked        D1_reg  7044.518 (1126.896)
opt = minimum MSE specification for that resample.
DDML model, specification 1
y-E[y|X]  = Y1_reg_1                               Number of obs   =      9915
D-E[D|X,Z]= D1_reg_1
------------------------------------------------------------------------------
             |               Robust
     net_tfa | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        e401 |   5397.208   1130.776     4.77   0.000     3180.928    7613.488
       _cons |   -104.854   397.9023    -0.26   0.792     -884.728    675.0201
------------------------------------------------------------------------------
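The specification 1 result above can be cross-checked by hand, because the partially linear estimator is a regression of the outcome residual on the treatment residual. The sketch below assumes the session above is still active (sipp1991.dta in memory, `ddml crossfit` completed) and that ddml has left the cross-fitted predictions in the variables Y1_reg_1 and D1_reg_1, as the estimation header suggests. The names Yres and Dres are hypothetical and chosen only for this illustration.

```stata
* Remove leftovers from an earlier run of this check, if any.
capture drop Yres Dres

* Outcome residual: Y minus the cross-fitted prediction of E[Y|X].
generate double Yres = net_tfa - Y1_reg_1

* Treatment residual: D minus the cross-fitted prediction of E[D|X].
generate double Dres = e401 - D1_reg_1

* Residual-on-residual regression with heteroskedasticity-robust errors.
regress Yres Dres, vce(robust)
```

Under these assumptions the coefficient on Dres should agree with the e401 coefficient reported for specification 1. The standard error may differ slightly if the two commands apply different finite-sample adjustments.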
. webuse cattaneo2, clear
(Excerpt from Cattaneo (2010) Journal of Econometrics 155: 138–154)
. global Y bweight
. global D mbsmoke
. global X prenatal1 mmarried fbaby mage medu
. set seed 42
. ddml init interactive, kfolds(5) reps(5)
warning - model m0 already exists
all existing model results and variables will
be dropped and model m0 will be re-initialized
. ddml E[Y|X,D]: pystacked $Y $X, type(reg) methods(ols gradboost)
Learner Y1_pystacked added successfully.
. ddml E[D|X]: pystacked $D $X, type(class) methods(logit gradboost)
Learner D1_pystacked added successfully.
. ddml crossfit
. ddml estimate
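The cattaneo2 example above fits the interactive model, which allows the effect of the binary treatment mbsmoke to vary with the covariates. A plain `ddml estimate` reports the average treatment effect (ATE). The sketch below shows how the effect on the treated could be requested instead. It assumes that the installed ddml version supports the `atet` option of `ddml estimate` (check `help ddml` to confirm) and that `ddml crossfit` has finished in the current session.

```stata
* ATE, as in the log above (median over the 5 cross-fitting repetitions).
ddml estimate

* Average treatment effect on the treated; assumes the atet option exists.
ddml estimate, atet
```

With kfolds(5), reps(5) and gradient boosting among the base learners, the cross-fitting step can take a while, so a smaller reps() value is a reasonable choice for a first trial run.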
Case 3: Double machine learning, instrumental-variables estimation
use AJR.dta, clear
global Y logpgp95
global D avexpr
global Z logem4
global X lat_abst edes1975 avelf temp* humid* steplow-oilres
set seed 42
ddml init iv, kfolds(30)
ddml E[Y|X]: reg $Y $X
ddml E[Y|X], vtype(none): rforest $Y $X, type(reg)
ddml E[D|X]: reg $D $X
ddml E[D|X], vtype(none): rforest $D $X, type(reg)
ddml E[Z|X]: reg $Z $X
ddml E[Z|X], vtype(none): rforest $Z $X, type(reg)
ddml crossfit, shortstack
ddml estimate, robust
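The three conditional expectations declared above, `E[Y|X]`, `E[D|X]` and `E[Z|X]`, map onto the partially linear IV model. The block below is the standard textbook formulation, not taken from the original post, and the symbols (\theta_0, g_0, \ell_0, m_0, q_0) are assumed notation. The last line is the IV estimator built from the three sets of cross-fitted residuals, ignoring the constant term.

```latex
\begin{aligned}
Y &= \theta_0 D + g_0(X) + U, & E[U \mid X, Z] &= 0,\\
Z &= q_0(X) + V,              & E[V \mid X]    &= 0,\\
\hat\theta &= \frac{\sum_i \bigl(Z_i - \hat q(X_i)\bigr)\bigl(Y_i - \hat\ell(X_i)\bigr)}
                   {\sum_i \bigl(Z_i - \hat q(X_i)\bigr)\bigl(D_i - \hat m(X_i)\bigr)},
\qquad \ell_0(X) = E[Y \mid X],\; m_0(X) = E[D \mid X],\; q_0(X) = E[Z \mid X].
\end{aligned}
```

In the commands above, Y is logpgp95, D is avexpr and Z is logem4, so each of the three residuals is formed from the OLS and random forest learners added for that variable.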