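MNAR, also called non-ignorable missingness (NIM), is defined as the situation in which the probability that trial data are missing depends on the participant's unobserved values. For example, in a clinical trial of a drug that inhibits tumor growth, a participant responds well at the first several follow-up visits, then deteriorates suddenly before the next visit and withdraws from the trial, so the data are missing; the missingness mechanism in this case is MNAR. It is generally held that when the missing data cannot be shown to be MCAR or MAR, they can be treated as MNAR and handled accordingly, or a sensitivity analysis can be run to assess how robust and reliable the results are to the assumed missingness mechanism. In practice, however, although comparative analyses or models can distinguish MCAR from MAR, there is currently no way to determine reliably whether missing data are MAR or MNAR. Telling MAR from MNAR still depends mainly on the professional judgment of clinical researchers, who can only “guess” to some degree whether the missing data are MNAR.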
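The difference between the three mechanisms can be made concrete with a small simulation. The sketch below is illustrative only: the function name, the dropout probabilities, and the correlation between visits are all assumptions, not values from any trial. Dropout is constant under MCAR, depends on the previous (observed) visit under MAR, and depends on the current (unobserved) value under MNAR.

```python
import random

def observed_mean(mechanism, n=20000, seed=7):
    """Mean of the values that remain observed under a given missingness mechanism."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        prev = rng.gauss(0.0, 1.0)               # earlier visit, always observed
        curr = 0.6 * prev + rng.gauss(0.0, 0.8)  # current visit, true mean 0
        if mechanism == "MCAR":
            p_missing = 0.3                      # unrelated to any data
        elif mechanism == "MAR":
            p_missing = 0.8 if prev > 0.5 else 0.1   # depends on observed data only
        elif mechanism == "MNAR":
            p_missing = 0.8 if curr > 0.5 else 0.1   # depends on the unobserved value
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        if rng.random() >= p_missing:
            kept.append(curr)
    return sum(kept) / len(kept)

for mech in ("MCAR", "MAR", "MNAR"):
    print(mech, round(observed_mean(mech), 3))
```

Under MAR the bias in the observed mean can be removed by conditioning on `prev`; under MNAR no observed variable explains the dropout, which is why the assumption cannot be checked from the data.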
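In clinical trials, different joint distributions are usually constructed according to different assumptions about the efficacy outcome after an intercurrent event. Common methods are Return to Baseline (R2B), Jump to Reference (J2R), Copy Reference (CR), Copy Increment in Reference (CIR), and Unconditional Reference (UR).
Return to Baseline (R2B) suits short-acting drugs whose efficacy measure quickly returns to its starting level once treatment stops.
Jump to Reference (J2R) suits add-on trials in which the investigational drug is stopped but standard treatment continues, so that the efficacy measure moves to the same level as in the reference arm; it also suits loss to follow-up.
Copy Reference (CR) suits situations in which a change in the efficacy measure can be observed only after all protocol-specified treatment has been completed.
Copy Increment in Reference (CIR) suits settings in which the efficacy already achieved does not vanish immediately after treatment stops, but no further improvement occurs and the outcome slowly deteriorates. It suits degenerative diseases such as Alzheimer's disease.
Unconditional Reference (UR) suits discontinuation due to adverse events and loss to follow-up; Jump to Reference (J2R) is also applicable in these cases.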
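To see how these assumptions differ, the sketch below builds the marginal mean profile assigned to an experimental-arm subject under each rule, as the rules are commonly defined in the reference-based imputation literature. The function name, the argument layout (index 0 is baseline), and the example means are hypothetical. UR is left out because, in the macro calls later in this article, it differs from CR in the covariance setting rather than in the means.

```python
def post_dropout_means(method, mu_active, mu_ref, last_obs):
    """Visit means used for imputation; last_obs is the index of the last observed visit."""
    if method not in ("R2B", "J2R", "CR", "CIR"):
        raise ValueError(f"unknown method: {method}")
    if method == "CR":
        # Subject is treated as a reference-arm member at every visit.
        return list(mu_ref)
    means = list(mu_active[: last_obs + 1])  # own-arm means while on treatment
    for j in range(last_obs + 1, len(mu_active)):
        if method == "J2R":
            means.append(mu_ref[j])                      # jump to reference mean
        elif method == "CIR":
            # keep the benefit achieved so far, then follow reference increments
            means.append(mu_active[last_obs] + mu_ref[j] - mu_ref[last_obs])
        else:  # R2B
            means.append(mu_active[0])                   # back to the baseline mean
    return means

mu_active = [20, 16, 13, 11, 10]   # hypothetical visit means, experimental arm
mu_ref = [20, 18, 17, 16, 15]      # hypothetical visit means, reference arm
for m in ("R2B", "J2R", "CR", "CIR"):
    print(m, post_dropout_means(m, mu_active, mu_ref, last_obs=2))
```

These are mean profiles only; the imputed values additionally depend on the subject's observed residuals and on the covariance structure.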
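Control-based pattern imputation within the PMM framework has been widely accepted and used because it is clinically plausible, transparent, and easy to implement. It was first proposed by Little and Yau (1996), building on the idea of an “as treated” model, which imputes missing values according to the actual treatment and dose a patient receives after dropout; one of the assumptions they explored is that patients revert to the control group after dropout. Under this assumption, the unobserved post-dropout values in the experimental arm follow the trajectory of the values observed in the control arm. Accordingly, only the observed values in the control arm are used to derive the posterior distribution of the parameters, from which missing values in both the control and experimental arms are imputed. The approach is conservative because it tends to shrink the difference between the experimental and control arms, but not extremely so, because using the earlier observed values in the experimental arm as predictors still allows a carry-over effect.
In SAS PROC MI, the MODEL option of the MNAR statement specifies the subset of observations used to model the distribution of the missing values. Control-based pattern imputation is implemented by specifying the control arm as that subset.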
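The idea of estimating the imputation model from the control arm only, then applying it to both arms, can be sketched in a few lines. This is a deliberately simplified illustration with one baseline and one post-baseline value; the function and field names are hypothetical. Unlike a proper multiple-imputation procedure, it plugs in least-squares estimates instead of drawing the regression parameters from their posterior distribution.

```python
import random

def fit_ols(x, y):
    """Least-squares intercept, slope, and residual SD for y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, (rss / (n - 2)) ** 0.5

def control_based_impute(rows, control="control", seed=1):
    """rows: dicts with keys arm, base, post (post is None when missing)."""
    rng = random.Random(seed)
    # The model is estimated from observed control-arm records only.
    ctrl = [r for r in rows if r["arm"] == control and r["post"] is not None]
    a, b, sd = fit_ols([r["base"] for r in ctrl], [r["post"] for r in ctrl])
    completed = []
    for r in rows:
        post = r["post"]
        if post is None:
            # Both arms are imputed from the control-arm model, conditional
            # on the subject's own baseline (this is the carry-over path).
            post = a + b * r["base"] + rng.gauss(0.0, sd)
        completed.append({**r, "post": post})
    return completed
```

Because the subject's own earlier value enters as a predictor, an experimental-arm subject with a favourable baseline still receives a favourable imputed value relative to the control-arm mean.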
Several SAS implementations follow:
1. SAS code from Ratitch and O'Kelly (2011):
* Visit 1: set aside subjects with TRT=1 (taken here to be the experimental arm) who are observed at visit 1;
* The regression for SCORE_1 is then estimated from reference-arm observations only;
data DATAIN_MONO_IMP1 DATAIN_MONO_REST1;
  set DATAIN_MONO;
  if TRT=1 and LASTVIS >=1 then output DATAIN_MONO_REST1;
  else output DATAIN_MONO_IMP1;
run;
proc sort data=DATAIN_MONO_IMP1; by _Imputation_; run;
proc mi data=DATAIN_MONO_IMP1 out=DATAIN_REG_IMP1 nimpute=1 seed=234;
  by _Imputation_;
  var SCORE_0 SCORE_1;
  monotone reg(SCORE_1);
run;
* Recombine the set-aside and imputed records before the next visit;
data DATAIN_IMP1;
  set DATAIN_MONO_REST1 DATAIN_REG_IMP1;
run;
* Visit 2: same split, now conditioning on SCORE_0 and SCORE_1;
data DATAIN_MONO_IMP2 DATAIN_MONO_REST2;
  set DATAIN_IMP1;
  if TRT=1 and LASTVIS >=2 then output DATAIN_MONO_REST2;
  else output DATAIN_MONO_IMP2;
run;
proc sort data=DATAIN_MONO_IMP2; by _Imputation_; run;
proc mi data=DATAIN_MONO_IMP2 out=DATAIN_REG_IMP2 nimpute=1 seed=345;
  by _Imputation_;
  var SCORE_0 SCORE_1 SCORE_2;
  monotone reg(SCORE_2);
run;
data DATAIN_IMP2;
  set DATAIN_MONO_REST2 DATAIN_REG_IMP2;
run;
* Visit 3 (final visit): same split, conditioning on all earlier scores;
data DATAIN_MONO_IMP3 DATAIN_MONO_REST3;
  set DATAIN_IMP2;
  if TRT=1 and LASTVIS=3 then output DATAIN_MONO_REST3;
  else output DATAIN_MONO_IMP3;
run;
proc sort data=DATAIN_MONO_IMP3; by _Imputation_; run;
proc mi data=DATAIN_MONO_IMP3 out=DATAIN_REG_IMP3 nimpute=1 seed=456;
  by _Imputation_;
  var SCORE_0 SCORE_1 SCORE_2 SCORE_3;
  monotone reg(SCORE_3);
run;
data DATAIN_IMP3;
  set DATAIN_MONO_REST3 DATAIN_REG_IMP3;
run;
2. For the SAS code from Li (2019), see the previous article: “Pattern-mixture models (PMM): SAS code and SAP writing” by 渣渣东, WeChat public account 流行病学与卫生统计学.
3. GSK five macros (“one-step” here means the joint model):
The one-step macros.
* Define problem and check data
using separate covariances;
%part1A(Jobname=ConfM
,Data=Anal1
,Subject=Patient
,Response=change
,Time=Visit
,Treat=Therapy
,CatcovbyTime=PoolInv
,Covbytime=Basval
,Covgroup=Therapy
,ID=ISAE);
* Building MCMC sample;
%part1B(Jobname=ConfM
,Ndraws=2000
,thin=100
,seed=3215
,ods=listing);
* Build the imputed values;
%part2A(Jobname=ConfM_MAR
,inname=ConfM
,method=MAR
);
%part2b(Jobname=ConfM_MAR
,seed=834416
,debug=1
);
* Run the analysis section, using one variance in ANOVA;
%part3(Jobname=ConfM_MAR
,anref=PLACEBO
,label=MAR
,ods=listing
,ANCovgroup=1
);
Sequential MI macro: assume MAR
%delta_pmm(datain=diapsi
,trtname=trt
,subjname=patient
,visname=visit
,poolsite=poolinv
,basecont=%str(basval)
,baseclass=%str()
,postcont=%str(hamdtl17)
,postclass=%str()
, seed=34535499
,nimp=1000
,deltavis=%str(all)
,deltacont=0
,deltacontarm=2
,deltacontmethod=meanabs
,favorcont=low
,primaryname=hamdtl17
,analcovarcont=%str(basval)
,analcovarclass=%str(poolinv)
,trtref=1
,analmethod=ancova
,repstr=
,dataout=PSI2013_SMAR_datafull
,resout=PSI2013_SMAR_results
,fulltrtbytime=Y
,lsmopt=%str(atBasval=&Meanbase)
);
The one-step macros. Easy to change method. Use CR instead.
%part2A(Jobname=ConfM_CR
,inname=ConfM
,method=CR
,ref=PLACEBO)
%part2b(Jobname=ConfM_CR
,seed=832216)
%part3(Jobname=ConfM_CR
,Label=CR to placebo single covariance
,ods=listing);
No need to re-run the MCMC steps in part1B. “Inname” links to previous results.
Sequential MI macro: impute CR
%cbi_pmm(datain=diapsi
,trtname=trt
,subjname=patient
,visname=visit
,poolsite=poolinv
,basecont=%str(basval)
,baseclass=%str()
,postcont=%str(hamdtl17)
,postclass=%str()
, seed=34535499
,nimp=1000
,primaryname=hamdtl17
,analcovarcont=%str(basval)
,analcovarclass=%str(poolinv)
,trtref=1
,analmethod=ancova
,repstr=
,dataout=PSI2013_SCR_datafull
,resout=PSI2013_SCR_results
,fulltrtbytime=Y
,lsmopt=%str(atBasval=&Meanbase)
);
Sequential MI: impute UR
DATA impute&visit REST&visit;
SET strtimp&visit;
* Put to one side subjects from experimental arm (trt=2) that have data at time-point &visit;
IF trt= 2 AND LASTVIS >=&visit THEN OUTPUT rest&visit;
* Select all other subjects for the time-point &visit imputation step;
ELSE OUTPUT impute&visit;
RUN;
PROC MI DATA=impute&visit OUT=imputed&visit NIMPUTE=1 SEED=34535499;
CLASS poolinv;
BY _Imputation_;
VAR basval poolinv &var._&visit;
MONOTONE REGRESSION(&var._&visit= basval poolinv);
RUN;
data STRTIMP%eval(&visit+1);
set REST&visit IMPUTED&visit;
run;
proc sort data=STRTIMP%eval(&visit+1);
by _Imputation_;
run;
%part2A(Jobname=ConfM_UR
,inname=ConfM
,method=CR
,VCMethod=Zero
,ref=PLACEBO)
%part2b(Jobname=ConfM_UR
,seed=832216)
%part3(Jobname=ConfM_UR
,anref=PLACEBO
,Label=CR to placebo single variance non-conditional
,ods=listing
,ANCovgroup=1);
Now use the CIR method on same sample
%part2A(Jobname=PSI_CIR, INName=PSI, Method=CIR, Ref=1);
%part2b(Jobname=PSI_CIR);
%part3(Jobname=PSI_CIR);
4.Return-to-Baseline Imputation with MNAR
*** Create temporary records for BASELINE pattern;
data data_PSI0_1; set data_PSI0;
output;
base_pattern=1;
array postbase (4) hamdtl17_1-hamdtl17_4;
do i=1 to 4;
postbase[i]=hamdtl17_0;
end;
output;
run;
*** Impute each post-baseline variable from baseline distribution;
proc mi data=data_PSI0_1 seed=52387 nimpute=1 out=data_hor_rtb;
by _Imputation_;
class gender base_pattern;
var gender hamdtl17_2 - hamdtl17_4;
monotone reg (hamdtl17_2 = gender);
mnar model(hamdtl17_2 / modelobs=(base_pattern='1'));
monotone reg (hamdtl17_3 = gender);
mnar model(hamdtl17_3 / modelobs=(base_pattern='1'));
monotone reg (hamdtl17_4 = gender);
mnar model(hamdtl17_4 / modelobs=(base_pattern='1'));
run;
*** Remove temporary records prior to analysis ***;
data data_hor_rtb1; set data_hor_rtb;
if base_pattern=1 then delete;
run;
SAP templates:
1.Reference-based controlled imputation - copy reference [CR] using a joint imputation model for repeated measurements
A reference-based multiple imputation approach, copy reference (CR), will be used as a sensitivity analysis to consider a missing-not-at-random (MNAR) mechanism for monotone missing data. Mean changes from baseline in MEASURE1 will be analyzed based on data observed while the subject remains on study as well as data imputed using multiple imputation (MI) methodology for time points at which no value is observed.
Imputation of values in the reference (control) arm and any intermediate missing values will assume missing-at-random (MAR). Imputation of values in the experimental arm(s) will be done as if the subject had been a member of the reference arm both pre- and post-withdrawal. Their observed data will however count as coming from the experimental arm for the purpose of parameter estimation prior to imputation. This is a conditional approach.
{See “MI assuming MAR using a joint imputation model for repeated measurements” to describe the joint model for repeated measures.}
{Describe number of imputation, random seeds, analysis model, and pooling as for “MI assuming MAR using a joint imputation model for repeated measurements”.}
{Note: The CR approach is inherently a conditional approach and as such is most naturally implemented using a sequential regression method (as described in the next section). Some software tools intended for marginal approaches may also implement a version of CR.}
2.Reference-based controlled imputation - copy reference [CR] based on sequential regression MI
A reference-based multiple imputation approach, copy reference (CR), will be used as a sensitivity analysis to consider a missing-not-at-random (MNAR) mechanism for monotone missing data. Mean changes from baseline in MEASURE1 will be analyzed based on data observed while the subject remains on study as well as data imputed using multiple imputation (MI) methodology for time points at which no value is observed.
Imputation of values in the reference (control) arm will assume MAR. Imputation of values in the experimental arm(s) will be done as if the subject had been a member of the reference arm. Imputed values in the experimental arm will be sampled using the imputation model of the reference arm, i.e., conditional on subject values observed at time points prior to discontinuation relative to the mean of the model for the reference arm. This approach does not assume a sustained benefit of experimental treatment after discontinuation and limits a post-discontinuation effect to that of reference drug and trial effect as reflected in estimated correlations between time points in the reference arm.
Intermittent (non-monotone) missing data will be imputed first based on the MAR assumption and a multivariate joint Gaussian imputation model using Markov chain Monte Carlo (MCMC) method within each treatment arm. {Insert description similar to one for MAR-based MI.}
The remaining, monotone missing data for all subjects who discontinue study prematurely will be imputed using sequential regression multiple imputation model estimated based on data from the reference arm only. Each sequential regression model (i.e., for imputation of values at a given time point) will include explanatory variables for {list of baseline covariates, and} all previous (Baseline, Visit x,…,y) values of MEASURE1. Missing values at a given time point in reference and experimental arms will be imputed from the same imputation model, conditional on subject values observed or imputed at previous time points.
No rounding or range restrictions will be applied to imputed continuous values.
{Describe number of imputation, random seeds, analysis model, and pooling as for MAR-based MI above.}
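The pooling step that these placeholders refer to is normally done with Rubin's rules. The sketch below shows the arithmetic for a scalar estimate such as a treatment difference; the function name is hypothetical, and the degrees of freedom are the classical large-sample version without small-sample adjustment.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and their squared standard errors."""
    m = len(estimates)
    q_bar = mean(estimates)            # pooled point estimate
    w = mean(variances)                # within-imputation variance
    b = variance(estimates)            # between-imputation variance (m - 1 divisor)
    t = w + (1 + 1 / m) * b            # total variance
    if b == 0:
        df = float("inf")              # no between-imputation variability
    else:
        df = (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, df

estimate, total_var, df = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(estimate, total_var ** 0.5, df)
```

The pooled standard error is the square root of the total variance, and inference uses a t distribution with `df` degrees of freedom.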
3.Reference-based controlled imputation - jump to reference [J2R] using a joint imputation model for repeated measurements
A reference-based multiple imputation approach, jump to reference (J2R), will be used as a sensitivity analysis to consider a missing-not-at-random (MNAR) mechanism for monotone missing data. Mean changes from baseline in MEASURE1 will be analyzed based on data observed while the subject remains on study as well as data imputed using multiple imputation (MI) methodology for time points at which no value is observed.
Imputation of values in the reference (control) arm will assume MAR. For imputation of values in the experimental arm(s), it will be assumed that subjects who discontinue the study early immediately adopt a distribution with predicted mean values at future visits similar to subjects in the reference arm. This is a marginal model.
This approach assumes that any effect of experimental treatment observed prior to discontinuation immediately disappears after discontinuation. Subjects’ measurements observed prior to discontinuation are taken into account to establish subject’s “differences” from the mean outcomes of their randomized group at their respective time points. The magnitude of a subject’s residuals relative to the predicted mean of their randomized arm is used before withdrawal, while the imputed residuals after withdrawal center on the predicted mean of the reference (control) arm, so that a subject who was worse (better) than average in the experimental arm will have imputed values worse (better) than average in the reference (control) arm.
Multiple imputation will be performed by first estimating parameters of a Multivariate Normal model for repeated measures under the assumption of missing-at-random and obtaining multiple Bayesian posterior samples of model parameters. Then the imputation model for each imputed dataset will be formed using the sampled model parameters and a modified mean allocation, where after withdrawal, parameters representing visit means for subjects in the {active} arm will be set to the means estimated for the reference arm. This approach will be implemented using {software, version}.
The Multivariate Normal model for MEASURE1 across {Visit x,…,y} will include the fixed, categorical effects of treatment, {list of baseline covariates}, visit, and treatment-by-visit interaction, as well as the continuous, fixed covariates of baseline score and baseline score-by-visit-interaction. A single shared unstructured covariance matrix will be used. {Explain if any aspect is different from model for primary analysis}.
The MCMC method will be used with single chain, 2000 tuning units and a minimum number of 4 tuning cycles, a burn of 1000, and a thin of 100 and non-informative priors for all parameters.
{Describe number of imputation, random seeds, analysis model, and pooling as for MAR-based MI above.}
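For one draw of the model parameters, the J2R imputation distribution described in this section can be written out as follows. This is a sketch under simplifying assumptions: covariates are omitted, and the symbols are introduced here for illustration only.

```latex
% Subject in the active arm, observed through visit t (subscript o), missing afterwards (subscript m).
% \mu_a, \mu_r: visit means of the active and reference arms; \Sigma: shared covariance matrix.
\tilde{\mu} = \begin{pmatrix} \mu_{a,o} \\ \mu_{r,m} \end{pmatrix},
\qquad
y_m \mid y_o \sim N\!\left(
  \mu_{r,m} + \Sigma_{mo}\Sigma_{oo}^{-1}\,(y_o - \mu_{a,o}),\;
  \Sigma_{mm} - \Sigma_{mo}\Sigma_{oo}^{-1}\Sigma_{om}
\right)
```

The term $y_o - \mu_{a,o}$ is the subject's residual from their own arm before withdrawal, and the imputed values are centered on the reference-arm means $\mu_{r,m}$ afterwards.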
4.Reference-based controlled imputation - jump to reference [J2R] based on sequential regression MI using residuals
A reference-based multiple imputation approach, jump to reference (J2R), will be used as a sensitivity analysis to consider a missing-not-at-random (MNAR) mechanism for monotone missing data. Mean changes from baseline in MEASURE1 will be analyzed based on data observed while the subject remains on study as well as data imputed using multiple imputation (MI) methodology for time points at which no value is observed.
Imputation of values in the reference (control) arm will assume MAR. For imputation of values in the experimental arm(s), it will be assumed that subjects who discontinue the study early immediately adopt a distribution of values at future visits similar to subjects with observed data in the reference arm.
This approach assumes that any effect of experimental treatment observed prior to discontinuation immediately disappears after discontinuation. Subjects’ measurements observed prior to discontinuation are taken into account to establish subject’s “differences” from the mean outcomes of their randomized group at their respective time points. The magnitude of a subject’s residuals relative to their own predicted mean is used to predict forwards, so that a subject who was worse (better) than average in the experimental arm will have imputed values worse (better) than average in the reference (control) arm.
When imputing the outcomes at a given visit, the residuals from the imputation models of previous post-baseline visits are used as explanatory variables. This is in contrast to the standard sequential regression method, where the absolute values, not the residuals, from previous visits are used.
The treatment group is included in the imputation model and this variable alone estimates the difference between treatment groups at each visit, while correlations between visits are modeled similarly for all subjects.
For subjects from the experimental arm who withdraw from study, the value of this treatment group variable is switched to be that of the reference treatment group for visits that are imputed.
Intermittent (non-monotone) missing data will be imputed first based on the MAR assumption and a multivariate joint Gaussian imputation model using Markov chain Monte Carlo (MCMC) method within each treatment arm. {Insert description similar to one for MAR-based MI.}
The remaining, monotone missing data for all subjects who discontinue study early will be imputed using sequential regression multiple imputation on residuals as follows, iterating through post-baseline time points j=1,…,J.
In the remainder of this section, when referring to a “regression imputation model” we mean a regression model corresponding to a Bayesian draw of imputation model parameters. The steps outlined below will be performed for each of the multiply imputed replicates of the data based on a different Bayesian draw of model parameters. All steps will be performed with the treatment group variable set to the value of the reference arm for all subjects at the visits for which their values are imputed in order to enable imputation of values in the experimental treatment arm from the distribution of the reference arm.
Imputation Step for the First Post-baseline Visit: Impute missing values at time point j=1 in both treatment arms using regression imputation model, including explanatory variables for treatment, {list of baseline covariates} and baseline value of MEASURE1.
Residual Calculation Step (for each post-baseline visit j=1,…,J): For each subject, compute the residual with respect to the imputation model estimated for the imputation of visit j. This residual will be used as an explanatory variable at the imputation steps for subsequent visits. The predicted mean from the model used to calculate the residual is obtained by conditioning on the subject’s values of explanatory variables, including treatment group. The residual is obtained by subtracting from the predicted mean the observed value of MEASURE1 at visit j if available, and if not available by subtracting the value of MEASURE1 that was imputed in the Imputation Step.
Imputation Step (for each post-baseline visit j=2,…,J): Perform imputation in a similar manner as for the Imputation Step for the First Post-baseline Visit, but additionally use the residuals calculated in the Residual Calculation Step at each previous post-baseline visit as explanatory variables in the imputation model.
Repeat the Residual Calculation Step and the Imputation Step until all visits are imputed.
No rounding or range restrictions will be applied to imputed continuous values.
{Describe number of imputation, random seeds, analysis model, and pooling as for MAR-based MI above.}
{Note: For standard MAR-based sequential regression, regressing on previous observed value or previous residuals gives the same numerical answer. Custom has been to regress on previous observed value as this is more readily available, and this is the method programmed into standard software such as the SAS MI procedure.}
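The imputation and residual steps above can be sketched for a single subject and a single parameter draw. Everything below is hypothetical: the function name, the coefficient layout, and the numbers. The per-visit coefficients are passed in directly; in a real analysis they would be Bayesian draws from the fitted sequential regressions. The residual follows the wording of the template (predicted mean minus the value).

```python
import random

def impute_j2r_residual(baseline, arm, observed, coefs, ref=0, seed=0):
    """Impute monotone missing visits for one subject, regressing on residuals.

    observed: post-baseline values in visit order, None from dropout onwards.
    coefs:    one dict of drawn model parameters per post-baseline visit.
    """
    rng = random.Random(seed)
    values, residuals = [], []
    for y, c in zip(observed, coefs):
        missing = y is None
        # At imputed visits the treatment variable is switched to the reference arm.
        arm_used = ref if missing else arm
        trt_flag = 0.0 if arm_used == ref else 1.0
        pred = (c["intercept"] + c["trt"] * trt_flag + c["base"] * baseline
                + sum(w * r for w, r in zip(c["resid"], residuals)))
        if missing:
            y = pred + rng.gauss(0.0, c["sd"])   # Imputation Step
        values.append(y)
        residuals.append(pred - y)               # Residual Calculation Step
    return values

# Hypothetical parameter draw for two post-baseline visits.
coefs = [
    {"intercept": 1.0, "trt": 2.0, "base": 0.5, "resid": [], "sd": 1.0},
    {"intercept": 2.0, "trt": 3.0, "base": 0.5, "resid": [-0.8], "sd": 1.0},
]
print(impute_j2r_residual(10.0, arm=1, observed=[9.0, None], coefs=coefs))
```

Here the subject was above the experimental-arm prediction at visit 1, so the visit 2 imputation is shifted above the reference-arm prediction by the residual term, before random noise is added.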
Examples of possible variations:
As an alternative to “The treatment group is included in the imputation model and this variable alone estimates the difference between treatment groups at each visit, while correlations between visits are modeled similarly for all subjects”, separate correlations can be used for each arm.
In the “Imputation Step (for each post-baseline visit j=2,…,J)”: residuals calculated in the Residual Calculation Step at each previous post-baseline visit and used as explanatory variables in the imputation model can be crossed with treatment.
5.Reference-based controlled imputation – copy increment from reference [CIR] using a joint imputation model for repeated measurements
A reference-based multiple imputation approach, copy increment from reference (CIR), will be used as a sensitivity analysis to consider a missing-not-at-random (MNAR) mechanism for monotone missing data. Mean changes from baseline in MEASURE1 will be analyzed based on data observed while the subject remains on study as well as data imputed using multiple imputation (MI) methodology for time points at which no value is observed.
Imputation of values in the reference (control) arm will assume MAR. For imputation of values in the experimental arm(s), it will be assumed that subjects who discontinue the study early immediately adopt a distribution with predicted mean values at future visits where change in mean from visit to visit is similar to that for those subjects in the reference arm. This method is based on a marginal imputation model approach.
This approach assumes that any effect of experimental treatment observed prior to discontinuation continues after discontinuation, but that there is no further additional effect because the subject has stopped treatment. This addresses a de facto or effectiveness based estimand. Subjects’ measurements observed prior to discontinuation are taken into account to establish subject’s “differences” from the mean outcomes of their randomized group at their respective time points.
Multiple imputation will be performed by first estimating parameters of a Multivariate Normal model for repeated measures under the assumption of missing-at-random and obtaining multiple Bayesian posterior samples of model parameters. Then the imputation model for each imputed dataset will be formed using the sampled model parameters and a modified mean allocation, where after withdrawal, parameters representing visit means for subjects in the {active} arm will be set to the mean of the withdrawer’s own treatment arm at the last observed visit plus an increment from that last observed visit to the imputed visit as estimated for the reference arm. This approach will be implemented using {software, version}.
The Multivariate Normal model for MEASURE1 across {Visit x,…,y} will include the fixed, categorical effects of treatment, {list of baseline covariates}, visit, and treatment-by-visit interaction, as well as the continuous, fixed covariates of baseline score and baseline score-by-visit-interaction. A single shared unstructured covariance matrix will be used. {Explain if any aspect is different from model for primary analysis}.
The MCMC method will be used with single chain, 2000 tuning units and a minimum number of 4 tuning cycles, a burn of 1000, and a thin of 100 and non-informative priors for all parameters.
{Describe number of imputation, random seeds, analysis model, and pooling as for MAR-based MI above.}
Templates also exist for the less commonly used OFCMCF and OLMCF methods; leave a message in the account backend if you need them.
6.Reference-based controlled imputation – own first conditional mean carried forward [OFCMCF] using a joint imputation model for repeated measurements
7.Reference-based controlled imputation – own last mean carried forward [OLMCF] using a joint imputation model for repeated measurements
Now use the Own LMCF with Delta method on same sample;
%part2A(Jobname=PSI_Delta, INName=PSI, Method=OLMCF);
%part2b(Jobname=PSI_Delta);
%part3(Jobname=PSI_Delta, Delta= 0 0.15 0.15 0.15);
Take home message:
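1. Suppose that after the intercurrent event the participant receives the control treatment. Such a scenario may not be ethically acceptable in the conduct of a trial, so it has to be analyzed with a hypothetical strategy. The usual method is control-based PMM, which assumes that the distribution after the intercurrent event resembles that of the control arm. Different imputation methods, such as jump to reference (J2R) and copy reference (CR), can be used for different assumed scenarios.
2. %delta_pmm and %cbi_pmm correspond to delta-adjusted PMM and control-based PMM, respectively. The GSK five macros are not suitable for non-normal data or time-to-event data; for details, download the data and macros from http://www.missingdata.org.uk and rerun them.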
References:
1. http://www.missingdata.org.uk - SAP Text_Describing_analyses_Final_2016-08-11
2. EMA. Guideline on missing data in confirmatory clinical trials
3. ICH. Estimands and Sensitivity Analysis in Clinical Trials
4. LITTLE R J A, RUBIN D B. Statistical analysis with missing data[M]. 2nd edition. Hoboken: Wiley, 2002: 25-30.
5. BERGLUND P, HEERINGA S G. Multiple imputation of missing data using SAS[M]. Kerry: SAS Institute, 2014: 1-4.
6. BUUREN S V, BRAND J P L, GROOTHUIS-OUDSHOORN C G M, et al. Fully conditional specification in multivariate imputation[J]. J Stat Comput Sim, 2006, 76(12): 1049-1064.
7. AZUR M J, STUART E A, FRANGAKIS C, et al. Multiple imputation by chained equations: what is it and how does it work?[J]. Int J Methods Psychiatr Res, 2011, 20(1): 40-49.
8. RATITCH B, O'KELLY M, TOSIELLO R. Missing data in clinical trials: from clinical assumptions to statistical analysis using pattern mixture models[J]. Pharm Stat, 2013, 12(6): 337-347.