Time series experiments, in which experimental units receive a sequence of treatments over time, are frequently employed by technology companies to evaluate the performance of a newly developed policy, product, or treatment relative to a baseline control. Many existing A/B testing solutions assume a fully observable experimental environment that satisfies the Markov condition, an assumption that often fails in practice.
This paper studies the optimal design for A/B testing in partially observable environments. We introduce a controlled (vector) autoregressive moving average model to capture partial observability, together with a small-signal asymptotic framework that simplifies the analysis of the asymptotic mean squared errors of average treatment effect estimators under various designs. We develop two algorithms to estimate the optimal design: one based on constrained optimization and the other on reinforcement learning. We demonstrate the superior performance of our designs using a dispatch simulator and two real datasets from a ride-sharing company.
About the Speaker
Ke Sun is a Postdoctoral Fellow in the Department of Statistics at Harvard University. He earned his Ph.D. in Statistical Machine Learning from the University of Alberta, where he was advised by Prof. Linglong Kong. During his Ph.D., he visited the London School of Economics and Political Science, where he was advised by Prof. Chengchun Shi. His research focuses on reinforcement learning, and he has published several papers at top machine learning conferences such as NeurIPS and ICLR.