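Paper Express | Management Science: October Article Roundup

Technology   Education   2024-11-14 20:33   Germany

Post author: 胡思行

Editor's Note

In this series, we selected 11 of the 47 articles that the leading operations research journal Management Science published in October and present their basic information, to help readers quickly catch up on the latest developments in the field.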


Recommended Article 1

● Title: Managing Resources for Shared Micromobility: Approximate Optimality in Large-Scale Systems
● Link: https://doi.org/10.1287/mnsc.2022.02023
● Authors: Deniz Akturk, Ozan Candogan, Varun Gupta
● Published: October 14, 2024
● Abstract

We consider the problem of managing resources in shared micromobility systems (bike sharing and scooter sharing). An important task in managing such systems is periodic repositioning/recharging/sourcing of units to avoid stockouts or excess inventory at nodes with unbalanced flows. We consider a discrete-time model; each period begins with an initial inventory at each node in the network, and then, customers (demand) materialize at the nodes. Each customer picks up a unit at the origin node and drops it off at a randomly sampled destination node with an origin-specific probability distribution. We model the above network inventory management problem as an infinite horizon discrete-time discounted Markov decision process (MDP) and prove the asymptotic optimality of a novel mean-field approximation to the original MDP as the number of stations becomes large. To compute an approximately optimal policy for the mean-field dynamics, we provide an algorithm with a running time that is logarithmic in the desired optimality gap. Lastly, we compare the performance of our mean field-based policy with state-of-the-art heuristics via numerical experiments, including experiments using Austin scooter-sharing data.
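The one-period dynamics described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' model or their mean-field policy: the function name `simulate_period`, the lost-demand rule for stockouts, and the rule that every trip ends within the same period are assumptions added here.

```python
import random


def simulate_period(inventory, demand, routing, rng):
    """Advance the network by one period and return (next_inventory, lost_demand).

    inventory[i]:  units at node i at the start of the period
    demand[i]:     customers arriving at node i during the period
    routing[i][j]: probability that a trip starting at i ends at j
    """
    n = len(inventory)
    # Assumption: demand beyond the on-hand stock is lost (a stockout).
    served = [min(inventory[i], demand[i]) for i in range(n)]
    lost = [demand[i] - served[i] for i in range(n)]

    arrivals = [0] * n
    for origin in range(n):
        # Each served customer samples a destination from the origin-specific row.
        destinations = rng.choices(range(n), weights=routing[origin], k=served[origin])
        for dest in destinations:
            arrivals[dest] += 1

    # Assumption: every trip finishes within the period.
    next_inventory = [inventory[i] - served[i] + arrivals[i] for i in range(n)]
    return next_inventory, lost


# Hypothetical three-node network, used only to exercise the function.
routing = [[0.1, 0.6, 0.3], [0.5, 0.2, 0.3], [0.4, 0.4, 0.2]]
next_inventory, lost = simulate_period([5, 2, 0], [3, 4, 1], routing, random.Random(0))
print(next_inventory, lost)
```

The total number of units is conserved from one period to the next, so imbalance accumulates at nodes whose inflow and outflow differ. That imbalance is what the periodic repositioning decision in the abstract has to correct.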




Recommended Article 2

● Title: Large Language Model in Creative Work: The Role of Collaboration Modality and User Expertise
● Link: https://doi.org/10.1287/mnsc.2023.03014
● Authors: Zenan Chen, Jason Chan
● Published: October 15, 2024
● Abstract

Since the launch of ChatGPT in December 2022, large language models (LLMs) have been rapidly adopted by businesses to assist users in a wide range of open-ended tasks, including creative work. Although the versatility of LLM has unlocked new ways of human-artificial intelligence collaboration, it remains uncertain how LLMs should be used to enhance business outcomes. To examine the effects of human-LLM collaboration on business outcomes, we conducted an experiment where we tasked expert and nonexpert users to write an ad copy with and without the assistance of LLMs. Here, we investigate and compare two ways of working with LLMs: (1) using LLMs as “ghostwriters,” which assume the main role of the content generation task, and (2) using LLMs as “sounding boards” to provide feedback on human-created content. We measure the quality of the ads using the number of clicks generated by the created ads on major social media platforms. Our results show that different collaboration modalities can result in very different outcomes for different user types. Using LLMs as sounding boards enhances the quality of the resultant ad copies for nonexperts. However, using LLMs as ghostwriters did not provide significant benefits and is, in fact, detrimental to expert users. We rely on textual analyses to understand the mechanisms, and we learned that using LLMs as ghostwriters produces an anchoring effect, which leads to lower-quality ads. On the other hand, using LLMs as sounding boards helped nonexperts achieve ad content with low semantic divergence to content produced by experts, thereby closing the gap between the two types of users.



Recommended Article 3

● Title: Extubation Decisions with Predictive Information for Mechanically Ventilated Patients in the ICU
● Link: https://doi.org/10.1287/mnsc.2021.01427
● Authors: Guang Cheng, Jingui Xie, Zhichao Zheng, Haidong Luo, Oon Cheong Ooi
● Published: October 24, 2024
● Abstract

Weaning patients from mechanical ventilators is a crucial decision in intensive care units (ICUs), significantly affecting patient outcomes and the throughput of ICUs. This study aims to improve the current extubation protocols by incorporating predictive information on patient health conditions. We develop a discrete-time, finite-horizon Markov decision process with predictions of future state to support extubation decisions. We characterize the structure of the optimal policy and provide important insights into how predictive information can lead to different decision protocols. We demonstrate that adding predictive information is always beneficial, even if physicians place excessive trust in the predictions, as long as the predictive model is moderately accurate. Using a comprehensive data set from an ICU in a tertiary hospital in Singapore, we evaluate the effectiveness of various policies and demonstrate that incorporating predictive information can reduce ICU length of stay by up to 3.4% and, simultaneously, decrease the extubation failure rate by up to 20.3%, compared with the optimal policy that does not utilize prediction. These benefits are more significant for patients with poor initial conditions upon ICU admission. Both our analytical and numerical findings suggest that predictive information is particularly valuable in identifying patients who could benefit from continued intubation, thereby allowing for personalized and delayed extubation for these patients.



Recommended Article 4

● Title: Higher Precision Is Not Always Better: Search Algorithm and Consumer Engagement
● Link: https://doi.org/10.1287/mnsc.2023.00478
● Authors: Wei Zhou, Mingfeng Lin, Mo Xiao, Lu Fang
● Published: October 28, 2024
● Abstract

On decentralized e-commerce platforms, search algorithms play a critical role in matching buyers and sellers. A typical search algorithm routinely refines and improves its catalog of data to increase search precision, but the effects of a more precise search are little known. We evaluate such effects via a 2019 quasiexperiment on a world-leading e-commerce platform in which the search algorithm refined some product categories into finer subgroups to allocate consumer queries to more relevant product listings. Our data cover millions of consumers’ search and purchase behaviors over six months across multiple search sessions and product categories, enabling us to investigate trade-offs over time and across categories. We find that a more precise search algorithm improves consumers’ click-through and purchase rates drastically and instantaneously, but it comes at the cost of a significant decrease in consumer engagement and unplanned purchases over a longer time horizon. On average, consumers who used to spend more time searching now conduct 5.5% fewer searches, spend 4.1% less time on the platform, and decrease their spending on related categories by 2.2% in the week after the search precision increases. Our examination of the mechanisms behind these consequences calls for more careful search algorithm designs that account for not only instant conversion based on search precision but also consumer engagement and sellers’ strategic responses in the longer horizon.



Recommended Article 5

● Title: Lightning Network Economics: Topology
● Link: https://doi.org/10.1287/mnsc.2023.03872
● Authors: Paolo Guasoni, Gur Huberman, Clara Shikhelman
● Published: October 7, 2024
● Abstract

By design, the Bitcoin protocol has a low throughput. The Lightning Network (LN) is a layer-two solution built to increase throughput by cryptographically securing commitments to transactions and only occasionally converting cumulative balances into on-chain transactions. LN channels enable payments between nodes connected by a path of channels. The payment flow through a channel determines its cost. Different channel topologies can support the same underlying flows but impose different costs. This paper obtains necessary conditions for cost-minimizing topologies by identifying local cost-reducing strategies. The first local strategy entails repositioning of channels. The second entails adding hubs to handle the flows of groups of nodes. The paper also evaluates the efficiency of a global configuration, obtaining bounds on the minimum cost topology and showing the unusual circumstances in which the cost minimal structure is a hub that connects to all other nodes.



Recommended Article 6

● Title: Industry-University Collaboration and Commercializing Chinese Corporate Innovation
● Link: https://doi.org/10.1287/mnsc.2022.00788
● Authors: David H. Hsu, Po-Hsuan Hsu, Kaiguo Zhou, Tong Zhou
● Published: October 3, 2024
● Abstract

We construct a comprehensive data set of medium- and large-sized industrial firms and research universities in China and examine how Chinese firms’ commercialization of their technologies is related to their experience in industry-university collaboration (IUC). We propose that firms’ IUC experience constitutes an inimitable complementary asset that facilitates their technology commercialization. Our empirical analyses show that firms generate more new product sales and produce more product-oriented patents when they have more patents that are coassigned to universities or when they have more academic publications coauthored with university staff in the past. Such a relation is strengthened when firms have higher absorptive capacity, when firms are in industries that depend more on basic science, and when firms are located closer to their collaborating universities. Additional tests point out four channels through which firms’ IUC experience benefits their technology commercialization: knowledge acquisition, talent recruiting, direct technology transfers, and technological complementarity.



Recommended Article 7

● Title: Bundling and Line Extensions in Distribution Channels
● Link: https://doi.org/10.1287/mnsc.2023.01326
● Authors: Roman Inderst, Fabian Griem, Greg Shaffer
● Published: October 28, 2024
● Abstract

We show how manufacturers can benefit from contracts that incentivize retailers to purchase multiple products from the same manufacturer. We isolate two effects: first, under standard contractual inefficiencies, which give rise to double marginalization, such contracts can increase channel profits (the “improved contractual efficiency” effect); second, when a weaker product is tied to a particularly strong “must-stock” product, such contracts can also reduce a retailer’s position and shift rent to the manufacturer (the “increased rent extraction” effect). To harness these effects, we show that it can even be profitable for the manufacturer to introduce a weak product that ultimately has the effect of foreclosing a rival’s more efficient substitute. Nevertheless, unless the tying product is sufficiently strong, the overall effect on welfare can still be positive, providing manufacturers with an efficiency rationale to use against common concerns held by antitrust agencies about such practices.



Recommended Article 8

● Title: Capturing the Benefits of Autonomous Vehicles in Ride Hailing: The Role of Market Configuration
● Link: https://doi.org/10.1287/mnsc.2020.03112
● Authors: Zhen Lian, Garrett van Ryzin
● Published: October 8, 2024
● Abstract

We develop an economic model of autonomous vehicle (AV) ride-hailing markets, in which uncertain aggregate demand is served with a combination of a fixed fleet of AVs and a flexible pool of human drivers (HVs). Dispatch efficiencies increase with scale because of density effects. We analyze market outcomes in this setting under four market configurations, defined by two dispatch platform structures (common platform versus independent platforms) and two levels of supply competition (monopoly AV versus competitive AV). A key result of our analysis is that the lower cost of AVs does not necessarily translate into lower prices; the price impact of AVs is ambiguous and depends critically on both the dispatch platform structure and the level of AV supply competition. In the extreme case, we show that if AVs and HVs operate on independent dispatch platforms, there is a monopoly AV supplier, and labor supply elasticity is sufficiently high, then prices are even higher than in a pure-HV market. Indeed, to guarantee consistently lower prices (relative to a pure HV market) in all scenarios and under all supply and density elasticities, a common dispatch platform between AVs and HVs is required. Furthermore, competitive AVs lead to lower prices than monopoly AVs in every such scenario. Our results illustrate the critical role that market configuration plays in realizing potential welfare gains from AVs.



Recommended Article 9

● Title: A Minibatch Stochastic Gradient Descent-Based Learning Metapolicy for Inventory Systems with Myopic Optimal Policy
● Link: https://doi.org/10.1287/mnsc.2023.00920
● Authors: Jiameng Lyu, Jinxing Xie, Shilin Yuan, Yuan Zhou
● Published: October 9, 2024
● Abstract

Stochastic gradient descent (SGD) has proven effective in solving many inventory control problems with demand learning. However, it often faces the pitfall of an infeasible target inventory level that is lower than the current inventory level. Several recent works have been successful in resolving this issue in various inventory systems. However, their techniques are rather sophisticated and difficult to apply to more complicated scenarios, such as multiproduct and multiconstraint inventory systems. In this paper, we address the infeasible target inventory-level issue from a new technical perspective; we propose a novel minibatch SGD-based metapolicy. Our metapolicy is flexible enough to be applied to a general inventory systems framework covering a wide range of inventory management problems with myopic clairvoyant optimal policy. By devising the optimal minibatch scheme, our metapolicy achieves a regret bound of O(√T) for the general convex case and O(logT) for the strongly convex case. To demonstrate the power and flexibility of our metapolicy, we apply it to three important inventory control problems, multiproduct and multiconstraint systems, multiechelon serial systems, and one-warehouse and multistore systems, by carefully designing application-specific subroutines. We also conduct extensive numerical experiments to demonstrate that our metapolicy enjoys competitive regret performance, high computational efficiency, and low variances among a wide range of applications.
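To make the "infeasible target inventory level" pitfall concrete, here is a minimal single-product sketch. It is not the authors' metapolicy or their optimal minibatch scheme: the function names, the fixed `batch_size`, the constant step size, the lost-sales rule, and the assumption that demand is fully observed are all illustrative choices made here. Setting `batch_size=1` gives plain SGD.

```python
import random


def newsvendor_subgradient(y, d, h, b):
    """Subgradient in y of the cost h*max(y - d, 0) + b*max(d - y, 0)."""
    return h if y > d else -b


def minibatch_sgd(demands, h, b, step, batch_size, y0=0.0):
    """Return (final target level, number of periods with an infeasible target)."""
    target = y0        # order-up-to level the learner would like to reach
    on_hand = 0.0      # inventory carried over from the previous period
    infeasible = 0
    grad_sum = 0.0
    for t, d in enumerate(demands, start=1):
        if on_hand > target:
            # Stock cannot be removed, so the target lies below what is on hand.
            infeasible += 1
        level = max(on_hand, target)           # level actually held after ordering
        grad_sum += newsvendor_subgradient(target, d, h, b)
        on_hand = max(level - d, 0.0)          # assumption: unmet demand is lost
        if t % batch_size == 0:
            # Update the target only once per batch, using the averaged subgradient.
            target = max(target - step * grad_sum / batch_size, 0.0)
            grad_sum = 0.0
    return target, infeasible


# Hypothetical demand stream: uniform on [0, 20], holding cost 1, backlog cost 3.
rng = random.Random(1)
demands = [rng.uniform(0.0, 20.0) for _ in range(2000)]
for size in (1, 20):
    print(size, minibatch_sgd(demands, h=1.0, b=3.0, step=0.2, batch_size=size))
```

In this sketch the target is held fixed within a batch and moved only at batch boundaries. The two runs let you compare how often the target falls below the on-hand stock for different batch sizes on the same demand stream.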



Recommended Article 10

● Title: Private Optimal Inventory Policy Learning for Feature-Based Newsvendor with Unknown Demand
● Link: https://doi.org/10.1287/mnsc.2023.01268
● Authors: Tuoyi Zhao, Wen-Xin Zhou,
● Published: October 24, 2024
● Abstract

The data-driven newsvendor problem with features has recently emerged as a significant area of research, driven by the proliferation of data across various sectors such as retail, supply chains, e-commerce, and healthcare. Given the sensitive nature of customer or organizational data often used in feature-based analysis, it is crucial to ensure individual privacy to uphold trust and confidence. Despite its importance, privacy preservation in the context of inventory planning remains unexplored. A key challenge is the nonsmoothness of the newsvendor loss function, which sets it apart from existing work on privacy-preserving algorithms in other settings. This paper introduces a novel approach to estimating a privacy-preserving optimal inventory policy within the f-differential privacy framework, an extension of the classical (ɛ,δ)-differential privacy with several appealing properties. We develop a clipped noisy gradient descent algorithm based on convolution smoothing for optimal inventory estimation to simultaneously address three main challenges: (i) unknown demand distribution and nonsmooth loss function, (ii) provable privacy guarantees for individual-level data, and (iii) desirable statistical precision. We derive finite-sample high-probability bounds for optimal policy parameter estimation and regret analysis. By leveraging the structure of the newsvendor problem, we attain a faster excess population risk bound compared with that obtained from an indiscriminate application of existing results for general nonsmooth convex loss. Our numerical experiments demonstrate that the proposed new method can achieve desirable privacy protection with a marginal increase in cost.
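The three ingredients named in this abstract (convolution smoothing, gradient clipping, and added noise) can be illustrated with a small sketch. It is not the authors' algorithm and carries no privacy guarantee: the Gaussian smoothing kernel, the function names, and the parameters `clip_norm`, `noise_std`, and `bandwidth` are assumptions made here, and `noise_std` is not calibrated to any f-differential privacy level. The order quantity is modeled as a linear function of the features, q = θ·x, with holding cost `h` and backlog cost `b`.

```python
import math
import random


def smoothed_grad(theta, x, d, h, b, bandwidth):
    """Per-sample gradient of the Gaussian-smoothed newsvendor loss.

    The unsmoothed loss h*max(q - d, 0) + b*max(d - q, 0) has derivative
    h*1{q > d} - b*1{q < d}; smoothing replaces the indicator by a normal CDF.
    """
    u = (sum(t * xi for t, xi in zip(theta, x)) - d) / bandwidth
    cdf = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))
    scale = (h + b) * cdf - b
    return [scale * xi for xi in x]


def clip(g, c):
    """Rescale g so that its Euclidean norm is at most c."""
    norm = math.sqrt(sum(v * v for v in g))
    factor = min(1.0, c / norm) if norm > 0 else 1.0
    return [v * factor for v in g]


def clipped_noisy_gd(X, D, h, b, steps, lr, clip_norm, noise_std, bandwidth, rng):
    """Gradient descent on the smoothed loss with clipped gradients plus Gaussian noise."""
    n, p = len(X), len(X[0])
    theta = [0.0] * p
    for _ in range(steps):
        avg = [0.0] * p
        for x, d in zip(X, D):
            g = clip(smoothed_grad(theta, x, d, h, b, bandwidth), clip_norm)
            for j in range(p):
                avg[j] += g[j] / n
        # Noise is added to the averaged clipped gradient in every iteration.
        theta = [theta[j] - lr * (avg[j] + rng.gauss(0.0, noise_std)) for j in range(p)]
    return theta


# Hypothetical data: intercept-only feature, demand on the grid 0..100.
X = [[1.0] for _ in range(101)]
D = [float(d) for d in range(101)]
exact = clipped_noisy_gd(X, D, 1.0, 3.0, 300, 1.0, 5.0, 0.0, 1.0, random.Random(0))
noisy = clipped_noisy_gd(X, D, 1.0, 3.0, 300, 1.0, 5.0, 0.5, 1.0, random.Random(0))
print(exact, noisy)
```

With the noise switched off, the intercept-only fit settles near the b/(h+b) = 0.75 quantile of demand, which is the classical newsvendor solution. Turning the noise on shows the accuracy cost of the perturbation in this toy setting.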



Recommended Article 11

● Title: A Simple and Optimal Policy Design with Safety Against Heavy-Tailed Risk for Stochastic Bandits
● Link: https://doi.org/10.1287/mnsc.2022.03512
● Authors: David Simchi-Levi, Zeyu Zheng, Feng Zhu
● Published: October 30, 2024
● Abstract

We study the stochastic multi-armed bandit problem and design new policies that enjoy both optimal regret expectation and light-tailed risk for regret distribution. We first find that any policy that obtains the optimal instance-dependent expected regret could incur a heavy-tailed regret tail risk that decays slowly with T. We then focus on policies that achieve optimal worst-case expected regret. We design a novel policy that (i) enjoys the worst-case optimality for regret expectation and (ii) has the worst-case tail probability of incurring a regret larger than any regret threshold that decays exponentially with respect to T. The decaying rate is proved to be optimal for all worst-case optimal policies. Our proposed policy achieves a delicate balance between doing more exploration at the beginning of the time horizon and doing more exploitation when approaching the end, compared with standard confidence-bound-based policies. We also enhance the policy design to accommodate the “any-time” setting where T is unknown a priori, highlighting “lifelong exploration”, and prove equivalently desired policy performances as compared with the “fixed-time” setting with known T. From a managerial perspective, we show through numerical experiments that our new policy design yields similar efficiency and better safety compared to celebrated policies. Our policy design is preferable especially when (i) there is a risk of underestimating the volatility profile, or (ii) there is a challenge of tuning policy hyper-parameters. We conclude by extending our proposed policy design to the stochastic linear bandit setting that leads to both worst-case optimality in terms of regret expectation and light-tailed risk on regret distribution.
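The distinction between expected regret and the tail of the regret distribution can be explored with a small simulation. The sketch below implements the textbook UCB1 rule as a stand-in for the "standard confidence-bound-based policies" mentioned in the abstract; it is not the policy proposed in the paper. The Bernoulli arms, the function names, and the horizon are assumptions chosen for illustration.

```python
import math
import random


def ucb1(means, horizon, rng):
    """Run UCB1 on Bernoulli arms; return (pseudo-regret, pulls per arm)."""
    k = len(means)
    pulls = [0] * k
    totals = [0.0] * k
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: pull every arm once
        else:
            # Choose the arm with the largest empirical mean plus confidence bonus.
            arm = max(
                range(k),
                key=lambda a: totals[a] / pulls[a] + math.sqrt(2.0 * math.log(t) / pulls[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
        regret += best - means[arm]  # gap of the arm that was pulled
    return regret, pulls


def regret_samples(means, horizon, runs, seed=0):
    """Sorted pseudo-regret values over independent runs."""
    rng = random.Random(seed)
    return sorted(ucb1(means, horizon, rng)[0] for _ in range(runs))


samples = regret_samples([0.2, 0.8], horizon=500, runs=50)
print("mean regret:", sum(samples) / len(samples))
print("largest regret over runs:", samples[-1])
```

Comparing the mean with the largest values across many runs gives an empirical view of the tail behavior that the abstract's "light-tailed risk" refers to.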



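The e-book An Introduction to Robust Optimization (《鲁棒优化入门》), an original work by 「运筹OR帷幄」, is currently being updated on GitHub. Feel free to copy the link below to read it: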

https://github.com/Operations-Research-Science/Ebook-An_introduction_to_robust_optimization






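Article Notes

Responsible editor: EvelynYao

WeChat editor: 疑疑

Originally published by 『运筹OR帷幄』

For reprints, please request the reprint guidelines through the official account's message backend.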






