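Andrew Morton's memory-management pull request for 6.12-rc1 merged several mTHP-related features into Linux 6.12, which shows that mTHP remains one of the most active development topics in mm.
Our team contributed three of the new features: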
"mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly improves the performance of munmap() of swap-filled ptes.
"mm: count the number of anonymous THPs per size" from Barry Song. Expose additional anon THP stats to userspace for improved tuning.
"mm: enable large folios swap-in support" from Barry Song. Support the swapin of mTHP memory into appropriately-sized folios, rather than into single-page folios.
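The related work totals 11 patches:
We enabled do_swap_page() to allocate and map large folios directly, currently only for synchronous I/O devices such as zRAM. We batch the freeing of swap slots in the munmap/zap_pte_range() path, where we have observed up to a 3x speedup of zap_pte_range(). We also added per-size counters for the total number of mTHPs and for the number of partially mapped mTHPs, which makes it easier to profile the whole system.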
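As a rough illustration of how the per-size counters can be consumed from userspace, here is a minimal sketch. It assumes the counters are exposed as `nr_anon` and `nr_anon_partially_mapped` under `/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/stats/`. The function name `read_anon_mthp_counts` and its `root` parameter are hypothetical helpers introduced here, not part of any kernel tool.

```python
import os
import re

# Assumed sysfs location of the per-size mTHP controls and stats.
THP_SYSFS_ROOT = "/sys/kernel/mm/transparent_hugepage"

# Counter file names assumed to live in each hugepages-<size>kB/stats/ directory.
COUNTERS = ("nr_anon", "nr_anon_partially_mapped")


def read_anon_mthp_counts(root=THP_SYSFS_ROOT):
    """Return {size_kb: {counter_name: value}} for every size found under root."""
    counts = {}
    if not os.path.isdir(root):
        return counts
    for entry in os.listdir(root):
        match = re.fullmatch(r"hugepages-(\d+)kB", entry)
        if not match:
            continue
        per_size = {}
        for name in COUNTERS:
            path = os.path.join(root, entry, "stats", name)
            try:
                with open(path) as f:
                    per_size[name] = int(f.read().strip())
            except (OSError, ValueError):
                # Counter missing (older kernel) or unreadable: skip it.
                continue
        if per_size:
            counts[int(match.group(1))] = per_size
    return counts


if __name__ == "__main__":
    for size_kb, stats in sorted(read_anon_mthp_counts().items()):
        print(f"{size_kb}kB: {stats}")
```

On a kernel without these counters the function simply returns an empty dictionary, so it can be run safely on older systems.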
In addition, Baolin Wang of Alibaba contributed mTHP support for shmem:
"support large folio swap-out and swap-in for shmem" from Baolin Wang. With this series we no longer split shmem large folios into single-page folios when swapping out shmem.
"support shmem mTHP collapse" from Baolin Wang. Adds support for khugepaged's collapsing of shmem mTHP folios.
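Chris Li of Google and Kairui Song of Tencent contributed swap allocator optimizations that greatly improve the chance that an mTHP swap-out is given contiguous swap slots and does not have to be split: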
"mm: swap: mTHP swap allocator base on swap cluster order" from Chris Li. Greatly improves the success rate of the mTHP swap allocation.
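Chris's approach is as follows:
The problem that mTHP swap-out can almost never obtain contiguous swap slots once the system has been running for a while was originally reported by us. I also developed a swap allocator evaluation tool for it, which landed in Linux 6.11: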
tools/mm: introduce a tool to assess swap entry allocation for thp_swapout
Link:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=95139d94084
There is also the following series from Usama Arif of Meta and Yu Zhao of Google:
"mm: split underused THPs" from Yu Zhao. Improve THP=always policy - this was overprovisioning THPs in sparsely accessed memory areas.
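This patchset is interesting. In the deferred_split shrinker callback, it splits a THP and frees the parts that were never initialized (in practice, never accessed by anyone). It is effectively the reverse of a THP being faulted in (__do_huge_pmd_anonymous_page) or collapsed by khugepaged (collapse_huge_page): the parts of the THP that are still zero-filled are released.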
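The idea can be sketched in userspace as a toy model. This is not kernel code: the THP is modeled as a plain byte buffer, `split_if_underused` and `count_zero_pages` are hypothetical names, and the `max_zero_pages` threshold is an assumption that stands in for whatever limit the kernel applies when deciding that a THP is underused.

```python
PAGE_SIZE = 4096  # base page size assumed for this model


def count_zero_pages(thp, page_size=PAGE_SIZE):
    """Count the base pages inside the buffer that are entirely zero-filled."""
    zero_page = bytes(page_size)
    return sum(
        1
        for offset in range(0, len(thp), page_size)
        if thp[offset:offset + page_size] == zero_page
    )


def split_if_underused(thp, max_zero_pages, page_size=PAGE_SIZE):
    """Split the modeled THP when it holds more zero pages than the threshold.

    Returns None when the THP is kept intact. Otherwise returns one entry per
    base page: the page contents if it holds data, or None if the zero-filled
    page was released.
    """
    if count_zero_pages(thp, page_size) <= max_zero_pages:
        return None  # still considered well used, leave the THP alone
    zero_page = bytes(page_size)
    pages = []
    for offset in range(0, len(thp), page_size):
        page = bytes(thp[offset:offset + page_size])
        pages.append(None if page == zero_page else page)
    return pages


if __name__ == "__main__":
    # A 2 MiB "THP" in which only two base pages were ever written.
    thp = bytearray(512 * PAGE_SIZE)
    thp[0] = 1
    thp[100 * PAGE_SIZE] = 1
    pages = split_if_underused(thp, max_zero_pages=256)
    kept = sum(page is not None for page in pages)
    print(f"kept {kept} of {len(pages)} base pages")
```

In this model, a THP with only a couple of touched base pages shrinks to just those pages after the split, which is the memory saving the series is after.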
Finally, a word on the patch that Ryan Roberts of ARM and I developed together, which enables mTHP from the kernel command line:
mm: override mTHP "enabled" defaults at kernel cmdline
Link:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dd4d30d1cdbe82
For example, we can pass a parameter like the following on the boot command line to specify which policy applies to which mTHP sizes:
thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never
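This way there is no need to configure large pages through the sysfs controls after boot has finished, and it also avoids the situation where many processes started during boot cannot use large pages.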
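To make the syntax of that parameter concrete, here is a minimal sketch of a parser for the value shown above. It is an illustration, not the kernel's parser: `parse_thp_anon` and `parse_size` are hypothetical helpers, and the sketch assumes that a range such as `16K-64K` covers every power-of-two size between its two endpoints.

```python
VALID_POLICIES = {"always", "madvise", "never", "inherit"}
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Convert a size such as '64K' or '2M' to bytes."""
    text = text.strip().upper()
    if not text:
        raise ValueError("empty size")
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def parse_thp_anon(value):
    """Map each mTHP size in bytes to its policy.

    Groups are separated by ';', each group is '<sizes>:<policy>', and
    <sizes> is a comma-separated list of single sizes or 'start-end' ranges.
    """
    policies = {}
    for group in value.split(";"):
        sizes, _, policy = group.partition(":")
        if policy not in VALID_POLICIES:
            raise ValueError(f"unknown policy in {group!r}")
        for item in sizes.split(","):
            start_text, dash, end_text = item.partition("-")
            start = parse_size(start_text)
            end = parse_size(end_text) if dash else start
            if start <= 0 or start > end or start & (start - 1):
                raise ValueError(f"bad size or range {item!r}")
            size = start
            while size <= end:  # assumed: ranges step through powers of two
                policies[size] = policy
                size *= 2
    return policies


if __name__ == "__main__":
    example = "16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never"
    for size, policy in sorted(parse_thp_anon(example).items()):
        print(f"{size // 1024}K: {policy}")
```

Under those assumptions, the example string assigns always to 16K, 32K and 64K, inherit to 128K and 512K, madvise to 256K, and never to 1M and 2M.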