A Deep Dive into G1GC: Analyzing Why Young GC Pause Times Keep Growing


Strange Symptoms

Have you ever seen an upstream caller report timeouts even though your service sets thread timeouts internally?

Have you ever seen a downstream service respond quickly, with no time-consuming logic in your own follow-up methods, yet the overall execution time come out extremely long?

Have you ever seen GC pause times grow in a regular pattern that bears no relation to traffic?

Have you ever seen the Eden space suddenly shrink and stay small for a long time?



Background

It was a quiet afternoon when a sudden GC latency alert interrupted this engineer's focused coding. Opening the GC dashboard showed that although only a single container had alerted, most GC pauses had already climbed to nearly 1 s.

After quickly pulling the alerting machine out of the load balancer, I started analyzing the GC behavior:

  • No full GC occurred

  • Young GC frequency and pause time had both risen sharply

  • Traffic showed no significant fluctuation

  • The collector was G1GC, with a maximum pause target of 200 ms


Treating the Symptom

Since only young GC frequency and pause time had gone up, surely this was an easy win: enlarge the young generation.

A larger young generation naturally lowers GC frequency and raises overall throughput.

Entering the container to check the heap settings, I found the heap was set to 10 GB, with no Eden size configured manually.


By default G1 does not pin the young generation to a fixed ratio; it floats between 5% (-XX:G1NewSizePercent) and 60% (-XX:G1MaxNewSizePercent) of the heap. In theory, with traffic flat, GC should not suddenly become this frequent.


Examining the GC Logs

The logs showed that during one period GC frequency was extremely high and the young generation had shrunk dramatically:

[Parallel Time: 206.4 ms, GC Workers: 8]
   [GC Worker Start (ms): Min: 3867072696.3, Avg: 3867072696.3, Max: 3867072696.4, Diff: 0.1]
   [Ext Root Scanning (ms): Min: 179.4, Avg: 183.6, Max: 185.9, Diff: 6.5, Sum: 1468.4]
   [Update RS (ms): Min: 2.3, Avg: 2.5, Max: 3.0, Diff: 0.7, Sum: 20.3]
      [Processed Buffers: Min: 56, Avg: 76.2, Max: 100, Diff: 44, Sum: 610]
   [Scan RS (ms): Min: 0.2, Avg: 0.3, Max: 0.4, Diff: 0.2, Sum: 2.2]
   [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
   [Object Copy (ms): Min: 17.6, Avg: 19.7, Max: 24.0, Diff: 6.3, Sum: 157.8]
   [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.5]
      [Termination Attempts: Min: 1, Avg: 100.8, Max: 128, Diff: 127, Sum: 806]
   [GC Worker Other (ms): Min: 0.1, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.8]
   [GC Worker Total (ms): Min: 206.2, Avg: 206.3, Max: 206.4, Diff: 0.2, Sum: 1650.1]
   [GC Worker End (ms): Min: 3867072902.5, Avg: 3867072902.6, Max: 3867072902.6, Diff: 0.1]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.7 ms]
[Other: 7.7 ms]
   [Choose CSet: 0.0 ms]
   [Ref Proc: 5.1 ms]
   [Ref Enq: 0.1 ms]
   [Redirty Cards: 0.1 ms]
   [Humongous Register: 0.1 ms]
   [Humongous Reclaim: 0.0 ms]
   [Free CSet: 1.0 ms]
[Eden: 6120.0M(6120.0M)->0.0B(1356.0M) Survivors: 24.0M->24.0M Heap: 8625.7M(10.0G)->2506.0M(10.0G)]

The log above shows Eden shrinking dramatically: 6120 MB before the pause, but only 1356 MB granted for the next cycle.

With traffic unchanged, a smaller Eden means young GC fires more often, so surviving objects are copied again and again, and the total GC time per unit of time climbs sharply.

The hypothesis sounded reasonable. To verify it, I checked region usage through the JVM monitoring.




The charts revealed the following.

Chart 1, committed space:

The young generation's committed space jittered rapidly --> total young space dropped to a low level --> it stayed low --> it eventually recovered to a high level.

Chart 2, used space:

Young GC behaved normally --> as total space dropped, usable young-generation space stayed pinned at a low level --> the GC count spiked.

Since the young generation was clearly implicated, the first step was to stop Eden from shrinking.


Since the collector was G1GC, we forced a larger minimum young generation by raising -XX:G1NewSizePercent from its default of 5 to 60.
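
For reference, a minimal sketch of the startup flags involved; the exact command line is not given in the original, so treat this as illustrative. Note that on JDK 8 G1NewSizePercent is an experimental flag and must be unlocked first:

-Xms10g -Xmx10g
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:+UnlockExperimentalVMOptions
-XX:G1NewSizePercent=60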


After the change and a restart, both GC frequency and GC pause time dropped sharply.

Overall metrics:

Per-container metrics:

After the change, both the per-collection pause time and the GC frequency fell significantly, roughly to 25% of the previous level. The young GC latency alerts disappeared.


Trouble Returns

After watching every container for an extended period, I found that each one eventually hit a GC spike, and each spike turned out to be a full GC.

Experience says this should not happen: G1GC has a mixed GC mechanism that piggybacks old-generation cleanup onto young collections, so in theory full GC should be rare. What was going on?

The GC logs contained no trace of mixed GC at all. So let's first review what mixed GC is and when it triggers.


Mixed GC Trigger Conditions

  • When heap occupancy reaches the IHOP threshold, -XX:InitiatingHeapOccupancyPercent (default 45%), G1 starts a concurrent marking cycle followed by mixed GCs.

  • When global concurrent marking finishes, if the estimated reclaimable garbage exceeds -XX:G1HeapWastePercent (default 5%) of the heap, multiple rounds of mixed GC are run.

  • When System.gc() is called with -XX:+ExplicitGCInvokesConcurrent enabled.

The answer was now obvious. The default trigger threshold is 45% (before 8b12 it was measured against total heap usage, afterwards against old-generation occupancy), but our young generation now took at least 60% of the heap. The old generation could therefore never exceed 40%, so mixed GC could never trigger.


So we adjusted G1NewSizePercent and InitiatingHeapOccupancyPercent and kept observing.
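
The article does not record the exact values chosen, so the retune below is an illustrative assumption only, with numbers picked so that the old generation can actually reach the IHOP threshold before a full GC becomes necessary:

-XX:+UnlockExperimentalVMOptions
-XX:G1NewSizePercent=30
-XX:InitiatingHeapOccupancyPercent=35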



After this change full GC no longer appeared, and the GC cycle stretched into a steady, gradual climb. But peak GC pauses still reached as high as 1 s.

We went back to the GC logs to keep optimizing.


Going Further

A closer look at the GC logs turned up a large number of entries like this one:

144392.185: [G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: occupancy higher than threshold, occupancy: 9537847296 bytes, allocation request: 2097168 bytes, threshold: 4831838190 bytes (45.00 %), source: concurrent humongous allocation]


In G1, an object larger than 50% of a region is classified as humongous. An object that is large enough occupies several contiguous humongous regions, and those regions count toward old-generation space. So we tried increasing -XX:G1HeapRegionSize.
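
The threshold arithmetic is worth spelling out. A minimal sketch; the 3 MB payload below is a hypothetical example, not an object from the service:

public class HumongousThresholdDemo {
    public static void main(String[] args) {
        // With -XX:G1HeapRegionSize=4m, anything over 2 MB is humongous;
        // with 8m regions the threshold doubles to 4 MB.
        long regionSize = 4L * 1024 * 1024;          // region size in bytes
        long humongousThreshold = regionSize / 2;    // humongous cutoff: 50% of a region
        System.out.println("humongous above " + humongousThreshold + " bytes");

        // 3 MB: humongous with 4m regions, an ordinary young allocation with 8m regions
        byte[] payload = new byte[3 * 1024 * 1024];
        System.out.println(payload.length);
    }
}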


A normal container (region size = 4 MB):

The debug container (region size = 8 MB):

After the region size change, some formerly humongous objects no longer qualified as humongous and stopped going straight into the old generation, which delayed the point at which the old generation reached the IHOP threshold. The jitter caused by automatic resizing between the young and old generations also became much smoother, and the charts show the young generation's usable space running well above what the smaller regions allowed.

With that, the GC tuning was finished.

But was it really over?

Why did Eden keep shrinking in the first place? With that question in mind, I kept digging.


A Breakthrough

Credit here goes to Zhang Cheng for the tip. He had seen this situation before and suggested printing more detailed logs with -XX:G1LogLevel=finest: the root-scanning phase might be inflated by an oversized StringTable. He had previously hit a similar case in which a serialization problem made the StringTable balloon.

Scanning time! That one remark woke me from the dream.

In a GC cycle the critical phases are root scanning and object copying, and object copying usually dominates. But if any other phase slows down while the pause target stays fixed at 200 ms, the time left for copying young-generation objects necessarily shrinks, and a squeezed copying budget necessarily means a smaller Eden. That gave us a working theory for the shrinking Eden; we will verify it below.

We collected logs with -XX:G1LogLevel=finest.

{Heap before GC invocations=214 (full 0):
 garbage-first heap   total 10485760K, used 5631601K [0x0000000540000000, 0x0000000541001400, 0x00000007c0000000)
  region size 16384K, 192 young (3145728K), 16 survivors (262144K)
 Metaspace       used 147267K, capacity 154132K, committed 154240K, reserved 1183744K
  class space    used 17488K, capacity 18983K, committed 19072K, reserved 1048576K
2024-03-06T19:41:01.013+0800: 83631.062: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 201326592 bytes, new threshold 15 (max 15)
- age   1:   15568080 bytes,   15568080 total
- age   2:    6251328 bytes,   21819408 total
- age   3:    9652016 bytes,   31471424 total
- age   4:    7445392 bytes,   38916816 total
- age   5:    7072600 bytes,   45989416 total
- age   6:    6826088 bytes,   52815504 total
- age   7:    5464128 bytes,   58279632 total
- age   8:    5080296 bytes,   63359928 total
- age   9:    5480008 bytes,   68839936 total
- age  10:    6519056 bytes,   75358992 total
- age  11:    7353016 bytes,   82712008 total
- age  12:    6458120 bytes,   89170128 total
- age  13:    4577144 bytes,   93747272 total
- age  14:    5488152 bytes,   99235424 total
- age  15:    5709096 bytes,  104944520 total
 83631.063: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 10026, predicted base time: 181.10 ms, remaining time: 18.90 ms, target pause time: 200.00 ms]
 83631.063: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 176 regions, survivors: 16 regions, predicted young region time: 16.59 ms]
 83631.063: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 176 regions, survivors: 16 regions, old: 0 regions, predicted pause time: 197.68 ms, target pause time: 200.00 ms], 0.2069120 secs]
   [Parallel Time: 202.5 ms, GC Workers: 8]
      [GC Worker Start (ms):  83631062.9  83631062.9  83631062.9  83631062.9  83631062.9  83631062.9  83631062.9  83631062.9
       Min: 83631062.9, Avg: 83631062.9, Max: 83631062.9, Diff: 0.1]
      [Ext Root Scanning (ms):  173.3  173.1  173.0  172.3  172.6  173.2  173.1  172.8
       Min: 172.3, Avg: 172.9, Max: 173.3, Diff: 1.0, Sum: 1383.4]
         [Thread Roots (ms):  0.1  0.8  0.8  0.8  0.8  0.7  0.7  0.7
          Min: 0.1, Avg: 0.7, Max: 0.8, Diff: 0.7, Sum: 5.4]
         [StringTable Roots (ms):  187.6  189.9  189.8  188.5  190.1  189.7  189.8  189.8
          Min: 187.6, Avg: 189.4, Max: 190.1, Diff: 2.5, Sum: 1515.0]
         [Universe Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [JNI Handles Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [ObjectSynchronizer Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.2  0.0
          Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.2, Sum: 0.2]
         [FlatProfiler Roots (ms):  0.0  0.1  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
         [Management Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [SystemDictionary Roots (ms):  0.0  0.0  0.0  1.4  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.2, Max: 1.4, Diff: 1.4, Sum: 1.4]
         [CLDG Roots (ms):  2.8  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.4, Max: 2.8, Diff: 2.8, Sum: 2.8]
         [JVMTI Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [CodeCache Roots (ms):  2.1  1.8  1.9  1.8  1.6  2.0  1.8  2.0
          Min: 1.6, Avg: 1.9, Max: 2.1, Diff: 0.5, Sum: 14.9]
         [CM RefProcessor Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Wait For Strong CLD (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Weak CLD Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [SATB Filtering (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Update RS (ms):  2.0  1.8  1.9  1.8  1.5  2.0  1.7  1.9
       Min: 1.5, Avg: 1.8, Max: 2.0, Diff: 0.5, Sum: 14.6]
         [Processed Buffers:  36  55  78  88  78  88  96  44
          Min: 36, Avg: 70.4, Max: 96, Diff: 60, Sum: 563]
      [Scan RS (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
       Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]
      [Code Root Scanning (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.1
       Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
      [Object Copy (ms):  26.9  27.3  27.2  28.0  27.9  26.8  27.2  27.3
       Min: 26.8, Avg: 27.3, Max: 28.0, Diff: 1.1, Sum: 218.7]
      [Termination (ms):  0.0  0.0  0.1  0.0  0.1  0.0  0.0  0.0
       Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.3]
         [Termination Attempts:  1  98  101  88  89  67  87  82
          Min: 1, Avg: 76.6, Max: 101, Diff: 100, Sum: 613]
      [GC Worker Other (ms):  0.0  0.0  0.1  0.0  0.2  0.0  0.1  0.0
       Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.5]
      [GC Worker Total (ms):  202.2  202.2  202.3  202.2  202.4  202.2  202.2  202.1
       Min: 202.1, Avg: 202.2, Max: 202.4, Diff: 0.2, Sum: 1617.7]
      [GC Worker End (ms):  83631265.1  83631265.1  83631265.1  83631265.1  83631265.3  83631265.1  83631265.1  83631265.1
       Min: 83631265.1, Avg: 83631265.1, Max: 83631265.3, Diff: 0.2]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.3 ms]
   [Other: 4.1 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 2.6 ms]
      [Ref Enq: 0.1 ms]
      [Redirty Cards: 0.1 ms]
         [Parallel Redirty:  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
         [Redirtied Cards:  2746  1281  1119  473  463  0  0  0
          Min: 0, Avg: 760.2, Max: 2746, Diff: 2746, Sum: 6082]
      [Humongous Register: 0.0 ms]
         [Humongous Total: 0]
         [Humongous Candidate: 0]
      [Humongous Reclaim: 0.0 ms]
         [Humongous Reclaimed: 0]
      [Free CSet: 0.2 ms]
         [Young Free CSet: 0.2 ms]
         [Non-Young Free CSet: 0.0 ms]
   [Eden: 2816.0M(2816.0M)->0.0B(3088.0M) Survivors: 256.0M->256.0M Heap: 5499.6M(10.0G)->2699.5M(10.0G)]
Heap after GC invocations=215 (full 0):
 garbage-first heap   total 10485760K, used 2764320K [0x0000000540000000, 0x0000000541001400, 0x00000007c0000000)
  region size 16384K, 16 young (262144K), 16 survivors (262144K)
 Metaspace       used 147267K, capacity 154132K, committed 154240K, reserved 1183744K
  class space    used 17488K, capacity 18983K, committed 19072K, reserved 1048576K
}
 [Times: user=1.62 sys=0.00, real=0.20 secs]
2024-03-06T19:41:01.221+0800: 83631.270: Total time for which application threads were stopped: 0.2095341 seconds, Stopping threads took: 0.0002805 seconds
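
The G1Ergonomics lines above make the Eden-shrinking mechanism concrete. G1 predicted a base time of 181.10 ms for this pause (dominated by root scanning), leaving only 200.00 - 181.10 = 18.90 ms of the budget for evacuation, so it admitted just 176 Eden regions (176 x 16 MB = 2816 MB) at a predicted copying cost of 16.59 ms. The slower the root scan, the smaller the Eden that fits inside the pause target.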


In this log, StringTable root scanning took nearly 200 ms. So what is the StringTable?

It is the JVM's built-in cache for strings; interested readers can research it further, but here is the short version. Many workloads contain large numbers of duplicate strings, and to save space the JVM caches them in a single StringTable. When a string literal is used, or String.intern() is called, the JVM checks whether an equal string already exists in the StringTable. If it does, that cached instance is returned; if not, the string is added to the table. That is the whole string-caching mechanism.
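
A minimal, self-contained illustration of that behavior; nothing below comes from the incident service:

public class InternDemo {
    public static void main(String[] args) {
        String a = "gc";                     // literal: interned automatically
        String b = new String("gc");         // fresh heap instance, not interned
        System.out.println(a == b);          // false: two different objects
        System.out.println(a == b.intern()); // true: intern() returns the cached instance

        // Every *distinct* value passed to intern() adds a StringTable entry,
        // which is exactly how near-random keys make the table balloon.
        String key = java.util.UUID.randomUUID().toString().intern();
        System.out.println(key.length());
    }
}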


Having fingered the StringTable, the first step was to verify it:

  • Pull a container whose Eden had already shrunk out of the load balancer, so production is unaffected.

  • Force a full GC with jmap -histo:live <pid> and observe the effect (see the note below).
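
For the record, a sketch of the jmap variants involved; the pid 12345 is a placeholder:

jmap -histo:live 12345   # forces a full GC first, then prints a histogram of live objects
jmap -histo 12345        # prints the histogram without forcing a collection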


After the full GC, StringTable scanning time did drop dramatically, yet Eden still shrank. That contradicted the expectation. Was the theory wrong after all?

A careful look at the post-full-GC logs showed the real GC cost had indeed collapsed: [Times: user=0.09 sys=0.05, real=0.02 secs]

So why was Eden still 4588.0M(4588.0M)->0.0B(2468.0M)?

2024-03-02T17:45:27.190+0800: 400.411: Application time: 3.1242468 seconds
{Heap before GC invocations=10 (full 0):
 garbage-first heap   total 10485760K, used 2790066K [0x0000000540800000, 0x0000000540c05000, 0x00000007c0800000)
  region size 4096K, 632 young (2588672K), 18 survivors (73728K)
 Metaspace       used 136316K, capacity 142388K, committed 142592K, reserved 1173504K
  class space    used 16630K, capacity 17875K, committed 17920K, reserved 1048576K
2024-03-02T17:45:27.192+0800: 400.413: [Full GC (Heap Inspection Initiated GC) 2024-03-02T17:45:27.192+0800: 400.413: [Class Histogram (before full gc):
 num     #instances         #bytes  class name
----------------------------------------------
   1:       7591768     1216542784  [C
   2:        148811      488969976  [B
   3:        205621      209905680  [I
   4:       4033851      129083232  java.util.HashMap$Node
  .....
  .....

2024-03-02T17:50:33.086+0800: 706.307: Application time: 1.2055260 seconds
{Heap before GC invocations=12 (full 2):
 garbage-first heap   total 10485760K, used 4982437K [0x0000000540800000, 0x0000000540c05000, 0x00000007c0800000)
  region size 4096K, 1147 young (4698112K), 0 survivors (0K)
 Metaspace       used 136951K, capacity 143136K, committed 143360K, reserved 1175552K
  class space    used 16658K, capacity 17919K, committed 17920K, reserved 1048576K
2024-03-02T17:50:33.088+0800: 706.309: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 301989888 bytes, new threshold 15 (max 15)
- age   1:   43347440 bytes,   43347440 total
 706.310: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 11409, predicted base time: 14.25 ms, remaining time: 185.75 ms, target pause time: 200.00 ms]
 706.310: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 1147 regions, survivors: 0 regions, predicted young region time: 183.96 ms]
 706.310: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1147 regions, survivors: 0 regions, old: 0 regions, predicted pause time: 198.21 ms, target pause time: 200.00 ms], 0.0212262 secs]
   [Parallel Time: 16.3 ms, GC Workers: 8]
      [GC Worker Start (ms):  706310.8  706310.8  706310.8  706310.8  706310.8  706310.9  706310.9  706310.9
       Min: 706310.8, Avg: 706310.8, Max: 706310.9, Diff: 0.1]
      [Ext Root Scanning (ms):  1.3  3.1  2.0  1.2  1.2  1.2  1.2  1.2
       Min: 1.2, Avg: 1.5, Max: 3.1, Diff: 1.9, Sum: 12.3]
         [Thread Roots (ms):  0.6  0.1  0.6  0.6  0.6  0.6  0.6  0.6
          Min: 0.1, Avg: 0.5, Max: 0.6, Diff: 0.5, Sum: 4.3]
         [StringTable Roots (ms):  1.8  0.0  0.7  1.9  1.9  1.9  1.8  1.9
          Min: 0.0, Avg: 1.5, Max: 1.9, Diff: 1.9, Sum: 11.8]
         [Universe Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [JNI Handles Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [ObjectSynchronizer Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [FlatProfiler Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.1  0.0
          Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
         [Management Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [SystemDictionary Roots (ms):  0.0  0.0  1.2  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.1, Max: 1.2, Diff: 1.2, Sum: 1.2]
         [CLDG Roots (ms):  0.0  3.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.4, Max: 3.0, Diff: 3.0, Sum: 3.0]
         [JVMTI Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [CodeCache Roots (ms):  1.8  1.2  2.2  1.8  1.8  1.8  1.8  1.8
          Min: 1.2, Avg: 1.8, Max: 2.2, Diff: 1.0, Sum: 14.2]
         [CM RefProcessor Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Wait For Strong CLD (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Weak CLD Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [SATB Filtering (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Update RS (ms):  1.5  1.1  1.5  1.6  1.6  1.5  1.6  1.6
       Min: 1.1, Avg: 1.5, Max: 1.6, Diff: 0.6, Sum: 12.0]
         [Processed Buffers:  171  58  79  117  132  101  91  133
          Min: 58, Avg: 110.2, Max: 171, Diff: 113, Sum: 882]
      [Scan RS (ms):  0.2  0.2  0.2  0.2  0.2  0.2  0.3  0.1
       Min: 0.1, Avg: 0.2, Max: 0.3, Diff: 0.1, Sum: 1.7]
      [Code Root Scanning (ms):  0.0  0.0  0.5  0.0  0.0  0.0  0.0  0.0
       Min: 0.0, Avg: 0.1, Max: 0.5, Diff: 0.5, Sum: 0.5]
      [Object Copy (ms):  12.6  11.3  11.4  12.6  12.6  12.7  12.5  12.6
       Min: 11.3, Avg: 12.3, Max: 12.7, Diff: 1.3, Sum: 98.4]
      [Termination (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
       Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
         [Termination Attempts:  9  11  9  11  12  11  1  13
          Min: 1, Avg: 9.6, Max: 13, Diff: 12, Sum: 77]
      [GC Worker Other (ms):  0.3  0.0  0.1  0.1  0.1  0.1  0.0  0.0
       Min: 0.0, Avg: 0.1, Max: 0.3, Diff: 0.3, Sum: 0.8]
      [GC Worker Total (ms):  16.0  15.7  15.7  15.7  15.7  15.7  15.6  15.6
       Min: 15.6, Avg: 15.7, Max: 16.0, Diff: 0.4, Sum: 125.7]
      [GC Worker End (ms):  706326.8  706326.5  706326.5  706326.5  706326.5  706326.5  706326.5  706326.5
       Min: 706326.5, Avg: 706326.5, Max: 706326.8, Diff: 0.3]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.6 ms]
   [Other: 4.3 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 1.4 ms]
      [Ref Enq: 0.1 ms]
      [Redirty Cards: 0.1 ms]
         [Parallel Redirty:  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
         [Redirtied Cards:  1776  898  719  458  0  0  0  0
          Min: 0, Avg: 481.4, Max: 1776, Diff: 1776, Sum: 3851]
      [Humongous Register: 0.1 ms]
         [Humongous Total: 15]
         [Humongous Candidate: 15]
      [Humongous Reclaim: 0.1 ms]
         [Humongous Reclaimed: 15]
      [Free CSet: 0.8 ms]
         [Young Free CSet: 0.8 ms]
         [Non-Young Free CSet: 0.0 ms]
   [Eden: 4588.0M(4588.0M)->0.0B(2468.0M) Survivors: 0.0B->132.0M Heap: 4865.7M(10.0G)->372.6M(10.0G)]
Heap after GC invocations=13 (full 2):
 garbage-first heap   total 10485760K, used 381544K [0x0000000540800000, 0x0000000540c05000, 0x00000007c0800000)
  region size 4096K, 33 young (135168K), 33 survivors (135168K)
 Metaspace       used 136951K, capacity 143136K, committed 143360K, reserved 1175552K
  class space    used 16658K, capacity 17919K, committed 17920K, reserved 1048576K
}
 [Times: user=0.09 sys=0.05, real=0.02 secs]
2024-03-02T17:50:33.110+0800: 706.331: Total time for which application threads were stopped: 0.0240503 seconds, Stopping threads took: 0.0002914 seconds


Reading further, the real pause time diverged wildly from the predicted time:

706.310: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1147 regions, survivors: 0 regions, old: 0 regions, predicted pause time: 198.21 ms, target pause time: 200.00 ms], 0.0212262 secs]
   [Parallel Time: 16.3 ms, GC Workers: 8]
      [GC Worker Start (ms):  706310.8  706310.8  706310.8  706310.8  706310.8  706310.9  706310.9  706310.9
       Min: 706310.8, Avg: 706310.8, Max: 706310.9, Diff: 0.1]
      [Ext Root Scanning (ms):  1.3  3.1  2.0  1.2  1.2  1.2  1.2  1.2
       Min: 1.2, Avg: 1.5, Max: 3.1, Diff: 1.9, Sum: 12.3]
         [Thread Roots (ms):  0.6  0.1  0.6  0.6  0.6  0.6  0.6  0.6
          Min: 0.1, Avg: 0.5, Max: 0.6, Diff: 0.5, Sum: 4.3]
         [StringTable Roots (ms):  1.8  0.0  0.7  1.9  1.9  1.9  1.8  1.9
 [Times: user=0.09 sys=0.05, real=0.02 secs]
2024-03-02T17:50:33.110+0800: 706.331: Total time for which application threads were stopped: 0.0240503 seconds, Stopping threads took: 0.0002914 seconds


The likely explanation: the predicted time is computed from historical samples, and G1's adaptive sizing had simply not yet caught up with the post-full-GC reality.
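
For context, G1's pause predictions are decaying averages over recent collections, padded by a confidence margin. A sketch of the idea (the shape of the model, not the exact HotSpot code):

predicted_time = decayed_avg + confidence_factor * decayed_stddev

The padding is governed by -XX:G1ConfidencePercent (default 50). Because the sequence decays across many samples, a sudden drop in the real root-scanning cost takes several collections before the prediction, and therefore the Eden sizing, reflects it.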


And indeed: over roughly the next two minutes the predicted time kept shrinking toward the real cost, and the Eden space steadily grew back.

The result:

This confirmed once more that the StringTable was the cause.

So why was the StringTable so large?

To answer that, we first need to know how entries get into it.


StringTable entries come from string literals and from String.intern() calls. Literals are bounded by the code itself and cannot flood the table, which leaves intern(). So how do we find out what is calling it en masse?

  • Arthas: it works by bytecode instrumentation, but String is a core JDK class, so monitoring its intern() calls this way is not feasible.

  • Grep for intern() calls globally: too much framework code calls it under the hood, making this hard to narrow down.

  • Inspect a heap dump: we know from the above that young GC cannot shrink the StringTable while full GC can, so dumping right after a young GC preserves plenty of the data we need (see the command sketch below).
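
A sketch of that dump step; the pid and file path are placeholders. Deliberately omit the live option, since -dump:live would itself force a full GC and clear away the evidence:

jmap -dump:format=b,file=/tmp/after-young-gc.hprof 12345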


Analyzing the String objects in the middle section of the dump, we found huge numbers of equal-length alphanumeric strings. Suspecting these were the culprit, we copied a few and searched the logs: they turned out to be keys in a downstream service's responses.

Combining articles found online with the symptoms above, our preliminary judgment was that serialized, near-random keys were steadily inflating the StringTable.

After local debugging and a dive into the underlying code, we finally found where Jackson calls intern():

package com.fasterxml.jackson.core.util;

public final class InternCache extends ConcurrentHashMap<String, String> {

    public String intern(String input) {
        String result = get(input);
        if (result != null) {
            return result;
        }
        // Bounded cache: once MAX_ENTRIES is reached, the whole cache is dropped
        if (size() >= MAX_ENTRIES) {
            synchronized (lock) {
                if (size() >= MAX_ENTRIES) {
                    clear();
                }
            }
        }
        result = input.intern(); // the intern() call that feeds the StringTable
        put(result, result);
        return result;
    }
}


Tracing up the call stack, Hellobike's SOA middleware parses response payloads through Jackson serialization:

package com.hellobike.soa.rpc.utils;

public class RpcParseUtil {

    public static <T> T parseResult(Serializer serializer, String resultJson, Type returnType) {
        if (resultJson == null) {
            return null;
        }
        if (returnType == void.class || returnType == Void.class || returnType == null) {
            return null;
        }
        return (T) serializer.decode(new StringWrapperByteBuf(resultJson), returnType, null);
    }
}


Why does Jackson do this?

High-concurrency interfaces produce enormous numbers of deserialized objects, and in the common case the keys of those objects come from a small, fixed set. Without key caching, the same key strings would be duplicated over and over in the heap, wasting memory.

As a serialization library, Jackson has to care about performance, so it interns these repeated key strings to deduplicate them and cut the waste.

That completes the root-cause analysis. Two remedies were available:

  • Ask the downstream service to change its response shape so the keys are fixed values.

  • Change the serialization settings for the affected interface to stop caching keys, so random keys never get interned.

In the end, because of differing scenarios and coordination constraints, the two approaches were each adopted in different services.


Tracing the Source

Much of the investigation above leaned on lucky coincidences. Faced with a different case, how should you search systematically?

  • Print more detailed logs with -XX:G1LogLevel=finest to confirm where the GC time is going.

  • If the growth comes from the StringTable, its contents can be obtained in several ways:

  • Use the sun.jvm.hotspot.memory.StringTable class from the JDK's sa-jdi tooling to read the strings the table references (a hedged sketch follows this list).

  • Analyze the memory directly:

  • Run VisualVM's OQL query select u from java.lang.String u where count(referrers(u)) == 0 to list String objects that are still dead after a young GC.

  • Use MAT or VisualVM to examine the reference chains of the similar-looking objects.

  • Look at the neighboring String objects to work out which part of the system they belong to.
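
As promised above, a sketch of the sa-jdi route. This assumes the JDK 8-era Serviceability Agent (sa-jdi.jar on the classpath, attaching to a paused target process); API names such as VM.getVM().getStringTable() and StringTable.stringsDo(...) follow the JDK 8 SA sources but should be checked against your build:

import sun.jvm.hotspot.memory.StringTable;
import sun.jvm.hotspot.oops.Instance;
import sun.jvm.hotspot.runtime.VM;
import sun.jvm.hotspot.tools.Tool;

// Run with: java -cp .:$JAVA_HOME/lib/sa-jdi.jar StringTableDumper <pid>
public class StringTableDumper extends Tool {

    public static void main(String[] args) {
        new StringTableDumper().execute(args);
    }

    @Override
    public void run() {
        StringTable table = VM.getVM().getStringTable();
        // Visit every interned java.lang.String oop in the table
        table.stringsDo(new StringTable.StringVisitor() {
            public void visit(Instance string) {
                System.out.println(string);
            }
        });
    }
}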




The strings were confirmed to be referenced by ItemBean.

Solutions

Once the cause is known, the fix is straightforward.

We had confirmed that in our scenario the keys in the SOA response were near-random values (perhaps not strictly random, just almost never repeating), which left two paths forward, listed after the sample payload below.

The problematic data format:

{  "msg": "ok",  "code": 0,  "data": {    "outputs": {      "resultMap": {        "d32d56e8a1b34457af26d94a02c16cba": 0.32004153617699793,        "5bc9df88acea4da8bc829de13aa3afb2": 0.434173387080695,        "4900533c94bc443ea086de0936bfb59e": 0.5555319629335832,        "575360f1d0544f95b4c430dd8759839a": 0.908782703771497,        "00fba5cbd24749999dd719f9ec338f4b": 0.5919287549198423,        "1b7d5916f0454847a8ab90ce2a49e3a1": 0.5869043763908277,        "22d5a24229374a4b9d04c9bf371e7eb7": 0.09601455380787982,        "5a35beafa5d8492f85e5aa6ca3223129": 0.5386071460222013,        "61488dab533b4ec4ba6ae1ea61802c9d": 0.22098097729212451,        "0aaebddbe3e44677a47f5f4becb6ea51": 0.11469034353064256,        "21d6c73689514d00ae72b5eb7d1a1785": 0.7861965800121863,        "3709b11d98194297b788e6068c14760f": 0.8005589118418589,        "cb1b093134d049f9bebc49e8cc3d791d": 0.22801438927340578,        "6718a455bcf1416fb1839c79254c4d7c": 0.936704202541788,        "1200f2d71e5944fcbd645d7084cfbfe8": 0.43563843328912855,        "6f9ebebbeec247119a3bcf2b7c52b76f": 0.9499259171150704,        "cea737f0c1cb4be0969d692196287b61": 0.6442754643503128,        "a1da06ba41d74faf9c88374bf302e578": 0.46847853036230913,        "b5f1177c0bfa4f67a69d67606665a87b": 0.19847468498415466,        "efe70c3f037e44cb995dd8a6a6aff8f1": 0.6239117074575771      }    },    "code": 0,    "cost": 43,    "seqNo": "1097369524402581504",    "extraOutputs": {      "traceId": "96939a6e0c65b3f4e6ebadcaa5e4c4e8",      "decision_version": 21    },    "projectId": 4,    "version": 21  }}


The two solutions:

  • Change the SOA contract so the near-random values move into the value side of the payload.

  • Disable key caching in the serializer for the problematic interface.

AlgorithmMarket service:

Solved by changing the response format: the near-random keys were moved into a list.

AppProphetOnlineService service:

Solved by changing the underlying SOA serialization so keys are no longer interned.

private SoaClient getNonInternSoaClient(String clientName, String group) {
    ClientConf clientConf = new ClientConf();
    clientConf.setClientName(clientName);
    clientConf.setGroup(group);
    SoaClient client = CustomizedClientCreator.createSoaClient2(clientConf);
    client.start();
    return client;
}

public class CustomizedBlockingGrpcClientInvoker extends GrpcClientInvoker {

    protected final Serializer serializer2;

    public CustomizedBlockingGrpcClientInvoker(ManagedChannel channel, InternalClientInvokerOption option, DynamicConfigHelper helper) {
        super(channel, option, helper);
        // A customized Jackson serializer with field-name interning disabled
        this.serializer2 = new CustomizedJacksonSerializer();
    }

    public SoaResponse doInvoke(SoaRequest request) throws RpcException {
        try {
            return this.invokeInternal(request);
        } catch (RpcException var3) {
            return SoaResponse.buildErrorResponse(var3);
        } catch (Throwable var4) {
            return SoaResponse.buildErrorResponse(new RpcException(RpcResultStatus.UNKNOWN_ERROR, var4));
        }
    }

    private SoaResponse invokeInternal(SoaRequest soaRequest) throws RpcException {
        String requestJson = this.buildRequestJson(soaRequest);
        SoaInvokerRequest request = this.buildSoaRequest(soaRequest, requestJson);
        SoaInvokerServiceGrpc.SoaInvokerServiceBlockingStub soaInvokerServiceStub =
                (SoaInvokerServiceGrpc.SoaInvokerServiceBlockingStub) (new BlockingStubBuilder(this.channel, soaRequest)).build();
        SoaInvokerResponse soaInvokerResponse = soaInvokerServiceStub.call(request);
        checkResponseCode(soaInvokerResponse);
        this.helper.getSoaClientLogger().logForResponse(soaRequest, soaInvokerResponse.getResultJson());
        // Parse the response with the non-interning serializer
        Object response = RpcParseUtil.parseResult(this.serializer2, soaInvokerResponse.getResultJson(), soaRequest.getReturnType());
        return SoaResponse.buildSuccessResponse(response);
    }
}

private static ObjectMapper initObjectMapper() {
    // Disable Jackson's field-name interning so random keys never reach the StringTable
    JsonFactory jsonFactory = new JsonFactory().disable(JsonFactory.Feature.INTERN_FIELD_NAMES);
    SimpleModule module = new SimpleModule();
    module.addSerializer(LocalDate.class, new LocalDateSerializer()).addDeserializer(LocalDate.class, new LocalDateDeserializer());
    module.addSerializer(LocalDateTime.class, new LocalDateTimeSerializer()).addDeserializer(LocalDateTime.class, new LocalDateTimeDeserializer());
    ObjectMapper mapper = (new ObjectMapper(jsonFactory))
            .registerModule(module)
            .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
            .configure(SerializationFeature.WRITE_NULL_MAP_VALUES, false)
            .setSerializationInclusion(JsonInclude.Include.NON_NULL);
    return mapper;
}


Other Pitfalls

fastjson

Before version 1.1.24, fastjson called intern() on every JSON key, caching keys in the string pool. This makes repeated reads very fast and saves both time and space, and JSON keys are usually invariant. What it failed to consider is that a large volume of ever-changing JSON keys puts enormous pressure on the string pool.

fastjson fixed this in 1.1.24 by adding a maximum cache size; once the limit is exceeded, keys are no longer pushed into the string pool.

The 1.1.24 code in com.alibaba.fastjson.parser.SymbolTable#addSymbol(), line 113:

public static final int MAX_SIZE = 1024;

if (size >= MAX_SIZE) {
    return new String(buffer, offset, len);
}

That incident was triggered by roughly 700,000 distinct keys; at several million, the damage would be far more than 30 ms. So be very cautious when reaching for the system-level String#intern mechanism.


ClassLoader

Since Ext Root Scanning time can cause everything described above, and the ClassLoaders we use constantly are also GC roots, can they trigger the same problem?

I built a local test that keeps creating ClassLoaders and loading class files, with GC logging enabled.

The test demo:

MyClassLoader loader = new MyClassLoader(ClassLoadGCTest.class.getClassLoader());
Object targetBiz = loader.loadClass("com.jiaxz.hotload.DiffBizMethod").newInstance();
targetBiz = null;
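
The demo's MyClassLoader is not shown in the original, so here is a minimal sketch of what it could look like; the class-file directory is a placeholder, and the target class must not be visible to the parent loader, otherwise parent delegation would short-circuit the experiment:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Each new MyClassLoader defines the class afresh, so repeated iterations
// keep adding loader and class entries to the JVM's SystemDictionary.
public class MyClassLoader extends ClassLoader {

    public MyClassLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        try {
            Path classFile = Paths.get("/tmp/classes", name.replace('.', '/') + ".class");
            byte[] bytes = Files.readAllBytes(classFile);
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}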


The result:

[Parallel Time: 103.4 ms, GC Workers: 3]
   [GC Worker Start (ms):  48087.4  48087.4  48087.4
    Min: 48087.4, Avg: 48087.4, Max: 48087.4, Diff: 0.0]
   [Ext Root Scanning (ms):  47.0  34.8  47.0
    Min: 34.8, Avg: 43.0, Max: 47.0, Diff: 12.2, Sum: 128.9]
      [Thread Roots (ms):  0.0  0.1  0.1
       Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.2]
      [StringTable Roots (ms):  0.0  0.0  0.1
       Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
      [Universe Roots (ms):  0.0  0.0  0.0
       Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [JNI Handles Roots (ms):  0.0  0.0  0.0
       Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [ObjectSynchronizer Roots (ms):  0.0  0.0  0.2
       Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.2]
      [FlatProfiler Roots (ms):  0.0  0.0  0.0
       Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Management Roots (ms):  0.0  0.0  0.0
       Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [SystemDictionary Roots (ms):  0.0  30.3  0.0
       Min: 0.0, Avg: 10.1, Max: 30.3, Diff: 30.3, Sum: 30.3]
      [CLDG Roots (ms):  0.6  0.0  0.0
       Min: 0.0, Avg: 0.2, Max: 0.6, Diff: 0.6, Sum: 0.6]
      [JVMTI Roots (ms):  0.0  0.0  0.0
       Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [CodeCache Roots (ms):  0.1  0.0  1.2
       Min: 0.0, Avg: 0.4, Max: 1.2, Diff: 1.2, Sum: 1.3]
      [CM RefProcessor Roots (ms):  0.0  0.0  0.0
       Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Wait For Strong CLD (ms):  0.0  0.0  0.2
       Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.2]
      [Weak CLD Roots (ms):  46.5  16.6  46.5
       Min: 16.6, Avg: 36.5, Max: 46.5, Diff: 29.9, Sum: 109.5]
      [SATB Filtering (ms):  0.0  0.0  0.0
       Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]

The SystemDictionary is a key internal JVM data structure that records information about loaded classes. When class loaders keep loading more and more classes that are never unloaded, Ext Root Scanning grows as well, as the SystemDictionary and Weak CLD Roots lines above show.


Locating and Fixing This at Hellobike

In the Hellobike environment, the scenario reproduces whenever a service runs G1GC and its SOA request or response payloads use large numbers of non-repeating values as keys. It can be located and temporarily mitigated as follows.

Temporary diagnosis and mitigation

1. Override the InternCache class

Override InternCache inside your own project. The package path must match exactly, so your copy shadows the class shipped in jackson-core.

package com.fasterxml.jackson.core.util;

import java.util.concurrent.ConcurrentHashMap;

public final class InternCache extends ConcurrentHashMap<String, String> {

    // Kill switch; defaults to false, i.e. interning stays enabled
    private static volatile boolean onOff = false;

    static {
        // Initialize the switch from a system property (-DonOff=true)
        onOff = System.getProperty("onOff") != null;
        System.out.println("InternCache onOff: " + onOff);
    }

    public String intern(String input) {
        // With the switch on, skip interning and return the input as-is
        if (onOff) {
            return input;
        }
        String result = (String) this.get(input);
        ...
    }
}

By overriding Jackson's InternCache this way, a runtime switch controls whether key interning is performed.


2. Canary release

You can either default onOff to true and release a single instance to watch its GC logs, or default it to false, release one machine, and enable the property on a debug container for a canary test.

Changing the environment on the debug machine:

  • Open the startup script: vim init.script

  • Add the JVM property: -DonOff=true

  • Restart via the script: bash init.script docker_start


3. Check the effect

After the canary goes out, check the GC logs to see whether the Ext Root Scanning time has shrunk.

4. Verification

The approach was verified on the AppAlgorithmMarket service.

Debug node GC: [Ext Root Scanning (ms): Min: 1.2, Avg: 1.5, Max: 2.6, Diff: 1.4, Sum: 11.7]

Normal node GC: [Ext Root Scanning (ms): Min: 51.6, Avg: 51.7, Max: 61.2, Diff: 0.6, Sum: 460.0]

Recommendation for a fleet-wide fix

Have the middleware layer support switching the SOA serialization mode dynamically via Apollo, and disable key caching with JsonFactory jsonFactory = new JsonFactory().disable(JsonFactory.Feature.INTERN_FIELD_NAMES).


