A Deep Dive into G1GC: Analyzing Why Young GC Pause Times Keep Growing


Strange Symptoms

Have you ever seen an upstream caller report timeouts even though your service sets thread timeouts internally?

Have you ever seen a downstream service respond quickly, with no time-consuming logic in your own follow-up methods, yet the overall execution time come out extremely long?

Have you ever seen GC pause times grow in a regular pattern that bears no relation to traffic?

Have you ever seen the Eden space suddenly shrink and stay small for a long time?



Background

It was a quiet afternoon when a sudden GC latency alert interrupted this engineer's focused coding. Opening the GC dashboard showed that although only a single container had alerted, most GC pauses had already climbed to nearly 1 s.

After quickly pulling the alerting machine out of the load balancer, I started analyzing the GC behavior:

  • No full GC occurred

  • Young GC frequency and pause time had both risen sharply

  • Traffic showed no significant fluctuation

  • The collector was G1GC, with a maximum pause target of 200 ms


Treating the Symptom

Since only young GC frequency and pause time had gone up, surely this was an easy win: enlarge the young generation.

A larger young generation naturally lowers GC frequency and raises overall throughput.

Entering the container to check the heap settings, I found the heap was set to 10 GB, with no Eden size configured manually.


By default G1 does not pin the young generation to a fixed ratio; it floats between 5% (-XX:G1NewSizePercent) and 60% (-XX:G1MaxNewSizePercent) of the heap. In theory, with traffic flat, GC should not suddenly become this frequent.


Examining the GC Logs

The logs showed that during one period GC frequency was extremely high and the young generation had shrunk dramatically:

[Parallel Time: 206.4 ms, GC Workers: 8]
   [GC Worker Start (ms): Min: 3867072696.3, Avg: 3867072696.3, Max: 3867072696.4, Diff: 0.1]
   [Ext Root Scanning (ms): Min: 179.4, Avg: 183.6, Max: 185.9, Diff: 6.5, Sum: 1468.4]
   [Update RS (ms): Min: 2.3, Avg: 2.5, Max: 3.0, Diff: 0.7, Sum: 20.3]
      [Processed Buffers: Min: 56, Avg: 76.2, Max: 100, Diff: 44, Sum: 610]
   [Scan RS (ms): Min: 0.2, Avg: 0.3, Max: 0.4, Diff: 0.2, Sum: 2.2]
   [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
   [Object Copy (ms): Min: 17.6, Avg: 19.7, Max: 24.0, Diff: 6.3, Sum: 157.8]
   [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.5]
      [Termination Attempts: Min: 1, Avg: 100.8, Max: 128, Diff: 127, Sum: 806]
   [GC Worker Other (ms): Min: 0.1, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.8]
   [GC Worker Total (ms): Min: 206.2, Avg: 206.3, Max: 206.4, Diff: 0.2, Sum: 1650.1]
   [GC Worker End (ms): Min: 3867072902.5, Avg: 3867072902.6, Max: 3867072902.6, Diff: 0.1]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.7 ms]
[Other: 7.7 ms]
   [Choose CSet: 0.0 ms]
   [Ref Proc: 5.1 ms]
   [Ref Enq: 0.1 ms]
   [Redirty Cards: 0.1 ms]
   [Humongous Register: 0.1 ms]
   [Humongous Reclaim: 0.0 ms]
   [Free CSet: 1.0 ms]
[Eden: 6120.0M(6120.0M)->0.0B(1356.0M) Survivors: 24.0M->24.0M Heap: 8625.7M(10.0G)->2506.0M(10.0G)]

The log above shows Eden shrinking dramatically: 6120 MB before the pause, but only 1356 MB granted for the next cycle.

With traffic unchanged, a smaller Eden means young GC fires more often, so surviving objects are copied again and again, and the total GC time per unit of time climbs sharply.

The hypothesis sounded reasonable. To verify it, I checked region usage through the JVM monitoring.




The charts revealed the following.

Chart 1, committed space:

The young generation's committed space jittered rapidly --> total young space dropped to a low level --> it stayed low --> it eventually recovered to a high level.

Chart 2, used space:

Young GC behaved normally --> as total space dropped, usable young-generation space stayed pinned at a low level --> the GC count spiked.

Since the young generation was clearly implicated, the first step was to stop Eden from shrinking.


Since the collector was G1GC, we forced a larger minimum young generation by raising -XX:G1NewSizePercent from its default of 5 to 60.
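
For reference, a minimal sketch of the startup flags involved; the exact command line is not given in the original, so treat this as illustrative. Note that on JDK 8 G1NewSizePercent is an experimental flag and must be unlocked first:

-Xms10g -Xmx10g
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:+UnlockExperimentalVMOptions
-XX:G1NewSizePercent=60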


After the change and a restart, both GC frequency and GC pause time dropped sharply.

Overall metrics:

Per-container metrics:

After the change, both the per-collection pause time and the GC frequency fell significantly, roughly to 25% of the previous level. The young GC latency alerts disappeared.


Trouble Returns

After watching every container for an extended period, I found that each one eventually hit a GC spike, and each spike turned out to be a full GC.

Experience says this should not happen: G1GC has a mixed GC mechanism that piggybacks old-generation cleanup onto young collections, so in theory full GC should be rare. What was going on?

The GC logs contained no trace of mixed GC at all. So let's first review what mixed GC is and when it triggers.


Mixed GC Trigger Conditions

  • When heap occupancy reaches the IHOP threshold, -XX:InitiatingHeapOccupancyPercent (default 45%), G1 starts a concurrent marking cycle followed by mixed GCs.

  • When global concurrent marking finishes, if the estimated reclaimable garbage exceeds -XX:G1HeapWastePercent (default 5%) of the heap, multiple rounds of mixed GC are run.

  • When System.gc() is called with -XX:+ExplicitGCInvokesConcurrent enabled.

The answer was now obvious. The default trigger threshold is 45% (before 8b12 it was measured against total heap usage, afterwards against old-generation occupancy), but our young generation now took at least 60% of the heap. The old generation could therefore never exceed 40%, so mixed GC could never trigger.


So we adjusted G1NewSizePercent and InitiatingHeapOccupancyPercent and kept observing.
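
The article does not record the exact values chosen, so the retune below is an illustrative assumption only, with numbers picked so that the old generation can actually reach the IHOP threshold before a full GC becomes necessary:

-XX:+UnlockExperimentalVMOptions
-XX:G1NewSizePercent=30
-XX:InitiatingHeapOccupancyPercent=35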



After this change full GC no longer appeared, and the GC cycle stretched into a steady, gradual climb. But peak GC pauses still reached as high as 1 s.

We went back to the GC logs to keep optimizing.


Going Further

A closer look at the GC logs turned up a large number of entries like this one:

144392.185: [G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: occupancy higher than threshold, occupancy: 9537847296 bytes, allocation request: 2097168 bytes, threshold: 4831838190 bytes (45.00 %), source: concurrent humongous allocation]


In G1, an object larger than 50% of a region is classified as humongous. An object that is large enough occupies several contiguous humongous regions, and those regions count toward old-generation space. So we tried increasing -XX:G1HeapRegionSize.
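
The threshold arithmetic is worth spelling out. A minimal sketch; the 3 MB payload below is a hypothetical example, not an object from the service:

public class HumongousThresholdDemo {
    public static void main(String[] args) {
        // With -XX:G1HeapRegionSize=4m, anything over 2 MB is humongous;
        // with 8m regions the threshold doubles to 4 MB.
        long regionSize = 4L * 1024 * 1024;          // region size in bytes
        long humongousThreshold = regionSize / 2;    // humongous cutoff: 50% of a region
        System.out.println("humongous above " + humongousThreshold + " bytes");

        // 3 MB: humongous with 4m regions, an ordinary young allocation with 8m regions
        byte[] payload = new byte[3 * 1024 * 1024];
        System.out.println(payload.length);
    }
}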


A normal container (region size = 4 MB):

The debug container (region size = 8 MB):

After the region size change, some formerly humongous objects no longer qualified as humongous and stopped going straight into the old generation, which delayed the point at which the old generation reached the IHOP threshold. The jitter caused by automatic resizing between the young and old generations also became much smoother, and the charts show the young generation's usable space running well above what the smaller regions allowed.

With that, the GC tuning was finished.

But was it really over?

Why did Eden keep shrinking in the first place? With that question in mind, I kept digging.


A Breakthrough

Credit here goes to Zhang Cheng for the tip. He had seen this situation before and suggested printing more detailed logs with -XX:G1LogLevel=finest: the root-scanning phase might be inflated by an oversized StringTable. He had previously hit a similar case in which a serialization problem made the StringTable balloon.

Scanning time! That one remark woke me from the dream.

In a GC cycle the critical phases are root scanning and object copying, and object copying usually dominates. But if any other phase slows down while the pause target stays fixed at 200 ms, the time left for copying young-generation objects necessarily shrinks, and a squeezed copying budget necessarily means a smaller Eden. That gave us a working theory for the shrinking Eden; we will verify it below.

We collected logs with -XX:G1LogLevel=finest.

{Heap before GC invocations=214 (full 0):
 garbage-first heap   total 10485760K, used 5631601K [0x0000000540000000, 0x0000000541001400, 0x00000007c0000000)
  region size 16384K, 192 young (3145728K), 16 survivors (262144K)
 Metaspace       used 147267K, capacity 154132K, committed 154240K, reserved 1183744K
  class space    used 17488K, capacity 18983K, committed 19072K, reserved 1048576K
2024-03-06T19:41:01.013+0800: 83631.062: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 201326592 bytes, new threshold 15 (max 15)
- age   1:   15568080 bytes,   15568080 total
- age   2:    6251328 bytes,   21819408 total
- age   3:    9652016 bytes,   31471424 total
- age   4:    7445392 bytes,   38916816 total
- age   5:    7072600 bytes,   45989416 total
- age   6:    6826088 bytes,   52815504 total
- age   7:    5464128 bytes,   58279632 total
- age   8:    5080296 bytes,   63359928 total
- age   9:    5480008 bytes,   68839936 total
- age  10:    6519056 bytes,   75358992 total
- age  11:    7353016 bytes,   82712008 total
- age  12:    6458120 bytes,   89170128 total
- age  13:    4577144 bytes,   93747272 total
- age  14:    5488152 bytes,   99235424 total
- age  15:    5709096 bytes,  104944520 total
 83631.063: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 10026, predicted base time: 181.10 ms, remaining time: 18.90 ms, target pause time: 200.00 ms]
 83631.063: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 176 regions, survivors: 16 regions, predicted young region time: 16.59 ms]
 83631.063: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 176 regions, survivors: 16 regions, old: 0 regions, predicted pause time: 197.68 ms, target pause time: 200.00 ms], 0.2069120 secs]
   [Parallel Time: 202.5 ms, GC Workers: 8]
      [GC Worker Start (ms):  83631062.9  83631062.9  83631062.9  83631062.9  83631062.9  83631062.9  83631062.9  83631062.9
       Min: 83631062.9, Avg: 83631062.9, Max: 83631062.9, Diff: 0.1]
      [Ext Root Scanning (ms):  173.3  173.1  173.0  172.3  172.6  173.2  173.1  172.8
       Min: 172.3, Avg: 172.9, Max: 173.3, Diff: 1.0, Sum: 1383.4]
         [Thread Roots (ms):  0.1  0.8  0.8  0.8  0.8  0.7  0.7  0.7
          Min: 0.1, Avg: 0.7, Max: 0.8, Diff: 0.7, Sum: 5.4]
         [StringTable Roots (ms):  187.6  189.9  189.8  188.5  190.1  189.7  189.8  189.8
          Min: 187.6, Avg: 189.4, Max: 190.1, Diff: 2.5, Sum: 1515.0]
         [Universe Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [JNI Handles Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [ObjectSynchronizer Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.2  0.0
          Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.2, Sum: 0.2]
         [FlatProfiler Roots (ms):  0.0  0.1  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
         [Management Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [SystemDictionary Roots (ms):  0.0  0.0  0.0  1.4  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.2, Max: 1.4, Diff: 1.4, Sum: 1.4]
         [CLDG Roots (ms):  2.8  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.4, Max: 2.8, Diff: 2.8, Sum: 2.8]
         [JVMTI Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [CodeCache Roots (ms):  2.1  1.8  1.9  1.8  1.6  2.0  1.8  2.0
          Min: 1.6, Avg: 1.9, Max: 2.1, Diff: 0.5, Sum: 14.9]
         [CM RefProcessor Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Wait For Strong CLD (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Weak CLD Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [SATB Filtering (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Update RS (ms):  2.0  1.8  1.9  1.8  1.5  2.0  1.7  1.9
       Min: 1.5, Avg: 1.8, Max: 2.0, Diff: 0.5, Sum: 14.6]
         [Processed Buffers:  36  55  78  88  78  88  96  44
          Min: 36, Avg: 70.4, Max: 96, Diff: 60, Sum: 563]
      [Scan RS (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
       Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]
      [Code Root Scanning (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.1
       Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
      [Object Copy (ms):  26.9  27.3  27.2  28.0  27.9  26.8  27.2  27.3
       Min: 26.8, Avg: 27.3, Max: 28.0, Diff: 1.1, Sum: 218.7]
      [Termination (ms):  0.0  0.0  0.1  0.0  0.1  0.0  0.0  0.0
       Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.3]
         [Termination Attempts:  1  98  101  88  89  67  87  82
          Min: 1, Avg: 76.6, Max: 101, Diff: 100, Sum: 613]
      [GC Worker Other (ms):  0.0  0.0  0.1  0.0  0.2  0.0  0.1  0.0
       Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.5]
      [GC Worker Total (ms):  202.2  202.2  202.3  202.2  202.4  202.2  202.2  202.1
       Min: 202.1, Avg: 202.2, Max: 202.4, Diff: 0.2, Sum: 1617.7]
      [GC Worker End (ms):  83631265.1  83631265.1  83631265.1  83631265.1  83631265.3  83631265.1  83631265.1  83631265.1
       Min: 83631265.1, Avg: 83631265.1, Max: 83631265.3, Diff: 0.2]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.3 ms]
   [Other: 4.1 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 2.6 ms]
      [Ref Enq: 0.1 ms]
      [Redirty Cards: 0.1 ms]
         [Parallel Redirty:  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
         [Redirtied Cards:  2746  1281  1119  473  463  0  0  0
          Min: 0, Avg: 760.2, Max: 2746, Diff: 2746, Sum: 6082]
      [Humongous Register: 0.0 ms]
         [Humongous Total: 0]
         [Humongous Candidate: 0]
      [Humongous Reclaim: 0.0 ms]
         [Humongous Reclaimed: 0]
      [Free CSet: 0.2 ms]
         [Young Free CSet: 0.2 ms]
         [Non-Young Free CSet: 0.0 ms]
   [Eden: 2816.0M(2816.0M)->0.0B(3088.0M) Survivors: 256.0M->256.0M Heap: 5499.6M(10.0G)->2699.5M(10.0G)]
Heap after GC invocations=215 (full 0):
 garbage-first heap   total 10485760K, used 2764320K [0x0000000540000000, 0x0000000541001400, 0x00000007c0000000)
  region size 16384K, 16 young (262144K), 16 survivors (262144K)
 Metaspace       used 147267K, capacity 154132K, committed 154240K, reserved 1183744K
  class space    used 17488K, capacity 18983K, committed 19072K, reserved 1048576K
}
 [Times: user=1.62 sys=0.00, real=0.20 secs]
2024-03-06T19:41:01.221+0800: 83631.270: Total time for which application threads were stopped: 0.2095341 seconds, Stopping threads took: 0.0002805 seconds
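
The G1Ergonomics lines above make the Eden-shrinking mechanism concrete. G1 predicted a base time of 181.10 ms for this pause (dominated by root scanning), leaving only 200.00 - 181.10 = 18.90 ms of the budget for evacuation, so it admitted just 176 Eden regions (176 x 16 MB = 2816 MB) at a predicted copying cost of 16.59 ms. The slower the root scan, the smaller the Eden that fits inside the pause target.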


In this log, StringTable root scanning took nearly 200 ms. So what is the StringTable?

It is the JVM's built-in cache for strings; interested readers can research it further, but here is the short version. Many workloads contain large numbers of duplicate strings, and to save space the JVM caches them in a single StringTable. When a string literal is used, or String.intern() is called, the JVM checks whether an equal string already exists in the StringTable. If it does, that cached instance is returned; if not, the string is added to the table. That is the whole string-caching mechanism.
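
A minimal, self-contained illustration of that behavior; nothing below comes from the incident service:

public class InternDemo {
    public static void main(String[] args) {
        String a = "gc";                     // literal: interned automatically
        String b = new String("gc");         // fresh heap instance, not interned
        System.out.println(a == b);          // false: two different objects
        System.out.println(a == b.intern()); // true: intern() returns the cached instance

        // Every *distinct* value passed to intern() adds a StringTable entry,
        // which is exactly how near-random keys make the table balloon.
        String key = java.util.UUID.randomUUID().toString().intern();
        System.out.println(key.length());
    }
}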


Having fingered the StringTable, the first step was to verify it:

  • Pull a container whose Eden had already shrunk out of the load balancer, so production is unaffected.

  • Force a full GC with jmap -histo:live <pid> and observe the effect (see the note below).
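
For the record, a sketch of the jmap variants involved; the pid 12345 is a placeholder:

jmap -histo:live 12345   # forces a full GC first, then prints a histogram of live objects
jmap -histo 12345        # prints the histogram without forcing a collection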


After the full GC, StringTable scanning time did drop dramatically, yet Eden still shrank. That contradicted the expectation. Was the theory wrong after all?

A careful look at the post-full-GC logs showed the real GC cost had indeed collapsed: [Times: user=0.09 sys=0.05, real=0.02 secs]

So why was Eden still 4588.0M(4588.0M)->0.0B(2468.0M)?

2024-03-02T17:45:27.190+0800: 400.411: Application time: 3.1242468 seconds
{Heap before GC invocations=10 (full 0):
 garbage-first heap   total 10485760K, used 2790066K [0x0000000540800000, 0x0000000540c05000, 0x00000007c0800000)
  region size 4096K, 632 young (2588672K), 18 survivors (73728K)
 Metaspace       used 136316K, capacity 142388K, committed 142592K, reserved 1173504K
  class space    used 16630K, capacity 17875K, committed 17920K, reserved 1048576K
2024-03-02T17:45:27.192+0800: 400.413: [Full GC (Heap Inspection Initiated GC) 2024-03-02T17:45:27.192+0800: 400.413: [Class Histogram (before full gc):
 num     #instances         #bytes  class name
----------------------------------------------
   1:       7591768     1216542784  [C
   2:        148811      488969976  [B
   3:        205621      209905680  [I
   4:       4033851      129083232  java.util.HashMap$Node
  .....
  .....

2024-03-02T17:50:33.086+0800: 706.307: Application time: 1.2055260 seconds
{Heap before GC invocations=12 (full 2):
 garbage-first heap   total 10485760K, used 4982437K [0x0000000540800000, 0x0000000540c05000, 0x00000007c0800000)
  region size 4096K, 1147 young (4698112K), 0 survivors (0K)
 Metaspace       used 136951K, capacity 143136K, committed 143360K, reserved 1175552K
  class space    used 16658K, capacity 17919K, committed 17920K, reserved 1048576K
2024-03-02T17:50:33.088+0800: 706.309: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 301989888 bytes, new threshold 15 (max 15)
- age   1:   43347440 bytes,   43347440 total
 706.310: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 11409, predicted base time: 14.25 ms, remaining time: 185.75 ms, target pause time: 200.00 ms]
 706.310: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 1147 regions, survivors: 0 regions, predicted young region time: 183.96 ms]
 706.310: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1147 regions, survivors: 0 regions, old: 0 regions, predicted pause time: 198.21 ms, target pause time: 200.00 ms], 0.0212262 secs]
   [Parallel Time: 16.3 ms, GC Workers: 8]
      [GC Worker Start (ms):  706310.8  706310.8  706310.8  706310.8  706310.8  706310.9  706310.9  706310.9
       Min: 706310.8, Avg: 706310.8, Max: 706310.9, Diff: 0.1]
      [Ext Root Scanning (ms):  1.3  3.1  2.0  1.2  1.2  1.2  1.2  1.2
       Min: 1.2, Avg: 1.5, Max: 3.1, Diff: 1.9, Sum: 12.3]
         [Thread Roots (ms):  0.6  0.1  0.6  0.6  0.6  0.6  0.6  0.6
          Min: 0.1, Avg: 0.5, Max: 0.6, Diff: 0.5, Sum: 4.3]
         [StringTable Roots (ms):  1.8  0.0  0.7  1.9  1.9  1.9  1.8  1.9
          Min: 0.0, Avg: 1.5, Max: 1.9, Diff: 1.9, Sum: 11.8]
         [Universe Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [JNI Handles Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [ObjectSynchronizer Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [FlatProfiler Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.1  0.0
          Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
         [Management Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [SystemDictionary Roots (ms):  0.0  0.0  1.2  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.1, Max: 1.2, Diff: 1.2, Sum: 1.2]
         [CLDG Roots (ms):  0.0  3.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.4, Max: 3.0, Diff: 3.0, Sum: 3.0]
         [JVMTI Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [CodeCache Roots (ms):  1.8  1.2  2.2  1.8  1.8  1.8  1.8  1.8
          Min: 1.2, Avg: 1.8, Max: 2.2, Diff: 1.0, Sum: 14.2]
         [CM RefProcessor Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Wait For Strong CLD (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Weak CLD Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [SATB Filtering (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Update RS (ms):  1.5  1.1  1.5  1.6  1.6  1.5  1.6  1.6
       Min: 1.1, Avg: 1.5, Max: 1.6, Diff: 0.6, Sum: 12.0]
         [Processed Buffers:  171  58  79  117  132  101  91  133
          Min: 58, Avg: 110.2, Max: 171, Diff: 113, Sum: 882]
      [Scan RS (ms):  0.2  0.2  0.2  0.2  0.2  0.2  0.3  0.1
       Min: 0.1, Avg: 0.2, Max: 0.3, Diff: 0.1, Sum: 1.7]
      [Code Root Scanning (ms):  0.0  0.0  0.5  0.0  0.0  0.0  0.0  0.0
       Min: 0.0, Avg: 0.1, Max: 0.5, Diff: 0.5, Sum: 0.5]
      [Object Copy (ms):  12.6  11.3  11.4  12.6  12.6  12.7  12.5  12.6
       Min: 11.3, Avg: 12.3, Max: 12.7, Diff: 1.3, Sum: 98.4]
      [Termination (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
       Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
         [Termination Attempts:  9  11  9  11  12  11  1  13
          Min: 1, Avg: 9.6, Max: 13, Diff: 12, Sum: 77]
      [GC Worker Other (ms):  0.3  0.0  0.1  0.1  0.1  0.1  0.0  0.0
       Min: 0.0, Avg: 0.1, Max: 0.3, Diff: 0.3, Sum: 0.8]
      [GC Worker Total (ms):  16.0  15.7  15.7  15.7  15.7  15.7  15.6  15.6
       Min: 15.6, Avg: 15.7, Max: 16.0, Diff: 0.4, Sum: 125.7]
      [GC Worker End (ms):  706326.8  706326.5  706326.5  706326.5  706326.5  706326.5  706326.5  706326.5
       Min: 706326.5, Avg: 706326.5, Max: 706326.8, Diff: 0.3]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.6 ms]
   [Other: 4.3 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 1.4 ms]
      [Ref Enq: 0.1 ms]
      [Redirty Cards: 0.1 ms]
         [Parallel Redirty:  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
         [Redirtied Cards:  1776  898  719  458  0  0  0  0
          Min: 0, Avg: 481.4, Max: 1776, Diff: 1776, Sum: 3851]
      [Humongous Register: 0.1 ms]
         [Humongous Total: 15]
         [Humongous Candidate: 15]
      [Humongous Reclaim: 0.1 ms]
         [Humongous Reclaimed: 15]
      [Free CSet: 0.8 ms]
         [Young Free CSet: 0.8 ms]
         [Non-Young Free CSet: 0.0 ms]
   [Eden: 4588.0M(4588.0M)->0.0B(2468.0M) Survivors: 0.0B->132.0M Heap: 4865.7M(10.0G)->372.6M(10.0G)]
Heap after GC invocations=13 (full 2):
 garbage-first heap   total 10485760K, used 381544K [0x0000000540800000, 0x0000000540c05000, 0x00000007c0800000)
  region size 4096K, 33 young (135168K), 33 survivors (135168K)
 Metaspace       used 136951K, capacity 143136K, committed 143360K, reserved 1175552K
  class space    used 16658K, capacity 17919K, committed 17920K, reserved 1048576K
}
 [Times: user=0.09 sys=0.05, real=0.02 secs]
2024-03-02T17:50:33.110+0800: 706.331: Total time for which application threads were stopped: 0.0240503 seconds, Stopping threads took: 0.0002914 seconds


Reading further, the real pause time diverged wildly from the predicted time:

706.310: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1147 regions, survivors: 0 regions, old: 0 regions, predicted pause time: 198.21 ms, target pause time: 200.00 ms], 0.0212262 secs]
   [Parallel Time: 16.3 ms, GC Workers: 8]
      [GC Worker Start (ms):  706310.8  706310.8  706310.8  706310.8  706310.8  706310.9  706310.9  706310.9
       Min: 706310.8, Avg: 706310.8, Max: 706310.9, Diff: 0.1]
      [Ext Root Scanning (ms):  1.3  3.1  2.0  1.2  1.2  1.2  1.2  1.2
       Min: 1.2, Avg: 1.5, Max: 3.1, Diff: 1.9, Sum: 12.3]
         [Thread Roots (ms):  0.6  0.1  0.6  0.6  0.6  0.6  0.6  0.6
          Min: 0.1, Avg: 0.5, Max: 0.6, Diff: 0.5, Sum: 4.3]
         [StringTable Roots (ms):  1.8  0.0  0.7  1.9  1.9  1.9  1.8  1.9
 [Times: user=0.09 sys=0.05, real=0.02 secs]
2024-03-02T17:50:33.110+0800: 706.331: Total time for which application threads were stopped: 0.0240503 seconds, Stopping threads took: 0.0002914 seconds


The likely explanation: the predicted time is computed from historical samples, and G1's adaptive sizing had simply not yet caught up with the post-full-GC reality.
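
For context, G1's pause predictions are decaying averages over recent collections, padded by a confidence margin. A sketch of the idea (the shape of the model, not the exact HotSpot code):

predicted_time = decayed_avg + confidence_factor * decayed_stddev

The padding is governed by -XX:G1ConfidencePercent (default 50). Because the sequence decays across many samples, a sudden drop in the real root-scanning cost takes several collections before the prediction, and therefore the Eden sizing, reflects it.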


And indeed: over roughly the next two minutes the predicted time kept shrinking toward the real cost, and the Eden space steadily grew back.

The result:

This confirmed once more that the StringTable was the cause.

So why was the StringTable so large?

To answer that, we first need to know how entries get into it.


StringTable entries come from string literals and from String.intern() calls. Literals are bounded by the code itself and cannot flood the table, which leaves intern(). So how do we find out what is calling it en masse?

  • Arthas: it works by bytecode instrumentation, but String is a core JDK class, so monitoring its intern() calls this way is not feasible.

  • Grep for intern() calls globally: too much framework code calls it under the hood, making this hard to narrow down.

  • Inspect a heap dump: we know from the above that young GC cannot shrink the StringTable while full GC can, so dumping right after a young GC preserves plenty of the data we need (see the command sketch below).
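
A sketch of that dump step; the pid and file path are placeholders. Deliberately omit the live option, since -dump:live would itself force a full GC and clear away the evidence:

jmap -dump:format=b,file=/tmp/after-young-gc.hprof 12345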


Analyzing the String objects in the middle section of the dump, we found huge numbers of equal-length alphanumeric strings. Suspecting these were the culprit, we copied a few and searched the logs: they turned out to be keys in a downstream service's responses.

Combining articles found online with the symptoms above, our preliminary judgment was that serialized, near-random keys were steadily inflating the StringTable.

After local debugging and a dive into the underlying code, we finally found where Jackson calls intern():

package com.fasterxml.jackson.core.util;

public final class InternCache extends ConcurrentHashMap<String, String> {

    public String intern(String input) {
        String result = get(input);
        if (result != null) {
            return result;
        }
        // Bounded cache: once MAX_ENTRIES is reached, the whole cache is dropped
        if (size() >= MAX_ENTRIES) {
            synchronized (lock) {
                if (size() >= MAX_ENTRIES) {
                    clear();
                }
            }
        }
        result = input.intern(); // the intern() call that feeds the StringTable
        put(result, result);
        return result;
    }
}


Tracing up the call stack, Hellobike's SOA middleware parses response payloads through Jackson serialization:

package com.hellobike.soa.rpc.utils;

public class RpcParseUtil {

    public static <T> T parseResult(Serializer serializer, String resultJson, Type returnType) {
        if (resultJson == null) {
            return null;
        }
        if (returnType == void.class || returnType == Void.class || returnType == null) {
            return null;
        }
        return (T) serializer.decode(new StringWrapperByteBuf(resultJson), returnType, null);
    }
}


Why does Jackson do this?

High-concurrency interfaces produce enormous numbers of deserialized objects, and in the common case the keys of those objects come from a small, fixed set. Without key caching, the same key strings would be duplicated over and over in the heap, wasting memory.

As a serialization library, Jackson has to care about performance, so it interns these repeated key strings to deduplicate them and cut the waste.

That completes the root-cause analysis. Two remedies were available:

  • Ask the downstream service to change its response shape so the keys are fixed values.

  • Change the serialization settings for the affected interface to stop caching keys, so random keys never get interned.

In the end, because of differing scenarios and coordination constraints, the two approaches were each adopted in different services.


Tracing the Source

Much of the investigation above leaned on lucky coincidences. Faced with a different case, how should you search systematically?

  • Print more detailed logs with -XX:G1LogLevel=finest to confirm where the GC time is going.

  • If the growth comes from the StringTable, its contents can be obtained in several ways:

  • Use the sun.jvm.hotspot.memory.StringTable class from the JDK's sa-jdi tooling to read the strings the table references (a hedged sketch follows this list).

  • Analyze the memory directly:

  • Run VisualVM's OQL query select u from java.lang.String u where count(referrers(u)) == 0 to list String objects that are still dead after a young GC.

  • Use MAT or VisualVM to examine the reference chains of the similar-looking objects.

  • Look at the neighboring String objects to work out which part of the system they belong to.
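
As promised above, a sketch of the sa-jdi route. This assumes the JDK 8-era Serviceability Agent (sa-jdi.jar on the classpath, attaching to a paused target process); API names such as VM.getVM().getStringTable() and StringTable.stringsDo(...) follow the JDK 8 SA sources but should be checked against your build:

import sun.jvm.hotspot.memory.StringTable;
import sun.jvm.hotspot.oops.Instance;
import sun.jvm.hotspot.runtime.VM;
import sun.jvm.hotspot.tools.Tool;

// Run with: java -cp .:$JAVA_HOME/lib/sa-jdi.jar StringTableDumper <pid>
public class StringTableDumper extends Tool {

    public static void main(String[] args) {
        new StringTableDumper().execute(args);
    }

    @Override
    public void run() {
        StringTable table = VM.getVM().getStringTable();
        // Visit every interned java.lang.String oop in the table
        table.stringsDo(new StringTable.StringVisitor() {
            public void visit(Instance string) {
                System.out.println(string);
            }
        });
    }
}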




The strings were confirmed to be referenced by ItemBean.

Solutions

Once the cause is known, the fix is straightforward.

We had confirmed that in our scenario the keys in the SOA response were near-random values (perhaps not strictly random, just almost never repeating), which left two paths forward, listed after the sample payload below.

The problematic data format:

{  "msg": "ok",  "code": 0,  "data": {    "outputs": {      "resultMap": {        "d32d56e8a1b34457af26d94a02c16cba": 0.32004153617699793,        "5bc9df88acea4da8bc829de13aa3afb2": 0.434173387080695,        "4900533c94bc443ea086de0936bfb59e": 0.5555319629335832,        "575360f1d0544f95b4c430dd8759839a": 0.908782703771497,        "00fba5cbd24749999dd719f9ec338f4b": 0.5919287549198423,        "1b7d5916f0454847a8ab90ce2a49e3a1": 0.5869043763908277,        "22d5a24229374a4b9d04c9bf371e7eb7": 0.09601455380787982,        "5a35beafa5d8492f85e5aa6ca3223129": 0.5386071460222013,        "61488dab533b4ec4ba6ae1ea61802c9d": 0.22098097729212451,        "0aaebddbe3e44677a47f5f4becb6ea51": 0.11469034353064256,        "21d6c73689514d00ae72b5eb7d1a1785": 0.7861965800121863,        "3709b11d98194297b788e6068c14760f": 0.8005589118418589,        "cb1b093134d049f9bebc49e8cc3d791d": 0.22801438927340578,        "6718a455bcf1416fb1839c79254c4d7c": 0.936704202541788,        "1200f2d71e5944fcbd645d7084cfbfe8": 0.43563843328912855,        "6f9ebebbeec247119a3bcf2b7c52b76f": 0.9499259171150704,        "cea737f0c1cb4be0969d692196287b61": 0.6442754643503128,        "a1da06ba41d74faf9c88374bf302e578": 0.46847853036230913,        "b5f1177c0bfa4f67a69d67606665a87b": 0.19847468498415466,        "efe70c3f037e44cb995dd8a6a6aff8f1": 0.6239117074575771      }    },    "code": 0,    "cost": 43,    "seqNo": "1097369524402581504",    "extraOutputs": {      "traceId": "96939a6e0c65b3f4e6ebadcaa5e4c4e8",      "decision_version": 21    },    "projectId": 4,    "version": 21  }}


The two solutions:

  • Change the SOA contract so the near-random values move into the value side of the payload.

  • Disable key caching in the serializer for the problematic interface.

AlgorithmMarket service:

Solved by changing the response format: the near-random keys were moved into a list.

AppProphetOnlineService service:

Solved by changing the underlying SOA serialization so keys are no longer interned.

private SoaClient getNonInternSoaClient(String clientName, String group) {
    ClientConf clientConf = new ClientConf();
    clientConf.setClientName(clientName);
    clientConf.setGroup(group);
    SoaClient client = CustomizedClientCreator.createSoaClient2(clientConf);
    client.start();
    return client;
}

public class CustomizedBlockingGrpcClientInvoker extends GrpcClientInvoker {

    protected final Serializer serializer2;

    public CustomizedBlockingGrpcClientInvoker(ManagedChannel channel, InternalClientInvokerOption option, DynamicConfigHelper helper) {
        super(channel, option, helper);
        // A customized Jackson serializer with field-name interning disabled
        this.serializer2 = new CustomizedJacksonSerializer();
    }

    public SoaResponse doInvoke(SoaRequest request) throws RpcException {
        try {
            return this.invokeInternal(request);
        } catch (RpcException var3) {
            return SoaResponse.buildErrorResponse(var3);
        } catch (Throwable var4) {
            return SoaResponse.buildErrorResponse(new RpcException(RpcResultStatus.UNKNOWN_ERROR, var4));
        }
    }

    private SoaResponse invokeInternal(SoaRequest soaRequest) throws RpcException {
        String requestJson = this.buildRequestJson(soaRequest);
        SoaInvokerRequest request = this.buildSoaRequest(soaRequest, requestJson);
        SoaInvokerServiceGrpc.SoaInvokerServiceBlockingStub soaInvokerServiceStub =
                (SoaInvokerServiceGrpc.SoaInvokerServiceBlockingStub) (new BlockingStubBuilder(this.channel, soaRequest)).build();
        SoaInvokerResponse soaInvokerResponse = soaInvokerServiceStub.call(request);
        checkResponseCode(soaInvokerResponse);
        this.helper.getSoaClientLogger().logForResponse(soaRequest, soaInvokerResponse.getResultJson());
        // Parse the response with the non-interning serializer
        Object response = RpcParseUtil.parseResult(this.serializer2, soaInvokerResponse.getResultJson(), soaRequest.getReturnType());
        return SoaResponse.buildSuccessResponse(response);
    }
}

private static ObjectMapper initObjectMapper() {
    // Disable Jackson's field-name interning so random keys never reach the StringTable
    JsonFactory jsonFactory = new JsonFactory().disable(JsonFactory.Feature.INTERN_FIELD_NAMES);
    SimpleModule module = new SimpleModule();
    module.addSerializer(LocalDate.class, new LocalDateSerializer()).addDeserializer(LocalDate.class, new LocalDateDeserializer());
    module.addSerializer(LocalDateTime.class, new LocalDateTimeSerializer()).addDeserializer(LocalDateTime.class, new LocalDateTimeDeserializer());
    ObjectMapper mapper = (new ObjectMapper(jsonFactory))
            .registerModule(module)
            .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
            .configure(SerializationFeature.WRITE_NULL_MAP_VALUES, false)
            .setSerializationInclusion(JsonInclude.Include.NON_NULL);
    return mapper;
}


Other Pitfalls

fastjson

Before version 1.1.24, fastjson called intern() on every JSON key, caching keys in the string pool. This makes repeated reads very fast and saves both time and space, and JSON keys are usually invariant. What it failed to consider is that a large volume of ever-changing JSON keys puts enormous pressure on the string pool.

fastjson fixed this in 1.1.24 by adding a maximum cache size; once the limit is exceeded, keys are no longer pushed into the string pool.

The 1.1.24 code in com.alibaba.fastjson.parser.SymbolTable#addSymbol(), line 113:

public static final int MAX_SIZE = 1024;

if (size >= MAX_SIZE) {
    return new String(buffer, offset, len);
}

That incident was triggered by roughly 700,000 distinct keys; at several million, the damage would be far more than 30 ms. So be very cautious when reaching for the system-level String#intern mechanism.


ClassLoader

Since Ext Root Scanning time can cause everything described above, and the ClassLoaders we use constantly are also GC roots, can they trigger the same problem?

I built a local test that keeps creating ClassLoaders and loading class files, with GC logging enabled.

The test demo:

MyClassLoader loader = new MyClassLoader(ClassLoadGCTest.class.getClassLoader());
Object targetBiz = loader.loadClass("com.jiaxz.hotload.DiffBizMethod").newInstance();
targetBiz = null;
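
The demo's MyClassLoader is not shown in the original, so here is a minimal sketch of what it could look like; the class-file directory is a placeholder, and the target class must not be visible to the parent loader, otherwise parent delegation would short-circuit the experiment:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Each new MyClassLoader defines the class afresh, so repeated iterations
// keep adding loader and class entries to the JVM's SystemDictionary.
public class MyClassLoader extends ClassLoader {

    public MyClassLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        try {
            Path classFile = Paths.get("/tmp/classes", name.replace('.', '/') + ".class");
            byte[] bytes = Files.readAllBytes(classFile);
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}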


The result:

[Parallel Time: 103.4 ms, GC Workers: 3]
   [GC Worker Start (ms):  48087.4  48087.4  48087.4
    Min: 48087.4, Avg: 48087.4, Max: 48087.4, Diff: 0.0]
   [Ext Root Scanning (ms):  47.0  34.8  47.0
    Min: 34.8, Avg: 43.0, Max: 47.0, Diff: 12.2, Sum: 128.9]
      [Thread Roots (ms):  0.0  0.1  0.1
       Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.2]
      [StringTable Roots (ms):  0.0  0.0  0.1
       Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
      [Universe Roots (ms):  0.0  0.0  0.0
       Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [JNI Handles Roots (ms):  0.0  0.0  0.0
       Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [ObjectSynchronizer Roots (ms):  0.0  0.0  0.2
       Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.2]
      [FlatProfiler Roots (ms):  0.0  0.0  0.0
       Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Management Roots (ms):  0.0  0.0  0.0
       Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [SystemDictionary Roots (ms):  0.0  30.3  0.0
       Min: 0.0, Avg: 10.1, Max: 30.3, Diff: 30.3, Sum: 30.3]
      [CLDG Roots (ms):  0.6  0.0  0.0
       Min: 0.0, Avg: 0.2, Max: 0.6, Diff: 0.6, Sum: 0.6]
      [JVMTI Roots (ms):  0.0  0.0  0.0
       Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [CodeCache Roots (ms):  0.1  0.0  1.2
       Min: 0.0, Avg: 0.4, Max: 1.2, Diff: 1.2, Sum: 1.3]
      [CM RefProcessor Roots (ms):  0.0  0.0  0.0
       Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Wait For Strong CLD (ms):  0.0  0.0  0.2
       Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.2]
      [Weak CLD Roots (ms):  46.5  16.6  46.5
       Min: 16.6, Avg: 36.5, Max: 46.5, Diff: 29.9, Sum: 109.5]
      [SATB Filtering (ms):  0.0  0.0  0.0
       Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]

The SystemDictionary is a key internal JVM data structure that records information about loaded classes. When class loaders keep loading more and more classes that are never unloaded, Ext Root Scanning grows as well, as the SystemDictionary and Weak CLD Roots lines above show.


Locating and Fixing This at Hellobike

In the Hellobike environment, the scenario reproduces whenever a service runs G1GC and its SOA request or response payloads use large numbers of non-repeating values as keys. It can be located and temporarily mitigated as follows.

Temporary diagnosis and mitigation

1. Override the InternCache class

Override InternCache inside your own project. The package path must match exactly, so your copy shadows the class shipped in jackson-core.

package com.fasterxml.jackson.core.util;

import java.util.concurrent.ConcurrentHashMap;

public final class InternCache extends ConcurrentHashMap<String, String> {

    // Kill switch; defaults to false, i.e. interning stays enabled
    private static volatile boolean onOff = false;

    static {
        // Initialize the switch from a system property (-DonOff=true)
        onOff = System.getProperty("onOff") != null;
        System.out.println("InternCache onOff: " + onOff);
    }

    public String intern(String input) {
        // With the switch on, skip interning and return the input as-is
        if (onOff) {
            return input;
        }
        String result = (String) this.get(input);
        ...
    }
}

By overriding Jackson's InternCache this way, a runtime switch controls whether key interning is performed.


2. Canary release

You can either default onOff to true and release a single instance to watch its GC logs, or default it to false, release one machine, and enable the property on a debug container for a canary test.

Changing the environment on the debug machine:

  • Open the startup script: vim init.script

  • Add the JVM property: -DonOff=true

  • Restart via the script: bash init.script docker_start


3. Check the effect

After the canary goes out, check the GC logs to see whether the Ext Root Scanning time has shrunk.

4. Verification

The approach was verified on the AppAlgorithmMarket service.

Debug node GC: [Ext Root Scanning (ms): Min: 1.2, Avg: 1.5, Max: 2.6, Diff: 1.4, Sum: 11.7]

Normal node GC: [Ext Root Scanning (ms): Min: 51.6, Avg: 51.7, Max: 61.2, Diff: 0.6, Sum: 460.0]

Recommendation for a fleet-wide fix

Have the middleware layer support switching the SOA serialization mode dynamically via Apollo, and disable key caching with JsonFactory jsonFactory = new JsonFactory().disable(JsonFactory.Feature.INTERN_FIELD_NAMES).


