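0. TL;DR
1. String concatenation with "+"
Some older technical articles claim that using "+" for string concatenation in Java performs poorly. In practice, since JDK 9, concatenating with "+" is faster than using StringBuilder (see the benchmark in section 3.3).
Below is a simple string concatenation method. We use it to compare JDK 8 with JDK 9 and later, both in performance and in the internal details behind it.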
class Demo {
public static String concatIndy(int i) {
return "value " + i;
}
}
2. String concatenation in JDK 8
2.1 Compile and inspect the bytecode
jdk8/bin/javac Demo.java
jdk8/bin/javap -c Demo
class Demo {
Demo();
Code:
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
public static java.lang.String concatIndy(int);
Code:
0: new #2 // class java/lang/StringBuilder
3: dup
4: invokespecial #3 // Method java/lang/StringBuilder."<init>":()V
7: ldc #4 // String value
9: invokevirtual #5 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
12: iload_0
13: invokevirtual #6 // Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
16: invokevirtual #7 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
19: areturn
}
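2.2 Java code after decompilation
public static String concatIndy(int i) {
    // Matches the bytecode above: no-arg constructor, then two append calls
    return new StringBuilder()
            .append("value ")
            .append(i)
            .toString();
}
As the bytecode shows, in JDK 8 using "+" for concatenation outside a loop body is the same as using StringBuilder. Since "+" is more concise, prefer "+" over StringBuilder in that case.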
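The "outside a loop body" qualifier matters on JDK 8. The following is a minimal sketch of the loop case, assuming the StringBuilder-based translation shown above; the class and method names (LoopConcat, withPlus, withBuilder) are made up for illustration.

```java
class LoopConcat {
    // On JDK 8, javac translates each "s + i + ..." in the loop body into a
    // fresh StringBuilder, so the accumulated text is copied on every pass.
    static String withPlus(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s = s + i + ",";
        }
        return s;
    }

    // One StringBuilder shared by all iterations avoids the repeated copies.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i).append(',');
        }
        return sb.toString();
    }
}
```

Both methods return the same text; only the amount of intermediate allocation differs.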
3. String concatenation since JDK 9 (JEP 280)
3.1. Compile with JDK 11 and inspect the bytecode
jdk11/bin/javac Demo.java
jdk11/bin/javap -c Demo
class Demo {
Demo();
Code:
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
public static java.lang.String concatIndy(int);
Code:
0: iload_0
1: invokedynamic #2, 0 // InvokeDynamic #0:makeConcatWithConstants:(I)Ljava/lang/String;
6: areturn
}
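As shown, the bytecode produced by JDK 11 differs from JDK 8: it is no longer based on StringBuilder, but on a method generated dynamically by StringConcatFactory.makeConcatWithConstants. This is faster than StringBuilder because no StringBuilder object is created and one array copy is saved.
Because the array is used only internally, the byte[] is allocated with UNSAFE.allocateUninitializedArray, which is faster. StringConcatFactory.makeConcatWithConstants can also be called directly: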
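import java.lang.invoke.*;

// The wrapper class name is arbitrary; the original snippet omitted it.
class ConcatStrInt {
    static final MethodHandle STR_INT;
    static {
        try {
            STR_INT = StringConcatFactory.makeConcatWithConstants(
                MethodHandles.lookup(),
                "concat_str_int",
                MethodType.methodType(String.class, int.class),
                "value \1" // recipe: \1 (U+0001) marks the position of an argument
            ).dynamicInvoker();
        } catch (Exception e) {
            throw new Error("Bootstrap error", e);
        }
    }

    static String concat_str_int(int value) throws Throwable {
        // invokeExact requires the call-site signature to be exactly (int)String
        return (String) STR_INT.invokeExact(value);
    }
}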
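StringConcatFactory.makeConcatWithConstants is a public API for dynamically generating string concatenation methods. Besides being invoked from compiler-generated bytecode, it can also be called directly. One call to generate a method takes roughly 1 microsecond (one thousandth of a millisecond).
3.2. The code of the method generated by makeConcatWithConstants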
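As a further illustration of calling the factory directly, here is a small sketch with two arguments and one constant. In the recipe string, \1 marks an argument and \2 marks a constant taken from the trailing varargs. The class name RecipeDemo, the method name point, and the recipe itself are assumptions made up for this example.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

class RecipeDemo {
    static final MethodHandle POINT;

    static {
        try {
            POINT = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(),
                    "point", // the name is arbitrary for a direct call
                    MethodType.methodType(String.class, int.class, int.class),
                    "\2(x=\1, y=\1)", // \2 = constant slot, \1 = argument slot
                    "Point"           // fills the single \2 slot
            ).dynamicInvoker();
        } catch (Exception e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Wraps the checked Throwable of invokeExact so callers stay simple.
    static String point(int x, int y) {
        try {
            return (String) POINT.invokeExact(x, y);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Calling RecipeDemo.point(3, 4) yields "Point(x=3, y=4)". The bootstrap cost is paid once in the static initializer; later calls only invoke the generated handle.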
import java.lang.StringConcatHelper;
import static java.lang.StringConcatHelper.mix;
import static java.lang.StringConcatHelper.newArray;
import static java.lang.StringConcatHelper.prepend;
import static java.lang.StringConcatHelper.newString;
public static String invokeStatic(String str, int value) throws Throwable {
long lengthCoder = 0;
lengthCoder = mix(lengthCoder, str);
lengthCoder = mix(lengthCoder, value);
byte[] bytes = newArray(lengthCoder);
lengthCoder = prepend(lengthCoder, bytes, value);
lengthCoder = prepend(lengthCoder, bytes, str);
return newString(bytes, lengthCoder);
}
The relevant parts of StringConcatHelper and String are:
package java.lang;
class StringConcatHelper {
static String newString(byte[] buf, long indexCoder) {
// Use the private, non-copying constructor (unsafe!)
if (indexCoder == LATIN1) {
return new String(buf, String.LATIN1);
} else if (indexCoder == UTF16) {
return new String(buf, String.UTF16);
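} else {
throw new InternalError("Storage is not completely initialized, " + (int) indexCoder + " bytes left");
}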
}
}
public class String {
String(byte[] value, byte coder) {
// non-copying constructor: the array is adopted as-is
this.value = value;
this.coder = coder;
}
}
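The mix methods of StringConcatHelper compute the length and the character coder (the length and coder are packed together into a single long);
a byte[] is allocated according to that length and coder;
the values are then written into the byte[];
finally the String object is constructed from the byte[] without copying.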
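The four steps above can be imitated with public APIs only. The following is a simplified sketch, not the JDK implementation: the bit layout (UTF16_FLAG), the helper names, and the final copying String constructor are assumptions of this example, since the real helpers are package-private.

```java
import java.nio.charset.StandardCharsets;

class MixPrependSketch {
    // Assumed layout for this sketch: low 32 bits = length, bit 32 = UTF16 flag.
    static final long UTF16_FLAG = 1L << 32;

    static boolean isUtf16(long lengthCoder) {
        return (lengthCoder & UTF16_FLAG) != 0;
    }

    // Step 1: add the length of s and record whether it needs UTF16.
    static long mix(long lengthCoder, String s) {
        long result = lengthCoder + s.length();
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return result | UTF16_FLAG;
            }
        }
        return result;
    }

    // Step 3: write s so that it ends at the current index, moving backwards.
    static long prepend(long indexCoder, byte[] buf, String s) {
        int start = (int) indexCoder - s.length();
        boolean utf16 = isUtf16(indexCoder);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (utf16) {
                buf[2 * (start + i)] = (byte) (c >> 8);
                buf[2 * (start + i) + 1] = (byte) c;
            } else {
                buf[start + i] = (byte) c;
            }
        }
        return indexCoder - s.length();
    }

    static String concat(String str, int value) {
        String digits = Integer.toString(value); // the JDK writes digits directly
        long lengthCoder = mix(mix(0L, str), digits);
        // Step 2: one byte per char for LATIN1, two for UTF16.
        byte[] buf = new byte[(int) lengthCoder << (isUtf16(lengthCoder) ? 1 : 0)];
        long indexCoder = prepend(lengthCoder, buf, digits);
        prepend(indexCoder, buf, str);
        // Step 4: the JDK adopts buf without copying; public API has to decode.
        return new String(buf, isUtf16(lengthCoder)
                ? StandardCharsets.UTF_16BE : StandardCharsets.ISO_8859_1);
    }
}
```

The point of the sketch is the ordering: the total size and coder are known before any allocation, so exactly one array is created and it is filled from the end towards the front.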
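For comparison, AbstractStringBuilder may have to inflate its buffer from LATIN1 to UTF16, which copies the array once more:
class AbstractStringBuilder {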
private void inflate() {
if (!isLatin1()) {
return;
}
byte[] buf = StringUTF16.newBytesFor(value.length);
StringLatin1.inflate(value, 0, buf, 0, count);
this.value = buf;
this.coder = UTF16;
}
}
3.3 JMH comparison of "+" concatenation and StringBuilder
Benchmark code
public class ConcatBench {
public static String concatIndy(int i) {
return "value " + i;
}
public static String concatSB(int i) {
return new StringBuilder("value ")
.append(i)
.toString();
}
}
Results on JDK 11 (throughput, higher is better)
Benchmark Mode Cnt Score Error Units
ConcatBench.concatIndy thrpt 5 130.841 ± 1.127 ops/us
ConcatBench.concatSB thrpt 5 117.897 ± 1.437 ops/us
4. StringConcatFactory implementation details and Alibaba's contributions
4.1 Implementation based on the MethodHandles API
package java.lang;
class StringConcatHelper {
static long initialCoder() { ... }
// T: boolean, char, int, long, String
static long mix(long lengthCoder, T value) { ... }
static byte[] newArray(long indexCoder) { ... }
static long prepend(long lengthCoder, byte[] buf, T value) { ... }
static String newString(byte[] buf, long indexCoder) { ... }
}
class StringConcatFactory {
static MethodHandle generateMHInlineCopy(MethodType mt, String[] constants) {
Class<?>[] ptypes = mt.erase().parameterArray();
MethodHandle mh = MethodHandles.dropArgumentsTrusted(newString(), 2, ptypes);
..
mh = filterInPrependers(mh, constants, ptypes);
..
MethodHandle newArrayCombinator = newArray();
mh = MethodHandles.foldArgumentsWithCombiner(mh, 0, newArrayCombinator,
1 // index
);
..
mh = filterAndFoldInMixers(mh, initialLengthCoder, ptypes);
if (objFilters != null) {
mh = MethodHandles.filterArguments(mh, 0, objFilters);
}
return mh;
}
}
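4.2 Problems with the MethodHandle-expression approach
Building the concatenation as a dynamically generated MethodHandle expression runs into problems when there are many arguments: it generates a large number of intermediate adapter classes, and constructing the MethodHandle is itself expensive. In extreme cases the C2 compiler needs up to 2 GB of memory to compile a complex string concatenation: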
https://github.com/openjdk/jdk/pull/18953
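4.3 The improvement contributed by Alibaba (PR 20273)
In July 2024, Alibaba engineer Wen Shaojin (温绍锦) submitted a new approach, "Re-thinking String Concatenation", which generates a hidden class instead of composing MethodHandles: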
package java.lang.invoke;
class StringConcatFactory {
static final class InlineHiddenClassStrategy {
static MethodHandle generate(Lookup lookup, MethodType args, String[] constants) {
byte[] classBytes = ClassFile.of().build(...);
var hiddenClass = lookup.makeHiddenClassDefiner(CLASS_NAME, classBytes, Set.of(), DUMPER)
.defineClass(true, null);
var constructor = lookup.findConstructor(hiddenClass, CONSTRUCTOR_METHOD_TYPE);
var concat = lookup.findVirtual(hiddenClass, METHOD_NAME, concatArgs);
var instance = hiddenClass.cast(constructor.invoke(constants));
return concat.bindTo(instance);
}
}
}
The generated class extends the base class StringConcatBase in java.lang.StringConcatHelper:
package java.lang;
class StringConcatHelper {
static abstract class StringConcatBase {
@Stable
final String[] constants;
final int length;
final byte coder;
StringConcatBase(String[] constants) {
int length = 0;
byte coder = String.LATIN1;
for (String c : constants) {
length += c.length();
coder |= c.coder();
}
this.constants = constants;
this.length = length;
this.coder = coder;
}
}
}
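The idea behind the base class is that everything known about the constant parts is computed once, at construction time, rather than on every concatenation. Below is a minimal sketch of that idea using only public APIs. The class name PrecomputedConstants and its methods are hypothetical, and the coder handling of the real class is left out.

```java
class PrecomputedConstants {
    final String[] constants; // literal pieces: c0 arg0 c1 arg1 ... cN
    final int length;         // total length of all constants, computed once

    PrecomputedConstants(String... constants) {
        int length = 0;
        for (String c : constants) {
            length += c.length();
        }
        this.constants = constants;
        this.length = length;
    }

    // Interleaves the constants with the arguments into one exactly sized buffer.
    String concat(String... args) {
        if (args.length != constants.length - 1) {
            throw new IllegalArgumentException("expected " + (constants.length - 1) + " arguments");
        }
        int total = length; // start from the precomputed constant length
        for (String a : args) {
            total += a.length();
        }
        char[] buf = new char[total];
        int pos = 0;
        for (int i = 0; i < args.length; i++) {
            pos = put(buf, pos, constants[i]);
            pos = put(buf, pos, args[i]);
        }
        put(buf, pos, constants[args.length]);
        return new String(buf);
    }

    private static int put(char[] buf, int pos, String s) {
        s.getChars(0, s.length(), buf, pos);
        return pos + s.length();
    }
}
```

For example, new PrecomputedConstants("x=", ", y=", ".").concat("1", "2") produces "x=1, y=2.", and only the argument lengths have to be summed per call.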
The generated code is equivalent to the following:
import static java.lang.StringConcatHelper.newArrayWithSuffix;
import static java.lang.StringConcatHelper.prepend;
import static java.lang.StringConcatHelper.stringCoder;
import static java.lang.StringConcatHelper.stringSize;
class StringConcat extends java.lang.StringConcatHelper.StringConcatBase {
// super class defines
// String[] constants;
// int length;
// byte coder;
StringConcat(String[] constants) {
super(constants);
}
String concat(int arg0, long arg1, boolean arg2, char arg3, String arg4,
float arg5, double arg6, Object arg7
) {
// Types other than byte/short/int/long/boolean/String require a local variable to store
String str4 = stringOf(arg4);
String str5 = stringOf(arg5);
String str6 = stringOf(arg6);
String str7 = stringOf(arg7);
int coder = coder(this.coder, arg3, str4, str5, str6, str7);
int length = length(this.length, arg0, arg1, arg2, arg3, str4, str5, str6, str7);
String[] constants = this.constants;
byte[] buf = newArrayWithSuffix(constants[paramCount], length, coder);
prepend(length, coder, buf, constants, arg0, arg1, arg2, arg3, str4, str5, str6, str7);
return new String(buf, coder);
}
static int length(int length, int arg0, long arg1, boolean arg2, char arg3,
String arg4, String arg5, String arg6, String arg7) {
return stringSize(stringSize(stringSize(stringSize(stringSize(stringSize(stringSize(stringSize(
length, arg0), arg1), arg2), arg3), arg4), arg5), arg6), arg7);
}
static int coder(int coder, char arg3, String str4, String str5, String str6, String str7) {
return coder | stringCoder(arg3) | str4.coder() | str5.coder() | str6.coder() | str7.coder();
}
static int prepend(int length, int coder, byte[] buf, String[] constants,
int arg0, long arg1, boolean arg2, char arg3,
String str4, String str5, String str6, String str7) {
// StringConcatHelper.prepend
return prepend(prepend(prepend(prepend(
prepend(prepend(prepend(prepend(length,
buf, str7, constants[7]), buf, str6, constants[6]),
buf, str5, constants[5]), buf, str4, constants[4]),
buf, arg3, constants[3]), buf, arg2, constants[2]),
buf, arg1, constants[1]), buf, arg0, constants[0]);
}
}
4.4 PR 20273 significantly improves startup performance
(See the comments on PR 20273 for details.)
5. Alibaba's other contributions to string concatenation
5.1 PR 20253 Optimize StringConcatHelper.simpleConcat
5.2 PR 19730 Reduce object allocation for FloatToDecimal and DoubleToDecimal
This PR eliminates intermediate memory allocations to improve the performance of toString for float/double and of StringBuilder.append(float/double).
-Benchmark Mode Cnt Score Error Units (baseline)
-StringBuilders.appendWithFloat8Latin1 avgt 15 317.144 ± 11.325 ns/op
-StringBuilders.appendWithFloat8Utf16 avgt 15 316.980 ± 17.955 ns/op
-StringBuilders.appendWithDouble8Latin1 avgt 15 440.853 ± 13.067 ns/op
-StringBuilders.appendWithDouble8Utf16 avgt 15 418.896 ± 4.610 ns/op
+Benchmark Mode Cnt Score Error Units (Webrevs 00 4c810154)
+StringBuilders.appendWithFloat8Latin1 avgt 15 168.231 ± 4.749 ns/op +88.51%
+StringBuilders.appendWithFloat8Utf16 avgt 15 213.981 ± 3.274 ns/op +48.13%
+StringBuilders.appendWithDouble8Latin1 avgt 15 241.536 ± 0.993 ns/op +82.52%
+StringBuilders.appendWithDouble8Utf16 avgt 15 284.863 ± 10.381 ns/op +47.05%
-Benchmark (size) Mode Cnt Score Error Units (baseline)
-Integers.toStringBig 500 avgt 15 18.483 ± 2.771 us/op
-Integers.toStringSmall 500 avgt 15 4.435 ± 0.067 us/op
-Integers.toStringTiny 500 avgt 15 2.382 ± 0.063 us/op
+Benchmark (size) Mode Cnt Score Error Units (PR Update 20 c0f42a7c)
+Integers.toStringBig 500 avgt 15 5.392 ± 0.016 us/op (+242.78%)
+Integers.toStringSmall 500 avgt 15 3.201 ± 0.024 us/op (+38.55%)
+Integers.toStringTiny 500 avgt 15 2.141 ± 0.021 us/op (+11.25%)
-Benchmark (size) Mode Cnt Score Error Units (baseline)
-Longs.toStringBig 500 avgt 15 8.336 ± 0.025 us/op
-Longs.toStringSmall 500 avgt 15 4.389 ± 0.018 us/op
+Benchmark (size) Mode Cnt Score Error Units (PR Update 20 c0f42a7c)
+Longs.toStringBig 500 avgt 15 7.706 ± 0.015 us/op (+8.17%)
+Longs.toStringSmall 500 avgt 15 3.094 ± 0.021 us/op (+41.85%)
-Benchmark Mode Cnt Score Error Units (baseline)
-StringBuilders.toStringCharWithInt8 avgt 15 124.316 ± 61.017 ns/op
+Benchmark Mode Cnt Score Error Units (PR Update 20 c0f42a7c)
+StringBuilders.toStringCharWithInt8 avgt 15 44.497 ± 29.741 ns/op (+179.38%)
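PR 14578: optimizes the performance of UUID.toString;
PR 14751: optimizes the performance of String upper/lower case conversion;
PR 15768: optimizes the performance of HexFormat.formatHex;
PR 15776: optimizes the performance of String.format;
PR 19513: optimizes the performance of java.text.Format.
6. Summary
Besides PR 20273, Alibaba has made many other contributions to OpenJDK, including improvements to GC, JIT, Runtime, RAS, and the core class libraries, for example RISC-V architecture support, the Vector API, primitive types, and string-processing performance improvements in a variety of scenarios.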