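An Analysis of the Differences Between String and StringBuilder

Java provides three commonly used classes for representing strings: String, StringBuilder, and StringBuffer. This article briefly analyzes how they differ.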

1. The final keyword

Before comparing String and StringBuilder, let's first review how the final keyword is used.

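Applying final to classes and methods is generally done for the sake of code readability. Its concrete effect is that a final class cannot be subclassed and a final method cannot be overridden, but it brings no gain in efficiency [3].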

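Applying final to variables can improve execution efficiency to some degree [3] (this is tied to compilation: for example, a final variable initialized with a constant expression can be inlined by the compiler). In addition, a final field is guaranteed to be visible to other threads once the constructor has finished, provided the object reference does not escape during construction. Marking variables final wherever possible is therefore a good habit.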

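The concrete effects of applying final to a variable are as follows:

  • For a variable of a primitive type, the value cannot change after initialization;
  • For a variable of a reference type, the variable cannot be pointed at a different object after initialization, but the contents of the object it points to can still change.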
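
The two cases above can be illustrated with a minimal sketch. The class name FinalVariableDemo and its method are hypothetical names chosen for this example; the commented-out lines show the assignments the compiler would reject.

```java
import java.util.ArrayList;
import java.util.List;

class FinalVariableDemo {
    // Fills a list through a final reference and returns it.
    static List<String> fillThroughFinalReference() {
        final int limit = 3;                 // primitive: value is fixed after initialization
        // limit = 4;                        // would not compile: limit is final

        final List<String> items = new ArrayList<>();
        for (int i = 0; i < limit; i++) {
            items.add("item" + i);           // allowed: the object's contents may change
        }
        // items = new ArrayList<>();        // would not compile: items cannot be re-pointed
        return items;
    }

    public static void main(String[] args) {
        System.out.println(fillThroughFinalReference()); // prints [item0, item1, item2]
    }
}
```

The list grows even though the variable is final, because final constrains the reference, not the object behind it.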

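2. The String class

String, StringBuilder, and StringBuffer are the three string-related classes that Java provides. A String is effectively an immutable string constant. Of the two mutable string classes, StringBuffer is thread-safe and StringBuilder is not.

In general, if a string object needs frequent insertions and deletions of characters, the three rank as follows in efficiency: StringBuilder > StringBuffer > String.

This difference in efficiency comes mainly from how the three classes are implemented.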
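
As a rough sketch of the three styles being compared, the hypothetical class AppendStylesDemo below builds the same string in three ways. It makes no timing claims; it only shows that the results are identical while the amount of object allocation differs.

```java
class AppendStylesDemo {
    // Each += creates a new String object, because String is immutable.
    static String withString(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i;
        }
        return s;
    }

    // Appends into one growable buffer, without synchronization.
    static String withStringBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    // Same as above, but every append is synchronized.
    static String withStringBuffer(int n) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withString(5));
        System.out.println(withStringBuilder(5));
        System.out.println(withStringBuffer(5));
    }
}
```

To measure the actual difference on a given JVM, a benchmarking harness such as JMH would be more reliable than timing these loops by hand.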

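First, the String class. It stores the characters of the string in a final byte array (byte[] since JDK 9; earlier versions used char[]). A final field cannot be pointed at a different array once it has been initialized, the length of a Java array is fixed when it is created, and String never modifies the array's contents after construction. Operations that may change the length of the string, such as concat and replace, therefore have to build a new byte[] array and wrap it in a new String object. When such updates are frequent, they leave behind many unreferenced objects that the GC has to reclaim, which lowers efficiency.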

Part of the source code is shown below:

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {

    /**
     * The value is used for character storage.
     *
     * @implNote This field is trusted by the VM, and is a subject to
     * constant folding if String instance is constant. Overwriting this
     * field after construction will cause problems.
     *
     * Additionally, it is marked with {@link Stable} to trust the contents
     * of the array. No other facility in JDK provides this functionality (yet).
     * {@link Stable} is safe here, because value is never null.
     */
    @Stable
    private final byte[] value;

    /**
     * Returns the length of this string.
     * The length is equal to the number of <a href="Character.html#unicode">Unicode
     * code units</a> in the string.
     *
     * @return the length of the sequence of characters represented by this
     * object.
     */
    public int length() {
        return value.length >> coder();
    }

    /**
     * Concatenates the specified string to the end of this string.
     * <p>
     * If the length of the argument string is {@code 0}, then this
     * {@code String} object is returned. Otherwise, a
     * {@code String} object is returned that represents a character
     * sequence that is the concatenation of the character sequence
     * represented by this {@code String} object and the character
     * sequence represented by the argument string.<p>
     * Examples:
     * <blockquote><pre>
     * "cares".concat("s") returns "caress"
     * "to".concat("get").concat("her") returns "together"
     * </pre></blockquote>
     *
     * @param str the {@code String} that is concatenated to the end
     * of this {@code String}.
     * @return a string that represents the concatenation of this object's
     * characters followed by the string argument's characters.
     */
    public String concat(String str) {
        if (str.isEmpty()) {
            return this;
        }
        if (coder() == str.coder()) {
            byte[] val = this.value;
            byte[] oval = str.value;
            int len = val.length + oval.length;
            byte[] buf = Arrays.copyOf(val, len);
            System.arraycopy(oval, 0, buf, val.length, oval.length);
            return new String(buf, coder);
        }
        int len = length();
        int olen = str.length();
        byte[] buf = StringUTF16.newBytesFor(len + olen);
        getBytes(buf, 0, UTF16);
        str.getBytes(buf, len, UTF16);
        return new String(buf, UTF16); // returns a new String backed by a new array
    }

    /**
     * Replaces each substring of this string that matches the literal target
     * sequence with the specified literal replacement sequence. The
     * replacement proceeds from the beginning of the string to the end, for
     * example, replacing "aa" with "b" in the string "aaa" will result in
     * "ba" rather than "ab".
     *
     * @param target The sequence of char values to be replaced
     * @param replacement The replacement sequence of char values
     * @return The resulting string
     * @since 1.5
     */
    public String replace(CharSequence target, CharSequence replacement) {
        String tgtStr = target.toString();
        String replStr = replacement.toString();
        int j = indexOf(tgtStr);
        if (j < 0) {
            return this;
        }
        int tgtLen = tgtStr.length();
        int tgtLen1 = Math.max(tgtLen, 1);
        int thisLen = length();

        int newLenHint = thisLen - tgtLen + replStr.length();
        if (newLenHint < 0) {
            throw new OutOfMemoryError();
        }
        StringBuilder sb = new StringBuilder(newLenHint); // uses a StringBuilder internally
        int i = 0;
        do {
            sb.append(this, i, j).append(replStr);
            i = j + tgtLen;
        } while (j < thisLen && (j = indexOf(tgtStr, j + tgtLen1)) > 0);
        return sb.append(this, i, thisLen).toString();
    }
}
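
The behavior of concat in the source above can be observed from the outside with a small sketch. StringImmutabilityDemo and its method names are hypothetical and exist only for this illustration.

```java
class StringImmutabilityDemo {
    // concat with a non-empty argument yields a new object and leaves the original untouched.
    static boolean concatCreatesNewObject() {
        String original = "care";
        String longer = original.concat("s");
        return longer != original
                && original.equals("care")
                && longer.equals("cares");
    }

    // concat with an empty argument returns this, as the first branch of the source shows.
    static boolean emptyConcatReturnsSameObject() {
        String original = "care";
        return original.concat("") == original;
    }

    public static void main(String[] args) {
        System.out.println(concatCreatesNewObject());       // prints true
        System.out.println(emptyConcatReturnsSameObject()); // prints true
    }
}
```

The reference comparison with == is deliberate here: it checks object identity, which is exactly what distinguishes "a new String was created" from "the same String was returned".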

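3. The StringBuilder class

StringBuilder extends the AbstractStringBuilder class. Reading the code of AbstractStringBuilder shows that it stores the string in a non-final byte[]. When appending requires more capacity, it calls Arrays.copyOf(oldStr, newLength), which in turn calls the lower-level System.arraycopy() function, an intrinsic function [4] that is usually more efficient.

The core code of AbstractStringBuilder is as follows: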

abstract class AbstractStringBuilder implements Appendable, CharSequence {
    /**
     * The value is used for character storage.
     */
    byte[] value;

    /**
     * The id of the encoding used to encode the bytes in {@code value}.
     * That is, the encoding in use, which determines how many bytes
     * each character occupies.
     */
    byte coder;

    /**
     * The count is the number of characters used.
     */
    int count;

    /**
     * For positive values of {@code minimumCapacity}, this method
     * behaves like {@code ensureCapacity}, however it is never
     * synchronized.
     * If {@code minimumCapacity} is non positive due to numeric
     * overflow, this method throws {@code OutOfMemoryError}.
     */
    private void ensureCapacityInternal(int minimumCapacity) {
        // overflow-conscious code
        // The shift accounts for the encoding: it converts the byte[] length
        // into a character count.
        int oldCapacity = value.length >> coder;
        if (minimumCapacity - oldCapacity > 0) {
            // Call Arrays.copyOf to replace value with a longer byte[] array
            value = Arrays.copyOf(value,
                    newCapacity(minimumCapacity) << coder);
        }
    }

    public AbstractStringBuilder append(String str) {
        if (str == null) {
            return appendNull();
        }
        int len = str.length();
        ensureCapacityInternal(count + len);
        putStringAt(count, str);
        count += len;
        return this;
    }
}
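
The growth step in ensureCapacityInternal can be observed through the public capacity() method. The sketch below uses a hypothetical class CapacityGrowthDemo; it deliberately does not assume the exact new capacity, since the growth policy is an implementation detail that may differ between JDK versions.

```java
class CapacityGrowthDemo {
    // Returns { capacity before append, capacity after append, length after append }.
    static int[] appendAndTrackCapacity() {
        StringBuilder sb = new StringBuilder(4);  // room for 4 characters
        int before = sb.capacity();
        sb.append("hello world");                 // 11 characters: the backing array must grow
        int after = sb.capacity();
        return new int[] { before, after, sb.length() };
    }

    public static void main(String[] args) {
        int[] r = appendAndTrackCapacity();
        System.out.println("capacity before: " + r[0]);
        System.out.println("capacity after:  " + r[1]);
        System.out.println("length:          " + r[2]);
    }
}
```

When the final size is known in advance, passing it to the StringBuilder constructor avoids these intermediate array copies, which is what String.replace does with newLenHint.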

In StringBuilder, the append method simply calls the append method of AbstractStringBuilder, as the following code shows:

public final class StringBuilder
    extends AbstractStringBuilder
    implements java.io.Serializable, Comparable<StringBuilder>, CharSequence
{

    @Override
    @HotSpotIntrinsicCandidate
    public StringBuilder append(String str) {
        super.append(str);
        return this;
    }
}

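StringBuffer also extends AbstractStringBuilder. Compared with StringBuilder, it guarantees that operations which modify the string are thread-safe. The implementation is simple: when it overrides the mutating methods of the parent class, it marks them with the synchronized keyword, as follows: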

public final class StringBuffer
    extends AbstractStringBuilder
    implements java.io.Serializable, Comparable<StringBuffer>, CharSequence
{
    @Override
    @HotSpotIntrinsicCandidate
    public synchronized StringBuffer append(String str) {
        toStringCache = null;
        super.append(str); // calls the append method of AbstractStringBuilder
        return this;
    }
}
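
The effect of the synchronized append can be sketched by having several threads append to one shared StringBuffer. The class SharedBufferDemo and its method are hypothetical names for this example. Because every append holds the buffer's lock, no append is lost and the final length equals the total number of appends.

```java
class SharedBufferDemo {
    // Starts the given number of threads, each appending one character
    // appendsPerThread times to a shared StringBuffer, and returns the final length.
    static int appendFromThreads(int threads, int appendsPerThread) {
        StringBuffer shared = new StringBuffer();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < appendsPerThread; i++) {
                    shared.append("x");   // synchronized: one thread at a time
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();            // wait for every worker to finish
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return shared.length();
    }

    public static void main(String[] args) {
        System.out.println(appendFromThreads(4, 10_000)); // prints 40000
    }
}
```

Replacing StringBuffer with StringBuilder in this sketch removes that guarantee: the result may come up short or an exception may be thrown, which is why StringBuilder should only be used when the builder is confined to a single thread.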

References:

  1. On final variables: https://www.cnblogs.com/dolphin0520/p/3736238.html
  2. What final means in each of its four uses: https://www.educative.io/edpresso/what-is-the-final-keyword-in-java
  3. The effect of the final keyword on performance: https://stackoverflow.com/questions/4279420/does-use-of-final-keyword-in-java-improve-the-performance
  4. What are intrinsic functions? https://stackoverflow.com/questions/2268562/what-are-intrinsics