Java ArrayList 源码学习笔记

栏目: Java · 发布时间: 7年前

内容简介：Java ArrayList 源码学习笔记

世界上最牛的 Java 代码去哪找？当然是 JDK 咯～计划学习一下常见容器的源码。我会把我觉得比较有意思或者玄学的地方更新到这里。

以下 JDK 源码及 Javadoc 均从 java version "1.8.0_131" 版本实现中摘录或翻译 java.util.ArrayList

首先，开头就很有意思，声明了两个空数组：

116 行 - 126 行

/**
 * Shared empty array instance used for empty instances.
 */
private static final Object[] EMPTY_ELEMENTDATA = {};

/**
 * Shared empty array instance used for default sized empty instances. We
 * distinguish this from EMPTY_ELEMENTDATA to know how much to inflate when
 * first element is added.
 */
private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};

有什么差别？注释上告诉我们， EMPTY_ELEMENTDATA 用于空数组实例，而 DEFAULTCAPACITY_EMPTY_ELEMENTDATA 用于默认的空数组实例，他们的区别在于是否能获知首次添加元素时对数组的扩充量。这样看我们也是一头雾水，不如我们看一下他是怎样应用的：

143行-166行

/**
 * Constructs an empty list with the specified initial capacity.
 *
 * @param  initialCapacity  the initial capacity of the list
 * @throws IllegalArgumentException if the specified initial capacity
 *         is negative
 */
public ArrayList(int initialCapacity) {
    if (initialCapacity > 0) {
        this.elementData = new Object[initialCapacity];
    } else if (initialCapacity == 0) {
        this.elementData = EMPTY_ELEMENTDATA;
    } else {
        throw new IllegalArgumentException("Illegal Capacity: "+
                                           initialCapacity);
    }
}

/**
 * Constructs an empty list with an initial capacity of ten.
 */
public ArrayList() {
    this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
}

我们看，当我们使用 new ArrayList() 创建 ArrayList 实例时， elementData 被赋值为 DEFAULTCAPACITY_EMPTY_ELEMENTDATA ，而如果我们这样创建实例 new ArrayList(0) ， elementData 将会被赋值为 EMPTY_ELEMENTDATA 。这看似没什么区别，我们再看一下扩容数组的函数。

222 行 - 228 行

private void ensureCapacityInternal(int minCapacity) {
    if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
    }

    ensureExplicitCapacity(minCapacity);
}

这个函数中对 elementData 的引用进行判断，如果引用是 DEFAULTCAPACITY_EMPTY_ELEMENTDATA 则会在 DEFAULT_CAPACITY（值为10） 与 minCapacity 中选择最大值为扩容后的长度。这里相当于用 elementData 的引用值做一个标记：如果我们使用 new ArrayList() 创建实例，则 ArrayList 在首次添加元素时会将数组扩容至 DEFAULT_CAPACITY 如果我们使用 new ArrayList(0) 创建实例则会按照 newCapacity = oldCapacity + (oldCapacity >> 1); （255行） 规则扩充实例。

那么，为什么这样做。我们想，我们使用空构造器和 new ArrayList(0) 创建实例的应用场景是不一样的，前者是我们无法预估列表中将会有多少元素，后者是我们预估元素个数会很少。因此ArrayList对此做了区别，使用不同的扩容算法。

然而令我惊讶的是，通过引用判断来区别用户行为的这种方式，这是我想不到的，如果是我，我一定会再设置一个标志变量。

238 行 - 244 行

/**
 * The maximum size of array to allocate.
 * Some VMs reserve some header words in an array.
 * Attempts to allocate larger arrays may result in
 * OutOfMemoryError: Requested array size exceeds VM limit
 */
private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

这个变量显然是数组的最大长度，但是为什么要 Integer.MAX_VALUE - 8 呢，注释给了我们答案：一些 VM 会在数组头部储存头数据，试图尝试创建一个比 Integer.MAX_VALUE - 8 大的数组可能会产生 OOM 异常。

246 行 - 262 行

/**
 * Increases the capacity to ensure that it can hold at least the
 * number of elements specified by the minimum capacity argument.
 *
 * @param minCapacity the desired minimum capacity
 */
private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // minCapacity is usually close to size, so this is a win:
    elementData = Arrays.copyOf(elementData, newCapacity);
}

此处是扩容数组的核心代码，其计算新数组的长度采用 oldCapacity + (oldCapacity >> 1) ，新数组大概是原数组长度的 1.5 倍，使用位运算计算速度会比较快。

然而我想说的并不是这个。而是这句话 if (newCapacity - minCapacity < 0) 。在 ArrayList 类代码中，随处可见类似 a-b<0 的判断，那么为什么不用 a<b 做判断呢？

下面是我写的测试代码：

int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
int newCap = 65421;
System.out.println(newCap > MAX_ARRAY_SIZE);
System.out.println(newCap - MAX_ARRAY_SIZE > 0);

这段代码的输出是 false false ,我们可以看到，两种写法在正常情况下是等效的。但是我们考虑一下如果 newCap 溢出呢？我们令 newCap = Integer.MAX_VALUE + 654321 输出结果就变成了 false true 。这是为什么？

newCap：                  -2147418228
                          10000000000000001111111110001100
MAX_ARRAY_SIZE：          2147483639
                          01111111111111111111111111110111
MAX_ARRAY_SIZE*-1：       -2147483639
                          10000000000000000000000000001001
newCap - MAX_ARRAY_SIZE： 65429
                          00000000000000001111111110010101

我们看， newCap 由于溢出，进位覆盖了符号位，因此 newCap 为负，我们将 newCap - MAX_ARRAY_SIZE 看成 newCap + MAX_ARRAY_SIZE*-1 ，加和时两符号位相加又变成了 0（两个大负数相加又一次溢出），结果为正，而这次溢出恰恰是我们想要的，得到了正确的结果。

649 行 - 675 行

/**
 * The number of times this list has been <i>structurally modified</i>.
 * Structural modifications are those that change the size of the
 * list, or otherwise perturb it in such a fashion that iterations in
 * progress may yield incorrect results.
 *
 * <p>This field is used by the iterator and list iterator implementation
 * returned by the {@code iterator} and {@code listIterator} methods.
 * If the value of this field changes unexpectedly, the iterator (or list
 * iterator) will throw a {@code ConcurrentModificationException} in
 * response to the {@code next}, {@code remove}, {@code previous},
 * {@code set} or {@code add} operations.  This provides
 * <i>fail-fast</i> behavior, rather than non-deterministic behavior in
 * the face of concurrent modification during iteration.
 *
 * <p><b>Use of this field by subclasses is optional.</b> If a subclass
 * wishes to provide fail-fast iterators (and list iterators), then it
 * merely has to increment this field in its {@code add(int, E)} and
 * {@code remove(int)} methods (and any other methods that it overrides
 * that result in structural modifications to the list).  A single call to
 * {@code add(int, E)} or {@code remove(int)} must add no more than
 * one to this field, or the iterators (and list iterators) will throw
 * bogus {@code ConcurrentModificationExceptions}.  If an implementation
 * does not wish to provide fail-fast iterators, this field may be
 * ignored.
 */
protected transient int modCount = 0;

这个变量很有意思，它是用来标记数组结构的改变。每一次数组发生结构改变（比如说增与删）这个变量都会自增。当 List 进行遍历的时候，遍历的前后会检查这个遍历是否被改变，如果有改变则将抛出异常。这种设计体现了 fail-fast 思想，这是一种编程哲学。尽可能的抛出异常而不是武断的处理一个可能会造成问题的异常。这种思想在很多应用场景的开发（尤其是多线程高并发）都起着重要的指导作用。ErLang 就很提倡这种做法。

后记

看完 ArrayList 的源码，我心里有这样的感受：严谨，高效，还有很多我之前不知道的操作。自己的代码和大牛的代码差距还是很大的。看完这些我也不知道我能吸收多少……慢慢来吧。

以上所述就是小编给大家介绍的《Java ArrayList 源码学习笔记》，希望对大家有所帮助，如果大家有任何疑问请给我留言，小编会及时回复大家的。在此也非常感谢大家对码农网的支持！

查看所有标签

猜你喜欢:

本站部分资源来源于网络，本站转载出于传递更多信息之目的，版权归原作者或者来源机构所有，如转载稿涉及版权问题，请联系我们。

码农书籍

嵌入式Linux应用开发完全手册

韦东山主编 / 人民邮电出版社 / 2008-8 / 69.00元

本书全部实例代码及相关工具。基于ARM 9+Linux 206平台，从基础讲起，引导读者快速入门，实例丰富，可直接应用于工程实践。本书全面介绍了嵌入式Linux系统开发过程中，从底层系统支持到上层GUI应用的方方面面，内容涵盖Linux操作系统的安装及相关工具的使用、配置，嵌入式编程所需要的基础知识（交叉编译工具的选项设置、Makefile语法、ARM汇编指令等），硬件部件的使用及......一起来看看《嵌入式Linux应用开发完全手册》这本书的介绍吧!

码农工具

Java ArrayList 源码学习笔记

后记

嵌入式Linux应用开发完全手册

JSON 在线解析

XML 在线格式化

HSV CMYK 转换工具