理解 Java 8 HashMap 实现

2017-03-26

简介

HashMap 原理基于哈希表,并实现了 Map 接口。在 Java 中,HashMap 与 Hashtable 区别在于:

  • HashMap 不是线程安全的,而 Hashtable 是
  • HashMap 允许一个为 null 的 key 和多个为 null 的 values

处理哈希冲突有两种方法,一种是开放地址法(Open Addressing),另一种是分离链表法(Separate Chain)。Java 采用分离链表法的方法。当冲突严重时,即超过常量 TREEIFY_THRESHOLD(默认为 8) ,那么 HashMap 内部就会将链表转化成红黑树;而当元素减少时,即小于常量 UNTREEIFY_THRESHOLD(默认为 6),树又将退化成普通的链表。

HashMap 的容量总是 2 的倍数,并且默认初始容量为 16,最大容量则是 $2^{30}$。如果用户指定初始容量不是 2 的倍数,那么内部就会进行调整。

核心数据结构

HashMap 使用了两种数据结构,一种是链表节点,另一种是树节点。

链表节点定义为:

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    
    public final K getKey();
    public final V getValue();
    public final V setValue(V newValue);
    public final String toString();
    public final int hashCode();
    public final boolean equals(Object o);
}

注意,创建了一个 Node 之后,那么只能更改 Value 而不能更改 Key

红黑树节点定义为:

static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    /**
     * extends LinkedHashMap.Entry,which in trun extends Node<K,V>
     */
    TreeNode<K,V> parent;
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;
    boolean red;

    // Returns root of tree containing this node
    final TreeNode<K,V> root();
    // Ensures that the given root is the first node of its bin
    static <K,V> void moveRootToFront(Node<K,V>[] tab, TreeNode<K,V> root);
    // Finds the node starting at root p with the given hash and key. The kc argument caches comparableClassFor(key) upon first use comparing keys
    final TreeNode<K,V> find(int h, Object k, Class<?> kc);
    // Calls to get tree node containing the given key. Traversing from root
    final TreeNode<K,V> getTreeNode(int h, Object k);
    // Calls to determine the relative order between a and b when they have the same hash code and non-comparable
    static int tieBreakOrder(Object a, Object b);
    // Forms tree of the nodes linked from this node. return root of tree
    final void treeify(Node<K,V>[] tab);
    // Returns a list of non-TreeNodes replacing those linked from this node.
    final Node<K,V> untreeify(HashMap<K,V> map);
    // Tree version of putVal
    final TreeNode<K,V> putTreeVal(HashMap<K,V> map, Node<K,V>[] tab,
                                       int h, K k, V v);
    // Removes the given node from tree
    final void removeTreeNode(HashMap<K,V> map, Node<K,V>[] tab,
                                  boolean movable);
    // Splits nodes in a tree bin into lower and upper tree bins, or untreeifies if not too small. Called only from resize
    final void split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit);
}

哈希表定义为:

transient Node<K,V>[] table;

size 记录了 HashMap key-value 键值对的数量:

transient int size;

核心操作

核心操作为 get 和 put 方法

get

get 方法实现如下:

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

get 方法调用 getNode 找出 key 所在的 Node,如果 Node 不为 null,那么返回 value;否则返回 null。

在调用 getNode,需要先调用 hash 方法先计算 key 的哈希值,hash 方法实现为:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

hash 函数的作用是:

Computes key.hashCode() and spreads (XORs) higher bits of hash to lower

用 XOR 将高位的 bit 扩散到低位的方式称为扰动。扰动之后的哈希值与 table 长度 -1 做与运算(相当于对长度取模)可以求得节点对应的桶。具体可以参看:JDK 源码中 HashMap 的 hash 方法原理是什么?。相比于 Java 7,Java 8只做了一次异或,官方的说法是:

Because many common sets of hashes are already reasonably distributed (so don’t benefit from spreading), and because we use trees to handle large sets of collisions in bins, we just XOR some shifted bits in the cheapest possible way to reduce systematic lossage, as well as to incorporate impact of the highest bits that would otherwise never be used in index calculations because of table bounds.

getNode 实现为:

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

如果 table 不为 null,并且长度大于 0,那么可以得到该 key 所在桶的第一个节点 first,接下来查找逻辑如下:

  1. 判断 first 是否是要找的节点,如果是,那么返回该节点
  2. 否则判断是否存在 next 节点,如果不存在,那么返回 null;否则进入下一步
  3. 判断 first 是否是 TreeNode,如果是,那么调用 TreeNode 的 getTreeNode 方法查找;否则遍历链表查找

getTreeNode 查找过程为:

// in class TreeNode
final TreeNode<K,V> getTreeNode(int h, Object k) {
    return ((parent != null) ? root() : this).find(h, k, null);
    }

getTreeNode 首先找到根节点,然后调用 find 方法查找。

find 方法实现为:

final TreeNode<K,V> find(int h, Object k, Class<?> kc) {
    TreeNode<K,V> p = this;
    do {
        int ph, dir; K pk;
        TreeNode<K,V> pl = p.left, pr = p.right, q;
        if ((ph = p.hash) > h)
            p = pl;
        else if (ph < h)
            p = pr;
        else if ((pk = p.key) == k || (k != null && k.equals(pk)))
            return p;
        else if (pl == null)
            p = pr;
        else if (pr == null)
            p = pl;
        else if ((kc != null ||
                    (kc = comparableClassFor(k)) != null) &&
                    (dir = compareComparables(kc, k, pk)) != 0)
            p = (dir < 0) ? pl : pr;
        else if ((q = pr.find(h, k, kc)) != null)
            return q;
        else
            p = pl;
    } while (p != null);
    return null;
}

红黑树本质上还是一颗二叉查找树,所以 find 查找过程类似于二叉树查找。

put

put 可以新加入一对 key-value,也可以更新已有的 key:

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

实际请求委托给 putVal,putVal 定义为:

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

putVal 逻辑如下:

  1. 如果 table 没有被初始化,那么调用 resize() 初始化
  2. 如果对应桶为 null,那么调用 newNode 新建一个节点
  3. 如果对应桶不为 null,那么判断节点是否是 TreeNode,如果是,那么调用 putTreeVal;否则开始遍历链表查找

resize 方法将对 table 做扩容:

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

resize 逻辑比较复杂,主要分两步,第一步确定新的 capacity 和 threshold;第二步创建新的 table,并把旧 table 的节点移到新的 table 中。下面详细介绍一二步。

第一步:

  1. 如果 oldCap 大于 0,那么判断 oldCap 是否大于 MAXIMUM_CAPACITY,如果是,那么直接返回;否则尝试将 capacity 和 threshold 都扩大两倍
  2. 否则说明 oldCap 为0,那么判断 oldThr 是否大于0,如果是那么 newCap = oldThre
  3. 否则说明 oldThr 和 oldCap 都为 0,这时就用默认值初始化 newCap 和 newThr
  4. 然后判断 newThr 是否为 0,newThr 为 0 的情况有多种,所以在这里不再让 newThr 扩大为 oldThr 的两倍,而是尝试让 newThr = newCap * loadFactor

至此新的 capacity 和 threshold 已经确定了,然后就可以创建一个新的 table,并且将老 table 的节点移到新 table 中(如果老 table 不为 null)。相比于 JDK7,JDK8 中引入了 lo 链和 hi链,不仅保持了原链表顺序,还避免了并发添加死锁问题。具体过程如下:

依次遍历老 table 的每一个桶:

  1. 如果桶的位置只有一个节点,那么将这个节点放入新 table。注意节点在新 table 的位置为 hash & (newCap -1)
  2. 否则判断节点是否是 TreeNode,如果是,那么调用 split 方法
  3. 否则开始循环遍历,准备将原链表拆成 lo 链和 hi 链,其中 lo 链在新 table 的位置与老 table 一样,而 hi 链在新 table 的位置为老 table 位置 + oldCap。节点分到 lo 链还是 hi 链的依据是 hash & oldCap == 0,如果是,那么节点放入 lo 链;否则节点放入 hi 链

split 方法也将一个 tree bin 分裂成 lower tree bin 和 upper tree bin,如果分裂后的 bin 太小,将会把它转成链表:

// in class TreeNode
final void split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit) {
    TreeNode<K,V> b = this;
    // Relink into lo and hi lists, preserving order
    TreeNode<K,V> loHead = null, loTail = null;
    TreeNode<K,V> hiHead = null, hiTail = null;
    int lc = 0, hc = 0;
    for (TreeNode<K,V> e = b, next; e != null; e = next) {
        next = (TreeNode<K,V>)e.next;
        e.next = null;
        if ((e.hash & bit) == 0) {
            if ((e.prev = loTail) == null)
                loHead = e;
            else
                loTail.next = e;
            loTail = e;
            ++lc;
        }
        else {
            if ((e.prev = hiTail) == null)
                hiHead = e;
            else
                hiTail.next = e;
            hiTail = e;
            ++hc;
        }
    }

    if (loHead != null) {
        if (lc <= UNTREEIFY_THRESHOLD)
            tab[index] = loHead.untreeify(map);
        else {
            tab[index] = loHead;
            if (hiHead != null) // (else is already treeified)
                loHead.treeify(tab);
            }
        }
    if (hiHead != null) {
        if (hc <= UNTREEIFY_THRESHOLD)
            tab[index + bit] = hiHead.untreeify(map);
        else {
            tab[index + bit] = hiHead;
            if (loHead != null)
                hiHead.treeify(tab);
        }
    }
}

参考

  1. HashMap
  2. Java 8系列之重新认识HashMap
  3. JDK 源码中 HashMap 的 hash 方法原理是什么?