哈希表 (Hash Table)


哈希表 (Hash Tables)

哈希表是一种通过哈希函数将键映射到值的数据结构,提供近似O(1)的查找、插入和删除操作。

基础概念

哈希表原理

键值对存储:
key1 ----→ hash(key1) = 4 ----→ value1

key2 ----→ hash(key2) = 2 ----→ value2

key3 ----→ hash(key3) = 4 ----→ value3 (链式解决冲突)

哈希函数

常见的哈希函数设计:

1. 除法哈希:
hash(key) = key % table_size

2. 乘法哈希:
hash(key) = floor(table_size * (key * A % 1))
其中A为常数(≈0.618)

3. 字符串哈希:
hash = 0
for char in string:
    hash = (hash * 31 + char) % table_size

冲突解决

  1. 链式法(Chaining)
[0] -> null
[1] -> key1 -> key5
[2] -> key2
[3] -> key3 -> key6 -> key7
[4] -> key4
  1. 开放寻址法(Open Addressing)
线性探测:
如果位置i被占用,尝试i+1, i+2, ...

二次探测:
如果位置i被占用,尝试i+1², i+2², ...

双重哈希:
如果位置h1(key)被占用,尝试h1(key) + i*h2(key)

经典问题

1. 两数之和

LeetCode 1 - Two Sum

使用哈希表优化查找:

目标和: 9
数组: [2, 7, 11, 15]

哈希表构建过程:
步骤1: {2: 0}           寻找 9-2=7
步骤2: {2: 0, 7: 1}     找到 7,返回 [0,1]
def twoSum(nums, target):
    seen = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i
    return []

2. LRU缓存

LeetCode 146 - LRU Cache

哈希表+双向链表实现:

哈希表: {key: ListNode}
双向链表: head <-> node1 <-> node2 <-> tail

get(key2):             put(key4):
before:                before:
key1 <-> key2 <-> key3 key1 <-> key2 <-> key3

after:                 after:
key1 <-> key3 <-> key2 key2 <-> key3 <-> key4
class Node:
    def __init__(self, key=0, val=0):
        self.key = key
        self.val = val
        self.prev = None
        self.next = None

class LRUCache:
    def __init__(self, capacity):
        self.cache = {}
        self.capacity = capacity
        self.head = Node()
        self.tail = Node()
        self.head.next = self.tail
        self.tail.prev = self.head
    
    def get(self, key):
        if key in self.cache:
            node = self.cache[key]
            self._remove(node)
            self._add(node)
            return node.val
        return -1
    
    def put(self, key, value):
        if key in self.cache:
            self._remove(self.cache[key])
        node = Node(key, value)
        self._add(node)
        self.cache[key] = node
        if len(self.cache) > self.capacity:
            lru = self.head.next
            self._remove(lru)
            del self.cache[lru.key]
    
    def _remove(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev
    
    def _add(self, node):
        prev = self.tail.prev
        prev.next = node
        self.tail.prev = node
        node.prev = prev
        node.next = self.tail

3. 字母异位词分组

LeetCode 49 - Group Anagrams

使用排序字符串作为哈希键:

输入: ["eat","tea","tan","ate","nat","bat"]

哈希表构建:
"aet": ["eat", "tea", "ate"]
"ant": ["tan", "nat"]
"abt": ["bat"]

输出: [["eat","tea","ate"], ["tan","nat"], ["bat"]]
def groupAnagrams(strs):
    groups = {}
    for s in strs:
        key = ''.join(sorted(s))
        groups.setdefault(key, []).append(s)
    return list(groups.values())

4. 最长连续序列

LeetCode 128 - Longest Consecutive Sequence

使用哈希集合查找连续数字:

输入: [100,4,200,1,3,2]

哈希集合: {1,2,3,4,100,200}

查找过程:
1 -> 2 -> 3 -> 4     (长度=4)
100                   (长度=1)
200                   (长度=1)

返回: 4
def longestConsecutive(nums):
    num_set = set(nums)
    max_length = 0
    
    for num in num_set:
        if num - 1 not in num_set:  # 序列的起点
            current = num
            current_length = 1
            
            while current + 1 in num_set:
                current += 1
                current_length += 1
                
            max_length = max(max_length, current_length)
    
    return max_length

高级应用

1. 布隆过滤器

布隆过滤器结构:
key --→ hash1(key) --→ bit array
    --→ hash2(key) --→ [0 1 1 0 1 0 1]
    --→ hash3(key) --→

用途:

  • 大规模数据去重
  • 缓存穿透预防
  • 网页URL去重

2. 一致性哈希

哈希环:
     server1

key1 ←-----→ server2

     server3

应用:

  • 分布式缓存
  • 负载均衡
  • 数据分片

3. 计数器

Count-Min Sketch:
key --→ hash1(key) --→ [5 2 8 1 3]
    --→ hash2(key) --→ [1 4 2 6 3]
    --→ hash3(key) --→ [7 2 4 1 9]

应用:

  • 流量统计
  • 频率计数
  • 热点检测

常见技巧

  1. 使用哈希表优化查找
  2. 设计合适的哈希键
  3. 处理哈希冲突
  4. 选择合适的哈希函数
  5. 权衡空间和时间

相关题目