字典树 (Trie/Prefix Tree)


字典树 (Trie/Prefix Tree)

字典树,又称前缀树或Trie树,是一种树形数据结构,用于高效地存储和检索字符串数据集中的键。这种数据结构有多种应用,例如自动补全和拼写检查。

基础概念

Trie树结构

示例: 存储 ["cat", "car", "dog"]

       root
     /     \
    c       d
    |       |
    a       o
   / \      |
  t   r     g

基本实现

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False

class Trie:
    def __init__(self):
        self.root = TrieNode()
    
    def insert(self, word):
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end = True
    
    def search(self, word):
        node = self.root
        for char in word:
            if char not in node.children:
                return False
            node = node.children[char]
        return node.is_end
    
    def startsWith(self, prefix):
        node = self.root
        for char in prefix:
            if char not in node.children:
                return False
            node = node.children[char]
        return True

经典问题

1. 实现Trie前缀树

LeetCode 208 - Implement Trie (Prefix Tree)

操作示例:
插入 "apple"
       root
        |
        a
        |
        p
        |
        p
        |
        l
        |
        e*

查找 "app" → false
查找 "apple" → true
前缀 "app" → true
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False

class Trie:
    def __init__(self):
        self.root = TrieNode()
    
    def insert(self, word):
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end = True
    
    def search(self, word):
        node = self.root
        for char in word:
            if char not in node.children:
                return False
            node = node.children[char]
        return node.is_end
    
    def startsWith(self, prefix):
        node = self.root
        for char in prefix:
            if char not in node.children:
                return False
            node = node.children[char]
        return True

2. 添加与搜索单词

LeetCode 211 - Design Add and Search Words Data Structure

支持通配符搜索:
添加: "bad"
搜索: "b.d" → true
搜索: "pad" → false

通配符匹配过程:
       root
        |
        b
       / \
      a   e
     /     \
    d*      d*
class WordDictionary:
    def __init__(self):
        self.root = {}
    
    def addWord(self, word):
        node = self.root
        for char in word:
            if char not in node:
                node[char] = {}
            node = node[char]
        node['$'] = True
    
    def search(self, word):
        def dfs(node, i):
            if i == len(word):
                return '$' in node
            
            if word[i] == '.':
                for child in node:
                    if child != '$' and dfs(node[child], i + 1):
                        return True
                return False
            
            if word[i] not in node:
                return False
            
            return dfs(node[word[i]], i + 1)
        
        return dfs(self.root, 0)

3. 单词搜索II

LeetCode 212 - Word Search II

输入:
board = [
  ['o','a','a','n'],
  ['e','t','a','e'],
  ['i','h','k','r'],
  ['i','f','l','v']
]
words = ["oath","pea","eat","rain"]

Trie构建:
       root
     /     \
    o       e
    |       |
    a       a
    |       |
    t       t
    |
    h*
class Solution:
    def findWords(self, board, words):
        WORD_KEY = '$'
        trie = {}
        
        # 构建Trie
        for word in words:
            node = trie
            for char in word:
                node = node.setdefault(char, {})
            node[WORD_KEY] = word
        
        rowNum = len(board)
        colNum = len(board[0])
        matchedWords = []
        
        def backtracking(row, col, parent):    
            letter = board[row][col]
            currNode = parent[letter]
            
            # 检查是否找到单词
            word_match = currNode.pop(WORD_KEY, False)
            if word_match:
                matchedWords.append(word_match)
            
            # 暂时标记已访问
            board[row][col] = '#'
            
            # 探索相邻单元格
            for (rowOffset, colOffset) in [(-1, 0), (0, 1), (1, 0), (0, -1)]:
                newRow, newCol = row + rowOffset, col + colOffset
                if newRow < 0 or newRow >= rowNum or newCol < 0 or newCol >= colNum:
                    continue
                if not board[newRow][newCol] in currNode:
                    continue
                backtracking(newRow, newCol, currNode)
        
            # 恢复原始字母
            board[row][col] = letter
        
            # 优化:移除空节点
            if not currNode:
                parent.pop(letter)
                
        # 从每个单元格开始搜索
        for row in range(rowNum):
            for col in range(colNum):
                if board[row][col] in trie:
                    backtracking(row, col, trie)
        
        return matchedWords

4. 最长单词

LeetCode 720 - Longest Word in Dictionary

输入: ["w","wo","wor","worl","world"]

Trie构建:
       root
        |
        w*
        |
        o*
        |
        r*
        |
        l*
        |
        d*

输出: "world"
def longestWord(words):
    trie = {}
    for word in words:
        node = trie
        for char in word:
            if char not in node:
                node[char] = {}
            node = node[char]
        node['$'] = True
    
    def dfs(node, word):
        result = word
        for char in node:
            if char != '$' and '$' in node[char]:
                temp = dfs(node[char], word + char)
                if len(temp) > len(result):
                    result = temp
        return result
    
    return dfs(trie, "")

高级应用

1. 压缩前缀树

普通Trie:
    p
    |
    a
    |
    t
    |
    h*

压缩后:
    path*

2. AC自动机

模式串集合: {"he", "she", "his", "hers"}
文本串: "ahishers"

构建失败指针:
       root
     /     \
    h       s
   / \      |
  e   i     h
 /     \    |
r       s    e
|
s

3. 后缀树

字符串: "banana"
后缀:
- banana$
- anana$
- nana$
- ana$
- na$
- a$

常见技巧

  1. 字符映射优化
  2. 内存压缩
  3. 回溯剪枝
  4. 失败指针
  5. 动态删除

相关题目