字典树 (Trie/Prefix Tree)
字典树 (Trie/Prefix Tree)
字典树,又称前缀树或Trie树,是一种树形数据结构,用于高效地存储和检索字符串数据集中的键。这种数据结构有多种应用,例如自动补全和拼写检查。
基础概念
Trie树结构
示例: 存储 ["cat", "car", "dog"]
root
/ \
c d
| |
a o
/ \ |
t r g
基本实现
class TrieNode:
def __init__(self):
self.children = {}
self.is_end = False
class Trie:
def __init__(self):
self.root = TrieNode()
def insert(self, word):
node = self.root
for char in word:
if char not in node.children:
node.children[char] = TrieNode()
node = node.children[char]
node.is_end = True
def search(self, word):
node = self.root
for char in word:
if char not in node.children:
return False
node = node.children[char]
return node.is_end
def startsWith(self, prefix):
node = self.root
for char in prefix:
if char not in node.children:
return False
node = node.children[char]
return True
经典问题
1. 实现Trie前缀树
LeetCode 208 - Implement Trie (Prefix Tree)
操作示例:
插入 "apple"
root
|
a
|
p
|
p
|
l
|
e*
查找 "app" → false
查找 "apple" → true
前缀 "app" → true
class TrieNode:
def __init__(self):
self.children = {}
self.is_end = False
class Trie:
def __init__(self):
self.root = TrieNode()
def insert(self, word):
node = self.root
for char in word:
if char not in node.children:
node.children[char] = TrieNode()
node = node.children[char]
node.is_end = True
def search(self, word):
node = self.root
for char in word:
if char not in node.children:
return False
node = node.children[char]
return node.is_end
def startsWith(self, prefix):
node = self.root
for char in prefix:
if char not in node.children:
return False
node = node.children[char]
return True
2. 添加与搜索单词
LeetCode 211 - Design Add and Search Words Data Structure
支持通配符搜索:
添加: "bad"
搜索: "b.d" → true
搜索: "pad" → false
通配符匹配过程:
root
|
b
/ \
a e
/ \
d* d*
class WordDictionary:
def __init__(self):
self.root = {}
def addWord(self, word):
node = self.root
for char in word:
if char not in node:
node[char] = {}
node = node[char]
node['$'] = True
def search(self, word):
def dfs(node, i):
if i == len(word):
return '$' in node
if word[i] == '.':
for child in node:
if child != '$' and dfs(node[child], i + 1):
return True
return False
if word[i] not in node:
return False
return dfs(node[word[i]], i + 1)
return dfs(self.root, 0)
3. 单词搜索II
输入:
board = [
['o','a','a','n'],
['e','t','a','e'],
['i','h','k','r'],
['i','f','l','v']
]
words = ["oath","pea","eat","rain"]
Trie构建:
root
/ \
o e
| |
a a
| |
t t
|
h*
class Solution:
def findWords(self, board, words):
WORD_KEY = '$'
trie = {}
# 构建Trie
for word in words:
node = trie
for char in word:
node = node.setdefault(char, {})
node[WORD_KEY] = word
rowNum = len(board)
colNum = len(board[0])
matchedWords = []
def backtracking(row, col, parent):
letter = board[row][col]
currNode = parent[letter]
# 检查是否找到单词
word_match = currNode.pop(WORD_KEY, False)
if word_match:
matchedWords.append(word_match)
# 暂时标记已访问
board[row][col] = '#'
# 探索相邻单元格
for (rowOffset, colOffset) in [(-1, 0), (0, 1), (1, 0), (0, -1)]:
newRow, newCol = row + rowOffset, col + colOffset
if newRow < 0 or newRow >= rowNum or newCol < 0 or newCol >= colNum:
continue
if not board[newRow][newCol] in currNode:
continue
backtracking(newRow, newCol, currNode)
# 恢复原始字母
board[row][col] = letter
# 优化:移除空节点
if not currNode:
parent.pop(letter)
# 从每个单元格开始搜索
for row in range(rowNum):
for col in range(colNum):
if board[row][col] in trie:
backtracking(row, col, trie)
return matchedWords
4. 最长单词
LeetCode 720 - Longest Word in Dictionary
输入: ["w","wo","wor","worl","world"]
Trie构建:
root
|
w*
|
o*
|
r*
|
l*
|
d*
输出: "world"
def longestWord(words):
trie = {}
for word in words:
node = trie
for char in word:
if char not in node:
node[char] = {}
node = node[char]
node['$'] = True
def dfs(node, word):
result = word
for char in node:
if char != '$' and '$' in node[char]:
temp = dfs(node[char], word + char)
if len(temp) > len(result):
result = temp
return result
return dfs(trie, "")
高级应用
1. 压缩前缀树
普通Trie:
p
|
a
|
t
|
h*
压缩后:
path*
2. AC自动机
模式串集合: {"he", "she", "his", "hers"}
文本串: "ahishers"
构建失败指针:
root
/ \
h s
/ \ |
e i h
/ \ |
r s e
|
s
3. 后缀树
字符串: "banana"
后缀:
- banana$
- anana$
- nana$
- ana$
- na$
- a$
常见技巧
- 字符映射优化
- 内存压缩
- 回溯剪枝
- 失败指针
- 动态删除
相关题目
- LeetCode 336 - Palindrome Pairs
- LeetCode 421 - Maximum XOR of Two Numbers in an Array
- LeetCode 472 - Concatenated Words
- LeetCode 648 - Replace Words
- LeetCode 677 - Map Sum Pairs