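Among Python's many third-party libraries, toolz stands out for its data-processing and functional-programming utilities. Whether you are working with nested collections or chaining functional operations, it offers concise solutions. Let's take a closer look at this gem of a library.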

I. Installing toolz

Install toolz with pip, here through the Alibaba Cloud (Aliyun) PyPI mirror for faster downloads:

pip install toolz -i https://mirrors.aliyun.com/pypi/simple/

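II. Core functions and examples

1. toolz.dicttoolz.merge: merging dictionaries

merge combines several dictionaries into a new one. When the same key appears more than once, the value from the dictionary passed later wins; keys that appear in only one dictionary are carried over unchanged. The merge is shallow: only top-level keys are compared.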

from toolz.dicttoolz import merge
dict1 = {'a': 1, 'b': {'x': 2}}
dict2 = {'b': {'y': 3}, 'c': 4}
result = merge(dict1, dict2)
print(result)

The output is {'a': 1, 'b': {'y': 3}, 'c': 4}, not {'a': 1, 'b': {'x': 2, 'y': 3}, 'c': 4}: the nested dictionary under 'b' is replaced as a whole rather than merged.
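The shallow behavior can be reproduced with the standard library alone. The following is a minimal sketch of the semantics, not toolz's actual source; `merge_sketch`, `defaults`, and `overrides` are hypothetical names chosen for illustration.

```python
def merge_sketch(*dicts):
    """Shallow-merge dicts left to right; later values win on key clashes."""
    result = {}
    for d in dicts:
        # update() only looks at top-level keys, so a nested dict is replaced whole
        result.update(d)
    return result


defaults = {"host": "localhost", "options": {"debug": False}}
overrides = {"options": {"verbose": True}, "port": 8080}

merged = merge_sketch(defaults, overrides)
print(merged)
# {'host': 'localhost', 'options': {'verbose': True}, 'port': 8080}
```

The inputs are left untouched, which makes this pattern convenient for layering overrides on top of default settings.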

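2. toolz.dicttoolz.merge_with(func, *dicts, **kwargs): custom merge rules

merge_with is the more flexible way to merge dictionaries in toolz: it lets you define how conflicts are resolved. All values found for a key are collected into a list, and func is called on that list to produce the final value (sum, max, concatenation, and so on).

1) Example: pass an identity function, so each value in the merged dictionary is the list of collected values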

from toolz.dicttoolz import merge_with
dict1 = {'a': 1, 'b': {'x': 2,'z':{'m':3}}}
dict2 = {'a':2,'b': {'y': 3}, 'c': 4}
result = merge_with(lambda s:s,dict1, dict2)
print(result)

Output:

{'a': [1, 2], 'b': [{'x': 2, 'z': {'m': 3}}, {'y': 3}], 'c': [4]}

2) Example: merge dictionaries and sum the values of shared keys (sum)

from toolz.dicttoolz import merge_with

dict1 = {'a': 1, 'b': 2}
dict2 = {'b': 3, 'c': 4}
dict3 = {'c': 5, 'd': 6}

# Use sum to add up the values of shared keys
result = merge_with(sum, dict1, dict2, dict3)
print(result)

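Output:

{'a': 1, 'b': 5, 'c': 9, 'd': 6}

Notes:

Key 'b' is 2 in dict1 and 3 in dict2, so the merged value is 2 + 3 = 5
Key 'c' is 4 in dict2 and 5 in dict3, so the merged value is 4 + 5 = 9
Keys without conflicts ('a' and 'd') keep their value, because sum over a one-element list returns that element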
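The collect-then-reduce behavior described above can be sketched in plain Python. This is an illustration of the semantics under the assumption that every key's values are gathered into a list first; `merge_with_sketch` is a hypothetical name, not part of toolz.

```python
from collections import defaultdict


def merge_with_sketch(func, *dicts):
    """Collect every value seen for a key into a list, then reduce it with func."""
    collected = defaultdict(list)
    for d in dicts:
        for key, value in d.items():
            collected[key].append(value)
    # func always receives a list, even for keys that occur only once
    return {key: func(values) for key, values in collected.items()}


dict1 = {'a': 1, 'b': 2}
dict2 = {'b': 3, 'c': 4}
dict3 = {'c': 5, 'd': 6}

totals = merge_with_sketch(sum, dict1, dict2, dict3)
largest = merge_with_sketch(max, dict1, dict2, dict3)
print(totals)   # {'a': 1, 'b': 5, 'c': 9, 'd': 6}
print(largest)  # {'a': 1, 'b': 3, 'c': 5, 'd': 6}
```

Swapping sum for max changes only the conflict rule; the collection step stays the same.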

3) Example: custom merge function that combines lists and removes duplicates

from toolz.dicttoolz import merge_with

dict1 = {'a': [1, 2], 'b': [3]}
dict2 = {'a': [2, 4], 'c': [5]}

# Use a lambda to union the lists, drop duplicates, and sort for a stable order
result = merge_with(lambda x: sorted(set().union(*x)), dict1, dict2)
print(result)

Output:

{'a': [1, 2, 4], 'b': [3], 'c': [5]}

Note: for key 'a', the lists [1, 2] and [2, 4] are unioned into a set, which removes the duplicate 2 and yields [1, 2, 4].

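4) Example: custom merge function that merges nested dictionaries under the same key.

The merge function calls merge_with recursively to handle nested dictionaries.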

from toolz.dicttoolz import merge_with

dict1 = {'a': 1, 'b': {'x': 2, 'z': {'m': 3}}}
dict2 = {'a': 2, 'b': {'y': 3, 'z': {'n': 3}}, 'c': 4}

# Merge nested dictionaries recursively; for plain values the last one wins
def combine(values):
    if all(isinstance(v, dict) for v in values):
        return merge_with(combine, *values)
    return values[-1]

result = merge_with(combine, dict1, dict2)
print(result)

Output:

{'a': 2, 'b': {'x': 2, 'z': {'m': 3, 'n': 3}, 'y': 3}, 'c': 4}

Notes:

The nested dictionary 'b' is merged recursively into {'x': 2, 'z': {'m': 3, 'n': 3}, 'y': 3}, and the dictionary 'z' inside it is merged into {'m': 3, 'n': 3}.
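For comparison, the same recursive merge can be written without toolz. This is a sketch under the assumption that exactly two dictionaries are merged; `deep_merge` is a hypothetical helper name.

```python
def deep_merge(base, override):
    """Return a new dict: nested dicts merge recursively, other values come from override."""
    result = dict(base)  # shallow copy so the input dict is not mutated
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result


dict1 = {'a': 1, 'b': {'x': 2, 'z': {'m': 3}}}
dict2 = {'a': 2, 'b': {'y': 3, 'z': {'n': 3}}, 'c': 4}

deep = deep_merge(dict1, dict2)
print(deep)
# {'a': 2, 'b': {'x': 2, 'z': {'m': 3, 'n': 3}, 'y': 3}, 'c': 4}
```

The merge_with version generalizes more easily to any number of dictionaries, since the combining function receives all values for a key at once.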

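3. toolz.itertoolz.concat: concatenating iterables

concat takes an iterable of iterables and chains them into a single lazy iterator, so no intermediate lists are built along the way.

Example: concatenating several lists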

from toolz.itertoolz import concat
list1 = [1, 2, 3]
list2 = [4, 5, 6]
list3 = [6, 7, 8, 9]
result = list(concat([list1, list2, list3]))
print(result)

Output:

[1, 2, 3, 4, 5, 6, 6, 7, 8, 9]
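The standard-library counterpart of this operation is itertools.chain.from_iterable, and it makes the laziness easy to see: the sketch below flattens an endless stream of chunks and only ever touches the items it needs. `numbered_batches` is a hypothetical generator used as a stand-in for real data.

```python
from itertools import chain, islice


def numbered_batches():
    """Yield small lists forever; a hypothetical stand-in for a stream of chunks."""
    start = 0
    while True:
        yield [start, start + 1, start + 2]
        start += 3


# Nothing is materialized here; items are produced only on demand
flat = chain.from_iterable(numbered_batches())
first_seven = list(islice(flat, 7))
print(first_seven)  # [0, 1, 2, 3, 4, 5, 6]
```

Building a full list first would never finish on an infinite source, which is why lazy concatenation matters for streams.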

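4. toolz.functoolz.pipe: data pipelines

pipe passes a value through a sequence of functions, feeding each result into the next function. Reading left to right like a pipeline keeps the code short and easy to follow.

Example: processing a value through a pipeline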

from toolz.functoolz import pipe

def square(x):
    return x * x

def add_1(x):
    return x + 1

def double(x):
    return x * 2

result = pipe(3, square, add_1, double)
print(result)

Output: 20

Notes:

First, 3 is passed to square, giving 9
Then 9 is passed to add_1, giving 10
Finally, 10 is passed to double, giving 20
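The steps above amount to a simple loop. Here is a minimal sketch of that idea, with `pipe_sketch` as a hypothetical name rather than toolz's own code, compared against the equivalent nested call.

```python
def pipe_sketch(value, *funcs):
    """Feed value through funcs from left to right and return the final result."""
    for func in funcs:
        value = func(value)
    return value


raw = "  Hello World  "

# Pipeline form: the steps read in the order they run
words = pipe_sketch(raw, str.strip, str.lower, str.split)

# Equivalent nested form: the steps read inside out
nested = str.split(str.lower(str.strip(raw)))

print(words)  # ['hello', 'world']
```

Both forms compute the same value; the pipeline form simply lists the steps in execution order.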

5. toolz.functoolz.compose: function composition

compose combines several functions into a new function that applies them from right to left.

Example: composing functions

from toolz.functoolz import compose

def square(x):
    return x * x

def add_1(x):
    return x + 1

def double(x):
    return x * 2

new_func = compose(double, add_1, square)
result = new_func(3)
print(result)

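Output: 20

Note: the result matches the pipe example above, but the functions are listed in the opposite order, and compose returns a reusable function instead of computing a value immediately.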
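Right-to-left application can be sketched with functools.reduce. This is an illustration of the ordering only; `compose_sketch` is a hypothetical name, not the library's implementation.

```python
from functools import reduce


def compose_sketch(*funcs):
    """Return a function that applies funcs right to left, like f(g(h(x)))."""
    def composed(x):
        return reduce(lambda acc, func: func(acc), reversed(funcs), x)
    return composed


def square(x):
    return x * x


def add_1(x):
    return x + 1


def double(x):
    return x * 2


right_to_left = compose_sketch(double, add_1, square)
other_order = compose_sketch(square, add_1, double)

print(right_to_left(3))  # double(add_1(square(3))) = 20
print(other_order(3))    # square(add_1(double(3))) = 49
```

Reversing the argument order changes the result, so the order in which the functions are listed matters.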

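6. toolz.itertoolz.groupby: grouping data

groupby groups the items of an iterable by a key function and returns a dictionary that maps each key to the list of items in that group.

Example: grouping strings by length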

from toolz.itertoolz import groupby
data = ["apple", "banana", "pear", "grapefruit", "kiwi"]
result = groupby(len, data)
print(result)

Output:

{5: ['apple'], 6: ['banana'], 4: ['pear', 'kiwi'], 10: ['grapefruit']}
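This dictionary-returning behavior differs from the standard library's itertools.groupby, which only groups consecutive items. The sketch below contrasts the two; `groupby_sketch` is a hypothetical pure-Python stand-in for the behavior described above.

```python
from collections import defaultdict
from itertools import groupby as itertools_groupby


def groupby_sketch(key, seq):
    """Group all items by key(item), regardless of their position in seq."""
    groups = defaultdict(list)
    for item in seq:
        groups[key(item)].append(item)
    return dict(groups)


data = ["apple", "banana", "pear", "grapefruit", "kiwi"]

by_length = groupby_sketch(len, data)
print(by_length)
# {5: ['apple'], 6: ['banana'], 4: ['pear', 'kiwi'], 10: ['grapefruit']}

# itertools.groupby splits a group whenever the key changes, so 4 appears twice
consecutive = [(k, list(g)) for k, g in itertools_groupby(data, key=len)]
print(consecutive)
# [(5, ['apple']), (6, ['banana']), (4, ['pear']), (10, ['grapefruit']), (4, ['kiwi'])]
```

With itertools.groupby the input usually has to be sorted by the key first; the dictionary-based approach needs no sorting.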

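7. toolz.curried.filter: filtering elements of an iterable

toolz does not ship its own filter implementation. toolz.curried.filter is Python's built-in filter wrapped with curry, so it selects elements by a predicate and is exactly as lazy as the built-in. As far as I know, recent toolz releases no longer expose a filter name in toolz.itertoolz, so the example imports it from toolz.curried.

Example: selecting the even numbers in a list

from toolz.curried import filter
data = [1, 2, 3, 4, 5, 6]
result = list(filter(lambda x: x % 2 == 0, data))
print(result)

Output: [2, 4, 6]

Note: the lambda x: x % 2 == 0 is the filter condition; it selects the even numbers in data, and list() turns the lazy result into a list.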
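The laziness of filter can be observed with the built-in alone. In this sketch, `is_even` records every element it inspects in a hypothetical `calls` list, which shows that no work happens until a result is requested.

```python
calls = []


def is_even(x):
    calls.append(x)  # record each element the predicate actually inspects
    return x % 2 == 0


evens = filter(is_even, range(1, 1_000_000))
calls_before = list(calls)   # still empty: creating the filter evaluates nothing

first = next(evens)          # pulls elements only until the first match
print(calls_before)          # []
print(first, calls)          # 2 [1, 2]
```

Only two elements of the million-element range were examined, which is what makes chained lazy operations cheap on large inputs.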

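8. toolz.itertoolz.remove: removing elements from an iterable

remove drops the elements of an iterable that satisfy a predicate, the opposite of filter. It returns a new lazy iterator and leaves the original object unmodified.

Example: removing negative numbers from a list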

from toolz.itertoolz import remove
data = [-1, 2, -3, 4, -5, 6]
result = list(remove(lambda x: x < 0, data))
print(result)

Output: [2, 4, 6]

Note: the lambda x: x < 0 is the removal condition; every negative number in data is dropped, leaving a new list of the remaining values.
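The standard library offers the same complement-of-filter behavior through itertools.filterfalse. The sketch below uses it to show that the two operations split a sequence into disjoint parts; `is_negative` is a hypothetical predicate, and a zero is added to the data to show that it is kept.

```python
from itertools import filterfalse


def is_negative(x):
    return x < 0


data = [-1, 2, -3, 4, -5, 6, 0]

kept = list(filterfalse(is_negative, data))  # elements where the predicate is False
dropped = list(filter(is_negative, data))    # elements where the predicate is True

print(kept)     # [2, 4, 6, 0]
print(dropped)  # [-1, -3, -5]
print(data)     # the original list is unchanged
```

Note that zero survives: the predicate removes negatives, so the result holds non-negative values rather than strictly positive ones.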

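9. toolz.curried.map: applying a function to each element of an iterable

As with filter, toolz does not reimplement map. toolz.curried.map is Python's built-in map wrapped with curry: it applies a function to every element and, like the built-in in Python 3, returns a lazy iterator, so large inputs are not copied into memory. As far as I know, recent toolz releases no longer expose a map name in toolz.itertoolz, so the example imports it from toolz.curried.

Example: squaring the elements of a list

from toolz.curried import map
data = [1, 2, 3, 4, 5]
result = list(map(lambda x: x ** 2, data))
print(result)

Output: [1, 4, 9, 16, 25]

Note: the lambda x: x ** 2 squares each element of data, and list() turns the lazy result into a list for display.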
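One detail of map worth a quick sketch is that it accepts several iterables and stops at the shortest one. The example below uses only the built-in map; `prices` and `quantities` are hypothetical sample data.

```python
from operator import mul

prices = [2.5, 4.0, 1.25]
quantities = [4, 1, 8, 100]  # the extra fourth item has no matching price

# map pairs items positionally and stops when the shortest input is exhausted
line_totals = list(map(mul, prices, quantities))
print(line_totals)  # [10.0, 4.0, 10.0]
```

The unmatched quantity is silently ignored, so it is worth checking input lengths when that would hide a data problem.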

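10. toolz.itertoolz.pluck: extracting the value of a key from a list of dictionaries

pluck extracts the value stored under a given key from each dictionary in a sequence and returns a lazy iterator over those values.

Example: extracting each student's score from a list of student records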

from toolz.itertoolz import pluck
students = [
 {"name": "Alice", "score": 85},
 {"name": "Bob", "score": 90},
 {"name": "Charlie", "score": 78}
]
result = list(pluck("score", students))
print(result)

Output: [85, 90, 78]

Note: given the key "score" and the list students, pluck pulls the value under "score" out of every dictionary, and list() collects the values into a new list.
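The same extraction can be sketched with operator.itemgetter from the standard library, which also makes it easy to pull several keys at once. The "grade" key below is hypothetical and deliberately missing from the data to show one way of handling absent keys.

```python
from operator import itemgetter

students = [
    {"name": "Alice", "score": 85},
    {"name": "Bob", "score": 90},
    {"name": "Charlie", "score": 78},
]

# One key: a flat list of values
scores = list(map(itemgetter("score"), students))

# Several keys: one tuple per record
pairs = list(map(itemgetter("name", "score"), students))

# Missing key: dict.get supplies a fallback instead of raising KeyError
grades = [student.get("grade", "n/a") for student in students]

print(scores)  # [85, 90, 78]
print(pairs)   # [('Alice', 85), ('Bob', 90), ('Charlie', 78)]
print(grades)  # ['n/a', 'n/a', 'n/a']
```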

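11. toolz.itertoolz.count: counting the elements of an iterable

count returns the number of items in an iterable. Unlike the built-in len, it also works on lazy iterables such as generators, which it consumes while counting.

Example: counting the characters in a string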

from toolz.itertoolz import count
string = "hello world"
result = count(string)
print(result)

Output: 11

Note: calling count directly on string returns the total number of characters, including the space.
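The difference from len shows up with generators. The sketch below counts by iteration, which is an assumption about how such a helper can be written rather than a copy of the library code; `count_sketch` is a hypothetical name.

```python
def count_sketch(seq):
    """Count items by iterating, so it works on any iterable, lazy or not."""
    return sum(1 for _ in seq)


# A generator has no length, so len() raises TypeError
non_empty = (line for line in ["a", "", "b", "c"] if line)
try:
    len(non_empty)
    len_works = True
except TypeError:
    len_works = False

n = count_sketch(non_empty)
leftover = list(non_empty)  # the generator was consumed by counting

print(len_works)  # False
print(n)          # 3
print(leftover)   # []
```

Counting a lazy iterable uses it up, so keep the data in a list if it is needed again afterwards.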

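12. toolz.itertoolz.frequencies: counting how often each element occurs

frequencies counts how many times each element occurs in an iterable and returns a dictionary that maps each element to its count. Note that the function is named frequencies; toolz has no function called freq.

Example: counting element frequencies in a list

from toolz.itertoolz import frequencies
data = [1, 2, 1, 3, 2, 1]
result = frequencies(data)
print(result)

Output: {1: 3, 2: 2, 3: 1}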
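The frequency counting shown above has a close standard-library relative in collections.Counter. The sketch below reproduces the same result and adds a most-common lookup, as a comparison rather than a description of how toolz works internally.

```python
from collections import Counter

data = [1, 2, 1, 3, 2, 1]

counts = Counter(data)
as_dict = dict(counts)           # plain dict, keys in first-seen order
top = counts.most_common(1)      # the single most frequent element with its count

print(as_dict)  # {1: 3, 2: 2, 3: 1}
print(top)      # [(1, 3)]
```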
