Python 中 in 运算符在集合和列表中的不同行为详解

本文深入探讨了 Python 中 in 运算符在列表和集合等数据结构中的不同行为。通过分析其内部实现机制，解释了为何在处理 PyTorch 张量时，in 运算符在列表和集合中会产生不同的结果。此外，本文还提供了自定义类和代码示例，帮助读者更好地理解哈希表在集合查找中的作用，并针对特定问题提供有效的解决方案。

在 Python 中，in 运算符用于检查某个元素是否存在于一个集合（collection）中。然而，in 运算符的具体行为取决于集合的类型，尤其是集合是否使用了内部哈希表。了解这些差异对于编写高效且无错误的 Python 代码至关重要。

in 运算符的工作原理

x in collection 的行为根据 collection 的类型而异。

不使用哈希表的集合 (列表、元组等)

对于不使用哈希表的集合，例如列表和元组，in 运算符会遍历集合中的每个元素，并逐个比较 x 和集合中的元素 c，直到找到匹配项。

其伪代码如下：

立即学习“Python免费学习笔记（深入）”；

def is_in(x, collection):   for c in collection:       if (x is c or x==c):           return True   return False

该过程首先比较对象的标识 (is)，如果标识不同，则比较值 (==)。

使用哈希表的集合 (集合、字典等)

对于使用哈希表的集合，例如集合和字典，in 运算符会先计算 x 的哈希值，然后查找集合中具有相同哈希值的元素子集。

其伪代码如下：

立即学习“Python免费学习笔记（深入）”；

def is_in(x, collection):   # 选择集合中哈希值与 x 相同的元素子集   subset = get_subset_by_hash(collection, hash(x))   for c in subset:       if (x is c or x==c):           return True   return False

这种方法大大提高了查找速度，因为只需要比较哈希值相同的元素，而无需遍历整个集合。

示例：自定义类和哈希表

为了更好地理解 in 运算符的行为，我们可以创建一个自定义类 MyObj，并定义其哈希计算逻辑 (__hash__) 和相等性比较逻辑 (__eq__)。

class MyObj:     def __init__(self, val, hashval):         self._val = val         self._hashval = hashval      def __hash__(self):         print(f"{str(self)} calling __hash__")         return self._hashval      def __eq__(self, other):         print(f"{str(self)} calling __eq__, other={other}")         return super().__eq__(other)      def __repr__(self):         return f"<{self.__class__.__name__}: {self._val}>"

然后，创建 MyObj 的几个实例，并将其添加到集合和列表中：

百度文心百中

百度大模型语义搜索体验中心

查看详情

a = MyObj("a", 123) b = MyObj("b", 456) d = MyObj("d", 456)  # 与 b 相同的哈希值  print("创建集合 `s`") s = set([a, b, d])  print("创建列表 `lst`") lst = [a, b, d]

运行这段代码会发现，在创建集合时，Python 会计算每个元素的哈希值，并且如果存在哈希冲突（例如 b 和 d 的哈希值相同），则会调用 __eq__ 方法进行比较。

接下来，可以使用 in 运算符来检查元素是否存在于集合和列表中，并观察其行为差异。

>>> s {<MyObj: a>, <MyObj: b>, <MyObj: d>} >>> b in s <MyObj: b> calling __hash__ True >>> d in s <MyObj: d> calling __hash__ <MyObj: b> calling __eq__, other=<MyObj: d> <MyObj: d> calling __eq__, other=<MyObj: b> True

在集合中，Python 首先计算 b 和 d 的哈希值。如果存在哈希冲突，则会调用 __eq__ 方法进行比较。

>>> lst [<MyObj: a>, <MyObj: b>, <MyObj: d>] >>> a in lst True >>> b in lst <MyObj: a> calling __eq__, other=<MyObj: b> <MyObj: b> calling __eq__, other=<MyObj: a> True >>> d in lst <MyObj: a> calling __eq__, other=<MyObj: d> <MyObj: d> calling __eq__, other=<MyObj: a> <MyObj: b> calling __eq__, other=<MyObj: d> <MyObj: d> calling __eq__, other=<MyObj: b> True

在列表中，Python 会依次比较每个元素，直到找到匹配项。

PyTorch 张量和 in 运算符

在使用 PyTorch 张量时，in 运算符的行为可能会有所不同。这是因为 PyTorch 张量重载了 == 运算符，如果两个张量的形状不同，则会引发 RuntimeError。

import torch a = torch.Tensor(2,3) b = torch.Tensor(2)  # case 1a: # b  in list([a,a,b]) # raises an error:  # Traceback (most recent call last): #   File "<stdin>", line 1, in <module> # RuntimeError: The size of tensor a (2) must match the size of tensor b (3) at non-singleton dimension 0  # case 1b b in set([a,a,b]) # True (i.e. no error)

在列表 [a, b] 中，当使用 b in [a, b] 时，Python 会首先比较 b is a（标识比较），如果结果为 False，则会比较 b == a（值比较）。由于 a 和 b 的形状不同，因此 b == a 会引发 RuntimeError。

在集合 {a, b} 中，当使用 b in {a, b} 时，Python 会首先计算 a 和 b 的哈希值。由于 torch.Tensor 的哈希值是其内存地址，因此 a 和 b 的哈希值不同。因此，b == a 不会被执行，从而避免了 RuntimeError。

解决方案

为了避免 RuntimeError，可以使用 torch.Tensor.size() 属性来创建不同大小张量的集合。

import torch  # 创建不同大小张量的字典 tensor_dict = {}  # 添加张量到字典 def add_tensor(tensor, tensor_dict):   size = tuple(tensor.size()) # 将 torch.Size 转换为元组   if size not in tensor_dict:     tensor_dict[size] = set()   tensor_dict[size].add(tensor)  # 检查张量是否存在于字典中 def tensor_in_dict(tensor, tensor_dict):   size = tuple(tensor.size()) # 将 torch.Size 转换为元组   return size in tensor_dict and tensor in tensor_dict[size]  # 示例用法 a = torch.Tensor(2, 3) b = torch.Tensor(2)  add_tensor(a, tensor_dict) add_tensor(b, tensor_dict)  print(tensor_in_dict(b, tensor_dict)) # 输出 True

总结

in 运算符在 Python 中是一个非常有用的工具，但了解其在不同数据结构中的行为至关重要。对于使用哈希表的集合，in 运算符的查找速度更快，但在处理特殊类型的对象（例如 PyTorch 张量）时，需要注意潜在的错误。通过使用适当的数据结构和比较方法，可以编写高效且无错误的 Python 代码。

大家都在看：

Python自动化粘贴文本：加速消息发送的策略与挑战使用Python函数高效生成斐波那契数列深入理解 Python 类型变量与联合类型：避免 Pyright 报错的策略解决 pip install 错误：调整 Python 环境路径 Python中高效生成斐波那契数列：函数定义与调用实践

python 工具 ai pytorch Python 运算符数据结构 Collection 对象 pytorch