Comparing and Sorting
Python 3 is strict when comparing objects of disparate types. It also drops cmp-based comparison and sorting in favor of rich comparisons and key-based sorting, modern alternatives that have been available at least since Python 2.4. Details and porting strategies follow.
The strict approach to comparing in Python 3 makes it generally impossible to compare different types of objects.
For example, in Python 2, comparing int and str works
(with results that are unpredictable across Python implementations):
    >>> 2 < '2'
    True
but in Python 3, it fails with a well-described error message:
    >>> 2 < '2'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unorderable types: int() < str()
The change usually manifests itself in sorting lists: in Python 3, lists with items of different types are generally not sortable.
If you need to sort heterogeneous lists, or compare different types of objects, implement a key function to fully describe how disparate types should be ordered.
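As a minimal sketch of such a key function, the example below orders values first by the name of their type and then by the value itself. The name mixed_key and the choice to group by type name are assumptions made for illustration; a real project should pick whatever ordering of types makes sense for its data.

```python
def mixed_key(value):
    """Hypothetical key: group by type name, then order by value."""
    # Values of the same type share the first tuple element, so only
    # values of the same type are ever compared with each other.
    return (type(value).__name__, value)


items = [3, 'pear', 1, 'apple', 2]
result = sorted(items, key=mixed_key)
print(result)  # [1, 2, 3, 'apple', 'pear']
```

Note that this particular key also separates int from float, because their type names differ. If numbers should sort together, the key would need to map them to a shared group.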
- Fixer: None
- Prevalence: Common
The __cmp__() special method is no longer honored in Python 3.
In Python 2,
__cmp__(self, other) implemented comparison between two
objects, returning a negative value if
self < other, positive if
self > other, and zero if they were equal.
This approach of representing comparison results is common in C-style languages. But, early in Python 2 development, it became apparent that only allowing three cases for the relative order of objects is too limiting.
This led to the introduction of rich comparison methods, which assign a special method to each operator:
Operator   Method
==         __eq__
!=         __ne__
<          __lt__
<=         __le__
>          __gt__
>=         __ge__
Each takes the same two arguments as cmp, and must either return a result
value (typically Boolean), raise an exception, or return NotImplemented
to signal that the operation is not defined.
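The following sketch shows how a rich comparison method might return NotImplemented for operands it does not understand. The Version class is hypothetical and exists only to illustrate the protocol.

```python
class Version(object):
    """Hypothetical class used only to illustrate NotImplemented."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __eq__(self, other):
        if not isinstance(other, Version):
            # Signal that this operand cannot handle the comparison,
            # so Python can try the other operand's method.
            return NotImplemented
        return (self.major, self.minor) == (other.major, other.minor)

    def __ne__(self, other):
        result = self.__eq__(other)
        if result is NotImplemented:
            return result
        return not result

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return (self.major, self.minor) < (other.major, other.minor)


print(Version(1, 2) < Version(1, 10))  # True
print(Version(1, 2) == "1.2")          # False
```

When both operands return NotImplemented, Python 3 falls back to identity for == and != and raises TypeError for the ordering operators, so Version(1, 2) < "1.2" fails there instead of returning an arbitrary result.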
In Python 3, the cmp style of comparisons was dropped.
All objects that implemented
__cmp__ must be updated to implement all of
the rich methods instead.
(There is one exception: on Python 3,
__ne__ will, by default, delegate to
__eq__ and return the inverted result. However, this is not the case
in Python 2.)
To avoid the hassle of providing all six functions, you can implement
__eq__, __ne__, and only one of the ordering operators, and use the
functools.total_ordering() decorator to fill in the rest.
Note that the decorator is not available in Python 2.6. If you need
to support that version, you’ll need to supply all six methods.
The @total_ordering decorator does come with the cost of somewhat slower
execution and more complex stack traces for the derived comparison methods,
so defining all six explicitly may be necessary in some cases even if
Python 2.6 support is dropped.
As an example, suppose that you have a class to represent a person with
__cmp__() implemented:
    class Person(object):
        def __init__(self, firstname, lastname):
            self.first = firstname
            self.last = lastname

        def __cmp__(self, other):
            return cmp((self.last, self.first), (other.last, other.first))

        def __repr__(self):
            return "%s %s" % (self.first, self.last)
With total_ordering, the class would become:
    from functools import total_ordering

    @total_ordering
    class Person(object):
        def __init__(self, firstname, lastname):
            self.first = firstname
            self.last = lastname

        def __eq__(self, other):
            return ((self.last, self.first) == (other.last, other.first))

        def __ne__(self, other):
            return not (self == other)

        def __lt__(self, other):
            return ((self.last, self.first) < (other.last, other.first))

        def __repr__(self):
            return "%s %s" % (self.first, self.last)
If total_ordering cannot be used, or if efficiency is important,
all methods can be given explicitly:
    class Person(object):
        def __init__(self, firstname, lastname):
            self.first = firstname
            self.last = lastname

        def __eq__(self, other):
            return ((self.last, self.first) == (other.last, other.first))

        def __ne__(self, other):
            return ((self.last, self.first) != (other.last, other.first))

        def __lt__(self, other):
            return ((self.last, self.first) < (other.last, other.first))

        def __le__(self, other):
            return ((self.last, self.first) <= (other.last, other.first))

        def __gt__(self, other):
            return ((self.last, self.first) > (other.last, other.first))

        def __ge__(self, other):
            return ((self.last, self.first) >= (other.last, other.first))

        def __repr__(self):
            return "%s %s" % (self.first, self.last)
- Fixer: None
- Prevalence: Common
As part of the move away from cmp-style comparisons, the cmp()
function was removed in Python 3.
If it is necessary (usually to conform to an external API), you can provide it with this code:
    def cmp(x, y):
        """
        Replacement for built-in function cmp that was removed in Python 3

        Compare the two objects x and y and return an integer according to
        the outcome. The return value is negative if x < y, zero if x == y
        and strictly positive if x > y.
        """
        return (x > y) - (x < y)
The expression used is not straightforward, so if you need the functionality, we recommend adding the full, documented function to your project’s utility library.
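To see why the expression works, recall that Python booleans are integers: True behaves as 1 and False as 0, so subtracting the two comparison results yields -1, 0, or 1. The short sketch below repeats the replacement function without its docstring and prints the three possible outcomes.

```python
def cmp(x, y):
    # (x > y) and (x < y) are booleans; at most one of them is True.
    return (x > y) - (x < y)


print(cmp(1, 2))  # -1, because False - True == 0 - 1
print(cmp(2, 2))  # 0, because False - False == 0 - 0
print(cmp(3, 2))  # 1, because True - False == 1 - 0
```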
- Fixer: None
- Prevalence: Uncommon
In Python 2, the .sort() method and the sorted() function have a cmp parameter,
which determines the sort order. The argument for cmp is a function
that, like all cmp-style functions, returns a negative, zero, or positive
result depending on the order of its two arguments.
For example, given a list of instances of a Person class (defined above):
    >>> actors = [Person('Eric', 'Idle'),
    ...           Person('John', 'Cleese'),
    ...           Person('Michael', 'Palin'),
    ...           Person('Terry', 'Gilliam'),
    ...           Person('Terry', 'Jones')]
one way to sort it by last name in Python 2 would be:
    >>> def cmp_last_name(a, b):
    ...     """ Compare names by last name"""
    ...     return cmp(a.last, b.last)
    ...
    >>> sorted(actors, cmp=cmp_last_name)
    [John Cleese, Terry Gilliam, Eric Idle, Terry Jones, Michael Palin]
This function is called many times, O(n log n), during the sort.
As an alternative to cmp, sorting functions can take a keyword-only
key parameter, a function that returns the key under which to sort:
    >>> def keyfunction(item):
    ...     """Key for comparison by last name"""
    ...     return item.last
    ...
    >>> sorted(actors, key=keyfunction)
    [John Cleese, Terry Gilliam, Eric Idle, Terry Jones, Michael Palin]
The advantage of this approach is that this function is called only once for each item. When simple types such as tuples, strings, and numbers are used for keys, the many comparisons are then handled by optimized C code. Also, in most cases key functions are more readable than cmp: usually, people think of sorting by some aspect of an object (such as last name), rather than by comparing individual objects. The main disadvantage is that the old cmp style is commonly used in C-language APIs, so external libraries are likely to provide similar functions.
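The difference in call counts can be observed with a small experiment. The sketch below is illustrative only: the counter dictionary and the helper names are made up for this example, and it uses functools.cmp_to_key() to run a cmp-style function on Python versions where the cmp parameter is gone.

```python
import random
from functools import cmp_to_key

calls = {"cmp": 0, "key": 0}


def counting_cmp(a, b):
    """cmp-style comparison that records how often it is called."""
    calls["cmp"] += 1
    return (a > b) - (a < b)


def counting_key(item):
    """Key function that records how often it is called."""
    calls["key"] += 1
    return item


data = list(range(200))
random.Random(0).shuffle(data)  # fixed seed so runs are repeatable

by_cmp = sorted(data, key=cmp_to_key(counting_cmp))
by_key = sorted(data, key=counting_key)

print(calls["key"])  # 200: one call per item
print(calls["cmp"])  # typically far larger for shuffled input
```

The exact number of comparison calls depends on the input order and the Python version, which is why only the key count is stated here.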
In Python 3, the
cmp parameter was removed, and only
key (or no
argument at all) can be used.
There is no fixer for this change.
However, discovering it is straightforward: calling
sort with the
cmp argument raises TypeError in Python 3.
Each cmp function must be replaced by a key function.
There are two ways to do this:
- If the function did a common operation on both arguments, and then compared the results, replace it by just the common operation. In other words, cmp(f(a), f(b)) should be replaced with f(item).
- If the above does not apply, wrap the cmp-style function with functools.cmp_to_key(). See its documentation for details.
The cmp_to_key function is not available in Python 2.6, so if you need to support that version, you’ll need to copy it from Python sources.
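Both conversion strategies can be sketched as follows. The sample data and the function cmp_len_then_reverse_alpha are hypothetical, chosen only because the second ordering has no obvious single key.

```python
from functools import cmp_to_key

names = ["Palin", "cleese", "Idle", "gilliam"]

# Strategy 1: the old cmp function applied the same operation to both
# arguments, e.g. lambda a, b: cmp(a.lower(), b.lower()).
# The key function is just that common operation.
by_lower = sorted(names, key=lambda item: item.lower())
print(by_lower)  # ['cleese', 'gilliam', 'Idle', 'Palin']


# Strategy 2: no simple per-item key exists, so the cmp-style
# function is kept and wrapped with cmp_to_key().
def cmp_len_then_reverse_alpha(a, b):
    """Hypothetical cmp: shorter first, ties in reverse alphabetical order."""
    if len(a) != len(b):
        return (len(a) > len(b)) - (len(a) < len(b))
    return (a < b) - (a > b)


words = ["pear", "fig", "plum", "kiwi", "apple"]
wrapped = sorted(words, key=cmp_to_key(cmp_len_then_reverse_alpha))
print(wrapped)  # ['fig', 'plum', 'pear', 'kiwi', 'apple']
```

The first strategy is usually preferable where it applies, since the wrapped version still calls the comparison function once per comparison.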