Iterators are a concept that almost any modern high-level programming language implements and you have probably used them already, even if you didn't notice. In this article, I want to pull back the proverbial curtain and look at why iterators are so useful and how they work.
What is an iterator?
Iterators are a generalization of the most basic properties of sequence data types. A sequence has a current state and a way to get from this state to the next one. This can be used, for example, to access the items of an array in order (the state is the current index and you advance by increasing the index). This iterator also has a natural stopping point once the index reaches the length of the array. However, you can also create infinite iterators. For example, you could take away the array from our previous example and simply return the next higher natural number every time your iterator is polled.
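To make the infinite case concrete, here is a minimal Python sketch using itertools.count from the standard library - an iterator that simply hands out the next natural number each time it is polled:

```python
import itertools

# An infinite iterator: there is no underlying collection,
# only the current state (the last number handed out).
naturals = itertools.count(0)

print(next(naturals))  # 0
print(next(naturals))  # 1
print(next(naturals))  # 2
```

Since there is no collection backing it, this iterator never runs out; it is up to the consumer to stop polling at some point.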
To an outsider, an iterator only has some sort of next function or method to poll it, which can either return an item or signify that the iterator is exhausted1.
Code example in Python:
i = iter([0, 1, 2])
next(i) # returns 0
next(i) # returns 1
next(i) # returns 2
next(i) # raises exception StopIteration
Code example in Rust:
let mut i = [0, 1, 2].into_iter();
println!("{:?}", i.next()); // prints Some(0)
println!("{:?}", i.next()); // prints Some(1)
println!("{:?}", i.next()); // prints Some(2)
println!("{:?}", i.next()); // prints None
History
Iterators were first introduced in Barbara Liskov's programming language CLU. In her 1992 article A History of CLU she describes that the people working on CLU realized that different collection data structures (e.g. arrays, linked lists, and sets) all used different methods to access elements. However, it would be useful to be able to take a collection of any type and go through its elements in a for loop like this2:
for element in collection:
do_stuff_with(element)
This example written in CLU actually looks remarkably similar to modern Python:
odds = iter(n: int) yields int
    i: int
    i := 1
    while i < n do
        yield i
        i := i + 2
    end
end odds
As you can see, we define an iterator odds that "yields" an integer (in contrast to a procedure which would "return" a value). The logic in the iterator is relatively straightforward. We have an integer i that is initialized with 1 and increased by 2 as long as it is smaller than the limit given by the parameter n.
What's interesting is the yield i inside the loop. What looks like a normal function is really a stateful object that saves its internal state in i and changes its state any time it is polled (e.g. by a for loop).
Iterators in Python
Since I wrote that the CLU code looked similar to Python, I should probably talk about iterators in Python. Actually, Python distinguishes between iterators and generators. Let's rewrite our example from before using a generator:
def odds(n):
    i = 1
    while i < n:
        yield i
        i += 2
As you can see, Python doesn't require us to explicitly declare this as a generator and we save a few lines that were only used to end blocks in CLU, but you should still see the similarity.
Let's paste this into our Python REPL and see what happens when we call odds:
>>> odds(13)
<generator object odds at 0x7f79a26eccf0>
As we see, calling a generator function creates a new object of type generator. But how do we do anything with it? Looking at some Python documentation tells us we can call next on it:
>>> next(odds(13))
1
That's a result! Let's choose a smaller number and call next until we exhaust the generator:
>>> o = odds(6)
>>> next(o)
1
>>> next(o)
3
>>> next(o)
5
>>> next(o)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
So our generator object is stateful and returns new numbers every time we call it. And once the condition in our while loop becomes false, it stops.
More precisely, it raises the StopIteration exception - in Python, an exception is not necessarily an error and is in this case used to signal that the iterator is exhausted.
This is our first glimpse into the inner workings of Python iterators: raise StopIteration if the iterator is exhausted.
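This is also how for loops consume iterables under the hood. As a rough sketch (the helper name for_each is my own), a for loop behaves approximately like this hand-written loop:

```python
def for_each(iterable, action):
    # Roughly what `for item in iterable: action(item)` does:
    it = iter(iterable)          # get an iterator from the iterable
    while True:
        try:
            item = next(it)      # poll the iterator
        except StopIteration:    # exhausted: leave the loop quietly
            break
        action(item)

for_each([0, 1, 2], print)  # prints 0, 1 and 2
```

Note how the StopIteration exception never reaches the caller - the loop machinery swallows it and simply stops.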
Actually, the documentation from before tells us even more: Our custom iterator needs to have __iter__ and __next__ methods, where __iter__ returns the iterator itself and __next__ returns the next item or raises StopIteration when the iterator is exhausted.
So let's rewrite odds in this explicit version:
class odds:
    def __init__(self, n):
        self.i = 1
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.i < self.n:
            i = self.i
            self.i += 2
            return i
        else:
            raise StopIteration
We have a normal Python class and don't use a loop anymore since we don't rely on Python magic to track the state of our iterator, instead tracking it ourselves using normal attributes. Let's create an instance of this class:
>>> odds(13)
<__main__.odds object at 0x7f79a267ed30>
This is not a generator like our previous version, but it implements what Python calls the iterator protocol. So, it works just the same as our generator:
>>> o = odds(6)
>>> next(o)
1
>>> next(o)
3
>>> next(o)
5
>>> next(o)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 13, in __next__
StopIteration
As you may have noticed, I have been talking about iterators and generators and sometimes switching between these terms. What's the difference?
The Python glossary tells us that a generator is "a function which returns a generator iterator". A generator iterator is an object that is generated by a generator to implement the iterator protocol for the content of the generator function. It runs the function and suspends execution when it encounters a yield statement. It then saves the function's state and resumes it once next is called on it. If the function exits, the generator iterator raises StopIteration.
What happens when you use our first Python implementation of odds is this:
- By calling odds(6) we create a new generator iterator and assign it to o
- Calling next(o) starts the function described by odds
- The function comes across a yield statement and returns the yielded expression
- When we call next(o) again, the function resumes, increasing the variable i and reentering the while loop
- When we call next(o) and the while loop (and therefore the function) exits, o raises a StopIteration exception
But why should you even use generators? The answer is that they can be a lot more efficient than a collection-based approach. The keyword here is laziness. Unlike a list, which needs to be kept in memory completely, the generator only has to know its current state, therefore saving memory and often even being faster.
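To illustrate the laziness: with itertools.islice we can take the first few elements of a huge sequence without ever materializing the rest:

```python
import itertools

def odds(n):
    i = 1
    while i < n:
        yield i
        i += 2

# Only the first three values are ever computed; the remaining
# odd numbers below one million never exist in memory.
first_three = list(itertools.islice(odds(1_000_000), 3))
print(first_three)  # [1, 3, 5]
```

A list-based implementation would have to build all ~500,000 elements up front just to read the first three.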
Here are some runtimes (in nanoseconds) for our odds iterators compared to the following function that simply creates a list:
def odds_list(n):
    o = []
    i = 1
    while i < n:
        o.append(i)
        i += 2
    return o
| n | Generator | Self-Implemented Iterator | List |
|---|---|---|---|
| 10 | 9744 | 12705 | 8925 |
| 100 | 8206 | 14766 | 8542 |
| 1000 | 38032 | 106564 | 49902 |
| 10000 | 389495 | 932232 | 488954 |
| 100000 | 3084526 | 10046116 | 4852884 |
| 1000000 | 32289332 | 93365725 | 44623061 |
As you can see, the generator version has a slight advantage in speed for bigger inputs whereas our custom implementation falls behind. I would guess that this implementation is harder for Python to optimize than the generator version.
However, if we look at the result of getsizeof, our custom implementation is the smallest at 48 bytes. The generator consistently uses 112 bytes whereas the list grows up to 4167352 bytes (or about 4.2 MB) for an input of n=1000000.
This shows that if you care about speed, generators using the yield expression are the way to go. If you need to optimize for memory usage, you can implement the iterator yourself, though the gain is in many cases probably not significant enough to warrant the additional effort compared to a generator.
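You can reproduce the memory comparison yourself with sys.getsizeof (the exact byte counts depend on your Python version and platform, so treat the numbers above as indicative):

```python
import sys

def odds_gen(n):
    i = 1
    while i < n:
        yield i
        i += 2

def odds_list(n):
    return list(odds_gen(n))

# The generator's size is constant regardless of n...
print(sys.getsizeof(odds_gen(1_000_000)))
# ...while the list's size grows with the number of elements.
print(sys.getsizeof(odds_list(1_000_000)))
```
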
Iterators in Rust
Of course, Python is not the only language to feature iterators - as I showed in the introduction, Rust also has them. In Rust, iterators exist as a trait (similar to the "iterator protocol" used by Python, but formalized in the language). Let's look at the definition:
trait Iterator {
    type Item;

    fn next(&mut self) -> Option<Self::Item>;
}
Since Rust is statically typed, you also need to tell it the type of the iterator's items, but next looks very similar to Python's __next__. Unlike Python, where exceptions are part of the normal control flow, Rust only has panics that take the whole program down with them. Instead, if an action could fail, you use Result<T, E> (which can contain error information) or Option<T> if a value might not be defined. This is actually very fitting for iterators: If there is a next item in the iterator, it returns that item, otherwise it returns None - no need for exceptions here!
Our odds iterator could look something like this:
struct Odds {
    state: u32,
    limit: u32,
}

impl Odds {
    fn new(limit: u32) -> Self {
        Self { state: 1, limit }
    }
}

impl Iterator for Odds {
    type Item = u32;

    fn next(&mut self) -> Option<Self::Item> {
        if self.state < self.limit {
            let tmp = self.state;
            self.state += 2;
            Some(tmp)
        } else {
            None
        }
    }
}
And then you can use it like this:
let mut o = Odds::new(6);
println!("{:?}", o.next()); // Some(1)
println!("{:?}", o.next()); // Some(3)
println!("{:?}", o.next()); // Some(5)
println!("{:?}", o.next()); // None
As you can see, the logic is more or less identical to the odds class we built in Python. If you are already familiar with Rust, you have probably wondered about the definition of the Iterator trait. It seems like a prime candidate for generics (i.e. Iterator<T>), but instead uses associated types. Let's see what happens if we define our own GenericIter<T> trait:
trait GenericIter<T> {
    fn next(&mut self) -> Option<T>;
}
impl GenericIter<u32> for Odds {
    fn next(&mut self) -> Option<u32> {
        if self.state < self.limit {
            let tmp = self.state;
            self.state += 2;
            Some(tmp)
        } else {
            None
        }
    }
}
The implementation is identical, except for the type signatures. So what makes the version with an associated type superior? Well, with our generic version we could add code like this:
impl GenericIter<u64> for Odds {
    fn next(&mut self) -> Option<u64> {
        if self.state < self.limit {
            let tmp = self.state;
            self.state += 2;
            Some(tmp as u64)
        } else {
            None
        }
    }
}
Now we have two implementations of GenericIter for Odds. Since their generics differ, they can both coexist peacefully. Except, if we try using an instance of our iterator, this happens:
error[E0282]: type annotations needed
--> src/main.rs:66:31
|
66 | println!("{:?}", o.next());
| --^^^^--
| | |
| | cannot infer type for type parameter `T` declared on the trait `GenericIter`
| this method call resolves to `Option<T>`
For more information about this error, try `rustc --explain E0282`.
error: could not compile `rust-iter` due to previous error
The compiler cannot decide whether to call GenericIter::<u32>::next or GenericIter::<u64>::next. We could of course forgo the method syntax and call next as e.g. GenericIter::<u64>::next(&mut o), but that is less than elegant.
A reasonable idea to solve this would be to only allow one implementation of Iterator per type. And this brings us back to the question of why the Rust standard library uses an associated type for the item type of the Iterator trait. Since the trait is not generic anymore, you can only have one implementation per type and therefore only one next implementation to call. But what if we want an iterator with, say, an internal state of type u64 that outputs u32s?
One of the cool things you can do with iterators is combine and modify them using adapters. For example, we could create a new iterator by applying a function to each element - we could map an iterator of item type T to one of type U. Let's try to build such a thing:
struct Map<I, T>
where
    I: Iterator,
{
    i: I,
    f: Box<dyn Fn(I::Item) -> T>,
}
As you can see, our map iterator contains another iterator and has a function that maps an argument of I's item type to the generic type T. Let's continue by implementing Iterator for this new type:
impl<I: Iterator, T> Iterator for Map<I, T> {
    type Item = T;

    fn next(&mut self) -> Option<Self::Item> {
        self.i.next().map(&mut self.f)
    }
}
To get the next item of a Map, we get the next element of the wrapped iterator and apply the mapping function to it. Since self.i.next() returns an Option, we use the Option::map method, which transforms an Option<T> into an Option<U>, so that if the Option is Some, the mapping function is applied to its contents, otherwise it stays None.
We can now build a function that creates a new Map given an iterator and a function3:
fn map<I, B>(i: I, f: Box<dyn Fn(I::Item) -> B>) -> Map<I, B>
where
    I: Iterator + Sized,
{
    Map { i, f }
}
If you look at the Rust standard library implementation of Iterator::map, you will see that it doesn't require you to box the mapping function by making use of the way the compiler deals with generics. If you are interested in the nitty-gritty details, have a look at the source code.
But we do have a working map implementation:
let mut o = map(Odds::new(6), Box::new(|n| n * 2));
println!("{:?}", o.next()); // Some(2)
println!("{:?}", o.next()); // Some(6)
println!("{:?}", o.next()); // Some(10)
println!("{:?}", o.next()); // None
From here, you can do tons of things with iterators, for example:
- Chain: Link two iterators together
- Filter: Only return the elements that a given predicate is true for
- Cycle: Repeat a sequence infinitely
Beyond just those, Rust brings many more built-in adapters for iterators.
All of them follow the same basic pattern as our Map, though. When you call the relevant function on an Iterator, a new iterator is created that usually takes ownership of the iterator(s) it adapts and somehow transforms their outputs in its own next implementation.
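These adapters aren't unique to Rust. For comparison, Python's standard library offers close analogues via itertools and the built-in filter, following the same wrap-and-transform pattern:

```python
import itertools

odds = [1, 3, 5]
evens = [2, 4, 6]

# Chain: link two iterators together
chained = list(itertools.chain(odds, evens))
print(chained)  # [1, 3, 5, 2, 4, 6]

# Filter: only keep elements for which the predicate is true
small = list(filter(lambda n: n < 4, chained))
print(small)  # [1, 3, 2]

# Cycle: repeat a sequence infinitely (islice cuts it off)
repeated = list(itertools.islice(itertools.cycle(odds), 7))
print(repeated)  # [1, 3, 5, 1, 3, 5, 1]
```

Each of these wraps an existing iterator and produces its output lazily, just like the Rust adapters.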
Wrapping up
As I wrote at the beginning of this article, iterators are a very common concept. I hope that you now know how they can be used to provide a generalized interface for interacting with sequences and how they can be more efficient by working lazily. Writing your own iterator is often easier than you might think, no matter which language we are talking about, and can help you write cleaner and even more performant code. Happy coding!
1. Technically, an iterator can yield more items after not yielding once, but the more common semantics are that once an iterator doesn't yield an item it is exhausted. The Python documentation even states that "Implementations that do not obey this property are deemed broken". ↩
2. Of course, Liskov described this in CLU syntax, but I find Python syntax easier to read. ↩
3. In the standard library, map is provided as a default method of the Iterator trait. Since we don't want to modify or rebuild the trait from scratch here, we're just writing it as a standalone function. ↩