I recently found myself back using Python and in real need of brushing up on the concepts. This post is a quick recap of what Python iterators and generators are.
Python iterators
Iterators are objects that can be iterated upon, and they are everywhere in Python. Even the most basic Python program with a small for loop uses iterators under the hood.
Every iterator object in Python implements two special methods, __iter__() and __next__(), which together are referred to as the iterator protocol. The built-in function iter() returns an iterator from an iterable such as a list.
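As a minimal sketch of the protocol (the variable names are illustrative only): a list is an iterable but not itself an iterator, while the object returned by iter() has both special methods.

```python
numbers = [10, 20, 30]  # a list is iterable, but it is not an iterator

# Lists implement __iter__() but not __next__()
list_has_next = hasattr(numbers, "__next__")

# Ask the list for an iterator
it = iter(numbers)
iter_has_both = hasattr(it, "__iter__") and hasattr(it, "__next__")

# Calling iter() on an iterator returns the iterator itself
same_object = iter(it) is it

print(list_has_next, iter_has_both, same_object)  # False True True
```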
Iterating the Iterator
The function next() is used to manually step through the items of an iterator, one at a time. Let's consider an example:
# Define our list
my_list = [1, 2, 3, 5]
# Create our iterator using iter()
my_iterlist = iter(my_list)
# Now let's iterate through it in two ways:
# next() and obj.__next__()
# Prints 1
print(next(my_iterlist))
# Prints 2
print(next(my_iterlist))
# Prints 3
print(my_iterlist.__next__())
# Prints 5
print(my_iterlist.__next__())
# Iterating further raises StopIteration:
# no more items remain in the list
next(my_iterlist)
As the output of the above script shows, iterating past the last item raises StopIteration. Of course, a more elegant way of using iterators is the for loop.
for item in my_list:
    print(item)
So how does it look inside a for loop?
Earlier I mentioned that iterators are used even in simple loops. In fact, a for loop like the one above is actually implemented behind the scenes as:
# Create the iterator object from the iterable
iter_obj = iter(my_list)
# Open-ended loop
while True:
    try:
        # Get next item
        item = next(iter_obj)
    except StopIteration:
        # If we hit StopIteration, then exit the loop
        break
    # Do what the loop body does, i.e. print in our case
    print(item)
In other words, the for loop creates the iterator object, iter_obj, by calling iter() on the iterable. It then calls next() to get the next item and executes the body of the for loop with it. Once the full list has been iterated, StopIteration is raised and that ends the loop.
Pretty funny that all for loops have actually been open-ended while loops all along.
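To make the equivalence concrete, here is a small sketch. The helper name for_each is hypothetical, introduced only for illustration; it walks any iterable with iter() and next(), the way the while loop above does, and collects the results so they can be compared with a regular for loop.

```python
def for_each(iterable, action):
    """Hypothetical helper: mimic a for loop using iter() and next()."""
    results = []
    iter_obj = iter(iterable)
    while True:
        try:
            item = next(iter_obj)
        except StopIteration:
            # The iterator is exhausted, so leave the loop
            break
        # This is the "loop body"
        results.append(action(item))
    return results


manual = for_each([1, 2, 3, 5], lambda x: x * 2)

# The same work done with an ordinary for loop
with_for = []
for x in [1, 2, 3, 5]:
    with_for.append(x * 2)

print(manual == with_for)  # True
```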
What about infinite iterators?
Understanding how for loops are implemented internally raises the possibility of creating our own iterators that never raise StopIteration and therefore run forever.
An infinite iterator can be created with the built-in function iter(), which can also be called with two arguments: the first must be a callable and the second is the sentinel. The iterator calls the callable on every step and keeps going until the returned value equals the sentinel, at which point it raises StopIteration.
>>> int()
0
>>> infinity = iter(int, 1)
>>> next(infinity)
0
>>> next(infinity)
0
>>> infinity = iter(int, 0)
>>> next(infinity)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
As we can see, the int() function always returns 0. So when we call iter(int, 1), we get an iterator that will call int() until the returned value equals 1. As it will never reach that condition, we have an infinite iterator. With iter(int, 0), on the other hand, the very first call already returns the sentinel, so StopIteration is raised immediately.
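A short sketch of both cases. itertools.islice is used here so that only a few values are taken from the infinite iterator; the make_counter helper is hypothetical, made up to show a sentinel that is eventually reached.

```python
from itertools import islice

# int() always returns 0, so the sentinel 1 is never hit: infinite iterator
infinity = iter(int, 1)
first_three = list(islice(infinity, 3))
print(first_three)  # [0, 0, 0]


def make_counter():
    """Hypothetical helper: returns a callable that counts 1, 2, 3, ..."""
    count = 0

    def counter():
        nonlocal count
        count += 1
        return count

    return counter


# The iterator stops as soon as the callable returns the sentinel 4
until_four = list(iter(make_counter(), 4))
print(until_four)  # [1, 2, 3]
```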
Just as with regular iterators, it is possible to create custom infinite iterators. When writing a custom infinite iterator, take care with the terminating condition: since the iterator itself never stops, the code that consumes it has to.
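A minimal sketch of a custom infinite iterator. The class name InfiniteEvens is an assumption for illustration; note that the terminating condition is supplied by the caller, here through itertools.islice.

```python
from itertools import islice


class InfiniteEvens:
    """Hypothetical iterator that produces 0, 2, 4, ... and never stops."""

    def __init__(self):
        self.current = 0

    def __iter__(self):
        return self

    def __next__(self):
        value = self.current
        self.current += 2
        return value  # StopIteration is never raised


# The caller decides when to stop: take only the first five items
first_evens = list(islice(InfiniteEvens(), 5))
print(first_evens)  # [0, 2, 4, 6, 8]
```

A plain for loop over InfiniteEvens() without a break would never finish.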
Iterators have the obvious advantage of saving resources. They allow you to perform operations without storing intermediate results in memory.
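A rough sketch of that saving, comparing a fully built list with the lazy iterator returned by map(). sys.getsizeof only reports the size of the container object itself, and exact numbers vary between interpreters, so only the comparison is shown.

```python
import sys

# All 100,000 results are held in memory at once
squares_list = [n * n for n in range(100_000)]
# Results are computed one at a time, on demand
squares_iter = map(lambda n: n * n, range(100_000))

list_size = sys.getsizeof(squares_list)
iter_size = sys.getsizeof(squares_iter)
print(list_size > iter_size)  # True

# Both give the same total; the iterator never builds the intermediate list
total_from_list = sum(squares_list)
total_from_iter = sum(squares_iter)
print(total_from_list == total_from_iter)  # True
```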
Generators
Building iterators in Python has its obvious advantages, but implementing one requires quite some work: creating a custom class with __iter__() and __next__() methods, raising StopIteration when no more values can be returned, and so on.
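For reference, this is roughly what that work looks like. The class CountUpTo is a hypothetical example, not taken from any library.

```python
class CountUpTo:
    """Hypothetical iterator that counts from 1 up to a limit."""

    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        # An iterator returns itself from __iter__()
        return self

    def __next__(self):
        if self.current >= self.limit:
            # Signal that there is nothing left to return
            raise StopIteration
        self.current += 1
        return self.current


counted = list(CountUpTo(3))
print(counted)  # [1, 2, 3]
```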
Alternatively, Python generators are a perfect solution for cases where a custom iterator is necessary and one wants to avoid the messiness of implementing it by hand. In other words, generators are a simpler way of creating iterators.
Creating a generator in Python is similar to defining a normal function, except that instead of a normal return statement you have a yield statement. If a function contains at least one yield statement, it becomes a generator function. The difference is that a return statement terminates the function, while yield pauses the function, saving its state, and later continues from there on the next call to next().
Generator example
# Our simple generator function
def gen():
    n = 1
    print('First yield')
    yield n
    n += 1
    print('Second yield')
    yield n
    n += 1
    print('Third yield')
    yield n
# Calling the generator function only creates the generator object;
# it does not start the execution
a = gen()
# We can now iterate with next()
next(a)
# After the function yields, execution is paused and
# control is transferred back to the caller. The local
# variables and their state are kept between successive calls.
next(a)
next(a)
# Finally, when the function terminates, StopIteration is
# raised automatically on any further call
next(a)
Running the above will give the following output:
First yield
Second yield
Third yield
Traceback (most recent call last):
  File "gen_example.py", line 18, in <module>
    next(a)
StopIteration
Interestingly, if you step through the above in an interactive shell, you can see that the value of the variable n is preserved between the calls. Unlike in a normal function, the local variables are not destroyed when the function yields. Additionally, a generator object can only be iterated once. To restart the process, the generator function needs to be called again to create a new generator. Of course, a more elegant way is to use a for loop, which takes the generator and iterates over it by calling next(). The loop also ends automatically when StopIteration is raised.
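A small sketch of the single-pass behaviour and of consuming a generator with a for loop. The generator function count_up_to is a hypothetical example.

```python
def count_up_to(limit):
    """Hypothetical generator: yields 1, 2, ..., limit."""
    n = 1
    while n <= limit:
        yield n
        n += 1  # n keeps its value between calls to next()


numbers = count_up_to(3)

# The for loop calls next() for us and stops on StopIteration
first_pass = []
for n in numbers:
    first_pass.append(n)

# The same generator object is now exhausted
second_pass = list(numbers)
print(first_pass, second_pass)  # [1, 2, 3] []

# Calling the generator function again gives a fresh generator
fresh_pass = list(count_up_to(3))
print(fresh_pass)  # [1, 2, 3]
```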
Why use generators rather than iterators?
Generators offer several advantages over hand-written iterators: they are easier and cleaner to implement, they are memory efficient because values are produced one at a time, and they can be used to represent an infinite stream.
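As a closing sketch, here is an infinite stream written as a generator. The function name naturals is hypothetical, and itertools.islice is used to take only a handful of values from it.

```python
from itertools import islice


def naturals():
    """Hypothetical generator: an endless stream 1, 2, 3, ..."""
    n = 1
    while True:
        yield n
        n += 1


# Values are produced on demand, so nothing beyond these five is computed
first_naturals = list(islice(naturals(), 5))
print(first_naturals)  # [1, 2, 3, 4, 5]
```

The whole stream takes a few lines, with no class, no __iter__() or __next__() methods and no explicit StopIteration handling.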