I recently found myself back to using Python, with a real need to brush up on the concepts. This post is a quick recap of Python iterators and generators.
Iterators are objects that can be iterated upon, and they are everywhere in Python. Even the most basic Python program with a small for loop uses iterators under the hood.
Every iterator object in Python implements two special methods, __iter__() and __next__(), which together are referred to as the iterator protocol. The built-in function iter() returns an iterator from an iterable object such as a list.
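To make the protocol a bit more concrete, here is a short sketch using a plain list (the names numbers and it are just for illustration). A list is an iterable, since it has __iter__(), but it is not itself an iterator. The object returned by iter() has both methods and returns itself from __iter__().

```python
# A list is iterable, but it is not an iterator itself
numbers = [10, 20, 30]
print(hasattr(numbers, "__iter__"))  # True
print(hasattr(numbers, "__next__"))  # False

# iter() asks the list for an iterator object
it = iter(numbers)
print(hasattr(it, "__iter__"), hasattr(it, "__next__"))  # True True

# An iterator's __iter__() returns the iterator itself
print(iter(it) is it)  # True

# next() simply calls the iterator's __next__() method
print(next(it))  # 10
```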
Iterating the Iterator
The built-in next() function is used to manually iterate through the items of an iterator. Let's consider an example:
```python
# Define our list
my_list = [1, 2, 3, 5]

# Set up our iterator using iter()
my_iterlist = iter(my_list)

# Now let's iterate through it in two ways:
# next() and obj.__next__()

# Prints 1
print(next(my_iterlist))
# Prints 2
print(next(my_iterlist))
# Prints 3
print(my_iterlist.__next__())
# Prints 5
print(my_iterlist.__next__())

# Iterating further raises StopIteration,
# since no items remain in the list
next(my_iterlist)
```
As the output of the script above shows, iterating past the last item raises the StopIteration exception. Of course, a more elegant way to use an iterator is a for loop:

```python
for item in my_list:
    print(item)
```
So how does it look inside a for loop?
Earlier I mentioned that iterators are used even in simple loops. A for loop like the one above is actually implemented behind the scenes roughly as:
```python
# Create the iterator object from the iterable
iter_obj = iter(my_list)

# Open-ended loop
while True:
    try:
        # Get the next item
        item = next(iter_obj)
        # Run the loop body, i.e. print in our case
        print(item)
    except StopIteration:
        # If we hit StopIteration, exit the loop
        break
```
In other words, the for loop creates the iterator object iter_obj by calling iter() on the iterable. It then calls next() to get each item and executes the body of the for loop with it. Once the full list has been iterated, StopIteration is raised, which ends the loop.
Pretty funny that, conceptually, for loops were open-ended while loops all along.
What about infinite iterators?
Understanding how for loops are implemented internally raises the possibility of creating our own iterators that never raise StopIteration and instead run indefinitely.
One way to create an infinite iterator is the built-in function iter() called with two arguments: the first must be a callable, and the second is a sentinel value. The resulting iterator keeps calling the callable on every next() and stops, raising StopIteration, as soon as the returned value equals the sentinel.
```python
>>> int()
0
>>> infinity = iter(int, 1)
>>> next(infinity)
0
>>> next(infinity)
0
>>> infinity = iter(int, 0)
>>> next(infinity)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
```
As we can see, int() called with no arguments always returns 0. So when we pass it as iter(int, 1), we get an iterator that keeps calling int() until the returned value equals 1. Since that never happens, we have an infinite iterator. Conversely, iter(int, 0) stops immediately, because the very first call already returns the sentinel.
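A more practical use of the two-argument form is reading data in fixed-size chunks until a read comes back empty. The sketch below assumes an in-memory io.BytesIO stream standing in for a real file; functools.partial turns stream.read(4) into a zero-argument callable, and b"" is the sentinel.

```python
import io
from functools import partial

# Hypothetical data source: an in-memory stream standing in for a file
stream = io.BytesIO(b"abcdefghij")

# partial(stream.read, 4) is a zero-argument callable returning up to 4 bytes;
# iteration stops as soon as it returns the sentinel b"" (end of data)
chunks = list(iter(partial(stream.read, 4), b""))
print(chunks)  # [b'abcd', b'efgh', b'ij']
```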
Just as with regular iterators, it is possible to create custom infinite iterators. When writing one, care must be taken with the terminating condition, since the iterator itself will never signal the end.
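Here is a minimal sketch of a custom infinite iterator; the class name Counter is made up for illustration (the standard library already offers itertools.count for this). Because __next__() never raises StopIteration, the terminating condition has to come from the consumer, here itertools.islice() or an explicit break.

```python
from itertools import islice

class Counter:
    """Infinite iterator yielding start, start + step, start + 2*step, ..."""

    def __init__(self, start=0, step=1):
        self.current = start
        self.step = step

    def __iter__(self):
        # An iterator returns itself from __iter__()
        return self

    def __next__(self):
        # Never raises StopIteration, so the iterator never ends on its own
        value = self.current
        self.current += self.step
        return value

# The consumer supplies the terminating condition: islice() stops after 5 items
print(list(islice(Counter(10, 5), 5)))  # [10, 15, 20, 25, 30]

# An explicit break works too
for n in Counter():
    if n > 3:
        break
    print(n)  # prints 0, 1, 2, 3
```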
Iterators have the obvious advantage of saving resources: they let you process items one at a time without storing intermediate results in memory.
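To illustrate, here is a small sketch (the function name squares_of_evens is just for this example) that chains range(), filter() and map(). Each of these produces values lazily, so summing over a million numbers never materializes a list of them.

```python
def squares_of_evens(numbers):
    # filter() and map() return lazy iterators: nothing is computed yet
    evens = filter(lambda n: n % 2 == 0, numbers)
    return map(lambda n: n * n, evens)

# sum() pulls one value at a time through the pipeline,
# so no intermediate list of a million numbers is ever built
total = sum(squares_of_evens(range(1_000_000)))

print(sum(squares_of_evens(range(10))))  # 120
```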
Building iterators in Python has clear advantages, but implementing one takes quite some work: creating a custom class with __iter__() and __next__() methods, raising StopIteration when no value can be returned, and so on.
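For comparison, this is roughly what that work looks like in a minimal hand-written iterator. Squares is a hypothetical example class that produces the first limit square numbers.

```python
class Squares:
    """Iterator over the squares 1, 4, 9, ... up to limit**2."""

    def __init__(self, limit):
        self.limit = limit
        self.n = 0

    def __iter__(self):
        # Iterators return themselves from __iter__()
        return self

    def __next__(self):
        if self.n >= self.limit:
            # Signal that no more values can be returned
            raise StopIteration
        self.n += 1
        return self.n ** 2

print(list(Squares(4)))  # [1, 4, 9, 16]
```

Once exhausted, a Squares instance keeps raising StopIteration, so iterating again requires a new instance.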
Python generators are the perfect solution for cases where a custom iterator is necessary but one wants to avoid the messiness of implementing it by hand. In other words, generators are a simpler way of creating iterators.
Creating a generator in Python is similar to defining a normal function, except that instead of a return statement you use a yield statement. If a function contains at least one yield statement, it becomes a generator function. The difference is that return terminates the function, while yield pauses it, saving its state, and later resumes from there on the next call to next().
```python
# Our simple generator function
def gen():
    n = 1
    print('First yield')
    yield n

    n += 1
    print('Second yield')
    yield n

    n += 1
    print('Third yield')
    yield n

# Calling the generator function creates a generator object
# but does not start executing its body
a = gen()

# We can now iterate with next()
next(a)
# After a function yields, execution is paused and
# control is transferred back to the caller. The local
# variables and their states are kept between successive calls.
next(a)
next(a)

# Finally, when the function terminates, StopIteration is
# raised automatically on further calls
next(a)
```

Running the above script as gen_example.py gives the following output:

```
First yield
Second yield
Third yield
Traceback (most recent call last):
  File "gen_example.py", line 29, in <module>
    next(a)
StopIteration
```
Interestingly, if you run the above in an interactive shell, you can see that the value of the variable n is preserved between calls. Unlike in a normal function, local variables are not destroyed when the function yields. Additionally, a generator object can only be iterated once; to restart the process, you need to create a new generator by calling the generator function again. Of course, the elegant way to consume a generator is a for loop, which takes an iterator and iterates over it using next(). It also ends automatically when StopIteration is raised.
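The sketch below, using a made-up countdown() generator, shows both points: a generator object is exhausted after one pass, and calling the generator function again gives a fresh one that a for loop can consume without any explicit StopIteration handling.

```python
def countdown(n):
    # A generator function: each yield hands one value to the caller
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
print(list(gen))  # [3, 2, 1]
print(list(gen))  # [] because the generator is already exhausted

# Calling the generator function again creates a fresh generator
for value in countdown(3):
    print(value)  # prints 3, 2, 1; StopIteration ends the loop silently
```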
Why use generators rather than iterators?
Generators provide plenty of advantages over hand-written iterators: they are easier and cleaner to implement, memory efficient, and can be used to represent infinite streams.
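A short sketch of all three points, with hypothetical helper names squares() and fibonacci(): the generator is much shorter than an equivalent class-based iterator, an infinite stream is just a while True loop around yield (bounded here with itertools.islice()), and sys.getsizeof() hints at the memory difference. Keep in mind that sys.getsizeof() reports only the size of the container object itself, and exact numbers vary across Python versions.

```python
import sys
from itertools import islice

def squares(limit):
    # Generator equivalent of a class-based iterator: no __iter__/__next__
    # boilerplate and no manual StopIteration
    for n in range(1, limit + 1):
        yield n ** 2

def fibonacci():
    # Infinite stream: the generator never returns on its own
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

print(list(squares(4)))              # [1, 4, 9, 16]
print(list(islice(fibonacci(), 8)))  # [0, 1, 1, 2, 3, 5, 8, 13]

# Memory: a generator expression stores only its current state,
# while a list comprehension stores every element
as_list = [n * n for n in range(100_000)]
as_gen = (n * n for n in range(100_000))
print(sys.getsizeof(as_gen) < sys.getsizeof(as_list))  # True
```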