Iterators

What is an iterator?

An object that can be passed to next(iterator), which returns its next value on each call

How do iterators explain the way a for loop works?

In [26]: lst=[1,2,3,4]
In [27]: for i in lst:
   ....:     print(i)
   ....:     
1
2
3
4

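The for loop obtains an iterator for lst through lst's .__iter__ interface
It calls next(iterator) repeatedly to fetch the values in lst
It treats a caught StopIteration as the end of the loop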
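
The three steps above can be sketched as a plain while loop. This is only an illustration of the protocol, not how the interpreter is actually implemented; the helper name manual_for is made up for this example.

```python
def manual_for(iterable, body):
    """Roughly what `for item in iterable: body(item)` does (illustrative only)."""
    iterator = iter(iterable)        # step 1: ask the iterable for an iterator
    while True:
        try:
            item = next(iterator)    # step 2: fetch the next value
        except StopIteration:        # step 3: exhaustion ends the loop quietly
            break
        body(item)


collected = []
manual_for([1, 2, 3, 4], collected.append)
print(collected)
```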

In [28]: lst.__iter__()
Out[28]: <builtins.list_iterator at 0x7f8c678fb978>
In [29]: l_iterator=lst.__iter__()
In [30]: l_iterator.__next__()
Out[30]: 1
In [31]: l_iterator.__next__()
Out[31]: 2
In [32]: l_iterator.__next__()
Out[32]: 3
In [33]: l_iterator.__next__()
Out[33]: 4
In [34]: l_iterator.__next__()
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-34-e311b6799e5e> in <module>()
----> 1 l_iterator.__next__()
StopIteration:

iter() and .__iter__() have the same effect, as do next() and .__next__()
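
A small check of that equivalence, using the same list as above. One difference worth knowing: the built-in next() also accepts an optional default that is returned instead of raising StopIteration, which the bare .__next__() method does not offer.

```python
lst = [1, 2, 3, 4]

it_a = iter(lst)         # built-in function
it_b = lst.__iter__()    # the method it delegates to

first_a = next(it_a)         # built-in function
first_b = it_b.__next__()    # the method it delegates to
print(first_a, first_b)

# next() with a default: no exception when the iterator is exhausted
fallback = next(iter([]), None)
print(fallback)
```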

In scenarios that call for lazy access, how do you implement an iterable object and an iterator object?

The clumsy way

The better way

import requests
from collections.abc import Iterable, Iterator

class WeatherIterator(Iterator):
    def __init__(self, cities):
        self.cities = cities
        self.index = 0

    def getWeather(self, city):
        r = requests.get('http://wthrcdn.etouch.cn/weather_mini?city={}'.format(city))
        data = r.json()['data']['forecast'][0]
        return '{} {low} {high}'.format(city, **data)

    def __next__(self):
        # Reached the end: raise StopIteration
        if self.index == len(self.cities):
            raise StopIteration
        city = self.cities[self.index]
        self.index += 1
        # Call getWeather to fetch and return the weather info
        return self.getWeather(city)

class WeatherIterable(Iterable):
    def __init__(self, cities):
        self.cities = cities

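    # An iterable object: it implements __iter__()
    def __iter__(self):
        # Per the iterator protocol, return an iterator, which must provide __next__()
        # __next__() either returns the next value or raises StopIteration
        # The for loop catches that exception and ends the iteration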
        return WeatherIterator(self.cities)


for weather_info in WeatherIterable(['北京', '新乡']):
    print(weather_info)
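
To see the laziness without touching the network, here is a minimal sketch of the same idea written as a generator function. The names lazy_weather and fake_fetch are hypothetical and stand in for the real HTTP call; the point is only that nothing is fetched until a value is requested.

```python
def lazy_weather(cities, fetch):
    """Yield one report per city, calling fetch only when a value is requested."""
    for city in cities:
        yield fetch(city)


calls = []  # records which cities were actually fetched


def fake_fetch(city):
    # Hypothetical stand-in for the real request; no network access
    calls.append(city)
    return '{} sunny'.format(city)


reports = lazy_weather(['北京', '新乡'], fake_fetch)
calls_before = list(calls)        # nothing has been fetched yet
first = next(reports)             # triggers exactly one fetch
calls_after_first = list(calls)
print(calls_before, first, calls_after_first)
```

Note that a generator is a one-shot iterator, whereas WeatherIterable hands out a fresh WeatherIterator on every for loop.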

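Neat trick: reverse iteration

First, a look at the ways to reverse a list

.reverse(): the list's reverse() method reverses the list in place. Note: the original list is modified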

In [1]: l = list(range(10))
In [2]: l
Out[2]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [3]: l.reverse()
In [4]: l
Out[4]: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Slicing: a slice can also reverse a list. Note: it creates a new list as large as the original

In [1]: l = list(range(10))
In [2]: l
Out[2]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [3]: id(l)
Out[3]: 140418840870496
In [4]: l[::-1]
Out[4]: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
In [5]: id(l[::-1])
Out[5]: 140418840933944

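Both approaches above have drawbacks. The recommended way is reversed(list), which returns a reverse iterator; reversed() is the counterpart of iter(). This neither modifies the original list nor spends memory on a full copy

reversed()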

In [7]: for i in reversed(l):
   ...:     print(i)
   ...:     
9
8
7
6
5
4
3
2
1
0

Earlier we saw that iter() calls .__iter__(); reversed() in turn calls the .__reversed__() interface
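
One detail worth adding: when an object has no __reversed__(), reversed() falls back to the sequence protocol, that is, __len__() plus __getitem__() with integer indexes. A minimal sketch, with a made-up Squares class used purely for illustration:

```python
class Squares:
    """A tiny sequence with no __reversed__ method (illustrative only)."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        if not 0 <= index < self.n:
            raise IndexError(index)
        return index * index


# reversed() walks the indexes from len - 1 down to 0
result = list(reversed(Squares(4)))
print(result)
```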

In [8]: l.__reversed__()
Out[8]: <list_reverseiterator at 0x7fb5cf1b3dd0>

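To consolidate forward and reverse iterators, a small exercise: implement a simple xrange()-like range for floats that supports both forward and reverse iteration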

class FloatRang():
    def __init__(self, start, end, step=0.1):
        self.start = start
        self.end = end
        self.step = step

    # Forward iteration interface
    def __iter__(self):
        t = self.start
        while t <= self.end:
            yield t
            t += self.step

    # Reverse iteration interface
    def __reversed__(self):
        t = self.end
        while t >= self.start:
            yield t
            t -= self.step

for i in FloatRang(1, 5, 0.5):
    print(i)

for i in reversed(FloatRang(1, 5, 0.5)):
    print(i)
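
A caveat on the class above: repeatedly adding a float step accumulates rounding error, so with a step such as 0.1 the last value can be dropped. The sketch below is one possible variant, not the original author's code, that computes each value from an integer index instead. The class name IndexedFloatRange and the 1e-9 tolerance are assumptions made for this example.

```python
import math


class IndexedFloatRange:
    """Float range whose values are computed as start + i * step (illustrative)."""

    def __init__(self, start, end, step=0.1):
        self.start = start
        self.step = step
        # Number of points in [start, end]; the small tolerance absorbs rounding
        self.count = max(0, math.floor((end - start) / step + 1e-9) + 1)

    def __iter__(self):
        for i in range(self.count):
            yield self.start + i * self.step

    def __reversed__(self):
        for i in reversed(range(self.count)):
            yield self.start + i * self.step


forward = list(IndexedFloatRange(1, 5, 0.5))
backward = list(reversed(IndexedFloatRange(1, 5, 0.5)))
tenths = list(IndexedFloatRange(0, 0.3, 0.1))
print(forward)
print(backward)
print(len(tenths))
```

Unlike the original, the reverse direction here yields the forward values in reverse order, so the two differ when end - start is not a multiple of step.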

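Neat trick: slicing an iterator with itertools.islice()

Slicing is used all the time, on list and str alike. Before looking at how to do it for iterators, consider where it is useful.

Working with large objects
Extracting part of a log file (the first N lines, the last N lines, or lines M through N); the readlines() method reads the entire content into memory at once

So the best way to read a large text file is to use the iteration protocol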

In [8]: f=open('/var/log/dmesg')
In [9]: for line in f:
   ...:     print(line)
   ...:

Get the first few lines: use islice to slice the iterator and get the first 5 lines of the log

In [20]: from itertools import islice
In [21]: f.seek(0)
In [22]: for line in islice(f, 0, 5):
   ....:     print(line)
   ....:     
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 3.13.0-86-generic (buildd@lgw01-19) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #130-Ubuntu SMP Mon Apr 18 18:27:15 UTC 2016 (Ubuntu 3.13.0-86.130-generic 3.13.11-ckt39)
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.13.0-86-generic root=UUID=af414ad8-9936-46cd-b074-528854656fcd ro quiet splash

It can also be written with just 5, which means only the first 5 lines

In [23]: f.seek(0)
In [24]: for line in islice(f, 5):
   ....:     print(line)
   ....:     
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 3.13.0-86-generic (buildd@lgw01-19) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #130-Ubuntu SMP Mon Apr 18 18:27:15 UTC 2016 (Ubuntu 3.13.0-86.130-generic 3.13.11-ckt39)
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.13.0-86-generic root=UUID=af414ad8-9936-46cd-b074-528854656fcd ro quiet splash

From a given line to the end

In [25]: f.seek(0)
In [26]: for line in islice(f, 5, None):
   ....:     print(line)

From one given line to another

In [27]: f.seek(0)
In [28]: for line in islice(f, 5, 10):
   ....:     print(line)
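
The f.seek(0) calls above matter because islice consumes the underlying iterator rather than copying it. A minimal sketch with an in-memory file (io.StringIO is used here as a stand-in for the real log) shows a second islice continuing where the first one stopped:

```python
import io
from itertools import islice

# In-memory stand-in for a log file
log = io.StringIO("line 0\nline 1\nline 2\nline 3\nline 4\n")

first_two = [line.rstrip("\n") for line in islice(log, 2)]
# No rewind: the next slice continues from where the previous one stopped
next_two = [line.rstrip("\n") for line in islice(log, 2)]

log.seek(0)  # rewind so that indexes count from the start again
after_rewind = [line.rstrip("\n") for line in islice(log, 1, 3)]

print(first_two, next_two, after_rewind)
```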

Get the last few items: this is where the reversed() function from earlier comes in. A list of numbers is used here

In [36]: l = list(range(10))
In [37]: l
Out[37]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [38]: for i in islice(reversed(l), 3):
   ....:     print(i)
   ....:     
9
8
7
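
File objects do not support reversed(), so the trick above does not carry over to a log file directly. One common alternative, sketched here under the assumption that keeping only the last n lines in memory is acceptable, is a collections.deque with maxlen. The helper name tail is made up for this example, and io.StringIO again stands in for a real file.

```python
import io
from collections import deque


def tail(lines, n):
    """Return the last n items of any iterable, holding at most n in memory."""
    return list(deque(lines, maxlen=n))


log = io.StringIO("a\nb\nc\nd\ne\n")
last_three = [line.strip() for line in tail(log, 3)]
print(last_three)
```

Unlike the reversed() version, this keeps the lines in their original order.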

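Iterating over multiple iterables

In parallel

Compute the total score across Chinese, math, and English

Generate the scores for each subject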

In [47]: from random import randint
In [48]: chinese = [randint(60, 100) for _ in range(10)]
In [49]: chinese
Out[49]: [88, 68, 100, 97, 65, 66, 84, 70, 65, 69]
In [50]: math = [randint(60, 100) for _ in range(10)]
In [51]: english = [randint(60, 100) for _ in range(10)]

Method 1: by index

In [52]: for i in range(len(math)):
   ....:     print(chinese[i] + math[i] + english[i])
   ....:     
272
267
254
248
219
224
246
215
235
229

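This approach has one problem: if an iterable does not support indexing, it no longer works

So method 2 is the better choice

Method 2: use zip(). The result is the same as with method 1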

In [53]: for c,m,e in zip(chinese, math, english):
   ....:     print(c+m+e)
   ....:     
272
267
254
248
219
224
246
215
235
229
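
A short sketch of the case where indexing fails but zip() still works: the scores arrive from generators, which cannot be indexed. The score values and the helper name scores are made up for this example. It also shows that zip() stops at the shortest input, while itertools.zip_longest pads the missing values.

```python
from itertools import zip_longest


def scores(values):
    """Generator: supports iteration but not indexing."""
    for value in values:
        yield value


chinese = scores([88, 68, 100])
math_scores = scores([90, 75, 60])
english = scores([70, 80])  # one score short

# zip() stops as soon as the shortest input is exhausted
totals = [c + m + e for c, m, e in zip(chinese, math_scores, english)]
print(totals)

# zip_longest() keeps going and fills the gaps with fillvalue
padded = [sum(row) for row in zip_longest([88, 68, 100], [90, 75, 60], [70, 80], fillvalue=0)]
print(padded)
```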

In series

Method 1: concatenation

In [54]: for i in [1,2,3] + ['a', 'b', 'c']:
   ....:     print(i)
   ....:     
1
2
3
a
b
c

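The drawback of method 1 is that it builds a new list as large as the two lists combined, which costs memory. Method 2 is therefore preferable, because chain returns an iterator object

Method 2: use itertools.chain()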

In [58]: from itertools import chain
In [59]: for i in chain([1, 2, 3], ['a', 'b', 'c']):
   ....:     print(i)
   ....:     
1
2
3
a
b
c
In [60]: chain([1, 2, 3], ['a', 'b', 'c'])
Out[60]: <itertools.chain at 0x7f61ecedea10>
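
chain() is not limited to lists: it accepts any iterables, including generators and range objects, which cannot all be joined with +. A small sketch (the generator letters is made up for this example), together with chain.from_iterable for the case where the inputs themselves come from an iterable:

```python
from itertools import chain


def letters():
    yield 'a'
    yield 'b'


# Mixed inputs: a list, a generator, and a range
combined = list(chain([1, 2], letters(), range(3, 5)))
print(combined)

# chain.from_iterable flattens one level of nesting lazily
flattened = list(chain.from_iterable([[1, 2], [3], []]))
print(flattened)
```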
