微慑信息网

Python简单异步爬取妹子套图

有MJJ说python的爬取效率低,我不太赞同
下面简单测试下python的异步爬取速度,其实也不低
环境:win7+python3.6
使用的库:aiohttp asyncio csv bs4

import aiohttp
import asyncio
from bs4 import BeautifulSoup
import csv
import time

async def write_csv():
    urllist=['https://www.aitaotu.com/guonei/list_{}.html'.format(x) for x in range(605)]\
    +['https://www.aitaotu.com/rihan/list_{}.html'.format(x) for x in range(202)]\
    +['https://www.aitaotu.com/gangtai/list_{}.html'.format(x) for x in range(33)]\
    +['https://www.aitaotu.com/meinv/list_{}.html'.format(x) for x in range(89)]\

    headers={
    'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Encoding':'gzip, deflate',
    'Accept-Language':'zh-CN,zh;q=0.8,en-US;q=0.6,en;q=0.4',
    'Host':'www.aitaotu.com',
    'Referer':'https://www.aitaotu.com/guonei/',
    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36',
    }
    with open('aitaotu.csv', 'a+', newline='', encoding='utf-8') as f:
        f_csv = csv.writer(f)
        for url in urllist:
            async with aiohttp.ClientSession() as session:
                async with session.get(url, verify_ssl=False, headers=headers) as resp:
                    print(resp.status)
                    if resp.status==200:
                        respdata=await resp.text()
                        page = BeautifulSoup(respdata,'lxml')
                        hrefs = page.select('#infinite_scroll > div > div.item_t > div > a')
                        img_urls = page.select('#infinite_scroll > div > div.item_t > div > a > img')
                        titles = page.select('#infinite_scroll > div > div.item_b.clearfix > div.title > span > a')
                        for href1, img_url1, title1 in zip(hrefs, img_urls, titles):
                            href=href1.get('href')
                            img_url=img_url1.get('data-original')
                            title=title1.get_text()
                            data={'链接':'https://www.aitaotu.com/'+href,
                                  '图片':img_url,
                                  '标题':title
                                 }
                            print(data)

                            f_csv.writerow(data.values())

start=time.time()            
loop = asyncio.get_event_loop()
loop.run_until_complete(write_csv())
print('总共耗时 %s s' % (time.time()-start) )

 

 

 

赞(0) 打赏
转载请附本站链接,未经允许不得转载,,谢谢:微慑信息网-VulSee.com » Python简单异步爬取妹子套图

评论 抢沙发

微慑信息网 专注工匠精神

微慑信息网-VulSee.com-关注前沿安全态势,聚合网络安全漏洞信息,分享安全文档案例

访问我们联系我们

觉得文章有用就打赏一下文章作者

非常感谢你的打赏,我们将继续提供更多优质内容,让我们一起创建更加美好的网络世界!

支付宝扫一扫

微信扫一扫

登录

找回密码

注册