Gain: a Python web crawling framework based on asyncio, uvloop and aiohttp

Category: Python · Published: 8 years ago


Gain

Web crawling framework for everyone. Written with asyncio, uvloop and aiohttp. Everyone can easily write their own web crawler with the Gain framework, which provides a simple API.

Road map

[x] Basic spider
[ ] Custom header

Requirements

Python 3.5+

Based on

asyncio
uvloop
aiohttp
pybloomfiltermmap
pyquery

Installation

pip install gain

Usage

Write spider.py:

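from gain import Css, Item, Parser, Spider


class Post(Item):
    # CSS selectors for the fields to extract from each post page.
    title = Css('.entry-title')
    content = Css('.entry-content')

    async def save(self):
        # Append the extracted title to a local text file.
        with open('scrapinghub.txt', 'a+') as f:
            f.write(self.results['title'] + '\n')


class MySpider(Spider):
    start_url = 'https://blog.scrapinghub.com/'
    # Raw strings keep regex escapes such as \d intact.
    # The first pattern matches pagination URLs; the second maps post URLs to the Post item.
    parsers = [Parser(r'https://blog.scrapinghub.com/page/\d+/'),
               Parser(r'https://blog.scrapinghub.com/\d{4}/\d{2}/\d{2}/[a-z0-9\-]+/', Post)]


MySpider.run()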

Run it with: python spider.py
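
To see which URLs the two Parser patterns in the spider would select, they can be tried out with Python's standard re module alone. This is only a rough sketch: the classify helper and the sample URLs are hypothetical, and the use of re.fullmatch is an assumption, since how Gain applies the patterns internally is not stated here.

```python
import re

# The two patterns from the spider above, written as raw strings.
PAGE_PATTERN = r'https://blog.scrapinghub.com/page/\d+/'
POST_PATTERN = r'https://blog.scrapinghub.com/\d{4}/\d{2}/\d{2}/[a-z0-9\-]+/'


def classify(url):
    """Hypothetical helper (not part of Gain): label a URL by the pattern it matches."""
    # Assumption: the whole URL must match; Gain itself may match differently.
    if re.fullmatch(POST_PATTERN, url):
        return 'post'
    if re.fullmatch(PAGE_PATTERN, url):
        return 'page'
    return None


# Hypothetical sample URLs, chosen only to exercise each branch.
samples = [
    'https://blog.scrapinghub.com/page/2/',
    'https://blog.scrapinghub.com/2017/06/19/some-post-title/',
    'https://blog.scrapinghub.com/about/',
]

for url in samples:
    print(url, '->', classify(url))
```

Checking patterns this way before starting a crawl makes it easier to spot a regex that is too broad or too narrow.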


Example

The examples are in the /example/ directory.

Contribution

Just open a pull request or an issue.
