Here I wrote a custom dupefilter that subclasses the one in scrapy-redis. I need it because one URL, https://segmentfault.com/stop-robot, must never be filtered out as a duplicate.

from scrapy_redis.dupefilter import RFPDupeFilter


class CustomFilter(RFPDupeFilter):

    def request_seen(self, request):
        """Returns True if request was already seen.

        Parameters
        ----------
        request : scrapy.http.Request

        Returns
        -------
        bool
        """
        # Never report the stop-robot page as seen, so it is always crawled.
        if 'https://segmentfault.com/stop-robot' in request.url:
            return False
        fp = self.request_fingerprint(request)
        # This returns the number of values added, zero if already exists.
        added = self.server.sadd(self.key, fp)
        return added == 0
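To check the whitelisting logic without a Redis server or Scrapy installed, here is a minimal sketch that mimics the filter's request_seen. It is not the scrapy-redis implementation: the class InMemoryWhitelistFilter is hypothetical, a Python set stands in for the Redis set, and a SHA1 of the URL stands in for Scrapy's request fingerprint.

```python
import hashlib

# The URL that must never be deduplicated (taken from the filter above).
ALWAYS_ALLOW = "https://segmentfault.com/stop-robot"


class InMemoryWhitelistFilter:
    """Hypothetical stand-in for CustomFilter, for local testing only."""

    def __init__(self):
        # Plays the role of the Redis set stored under self.key.
        self.seen = set()

    def fingerprint(self, url):
        # Assumption: hash the URL alone. Scrapy's real fingerprint also
        # covers the method and body and canonicalizes the URL.
        return hashlib.sha1(url.encode("utf-8")).hexdigest()

    def request_seen(self, url):
        # Same early exit as CustomFilter: this URL is never reported as seen
        # and is never recorded in the set.
        if ALWAYS_ALLOW in url:
            return False
        fp = self.fingerprint(url)
        # Mirrors `sadd(...) == 0`: the fingerprint was already present.
        if fp in self.seen:
            return True
        self.seen.add(fp)
        return False


if __name__ == "__main__":
    f = InMemoryWhitelistFilter()
    print(f.request_seen("https://segmentfault.com/a/1"))  # False: first visit
    print(f.request_seen("https://segmentfault.com/a/1"))  # True: duplicate
    print(f.request_seen(ALWAYS_ALLOW))  # False
    print(f.request_seen(ALWAYS_ALLOW))  # False: never filtered
```

Because the check is a substring test on request.url, any URL that contains the stop-robot address (for example with a query string appended) also bypasses deduplication.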
Then register it in settings.py:
DUPEFILTER_CLASS = 'tutorial.CustomFilter.CustomFilter'
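Note that my project is named tutorial. The dotted path therefore means: package tutorial, module CustomFilter.py, class CustomFilter.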
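In a scrapy-redis project the DUPEFILTER_CLASS line normally sits next to the scrapy-redis scheduler setting, which also keeps the request queue in Redis. A fuller settings.py might look like the sketch below. SCHEDULER, SCHEDULER_PERSIST and REDIS_URL are settings documented in the scrapy-redis README; the Redis URL is a placeholder to adjust for your own instance.

```python
# settings.py (sketch; the project package is assumed to be "tutorial")

# Use the scrapy-redis scheduler so requests are queued in Redis.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Dotted path: package "tutorial", module CustomFilter.py, class CustomFilter.
DUPEFILTER_CLASS = "tutorial.CustomFilter.CustomFilter"

# Optional: keep the request queue and fingerprint set between runs.
SCHEDULER_PERSIST = True

# Placeholder connection URL; point it at your own Redis server.
REDIS_URL = "redis://localhost:6379"
```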