SeimiCrawler V2.0 released: a Java crawler framework

Category: Software News · Published: 7 years ago


What's new

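Full SpringBoot support, so you can freely integrate with the existing SpringBoot ecosystem; see the demo for reference.
Callback functions now accept method references, which makes setting them up more natural: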

    push(Request.build(s.toString(),Basic::getTitle));
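To illustrate why a method reference is a natural way to pass a callback, here is a minimal, self-contained sketch. `MiniRequest`, `Callback`, `Basic` and `dispatch` are hypothetical stand-ins that only mimic the shape of `Request.build(url, Basic::getTitle)`; they are not SeimiCrawler's real classes, and the real callback receives a response object rather than a plain string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins that mimic the shape of Request.build(url, Basic::getTitle).
// These are NOT SeimiCrawler's classes; the real callback takes a response object.
public class MethodRefCallbackDemo {

    // A callback receives the crawler instance and the fetched page body.
    @FunctionalInterface
    interface Callback<T> {
        void handle(T crawler, String body);
    }

    // A request pairs a URL with a typed callback.
    static final class MiniRequest<T> {
        final String url;
        final Callback<T> callback;

        private MiniRequest(String url, Callback<T> callback) {
            this.url = url;
            this.callback = callback;
        }

        static <T> MiniRequest<T> build(String url, Callback<T> callback) {
            return new MiniRequest<>(url, callback);
        }
    }

    // A toy crawler whose getTitle method fits the Callback shape.
    static class Basic {
        final List<String> titles = new ArrayList<>();

        void getTitle(String body) {
            int start = body.indexOf("<title>");
            int end = body.indexOf("</title>");
            if (start >= 0 && end > start) {
                titles.add(body.substring(start + "<title>".length(), end));
            }
        }
    }

    // Simulates the framework: pretend the page was downloaded, then invoke the callback.
    static <T> void dispatch(T crawler, MiniRequest<T> request, String fakeBody) {
        request.callback.handle(crawler, fakeBody);
    }

    public static void main(String[] args) {
        Basic crawler = new Basic();
        // Basic::getTitle is an unbound method reference: (c, body) -> c.getTitle(body)
        MiniRequest<Basic> req = MiniRequest.build("http://example.com", Basic::getTitle);
        dispatch(crawler, req, "<html><title>Hello</title></html>");
        System.out.println(crawler.titles); // prints [Hello]
    }
}
```

Because `Basic::getTitle` is checked by the compiler, a misspelled or wrongly typed callback is rejected at compile time instead of surfacing only when the request is processed.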

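In non-SpringBoot mode, global settings are configured through SeimiConfig, including Redis cluster information and SeimiAgent information; in SpringBoot mode they are configured the standard SpringBoot way.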

Standard (non-SpringBoot) mode:

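    SeimiConfig config = new SeimiConfig();
    // Host of the SeimiAgent service
    config.setSeimiAgentHost("127.0.0.1");
    // Uncomment to use a single Redis server (e.g. for the distributed queue)
    //config.redisSingleServer().setAddress("redis://127.0.0.1:6379");
    Seimi s = new Seimi(config);
    // Start the crawler registered under the name "basic"
    s.goRun("basic");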

SpringBoot mode, configured in application.properties:

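    seimi.crawler.enabled=true
    # Names of the crawlers that should issue their start requests
    seimi.crawler.names=basic,test

    seimi.crawler.seimi-agent-host=xx
    seimi.crawler.seimi-agent-port=xx

    # Enable the distributed queue
    seimi.crawler.enable-redisson-queue=true
    # Custom expected number of BloomFilter insertions; if unset, the default value is used ()
    #seimi.crawler.bloom-filter-expected-insertions=
    # Custom expected BloomFilter false-positive rate; 0.001 means roughly one wrong judgment per 1000 checks. If unset, the default (0.001) is used
    #seimi.crawler.bloom-filter-false-probability=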
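The two BloomFilter properties correspond to the usual sizing inputs: the expected number of insertions $n$ and the target false-positive probability $p$. Assuming the filter is sized with the standard textbook formulas (the announcement does not say how SeimiCrawler or Redisson computes this), the bit-array size $m$ and the number of hash functions $k$ follow as below. The value $n = 10^6$ is purely illustrative; only $p = 0.001$ comes from the documented default.

```latex
m = -\frac{n \ln p}{(\ln 2)^2}, \qquad k = \frac{m}{n}\,\ln 2

% Illustrative: p = 0.001 (documented default), n = 10^6 (assumed)
\frac{m}{n} = \frac{-\ln 0.001}{(\ln 2)^2} \approx \frac{6.908}{0.4805} \approx 14.38
\;\Rightarrow\; m \approx 1.438 \times 10^{7}\ \text{bits} \approx 1.8\ \text{MB},
\qquad k \approx 14.38 \times 0.693 \approx 10
```

Note that the bits needed per element ($m/n \approx 14.4$ at $p = 0.001$) depend only on $p$, so lowering the error rate costs memory per URL while raising the expected insertions scales memory linearly.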

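The default distributed queue is now implemented with Redisson (still backed by Redis underneath), and de-duplication now uses a BloomFilter to improve space efficiency; see also an online BloomFilter parameter-tuning simulator.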
Requires JDK 1.8+.
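To show how BloomFilter-based de-duplication trades a small false-positive rate for space, here is a minimal in-memory sketch. `UrlBloomFilter` and its methods are hypothetical and not part of SeimiCrawler or Redisson (SeimiCrawler 2.0 delegates this to Redisson on top of Redis); the sizing uses the standard formulas, and the hashing uses FNV-1a plus `String.hashCode()` combined by double hashing, which is an assumption chosen for brevity.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical in-memory Bloom filter for URL de-duplication (illustration only;
// SeimiCrawler 2.0 itself uses Redisson's Redis-backed filter).
public class UrlBloomFilter {
    private final int numBits;   // m: size of the bit array
    private final int numHashes; // k: bit positions set per URL
    private final BitSet bits;

    public UrlBloomFilter(long expectedInsertions, double falseProbability) {
        this.numBits = optimalNumBits(expectedInsertions, falseProbability);
        this.numHashes = optimalNumHashes(expectedInsertions, numBits);
        this.bits = new BitSet(numBits);
    }

    // m = -n * ln(p) / (ln 2)^2, truncated (int range is enough for modest n)
    static int optimalNumBits(long n, double p) {
        double ln2 = Math.log(2);
        return (int) Math.max(1L, (long) (-n * Math.log(p) / (ln2 * ln2)));
    }

    // k = (m / n) * ln 2, rounded, at least 1
    static int optimalNumHashes(long n, int m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    // 32-bit FNV-1a over the UTF-8 bytes, used as the first base hash
    private static int fnv1a(String s) {
        int hash = 0x811C9DC5;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xFF);
            hash *= 0x01000193;
        }
        return hash;
    }

    // Double hashing: position i = (h1 + i * h2) mod m, with String.hashCode() as h2
    private int position(int h1, int h2, int i) {
        return Math.floorMod(h1 + i * h2, numBits);
    }

    public void put(String url) {
        int h1 = fnv1a(url);
        int h2 = url.hashCode();
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(h1, h2, i));
        }
    }

    // false means "definitely not seen"; true means "probably seen"
    public boolean mightContain(String url) {
        int h1 = fnv1a(url);
        int h2 = url.hashCode();
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(h1, h2, i))) {
                return false;
            }
        }
        return true;
    }

    // Returns true (and records the URL) if it was definitely new; false if probably seen
    public boolean addIfAbsent(String url) {
        if (mightContain(url)) {
            return false;
        }
        put(url);
        return true;
    }

    public int numBits() { return numBits; }
    public int numHashes() { return numHashes; }

    public static void main(String[] args) {
        UrlBloomFilter seen = new UrlBloomFilter(1000, 0.001);
        System.out.println(seen.numBits() + " bits, " + seen.numHashes() + " hashes"); // 14377 bits, 10 hashes
        System.out.println(seen.addIfAbsent("https://example.com/a")); // true: first visit
        System.out.println(seen.addIfAbsent("https://example.com/a")); // false: duplicate
    }
}
```

A Bloom filter never gives false negatives, so a URL that was already queued is never queued again; the price is that a genuinely new URL is occasionally treated as seen, at roughly the configured false-positive rate once the filter holds its expected number of entries.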

[Note] This article is reposted from OSChina (开源中国社区) [http://www.oschina.net]
