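jsoup 1.11.1 has been released. This version reduces DOM memory usage by about 30% and adds streaming network HTML parsing, faster HTML generation, and a large number of improvements and bug fixes. Download: https://jsoup.org/download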
Improvements
When loading content from a URL or a file, the content is now parsed as it streams in from the network or disk, rather than being fully buffered before parsing. This substantially reduces memory consumption and large garbage objects when loading large files. Note that this change means that a response, once parsed, may not be parsed again from the same response object unless you call Connection.Response.bufferUp() first, which will buffer the full response into memory.
Updated language level to Java 7 from Java 5. To maintain Android support (of minversion 8), try-with-resources is not used.
Added Connection.Response.bodyStream(), a method to get the response body as an input stream. This is useful for saving a large response straight to a file, without buffering fully into memory first.
Performance improvements in text and HTML generation (through less GC).
Reduced memory consumption of text, scripts, and comments in the DOM by 40%, by refactoring the node hierarchy to not track child nodes or attributes by default for leaf nodes. For the average document, that's about a 30% memory reduction.
Reduced memory consumption of Elements by refactoring their Attributes to be a simple pair of arrays, vs a LinkedHashSet.
Added support for Element.selectFirst(), to efficiently find the first matching element.
Added Element.appendTo(parent) to simplify moving elements around.
Added support for multiple headers with the same name in Jsoup.Connect.
Added Element.shallowClone() and Node.shallowClone(), to allow cloning nodes without getting all their children.
Updated Element.text() and the :contains(text) selector to consider &nbsp; characters as spaces.
Updated Jsoup.connect().timeout() to implement a total connect + combined read timeout. Previously it specified connect and buffer read times only, so to implement a combined total timeout, you had to have another thread send an interrupt.
Improved performance of Node.addChildren() (was quadratic).
Added missing support for template tags in tables.
In Jsoup.Connect file uploads, added the ability to set the uploaded files' mimetype.
Improved Node traversal, including less object creation, and partial and filtering traverser support.
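The Connection.Response.bodyStream() entry above describes saving a large response straight to a file without buffering it in memory. A minimal sketch of that pattern follows, using only the JDK so it runs without a network connection. The class name StreamToFile and the method save are hypothetical, and the in-memory stream in main is a stand-in for the stream that bodyStream() would return.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class StreamToFile {
    // Hypothetical helper: copies the stream to the target file chunk by chunk,
    // so the whole body is never held in memory. Returns the bytes written.
    public static long save(InputStream body, Path target) throws IOException {
        try (InputStream in = body) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: with jsoup 1.11.1+ on the classpath, the stream would
        // instead come from Jsoup.connect(url).execute().bodyStream().
        byte[] html = "<html><body>hello</body></html>".getBytes(StandardCharsets.UTF_8);
        Path out = Files.createTempFile("response", ".html");
        long written = save(new ByteArrayInputStream(html), out);
        System.out.println(written + " bytes written to " + out);
        Files.delete(out);
    }
}
```

Because the body is consumed as it is copied, the same response cannot be parsed afterwards unless it was buffered first with bufferUp(), as noted in the first entry.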
Fixes
Bugfix: if a document was redecoded after character set detection, the HTML parser was not reset correctly, which could lead to an incorrect DOM.
Bugfix: attributes with the same name but different case would be incorrectly treated as different attributes.
Bugfix: self-closing tags for known empty elements were incorrectly treated as errors.
Bugfix: fixed an issue where a self-closing title, noframes, or style tag would cause the rest of the page to be incorrectly parsed as data or text.
Bugfix: fixed an issue with unknown mixed-case tags.
Bugfix: fixed an issue where the entity resources were left open after startup, causing a warning.
Bugfix: fixed an issue where Element.getElementsByIndexLessThan(index) would incorrectly provide the root element.
Improved parse time for pages with exceptionally deeply nested tags.
Improvement / workaround: modified the Entities implementation to load its data from a .class vs from a jar resource. Faster, and safer on Android.
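The attribute-case bugfix listed above concerns attributes such as ID and id being kept as two separate attributes. The sketch below shows the general idea of folding names to lower case so that such duplicates collapse into one entry. It is not jsoup's code: the class AttributeMerge and the method merge are hypothetical, and keeping the first occurrence is an assumption modeled on how HTML parsers drop duplicate attributes.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class AttributeMerge {
    // Hypothetical helper: takes alternating name/value strings as they might
    // appear in a start tag and returns them keyed by lower-cased name.
    // Assumption: the first occurrence of a name wins; later duplicates are dropped.
    public static Map<String, String> merge(String... pairs) {
        Map<String, String> attrs = new LinkedHashMap<>();
        for (int i = 0; i + 1 < pairs.length; i += 2) {
            String name = pairs[i].toLowerCase(Locale.ROOT);
            if (!attrs.containsKey(name)) {
                attrs.put(name, pairs[i + 1]);
            }
        }
        return attrs;
    }

    public static void main(String[] args) {
        // "ID" and "id" differ only in case, so they are treated as one attribute.
        System.out.println(merge("ID", "a", "id", "b", "class", "c"));
        // prints {id=a, class=c}
    }
}
```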
[Notice] This article is reposted from the OSChina open source community [http://www.oschina.net]