jsoup 1.11.1 Released, the Most Powerful Java HTML Parser

Category: Software News · Published: 7 years ago

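jsoup 1.11.1 has been released. This version reduces DOM memory usage by 30%, adds streaming parsing of HTML loaded over the network, speeds up HTML generation, and includes a large number of improvements and bug fixes. Download: https://jsoup.org/download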

Improvements

  • When loading content from a URL or a file, the content is now parsed as it streams in from the network or disk, rather than being fully buffered before parsing. This substantially reduces memory consumption & large garbage objects when loading large files. Note that this change means that a response, once parsed, may not be parsed again from the same response object unless you call Connection.Response.bufferUp() first, which will buffer the full response into memory.

  • Updated language level to Java 7 from Java 5. To maintain Android support (of minversion 8), try-with-resources are not used.

  • Added Connection.Response.bodyStream(), a method to get the response body as an input stream. This is useful for saving a large response straight to a file, without buffering fully into memory first.

  • Performance improvements in text and HTML generation (through less GC).

  • Reduced memory consumption of text, scripts, and comments in the DOM by 40%, by refactoring the node hierarchy to not track child nodes or attributes by default for leaf nodes. For the average document, that's about a 30% memory reduction.

  • Reduced memory consumption of Elements by refactoring their Attributes to be a simple pair of arrays, vs a LinkedHashSet.

  • Added support for Element.selectFirst(), to efficiently find the first matching element.

  • Added Element.appendTo(parent) to simplify slinging elements about.

  • Added support for multiple headers with the same name in Jsoup.Connect

  • Added Element.shallowClone() and Node.shallowClone(), to allow cloning nodes without getting all their children.

  • Updated Element.text() and the :contains(text) selector to consider &nbsp; characters as spaces.

  • Updated Jsoup.connect().timeout() to implement a total connect + combined read timeout. Previously it specified connect and buffer read times only, so to implement a combined total timeout, you had to have another thread send an interrupt.

  • Improved performance of Node.addChildren() (was quadratic)

  • Added missing support for template tags in tables

  • In Jsoup.Connect file uploads, added the ability to set the uploaded files' mimetype.

  • Improved Node traversal, including less object creation, and partial and filtering traversor support.
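
One of the items above says that element attributes are now kept in a simple pair of arrays. The following is a minimal sketch of that general idea, assuming nothing about jsoup's real internals: the class name ArrayAttributes, its methods, and the initial capacity of 4 are all hypothetical and chosen for illustration only. Two parallel arrays hold the keys and the values, so no per-entry wrapper objects are allocated.

```java
import java.util.Arrays;

/** Illustrative only: attribute storage as two parallel arrays (hypothetical, not jsoup's Attributes class). */
final class ArrayAttributes {
    private String[] keys = new String[4];   // initial capacity is an arbitrary choice
    private String[] vals = new String[4];
    private int size = 0;

    /** Linear scan; returns the index of the key, or -1 when it is absent. */
    private int indexOf(String key) {
        for (int i = 0; i < size; i++) {
            if (keys[i].equals(key)) return i;
        }
        return -1;
    }

    /** Adds a new attribute, or overwrites the value of an existing one. */
    public void put(String key, String value) {
        int i = indexOf(key);
        if (i >= 0) {
            vals[i] = value;
            return;
        }
        if (size == keys.length) {           // both arrays grow together
            keys = Arrays.copyOf(keys, size * 2);
            vals = Arrays.copyOf(vals, size * 2);
        }
        keys[size] = key;
        vals[size] = value;
        size++;
    }

    /** Returns the value for the key, or an empty string when it is absent. */
    public String get(String key) {
        int i = indexOf(key);
        return i < 0 ? "" : vals[i];
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        ArrayAttributes attrs = new ArrayAttributes();
        attrs.put("href", "https://jsoup.org/");
        attrs.put("class", "link");
        attrs.put("class", "link external"); // overwrites, does not add
        System.out.println(attrs.get("class") + " (" + attrs.size() + " attributes)");
        // prints: link external (2 attributes)
    }
}
```

Lookups here are linear scans, which is a reasonable trade-off when an element carries only a handful of attributes; whether jsoup makes the same trade-off is not stated in the release notes.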

Bug fixes

  • Bugfix: if a document was redecoded after character set detection, the HTML parser was not reset correctly, which could lead to an incorrect DOM.

  • Bugfix: attributes with the same name but different case would be incorrectly treated as different attributes.

  • Bugfix: self-closing tags for known empty elements were incorrectly treated as errors.

  • Bugfix: fixed an issue where a self-closing title, noframes, or style tag would cause the rest of the page to be incorrectly parsed as data or text.

  • Bugfix: fixed an issue with unknown mixed-case tags

  • Bugfix: fixed an issue where the entity resources were left open after startup, causing a warning.

  • Bugfix: fixed an issue where Element.getElementsByIndexLessThan(index) would incorrectly provide the root element

  • Improved parse time for pages with exceptionally deeply nested tags.

  • Improvement / workaround: modified the Entities implementation to load its data from a .class vs from a jar resource. Faster, and safer on Android.
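
The attribute-case bugfix listed above says that names differing only by case should not count as different attributes. As a rough illustration of that intended behavior (not jsoup's code), the sketch below folds names to lower case before storing them. The class CaseFoldedAttributes and its methods are hypothetical, and keeping the first value seen for a duplicated name is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative only: treats attribute names that differ only by case as one attribute (hypothetical class). */
final class CaseFoldedAttributes {
    private final Map<String, String> attrs = new LinkedHashMap<>();

    private static String fold(String name) {
        // Locale.ROOT avoids locale-specific surprises such as the Turkish dotless i.
        return name.toLowerCase(Locale.ROOT);
    }

    /** Stores the attribute under its folded name; the first value seen wins (an assumption of this sketch). */
    void add(String name, String value) {
        attrs.putIfAbsent(fold(name), value);
    }

    boolean has(String name) {
        return attrs.containsKey(fold(name));
    }

    /** Returns the value for the name in any case, or an empty string when it is absent. */
    String get(String name) {
        return attrs.getOrDefault(fold(name), "");
    }

    int size() {
        return attrs.size();
    }

    public static void main(String[] args) {
        CaseFoldedAttributes a = new CaseFoldedAttributes();
        a.add("ID", "one");
        a.add("id", "two"); // same attribute as "ID", so it is ignored
        System.out.println(a.size() + " " + a.get("Id"));
        // prints: 1 one
    }
}
```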


[Notice] This article is reposted from the OSChina community [http://www.oschina.net]

