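- License: MIT
- Language: Python
- Operating system: Cross-platform
- Homepage: https://github.com/fxsjy/jparser
- Documentation: https://github.com/fxsjy/jparser/blob/master/README.md
About the software
jparser is a Python library for web page transcoding, that is, extracting the main content of a page from its HTML source as structured data: text paragraphs and images. It is currently optimized mainly for news and article pages.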
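Usage (Python 2 syntax, as in the original snippet):
    import urllib2
    from jparser import PageModel

    # Fetch the page and decode it; this Sohu page is served in a GB-family encoding.
    html = urllib2.urlopen("http://news.sohu.com/20170512/n492734045.shtml").read().decode('gb18030')
    pm = PageModel(html)
    result = pm.extract()

    print "==title=="
    print result['title']
    print "==content=="
    # Each content item is either a text paragraph or an image.
    for x in result['content']:
        if x['type'] == 'text':
            print x['data']
        if x['type'] == 'image':
            print "[IMAGE]", x['data']['src']
Dependency: lxml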
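The usage snippet implies that the extraction result is a dictionary with a `title` string and a `content` list whose items are either `text` blocks (with a string in `data`) or `image` blocks (with a `src` entry inside `data`). A minimal sketch, assuming only that shape, shows how such a result might be flattened into plain text. The helper name `render_plain_text` and the sample data are hypothetical and are not part of jparser; the code does not import jparser at all.

```python
def render_plain_text(result):
    """Flatten a result dict of the shape shown in the usage snippet into one string."""
    # Title first, then a blank line, then one line per content block.
    lines = [result.get('title', ''), '']
    for block in result.get('content', []):
        if block['type'] == 'text':
            lines.append(block['data'])
        elif block['type'] == 'image':
            # Images are represented by a placeholder line carrying the source URL.
            lines.append('[IMAGE] ' + block['data']['src'])
    return '\n'.join(lines)


# Hand-written sample mimicking the assumed result shape; not real jparser output.
sample = {
    'title': 'Example headline',
    'content': [
        {'type': 'text', 'data': 'First paragraph.'},
        {'type': 'image', 'data': {'src': 'http://example.com/a.jpg'}},
        {'type': 'text', 'data': 'Second paragraph.'},
    ],
}

print(render_plain_text(sample))
```

Returning a string instead of printing inside the loop keeps the formatting logic separate from I/O, so the same helper could feed a file, a log, or a test.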
