html2text:将 HTML 转换为 Markdown 格式文本

栏目: 编程语言 · Python · 后端 · 发布时间: 9年前

内容简介:一般情况下,我们都是把markdown格式转换成html格式,那么我们如何把html格式转换成markdown呢?我们可以通过python html2text模块来实现,具体怎么实现呢?请查看文章详情

html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).

Usage: html2text [(filename|url) [encoding]]

Option | Description
---|---
--version | Show program's version number and exit
-h, --help | Show this help message and exit
--ignore-links | Don't include any formatting for links
--escape-all | Escape all special characters. Output is less readable, but avoids corner case formatting issues.
--reference-links | Use reference links instead of links to create markdown
--mark-code | Mark preformatted and code blocks with [code]...[/code]

For a complete list of options see the docs

Or you can use it from within Python:

>>> import html2text
>>>
>>> print(html2text.html2text("<p><strong>Zed's</strong> dead baby, <em>Zed's</em> dead.</p>"))
**Zed's** dead baby, _Zed's_ dead.

Or with some configuration options:

>>> import html2text
>>>
>>> h = html2text.HTML2Text()
>>> # Ignore converting links from HTML
>>> h.ignore_links = True
>>> print h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!")
Hello, world!

>>> print(h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!"))

Hello, world!

>>> # Don't Ignore links anymore, I like links
>>> h.ignore_links = False
>>> print(h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!"))
Hello, [world](http://earth.google.com/)!

Originally written by Aaron Swartz. This code is distributed under the GPLv3.

How to install

html2text is available on pypi https://pypi.python.org/pypi/html2text

$ pip install html2text

How to run unit tests

PYTHONPATH=$PYTHONPATH:. coverage run --source=html2text setup.py test -v

To see the coverage results:

coverage combine
coverage html

then open the ./htmlcov/index.html file in your browser.

Documentation

Documentation lives here


以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

Writing Windows VxDs and Device Drivers, Second Edition

Writing Windows VxDs and Device Drivers, Second Edition

Karen Hazzah / CMP / 1996-01-12 / USD 54.95

Software developer and author Karen Hazzah expands her original treatise on device drivers in the second edition of "Writing Windows VxDs and Device Drivers." The book and companion disk include the a......一起来看看 《Writing Windows VxDs and Device Drivers, Second Edition》 这本书的介绍吧!

JSON 在线解析
JSON 在线解析

在线 JSON 格式化工具

在线进制转换器
在线进制转换器

各进制数互转换器

XML、JSON 在线转换
XML、JSON 在线转换

在线XML、JSON转换工具