Converting HTML content to Markdown with the html2text library
First, install the html2text and requests packages with pip:
pip install html2text
pip install requests
Next, fetch the web page with requests:
import requests
# url and headers must be defined beforehand (target address and request headers)
htmlpage = requests.get(url, headers=headers)
Then use html2text to convert the fetched HTML into Markdown:
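import html2text as ht
import requests
# Placeholder request headers; adjust them for the site you are fetching
headers = {"User-Agent": "Mozilla/5.0"}
text_maker = ht.HTML2Text()
def url2md(url):
    # Download the page and take the response body as text
    htmlpage = requests.get(url, headers=headers).text
    # Convert the HTML string to Markdown
    mdtext = text_maker.handle(htmlpage)
    return mdtext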
Finally, save the Markdown content to a .md file:
import os
# article_url and title must be defined beforehand
article_content = url2md(article_url)
with open(os.path.join(title + ".md"), "w", encoding="utf-8") as fh:
    fh.write(article_content)
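One thing to watch when using a page title as the file name: titles may contain characters such as / or : that are not valid in file names on some systems. A minimal sketch of one way to handle this, assuming two hypothetical helpers (safe_filename and save_markdown) that are not part of html2text or requests:

```python
import re
from pathlib import Path

def safe_filename(title: str) -> str:
    """Replace characters that are invalid in common file systems with '_'."""
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", title).strip()
    # Fall back to a fixed name if nothing usable is left
    return cleaned or "untitled"

def save_markdown(title: str, content: str, out_dir: str = ".") -> Path:
    """Write content to <out_dir>/<sanitized title>.md and return the path."""
    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / (safe_filename(title) + ".md")
    path.write_text(content, encoding="utf-8")
    return path
```

With these helpers, the save step could be written as save_markdown(title, article_content).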
Just jotting down a handy little function for future use!