forked from timbertson/python-readability
Open
Description
Actual Behavior
Document.summary() fails under Python 3 when the document is built from bytes rather than from a str.
Steps to Reproduce the Problem
Follow the steps in the README:
>>> import requests
>>> from readability import Document
>>> response = requests.get('http://example.com')
>>> doc = Document(response.content)
>>> doc.title()
Traceback (most recent call last):
...
RE_CHARSET.findall(page) + RE_PRAGMA.findall(page) + RE_XML.findall(page)
^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: cannot use a string pattern on a bytes-like object
How to correct
The str regular expressions should be changed to bytes regular expressions, since encoding.get_encoding is only used on bytes content.
In encoding.py:
RE_CHARSET = re.compile(br'<meta.*?charset=["\']*(.+?)["\'>]', flags=re.I)
RE_PRAGMA = re.compile(br'<meta.*?content=["\']*;?charset=(.+?)["\'>]', flags=re.I)
RE_XML = re.compile(br'^<\?xml.*?encoding=["\']*(.+?)["\'>]')
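A minimal, network-free sketch of how the bytes patterns above behave on bytes input. The helper name declared_encodings and the sample page are hypothetical and only for illustration; this is not the library's actual get_encoding implementation.

```python
import re

# Bytes-pattern versions of the three regexes proposed in this issue.
RE_CHARSET = re.compile(br'<meta.*?charset=["\']*(.+?)["\'>]', flags=re.I)
RE_PRAGMA = re.compile(br'<meta.*?content=["\']*;?charset=(.+?)["\'>]', flags=re.I)
RE_XML = re.compile(br'^<\?xml.*?encoding=["\']*(.+?)["\'>]')


def declared_encodings(page):
    """Hypothetical helper: list the charsets declared in a bytes page."""
    found = RE_CHARSET.findall(page) + RE_PRAGMA.findall(page) + RE_XML.findall(page)
    # Matches are bytes; decode them so callers get ordinary str names.
    return [name.decode("ascii", "replace").lower() for name in found]


# Stand-in for response.content (bytes), so no HTTP request is needed.
sample = b'<html><head><meta charset="UTF-8"></head><body>hi</body></html>'
print(declared_encodings(sample))
```

With bytes patterns the reverse mismatch applies: calling RE_CHARSET.findall on a str raises TypeError, which fits the note above that these patterns are only meant for bytes content.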
Medno