You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
32 lines
1.1 KiB
32 lines
1.1 KiB
6 years ago
|
Additions
|
||
|
---------
|
||
|
|
||
|
More of the jQuery API: nextUntil?
|
||
|
|
||
|
Optimizations
|
||
|
-------------
|
||
|
|
||
|
The html5lib tree builder doesn't use the standard tree-building API,
|
||
|
which worries me and has resulted in a number of bugs.
|
||
|
|
||
|
markup_attr_map can be optimized since it's always a map now.
|
||
|
|
||
|
Upon encountering UTF-16LE data or some other uncommon serialization
|
||
|
of Unicode, UnicodeDammit will convert the data to Unicode, then
|
||
|
encode it at UTF-8. This is wasteful because it will just get decoded
|
||
|
back to Unicode.
|
||
|
|
||
|
CDATA
|
||
|
-----
|
||
|
|
||
|
The elementtree XMLParser has a strip_cdata argument that, when set to
|
||
|
False, should allow Beautiful Soup to preserve CDATA sections instead
|
||
|
of treating them as text. Except it doesn't. (This argument is also
|
||
|
present for HTMLParser, and also does nothing there.)
|
||
|
|
||
|
Currently, htm5lib converts CDATA sections into comments. An
|
||
|
as-yet-unreleased version of html5lib changes the parser's handling of
|
||
|
CDATA sections to allow CDATA sections in tags like <svg> and
|
||
|
<math>. The HTML5TreeBuilder will need to be updated to create CData
|
||
|
objects instead of Comment objects in this situation.
|