Package buildxml :: Package tools :: Module BeautifulSoup :: Class ICantBelieveItsBeautifulSoup

Class ICantBelieveItsBeautifulSoup


The BeautifulSoup class is oriented towards skipping over
common HTML errors like unclosed tags. However, sometimes it makes
errors of its own. For instance, consider this fragment:

 <b>Foo<b>Bar</b></b>

This is perfectly valid (if bizarre) HTML. However, the
BeautifulSoup class will implicitly close the first b tag when it
encounters the second 'b'. It will think the author wrote
"<b>Foo<b>Bar", and didn't close the first 'b' tag, because
there's no real-world reason to bold something that's already
bold. When it encounters '</b></b>' it will close two more 'b'
tags, for a grand total of three tags closed instead of two. This
can throw off the rest of your document structure. The same is
true of a number of other tags, listed below.

It's much more common for someone to forget to close a 'b' tag
than to actually use nested 'b' tags, and the BeautifulSoup class
handles the common case. This class handles the not-so-common
case: where you can't believe someone wrote what they did, but
it's valid HTML and BeautifulSoup screwed up by assuming it
wouldn't be.
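
The difference between the two behaviours can be illustrated with a toy stack-based parser. This is only a sketch of the idea described above, not BeautifulSoup's actual code: `parse`, `_pop_to` and `TOKEN_RE` are hypothetical names, the tokenizer ignores attributes, and the `nestable_tags` argument stands in for the NESTABLE_TAGS map.

```python
import re

# Hypothetical, simplified tokenizer: bare open/close tags and text only.
TOKEN_RE = re.compile(r"<(/?)(\w+)>|([^<]+)")


def _pop_to(stack, name):
    """Pop back through the most recent open tag called `name`, if any."""
    open_names = [tag_name for tag_name, _ in stack[1:]]  # skip the root
    if name in open_names:
        while stack.pop()[0] != name:
            pass


def parse(markup, nestable_tags=()):
    """Toy model: return the document as nested (name, children) tuples."""
    root = ("[document]", [])
    stack = [root]
    for closing, name, text in TOKEN_RE.findall(markup):
        if text:
            stack[-1][1].append(text)
        elif closing:
            # An end tag with no matching open tag is silently ignored.
            _pop_to(stack, name)
        else:
            if name not in nestable_tags:
                # Assume the author forgot to close the earlier tag.
                _pop_to(stack, name)
            node = (name, [])
            stack[-1][1].append(node)
            stack.append(node)
    return root[1]


fragment = "<b>Foo<b>Bar</b></b>"
flat = parse(fragment)                         # 'b' treated as non-nestable
nested = parse(fragment, nestable_tags={"b"})  # 'b' treated as nestable
print(flat)
print(nested)
```

In this toy model the default call yields two sibling 'b' elements, while passing 'b' as nestable preserves the nesting the author wrote. With the real library, the equivalent choice is presumably made by constructing ICantBelieveItsBeautifulSoup instead of BeautifulSoup on the same markup.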

Instance Methods

Inherited from BeautifulSoup: __init__, start_meta

Inherited from BeautifulStoneSoup: __getattr__, convert_charref, endData, handle_charref, handle_comment, handle_data, handle_decl, handle_entityref, handle_pi, isSelfClosingTag, parse_declaration, popTag, pushTag, reset, unknown_endtag, unknown_starttag

Inherited from Tag: __call__, __contains__, __delitem__, __eq__, __getitem__, __iter__, __len__, __ne__, __nonzero__, __repr__, __setitem__, __str__, __unicode__, childGenerator, clear, decompose, fetch, fetchText, find, findAll, findChild, findChildren, first, firstText, get, getString, getText, has_key, index, prettify, recursiveChildGenerator, renderContents, setString, text

Inherited from Tag (private): _convertEntities, _getAttrMap, _invert, _sub_entity

Inherited from PageElement: append, extract, fetchNextSiblings, fetchParents, fetchPrevious, fetchPreviousSiblings, findAllNext, findAllPrevious, findNext, findNextSibling, findNextSiblings, findParent, findParents, findPrevious, findPreviousSibling, findPreviousSiblings, insert, nextGenerator, nextSiblingGenerator, parentGenerator, previousGenerator, previousSiblingGenerator, replaceWith, replaceWithChildren, setup, substituteEncoding, toEncoding

Inherited from PageElement (private): _findAll, _findOne, _lastRecursiveChild

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __sizeof__, __subclasshook__

Inherited from sgmllib.SGMLParser: close, convert_codepoint, convert_entityref, error, feed, finish_endtag, finish_shorttag, finish_starttag, get_starttag_text, goahead, handle_endtag, handle_starttag, parse_endtag, parse_pi, parse_starttag, report_unbalanced, setliteral, setnomoretags, unknown_charref, unknown_entityref

Inherited from sgmllib.SGMLParser (private): _convert_ref

Inherited from markupbase.ParserBase: getpos, parse_comment, parse_marked_section, unknown_decl, updatepos

Inherited from markupbase.ParserBase (private): _parse_doctype_attlist, _parse_doctype_element, _parse_doctype_entity, _parse_doctype_notation, _parse_doctype_subset, _scan_name

Class Variables
  I_CANT_BELIEVE_THEYRE_NESTABLE_INLINE_TAGS = 'em', 'big', 'i',...
  I_CANT_BELIEVE_THEYRE_NESTABLE_BLOCK_TAGS = 'noscript'
  NESTABLE_TAGS = buildTagMap([], BeautifulSoup.NESTABLE_TAGS, I...

Inherited from BeautifulSoup: CHARSET_RE, NESTABLE_BLOCK_TAGS, NESTABLE_INLINE_TAGS, NESTABLE_LIST_TAGS, NESTABLE_TABLE_TAGS, NON_NESTABLE_BLOCK_TAGS, PRESERVE_WHITESPACE_TAGS, QUOTE_TAGS, RESET_NESTING_TAGS, SELF_CLOSING_TAGS

Inherited from BeautifulStoneSoup: ALL_ENTITIES, HTML_ENTITIES, MARKUP_MASSAGE, ROOT_TAG_NAME, STRIP_ASCII_SPACES, XHTML_ENTITIES, XML_ENTITIES

Inherited from Tag: BARE_AMPERSAND_OR_BRACKET, XML_ENTITIES_TO_SPECIAL_CHARS, XML_SPECIAL_CHARS_TO_ENTITIES, string

Inherited from sgmllib.SGMLParser: entity_or_charref, entitydefs

Inherited from sgmllib.SGMLParser (private): _decl_otherchars

Properties

Inherited from object: __class__

Class Variable Details

I_CANT_BELIEVE_THEYRE_NESTABLE_INLINE_TAGS

Value:
'em', 'big', 'i', 'small', 'tt', 'abbr', 'acronym', 'strong', 'cite', \
'code', 'dfn', 'kbd', 'samp', 'strong', 'var', 'b', 'big'

NESTABLE_TAGS

Value:
buildTagMap([], BeautifulSoup.NESTABLE_TAGS,
            I_CANT_BELIEVE_THEYRE_NESTABLE_BLOCK_TAGS,
            I_CANT_BELIEVE_THEYRE_NESTABLE_INLINE_TAGS)
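
The value above combines an existing map, a single tag name, and a tuple of tag names into one map. The following sketch shows how such a merge could work; `build_tag_map` and `BASE_NESTABLE` are hypothetical stand-ins written for illustration, and the real buildTagMap and BeautifulSoup.NESTABLE_TAGS may differ in detail.

```python
def build_tag_map(default, *portions):
    """Merge maps, sequences and single names into one tag map (a sketch)."""
    built = {}
    for portion in portions:
        if isinstance(portion, dict):
            # A map: copy its entries as they are.
            built.update(portion)
        elif isinstance(portion, (list, tuple, set, frozenset)):
            # A collection of tag names: map each one to the default.
            for name in portion:
                built[name] = default
        else:
            # A single tag name such as 'noscript'.
            built[portion] = default
    return built


# Made-up stand-in for the inherited map; not the library's real contents.
BASE_NESTABLE = {"table": [], "tr": ["table"]}

merged = build_tag_map([], BASE_NESTABLE, "noscript", ("em", "b", "big"))
print(sorted(merged))
```

Under this reading, every tag named in the two I_CANT_BELIEVE_THEYRE_NESTABLE_* variables ends up as a key of NESTABLE_TAGS with an empty list as its value, alongside the entries inherited from the parent class.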