
Class MinimalSoup


The MinimalSoup class is for parsing HTML that contains pathologically bad markup. It makes no assumptions about tag nesting, but it does know which tags are self-closing, that <script> tags contain JavaScript and should not be parsed, that <meta> tags may contain encoding information, and so on.

Because it assumes so little about nesting, it is also a better base class for subclassing than BeautifulStoneSoup or BeautifulSoup.

Instance Methods

Inherited from BeautifulSoup: __init__, start_meta

Inherited from BeautifulStoneSoup: __getattr__, convert_charref, endData, handle_charref, handle_comment, handle_data, handle_decl, handle_entityref, handle_pi, isSelfClosingTag, parse_declaration, popTag, pushTag, reset, unknown_endtag, unknown_starttag

Inherited from Tag: __call__, __contains__, __delitem__, __eq__, __getitem__, __iter__, __len__, __ne__, __nonzero__, __repr__, __setitem__, __str__, __unicode__, childGenerator, clear, decompose, fetch, fetchText, find, findAll, findChild, findChildren, first, firstText, get, getString, getText, has_key, index, prettify, recursiveChildGenerator, renderContents, setString, text

Inherited from Tag (private): _convertEntities, _getAttrMap, _invert, _sub_entity

Inherited from PageElement: append, extract, fetchNextSiblings, fetchParents, fetchPrevious, fetchPreviousSiblings, findAllNext, findAllPrevious, findNext, findNextSibling, findNextSiblings, findParent, findParents, findPrevious, findPreviousSibling, findPreviousSiblings, insert, nextGenerator, nextSiblingGenerator, parentGenerator, previousGenerator, previousSiblingGenerator, replaceWith, replaceWithChildren, setup, substituteEncoding, toEncoding

Inherited from PageElement (private): _findAll, _findOne, _lastRecursiveChild

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __sizeof__, __subclasshook__

Inherited from sgmllib.SGMLParser: close, convert_codepoint, convert_entityref, error, feed, finish_endtag, finish_shorttag, finish_starttag, get_starttag_text, goahead, handle_endtag, handle_starttag, parse_endtag, parse_pi, parse_starttag, report_unbalanced, setliteral, setnomoretags, unknown_charref, unknown_entityref

Inherited from sgmllib.SGMLParser (private): _convert_ref

Inherited from markupbase.ParserBase: getpos, parse_comment, parse_marked_section, unknown_decl, updatepos

Inherited from markupbase.ParserBase (private): _parse_doctype_attlist, _parse_doctype_element, _parse_doctype_entity, _parse_doctype_notation, _parse_doctype_subset, _scan_name

Class Variables
  RESET_NESTING_TAGS = buildTagMap('noscript')
  NESTABLE_TAGS = {}

Inherited from BeautifulSoup: CHARSET_RE, NESTABLE_BLOCK_TAGS, NESTABLE_INLINE_TAGS, NESTABLE_LIST_TAGS, NESTABLE_TABLE_TAGS, NON_NESTABLE_BLOCK_TAGS, PRESERVE_WHITESPACE_TAGS, QUOTE_TAGS, SELF_CLOSING_TAGS

Inherited from BeautifulStoneSoup: ALL_ENTITIES, HTML_ENTITIES, MARKUP_MASSAGE, ROOT_TAG_NAME, STRIP_ASCII_SPACES, XHTML_ENTITIES, XML_ENTITIES

Inherited from Tag: BARE_AMPERSAND_OR_BRACKET, XML_ENTITIES_TO_SPECIAL_CHARS, XML_SPECIAL_CHARS_TO_ENTITIES, string

Inherited from sgmllib.SGMLParser: entity_or_charref, entitydefs

Inherited from sgmllib.SGMLParser (private): _decl_otherchars
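
The two class variables that MinimalSoup defines itself (RESET_NESTING_TAGS and NESTABLE_TAGS) are what encode "no assumptions about tag nesting". The sketch below is a simplified stand-in, not the library code: build_tag_map is a hypothetical helper written for this example, and it assumes that buildTagMap takes a default value as its first argument, followed by any number of maps, lists, or scalars to merge. Under that assumption, buildTagMap('noscript') receives only a default and nothing to merge, so the resulting map is empty.

```python
def build_tag_map(default, *portions):
    """Hypothetical stand-in for buildTagMap (assumed signature:
    a default value, then maps, sequences, or scalars to merge)."""
    built = {}
    for portion in portions:
        if isinstance(portion, dict):
            # A map: merge its entries as they are.
            built.update(portion)
        elif isinstance(portion, (list, tuple, set)):
            # A sequence of tag names: map each one to the default.
            for name in portion:
                built[name] = default
        else:
            # A single tag name: map it to the default.
            built[portion] = default
    return built


# Mirrors the MinimalSoup class variables listed above.
reset_nesting_tags = build_tag_map('noscript')  # only a default, nothing merged
nestable_tags = {}

# For contrast, a map that does carry entries (tag names chosen for illustration).
self_closing_tags = build_tag_map(None, ('br', 'hr', 'img'))

print(reset_nesting_tags)
print(sorted(self_closing_tags))
```

With both maps empty, a parser that consults them finds no tag that is allowed to nest and no tag that resets nesting, which matches the class description. A subclass can then supply only the nesting rules it actually needs.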

Properties

Inherited from object: __class__