Spiders the web pages of the Studentenwerk, starting from the sitemap at
http://www.studentenwerk.uni-freiburg.de/index.php?id=272.
Finds all URLs in the content region of the sitemap's HTML and recurses to a level of DEPTH. It currently generates an XMLEntry only for pages that have a content-type of u'text/html'; indexing of PDF, DOC, PS, XSL, and PPT is not yet supported.
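The crawl described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the `LinkCollector` and `crawl` names, the `fetch` callback, the in-memory `SITE` dictionary, and the assumption that the content region is a `<div id="content">` are all hypothetical. The sketch also assumes the sitemap page itself counts as level 0 and is indexed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

DEPTH = 2  # hypothetical value; the real DEPTH is defined by the plugin


class LinkCollector(HTMLParser):
    """Collect href targets of <a> tags inside <div id="content"> (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._div_depth = 0  # greater than 0 while inside the content div

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            # Track nesting so inner divs do not end the content region early.
            if self._div_depth or attrs.get("id") == "content":
                self._div_depth += 1
        elif tag == "a" and self._div_depth and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self._div_depth:
            self._div_depth -= 1


def crawl(start_url, fetch, max_depth=DEPTH, wanted_type="text/html"):
    """Breadth-first crawl; return URLs of pages with the wanted content type."""
    seen = {start_url}
    frontier = [start_url]
    indexed = []
    for level in range(max_depth + 1):
        next_frontier = []
        for url in frontier:
            response = fetch(url)  # expected: (content_type, body) or None
            if response is None:
                continue
            content_type, body = response
            if content_type != wanted_type:
                continue  # PDF, DOC, PS, ... are skipped
            indexed.append(url)
            if level == max_depth:
                continue  # do not follow links beyond the depth limit
            parser = LinkCollector()
            parser.feed(body)
            for href in parser.links:
                target = urljoin(url, href)
                if target not in seen:
                    seen.add(target)
                    next_frontier.append(target)
        frontier = next_frontier
    return indexed


# Tiny in-memory "website" standing in for real HTTP requests.
SITE = {
    "http://example.org/sitemap": (
        "text/html",
        '<div id="nav"><a href="/ignored">x</a></div>'
        '<div id="content"><a href="/a">A</a><a href="/menu.pdf">PDF</a></div>',
    ),
    "http://example.org/a": (
        "text/html",
        '<div id="content"><a href="/b">B</a><a href="/sitemap">back</a></div>',
    ),
    "http://example.org/b": ("text/html", '<div id="content"><a href="/c">C</a></div>'),
    "http://example.org/c": ("text/html", "<p>too deep</p>"),
    "http://example.org/menu.pdf": ("application/pdf", ""),
}

result = crawl("http://example.org/sitemap", SITE.get, max_depth=2)
```

With `max_depth=2`, the sketch visits the sitemap, `/a`, and `/b`; the PDF is fetched but not indexed, the link in the navigation div is ignored, and `/c` lies beyond the depth limit.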
_index_spider = None
An instance of IndexSpider, which will handle the spidering.
Initialize the plugin. An IndexSpider instance with the appropriate initial values is created and assigned to _index_spider. During indexing we only want to consider the
|
We request page data from the IndexSpider until it runs out of new pages. Each page coming from the spider is guaranteed to have a MIME type we requested and to be unique (no duplicates), so we can focus on extracting the content and generating one XMLEntry per page.
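The indexing loop described above might look roughly like this. It is only a sketch under stated assumptions: the real IndexSpider and XMLEntry interfaces are not shown in this documentation, so `fake_spider`, `TextExtractor`, `build_entry`, and `index` are hypothetical stand-ins, and an `xml.etree.ElementTree` element takes the place of an XMLEntry.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text chunks of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def build_entry(url, html):
    """Build one XML element per page (hypothetical stand-in for XMLEntry)."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    entry = ET.Element("entry")
    ET.SubElement(entry, "url").text = url
    ET.SubElement(entry, "content").text = " ".join(extractor.chunks)
    return entry


def index(spider):
    """Drain the spider and return one entry per page.

    The spider is assumed to yield unique (url, html) pairs of the
    requested MIME type, so no filtering or de-duplication happens here.
    """
    return [build_entry(url, html) for url, html in spider]


def fake_spider():
    """Hypothetical stand-in for IndexSpider: yields pages until exhausted."""
    yield "http://example.org/a", "<h1>Mensa</h1><p>Opening hours</p>"
    yield "http://example.org/b", "<p>Housing</p>"


entries = index(fake_spider())
first_xml = ET.tostring(entries[0], encoding="unicode")
```

Because the spider already guarantees uniqueness and the MIME type, the loop body stays limited to content extraction and entry generation.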
Generated by Epydoc 3.0.1 on Thu Sep 16 13:42:03 2010 | http://epydoc.sourceforge.net |