URL stack for spidering websites.
This is a simple stack that takes an additional argument: the level of the parent document of the element being pushed. If the level of the new element would exceed the limit, the element is not added. The stack also keeps track of popped elements and refuses to add any element that has already been on the stack.
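A minimal Python sketch of the behaviour described above. It assumes that an element pushed from a parent document at level n gets level n + 1, and that it is accepted only while that level is at most the maximum level. The method names `push`, `pop`, `__len__` and `checked_urls` are hypothetical, so read this as an illustration of the stack's rules, not as the module's actual code.

```python
from __future__ import annotations


class URLStack:
    """Depth-limited LIFO stack of URLs that refuses any URL seen before.

    Illustrative sketch: the method names and the rule that a linked page
    sits one level below its parent are assumptions, not the documented API.
    """

    def __init__(self, max_level: int) -> None:
        self._urls_info: list[dict] = []     # entries of the form {'url': url, 'level': level}
        self._urls: list[str] = []           # the same URLs, without level information
        self._checked_urls: set[str] = set() # URLs that have already been popped
        self._max_level = max_level          # deepest level accepted on the stack

    def push(self, url: str, parent_level: int) -> bool:
        """Push url found in a document at parent_level; return True if it was added."""
        level = parent_level + 1  # assumption: a link is one level below its parent
        if level > self._max_level:
            return False  # would exceed the depth limit
        if url in self._urls or url in self._checked_urls:
            return False  # on the stack now, or was on it in the past
        self._urls_info.append({"url": url, "level": level})
        self._urls.append(url)
        return True

    def pop(self) -> dict:
        """Remove and return the top entry; raises IndexError if the stack is empty."""
        info = self._urls_info.pop()
        self._urls.pop()                      # keep the URL-only stack in sync
        self._checked_urls.add(info["url"])   # remember it so it is never re-added
        return info

    def __len__(self) -> int:
        return len(self._urls_info)

    def checked_urls(self) -> set[str]:
        """Return a copy of the set of URLs that have been popped."""
        return set(self._checked_urls)
```

For example, with `max_level=2`, a URL pushed from a level-0 document is stored at level 1; pushing the same URL again is refused, both while it is still on the stack and after it has been popped.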
Instance Methods

| Return type | Summary |
|---|---|
| | Initialize the URLStack. |
| | Push the URL on top of the stack, if its level does not exceed the limit and it has not been on the stack before. |
| dict | Pop the top element from the stack. |
| int | Make |
| list of string | Return the set of checked_urls. |

Instance Variables

| Type | Name | Default | Description |
|---|---|---|---|
| list of dict | _urls_info | None | A list of dictionaries of the form {u'url': url, u'level': level}. |
| list | _urls | None | The stack of only the URLs, without the level information. |
| set | _checked_urls | None | A set of URLs that have already been on the stack. |
| int | _max_level | -1 | The maximum depth to which URLs are accepted on the stack. |
_urls: The stack of only the URLs, without the level information. It is required to check whether a URL has been on the stack before.
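The duplicate check that `_urls` supports can be sketched as a single predicate: a URL is refused if it is still on the stack (in `_urls`) or was popped earlier (in `_checked_urls`). The helper name `has_been_on_stack` is hypothetical, and splitting the check across these two containers is an assumption drawn from the variable descriptions above.

```python
from __future__ import annotations


def has_been_on_stack(url: str, urls: list[str], checked_urls: set[str]) -> bool:
    """Hypothetical helper: True if url is on the stack now or was on it before.

    urls plays the role of _urls (URLs currently on the stack, no levels);
    checked_urls plays the role of _checked_urls (URLs already popped).
    """
    # A flat list of strings lets `in` compare URLs directly instead of
    # reading the 'url' field of every dict in _urls_info; still O(len(urls)).
    if url in urls:
        return True
    # Set membership is an average O(1) hash lookup.
    return url in checked_urls
```

Keeping a URL-only list avoids scanning the level dictionaries on every push; if the stack grows large, an extra set mirroring `_urls` would make the first test constant-time as well.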
Generated by Epydoc 3.0.1 on Thu Sep 16 13:42:04 2010 (http://epydoc.sourceforge.net).