Package buildxml :: Package plugins :: Module portal :: Class SyncPlugin_portal

Class SyncPlugin_portal


This is a generic plugin that incrementally queries Plone portals for changes in their portal_catalog. xmlgetter.controller instantiates it once for every entry in config.PORTALS and then calls its run() method, which is inherited from the base class.

Note: If you change the class name, you must also change it in config.PORTAL_PLUGIN_NAME. It also has to match the module's name!

Instance Methods
 
__init__(self, source_name, url, NO_NET=False)
Initialize the plugin and the State.
bool _loadState(self)
Load the state of the plugin from its state file.
bool _writeState(self)
Write the state of the plugin to its state file.
bool _getData(self)
Incrementally get all data from the Plone portal.
bool _consolidate(self)
This function consolidates the existing data for the portal and the new entries fetched by _getData.

Inherited from xmlgetter.plugin.BaseSyncPlugin: entries_written, run, source_name, stats, url

Inherited from xmlgetter.request.BaseRequester (private): _requestURL

Inherited from xmlgetter.log.BaseLogger: logger

Inherited from xmlgetter.log.BaseLogger (private): _getLogger

Class Variables

Inherited from xmlgetter.log.BaseLogger (private): _loggers

Instance Variables
PortalSourceState _state = None
The state of the plugin.
datetime _updated = None
Stores the date and time of last update.

Inherited from xmlgetter.log.BaseLogger (private): _source_name

Method Details

__init__(self, source_name, url, NO_NET=False)
(Constructor)

Initialize the plugin and the State. Also set some variables for use in this plugin.

Parameters:
  • source_name (string) - The name of the source as displayed in logfile entries and statistics.
  • url (string) - The URL for the script remoteSyncQueryXML in the Plone portal.
  • NO_NET (bool) - If True, do not get data from the net; use the data from a previous run instead, if available.
Overrides: xmlgetter.log.BaseLogger.__init__

_loadState(self)

Load the state of the plugin from its state file.

Returns: bool
True if state could be loaded, False otherwise.
Overrides: xmlgetter.plugin.BaseSyncPlugin._loadState

_writeState(self)

Write the state of the plugin to its state file.

Returns: bool
True if state could be successfully written.
Overrides: xmlgetter.plugin.BaseSyncPlugin._writeState

_getData(self)

Incrementally get all data from the Plone portal.

Queries are sent to the portal until the response consists only of the string u'END'. Each query requests the next batch of entries; the batch size is set in config.PORTAL_REQUEST_INCREMENT. If a request fails, the plugin waits for the number of seconds set in config.PORTAL_RETRY_WAIT and retries, up to config.MAX_PORTAL_RETRIES times.

The result of the queries is written to _intermediate_temp_filename and, on success, copied to _temp_filename, which is then processed by self._consolidate.
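
The query loop described here can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the function fetch_all, the request callable and the three constants are hypothetical stand-ins for the config values, and the sketch assumes that MAX_PORTAL_RETRIES counts retries after the first attempt. Writing to the intermediate file is left out.

```python
import time
from typing import Callable, Iterator, Optional

# Hypothetical stand-ins for the config values named in the text.
PORTAL_REQUEST_INCREMENT = 100   # entries requested per query
PORTAL_RETRY_WAIT = 5            # seconds to wait before a retry
MAX_PORTAL_RETRIES = 3           # retries after the first failed attempt (assumption)


def fetch_all(request: Callable[[int, int], Optional[str]],
              sleep: Callable[[float], None] = time.sleep) -> Iterator[str]:
    """Yield response chunks until the portal answers with only 'END'.

    request(start, count) is a hypothetical callable that returns the
    response text, or None when the request failed.
    """
    start = 0
    while True:
        response = None
        for attempt in range(1 + MAX_PORTAL_RETRIES):
            response = request(start, PORTAL_REQUEST_INCREMENT)
            if response is not None:
                break
            if attempt < MAX_PORTAL_RETRIES:
                # Wait before the next retry; no wait after the last attempt.
                sleep(PORTAL_RETRY_WAIT)
        if response is None:
            raise RuntimeError("portal did not answer after all retries")
        if response == "END":
            return
        yield response
        start += PORTAL_REQUEST_INCREMENT
```

Passing sleep as a parameter is only a convenience for trying the loop without real waiting; the real plugin may handle failures differently, for example by returning False instead of raising.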

Returns: bool
False if an error or warning occurred, True otherwise.
Overrides: xmlgetter.plugin.BaseSyncPlugin._getData

_consolidate(self)

This function consolidates the existing data for the portal and the new entries fetched by _getData.

There are four possibilities:

  1. The entry is totally new:
    • It has content
    • It has no entry in the portal's state data
  2. The entry was modified:
    • It has content
    • It has an entry in the portal's state data
  3. The entry is not new and was not modified, a stub:
    • It has no content
    • It has an entry in the portal's state data
  4. The entry is static:
    • It has content
    • It has a <static /> tag

In the first two cases, the newly fetched entry is written to the file _intermediate_xml_filename, which on success is moved to _xml_filename.

In the third case, the entry is copied from the old data. To do this efficiently, the plugin saves a "URL to position and length" mapping in its state. To read an entry from the old data, we only have to seek to the right position in the old data's file and copy the specified number of bytes over to the new data's file. Since the entries sent by the portal are sorted in ascending order by modification time and are always processed in that order, we never have to seek backwards in the old data's file, which improves performance.
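
The seek-and-copy approach can be illustrated with a short sketch. It is an assumption-laden example, not the plugin's implementation: the function consolidate, the (url, content) pair representation with None marking a stub, and the in-memory files are all hypothetical, and static entries are not handled.

```python
import io
from typing import BinaryIO, Dict, Iterable, Optional, Tuple

Index = Dict[str, Tuple[int, int]]  # URL -> (position, length)


def consolidate(entries: Iterable[Tuple[str, Optional[bytes]]],
                old_data: BinaryIO, old_index: Index,
                new_data: BinaryIO) -> Index:
    """Write new or modified entries directly and copy stubs from old_data.

    entries yields (url, content) pairs; content is None for a stub.
    Returns the URL -> (position, length) mapping for new_data.
    """
    new_index: Index = {}
    for url, content in entries:
        if content is None:
            # Stub: reuse the bytes stored during the previous run.
            position, length = old_index[url]
            old_data.seek(position)
            content = old_data.read(length)
        # Record where this entry lands in the new file for the next run.
        new_index[url] = (new_data.tell(), len(content))
        new_data.write(content)
    return new_index


# Usage with in-memory files standing in for the old and new data files.
old = io.BytesIO(b"<a/><bb/>")
new = io.BytesIO()
index = consolidate([("u1", None), ("u3", b"<new/>"), ("u2", None)],
                    old, {"u1": (0, 4), "u2": (4, 5)}, new)
```

A stub whose URL is missing from the old mapping raises a KeyError in this sketch, which corresponds to the false stub problem discussed in the To Do note for this method.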

Returns: bool
False if an error or warning occurred, True otherwise.
Overrides: xmlgetter.plugin.BaseSyncPlugin._consolidate

To Do: This can be serious: If an object in the portal reports a wrong modification date, this could lead to the following case. Object A does not exist when the first run is started. On the second run, object A exists but reports a modification date earlier than that of the first run, so it is sent as a stub, although the data of A has never been indexed before!

If an entry is modified during the update and has already been acquired as being unmodified, there is a slim chance that the entry appears both as modified and as a stub. If an entry is deleted during the update, there is a chance of receiving a duplicate or missing an entry. This case could also happen if the modification date is not used properly in remoteSyncQueryXML - investigate!

This could be solved by discarding incomplete entries, issuing a warning about this with the full URL of the entry, and doing a full run every week or so. Or collect all URLs of the false stub entries and request them afterwards one by one; this would require changes in remoteSyncQueryXML. The second option sounds best. This has no priority at the moment, since the chances of this occurring are very slim.