This function consolidates the existing data for the portal and the
new entries fetched by _getData.
There are four possibilities:
- The entry is totally new:
  - It has content
  - It has no entry in the portal's state data
- The entry was modified:
  - It has content
  - It has an entry in the portal's state data
- The entry is not new and was not modified (a stub):
  - It has no content
  - It has an entry in the portal's state data
- The entry is static:
  - It has content
  - It has a <static /> tag
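The four cases can be read as a small decision table. The following is only a sketch of that table, not the plugin's actual code: the names EntryKind and classify_entry and the three boolean parameters are hypothetical, and the real implementation may derive these facts differently.

```python
from enum import Enum


class EntryKind(Enum):
    # Hypothetical names for the four documented cases.
    NEW = "new"
    MODIFIED = "modified"
    STUB = "stub"
    STATIC = "static"


def classify_entry(has_content, in_state, is_static):
    """Map the observations about one fetched entry to one of the four cases.

    has_content: the fetched entry carries content
    in_state:    the portal's state data already has an entry for it
    is_static:   the fetched entry carries a <static /> tag
    All three parameters are assumptions made for this illustration.
    """
    if is_static:
        return EntryKind.STATIC
    if has_content:
        return EntryKind.MODIFIED if in_state else EntryKind.NEW
    if in_state:
        return EntryKind.STUB
    # No content and no state entry: none of the four documented cases.
    raise ValueError("stub received for an entry with no state data")
```

An entry without content and without state data falls outside the four cases, so the sketch raises an error for it rather than guessing.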
In the first two cases, the newly fetched entry is written to the file
_intermediate_xml_filename, which is moved to _xml_filename on success.
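The write-then-move step might look roughly like the sketch below. It assumes the entries arrive as byte chunks and that both files are on the same filesystem. The function name write_then_publish and its parameters are hypothetical, and the plugin itself may do the move differently.

```python
import os


def write_then_publish(chunks, intermediate_path, final_path):
    """Write all chunks to intermediate_path, then move it to final_path.

    If writing fails, the exception propagates and final_path is left
    untouched, so readers never see a half-written file.
    """
    with open(intermediate_path, "wb") as out:
        for chunk in chunks:
            out.write(chunk)
    # Only reached if every write succeeded. os.replace overwrites an
    # existing final_path.
    os.replace(intermediate_path, final_path)
```

Writing to a separate file first means a failed run leaves the previous data file as it was.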
In the third case, the entry is copied from the old data. To do this
efficiently, the plugin saves a "URL to position and length"
mapping in its state. So to read an entry from the old data, we only have
to seek to the right position in the old data's file and copy the
specified number of bytes over to the new data's file. Since the entries
that are sent by the portal are sorted ascending by modification time, and
are always processed in that order, we never have to seek backwards
in the old data's file, which improves performance.
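A minimal sketch of this seek-and-copy idea follows. It assumes the mapping is a dict from URL to a (position, length) tuple in bytes and that both files are opened in binary mode. The function name copy_unchanged_entries and the sample data are hypothetical, not taken from the plugin.

```python
import io


def copy_unchanged_entries(old_file, new_file, index, urls):
    """Copy the entries for urls from old_file to new_file.

    index maps URL -> (position, length) in old_file, both in bytes.
    Returns a new mapping with each entry's position in new_file.
    """
    new_index = {}
    for url in urls:
        position, length = index[url]
        old_file.seek(position)
        data = old_file.read(length)
        if len(data) != length:
            raise IOError("old data is shorter than the index claims: %s" % url)
        # Record where the entry lands in the new file for the next run.
        new_index[url] = (new_file.tell(), length)
        new_file.write(data)
    return new_index


# Usage with in-memory files standing in for the old and new data files.
old = io.BytesIO(b"<a/><bb/><ccc/>")
index = {"u1": (0, 4), "u2": (4, 5), "u3": (9, 6)}
new = io.BytesIO()
new_index = copy_unchanged_entries(old, new, index, ["u1", "u3"])
```

If the URLs are passed in the same order in which the entries were originally written, each seek moves forward only, which matches the ordering argument above.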
- Returns: bool
False if an error or warning occurred,
True otherwise.
- Overrides:
xmlgetter.plugin.BaseSyncPlugin._consolidate
To Do:
A stub can arrive for an entry that has never been indexed, which can
be serious. This happens if an object in the portal reports a wrong
modification date. For example, object A does not exist when the first
run is initialized. On the second run, object A exists, but reports a
modification date earlier than that of the first run, so it is sent as
a stub, although the data of A has never been indexed before. The same
could happen if the modification date is not used properly in
remoteSyncQueryXML; this needs investigation.
There are two related cases. If an entry is modified during the update
and has already been acquired as unmodified, there is a slim chance
that the entry appears both as modified and as a stub. If an entry is
deleted during the update, there is a chance of receiving a duplicate
or missing an entry.
This could be solved by discarding incomplete entries, issuing a
warning with the full URL of the entry, and doing a full run every week
or so. Alternatively, collect the URLs of all false stub entries and
request them afterwards one by one, which would require changes in
remoteSyncQueryXML. The second option sounds best. This has no priority
at the moment, since the chances of this occurring are very slim.