Class SyncPlugin_forschdb

Finds all publications in the Forschungsdatenbank from all 11 faculties and the Universitätsklinikum.

The plugin needs a template URL entry in config.PLUGINS:

       ...
       {u'name': u'forschdb',
           u'url':
               (u'http://forschdb.verwaltung.uni-freiburg.de/servuni/'
               u'forschdbuni.fdbfbr1?Fakultaet=${fac}&Dokumentart='
               u'Publikation&Ausgabeart=xml&Jahr=1900-${to_year}')},
       ...

It will then replace ${to_year} with the current year and generate a list of 13 URLs, replacing ${fac} once with 99 and the other 12 times with the values 0 through 11. Each of these URLs is then queried and returns an XML document with all publication entries for the faculty fac from the year 1900 until now.
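The URL expansion described above can be sketched as follows. This is a minimal illustration, assuming the ${...} placeholders are filled in with Python's string.Template; the helper name build_urls and the ordering of the fac values are assumptions, not taken from the plugin source.

```python
from datetime import date
from string import Template

# Template URL as it appears in the config.PLUGINS entry above.
URL_TEMPLATE = (
    "http://forschdb.verwaltung.uni-freiburg.de/servuni/"
    "forschdbuni.fdbfbr1?Fakultaet=${fac}&Dokumentart="
    "Publikation&Ausgabeart=xml&Jahr=1900-${to_year}"
)


def build_urls(template, to_year=None):
    """Return 13 URLs: one for fac=99, then one each for fac=0..11.

    Hypothetical helper; the real plugin may build its list differently.
    """
    if to_year is None:
        to_year = date.today().year  # "current year" as described above
    tmpl = Template(template)
    fac_values = [99] + list(range(12))  # 99 once, then 0 through 11
    return [tmpl.substitute(fac=fac, to_year=to_year) for fac in fac_values]


# Fixed year here only to keep the example reproducible.
urls = build_urls(URL_TEMPLATE, to_year=2010)
```

Passing to_year explicitly is only a convenience for testing; leaving it out falls back to the current year.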

The contents of each <publication> entry are then parsed with BeautifulSoup and an XMLEntry is produced. The content of the XMLEntry is produced according to common citation rules, which presently distinguish five different types of publications:

"Buchbeitrag"
"Monografie und Herausgeberschrift"
"Edition und Uebersetzung"
"Sonstiges"
"Artikel"

"Artikel" will be the catch-all for unknown types of publications, of which there are presently none.
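The catch-all behaviour can be illustrated with a small lookup. This is only a sketch: the function name citation_type and the way the raw type string is obtained are assumptions; only the five type names come from the list above.

```python
# The five publication types named in the documentation above.
KNOWN_TYPES = (
    "Buchbeitrag",
    "Monografie und Herausgeberschrift",
    "Edition und Uebersetzung",
    "Sonstiges",
    "Artikel",
)


def citation_type(raw_type):
    """Map a raw type string to a known type, defaulting to "Artikel".

    Hypothetical helper: unknown, empty, or missing types all fall
    through to the "Artikel" citation rules.
    """
    cleaned = (raw_type or "").strip()
    return cleaned if cleaned in KNOWN_TYPES else "Artikel"
```

For example, citation_type("Buchbeitrag") keeps its own rule set, while an unrecognised value such as "Poster" would be cited like an "Artikel".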

To Do: Have a look at memory consumption and optimize!

Notes:

All authors of a publication are listed in the content of the XMLEntry to make the entry easier to find. This does not, however, conform to standard citation rules. Since the Forschungsdatenbank also uses this citation style for authors, this increases coherence.
The function _getData uses German variable names to be coherent with the naming of the XML elements it processes.

Instance Methods

_extractTagData(self, tag, tagname=None)
Extracts data from a BeautifulSoup.Tag instance.

_extractAuthor(self, tag)
Extracts author data from a BeautifulSoup.Tag instance.

_getData(self)
Gets the data from the Forschungsdatenbank. Returns bool.

Inherited from xmlgetter.plugin.BaseSyncPlugin: __init__, entries_written, run, source_name, stats, url

Inherited from xmlgetter.plugin.BaseSyncPlugin (private): _consolidate, _loadState, _writeEntries, _writeState

Inherited from xmlgetter.request.BaseRequester (private): _requestURL

Inherited from xmlgetter.log.BaseLogger: logger

Inherited from xmlgetter.log.BaseLogger (private): _getLogger

Class Variables

Inherited from xmlgetter.log.BaseLogger (private): _loggers

Instance Variables

Inherited from xmlgetter.plugin.BaseSyncPlugin (private): _NO_NET, _base_url, _entries, _entries_written, _from_date, _intermediate_temp_filename, _intermediate_xml_filename, _stats, _temp_filename, _url, _xml_filename

Inherited from xmlgetter.log.BaseLogger (private): _source_name

Method Details

_extractTagData(self, tag, tagname=None)

Extracts data from a BeautifulSoup.Tag instance.

Parameters:

tag (BeautifulSoup.Tag) - The root tag from which to start searching for the data.
tagname (string) - The name of the tag that contains the data. If tagname is None, the data is extracted from tag itself.

Returns:

A string representing the found data, or None if no data could be found.
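The contract above (search below a root tag, or read the tag itself when tagname is None, and return None when nothing is found) might look roughly like the sketch below. To stay self-contained it uses the standard library's xml.etree.ElementTree instead of BeautifulSoup, so it is not the plugin's implementation; the element names titel, jahr and verlag are made up for illustration.

```python
import xml.etree.ElementTree as ET


def extract_tag_data(tag, tagname=None):
    """Return the stripped text of `tag`, or of its first descendant
    named `tagname`; return None if no data could be found.

    Stand-in for _extractTagData, written against ElementTree rather
    than BeautifulSoup.
    """
    node = tag if tagname is None else tag.find(".//" + tagname)
    if node is None:
        return None
    text = "".join(node.itertext()).strip()
    return text or None  # empty elements count as "no data"


# Hypothetical <publication> entry; real element names may differ.
publication = ET.fromstring(
    "<publication><titel> Ein Beispiel </titel><jahr></jahr></publication>"
)
title = extract_tag_data(publication, "titel")
```

Treating an empty element the same as a missing one keeps the caller's None check simple; whether the real method does this is an assumption.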

_extractAuthor(self, tag)

Extracts author data from a BeautifulSoup.Tag instance.

Parameters:

tag (BeautifulSoup.Tag) - The root tag from which to start searching for the data.

Returns:

All authors of the publication concatenated and separated by a ','. If no author could be found, None is returned.
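The return value described above can be sketched as a simple join. As before, this uses xml.etree.ElementTree as a stand-in for BeautifulSoup, and the element name autor is a guess made for the example, not something confirmed by the source.

```python
import xml.etree.ElementTree as ET


def extract_authors(tag, author_tagname="autor"):
    """Return all author names below `tag` joined by ',', or None.

    Stand-in for _extractAuthor; `author_tagname` is hypothetical.
    """
    names = []
    for node in tag.iter(author_tagname):
        name = "".join(node.itertext()).strip()
        if name:  # skip empty author elements
            names.append(name)
    return ",".join(names) if names else None


# Hypothetical entries for illustration only.
with_authors = ET.fromstring(
    "<publication><autor>A. Meier</autor><autor>B. Schulz</autor></publication>"
)
without_authors = ET.fromstring("<publication><titel>X</titel></publication>")
```

With these inputs, the first entry yields a single comma-separated string and the second yields None, matching the documented behaviour.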

_getData(self)

Gets the data from the Forschungsdatenbank.

Retrieves all publications for each faculty and the university hospital.

Uses German variable names to be coherent with the XML data retrieved.

Returns: bool: False if an error or warning occurred, True otherwise.
Overrides: xmlgetter.plugin.BaseSyncPlugin._getData