
Class SyncPlugin_forschdb


Finds all publications in the Forschungsdatenbank from all 11 faculties and the Universitätsklinikum.

The plugin needs a template URL entry in config.PLUGINS:

       ...
       {u'name': u'forschdb',
           u'url':
               (u'http://forschdb.verwaltung.uni-freiburg.de/servuni/'
               u'forschdbuni.fdbfbr1?Fakultaet=${fac}&Dokumentart='
               u'Publikation&Ausgabeart=xml&Jahr=1900-${to_year}')},
       ...

The plugin then replaces ${to_year} with the current year and generates a list of 13 URLs, replacing ${fac} once with 99 and the other 12 times with the values from 0 to 11. Each URL is then queried and returns an XML document with all publication entries for the faculty fac from the year 1900 until now.
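The URL expansion described above can be sketched with the standard library's string.Template, which uses the same ${name} placeholder syntax. This is only an illustration, not the plugin's actual code: the helper name build_urls is hypothetical, and it assumes that "the range (0, 11)" means the 12 values 0 to 11 inclusive.

```python
from datetime import date
from string import Template

# Template string copied from the config.PLUGINS entry above.
URL_TEMPLATE = Template(
    u'http://forschdb.verwaltung.uni-freiburg.de/servuni/'
    u'forschdbuni.fdbfbr1?Fakultaet=${fac}&Dokumentart='
    u'Publikation&Ausgabeart=xml&Jahr=1900-${to_year}'
)


def build_urls(to_year=None):
    """Return the list of query URLs (hypothetical helper, not part of the plugin)."""
    if to_year is None:
        to_year = date.today().year
    # Assumption: 99 once, then the 12 values 0..11 inclusive, 13 URLs in total.
    fac_values = [99] + list(range(12))
    return [URL_TEMPLATE.substitute(fac=fac, to_year=to_year)
            for fac in fac_values]


urls = build_urls(to_year=2010)
```

Passing to_year explicitly keeps the result reproducible; leaving it out falls back to the current year, as the description states.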

The contents of each <publication> entry are then parsed with BeautifulSoup and an XMLEntry is produced. The content of the XMLEntry is generated according to common citation rules, which presently distinguish five different types of publications:

"Artikel" will be the catch-all for unknown types of publications, of which there are presently none.


To Do: Have a look at memory consumption and optimize!

Instance Methods

  • _extractTagData(self, tag, tagname=None) - Extracts data from a BeautifulSoup.Tag instance.
  • _extractAuthor(self, tag) - Extracts author data from a BeautifulSoup.Tag instance.
  • _getData(self), returns bool - Gets the data from the Forschungsdatenbank.

Inherited from xmlgetter.plugin.BaseSyncPlugin: __init__, entries_written, run, source_name, stats, url

Inherited from xmlgetter.request.BaseRequester (private): _requestURL

Inherited from xmlgetter.log.BaseLogger: logger

Inherited from xmlgetter.log.BaseLogger (private): _getLogger

Class Variables

Inherited from xmlgetter.log.BaseLogger (private): _loggers

Instance Variables

Inherited from xmlgetter.log.BaseLogger (private): _source_name

Method Details

_extractTagData(self, tag, tagname=None)

Extracts data from a BeautifulSoup.Tag instance.

Parameters:
  • tag (BeautifulSoup.Tag) - The root tag from which to search for the data.
  • tagname (string) - The name of the tag that contains the data. If tagname is None, the data will be extracted from tag itself.
Returns:
A string representing the found data, or None if no data could be found.
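The contract described above (search below a root tag, or read the tag itself when tagname is None, and return None when nothing is found) might look roughly like the following. Note that this sketch swaps in the standard library's xml.etree.ElementTree in place of BeautifulSoup so that it runs without extra dependencies; the function name extract_tag_data and the element names titel and jahr are hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_tag_data(tag, tagname=None):
    """Return the stripped text of `tag`, or of its first `tagname` descendant.

    Returns None if the element is missing or contains no text.
    ElementTree stand-in for the BeautifulSoup-based method, not the original.
    """
    node = tag if tagname is None else tag.find('.//' + tagname)
    if node is None:
        return None
    text = ''.join(node.itertext()).strip()
    return text or None


# Hypothetical sample entry; the real element names are not given in this page.
entry = ET.fromstring(
    '<publication><titel> Ein Beispiel </titel><jahr/></publication>'
)
title = extract_tag_data(entry, 'titel')
```

Treating an empty element the same as a missing one keeps the caller's check down to a single comparison against None.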

_extractAuthor(self, tag)

Extracts author data from a BeautifulSoup.Tag instance.

Parameters:
  • tag (BeautifulSoup.Tag) - The root tag from which to search for the data.
Returns:
All authors of the publication concatenated and separated by a ','. If no author could be found, None is returned.
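The return value described above could be produced along these lines. As an assumption for illustration, the sketch uses xml.etree.ElementTree instead of BeautifulSoup and a hypothetical element name autor; the actual element names in the Forschungsdatenbank XML are not stated here.

```python
import xml.etree.ElementTree as ET


def extract_author(tag, author_tagname='autor'):
    """Join the text of all author elements below `tag` with ','.

    Returns None if no author is found. `author_tagname` is a
    hypothetical element name chosen for this sketch.
    """
    names = []
    for node in tag.iter(author_tagname):
        text = ''.join(node.itertext()).strip()
        if text:
            names.append(text)
    return ','.join(names) if names else None


# Hypothetical sample entries.
with_authors = ET.fromstring(
    '<publication><autor>A. Muster</autor><autor>B. Beispiel</autor></publication>'
)
without_authors = ET.fromstring('<publication><titel>Ohne Autor</titel></publication>')
authors = extract_author(with_authors)
```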

_getData(self)

Gets the data from the Forschungsdatenbank.

Retrieves all publications for each faculty and the university hospital.

Uses German variable names to be consistent with the XML data retrieved.

Returns: bool
False if an error or warning occurred, True otherwise.
Overrides: xmlgetter.plugin.BaseSyncPlugin._getData