**Extract and Analyze Scientists' Homepages Utilizing Common Crawl**
====================================================================

This page provides information and documentation about a bachelor project (6 ECTS) carried out in the summer semester 2017 at the
`Chair for Algorithms and Data Structures `_,
`Department of Computer Science `_,
`University of Freiburg `_, headed by
`Hannah Bast `_.

Project description
-------------------

The goal of this project is to use the open web crawl archive of `Common Crawl `_ to
identify scientists' personal web pages and then to extract structured data from those pages, such as the
scientist's name, profession, affiliation, and gender.
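As an illustration of the extraction step, the following minimal Python sketch pulls a name, profession, and affiliation out of a homepage snippet using only the standard library. The sample HTML, the chosen fields, and the comma-split heuristic are assumptions made for illustration; they are not the project's actual extraction method.

```python
from html.parser import HTMLParser

# Hypothetical sample of a scientist's homepage (illustrative only).
SAMPLE_HTML = """
<html><head><title>Dr. Jane Doe - Homepage</title></head>
<body><h1>Dr. Jane Doe</h1>
<p>Professor of Computer Science, University of Freiburg</p>
</body></html>
"""


class FieldCollector(HTMLParser):
    """Collect the first text content seen inside <title>, <h1>, and <p> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1", "p"):
            self._current = tag

    def handle_data(self, data):
        if self._current and data.strip():
            self.fields.setdefault(self._current, data.strip())
            self._current = None


def extract_record(html):
    """Return a structured record {name, profession, affiliation} from raw HTML."""
    parser = FieldCollector()
    parser.feed(html)
    name = parser.fields.get("h1", "")
    # Naive heuristic (assumption): a <p> of the form "profession, affiliation".
    profession, _, affiliation = parser.fields.get("p", "").partition(", ")
    return {"name": name, "profession": profession, "affiliation": affiliation}


print(extract_record(SAMPLE_HTML))
# → {'name': 'Dr. Jane Doe', 'profession': 'Professor of Computer Science',
#    'affiliation': 'University of Freiburg'}
```

In the actual project, the input HTML would come from Common Crawl's archived pages rather than an inline string, and the extraction would need far more robust heuristics than a single comma split.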

What is covered in this project page?
-------------------------------------

This documentation page describes the approach, results, and code produced in the project. It should enable
you to reproduce the results, use them as a starting point for your own work, or simply learn how certain
parts work.
For an overview, read the :ref:`sec-experiments` section, which summarizes all the steps and results of this
project and links to more detailed documentation for each step.

Contents
--------

.. toctree::
   :maxdepth: 3

   experiments
   common_crawl
   software_requirements