**Extract and Analyze Scientists' Homepages Utilizing Common Crawl**
====================================================================

This page provides information and documentation about a bachelor project (6 ECTS) done in the summer semester 2017 at the `Chair for Algorithms and Data Structures `_, `Department of Computer Science `_, `University of Freiburg `_, headed by `Hannah Bast `_.

Project description
-------------------

The goal of this project is to use the open web crawl archive of `Common Crawl `_ to find scientists' personal web pages, and then to extract structured data from those pages, such as each scientist's name, profession, affiliation, and gender.

What is covered in this project page?
-------------------------------------

This documentation page describes the approach, the results, and the code produced in the project. It should enable you to reproduce the results, use them as a starting point for your own work, or simply get inspiration about how certain parts work. For an overview, read the :ref:`sec-experiments` section, which summarizes all steps and results of this project and links to more detailed documentation for each step.

Contents
--------

.. toctree::
   :maxdepth: 3

   experiments
   common_crawl
   software_requirements
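To give a flavor of the extraction step mentioned in the project description, the following minimal sketch pulls a name and affiliation out of a homepage's ``<title>`` tag. This is purely illustrative: the ``TitleExtractor`` helper, the sample HTML snippet, and the ``" - "`` split heuristic are assumptions for demonstration, not the project's actual method.

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collect the text inside the <title> tag, which on personal
    homepages often contains the scientist's name (heuristic only)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

# Hypothetical homepage snippet (not real crawl data).
page = "<html><head><title>Jane Doe - University of Freiburg</title></head></html>"

parser = TitleExtractor()
parser.feed(page)

# Split "Name - Affiliation" on the first " - " separator.
name, _, affiliation = parser.title.partition(" - ")
print(name)         # Jane Doe
print(affiliation)  # University of Freiburg
```

In practice such heuristics would run over HTML payloads taken from Common Crawl's archives rather than an inline string, and would need far more robust rules than a single title split.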