Are you an EPFL student looking for a semester project?
Work with us on data science and visualisation projects, and deploy your project as an app on top of GraphSearch.
Digital libraries are libraries whose collections (or at least their metadata) are stored in a digital format. They are now being made publicly available. However, building good user interfaces for querying heterogeneous libraries requires a good knowledge of the kind of information available (e.g. which attributes are useful for filtering). In this project, we harvest (using the Z39.50 and OAI-PMH protocols) and analyze (in terms of useful attributes for querying) four important digital libraries: Nebis (five million items), Infoscience (sixty thousand items), CiteSeer (seven hundred thousand items) and The European Library (one and a half million items).
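
To give an idea of what "analyzing useful attributes" can look like, here is a minimal Python sketch, not the project's actual code. It assumes records were harvested with an OAI-PMH ListRecords request in the unqualified Dublin Core format (oai_dc), and it measures the fraction of records in which each attribute is filled in. The sample response, its identifiers and the function names attribute_coverage and resumption_token are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard namespaces of OAI-PMH 2.0 and of the Dublin Core element set.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical ListRecords response, used here instead of a live repository.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>First item</dc:title>
          <dc:creator>A. Author</dc:creator>
          <dc:date>2009</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Second item</dc:title>
          <dc:language>en</dc:language>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-123</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""


def attribute_coverage(xml_text):
    """Return, for each Dublin Core attribute, the fraction of records that fill it."""
    root = ET.fromstring(xml_text)
    records = root.findall(f".//{OAI_NS}record")
    counts = Counter()
    for record in records:
        present = set()  # count each attribute at most once per record
        for element in record.iter():
            if element.tag.startswith(DC_NS) and (element.text or "").strip():
                present.add(element.tag[len(DC_NS):])
        counts.update(present)
    total = len(records)
    if total == 0:
        return {}
    return {name: count / total for name, count in counts.items()}


def resumption_token(xml_text):
    """Return the token needed to request the next batch, or None on the last batch."""
    root = ET.fromstring(xml_text)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


if __name__ == "__main__":
    for name, share in sorted(attribute_coverage(SAMPLE_RESPONSE).items()):
        print(f"{name}: {share:.0%}")
```

An attribute that is filled in for most records (here the title) is a better candidate for a search filter than one that is rarely present. In a real harvest, the resumption token would be sent back to the repository to fetch the next batch until no token is returned.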