Web scraping | EPFL Graph Search

Related courses (20)

COM-412: Semester research project in Data Science

Individual research during the semester under the guidance of a professor or an assistant.

COM-416: Research project in communication systems II

Individual research during the semester under the guidance of a professor or an assistant.

COM-507: Optional research project in communication systems

Individual research during the semester under the guidance of a professor or an assistant.

Related lectures (6)

Real GDP Tracking

Covers the characterization of Leptazolines A-D and real-time data retrieval methods, web scraping, reverse engineering, and intraday data challenges.

HTTP Basics: Request, Response, HTML

Introduces HTTP basics, HTML, and tools for web scraping.

Semantic Web: Modeling and Ontologies

Explores the Semantic Web, database schemas, XML data model, and ontologies.

Related publications (23)

From scattered sources to comprehensive technology landscape : A recommendation-based retrieval approach

Karl Aberer, Chi Thang Duong

Mapping the technology landscape is crucial for market actors to take informed investment decisions. However, given the large amount of data on the Web and its subsequent information overload, manually retrieving information is a seemingly ineffective and ...

ELSEVIER, 2023

Efficient and Effective Multi-Modal Queries Through Heterogeneous Network Embedding

Karl Aberer, Quoc Viet Hung Nguyen, Thanh Trung Huynh, Thành Tâm Nguyên, Chi Thang Duong

The heterogeneity of today's Web sources requires information retrieval (IR) systems to handle multi-modal queries. Such queries define a user's information needs by different data modalities, such as keywords, hashtags, user profiles, and other media. Rec ...

IEEE COMPUTER SOC, 2022

Wasserstein Adversarial Regularization for learning with label noise

Devis Tuia, Sylvain Lobry, Nicolas Courty

Noisy labels often occur in vision datasets, especially when they are obtained from crowdsourcing or Web scraping. We propose a new regularization method, which enables learning robust classifiers in presence of noisy data. To achieve this goal, we propose ...

2021

Related concepts (6)

Data scraping

Data scraping is a technique where a computer program extracts data from human-readable output coming from another program. Normally, data transfer between programs is accomplished using data structures suited for automated processing by computers, not people. Such interchange formats and protocols are typically rigidly structured, well-documented, easily parsed, and minimize ambiguity. Very often, these transmissions are not human-readable at all.
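
As a rough illustration of the idea, the sketch below scrapes records out of a report formatted for people rather than programs. The report text, its column layout, and the names `REPORT`, `ROW`, and `scrape_report` are all hypothetical, chosen only for this example.

```python
import re

# Hypothetical human-readable report, as another program might print it.
REPORT = """\
Disk usage report
Name        Size   Used
/dev/sda1   100G   42%
/dev/sdb1   250G   7%
"""

# One data row: a name, a size in gigabytes, and a usage percentage.
# This layout is an assumption made for the example.
ROW = re.compile(r"^(?P<name>\S+)\s+(?P<size>\d+)G\s+(?P<used>\d+)%$")


def scrape_report(text):
    """Return one dict per data row; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append({
                "name": match["name"],
                "size_gb": int(match["size"]),
                "used_pct": int(match["used"]),
            })
    return rows


records = scrape_report(REPORT)
```

The title and header lines fail the pattern and are dropped, which shows why this approach is fragile: any change to the display layout can silently break the scraper.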

Microformat

Microformats (μF) are a set of defined HTML classes created to serve as consistent and descriptive metadata about an element, designating it as representing a certain type of data (such as contact information, geographic coordinates, events, blog posts, products, recipes, etc.). They allow software to process the information reliably by having set classes refer to a specific type of data rather than being arbitrary. Microformats emerged around 2005 and were predominantly designed for use by search engines, web syndication and aggregators such as RSS.
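
To make this concrete, here is a minimal sketch that reads contact details from markup using the classic hCard class names (`vcard`, `fn`, `org`, `tel`). The sample HTML and the `HCardParser` class are hypothetical. The parser is deliberately simplified: it assumes flat, non-nested property elements and does not check that they sit inside a `vcard` root.

```python
from html.parser import HTMLParser

# Hypothetical snippet marked up with classic hCard class names.
SAMPLE = """
<div class="vcard">
  <span class="fn">Ada Example</span>
  <span class="org">Example Lab</span>
  <span class="tel">+41 00 000 00 00</span>
</div>
"""


class HCardParser(HTMLParser):
    """Collect the text of elements whose class is a known hCard property."""

    PROPERTIES = {"fn", "org", "tel"}

    def __init__(self):
        super().__init__()
        self.card = {}
        self._current = None  # property whose text is being read, if any

    def handle_starttag(self, tag, attrs):
        # An element may carry several space-separated class names.
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.PROPERTIES:
                self._current = name

    def handle_data(self, data):
        if self._current and data.strip():
            self.card[self._current] = data.strip()

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current property.
        self._current = None


parser = HCardParser()
parser.feed(SAMPLE)
card = parser.card
```

Because the class names are fixed by the format rather than chosen by each site, the same lookup works on any page that uses them, which is the reliability the description refers to.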

Data extraction

Data extraction is the act or process of retrieving data out of (usually unstructured or poorly structured) data sources for further data processing or data storage (data migration). The import into the intermediate extracting system is thus usually followed by data transformation and possibly the addition of metadata prior to export to another stage in the data workflow. Usually, the term data extraction is applied when (experimental) data is first imported into a computer from primary sources, like measuring or recording devices.