**Êtes-vous un étudiant de l'EPFL à la recherche d'un projet de semestre?**

Travaillez avec nous sur des projets en science des données et en visualisation, et déployez votre projet sous forme d'application sur GraphSearch.

Concept# Parallel algorithm

Résumé

In computer science, a parallel algorithm, as opposed to a traditional serial algorithm, is an algorithm which can do multiple operations in a given time. It has been a tradition of computer science to describe serial algorithms in abstract machine models, often the one known as random-access machine. Similarly, many computer science researchers have used a so-called parallel random-access machine (PRAM) as a parallel abstract machine (shared-memory).
Many parallel algorithms are executed concurrently – though in general concurrent algorithms are a distinct concept – and thus these concepts are often conflated, with which aspect of an algorithm is parallel and which is concurrent not being clearly distinguished. Further, non-parallel, non-concurrent algorithms are often referred to as "sequential algorithms", by contrast with concurrent algorithms.
Parallelizability
Analysis of parallel algorithms
Algorithms vary significantly in how parallelizable they are, ranging from ea

Source officielle

Cette page est générée automatiquement et peut contenir des informations qui ne sont pas correctes, complètes, à jour ou pertinentes par rapport à votre recherche. Il en va de même pour toutes les autres pages de ce site. Veillez à vérifier les informations auprès des sources officielles de l'EPFL.

Publications associées

Chargement

Personnes associées

Chargement

Unités associées

Chargement

Concepts associés

Chargement

Cours associés

Chargement

Séances de cours associées

Chargement

Personnes associées

Unités associées

Aucun résultat

Aucun résultat

Publications associées (30)

Chargement

Chargement

Chargement

Concepts associés (7)

Calcul distribué

Un calcul distribué, ou réparti ou encore partagé, est un calcul ou un traitement réparti sur plusieurs microprocesseurs et plus généralement sur plusieurs unités centrales informatiques, et on parle

Algorithme

thumb|Algorithme de découpe d'un polygone quelconque en triangles (triangulation).
Un algorithme est une suite finie et non ambiguë d'instructions et d’opérations permettant de résoudre une classe de

Parallélisme (informatique)

vignette|upright=1|Un des éléments de Blue Gene L cabinet, un des supercalculateurs massivement parallèles les plus rapides des années 2000.
En informatique, le parallélisme consiste à mettre en œuvr

Cours associés (11)

ME-465: Advanced heat transfer

The course will deepen the fundamentals of heat transfer. Particular focus will be put on radiative and convective heat transfer, and computational approaches to solve complex, coupled heat transfer problems.

CS-453: Concurrent algorithms

With the advent of multiprocessors, it becomes crucial to master the underlying algorithmics of concurrency. The objective of this course is to study the foundations of concurrent algorithms and in particular the techniques that enable the construction of robust such algorithms.

BIOENG-450: In silico neuroscience

"In silico Neuroscience" introduces students to a synthesis of modern neuroscience and state-of-the-art data management, modelling and computing technologies.

Sarvenaz Choobdar, Ali Marjovi

In most multi-robot systems, an individual robot is not capable of solving computationally hard problems due to lack of high processing power. This paper introduces the novel concept of robotic clusters to empower these systems in their problem solving. A robotic cluster is a group of individual robots which are able to share their processing resources, therefore, the robots can solve difficult problems by using the processing units of other robots. The concept, requirements, characteristics and architecture of robotic clusters are explained and then the problem of “topological map merging” is considered as a case study to describe the details of the presented idea and to evaluate its functionality. Additionally, a new parallel algorithm for solving this problem is developed. The experimental results proved that the robotic clusters remarkably speedup computations in multi-robot systems. The proposed mechanism can be used in many other robotic applications and has the potential to increase the performance of multi-robot systems especially for solving problems that need high processing resources.

In this thesis we study the efficient implementation of the finite element method for the numerical solution of partial differential equations (PDE) on modern parallel computer archi- tectures, such as Cray and IBM supercomputers. The domain-decomposition (DD) method represents the basis of parallel finite element software and is generally implemented such that the number of subdomains is equal to the number of MPI processes. We are interested in breaking this paradigm by introducing a second level of parallelism. Each subdomain is assigned to more than one processor and either MPI processes or multiple threads are used to implement the parallelism on the second level. The thesis is devoted to the study of this second level of parallelism and includes the stages described below. The algebraic additive Schwarz (AAS) domain-decomposition preconditioner is an integral part of the solution process. We seek to understand its performance on the parallel computers which we target and we introduce an improved construction approach for the parallel precon- ditioner. We examine a novel strategy for solving the AAS subdomain problems, using multiple MPI processes. At the subdomain level, this is represented by the ShyLU preconditioner. We bring improvements to its algorithm in the form of a novel inexact solver based on an incomplete QR (IQR) factorization. The performance of the new preconditioner framework is studied for Laplacian and advection-diffusion-reaction (ADR) problems and for Navier-Stokes problems, as a component within a larger framework of specialized preconditioners. The partitioning of the computational mesh comes with considerable memory limitations, when done at runtime on parallel computers, due to the low amount of available memory per processor. We describe and implement a solution to this problem, based on offloading the partitioning process to a preliminary offline stage of the simulation process. We also present the efficient implementation, based on parallel MPI collective instructions, of the routines which load the mesh parts during the simulation. We discuss an alternative parallel implementation of the finite element system assembly based on multi-threading. This new approach is used to supplement the existing one based on MPI parallelism, in situations where MPI alone can not make use of all the available parallel hardware resources. The work presented in the thesis has been done in the framework of two software projects: the Trilinos project and the LifeV parallel finite element modeling library. All the new develop- ments have been contributed back to the respective projects, to be used freely in subsequent public releases of the software.

Recent trends have led hardware manufacturers to place multiple processing cores on a single chip, making parallel programming the intended way of taking advantage of the increased processing power. However, bringing concurrency to average programmers is considered to be one of the major challenges in computer science today. The difficulty lies not only in writing correct parallel programs, but also in achieving the required efficiency and performance. For parallel programs, performance is not only about obtaining low execution times on a fixed number of cores, but also about maintaining efficiency as the number of available cores is increased. Ideally, programmers should have in their toolkit techniques that can be used when designing parallel programs, before any code is available for testing on production hardware, and which are then able to predict scalability once the program is implemented. Existing methods are either unreliable at predicting scalability, such as the case of disjoint-access parallelism, or do not apply to lock-based programs, which currently make up a large part of existing concurrent programs. Furthermore, using some of these techniques is so complicated that it outweighs the time required to implement, debug and test the program on real hardware. In this thesis we study the problem of predicting the scalability of concurrent programs without implementing them. This allows programmers in the design phase of a concurrent algorithm to choose only one or a few promising solutions that will be implemented, debugged and tested on production hardware. We first consider disjoint-access parallelism, an existing property that applies only to a very restricted class of programs. After an extensive practical evaluation spanning across a variety of scenarios, we find it to be ineffective at predicting scalability. For predicting the scalability of more general concurrent algorithms, we propose the obstruction degree, a new scalability metric based on the consistency requirements of algorithms. It applies to programs using locks, invalidation primitives and transactional memory. Our metric allows programmers to compare two given algorithms as well as predict their scalability limit, the maximum number of processors to which they can scale, thus allowing programmers to choose the appropriate size hardware for running their programs. We also examine the composition of relaxed memory transactions in order to combine the ease of programming offered by transactional memory with the increased scalability of transactions that circumvent the traditional transactional model. We present outheritance, a property we show to be both necessary and sufficient for ensuring the correct composition of relaxed transactions, and we show how to calculate the obstruction degree of compositions that use this new property. We use outheritance to build OE-STM, a new software transactional memory algorithm having elastic transactions that correctly compose.

Séances de cours associées (13)