In parallel computing, the granularity (or grain size) of a task is a measure of the amount of work (or computation) performed by that task.
Another definition of granularity takes into account the communication overhead between multiple processors or processing elements. It defines granularity as the ratio of computation time to communication time, wherein computation time is the time required to perform the computation of a task and communication time is the time required to exchange data between processors.
If T_comp is the computation time and T_comm denotes the communication time, then the granularity G of a task can be calculated as:

G = T_comp / T_comm
Granularity is usually measured in terms of the number of instructions executed in a particular task. Alternatively, granularity can be specified in terms of the execution time of a program, combining the computation time and communication time.
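For example, a task that computes for 10 ms and then spends 1 ms exchanging data has a granularity of G = 10/1 = 10; values of G well above 1 indicate a coarse-grained task, while values near or below 1 indicate a fine-grained one. The C/MPI program below is a minimal sketch of how G might be estimated in practice: it times a purely local computation phase and a collective data-exchange phase on each process and reports their ratio. The array size, the choice of MPI_Allreduce as the communication step, and the single-shot timing are illustrative assumptions, not a rigorous benchmarking methodology.

#include <stdio.h>
#include <mpi.h>

enum { N = 1 << 20 };   /* per-process problem size (assumed for illustration) */
static double buf[N];

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int i = 0; i < N; i++)
        buf[i] = (double)i;

    /* T_comp: purely local work on this process's data */
    double t0 = MPI_Wtime();
    double sum = 0.0;
    for (int i = 0; i < N; i++)
        sum += buf[i] * buf[i];
    double t_comp = MPI_Wtime() - t0;

    /* T_comm: a collective exchange of results among all processes */
    t0 = MPI_Wtime();
    MPI_Allreduce(MPI_IN_PLACE, &sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    double t_comm = MPI_Wtime() - t0;

    if (rank == 0)
        printf("T_comp = %g s, T_comm = %g s, G = %.1f\n",
               t_comp, t_comm, t_comp / t_comm);

    MPI_Finalize();
    return 0;
}

Compiled and run with, for example, mpicc granularity.c -o granularity followed by mpirun -np 4 ./granularity, the reported ratio gives a rough sense of whether the task is compute-bound (large G) or communication-bound (small G).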
Depending on the amount of work which is performed by a parallel task, parallelism can be classified into three categories: fine-grained, medium-grained and coarse-grained parallelism.
In fine-grained parallelism, a program is broken down into a large number of small tasks. These tasks are assigned individually to many processors. The amount of work associated with each parallel task is low, and the work is evenly distributed among the processors. Hence, fine-grained parallelism facilitates load balancing.
As each task processes only a small amount of data, a large number of processors is required to perform the complete computation. This, in turn, increases the communication and synchronization overhead.
Fine-grained parallelism is best exploited in architectures that support fast communication. Shared memory architectures, which have low communication overhead, are the most suitable for fine-grained parallelism.
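As an illustration, the OpenMP sketch below decomposes a vector addition at the finest possible grain: schedule(static, 1) hands the loop out one iteration at a time, so each task is a single element-wise addition. The chunk size of 1 is an assumption chosen purely to illustrate fine grain; on real hardware such a small grain often lets scheduling overhead dominate the useful work.

#include <stdio.h>

enum { N = 1 << 20 };
static double a[N], b[N], c[N];

int main(void) {
    for (int i = 0; i < N; i++) {
        a[i] = (double)i;
        b[i] = 2.0 * (double)i;
    }

    /* Fine-grained decomposition: each scheduled chunk is a single
       loop iteration, i.e. one element-wise addition. */
    #pragma omp parallel for schedule(static, 1)
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[N-1] = %g\n", c[N - 1]);
    return 0;
}

Increasing the chunk size (for example, schedule(static, 1024)) coarsens the grain, trading some load-balancing flexibility for lower scheduling and synchronization overhead.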
It is difficult for programmers to detect parallelism in a program; therefore, it is usually the compiler's responsibility to detect fine-grained parallelism.
An example of a fine-grained system (from outside the parallel computing domain) is the system of neurons in our brain.