In compiler theory, loop optimization is the process of increasing execution speed and reducing the overheads associated with loops. It plays an important role in improving cache performance and making effective use of parallel processing capabilities. Most execution time of a scientific program is spent on loops; as such, many compiler optimization techniques have been developed to make them faster. Since instructions inside loops can be executed repeatedly, it is frequently not possible to give a bound on the number of instruction executions that will be impacted by a loop optimization. This presents challenges when reasoning about the correctness and benefits of a loop optimization, specifically the representations of the computation being optimized and the optimization(s) being performed. Loop optimization can be viewed as the application of a sequence of specific loop transformations (listed below or in Compiler transformations for high-performance computing) to the source code or intermediate representation, with each transformation having an associated test for legality. A transformation (or sequence of transformations) generally must preserve the temporal sequence of all dependencies if it is to preserve the result of the program (i.e., be a legal transformation). Evaluating the benefit of a transformation or sequence of transformations can be quite difficult within this approach, as the application of one beneficial transformation may require the prior use of one or more other transformations that, by themselves, would result in reduced performance. Common loop transformations include: Fission or distribution – loop fission attempts to break a loop into multiple loops over the same index range, but each new loop takes only part of the original loop's body. This can improve locality of reference, both of the data being accessed in the loop and the code in the loop's body. Fusion or combining – this combines the bodies of two adjacent loops that would iterate the same number of times (whether or not that number is known at compile time), as long as they make no reference to each other's data.

À propos de ce résultat
Cette page est générée automatiquement et peut contenir des informations qui ne sont pas correctes, complètes, à jour ou pertinentes par rapport à votre recherche. Il en va de même pour toutes les autres pages de ce site. Veillez à vérifier les informations auprès des sources officielles de l'EPFL.
Cours associés (6)
CS-476: Embedded system design
Hardware-software co-design is a well known concept in embedded system design.It is also a concept required in designing FPGA-accelerators in data-centers.This course teaches how to transform algorith
CS-307: Introduction to multiprocessor architecture
Multiprocessors are a core component in all types of computing infrastructure, from phones to datacenters. This course will build on the prerequisites of processor design and concurrency to introduce
CS-420: Advanced compiler construction
Students learn several implementation techniques for modern functional and object-oriented programming languages. They put some of them into practice by developing key parts of a compiler and run time
Afficher plus
Séances de cours associées (55)
Écosystème Spark: Choix architecturaux
Explore les choix architecturaux de l'écosystème Spark, y compris les RDD et la tolérance aux pannes.
Analyse des flux de données : Inefficacités et optimisations de la traduction
Explore les inefficacités de traduction, les optimisations, les fonctions de levage, la conversion de fermeture et les concepts d'analyse de flux de données tels que les expressions disponibles et les variables en direct.
Flotter pour une meilleure natation
Analyse le compromis entre la consommation d'énergie et la vitesse de nage.
Afficher plus
Publications associées (115)

Decoding electroencephalographic responses to visual stimuli compatible with electrical stimulation

Silvestro Micera, Simone Romeni, Laura Toni, Fiorenzo Artoni

Electrical stimulation of the visual nervous system could improve the quality of life of patients affected by acquired blindness by restoring some visual sensations, but requires careful optimization of stimulation parameters to produce useful perceptions. ...
Aip Publishing2024

MOD2IR: High-Performance Code Generation for a Biophysically Detailed Neuronal Simulation DSL

Felix Schürmann, Pramod Shivaji Kumbhar, Omar Awile, Ioannis Magkanaris

Advances in computational capabilities and large volumes of experimental data have established computer simulations of brain tissue models as an important pillar in modern neuroscience. Alongside, a variety of domain specific languages (DSLs) have been dev ...
ACM2023

Learning-based techniques for lensless reconstruction

Yohann Loïc Yann Perron

In this internship, I explore different optimization algorithms for lensless imaging. Lensless imaging is a new imaging technique that replaces the lens of a camera with a diffuser mask. This allows for simpler and cheaper camera hardware. However, the rec ...
2023
Afficher plus
Concepts associés (4)
Dependence analysis
In compiler theory, dependence analysis produces execution-order constraints between statements/instructions. Broadly speaking, a statement S2 depends on S1 if S1 must be executed before S2. Broadly, there are two classes of dependencies--control dependencies and data dependencies. Dependence analysis determines whether it is safe to reorder or parallelize statements. Control dependency is a situation in which a program instruction executes if the previous instruction evaluates in a way that allows its execution.
Parallélisation automatique
La Parallélisation automatique est une étape de la compilation d'un programme qui consiste à transformer un code source écrit pour une machine séquentielle en un exécutable parallélisé pour ordinateur à Symmetric multiprocessing. L'objectif de la parallélisation automatique est de simplifier et de réduire la durée de développement des programmes parallèles, qui sont notablement plus compliqués à écrire que les programmes séquentiels mais permettent des gains de vitesse sur les machines parallèles.
Auto-vectorisation
L'auto-vectorisation est une technique de compilation de langage de programmation, permettant d'adapter automatiquement des boucles de fonctions traitant des vecteurs, ou, plus généralement, des matrices, à un processeur vectoriel ou bien un SIMD. On appelle plus généralement, le fait d'adapter des traitements à des processeurs vectoriels, de façon manuelle ou automatique, une vectorisation. Le compilateur Gnu GCC utilise des techniques d'auto-vectorisation basées en 2011 sur le framework tree-ssa pour la majorité des SIMD (3DNow!, SSE (et SSE2, SSE3), ARM NEON et l'équivalent d'ARM pour l'embarqué, MVE.
Afficher plus

Graph Chatbot

Chattez avec Graph Search

Posez n’importe quelle question sur les cours, conférences, exercices, recherches, actualités, etc. de l’EPFL ou essayez les exemples de questions ci-dessous.

AVERTISSEMENT : Le chatbot Graph n'est pas programmé pour fournir des réponses explicites ou catégoriques à vos questions. Il transforme plutôt vos questions en demandes API qui sont distribuées aux différents services informatiques officiellement administrés par l'EPFL. Son but est uniquement de collecter et de recommander des références pertinentes à des contenus que vous pouvez explorer pour vous aider à répondre à vos questions.