Summary
Automatic vectorization, in parallel computing, is a special case of automatic parallelization, where a computer program is converted from a scalar implementation, which processes a single pair of operands at a time, to a vector implementation, which processes one operation on multiple pairs of operands at once. For example, modern conventional computers, including specialized supercomputers, typically have vector operations that simultaneously perform operations such as the following four additions (via SIMD or SPMD hardware):

    c1 = a1 + b1
    c2 = a2 + b2
    c3 = a3 + b3
    c4 = a4 + b4

However, in most programming languages one typically writes loops that sequentially perform additions of many numbers. Here is an example of such a loop, written in C:

    for (i = 0; i < n; i++)
        c[i] = a[i] + b[i];

A vectorizing compiler transforms such loops into sequences of vector operations. These vector operations perform additions on blocks of elements from the arrays a, b and c. Automatic vectorization is a major research topic in computer science.

Early computers usually had one logic unit, which executed one instruction on one pair of operands at a time. Computer languages and programs were therefore designed to execute in sequence. Modern computers, though, can do many things at once, so many optimizing compilers perform automatic vectorization, transforming parts of sequential programs into parallel operations.

Loop vectorization transforms procedural loops by assigning a processing unit to each pair of operands. Programs spend most of their time within such loops; vectorization can therefore significantly accelerate them, especially over large data sets. Loop vectorization is implemented in Intel's MMX, SSE, and AVX, in Power ISA's AltiVec, and in ARM's NEON, SVE and SVE2 instruction sets.

Many constraints prevent or hinder vectorization, and sometimes vectorization can even slow down execution, for example because of pipeline synchronization or data-movement timing. Loop dependence analysis identifies loops that can be vectorized, relying on the data dependences of the instructions inside loops.
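As a rough illustration of the transformation, the following hand-vectorized sketch performs the same addition loop four floats at a time using SSE intrinsics, with a scalar loop for the leftover elements. This is a simplified sketch, not the output of any particular compiler; the function name vector_add is hypothetical, and x86 hardware with SSE support is assumed.

    #include <xmmintrin.h>  /* SSE intrinsics */

    /* Hypothetical hand-vectorized equivalent of: c[i] = a[i] + b[i] */
    void vector_add(float *c, const float *a, const float *b, int n)
    {
        int i = 0;
        /* Main vector loop: one SSE addition handles four pairs of operands. */
        for (; i + 4 <= n; i += 4) {
            __m128 va = _mm_loadu_ps(&a[i]);           /* load a[i..i+3] */
            __m128 vb = _mm_loadu_ps(&b[i]);           /* load b[i..i+3] */
            _mm_storeu_ps(&c[i], _mm_add_ps(va, vb));  /* c[i..i+3] = va + vb */
        }
        /* Scalar epilogue for the remaining n mod 4 elements. */
        for (; i < n; i++)
            c[i] = a[i] + b[i];
    }

An auto-vectorizing compiler typically emits essentially this structure: a vector body plus a scalar epilogue (or a masked tail on ISAs such as AVX-512 or SVE). By contrast, a loop such as

    for (i = 1; i < n; i++)
        a[i] = a[i-1] + b[i];

carries a dependence from one iteration to the next (a[i] needs a[i-1]), so it cannot be vectorized this way; detecting that difference is the job of loop dependence analysis.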
Related courses (5)
DH-406: Machine learning for DH
This course aims to introduce the basic principles of machine learning in the context of the digital humanities. We will cover both supervised and unsupervised learning techniques, and study and implement ...
CS-119(c): Information, Computation, Communication
The objective of this course is to introduce students to algorithmic thinking, to familiarize them with the fundamentals of computer science, and to develop a first competence in programming ...
CS-307: Introduction to multiprocessor architecture
Multiprocessors are a core component in all types of computing infrastructure, from phones to datacenters. This course will build on the prerequisites of processor design and concurrency to introduce ...
Related lectures (31)
Computing Centroids and Tolerances
Addresses common problems in computing centroids and the importance of tolerances.
DBMS Processing Models
Explores DBMS processing models, operator boundaries, fusion, and code generation using LLVM.
Matlab Programming: Script and Function
Explores Matlab programming with scripts and functions, vectorization, and 2D graphics.
Related publications (23)
Synthetic Generation of Activity-related data
Quentin Philippe Bochud
The field of synthetic data is more and more present in our everyday life. The transportation domain is particularly interested in improving the methods for the generation of synthetic data in order to address the privacy and availability issues of real data ...
2023
Related concepts (6)
Intrinsic function
In computer software, in compiler theory, an intrinsic function (or built-in function) is a function (subroutine) available for use in a given programming language whose implementation is handled specially by the compiler. Typically, it may substitute a sequence of automatically generated instructions for the original function call, similar to an inline function. Unlike an inline function, the compiler has an intimate knowledge of an intrinsic function and can thus better integrate and optimize it for a given situation.
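As a concrete illustration, GCC and Clang expose __builtin_popcount, an intrinsic that counts the set bits of an integer. Because the compiler knows its semantics, it can lower the call directly to a single POPCNT instruction on hardware that has one, instead of emitting an ordinary function call. A minimal sketch (GCC/Clang-specific):

    #include <stdio.h>

    int main(void)
    {
        unsigned x = 0xF0F0u;
        /* __builtin_popcount is handled by the compiler itself: on x86 with
           POPCNT support it compiles to one instruction, otherwise to an
           inline bit-counting sequence. No library call is emitted. */
        printf("%d\n", __builtin_popcount(x));  /* prints 8 */
        return 0;
    }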
Loop optimization
In compiler theory, loop optimization is the process of increasing execution speed and reducing the overheads associated with loops. It plays an important role in improving cache performance and making effective use of parallel processing capabilities. Most execution time of a scientific program is spent on loops; as such, many compiler optimization techniques have been developed to make them faster. Since instructions inside loops can be executed repeatedly, it is frequently not possible to give a bound on the number of instruction executions that will be impacted by a loop optimization.
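A classic example of such a transformation is loop-invariant code motion: a computation whose value does not change across iterations is hoisted out of the loop so it executes once instead of n times. A minimal before/after sketch in C (illustrative, not tied to any particular compiler):

    /* Before: x * y is recomputed on every iteration. */
    for (int i = 0; i < n; i++)
        a[i] = x * y + i;

    /* After: the invariant product is hoisted out of the loop. */
    int t = x * y;
    for (int i = 0; i < n; i++)
        a[i] = t + i;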
AltiVec
AltiVec is a single-precision floating point and integer SIMD instruction set designed and owned by Apple, IBM, and Freescale Semiconductor (formerly Motorola's Semiconductor Products Sector) — the AIM alliance. It is implemented on versions of the PowerPC processor architecture, including Motorola's G4, IBM's G5 and POWER6 processors, and P.A. Semi's PWRficient PA6T. AltiVec is a trademark owned solely by Freescale, so the system is also referred to as Velocity Engine by Apple and VMX (Vector Multimedia Extension) by IBM and P.A. Semi.