In computer architecture, a delay slot is an instruction slot being executed without the effects of a preceding instruction. The most common form is a single arbitrary instruction located immediately after a branch instruction on a RISC or DSP architecture; this instruction will execute even if the preceding branch is taken. Thus, by design, the instructions appear to execute in an illogical or incorrect order. It is typical for assemblers to automatically reorder instructions by default, hiding the awkwardness from assembly developers and compilers.
When a branch instruction is involved, the location of the following delay slot instruction in the pipeline may be called a branch delay slot. Branch delay slots are found mainly in DSP architectures and older RISC architectures. MIPS, PA-RISC, ETRAX CRIS, SuperH, and SPARC are RISC architectures that each have a single branch delay slot; PowerPC, ARM, Alpha, and RISC-V do not have any. DSP architectures that each have a single branch delay slot include the VS DSP, μPD77230 and TMS320C3x. The SHARC DSP and MIPS-X use a double branch delay slot; such a processor will execute a pair of instructions following a branch instruction before the branch takes effect. The TMS320C4x uses a triple branch delay slot.
The following example shows delayed branches in assembly language for the SHARC DSP including a pair after the RTS instruction. Registers R0 through R9 are cleared to zero in order by number (the register cleared after R6 is R7, not R9). No instruction executes more than once.
R0 = 0;
CALL fn (DB); /* call a function, below at label "fn" /
R1 = 0; / first delay slot /
R2 = 0; / second delay slot /
/**** discontinuity here (the CALL takes effect) /
R6 = 0; / the CALL/RTS comes back here, not at "R1 = 0" /
JUMP end (DB);
R7 = 0; / first delay slot /
R8 = 0; / second delay slot /
/ discontinuity here (the JUMP takes effect) /
/ next 4 instructions are called from above, as function "fn" /
fn: R3 = 0;
RTS (DB); / return to caller, past the caller's delay slots /
R4 = 0; / first delay slot /
R5 = 0; / second delay slot /
/ discontinuity here (the RTS takes effect) *****/
end: R9 = 0;
The goal of a pipelined architecture is to complete an instruction every clock cycle.
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
In computer architecture, predication is a feature that provides an alternative to conditional transfer of control, as implemented by conditional branch machine instructions. Predication works by having conditional (predicated) non-branch instructions associated with a predicate, a Boolean value used by the instruction to control whether the instruction is allowed to modify the architectural state or not. If the predicate specified in the instruction is true, the instruction modifies the architectural state; otherwise, the architectural state is unchanged.
In computer architecture, a branch predictor is a digital circuit that tries to guess which way a branch (e.g., an if–then–else structure) will go before this is known definitively. The purpose of the branch predictor is to improve the flow in the instruction pipeline. Branch predictors play a critical role in achieving high performance in many modern pipelined microprocessor architectures. Two-way branching is usually implemented with a conditional jump instruction.
A branch is an instruction in a computer program that can cause a computer to begin executing a different instruction sequence and thus deviate from its default behavior of executing instructions in order. Branch (or branching, branched) may also refer to the act of switching execution to a different instruction sequence as a result of executing a branch instruction. Branch instructions are used to implement control flow in program loops and conditionals (i.e., executing a particular sequence of instructions only if certain conditions are satisfied).
Ce cours aborde la programmation de systèmes embarqués: la cross-compilation, l'utilisation d'une FPU dans des microcontrôleurs, l'utilisation d'instructions DSP et les mécanismes à disposition dans l
Multiprocessors are a core component in all types of computing infrastructure, from phones to datacenters. This course will build on the prerequisites of processor design and concurrency to introduce
The course studies techniques to exploit Instruction-Level Parallelism (ILP) statically and dynamically. It also addresses some aspects of the design of domain-specific accelerators. Finally, it explo
In this thesis, we revisit classic problems in shared-memory distributed computing through the lenses of (1) emerging hardware technologies and (2) changing requirements. Our contributions consist, on the one hand, in providing a better understanding of th ...
High-performance branch target buffers (BTBs) and the L1I cache are key to high-performance front-end. Modern branch predictors are highly accurate, but with an increase in code footprint in modern-day server workloads, BTB and L1I misses are still frequen ...
Code-reuse attacks are dangerous threats that attracted the attention of the security community for years. These attacks aim at corrupting important control-flow transfers for taking control of a process without injecting code. Nowadays, the combinations o ...
Explores MIPS assembly language, covering function calls, memory management, and data structures, including recursive functions, programming constructs, arrays, and linked lists.