Because the market has an insatiable appetite for new functionality, performance is becoming an increasingly important factor. The telecommunication and networking domains are especially affected by this trend, but they are not the only ones. Automotive applications, for instance, are also affected by the enthusiasm for programmable electronic devices. This thesis focuses on embedded applications based on a programmable processing unit. Today, for time-to-market reasons, re-use and flexibility are important parameters. Embedded processors therefore appear to be the solution, because the behavior of the circuit can be modified in software, which costs little compared to Application Specific Integrated Circuits (ASICs) or Digital Signal Processors (DSPs), where hardware modifications are necessary. This choice is also judicious compared to multi-pipeline processors such as superscalar or Very Long Instruction Word (VLIW) architectures, or to a Field Programmable Gate Array (FPGA), all of which require more silicon area, consume more energy and are not as robust as simple scalar processors. Nevertheless, commercial scalar processors dedicated to embedded systems run at low clock frequencies, which has a negative effect on their performance. This phenomenon is even more visible with deep-submicron technologies, where primary memories and wire delays do not scale as fast as the logic. Furthermore, memory speed decreases as storage capacity increases, and it depends on both the memory organization (associativity, word size, etc.) and the IPs of the foundry. Likewise, synthesizable IP memories have a greater access time than their hard macrocell counterparts.
This thesis proposes a new synthesizable architecture for scalar embedded processors, designed to alleviate the drawbacks mentioned above, which are collectively called the Memory Wall. Its goal is to push back the limits of frequency, independently of the foundry, without introducing wasted cycles to resolve data and control dependencies. The resulting architecture, called Deep Submicron RISC (DSR), is made up of a single eight-stage pipeline that executes instructions in order. In addition to tackling memory access time and alleviating wire delays, the design seeks to minimize power consumption. The proposed architecture is compared to two MIPS architectures, the MIPS R3000 and the MIPS32 24k, so that the performance of the architectures themselves can be analyzed independently of both the instruction set architecture (ISA), MIPS I, and the compiler. The R3000 is a processor from the 1990s, and the 24k came out in 2004. As expected, the study shows that the five-stage processor remains efficient, especially in comparison to the MIPS 24k, when the critical path passes through the core and not through the primary memories. Even though the MIPS 24k partly tackles the Memory Wall, DSR is much more efficient: its high-density version (DSR-HD) reaches an efficiency gain of 72% over a five-stage processor, where efficiency is defined as performance/surface. DSR becomes even more efficient relative to the two MIPS processors when the transistor channel length decreases, when wire delays are significant, or when the memories are large and their organization complex.
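The efficiency metric above can be written out as a short formula. This is only a sketch of one plausible reading: the symbols E (efficiency), P (performance), A (surface, i.e. silicon area) and G (gain) are introduced here for illustration and do not appear in the abstract, and the 72% figure is assumed to be a relative gain of DSR-HD over the five-stage processor.

```latex
\[
E = \frac{P}{A}, \qquad
G = \frac{E_{\text{DSR-HD}}}{E_{\text{5-stage}}} - 1 = 0.72
\quad\Longleftrightarrow\quad
\frac{P_{\text{DSR-HD}}}{A_{\text{DSR-HD}}}
  = 1.72\,\frac{P_{\text{5-stage}}}{A_{\text{5-stage}}}
\]
```

Under this reading, DSR-HD delivers 1.72 times the performance per unit of surface of the five-stage baseline. The abstract does not say how much of that gain comes from higher performance and how much from smaller area.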
Adam Shmuel Teman, Robert Giterman