When it comes to performance, embedded systems share many problems with their higher-end counterparts. The growing gap between top processor frequency and memory access speed, known as the memory wall, is one such problem. Driven in part by low energy consumption and low cost requirements, embedded systems are often customized to a single application or a very small set of applications. In addition, time-to-market requirements and the increasing complexity of embedded systems drive the need for fully or partially automated design tools and lead to the extensive use of caches and cache hierarchies. The introduction of multi-processor-based embedded platforms has accelerated this trend; as the design space for embedded systems has grown, it has become unclear to designers whether automatic processor customization tools can cope with this increased complexity. The recent introduction of new techniques for automatic customization, such as Architecturally Visible Storage (AVS) memory-enhanced Instruction Set Extension (ISE) identification algorithms, has also created new challenges. AVS memories are distinct from the cache hierarchy and rely on Direct Memory Access (DMA) transfers to communicate with main memory. In an embedded system containing hardware-managed caches, these extra AVS memories, in combination with their corresponding DMA transfers, cause coherence and consistency problems. Although the problems of coherence and consistency are well known in multi-processor systems, conventional solutions may be expensive in terms of area and power consumption, rendering them unacceptable for use in embedded systems. This thesis presents two low-cost coherence mechanisms that solve these two problems. The first mechanism addresses embedded systems that already contain a hardware coherence protocol, as many high-end embedded multi-processor systems do. Traditionally, DMA transactions are invisible to the hardware coherence protocol.
By making these DMA transactions visible to the hardware coherence protocol, coherence can be guaranteed between the AVS memories and the data cache(s). This requires only minor changes to the DMA engine. Moreover, by forcing the processor pipeline to stall while a DMA transfer is active, memory consistency can be guaranteed. This mechanism provides a significant speedup compared to a system without ISE enhancement; however, because of the increase in bus traffic, this speedup comes at the expense of increased energy consumption. Coherent DMA and Speculative DMA are both implementations of this mechanism. Single-processor systems do not contain hardware coherence protocols and would therefore benefit from a solution to the coherence and consistency problems that costs less than a hardware coherence protocol. By tightly coupling the AVS memories to the hardware cache, coherence and consistency for the complete system can be guaranteed. This coupling requires insignificant changes to the hardware cache's