Publication

Energy-Aware Processing Platform Exploration for Embedded Biosignal Analysis

Ahmed Yasir Dogan
2013
EPFL thesis
Abstract

According to the World Health Organization, lifestyle-related diseases, such as cardiovascular diseases, are the major cause of mortality worldwide. Accurate and continuous medical supervision is required for the diagnosis and treatment of such diseases. Traditional healthcare delivery systems, however, cannot cope with the resulting increase in healthcare costs and medical management needs. Personal health monitoring systems are poised to offer large-scale and cost-effective solutions to this problem. The use of wearable, miniaturized, and autonomous wireless sensor nodes, featuring continuous on-node analysis of biosignals, can provide the ambulatory, long-term, and real-time monitoring required by patients, and enables faster coordination with medical personnel. In such autonomous nodes, because energy resources are very limited and wireless transmission is costly, an ultra-low-power (ULP) on-node processing platform for advanced biosignal analysis is crucial. In this thesis, I explore ULP processing architectures for on-node biosignal analysis applications, which commonly carry out moderately complex arithmetic manipulations on single- or multiple-input signals. To achieve energy efficiency while providing sufficient processing capability for advanced biosignal analysis, this thesis exploits near-threshold (near-Vth) computing. Hence, the severe performance degradation and reliability issues that occur at deeply scaled voltages can be avoided. In Chapter 3, I introduce a near-Vth computing single-core architecture, consisting of a ULP core, an instruction memory (IM), and a data memory (DM). The ULP core features an instruction set architecture (ISA) customized for biosignal applications. I show that an ISA with a minimal instruction set achieves considerable energy savings compared to state-of-the-art cores when executing biosignal applications (i.e., up to 54% compared to an established ISA).
The proposed single-core architecture achieves high energy efficiency for most single-input biosignal analysis applications, since it fully exploits near-Vth computing. However, the single-core architecture achieves only limited voltage scaling, and hence reduced energy efficiency, for most multiple-input biosignal analysis applications, where the computational workload is so high that the single-core architecture cannot attain the required throughput in the near-Vth regime. To alleviate the performance degradation that prevents the single-core architecture from exploiting near-Vth computing, typically for multiple-input biosignal analysis, I propose parallel processing of biosignals on multi-core architectures. To this end, in Chapter 4, a multiple instruction, multiple data (MIMD) multi-core architecture is introduced. The MIMD architecture comprises several ULP cores, individual IMs, and a multi-bank DM shared through a lightweight interconnect between the cores and the DM. I show that parallel processing of multiple-input biosignals leads to better energy efficiency than sequential processing (i.e., on a single core) for moderate and high biosignal workloads. In particular, the MIMD architecture achieves up to 62% power savings with respect to the single-core architecture for high biosignal workloads (i.e., 167 MOps/s). On the other hand, parallel processing of multiple-input biosignals can be penalized at low workloads due to high leakage power dissipation in multi-core architectures. In particular, the MIMD architecture is less energy efficient than the single-core architecture for workloads lighter than 1.7 MOps/s. One of the major contributors to power dissipation in MIMD architectures is the costly fetching of multiple instructions. To mitigate this issue, I propose data-level parallelism through the single instruction, multiple data (SIMD) paradigm.
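The trade-off described above, where multi-core processing wins at high workloads but loses at low workloads because of leakage, can be illustrated with a first-order power model. This is a minimal sketch under stated assumptions, not the model used in the thesis: every constant (`VT`, `V_MIN`, `V_MAX`, `K`, `C_EFF`, `I_LEAK`) is an illustrative placeholder, the frequency-voltage relation is assumed linear above threshold, and memories, interconnect, and power gating are ignored.

```python
# First-order power model: N identical cores share a workload W (ops/s).
# All constants are illustrative assumptions, NOT values from the thesis.
VT = 0.3        # threshold voltage (V), assumed
V_MIN = 0.4     # lowest usable near-threshold supply (V), assumed
V_MAX = 1.2     # nominal supply (V), assumed
K = 200e6       # ops/s gained per volt above VT, assumed linear
C_EFF = 20e-12  # effective switched capacitance per op (J/V^2), assumed
I_LEAK = 50e-6  # leakage current per core (A), assumed constant


def required_voltage(ops_per_core: float) -> float:
    """Lowest supply voltage at which one core sustains ops_per_core."""
    v = max(V_MIN, VT + ops_per_core / K)
    if v > V_MAX:
        raise ValueError("workload exceeds single-core capability")
    return v


def total_power(workload: float, n_cores: int) -> float:
    """Dynamic plus leakage power (W) when the workload is split evenly."""
    v = required_voltage(workload / n_cores)
    dynamic = C_EFF * v ** 2 * workload  # same op count, lower voltage
    leakage = n_cores * I_LEAK * v       # grows with the number of cores
    return dynamic + leakage


if __name__ == "__main__":
    for w in (1e6, 160e6):
        p1, p8 = total_power(w, 1), total_power(w, 8)
        print(f"{w / 1e6:6.1f} MOps/s: 1 core {p1 * 1e6:8.1f} uW, "
              f"8 cores {p8 * 1e6:8.1f} uW")
```

In this toy model, splitting a high workload across eight cores lets each core run at the near-threshold voltage, so the quadratic drop in dynamic power outweighs the extra leakage. At a low workload the single core already sits at the minimum voltage, so the additional cores contribute only leakage.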
To this end, in Chapter 4, a novel hybrid multi-core architecture that supports both SIMD and MIMD operations is introduced. The SIMD operations, coupled with data and instruction broadcasting, enable coordinated multiple accesses to the memories and hence reduce instruction fetch power. Additionally, the hybrid multi-core architecture features partial power gating of memories to achieve leakage power savings, which are vital at low workloads (a few hundred kOps/s). I show that SIMD processing of multiple-input biosignals leads to better energy efficiency than MIMD processing. In particular, when SIMD operations are exploited, the hybrid multi-core architecture achieves up to 45.7% power savings compared to the MIMD architecture for moderate biosignal workloads. I also show that partial power gating of memories is an effective technique to alleviate the leakage issue in multi-core architectures. More specifically, partial power gating of the IM in the hybrid multi-core architecture leads to 38.8% power savings at low workloads. Finally, to handle applications containing program parts that limit SIMD execution (i.e., conditional program parts), I propose to resynchronize the cores for stable lockstep code execution when synchronization is lost. Hence, SIMD operations are exploited even for applications with conditional program parts. To this end, in Chapter 4, a lightweight software-directed hardware synchronizer is introduced. I show that for applications with conditional program parts, lockstep SIMD execution achieves up to 64% power savings with respect to elementary SIMD execution at moderate workloads (i.e., 89 MOps/s).
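The effect of instruction broadcasting and core resynchronization on instruction fetches can be sketched with a small counting model. This is a toy illustration under stated assumptions, not the thesis's synchronizer: the `Common` and `Branch` segment types and the `fetches_mimd` and `fetches_simd` functions are hypothetical names, one broadcast fetch is assumed to serve all cores while they run in lockstep, and diverged cores are assumed to fetch individually.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Common:
    length: int  # instructions executed identically by every core


@dataclass
class Branch:
    taken_len: int      # instructions on the taken path
    not_taken_len: int  # instructions on the not-taken path
    taken: List[bool]   # per-core outcome; one entry per core


Segment = Union[Common, Branch]


def per_core_lengths(seg: Segment, n_cores: int) -> List[int]:
    """Number of instructions each core executes in this segment."""
    if isinstance(seg, Common):
        return [seg.length] * n_cores
    return [seg.taken_len if t else seg.not_taken_len for t in seg.taken]


def fetches_mimd(program: List[Segment], n_cores: int) -> int:
    """MIMD: every core fetches each of its own instructions."""
    return sum(sum(per_core_lengths(s, n_cores)) for s in program)


def fetches_simd(program: List[Segment], n_cores: int, resync: bool) -> int:
    """SIMD with broadcast; resync=True models a (hypothetical)
    synchronizer that restores lockstep after a divergent branch."""
    total, lockstep = 0, True
    for seg in program:
        lengths = per_core_lengths(seg, n_cores)
        diverges = isinstance(seg, Branch) and len(set(seg.taken)) > 1
        if lockstep and not diverges:
            total += lengths[0]   # one broadcast fetch serves all cores
        else:
            total += sum(lengths)  # cores fetch individually
            lockstep = resync      # stay diverged unless resynchronized
    return total


if __name__ == "__main__":
    prog = [Common(100), Branch(10, 20, [True, False, True, False]),
            Common(100)]
    print(fetches_mimd(prog, 4))                # 860
    print(fetches_simd(prog, 4, resync=False))  # 560
    print(fetches_simd(prog, 4, resync=True))   # 260
```

In this example, a single divergent branch forces the elementary SIMD execution to fetch individually for the remainder of the program, whereas resynchronizing after the branch lets the final common segment be broadcast again.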

Related concepts (48)
Single instruction multiple data
Single instruction, multiple data (SIMD) is one of the four architecture categories defined by Flynn's taxonomy in 1966, and refers to a mode of operation of computers with parallel processing capabilities. In this mode, the same instruction is applied simultaneously to multiple data items to produce multiple results.
Instruction set architecture
In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an implementation. In general, an ISA defines the supported instructions, data types, registers, the hardware support for managing main memory, fundamental features (such as the memory consistency, addressing modes, virtual memory), and the input/output model of a family of implementations of the ISA.
Intel Core
Intel Core is a line of streamlined midrange consumer, workstation and enthusiast computer central processing units (CPUs) marketed by Intel Corporation. These processors displaced the existing mid- to high-end Pentium processors at the time of their introduction, moving the Pentium to the entry level. Identical or more capable versions of Core processors are also sold as Xeon processors for the server and workstation markets. The lineup of Core processors includes the Intel Core i3, Intel Core i5, Intel Core i7, and Intel Core i9, along with the X-series of Intel Core CPUs.
Related publications (317)

Accelerator-driven Data Arrangement to Minimize Transformers Run-time on Multi-core Architectures

David Atienza Alonso, Giovanni Ansaloni, Alireza Amirshahi

The increasing complexity of transformer models in artificial intelligence expands their computational costs, memory usage, and energy consumption. Hardware acceleration tackles the ensuing challenges by designing processors and accelerators tailored for t ...
2024

EdgeAI-Aware Design of In-Memory Computing Architectures

Marco Antonio Rios

Driven by the demand for real-time processing and the need to minimize latency in AI algorithms, edge computing has experienced remarkable progress. Decision-making AI applications stand out for their heavy reliance on data-centric operations, predominantl ...
EPFL, 2024

Exploring High-Performance and Energy-Efficient Architectures for Edge AI-Enabled Applications

Joshua Alexander Harrison Klein

The desire and ability to place AI-enabled applications on the edge has grown significantly in recent years. However, the compute-, area-, and power-constrained nature of edge devices are stressed by the needs of the AI-enabled applications, due to a gener ...
EPFL, 2024