Publication

A Real-Time Multi-Camera Depth Estimation ASIC with Custom On-Chip Embedded DRAM

Abstract

The ability to process high-resolution video in real time is becoming increasingly important in a wide variety of applications, such as autonomous vehicles, virtual reality, and intelligent surveillance systems. The complex, high-accuracy video processing algorithms these applications need make system design more challenging, because of the amount of computation that must be performed in real time. Furthermore, video processing algorithms operate on large amounts of data, and storing this data in dense off-chip memories makes it difficult to meet bandwidth requirements. Hence, embedded memories are usually required to temporarily store data on-chip, close to the processing units. However, on-chip embedded memories often dominate the silicon area and power budget of modern video processing systems-on-chip. Given the current trend toward higher video resolutions and faster frame rates, these challenges are expected to increase dramatically. One of the most important kernels in modern video processing systems is depth perception, since depth information is needed by many advanced video processing algorithms. Depth maps can be created using stereo matching, the problem of finding dense correspondences between pairs of images. However, computing high-quality depth maps in real time on high-resolution images at high frame rates is challenging because of the computational complexity of stereo-matching algorithms. Furthermore, their need for large memories and high bandwidth limits the performance of depth estimation units, increases their power consumption, and makes them difficult to integrate into systems. In this thesis, we develop task-specific solutions, from the algorithmic level to the circuit level, that accelerate computation and data transfers and optimize the on-chip data storage of such depth estimation units.
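To make the notion of "finding dense correspondences" concrete, here is a minimal sketch of textbook block matching with a sum-of-absolute-differences (SAD) cost on a rectified image pair. It is an illustrative baseline only, not the hardware-oriented algorithm developed in the thesis; the function name `sad_disparity` and the parameters `max_disp` and `radius` are hypothetical.

```python
def sad_disparity(left, right, max_disp, radius=1):
    """Brute-force SAD block matching on rectified grayscale images.

    Illustrative baseline only (assumption: images are lists of rows of
    integers of equal size). Border pixels are left at disparity 0.
    """
    height, width = len(left), len(left[0])
    disparity = [[0] * width for _ in range(height)]
    for y in range(radius, height - radius):
        for x in range(radius, width - radius):
            best_cost, best_d = None, 0
            # A left-image pixel at column x is searched at column x - d
            # in the right image; d is capped so the window stays in bounds.
            for d in range(0, min(max_disp, x - radius) + 1):
                cost = 0
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        cost += abs(left[y + dy][x + dx]
                                    - right[y + dy][x + dx - d])
                # Keep the candidate with the lowest matching cost
                # (ties resolve to the smallest disparity).
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[y][x] = best_d
    return disparity
```

Every pixel evaluates up to `max_disp + 1` candidate windows, so the work grows with resolution times disparity range, which is the kind of computational load the abstract refers to.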
First, we present hardware-oriented stereo-matching algorithms and their hardware implementations, tailored to increase parallelism while using only on-chip memory to produce high-quality, high-resolution depth maps. Building on these, we propose a multi-camera depth map estimation ASIC implemented in 28 nm, which can compute depth maps in real time at up to 2K resolution and 32 fps, with a disparity range of up to 256 pixels, using two or three cameras. Our design achieves the highest reported disparity range at the lowest power consumption and highest frame rate, while computing high-quality depth maps. It also features a stream-in/stream-out interface for easy integration into existing vision systems. Although the complexity of the stereo-matching process has been optimized, a considerable share of the proposed ASIC's area and power budget is still consumed by on-chip memory. To address this issue, we next focus on how data can be stored effectively on-chip. An emerging on-chip memory alternative to conventional SRAM is logic-compatible gain-cell embedded DRAM (GC-eDRAM), which offers high density, low power, and inherent two-ported operation. In this thesis, we propose a single-well mixed 3T gain-cell implementation in 28 nm FD-SOI. Based on this concept, a custom 24 kbit GC-eDRAM macro suitable for modern real-time video processing units was fabricated in 28 nm FD-SOI, resulting in the highest-density logic-compatible embedded memory reported in the literature, with improved data retention time compared to conventional 3T gain cells and lower static power than conventional SRAM.
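For a sense of scale at the operating point quoted above (2K resolution, 32 fps, 256-pixel disparity range), the following back-of-the-envelope sketch counts pixels and disparity candidates per second for an exhaustive search. It assumes "2K" means 2048 x 1080, which the abstract does not specify, and the helper `candidate_rate` is hypothetical; the figures say nothing about how the ASIC actually schedules its work.

```python
def candidate_rate(width, height, fps, disparity_range):
    """Return (pixels per second, disparity candidates per second)."""
    pixels_per_second = width * height * fps
    return pixels_per_second, pixels_per_second * disparity_range


# Assumption: "2K" is taken as 2048 x 1080; the abstract gives no exact size.
pixels, candidates = candidate_rate(2048, 1080, 32, 256)
print(f"{pixels / 1e6:.1f} Mpixel/s, {candidates / 1e9:.1f} G candidates/s")
```

Under that assumption, an exhaustive search would evaluate roughly 18 billion matching costs per second, which illustrates why parallelism and on-chip storage are central to the design.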

Related concepts (44)
System on a chip
A system on a chip (SoC) is a complete system embedded in a single integrated circuit ("chip"), which may include memory, one or more microprocessors, interface peripherals, or any other component needed to perform the intended function.
Graphics processing unit
A graphics processing unit (GPU), also called a graphics coprocessor on some systems, is a computing unit that handles image computation. It may take the form of a standalone integrated circuit (chip), either on a graphics card or on the motherboard, or be integrated into the same integrated circuit as the general-purpose microprocessor (the term SoC is used when that circuit includes all the specialized chips).
Central processing unit
A central processing unit (CPU)—also called a central processor or main processor—is the most important processor in a given computer. Its electronic circuitry executes instructions of a computer program, such as arithmetic, logic, controlling, and input/output (I/O) operations. This role contrasts with that of external components, such as main memory and I/O circuitry, and specialized coprocessors such as graphics processing units (GPUs). The form, design, and implementation of CPUs have changed over time, but their fundamental operation remains almost unchanged.
Related publications (349)

EdgeAI-Aware Design of In-Memory Computing Architectures

Marco Antonio Rios

Driven by the demand for real-time processing and the need to minimize latency in AI algorithms, edge computing has experienced remarkable progress. Decision-making AI applications stand out for their heavy reliance on data-centric operations, predominantl ...
EPFL, 2024

Multi-Ported GC-eDRAM Bitcell with Dynamic Port Configuration and Refresh Mechanism

Adam Shmuel Teman, Robert Giterman

Embedded memories occupy an increasingly dominant part of the area and power budgets of modern systems-on-chips (SoCs). Multi-ported embedded memories, commonly used by media SoCs and graphical processing units, occupy even more area and consume higher pow ...
MDPI, 2024

Highly Parallel RTL Simulation

Verification and testing of hardware rely heavily on cycle-accurate simulation of RTL. As single-processor performance is growing only slowly, conventional, single-threaded RTL simulation is becoming impractical for increasingly complex chip designs and s ...
EPFL, 2024