
Publication

# Massively parallel data processing for quantitative total flow imaging with optical coherence microscopy and tomography

Theo Lasser, Paul James Marchand, Marcin Antoni Sylwestrzak, Daniel Pawel Szlag

*Elsevier, 2017*

Article


Abstract

We present an application of massively parallel processing to quantitative flow measurement data acquired using spectral optical coherence microscopy (SOCM). The need for massive signal processing of these particular datasets has been a major hurdle for many applications based on SOCM. In view of this difficulty, we implemented and adapted quantitative total flow estimation algorithms on graphics processing units (GPUs) and achieved a 150-fold reduction in processing time compared to a former CPU implementation. As SOCM constitutes the microscopy counterpart to spectral optical coherence tomography (SOCT), the developed processing procedure can be applied to both imaging modalities. We present the developed DLL library integrated in MATLAB (with an example) and have included the source code for adaptations and future improvements.

Program summary

- Program title: CudaOCMproc
- Catalogue identifier: AFBT_v1_0
- Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AFBT_v1_0.html
- Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
- Licensing provisions: GNU GPLv3
- No. of lines in distributed program, including test data, etc.: 913552
- No. of bytes in distributed program, including test data, etc.: 270876249
- Distribution format: tar.gz
- Programming language: CUDA/C, MATLAB
- Computer: Intel x64 CPU, GPU supporting CUDA technology
- Operating system: 64-bit Windows 7 Professional
- Has the code been vectorized or parallelized?: Yes; the CPU code has been vectorized in MATLAB and the CUDA code has been parallelized
- RAM: dependent on the user's parameters, typically between several gigabytes and several tens of gigabytes
- Classification: 6.5, 18
- Nature of problem: speed-up of data processing in optical coherence microscopy
- Solution method: utilization of a GPU for massively parallel data processing
- Additional comments: compiled DLL library with source code and documentation; example of utilization (MATLAB script with raw data)
- Running time: 1.8 s for one B-scan (150× faster than the CPU data processing time)

(C) 2017 Published by Elsevier B.V.

Official source

This page is generated automatically and may contain information that is not correct, complete, up to date, or relevant to your search. The same applies to all other pages on this site. Please verify the information against EPFL's official sources.


Related concepts (14)

Optical coherence tomography

[Figure: OCT image of a sarcoma]
Optical coherence tomography (OCT) is a well-established imaging technique that uses a light wave to capture images…

Tomography

[Figure: basic principle of projection tomography: the transverse tomographic slices S1 and S2 are superimposed and compared with the projected image P.]
Tomography is an imaging technique, …

MATLAB

MATLAB ("matrix laboratory") is a scripting language interpreted within a development environment of the same name; it is used for numerical computation. Developed by The MathWorks,

Related publications (2)


For the last thirty years, electronics, at first built with discrete components, and then as Integrated Circuits (IC), have brought diverse and lasting improvements to our quality of life. Examples might include digital calculators, automotive and airplane control assistance, almost all electrical household appliances, and the almost ubiquitous Personal Computer. Application-Specific Integrated Circuits (ASICs) were traditionally used for their high performance and low manufacturing cost, and were designed specifically for a single application with large volumes. But as lower product lifetimes and the pressures of fast marketing increased, ASICs' high design cost pushed for their replacement by Microprocessors. These processors, capable of implementing any functionality through a change in software, are thus often called General Purpose Processors. General purpose processors are used for everyday computing tasks, and found in all personal computers. They are also often used as building blocks for scientific supercomputers. Superscalar processors such as these require ever more processing power to run complex simulations, video games or versatile telecoms services. In the case of embedded applications, e.g. for portable devices, both performance and power consumption must be taken into account. In a bid to adapt a processor to some extent to select applications, fully reconfigurable logic can greatly improve the performance of a processor, since it is shaped for the best possible execution with the available resources. However, as reconfigurable logic is far slower than custom logic, this gain is possible only for some specific applications with large parallelism, after a detailed study of the algorithm. Even though this process can be automated, it still requires large computing resources, and cannot be performed at run time. 
To reduce the loss in speed compared to custom logic, it is possible to limit the reconfigurability and thereby increase the breadth of applications where performance can be improved. However, as the application space increases, careful analysis and design of the reconfigurability is required to minimize the speed loss, notably when dynamic reconfiguration is considered. As a case study, we analyze the feasibility of adding limited reconfigurability to the Floating Point Units (FPUs) of a general purpose processor. These rather large units execute all floating point operations, and may also be used for integer multiplication. If an application contains few or infrequent instructions that must be executed by the FPU, this idle hardware only increases power consumption without enhancing performance. This is often the case in non-scientific applications and even many recent and detailed video games, which make heavy use of hardware display accelerators for 3D graphics. In a fast multiplier such as is found in the FPU of a high performance processor, the logic to perform multiplication is a large tree of compressors that adds all the partial products together. It is possible to add logic to allow the reconfiguration of part of this tree as several extra Arithmetic and Logic Units (ALUs). This requires a detailed timing analysis for both the reconfigurable FPU and the extra ALUs, taking into account effects such as added wires and longer critical paths. Finally, the algorithm to decide when and how to reconfigure must be studied, in terms of efficiency and complexity. The results of adding this limited reconfigurability to a mainstream superscalar processor over a large set of compute intensive benchmarks show gains of up to 56% in the best case, with an average gain of 11%.
The application to an idealized huge top processor still shows slightly positive average gains, as the limits of available parallelism are reached, bounded by both the application and many of the characteristics of the processor. In all cases, binary compatibility is maintained, allowing the re-use of all existing software. We show that adding limited reconfigurability to a general purpose superscalar processor can produce interesting gains over a wide range of applications while maintaining binary compatibility, and without large modifications to the original design. Limited reconfigurability is worthwhile as it increases the design space, allowing gains to apply to a larger set of applications. These gains are achieved through careful study and optimization of the reconfigurable logic and the decision algorithm.

Tristan Bolmont, Arno Pino Bouwens, Theo Lasser, Daniel Pawel Szlag

Optical coherence tomography (OCT) and optical coherence microscopy (OCM) allow the acquisition of quantitative three-dimensional axial flow by estimating the Doppler shift caused by moving scatterers. Measuring the velocity of red blood cells is currently the principal application of these methods. In many biological tissues, blood flow is often perpendicular to the optical axis, creating the need for a quantitative measurement of lateral flow. Previous work has shown that lateral flow can be measured from the Doppler bandwidth, albeit only for simplified optical systems. In this work, we present a generalized model to analyze the influence of relevant OCT/OCM system parameters such as light source spectrum, numerical aperture and beam geometry on the Doppler spectrum. Our analysis results in a general framework relating the mean and variance of the Doppler frequency to the axial and lateral flow velocity components. Based on this model, we present an optimized acquisition protocol and algorithm to reconstruct quantitative measurements of lateral and axial flow from the Doppler spectrum for any given OCT/OCM system. To validate this approach, Doppler spectrum analysis is employed to quantitatively measure flow in a capillary with both extended focus OCM and OCT.