Concept: Quadruple-precision floating-point format

Summary

In computing, quadruple precision (or quad precision) is a binary floating point–based computer number format that occupies 16 bytes (128 bits) with precision at least twice the 53-bit double precision.
This 128-bit quadruple precision is designed not only for applications requiring results in higher than double precision, but also, as a primary function, to allow the computation of double precision results more reliably and accurately by minimising overflow and round-off errors in intermediate calculations and scratch variables. William Kahan, primary architect of the original IEEE-754 floating-point standard, noted, "For now the 10-byte Extended format is a tolerable compromise between the value of extra-precise arithmetic and the price of implementing it to run fast; very soon two more bytes of precision will become tolerable, and ultimately a 16-byte format ... That kind of gradual evolution towards wider precision was already in view when IEEE Standard 754 for Floating-Point Arit
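The claim that wider intermediates make double precision results more reliable can be illustrated with a small sketch. Python has no built-in quadruple type, so the code below emulates rounding to a chosen significand width with exact rationals. The helper names `round_to_bits` and `sum_with_precision` are hypothetical, the 113-bit width is the significand precision of IEEE 754 binary128 (the summary above only says "at least twice" 53 bits), and the exponent range is left unbounded, so overflow is not modeled.

```python
from fractions import Fraction


def round_to_bits(x: Fraction, p: int) -> Fraction:
    """Round x to the nearest value with a p-bit significand (ties to even).

    Illustrative helper: the exponent is unbounded, so overflow and
    underflow are not modeled.
    """
    if x == 0:
        return x
    sign = -1 if x < 0 else 1
    a = abs(x)
    # Estimate floor(log2(a)); the estimate is either exact or one too high.
    k = a.numerator.bit_length() - a.denominator.bit_length()
    if Fraction(2) ** k > a:
        k -= 1
    # Scale so that the significand is an integer in [2**(p-1), 2**p).
    shift = k - (p - 1)
    q = round(a / Fraction(2) ** shift)  # round() on a Fraction is half-to-even
    return sign * q * Fraction(2) ** shift


def sum_with_precision(values, p: int) -> Fraction:
    """Add the values left to right, rounding every partial sum to p bits."""
    acc = Fraction(0)
    for v in values:
        acc = round_to_bits(acc + Fraction(v), p)
    return acc


values = [0.1] * 10  # each entry is the double nearest to 0.1

# Intermediates kept at double precision (53-bit significand).
narrow_result = float(sum_with_precision(values, 53))

# Intermediates kept at an assumed 113-bit significand, with a single
# final rounding back to double precision.
wide_result = float(round_to_bits(sum_with_precision(values, 113), 53))

print(narrow_result, wide_result)
```

With ten copies of 0.1, the 53-bit accumulation ends just below 1.0 because rounding error builds up across the partial sums, while the 113-bit accumulation holds every partial sum exactly and the single final rounding returns 1.0. This is only an emulation of the rounding behaviour, not a substitute for a hardware or library quadruple type.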



Related concepts (9)

Floating-point arithmetic

In computing, floating-point arithmetic (FP) is arithmetic that represents subsets of real numbers using an integer with a fixed precision, called the significand, scaled by an integer exponent of a

Rounding

Rounding means replacing a number with an approximate value that has a shorter, simpler, or more explicit representation. For example, replacing $23.4476 with $23.45, the fraction 312/937 with 1/3, or the expre

Hexadecimal

In mathematics and computing, the hexadecimal (also base-16 or simply hex) numeral system is a positional numeral system that represents numbers using a radix (base) of sixteen. Unlike the decimal sy

Related courses (5)

MATH-251(a): Numerical analysis

This course presents numerical methods for the solution of mathematical problems such as systems of linear and non-linear equations, functions approximation, integration and differentiation and differential equations.

ME-412: Experimental methods in engineering mechanics

This course gives an advanced treatment of experimental methods in the context of mechanics by way of example. Students will construct significant components of experimental apparatus, use their apparatus to collect data, interpret the resulting data, and write reports on the experiments.

EE-110: Logic systems (for MT)

This course covers the foundations of digital systems. Building on Boolean algebra and on combinational and sequential circuits, including finite-state machines, methods for the analysis and synthesis of logic systems are studied and applied


Related publications (3)



Zvonimir Bujanovic, Daniel Kressner

The Schur decomposition of a square matrix A is an important intermediate step of state-of-the-art numerical algorithms for addressing eigenvalue problems, matrix functions, and matrix equations. This work is concerned with the following task: Compute a (more) accurate Schur decomposition of A from a given approximate Schur decomposition. This task arises, for example, in the context of parameter-dependent eigenvalue problems and mixed precision computations. We have developed a Newton-like algorithm that requires the solution of a triangular matrix equation and an approximate orthogonalization step in every iteration. We prove local quadratic convergence for matrices with mutually distinct eigenvalues and observe fast convergence in practice. In a mixed low-high precision environment, our algorithm essentially reduces to only four high-precision matrix-matrix multiplications per iteration. When refining double to quadruple precision, it often needs only 3-4 iterations, which reduces the time of computing a quadruple precision Schur decomposition by up to a factor of 10-20.

We introduce a multi-dimensional point-wise multi-domain hybrid Fourier-Continuation/WENO technique (FC-WENO) that enables high-order and non-oscillatory solution of systems of nonlinear conservation laws, and essentially dispersionless, spectral, solution away from discontinuities, as well as mild CFL constraints for explicit time stepping schemes. The hybrid scheme conjugates the expensive, shock-capturing WENO method in small regions containing discontinuities with the efficient FC method in the rest of the computational domain, yielding a highly effective overall scheme for applications with a mix of discontinuities and complex smooth structures. The smooth and discontinuous solution regions are distinguished using the multi-resolution procedure of Harten [A. Harten, Adaptive multiresolution schemes for shock computations, J. Comput. Phys. 115 (1994) 319-338]. We consider a WENO scheme of formal order nine and a FC method of order five. The accuracy, stability and efficiency of the new hybrid method for conservation laws are investigated for problems with both smooth and non-smooth solutions. The Euler equations for gas dynamics are solved for the Mach 3 and Mach 1.25 shock wave interaction with a small, plain, oblique entropy wave using the hybrid FC-WENO, the pure WENO and the hybrid central difference-WENO (CD-WENO) schemes. We demonstrate considerable computational advantages of the new FC-based method over the two alternatives. Moreover, in solving a challenging two-dimensional Richtmyer-Meshkov instability (RMI), the hybrid solver results in seven-fold speedup over the pure WENO scheme. Thanks to the multi-domain formulation of the solver, the scheme is straightforwardly implemented on parallel processors using message passing interface as well as on Graphics Processing Units (GPUs) using CUDA programming language. 
The solver shows almost perfect scaling on parallel CPUs, illustrating the minimal communication requirements of the multi-domain strategy. For the same RMI test, the hybrid computation on a single GPU, in double-precision arithmetic, displays a five- to six-fold speedup over the hybrid computation on a single CPU. The relative speedup of the hybrid computation over the WENO computation on GPUs is similar to that on CPUs, demonstrating the advantage of the hybrid-scheme technique on both CPUs and GPUs. (C) 2013 Elsevier Inc. All rights reserved.

Neuron tree topology equations can be split into two subtrees and solved on different processors with no change in accuracy, stability, or computational effort; communication costs involve only sending and receiving two double precision values by each subtree at each time step. Splitting cells is useful in attaining load balance in neural network simulations, especially when there is a wide range of cell sizes and the number of cells is about the same as the number of processors. For compute-bound simulations load balance results in almost ideal runtime scaling. Application of the cell splitting method to two published network models exhibits good runtime scaling on twice as many processors as could be effectively used with whole-cell balancing.