Concept

Floating-point error mitigation

Résumé
Floating-point error mitigation is the minimization of errors caused by the fact that real numbers cannot, in general, be accurately represented in a fixed space. By definition, floating-point error cannot be eliminated, and, at best, can only be managed. Huberto M. Sierra noted in his 1956 patent "Floating Decimal Point Arithmetic Control Means for Calculator": Thus under some conditions, the major portion of the significant data digits may lie beyond the capacity of the registers. Therefore, the result obtained may have little meaning if not totally erroneous. The Z1, developed by Konrad Zuse in 1936, was the first computer with floating-point arithmetic and was thus susceptible to floating-point error. Early computers, however, with operation times measured in milliseconds, were incapable of solving large, complex problems and thus were seldom plagued with floating-point error. Today, however, with supercomputer system performance measured in petaflops, floating-point error is a major concern for computational problem solvers. The following sections describe the strengths and weaknesses of various means of mitigating floating-point error. Though not the primary focus of numerical analysis, numerical error analysis exists for the analysis and minimization of floating-point rounding error. Error analysis by Monte Carlo arithmetic is accomplished by repeatedly injecting small errors into an algorithm's data values and determining the relative effect on the results. Extension of precision is the use of larger representations of real values than the one initially considered. The IEEE 754 standard defines precision as the number of digits available to represent real numbers. A programming language can include single precision (32 bits), double precision (64 bits), and quadruple precision (128 bits). While extension of precision makes the effects of error less likely or less important, the true accuracy of the results are still unknown. Variable length arithmetic represents numbers as a string of digits of variable length limited only by the memory available.
À propos de ce résultat
Cette page est générée automatiquement et peut contenir des informations qui ne sont pas correctes, complètes, à jour ou pertinentes par rapport à votre recherche. Il en va de même pour toutes les autres pages de ce site. Veillez à vérifier les informations auprès des sources officielles de l'EPFL.