Round-off errorIn computing, a roundoff error, also called rounding error, is the difference between the result produced by a given algorithm using exact arithmetic and the result produced by the same algorithm using finite-precision, rounded arithmetic. Rounding errors are due to inexactness in the representation of real numbers and the arithmetic operations done with them. This is a form of quantization error.
Double-precision floating-point formatDouble-precision floating-point format (sometimes called FP64 or float64) is a floating-point number format, usually occupying 64 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point. Floating point is used to represent fractional values, or when a wider range is needed than is provided by fixed point (of the same bit width), even if at the cost of precision. Double precision may be chosen when the range or precision of single precision would be insufficient.
Floor and ceiling functionsIn mathematics and computer science, the floor function is the function that takes as input a real number x, and gives as output the greatest integer less than or equal to x, denoted ⌊x⌋ or floor(x). Similarly, the ceiling function maps x to the least integer greater than or equal to x, denoted ⌈x⌉ or ceil(x). For example (floor), ⌊2.4⌋ = 2, ⌊−2.4⌋ = −3, and for ceiling; ⌈2.4⌉ = 3, and ⌈−2.4⌉ = −2. Historically, the floor of x has been–and still is–called the integral part or integer part of x, often denoted [x] (as well as a variety of other notations).
Fixed-point arithmeticIn computing, fixed-point is a method of representing fractional (non-integer) numbers by storing a fixed number of digits of their fractional part. Dollar amounts, for example, are often stored with exactly two fractional digits, representing the cents (1/100 of dollar). More generally, the term may refer to representing fractional values as integer multiples of some fixed small unit, e.g. a fractional amount of hours as an integer multiple of ten-minute intervals.
Quadruple-precision floating-point formatIn computing, quadruple precision (or quad precision) is a binary floating point–based computer number format that occupies 16 bytes (128 bits) with precision at least twice the 53-bit double precision. This 128-bit quadruple precision is designed not only for applications requiring results in higher than double precision, but also, as a primary function, to allow the computation of double precision results more reliably and accurately by minimising overflow and round-off errors in intermediate calculations and scratch variables.
Quantization (signal processing)Quantization, in mathematics and digital signal processing, is the process of mapping input values from a large set (often a continuous set) to output values in a (countable) smaller set, often with a finite number of elements. Rounding and truncation are typical examples of quantization processes. Quantization is involved to some degree in nearly all digital signal processing, as the process of representing a signal in digital form ordinarily involves rounding. Quantization also forms the core of essentially all lossy compression algorithms.
Floating-point arithmeticIn computing, floating-point arithmetic (FP) is arithmetic that represents subsets of real numbers using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. Numbers of this form are called floating-point numbers. For example, 12.345 is a floating-point number in base ten with five digits of precision: However, unlike 12.345, 12.3456 is not a floating-point number in base ten with five digits of precision—it needs six digits of precision; the nearest floating-point number with only five digits is 12.
ModuloIn computing, the modulo operation returns the remainder or signed remainder of a division, after one number is divided by another (called the modulus of the operation). Given two positive numbers a and n, a modulo n (often abbreviated as a mod n) is the remainder of the Euclidean division of a by n, where a is the dividend and n is the divisor. For example, the expression "5 mod 2" would evaluate to 1, because 5 divided by 2 has a quotient of 2 and a remainder of 1, while "9 mod 3" would evaluate to 0, because 9 divided by 3 has a quotient of 3 and a remainder of 0; there is nothing to subtract from 9 after multiplying 3 times 3.
Machine epsilonMachine epsilon or machine precision is an upper bound on the relative approximation error due to rounding in floating point arithmetic. This value characterizes computer arithmetic in the field of numerical analysis, and by extension in the subject of computational science. The quantity is also called macheps and it has the symbols Greek epsilon . There are two prevailing definitions. In numerical analysis, machine epsilon is dependent on the type of rounding used and is also called unit roundoff, which has the symbol bold Roman u.
Significance arithmeticSignificance arithmetic is a set of rules (sometimes called significant figure rules) for approximating the propagation of uncertainty in scientific or statistical calculations. These rules can be used to find the appropriate number of significant figures to use to represent the result of a calculation. If a calculation is done without analysis of the uncertainty involved, a result that is written with too many significant figures can be taken to imply a higher precision than is known, and a result that is written with too few significant figures results in an avoidable loss of precision.