Multiply–accumulate operation

Content sourced from Wikipedia, licensed under CC BY-SA 3.0.

Multiply–accumulate (MAC) is a common computing operation: you multiply two numbers and add the product to an accumulator, written as a = a + (b × c). A MAC unit in hardware combines a multiplier, an adder, and a running total register so the product can be added to the total in one cycle. This is much faster than older methods that built a product bit by bit.

When using integers, the MAC result is exact up to the chosen word size (often modulo a power of two). Floating-point numbers, however, have limited precision, and floating-point arithmetic is not fully associative or distributive. This means the way you perform the multiply and add can affect the final result due to rounding.

A fused multiply–add (FMA) performs the whole expression a + (b × c) in one operation with a single rounding. This is typically faster and more accurate than doing the multiply, then rounding, then adding, then rounding again. IEEE 754-2008 standardizes FMA to use one rounding. Some hardware can execute multiple FMAs together, such as dot products across several SIMD lanes in a single cycle.

FMAs are powerful for many workloads, including vector and matrix operations, while avoiding extra rounding errors. But they can produce surprising results in edge cases, such as when squaring numbers that are very close, if the intermediate product is rounded before the final sum. So, while FMAs speed up computation and often improve accuracy, they require careful use in some numerical contexts.

In hardware, using FMAs can also enable efficient software implementations of division and square roots, reducing the need for separate hardware units. Many processors and GPUs support FMA, and some offer multiple FMAs at once for high-throughput work.

Standards and languages reflect FMA support. The C standard includes the fma() function, and modern compilers (like GCC and Clang) often contract a*b + c into fma(a, b, c) when the target architecture supports it. OpenCL provides mad as a similar, sometimes lower-accuracy option. Some OpenCL profiles allow reduced accuracy for performance, and some GPU instruction sets expose a non-FMA variant (MAD) that truncates the product before adding.

A few historical notes: the concept of MAC dates back to early machines like Percy Ludgate’s Analytical Machine in 1909. The first modern devices to use MAC were digital signal processors, but today MAC units are common in many CPUs and GPUs. Some architectures implement fast, truncated MAD instructions for speed, trading away precision. For example, older VAX POLY-like sequences perform multiply–add steps with extended precision internally, then round to the target format, which is not the same as a true IEEE FMA.

Key points
- MAC: a = a + (b × c); used for fast accumulation of products.
- FMA: a single, fused operation with one final rounding, offering speed and accuracy benefits.
- Floating-point caveat: order of operations and rounding matter; FMA helps but can have edge-case behavior.
- Uses: vector/math workloads, dot products, SIMD, and speeding up division or square roots in some designs.
- Standards and tools: IEEE 754-2008, fma() in C, compiler support for contracting multiplications and additions, and OpenCL options for reduced-accuracy MAD.

This page was last edited on 2 February 2026, at 03:45 (CET).