This is an annotated bibliography on the topic of approximate computing. It's a living document meant to exhaustively catalog everything we know about approximation along with the earlier work that influenced it. It's also a collaborative, open-source project: to contribute, see its home on GitHub.
You can also download the BibTeX citation database, read a PDF version of this document, or view the raw Markdown.
Here's the definition of approximate computing that this document uses:
Approximate computing is the idea that computer systems can let applications trade off accuracy for efficiency. It includes any technique where the system intentionally exposes incorrectness to the application layer in return for conserving some resource.
That definition is clearly broad enough to include many ideas that have existed since the beginning of (computational) time. Floating-point numbers, for example, approximate real-number arithmetic to save space and time over arbitrary-precision numerical representation. This document focuses on the study of approximate computing in general and system-level techniques that apply this theory to create new trade-offs.
There are two main research directions in approximate computing, corresponding to the two main sections in this annotated bibliography. The first is on techniques for approximation: specific strategies for exploiting resilience in applications for efficiency gains. The second is on programming approximate systems: assuming that approximation techniques exist, a host of new programmability problems arise.
This section enumerates techniques for implementing approximation. There are three main categories: approximation in computer architecture (i.e., computation and storage hardware), approximation in software via program transformation, and approximation elsewhere (e.g., networks).
This section deals with hardware-oriented approximation techniques. We categorize the techniques according to the hardware component they affect.
One straightforward strategy for approximation in floating-point units is to dynamically adapt mantissa width [133, 149].
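The effect of a narrower mantissa is easy to emulate in software. The sketch below is an illustration only, not the hardware mechanism from [133, 149]: it truncates a 64-bit float's mantissa to a chosen width and shows how the error grows as the width shrinks.

```python
import struct

def truncate_mantissa(x: float, kept_bits: int) -> float:
    """Zero out the low-order mantissa bits of a 64-bit float,
    keeping only `kept_bits` of the 52-bit mantissa."""
    assert 0 <= kept_bits <= 52
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    mask = ~((1 << (52 - kept_bits)) - 1) & 0xFFFFFFFFFFFFFFFF
    return struct.unpack("<d", struct.pack("<Q", bits & mask))[0]

# The narrower the mantissa, the larger the (bounded) relative error.
for kept in (52, 23, 10, 4):
    approx = truncate_mantissa(3.14159265358979, kept)
    print(kept, approx, abs(approx - 3.14159265358979))
```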
One paper proposes fuzzy memoization for FPUs [4]. The idea is to store previously-computed results, as in ordinary memoization, but also to provide a “match” even when inputs are merely close to other, previously-seen inputs. (Fuzzy memoization comes up elsewhere in approximate computing too.)
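As a rough software sketch of the idea (not the hardware design in [4]), here is a minimal fuzzy memoizer; the tolerance parameter `tol` is a made-up knob that decides when two inputs are “close enough” to share a result.

```python
import math

class FuzzyMemo:
    """Memoize a unary float function, returning a cached result when a
    new input is within `tol` of a previously seen input."""
    def __init__(self, fn, tol=1e-3):
        self.fn, self.tol, self.cache = fn, tol, []  # list of (input, output)

    def __call__(self, x):
        for seen_x, seen_y in self.cache:
            if abs(x - seen_x) <= self.tol:   # "close enough" hit
                return seen_y                 # approximate reuse
        y = self.fn(x)
        self.cache.append((x, y))
        return y

fsin = FuzzyMemo(math.sin, tol=1e-2)
print(fsin(1.000), fsin(1.004))  # the second call reuses the first result
```

Hardware implementations typically quantize inputs by dropping low-order bits and index a table with the result rather than scanning previous inputs as this sketch does.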
To facilitate voltage overscaling techniques for approximation, some work designs functional units that are more resilient to timing errors than traditional, precise designs [49]. Other work extends this graceful voltage–error scaling to coarser computational blocks [91].
The above paragraph must be missing other work on voltage-overscaling-tolerant units.
Alternative number representations work in tandem with relaxed functional units to bound the numerical error that can result from bit flips [125].
A body of VLSI work has designed approximate adders, which are allowed to yield incorrect results for some minority of input combinations [45, 46, 54, 57, 84, 118, 140, 143, 148, 154]. Other work proposes metrics for evaluating these adders [72].
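To make the flavor of these designs concrete, the following sketch models one common design point, a lower-part-OR adder, in which the low-order bits are ORed instead of added so that no carry chain runs through the low part. This is a simplified software model, not any specific cited circuit; the carry out of the low part is simply dropped.

```python
def lower_or_add(a: int, b: int, k: int, width: int = 16) -> int:
    """Approximate adder model: OR the low k bits (no carry chain there),
    add the upper bits exactly, and drop the carry out of the low part."""
    low_mask = (1 << k) - 1
    low = (a & low_mask) | (b & low_mask)   # approximate low part
    high = ((a >> k) + (b >> k)) << k       # exact high part
    return (high | low) & ((1 << width) - 1)

a, b = 0b0101_1011, 0b0011_0110
print(bin(lower_or_add(a, b, k=4)), bin(a + b))  # close, but not equal
```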
Should we break down the adder work into finer categories? Also, there is now more work on approximate multipliers that deserves its own paragraph.
Several categories of work apply approximation to memory technologies. The general idea is to spend less energy on retaining or accessing data; in return, there is a small probability that bits will flip in the memory.
SRAM structures spend significant static power on retaining data, so they represent another opportunity for fidelity trade-offs [21, 63, 120].
Similarly, DRAM structures can reduce the power spent on refresh cycles where bit flips are allowed [74, 77].
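A simple way to reason about such memories in software is to model each stored bit as flipping independently with some small probability. The sketch below does exactly that; the flip rate is a made-up parameter for illustration, not a number from the cited papers.

```python
import random

def inject_flips(data: bytes, p_flip: float, seed: int = 0) -> bytes:
    """Model an approximate memory region: each stored bit flips
    independently with probability p_flip (e.g., from a lowered
    DRAM refresh rate).  Purely a software model for resilience studies."""
    rng = random.Random(seed)
    out = bytearray(data)
    for i in range(len(out)):
        for bit in range(8):
            if rng.random() < p_flip:
                out[i] ^= 1 << bit
    return bytes(out)

print(inject_flips(b"approximate storage", p_flip=1e-2))
```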
In persistent memories where storage cells can wear out, approximate systems can reduce the number of bits they flip to lengthen the useful device lifetime [41]. Similarly, low-power writes to memories like flash can exploit their probabilistic properties while hiding them from software [73, 108, 134].
Another relevant paper remains to be summarized here [112].
Spintronic memories exhibit similarly favorable trade-offs between access cost and error [101].
A broad category of work has proposed general techniques for making quality trade-offs when synthesizing and optimizing general hardware circuits [9, 12, 83, 97, 100, 136, 137, 147]. Other tools focus on analyzing approximate circuit designs [135, 139].
Near-threshold voltage domains also present a new opportunity for embracing unpredictable circuit operation [56].
One line of work proposes to place and route processor designs with paths that do not exhibit a “cliff” where voltage scaling causes catastrophic failures [53]. The original idea there was for so-called better-than-worst-case (BTWC) designs such as Razor [35], not approximate computing, but the connection to voltage-overscaling architectures such as Truffle [37] is clear.
Need more citations on voltage overscaling here.
As a dual to adding errors in some circuits, some researchers have explored differential fault protection in the face of universally unreliable circuits. As process sizes continue to shrink, it is likely that reliable transistors will become the minority; redundancy and checking will be necessary to provide reliable operation [68]. Circuit design techniques have been proposed that reduce the cost of redundancy by providing it selectively for certain instructions in a CPU [128], certain blocks in a DSP [5, 47, 55], or to components of a GPU [95]. Other work has used criticality information to selectively allocate software-level error detection and correction resources [32, 59, 119].
Microarchitectural mechanisms can exploit different opportunities from circuit-level techniques. Specifically, “soft coherence” relaxes intercore communication [76], and load value approximation [85, 132] approximates numerical values instead of fetching them from main memory on cache misses.
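The sketch below illustrates the load value approximation idea with a toy last-value predictor: on a miss, the load returns a predicted value immediately, while the true value is (conceptually) fetched in the background to train the predictor. It is only a schematic of the mechanism in [85, 132], with hypothetical names throughout.

```python
class LoadValueApproximator:
    """Toy model of load value approximation: a per-address last-value
    predictor supplies a value on cache misses instead of stalling."""
    def __init__(self):
        self.predictor = {}            # address -> last observed value

    def load(self, addr, cache, memory):
        if addr in cache:
            return cache[addr]                     # hit: exact value
        predicted = self.predictor.get(addr, 0)    # miss: approximate value
        # In hardware, the real fetch proceeds off the critical path and
        # trains the predictor; we model that training step directly here.
        actual = memory[addr]
        self.predictor[addr] = actual
        cache[addr] = actual
        return predicted

lva = LoadValueApproximator()
cache, memory = {}, {0x10: 3.14}
print(lva.load(0x10, cache, memory))   # miss: returns the prediction (0)
print(lva.load(0x10, cache, memory))   # hit: returns 3.14
```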
Recent work has proposed system organizations that apply approximation at a coarser grain. One set of techniques uses external monitoring to allow errors even in processor control logic [151, 152]. Other approaches compose separate processing units with different levels of reliability [65]. Duwe [33] proposes run-time coalescing of approximate and precise computations to reduce the overhead of switching between modes. Other work allocates approximation among the lanes of a SIMD unit [1]. In all cases, the gains from approximation can be larger than for lower-level techniques that affect individual operations. As the granularity principle from outlines, techniques like these that approximate entire computations, including control flow, have the greatest efficiency potential.
Stochastic computing is an alternative computational model where values are represented using probabilities [8, 20, 27, 79, 93, 94, 141]. For example, a wire could carry a random sequence of bits, where the wire's value corresponds to the probability that a given bit is a 1. Multiplication can be implemented in this model using a single AND gate, so simple circuits can be low-power and area-efficient. A persistent challenge in stochastic circuits, however, is that reading an output value requires a number of bits that is exponential in the value's magnitude. Relaxing this constraint represents an opportunity for a time–accuracy trade-off.
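A short simulation makes both the appeal and the cost concrete: multiplication is a bitwise AND of two independent streams, but the precision of the decoded value improves only with the square root of the stream length. This is a behavioral sketch, not a circuit description from the cited work.

```python
import random

def to_stream(p: float, n: int, rng) -> list:
    """Encode probability p as a random bitstream of length n."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def from_stream(bits) -> float:
    """Decode a bitstream back to a probability (its fraction of 1s)."""
    return sum(bits) / len(bits)

rng = random.Random(42)
n = 4096                      # stream length: accuracy grows slowly with n
a, b = 0.6, 0.5
sa, sb = to_stream(a, n, rng), to_stream(b, n, rng)
product = [x & y for x, y in zip(sa, sb)]    # one AND gate per bit position
print(from_stream(product), a * b)           # ≈ 0.30
```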
Aside from hardware-level accuracy trade-offs, there are opportunities for adapting algorithms to execute with varying precision. Algorithmic quality–complexity trade-offs are not new, but recent work has proposed tools for automatically transforming programs to take advantage of them. Transformations include removing portions of a program's dynamic execution (termed code perforation) [121], unsound parallelization of serial programs [86], eliminating synchronization in parallel programs [82, 89, 102, 103], identifying and adjusting parameters that control output quality [51], randomizing deterministic programs [88, 155], dynamically choosing between different programmer-provided implementations of the same specification [6, 7, 10, 39, 138, 142], and replacing subcomputations with invocations of a trained neural network [36].
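As one concrete example of the flavor of these transformations, here is a hand-written version of loop perforation applied to a reduction; systems like the one in [121] apply such transformations automatically and measure the resulting quality loss. The function and data are hypothetical.

```python
def mean_brightness(pixels, perforation=1):
    """Loop perforation sketch: process every `perforation`-th element
    and rescale, trading accuracy for proportionally less work."""
    sampled = pixels[::perforation]
    return sum(sampled) / len(sampled)

pixels = [(i * 37) % 256 for i in range(100_000)]
print(mean_brightness(pixels, 1))    # exact
print(mean_brightness(pixels, 4))    # ~4x less work, small error
```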
Some work on algorithmic approximation targets specific hardware: notably, general-purpose GPUs [44, 109, 110, 114]. In a GPU setting, approximation strategies benefit most by optimizing for memory bandwidth and control divergence.
Recently, a research direction has developed in automated program repair and other approaches to heuristically patching software according to programmer-specified criteria. These techniques are typically approximate in that they abandon a traditional compiler's goal of perfectly preserving the original program's semantics. Notably, Schulte et al. [116] propose to use program evolution to optimize for energy.
Precimonious [107] addresses the problem of choosing appropriate floating-point widths, which amount to a trade-off between numerical accuracy and space or operation cost. Similarly, STOKE's floating-point extension [115] synthesizes new versions of floating-point functions from scratch to meet different accuracy requirements with optimal efficiency.
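The underlying trade-off these tools navigate is easy to demonstrate; the sketch below is not Precimonious or STOKE, just the precision-versus-accuracy tension they optimize, using NumPy for convenience.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 1_000_000)
exact = np.sum(x.astype(np.float64) ** 2)    # 64-bit: more accurate, more bytes moved
approx = np.sum(x.astype(np.float32) ** 2)   # 32-bit: half the storage and bandwidth
print(exact, approx, abs(exact - float(approx)) / exact)  # small relative error
```

Precision-tuning tools search for the assignment of widths to individual variables that stays within an end-to-end accuracy bound at minimum cost.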
Neural acceleration is a recent technique that treats code as a black box and transforms it into a neural network [25, 36, 80, 129]. It is, at its core, an algorithmic transformation, but it integrates tightly with hardware support: a digital accelerator [36], analog circuits [124], FPGAs [92], GPUs [44], or, recently, new analog substrates using resistive memory [67] or memristors [75].
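The software half of the technique can be sketched in a few lines: sample a target region's input/output behavior and fit a small neural network to stand in for it. This sketch uses scikit-learn purely for illustration and a made-up target function; the cited systems pair the trained network with compiler support and NPU hardware.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def target(x):                       # stand-in for an approximable code region
    return np.sin(x) * np.exp(-0.1 * x)

# Offline phase: observe the black box on representative inputs.
X = np.linspace(0, 10, 2000).reshape(-1, 1)
y = target(X).ravel()

# Train a small "NPU" model to mimic the region.
npu = MLPRegressor(hidden_layer_sizes=(16, 16), solver="lbfgs",
                   max_iter=5000, random_state=0)
npu.fit(X, y)

test = np.array([[2.5], [7.3]])
print(npu.predict(test), target(test).ravel())   # close, not exact
```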
While architecture optimizations and program transformations dominate the approximate-computing literature, some recent work has explored the same trade-off in other components of computer systems.
Network communication, with its reliance on imperfect underlying channels, exhibits opportunities for fidelity trade-offs [52, 78, 117, 126]. Notably, SoftCast [52] transmits images and video by making the signal magnitude directly proportional to pixel luminance. BlinkDB, a recent instance of research on approximate query answering, is a database system that can respond to queries that include a required accuracy band on their output [2]. Uncertain<T> [13] and Lax [127] propose to expose the probabilistic behavior of sensors to programs. In a distributed system or a supercomputer, approximation techniques can eschew redundancy and recovery for efficiency [50].
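To give a feel for the approximate-query idea, the following sketch answers an aggregate query from a random sample and reports an error bar. BlinkDB additionally chooses among pre-built stratified samples to hit a user-specified accuracy or latency target, which this toy code does not attempt; all names here are hypothetical.

```python
import math
import random
import statistics

def approx_mean(table, sample_frac=0.01, z=1.96, seed=0):
    """Sampling-based approximate aggregation: answer from a sample and
    report an estimate plus a ~95% confidence half-width."""
    rng = random.Random(seed)
    sample = [row for row in table if rng.random() < sample_frac]
    est = statistics.fmean(sample)
    half_width = z * statistics.stdev(sample) / math.sqrt(len(sample))
    return est, half_width

rng = random.Random(1)
table = [rng.gauss(100, 15) for _ in range(1_000_000)]
print(approx_mean(table))      # scans ~1% of the rows
```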
Work in this category tends to assume an existing, domain-specific notion of “quality” for each application. As a general principle, these quality metrics need careful consideration: one quality metric is not necessarily as good as another. Recent work has proposed guidelines for rigorous quality measurement [3].
Recently, language constructs that express and constrain approximation have become a focus in the programming-languages research community. Relax [32] is a language with ISA support for tolerating architectural faults in software. Rely [18] uses specifications that relate the reliability of an approximate code region's inputs to the reliability of its outputs.
A related set of recent approximate-programming tools attempts to adapt a program to meet accuracy demands while using as few resources as possible. Chisel [90] is an extension to Rely that searches for the subset of operations in a program that can safely be made approximate. ExpAX [38] finds safe-to-approximate operations automatically and uses a metaheuristic to find which subset of them to actually approximate.
Some other programming systems that focus on energy efficiency include approximation ideas: Eon [123] is a language for long-running embedded systems that can drop tasks when energy resources are low, and the Energy Types language [29] incorporates a variety of strategies for expressing energy requirements.
Aside from programming languages, separate programmer tools can help analyze and control the effects of approximation.
A quality-of-service profiler helps programmers identify parts of programs that may be good targets for approximation techniques [87]. Conversely, debugging tools can identify components where approximation is too aggressive [104]. Some verification tools and proof systems help the programmer prove relationships between the original program and a candidate relaxed version [15–17, 144].
As an alternative to statically bounding errors, dynamic techniques can monitor quality degradation at run time. The critical challenge for these techniques is balancing detection accuracy with the added cost, which takes away from the efficiency advantages of approximation. Some work has suggested that programmers can provide domain-specific checks on output quality [43, 104]. Recent work has explored automatic generation of error detectors [58]. A variety of techniques propose mechanisms for run-time or profiling feedback to adapt approximation parameters [7, 10, 51, 153].
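A generic version of this pattern, offered as a sketch rather than any one cited system's design, wraps an approximate kernel with an occasional precise re-execution and a domain-specific check; the sampling rate and check are hypothetical parameters.

```python
import math
import random

def guarded(approx_fn, precise_fn, ok, sample_rate=0.05, seed=0):
    """Run-time quality monitor: audit a small fraction of calls against
    the precise version and fall back to the precise result when the
    domain-specific check `ok` fails."""
    rng = random.Random(seed)
    def run(x):
        out = approx_fn(x)
        if rng.random() < sample_rate:       # cheap, sampled auditing
            ref = precise_fn(x)
            if not ok(out, ref):
                return ref                   # reject the approximate output
        return out
    return run

fast_sin = guarded(lambda x: x - x**3 / 6.0,          # truncated Taylor series
                   math.sin,
                   lambda out, ref: abs(out - ref) < 1e-2)
print(fast_sin(0.3), math.sin(0.3))   # unaudited calls may carry small errors
```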
One specific research direction, probabilistic programming languages, focuses on expressing statistical models, especially for machine learning [11, 19, 42, 60, 61, 96, 113, 145]. The goal is to enable efficient statistical inference over arbitrary models written in the probabilistic programming language.
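At its simplest, a probabilistic program is ordinary code with random choices plus a conditioning construct; inference then asks about the distribution of outcomes. The toy sketch below uses rejection sampling, which real probabilistic programming systems replace with far more efficient inference algorithms.

```python
import random

def flip(p, rng):
    """A probabilistic choice primitive: heads with probability p."""
    return rng.random() < p

def model(rng):
    # Two fair coins; condition on observing at least one heads and ask
    # for the posterior probability that the first coin was heads.
    a, b = flip(0.5, rng), flip(0.5, rng)
    if not (a or b):
        return None           # observation violated: reject this run
    return a

rng = random.Random(0)
samples = [s for s in (model(rng) for _ in range(100_000)) if s is not None]
print(sum(samples) / len(samples))   # ≈ 2/3
```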
Earlier work examines the semantics of probabilistic behavior in more traditional programming models [62]. Similarly, the probability monad captures a variable's discrete probability distribution in functional programs [99]. Statistical model checking tools can analyze programs to prove statistical properties [64, 66]. Recently, Bornholt et al. [13] proposed a construct for explicitly representing probability distributions in a mainstream programming language.
As the application studies described below repeatedly find, error tolerance varies greatly in existing software, both within and between programs. Independent of approximate computing, programming-languages researchers have sought to identify and enhance error resilience properties.
SJava analyzes programs to prove that errors only temporarily disrupt the execution path of a program [34]. Program smoothing [22–24] and robustification [122] both find continuous, mathematical functions that resemble the input–output behavior of numerical programs. Auto-tuning approaches can help empirically identify error-resilient components [105]. Finally, Cong and Gururaj describe a technique for automatically distinguishing between critical and non-critical instructions for the purpose of selective fault tolerance [30].
This category of proto-approximate-computing work focuses on analyzing applications to measure their resilience to error. These papers typically assume a particular model of error—often hardware-inspired, such as random bit flips in memory—and execute programs under simulation, measuring crashes and output-quality degradation. To measure output quality, these studies typically define a straightforward metric for each application, such as PSNR for media outputs.
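For reference, PSNR, the most common such metric for image and video outputs, is just a logarithmic transform of mean squared error; the pixel values below are made up for illustration.

```python
import math

def psnr(reference, degraded, max_val=255.0):
    """Peak signal-to-noise ratio: 10 * log10(MAX^2 / MSE)."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, degraded)) / len(reference)
    return float("inf") if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)

ref = [100, 120, 130, 90]
deg = [101, 118, 133, 90]        # e.g., after injecting bit flips
print(psnr(ref, deg))            # higher is better
```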
Summarize the papers in the next paragraph.
Three papers by Li and Yeung in 2006–08 [69–71], along with several others [26, 48, 81, 106, 130], still need summaries here. A 2009 study in SELSE [31] precedes the authors' later work on software-directed fault recovery [32].
One category of studies focuses on specific application domains. Wong and Horowitz identify resilience specifically in probabilistic-inference applications [146]. Other studies address video applications [40] and physical simulation for animation [150]. Still others have focused on integrated circuit designs rather than software applications [14, 28].
LLFI is a tool based on LLVM for performing this kind of simulation by injecting errors at the compiler-IR level [131].
Some of these studies conclude that there is a useful distinction between critical and non-critical program points, typically instructions [48, 130, 131]. This conclusion is borne out in later work on systems that exploit this distinction [74, 111].
Recent work in security has exploited patterns in these variability-based errors in DRAM to deanonymize users [98].