Most modern FPGA arithmetic designs use bit-parallel binary arithmetic – typically two’s complement for signed computation. This generally makes for fast arithmetic, but has the distinct disadvantage that silicon area scales with precision of computation. Occasionally – much more often in the distant past when FPGA area was a precious resource – people compute with bit serial binary arithmetic. In this case, time scales with precision.

One problem with digit serial binary arithmetic for multiplication and division is that you need to know, in advance, the location of your least significant digit: while precision unfolds as time, precision is fixed a priori.

Milos Ercegovac‘s online arithmetic forms an interesting counterpoint to this: in this arithmetic, digits are produced most-significant-digit-first. This suggests that the longer we compute, the more precise our answer will be – and we can terminate whenever we’ve got a good enough answer. Or run out of time. The problem with this approach is that the hardware arithmetic units generally implemented for digit-serial multiplication or division still scale with precision requirements, placing an unwanted a priori bound on computational precision.

Enter my former BEng project student Aaron Zhao, who presented our paper “An Efficient Implementation of Online Arithmetic” at the IEEE International Conference on Field-Programmable Technology in Xi’an, China. Aaron’s contribution – for his BEng final year project – was to design a library of online arithmetic units whose logic area is constant with precision (of course, they still need more RAM for more precision.) This opens up a lot of practical possibilities for our research.

Precision (and energy) can scale elastically with time in FPGA-based compute.

From a software engineer’s pov (game developer), something that scares me about this is how hard it would be to debug when problems come up and to make sure I’m getting adequate test coverage over all possibilities. Imagine this is on a GPU where I have some sort of temporal coherence from frame to frame. If some of the frames data comes in at different precisions, and i mix it into that buffer, i’m going to have different results, possibly even generating NaN’s and Inf’s that only reproduce when power / precision is limited in specific ways on specific frames.

Besides that, this is a really neat idea. If that can be protected against (either in software or through some other means), i’d hop on board this train (:

One use case where this issue disappears is where you’re not after the use result itself, but rather a predicate that depends on the result. Then you could compute for long enough to guarantee a correct predicate evaluation, which is much easier to debug than current standard practice of basing the predicate on approximate numerical values, often with no guarantee of accuracy.

In the wider context, debugging numerical software (or hardware) is a very nontrivial activity, of course. No matter what finite precision representation is used. It’s one of the reasons I’m working on formal methods for this, e.g. https://constantinides.net/2016/11/08/rounding-error-and-algebraic-geometry/

From a software engineer’s pov (game developer), something that scares me about this is how hard it would be to debug when problems come up and to make sure I’m getting adequate test coverage over all possibilities. Imagine this is on a GPU where I have some sort of temporal coherence from frame to frame. If some of the frames data comes in at different precisions, and i mix it into that buffer, i’m going to have different results, possibly even generating NaN’s and Inf’s that only reproduce when power / precision is limited in specific ways on specific frames.

Besides that, this is a really neat idea. If that can be protected against (either in software or through some other means), i’d hop on board this train (:

LikeLike

One use case where this issue disappears is where you’re not after the use result itself, but rather a predicate that depends on the result. Then you could compute for long enough to guarantee a correct predicate evaluation, which is much easier to debug than current standard practice of basing the predicate on approximate numerical values, often with no guarantee of accuracy.

In the wider context, debugging numerical software (or hardware) is a very nontrivial activity, of course. No matter what finite precision representation is used. It’s one of the reasons I’m working on formal methods for this, e.g. https://constantinides.net/2016/11/08/rounding-error-and-algebraic-geometry/

LikeLike