Iterating Exactly

I’m very excited to share that my PhD student, He Li, will tomorrow be presenting his paper ARCHITECT: Arbitrary-precision Constant-hardware Iterative Compute at the IEEE International Conference on Field-Programmable Technology 2017 (joint work with James Davis and John Wickerson).

Anyone who has done any numerical computation will sooner or later encounter a loop like this:

while (P(x))
  x = f(x);

Here, P(x) denotes a predicate determining when the loop will exit, f is a function transforming the state of the loop at each iteration, and x is, critically, a vector of real numbers. Such loops crop up everywhere in numerical computation, for example in the Jacobi method or the conjugate gradient method.
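As a concrete illustration, and purely my own example rather than anything from the paper, here is one such loop in C: a Jacobi iteration for a small 2×2 linear system, where the loop guard plays the role of P(x) and the update plays the role of f(x).

#include <math.h>
#include <stdio.h>

/* Illustrative sketch: Jacobi iteration for the 2x2 system
 *     4u +  v =  9
 *     2u + 5v = 12
 * The guard is P(x) (residual above tolerance); the body is f(x). */
int main(void) {
    double u = 0.0, v = 0.0, tol = 1e-10;
    while (fabs(4*u + v - 9) + fabs(2*u + 5*v - 12) > tol) { /* P(x) */
        double u_new = (9.0 - v) / 4.0;                      /* f(x) */
        double v_new = (12.0 - 2.0*u) / 5.0;
        u = u_new;
        v = v_new;
    }
    printf("u = %f, v = %f\n", u, v); /* converges to (11/6, 5/3) */
    return 0;
}

Run the same loop with a tolerance below what single precision can resolve and the residual may never dip beneath it, which is exactly the dilemma described below.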

How do people tend to implement such loops? They approximate them, using a finite-precision number system such as floating point in place of the reals.

OK, let’s say you’ve done your implementation. You run for 1000 iterations and still the loop hasn’t quit. Is that because you need to run for a few more iterations? Or is it because you computed in single precision instead of double precision? (Or double instead of quad, etc.) Do you have to throw away all your computation, go back to the first iteration, and try again in a higher precision? Often we just don’t know.

He Li’s paper solves this problem. As time progresses, we increase both the iteration count and the accuracy to which a given iterate is known, snaking through the two-dimensional iteration/precision space: two countably infinite dimensions are linearised into the single countably infinite dimension of time (clock cycles) using a classic zig-zag trick due to Cantor.

[Figure: linearising iteration (vertical) and precision (horizontal) into time (arrows)]
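To make the traversal concrete, here is a minimal sketch in C of a Cantor-style zig-zag walk over the (iteration, precision) grid. It is my own illustration of the enumeration, not the paper’s exact hardware schedule.

#include <stdio.h>

/* Enumerate (iteration, precision) pairs along anti-diagonals,
 * Cantor-style, so one time index covers both infinite dimensions. */
int main(void) {
    int steps = 15; /* clock cycles to simulate */
    int t = 0;
    for (int d = 0; t < steps; d++) {          /* d indexes a diagonal */
        for (int i = 0; i <= d && t < steps; i++, t++) {
            int iteration = i;     /* which iterate x_k to work on     */
            int precision = d - i; /* how far that iterate is refined  */
            printf("cycle %2d: iterate %d at precision level %d\n",
                   t, iteration, precision);
        }
    }
    return 0;
}

Because every pair lies on some diagonal, every iterate is eventually computed to every precision.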

This is the essence of our contribution.

To make it work in practice, efficiently in hardware, requires some tricks. First, we need to support arbitrary-precision arithmetic on finite computational hardware: only memory should grow with precision, not compute logic. Second, we need to compute from the most-significant to the least-significant digit, iteratively refining our result as we proceed. Standard binary arithmetic cannot do this naturally, because carries propagate from the least-significant digit upwards; redundant number representations break that carry chain, and we make use of online arithmetic, which builds on them, to enable this most-significant-digit-first mode of computation.
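For a flavour of most-significant-digit-first computation, here is a small self-contained C example of my own (far simpler than the paper’s online operators): it emits signed binary digits in {-1, 0, 1} for a value x in [-1, 1], most significant first, with each digit tightening the approximation.

#include <stdio.h>

/* Emit signed binary digits d_i in {-1,0,1} of x in [-1,1], MSD first,
 * so that x = sum_i d_i * 2^(-i). The residual r always stays in
 * [-1,1], so after digit i the error is bounded by 2^(-i). */
int main(void) {
    double x = 0.3, r = x, approx = 0.0, weight = 0.5;
    for (int i = 1; i <= 10; i++) {
        int d = (2*r >= 0.5) ? 1 : (2*r <= -0.5) ? -1 : 0;
        r = 2*r - d;          /* refine the residual        */
        approx += d * weight; /* MSD-first running estimate */
        weight /= 2;
        printf("digit %2d = %+d, approximation = %f\n", i, d, approx);
    }
    return 0;
}

The redundancy in the digit set is what allows each digit to be fixed before the less-significant ones are known; this is the property online arithmetic exploits in hardware.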

So now you don’t need to worry – rounding error will not stop you getting your answer. There’s an FPGA design for that.

Passing Data Structures to FPGAs

Next week, my former PhD student and postdoctoral researcher, Felix Winterstein, will present our paper Pass a Pointer: Exploring Shared Virtual Memory Abstractions in OpenCL Tools for FPGAs at the IEEE International Conference on Field-Programmable Technology in Melbourne, Australia.

Before launching his current startup, Xelera, Felix and I worked together on the problem of automating the production of custom memory systems for FPGA-based accelerators. I previously blogged about some highly novel work we’d done during his PhD on high-level synthesis for code manipulating complex data structures like trees and linked lists. Full detail can be found in the book version of his PhD thesis. All this work, as exciting as it is, was based on sequential C code as the input to a high-level synthesis tool.

Many readers of this blog will be aware that OpenCL is rapidly coming to be viewed as an alternative way to write correctness-portable code for FPGA development, with both Intel and Xilinx offering OpenCL flows based around OpenCL 1.X. However, OpenCL 2.0 offers a number of interesting features around shared virtual memory, which could radically simplify programming at the cost of a significantly more complex compiler for FPGA-based computation. It is this issue we address in the paper Felix will present next week.
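To give a flavour of what shared virtual memory buys you, here is a minimal host-side sketch using the standard OpenCL 2.0 SVM API. It is my own illustration, not code from the paper or from Felix’s framework; it assumes an already-created context, queue and kernel, and a device supporting fine-grained buffer SVM.

#include <CL/cl.h>
#include <stddef.h>

/* A linked-list node living in shared virtual memory. */
typedef struct node { int value; struct node *next; } node_t;

/* Sketch: build a two-node list in SVM on the host and pass the head
 * pointer straight to the kernel; no buffer copies or marshalling. */
void pass_a_pointer(cl_context ctx, cl_command_queue q, cl_kernel k) {
    cl_svm_mem_flags flags = CL_MEM_READ_WRITE | CL_MEM_SVM_FINE_GRAIN_BUFFER;
    node_t *head = clSVMAlloc(ctx, flags, sizeof(node_t), 0);
    node_t *tail = clSVMAlloc(ctx, flags, sizeof(node_t), 0);

    head->value = 1; head->next = tail; /* raw pointers, shared with device */
    tail->value = 2; tail->next = NULL;

    clSetKernelArgSVMPointer(k, 0, head); /* literally pass a pointer */
    size_t one = 1;
    clEnqueueNDRangeKernel(q, k, 1, NULL, &one, NULL, 0, NULL, NULL);
    clFinish(q);

    clSVMFree(ctx, head);
    clSVMFree(ctx, tail);
}

Because host and device share one virtual address space, the kernel can chase head->next directly; a pointer-based structure like a tree or linked list never needs to be serialised into a flat buffer.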

There’s lots of exciting program analysis work that could be built on top of Felix’s framework, and I’m keen to explore this further; if a reader of this blog would like to collaborate in this direction, or would like to do a PhD in this field, feel free to get in touch.

Perhaps most importantly, Felix’s framework is open source – check it out at https://github.com/constantinides/FPGA-shared-mem and let us know if you use it!