On Monday, my PhD student Erwei Wang will present our work (joint also with James Davis and Peter Cheung) called *LUTNet: Rethinking Inference in FPGA Soft Logic* at the IEEE International Symposium on Field-Programmable Custom Computing Machines in San Diego, California.

In this paper, we take a very unusual approach to the design of a deep neural network accelerator in hardware: for us, *the nodes in the neural network are Boolean lookup tables*.

We were motivated initially by the fact that in very low precision FPGA neural network architectures, lookup tables are often used for arithmetic, but they are often used for very specific functions: while a -LUT is capable of implementing *any* nonlinear Boolean function with inputs, it ends up getting used for only a tiny fraction of these functions. A good example is binarised neural networks (BNNs) such as FINN, where LUTs end up being used to implement XNOR gates (multiplication over ) and popcount functions. Our research question is therefore: rather than restricting ourselves to these functions, *can we make better use of the LUTs by embracing the nonlinearity and the -input support they give us*?

We show that this is indeed possible. Our basic approach is to start with a weight-binarised neural network, add inputs to each node to bring them up to support, and then retrain the Boolean function implemented by that node. Retraining Boolean functions is a bit tricky, of course, because neural network training algorithms are not designed for this purpose. We generate a smooth interpolating function over the LUT entries, allowing us to use standard neural network training software (we use TensorFlow).

The end result is that the re-trained neural network is far more prunable than the original, because the extra inputs to the -LUTs compensate for the removal of other nodes. Thus we end up with a much sparser neural network for the same classification accuracy. The sparsity improves our area by a factor of two or more, yet the more complex inference functions at each node are effectively provided “for free” by the FPGA architecture.

Circuit netlist? Neural network? Same thing!