This week, my PhD student Felix Winterstein presented our work, joint with Intel and MIT, at the FPGA 2015 conference. The work shows how to customise on-chip memory systems to support parallel applications.

If you are working with FPGAs, you have huge freedom to design an arbitrary on-chip memory system to support your application. High-level synthesis (HLS) tools are getting quite good at building such systems for regular, array-based code. But pointer-manipulating programs, which build, operate on, and destroy data structures on the fly, will not even pass through commercial HLS tools, let alone produce good-quality results.

We’ve shown that this issue can be addressed using a tool we’ve developed on the theoretical foundation of Separation Logic. The tool automatically determines when functional units can have private on-chip caches and when they need shared, coherent caches. In the shared case, it also determines when commands can be reordered to aid parallelisation. It uses this analysis to automatically produce C source code, with the hints and pragmas needed to parallelise it, that will pass through commercial HLS tools, and it automatically generates the on-chip caches that support the resulting parallel datapath.
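To make the private-versus-shared distinction concrete, here is a minimal, illustrative C sketch of the kind of pointer-manipulating code in question. It is not our tool’s input format or output, and every name in it (`node`, `build_list`, `sum_list`, `free_list`, `two_workers`) is hypothetical. Two "workers" each build, traverse, and destroy their own linked list on the heap. Because the two lists occupy disjoint heap regions (in Separation Logic terms, the heap splits as `list(a) * list(b)`), neither worker can observe the other’s memory accesses, so each could in principle be given a private on-chip cache without any coherence traffic.

```c
#include <stdlib.h>

/* Hypothetical singly linked list node, for illustration only. */
typedef struct node {
    int value;
    struct node *next;
} node;

/* Build a list of n nodes holding the values 0..n-1 (prepended, so the head holds n-1). */
static node *build_list(int n) {
    node *head = NULL;
    for (int i = 0; i < n; i++) {
        node *p = malloc(sizeof *p);
        if (p == NULL) abort();
        p->value = i;
        p->next = head;
        head = p;
    }
    return head;
}

/* Traverse a list and sum its values. */
static long sum_list(const node *p) {
    long s = 0;
    for (; p != NULL; p = p->next) s += p->value;
    return s;
}

/* Destroy a list, freeing every node. */
static void free_list(node *p) {
    while (p != NULL) {
        node *next = p->next;
        free(p);
        p = next;
    }
}

/* Two independent "workers", each owning a disjoint heap region.
   Worker A only touches list a and worker B only touches list b, so an
   analysis that proves this disjointness could assign each a private cache. */
long two_workers(int n_a, int n_b) {
    node *a = build_list(n_a);   /* heap region owned by worker A */
    node *b = build_list(n_b);   /* heap region owned by worker B */
    long sa = sum_list(a);       /* A reads only its own nodes */
    long sb = sum_list(b);       /* B reads only its own nodes */
    free_list(a);
    free_list(b);
    return sa + sb;
}
```

If, instead, both workers traversed and updated the same list, their heap footprints would overlap, and they would need a shared, coherent cache; that is the case in which reordering commands becomes the lever for extracting parallelism.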

I’m very excited about this work: I view it as a big step towards efficient, automatic hardware implementation of full-featured C code.