On Wednesday, my PhD student Felix Winterstein will present his paper, "Custom-Sized Caches in Application-Specific Memory Hierarchies", at FPT 2015 in New Zealand. This is joint work with MIT and Intel, specifically the team who developed LEAP.
Imagine that your FPGA-based algorithm needs to access data. Lots of data. What do you do? Store that data in SDRAM and spend ages carefully thinking about how to make efficient use of scarce on-chip memory to get the best performance out of your design. What if that whole process could be automated?
Felix has been working for a while on a method to do just that. He’s been able to produce some incredibly exciting work, making use of the latest developments in software verification to devise an automatic flow that analyses heap-manipulating software and determines how to implement it efficiently in hardware. At FPGA in February this year, we showed how our analysis could automatically design a parallel caching scheme, and know – prove, even! – when costly coherence protocols could be avoided. What remained was to determine automatically, within a custom parallel caching scheme, how much on-chip memory to allocate to each parallel cache. This is the last piece of the puzzle, and it is what we will present on Wednesday.