Sizing Custom Parallel Caches

On Wednesday, my PhD student Felix Winterstein will present his paper Custom-Sized Caches in Application-Specific Memory Hierarchies at FPT 2015 in New Zealand. This is joint work with MIT and Intel, specifically the team who developed LEAP.

Imagine that your FPGA-based algorithm needs to access data. Lots of data. What do you do? Store that data in SDRAM and spend ages carefully thinking about how to make efficient use of scarce on-chip memory to get the best performance out of your design. What if that whole process could be automated?

Felix has been working for a while on a method to do just that. He’s been able to produce some incredibly exciting work, making use of the latest developments in software verification to devise an automatic flow that analyses heap manipulating software and how it should be efficiently implemented in hardware. At FPGA in February this year, we were able to show how our analysis could automatically design a parallel caching scheme, and know – prove, even! – when costly coherence protocols could be avoided. The remaining part of the puzzle was to automatically know – in a custom parallel caching scheme – how much on-chip memory you should allocate to which parallel cache. This is the last piece of the puzzle we will present on Wednesday.

 

Where High Performance Reconfigurable Computing Meets Embedded Systems

I have just returned from attending HiPEAC 2015 in Amsterdam. As part of the proceedings, I was asked by my long-time colleague Juergen Becker to participate in a panel debate on this topic.

Panels are always good fun, as they give one a chance to make fairly provocative statements that would be out of place in a peer review publication, so I took up the opportunity.

In my view, there are really two key areas where reconfigurable computing solutions provide an inherent advantage over most high performance computational alternatives:

  • In embedded systems, notions of correct or efficient behaviour are often defined at the application level. For example, in my work with my colleagues on control system design for aircraft, the most important notion of incorrect behaviour is would this control system cause the aircraft to fall out of the sky? An important metric of efficient behaviour might be how much fuel does the aircraft consume? These high level specifications, which incorporate the physical world (or models thereof) in its interaction with the computational process, allow a huge scope for novelty in computer architecture, and FPGAs are the ideal playground for this novelty.
  • In real time embedded systems, it is often important to know exactly how long a computation will take in advance. Designs implemented using FPGA technology often provides such guarantees – down to the cycle – where others fail. There is, however, potentially a tension between some high level design methodologies being proposed and the certainty of the timing of the resulting architecture.

Despite my best attempts to stir up controversy, there seemed to be very few dissenters to this view from other members of the panel, combined with a general feeling that the world is no longer one of “embedded” versus “general purpose” computing. If indeed we can draw such divisions, they are now between “embedded / local” and “datacentre / cloud”, though power and energy concerns dominate the design process in both places.