Readers of this blog may remember that my PhD student Jianyi Cheng (jointly supervised by John Wickerson) has been working on high-level synthesis, combining dynamic scheduling with static scheduling. His latest contribution, to be presented on 28th February at the ACM FPGA conference, is about finding islands of static control flow in a sea of dynamic behaviour.
Here’s the story so far:
- In 2018, our collaborators Lana Josipović and Paolo Ienne published some interesting work on dynamically-scheduled HLS which piqued my interest at the time.
- In 2020, we worked with them to show that it’s possible to combine dynamic and static scheduling in HLS to achieve performance near that of the dynamic schedule and area near that of the static schedule. While we automated the production of such a combined circuit, what this work left unsolved was the question of how to identify the parts that would work best with static scheduling.
So now, two years later, we are back at the same conference to present a method to do just that. We now have an automated flow to select parts of a program to statically schedule, resulting in a 4x reduction in area combined with a 13% boost in performance compared to a fully dynamic circuit, a result that is close to the best achievable — as shown by exhaustively enumerating different parts of the program to schedule statically.
The basic idea of the paper is to develop the concept of a static island — a part of a dataflow graph where making decisions on scheduling of operations once, at compile time, is likely to have minimal impact on performance (or may even improve it) while opening the door to static resource sharing. We can throw a boundary around these islands, synthesise them efficiently with commercial HLS tools (we use Xilinx Vitis HLS), and integrate the result into the overall dynamic circuit using our previous open-source compiler flow.
So what makes a good static island? Unsurprisingly, these islands should exhibit static control flow or control flow with balanced path timing, e.g. in a conditional statement the if
and else
branch should take the same time, and loops should have constant dependence distances (or none at all). Jianyi also shows that there is an advantage to having these islands consume their inputs at offset-times, e.g. for a two-input island we may wish the static scheduler to be aware that second input is available — on average — two cycles after the first. He shows precisely how to generate ‘wrapper’ circuits for these components, allowing them to communicate with a dynamically scheduled environment.
The overall design flow, shown below, is now fully automated – freeing the user from writing the pragmas we required two years ago.
