On Tuesday, my PhD student Junyi Liu presented our work (joint with John Wickerson) on Loop Splitting for Efficient Pipelining in High-Level Synthesis to the assembled audience at the IEEE International Symposium on Field-Programmable Custom Computing Machines in Washington DC.
A primary way in which FPGA applications tend to get their blindingly fast performance is through overlapping loop iterations in time – known variously as loop pipelining or software pipelining. These days, you can expect high-level synthesis tools to do this for you. Sometimes.
Unfortunately, there are cases where the tools can’t squeeze out the performance. This paper addresses two such cases in a unified framework. The first is the case where some pesky loop iterations get in the way. Consider this trivial example:
for( int i=0; i<N; i++ ) A[2*i] = A[i] + 0.5f;
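In this case the early loop iterations are the problematic ones because A[2] depends on A[1], A[4] on A[2], etc. For even i, iteration i reads the value written by iteration i/2, so the dependence distance is only i/2 iterations: short at the start of the loop and growing as it proceeds. These tight early dependences hinder pipelining, leading existing HLS tools to throw in the towel.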
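To make the idea concrete, here is a minimal hand-written sketch of splitting this loop in C. It is only an illustration, not the transformation the paper derives automatically: the function names and the `depth` parameter (a stand-in for the minimum dependence distance the pipeline can tolerate) are assumptions of mine.

```c
#include <assert.h>

/* The loop from the text, unchanged. A must hold at least 2*n elements. */
void update_reference(float *A, int n) {
    for (int i = 0; i < n; i++)
        A[2 * i] = A[i] + 0.5f;
}

/* Hand-split version of the same loop.
   `depth` is a hypothetical parameter: the smallest read-after-write
   distance (in iterations) that the pipelined loop can tolerate. */
void update_split(float *A, int n, int depth) {
    assert(depth >= 0);

    /* For even i, iteration i reads A[i], written by iteration i/2,
       so the dependence distance is i/2. It reaches `depth` once
       i >= 2*depth. */
    int split = 2 * depth;
    if (split > n) split = n;

    /* Early iterations: dependences are too tight, run them as-is. */
    for (int i = 0; i < split; i++)
        A[2 * i] = A[i] + 0.5f;

    /* Remaining iterations: every dependence spans at least `depth`
       iterations, so a tool-specific pipelining directive could be
       attached to this loop. */
    for (int i = split; i < n; i++)
        A[2 * i] = A[i] + 0.5f;
}
```

Because the two loops visit the same iterations in the same order, `update_split` computes exactly what `update_reference` does; the split only exposes a second loop whose dependences are all far enough apart to pipeline.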
The second case is where there are loop-invariant parameters whose values are not known until the loop executes. Consider the case:
for( int i=0; i<N; i++ ) A[i+m] = A[i] + 0.5f;
Without knowing the value of m at compile time, the dependence structure is unknown – we might have no read-after-write dependences, tight read-after-write dependences, or the dependences might be so many iterations away that we just don’t care and can pipeline away to our heart’s content. A limited version of this latter issue was addressed in Junyi’s earlier paper, the predecessor of this work.
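The three situations can be told apart with a run-time test on m. The sketch below uses plain loop versioning, a simpler stand-in for the parametric analysis in the paper; the function names and the `depth` parameter (the minimum dependence distance the pipeline tolerates) are hypothetical, and it assumes write-after-read hazards are not a concern for the pipelined loop.

```c
#include <assert.h>

/* Hypothetical helper: nonzero when the loop
     for (i = 0; i < n; i++) A[i+m] = A[i] + 0.5f;
   has no read-after-write dependence closer than `depth` iterations. */
int shift_is_pipeline_safe(int m, int n, int depth) {
    /* m <= 0 : a written element is never read by a later iteration.
       m >= n : written elements lie beyond everything the loop reads.
       m >= depth : the dependence distance is exactly m, which is
                    long enough for the pipeline. */
    return m <= 0 || m >= n || m >= depth;
}

/* Loop versioning on the run-time value of m. The caller must ensure
   that A[i] and A[i+m] are valid for every 0 <= i < n. */
void shift_update(float *A, int n, int m, int depth) {
    assert(n >= 0 && depth >= 1);

    if (shift_is_pipeline_safe(m, n, depth)) {
        /* Fast version: a tool-specific pipelining directive could be
           attached here, since no tight dependence exists. */
        for (int i = 0; i < n; i++)
            A[i + m] = A[i] + 0.5f;
    } else {
        /* Conservative version: 0 < m < depth, so iterations must be
           kept far enough apart to respect the tight dependence. */
        for (int i = 0; i < n; i++)
            A[i + m] = A[i] + 0.5f;
    }
}
```

In software the two branches are identical; the point is that a synthesis tool could implement them differently once the test on m has separated the safe case from the tight one.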
In the new FCCM 2016 paper, we show that both these cases can be analysed using a parametric polyhedral framework, and that we can automatically derive source-to-source transformations to significantly accelerate the loops in these cases. The end result? A push-button approach that could gain you a factor of more than 4x in performance if your pipelining is being stymied by pesky dependences.