California / FPGA 2026

This month I took a trip to California for the FPGA 2026 conference, together with my two new PhD students Ben Zhang and Bardia Zadeh, which I combined with a number of visits in the San Francisco Bay Area in the days preceding the conference. This post provides a brief summary of my visit.

First up on our travels was dinner with Rocco Salvia. Rocco was my Research Assistant – we worked together some years ago on automating the analysis of average-case numerical behaviour of reduced-precision floating-point computation. He is now working for Zoox, the robotaxi company owned by Amazon where his first-rate engineering skills are being put to good use!

The following day we went to visit Max Willsey at UC Berkeley and his PhD student Russel Arbore. Max and I (together with others) organised a Dagstuhl workshop on e-graphs recently, and we went to pick up the research conversations we left behind a month ago in Germany and spend some good quality whiteboard time together. Russel and Max are working on some really exciting problems in program analysis.

That afternoon, we had the chance to catch up with my old friend and colleague Satnam Singh, now working for the startup harmonic.fun. Harmonic is a really exciting company, combining modern AI tools with Lean-based formal theorem proving. Expect great things here.

George, Satnam, Bardia and Ben, enjoying coffee near the Harmonic office.

The following morning, we went to visit AMD, with whom I have longstanding collaborations. Amongst others, we met my two former PhD students Sam Bayliss and Erwei Wang there, and discussed our ongoing work on e-graphs and on efficient machine learning, as well as finding out the latest work in Sam’s team at AMD including their release of Triton-XDNA.

That afternoon we visited NVIDIA’s stunning HQ to meet with Rajarshi Roy and Atefeh Sohrabizadeh. I know both of them through the EPSRC International Centre-to-Centre grant I led: Rajarshi was introduced to me by Bryan Catanzaro as the author of some really interesting work on reinforcement learning for computer arithmetic design, and spoke at our EPSRC project’s annual workshop. Atefeh was a PhD student affiliated with the Centre (advised by Jason Cong, UCLA) and spent some time visiting my research group. We heard about the recent NVIDIA work on AI models to aid software engineering and of combined speech and language.

Rajarshi, Ben, Bardia, George and Atefeh at NVIDIA

It has long been a tradition that Peter Cheung, when in the Bay Area, organises a get-together of alumni of the Circuits and Systems Group (formerly Information Engineering group, when Bob Spence was Head of Group). This time was no exception – we met up with many of our department’s former students, and some came to a great dinner too. It’s always a delight to hear about the activity of our alumni, spread across the tech companies in the Bay Area.

After a flying visit to my former PhD student’s family, we then made it down to Monterey for FPGA 2026. Regular readers of this blog will know that I’ve been attending FPGA for more than 20 years, have been Program Chair, General Chair, Finance Chair and am now Steering Committee member of the conference. So it always feels a little like “coming home”. I also love Monterey – despite the touristy bits – and am a fan of Steinbeck‘s writing in which he immortalised Monterey with some of the best opening lines ever (of Cannery Row): “Cannery Row in Monterey in California is a poem, a stink, a grating noise, a quality of light, a habit, a nostalgia, a dream.”

This year, the general chair of FPGA 2026 was Jing Li, and the great programme was put together by Grace Zgheib.

My favourite paper at FPGA this year also won the best paper prize. Duc Hoang and colleagues identified that Kolmogorov-Arnold Networks are a natural fit to the LUT-based neural networks my group pioneered e.g. [1,2]. They form a really interesting design point, overcoming the exponential scaling of area with the product of precision and neuron fanin present in both my SOTA work with Marta Andronic and earlier work like Xilinx LogicNets, to produce a design that scales exponentially only in the precision. I very much enjoyed reading this paper and seeing it presented, and I think it opens up new areas of future work in this area.

Duc Hoang and his coauthor Aarush Gupta, receiving the best paper award from Grace and Jing

I also particularly enjoyed the work of Shun Katsumi, Emmet Murphy and Lana Josipović (ETH Zurich) on eager execution in elastic circuits. I previously collaborated with Lana on elastic circuits, and it’s great to see the latest work in this area and the use of formal verification tools to prove correctness of performance enhancements. I had a very nice discussion with Lana about possible ways to take this work further.

Rouzbeh Pirayadi, Ayatallah Elakhras, Mirjana Stojilović and Paolo Ienne (EPFL) had a really interesting paper on avoiding the overhead of load-store queues in dynamic high-level synthesis. (This paper was also the runner-up best paper).

From my own institution, Oliver Cosgrove, Ally Donaldson and John Wickerson had a great paper on fuzzing FPGA place and route tools, which has led a vendor to fix a bug they uncovered through their tool.

There were many other good papers, but just to mention a couple that I found particularly aligned to my own interests: EdgeSort on the design of line-rate streaming sorters and HACE on extracting CDFGs from RTL were both really interesting to hear presented.

It was great to be reunited with so many international colleagues and to provide my new students Bardia and Ben with the chance to begin their journey of integration into this welcoming community.

Me, John, Bardia, Oliver and Ben after the end of the final conference session

FCCM 2025

I’ve recently returned from the IEEE International Symposium on Field-Programmable Custom Computing Machines (known as FCCM). I used to attend FCCM regularly in the early 2000s, and while I have continued to publish there, I have not attended myself for some years. I tried a couple of years ago, but ended up isolated with COVID in Los Angeles. In contrast, I am pleased to report that the conference is in good health!

The conference kicked off on the the evening of the 4th May, with a panel discussion on the topic of “The Future of FCCMs Beyond Moore’s Law”, of which I was invited be be part, alongside industrial colleagues Chris Lavin and Madhura Purnaprajna from AMD, Martin Langhammer from Altera, and Mark Shand from Waymo. Many companies have tried and failed to produce lasting post-Moore alternatives to the FPGA and the microprocessor over the decades I’ve been in the field and some of these ideas and architectures (less commonly, associated compiler flows / design tools) have been very good. But, as Keynes said, “markets can remain irrational longer than you can remain solvent”. So instead of focusing on commercial realities, I tried to steer the panel discussion towards the genuinely fantastic opportunities our academic field has for a future in which power, performance and area innovation changes become a matter of intellectual advances in architecture and compiler technology rather than riding the wave of technology miniaturisation (itself, of course, the product of great advances by others).

The evening panel, as imagined by AI. I’m 2nd to left. The AI tool was clearly unaware of Martin’s height difference!

The following day, the conference proper kicked off. Some highlights for me from other authors included the following papers aligned with my general interests:

AutoNTT: Automatic Architecture Design and Exploration for Number Theoretic Transform Acceleration on FPGAs from Simon Fraser University, presented by Zhenman Fang.
RealProbe: An Automated and Lightweight Performance Profiler for In-FPGA Execution of High-Level Synthesis Designs from Georgia Tech, presented by Jiho Kim from Callie Hao‘s group.
High Throughput Matrix Transposition on HBM-Enabled FPGAs from the University of Southern California (Viktor Prasanna‘s group).
ITERA-LLM: Boosting Sub-8-Bit Large Language Model Inference Through Iterative Tensor Decomposition from my colleague Christos Bouganis‘ group at Imperial College, presented by Keran Zheng.
Guaranteed Yet Hard to Find: Uncovering FPGA Routing Convergence Paradox from Mirjana Stojilovic‘s group at EPFL – and winner of this year’s best paper prize!

In addition, my own group had two full papers at FCCM this year:

Banked Memories for Soft SIMT Processors, joint work between Martin Langhammer (Altera) and me, where Martin has been able to augment his ultra-high-frequency soft-processor with various useful memory structures. This is probably the last paper of Martin’s PhD – he’s done great work in both developing a super-efficient soft-processor and in forcing the FPGA community to recognise that some published clock frequency results are really quite poor and that people should spend a lot longer thinking about the physical aspects of their designs if they want to get high performance.
NeuraLUT-Assemble: Hardware-aware Assembling of Sub-Neural Networks for Efficient LUT Inference, joint work between my PhD student Marta Andronic and me. I think this is a landmark paper in terms of the results that Marta has been able to achieve. Compared to her earlier NeuraLUT work which I’ve blogged on previously, she has added a way to break down large LUTs into trees of smaller LUTs, and a hardware-aware way to learn sparsity patterns that work best, localising nonlinear interactions in these neural networks to within lookup tables. The impact of these changes on the area and delay of her designs is truly impressive.

Martin explaining efficient
memory structures for soft processors

Overall, it was well worth attending. Next year, Callie will be hosting FCCM in Atlanta.

FPGA & HPCA 2025

I recently returned from two back-to-back conferences, FPGA 2025 in Monterey, California and HPCA 2025 in Las Vegas, Nevada. In this blog post, I will summarise some of the things I found most interesting at these conferences.

Before I even got to the first conference, I was delighted to have the chance to meet in San Francisco with Cole Harry, who runs the new Imperial Global USA. They have an exciting plan of work to develop links between Imperial and academics, industrialists, alumni and VCs in the Bay Area. I would strongly recommend reaching out to Cole if you are based in the Bay Area and would like to get involved.

FPGA, the ACM International Symposium on FPGAs, is always a joy to attend. This year we had a great balance of industry and academia attending, as is often the case. The conference recently moved to introduce keynote talks. I’m on the fence about the value of keynotes at FPGA, but this year they were both exceptionally good. The first was from Steve Reinhardt (Senior Fellow, AMD) and the second was from my long-term colleague John Wawrzynek (UC Berkeley). It was very gratifying that both keynote speakers singled out our work on LUT-based machine learning, started by my PhD student Erwei Wang (now with AMD) with his 2019 paper LUTNet, as an example of where the field should be heading in the future. In Steve’s case, this was part of his overall summary of architectures for AI. In John’s case, this was part of his call to follow Carver Mead‘s advice to “listen to what the silicon is telling you!” John’s keynote was a wonderful trip down memory lane for me – he highlighted many times in the last 30 years or so where the FPGA community has been well ahead of the broader community in working through and adopting various technologies and ideas. It was great to be reminded of the papers I had seen presented – and got excited about – when I was a PhD student myself (1998-2001). John also gave a personal shout out to my PhD student Marta Andronic for the great work she is doing.

Session 1 of FPGA was on AI for FPGAs. The first paper, FlightVGM: Efficient Video Generation Model Inference with Online Sparsification and Hybrid Precision on FPGAs, was a collaboration between various Chinese universities. This paper won the best paper prize at the conference. I liked their unusual packing of DSP blocks for non-standard word-lengths. The second paper was from Alireza Khataei, the PhD student of my colleague and friend Kia Bazargan. They presented an intriguing approach to using decision trees as FPGA inference engines. The results were good, and have left me wondering how brittle they may be to linear transformations of the input space, given that DNN based work will be invariant to these transformations (modulo quantisation error) whereas the axis-alignment of these decision tree boundaries will not. The third paper was a collaboration with us at Imperial (and others) led by Olivia Weng from Ryan Kastner‘s group at UCSD. Olivia presented an empirical exploration of the ensembling of weak classifiers, including our LUT-based classifiers. The final presentation of this session was from our group, a short paper by Olly Cassidy, our undergraduate student, which I describe in an earlier blog post.

Session 2 of FPGA was on CAD. It began with my former PhD student David Boland presenting a collaboration he has undertaken with my other former PhD student Kan Shi and others on efficient (simulation-based) verification of High-level Synthesis (HLS) designs, using FPGA-based acceleration.

Session 3 of FPGA was on HLS. This session included some interesting work presented by Jinming Zhuang, from Brown (with the excellent Peipei Zhou) and Cornell (including my friend Zhiru Zhang) on MLIR for compilation targeting AMD’s AI engines, and also great work from Suhail Basalama (who used to be affiliated with my EPSRC SpatialML centre) and his advisor and my long-term colleague Jason Cong from UCLA. It was really nice to see the shared-buffer to FIFO conversion in this work.

Session 5 of FPGA was on architecture. Readers of this blog may remember the work on dynamic HLS started by Lana Josipović (now at ETH) when she was Paolo Ienne‘s PhD student at EPFL. Authors of the first paper, presented by Louis Coulon from EPFL asked the question of how one may wish to redesign FPGA architecture to better suit this model of HLS. I also liked the second talk, a collaboration between several universities, presenting a paper on incorporating $n$ -out-of- $k$ element sparsity in tensor processing tiles.

Session 6 of FPGA was also on High-level Synthesis. Notable contributions included a presentation from Stéphane Pouget (UCLA) on work with Louis-Noël Pouchet (Colorado State) and Jason Cong that proposed a MINLP to combine pragma insertion with loop transformations; back in 2009, I began to look at nonlinear programming for HLS transformations with my former student Qiang Liu – the paper at FPGA this year explored a really interesting practical design space for modern HLS. My former PhD student David Boland again presented in this session, this time presenting a collaboration between two of my other former PhD students who could not make the conference: Jianyi Cheng and Kan Shi (the latter mentioned above) and others, this time on verification of dynamically-scheduled high-level synthesis. The third talk on this session was presented by Robert Szafarczyk and coauthors from the University of Glasgow, looking at dynamic loop fusion in high-level synthesis based on an interesting monotonicity program analysis; dynamic transformations hold out a lot of promise – I began to look at this with my student Junyi Liu in 2015 – the paper at FPGA this year provides an interesting and different new direction.

HPCA in Las Vegas was a new conference to me. I was prompted to attend due to a collaboration I had with Mohamed Abdelfattah‘s group at Cornell Tech, under the auspices of my EPSRC Centre-to-Centre grant. This collaboration led to my PhD student Marta Andronic spending some months embedded in Mohamed’s group at Cornell Tech, and out of this grew a paper presented at HPCA this year by Yuzong Chen, Mohamed’s PhD student. Yuzong presented both a memory-efficient encoding and a corresponding accelerator architecture for LLM acceleration.

HPCA was co-located with PPoP and CGO, and they shared keynote sessions. Charles Leiserson – a household name in computing – gave the first keynote, associated with PPoP. The presentational style was uniquely engaging. The message was simple: the end of Moore’s Law demands more focus on performance engineering. It’s a message that’s not new, but was wonderfully delivered.

The second keynote, associated with CGO, was given by John Regehr. This was also excellent, spanning work (of his group and others) on compiler correctness and fuzzing, formalisation of compilers, alive2, souper, and bit-width independent rewrites. The latter topic is one that John’s group and mine have communicated over in the past, as it arose in the context of my PhD student Sam Coward‘s work, where we would ideally have liked bit-width independent rewrites, but settled for proving rewrites correct for reasonable bit-width ranges. John’s talk emphasised the social and economic foundations of impact in the compiler world.

The final keynote, associated with HPCA, was given by Cliff Young from Google. This talk was truly outstanding in both content and delivery. He started with a good summary of ML algorithms from an architect’s perspective. He spoke about TPUs and systems built out of TPUs at Google. Perhaps more significant, from my perspective, than the (great) technical content was the non-technical content of his talk. Cliff spoke about how the academic community is key to the long-term health of the field, and how even at Google it is literally only a handful of people who have the ability to think as long-term as academics, as people are just too busy building things. He emphasised the need for major algorithmic developments: “modern ML is not efficient, it is only effective“, was his slogan, alongside the tongue-in-cheek “I very much doubt that the transformer is all we need”. He reflected on the fallow periods in his career, and emphasised that they were absolutely necessary to enable the productive periods – a lesson that could be well learnt by research assessment processes across the world: “the only thing you can actually optimise in your career is, ‘are you enjoying yourself and learning?'” – a great manifesto. He spoke about his own imposter syndrome and about the social anxiety prevalent amongst some of the most impressive international researchers – he also had a mitigation: working together across discipline boundaries allows people to be ‘the respected expert’ in their area without the internalised expectation that you must know everything, providing an element of psychological safety. He spoke about his preference for simplicity over building complex things (something I very much share). And amusingly shared “the best piece of professional advice I was ever given”, which turned out to be “Wow, if you’d only shut the fuck up a bit, you’d be 1000x better!” This lecture was a joy to listen to.

In addition to meeting new and old friends over the past couple of weeks, it was a wonderful to meet students of former students. This year, I got to meet Muhammad Ali Farooq, who has just started off on a PhD programme with Aman Arora, but before that was the student of my former PhD student Abid Rafique at NUST.

The least joyful part of my trip was Las Vegas – a city that seems to have been designed to induce sensory overload. But no matter: the conferences definitely made up for it. And the highlight of my trip was most definitely the weekend between the two conferences where I got to spend time with the lovely family of my former PhD student Sam Bayliss in the Bay Area.