DATE 2019: Some Highlights

This week, I attended the Design, Automation & Test in Europe (DATE) conference in Florence, Italy. DATE is a large conference, which I have attended irregularly since I was a PhD student. This year, the general chair was a long-standing colleague from the FPGA community, Jürgen Teich.

Readers can find a summary of some of the talks I found most interesting below.

On Tuesday, my colleague Martin Trefzer chaired a session on Computational and Resource-efficiency in Quantum and Approximate Computing. The work by Sekanina was interesting, using information about the data distribution to drive the construction of approximate circuits; the circuits are derived from the baseline design using a technique called Cartesian Genetic Programming (CGP). I have recently been collaborating with Ilaria Scarabottolo and others from Laura Pozzi’s group on a related problem – see our DAC 2019 paper (to appear) for details – so this was of particular interest.
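
For readers unfamiliar with CGP, here is a heavily stripped-down sketch of the flavour of the encoding – a linear, feed-forward genotype of gate nodes with point mutation, rather than the full grid-based representation, and certainly not the authors’ actual setup:

```python
import random

# Minimal CGP-flavoured sketch: a genotype is a list of nodes, each node
# being (gate, input_a, input_b), where inputs index either the primary
# inputs or an earlier node.  Toy encoding for illustration only.
GATES = {
    'AND': lambda a, b: a & b,
    'OR':  lambda a, b: a | b,
    'XOR': lambda a, b: a ^ b,
}

def evaluate(genotype, inputs, output_idx):
    """Evaluate one output of the encoded circuit on a single input vector."""
    values = list(inputs)                 # indices 0..n_in-1 are primary inputs
    for gate, a, b in genotype:
        values.append(GATES[gate](values[a], values[b]))
    return values[output_idx]

def mutate(genotype, n_inputs, rate=0.2):
    """Point mutation: occasionally re-draw a node's gate or connections,
    keeping the circuit feed-forward."""
    new = []
    for i, (gate, a, b) in enumerate(genotype):
        if random.random() < rate:
            gate = random.choice(list(GATES))
            a = random.randrange(n_inputs + i)
            b = random.randrange(n_inputs + i)
        new.append((gate, a, b))
    return new

# A 2-input circuit with two internal nodes; the output is node index 3,
# i.e. a AND (a XOR b).
genotype = [('XOR', 0, 1), ('AND', 0, 2)]
print([evaluate(genotype, (a, b), 3) for a in (0, 1) for b in (0, 1)])
```

A real CGP flow would typically wrap `mutate` in a (1+λ) evolutionary loop, scoring each candidate against the baseline circuit – in the approximate-computing setting, weighting errors by the data distribution.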

On Wednesday, I chaired a session, When Approximation Meets Dependability, together with my colleague Rishad Shafik. Ioannis Tsiokanos from Queen’s University Belfast presented an interesting approach that dynamically truncates precision in order to avoid timing violations. This is an interesting complement to the approach I developed with my former PhD student Kan Shi, where we first simply allowed timing violations to occur [FCCM 2013] and then redesigned the data representation around Ercegovac’s online arithmetic so that timing violations cause only low-magnitude errors [DAC 2014].

David Pellerin from Amazon gave an interesting keynote address that heavily emphasised Amazon’s F1 FPGA offering – which was, of course, music to my ears.

On Thursday morning, I attended the session Architectures for Emerging Machine Learning Techniques. Interestingly, one of the papers made use of Gustafson’s posits within hardware-accelerated deep learning, a technique the authors dub Deep Positron.

The highlight talk for me was Ed Lee’s Thursday keynote, A Fundamental Look at Models and Intelligence. Although I’ve been aware of Lee’s work, especially on Ptolemy, since I did my own PhD, I don’t think I’ve ever had the pleasure of hearing him lecture before. It was insightful and entertaining. A central theme of the talk was that models mean two different things to scientists and engineers: a scientist builds a model to correspond closely to a ‘thing’, whereas an engineer builds a ‘thing’ to correspond closely to a model. He used this dichotomy to illuminate some of the differences we see between neuroscience-inspired artificial intelligence and the kind of AI that is so popular at the moment, such as deep learning. Lee’s general-readership book Plato and the Nerd – which has been on my “to read” list since my colleague and friend Steve Neuendorffer mentioned it to me a few years ago – has just climbed several notches up that list!

On Thursday afternoon, I attended the session on The Art of Synthesizing Logic. My favourite talk in this session was from Heinz Reiner, who presented a collaboration between EPFL and UC Berkeley on Boolean rewriting for logic synthesis, in which exact synthesis methods are used to replace circuit cuts, rather than resorting to a pre-computed database of optimal function implementations. During the talk, Reiner also pointed the audience to an impressive-looking GitHub repo, featuring what looks like some very useful tools.

Friday is always workshop day at DATE, featuring a number of satellite workshops. I attended the workshop entitled Quo Vadis, Logic Synthesis?, organised by Tiziano Villa and Luca Carloni. This was a one-off workshop celebrating the 35th anniversary of the publication of the influential Espresso book on two-level logic minimisation.

Villa talked the audience through the history of logic synthesis, starting with the Quine-McCluskey method.

My favourite talk in this workshop was from Jordi Cortadella, who spoke about a method for synthesising Boolean relations. This is the problem of synthesising the cheapest implementation of a function f : {\mathbb B}^n \to {\mathbb B}^m chosen from a given relation R: viewing f as a relation f \subset {\mathbb B}^n \times {\mathbb B}^m, one is free to choose f – and its implementation – subject to the requirement that f \subseteq R, where the given R is not necessarily a function. This is a strict generalisation of the well-known problem of Boolean ‘don’t care’ conditions, a.k.a. incompletely specified functions. Cortadella presented a method that leverages the known approaches to this latter problem, exploring the semi-lattice of relations generated by \subseteq in a structured way using a form of branch-and-bound.
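
To make the problem concrete, here is a tiny, purely illustrative sketch – nothing like Cortadella’s actual algorithm: it represents a relation R as a set of permitted outputs per input, enumerates every compatible function by brute force, and picks the cheapest under a toy cost metric.

```python
from itertools import product

# A Boolean relation R over B^2 -> B^1: each input maps to a *set* of
# permitted outputs.  An incompletely specified function ("don't cares")
# is the special case where each set is a singleton or the full codomain.
R = {
    (0, 0): {(0,)},
    (0, 1): {(0,), (1,)},   # free choice here...
    (1, 0): {(1,)},
    (1, 1): {(0,), (1,)},   # ...and here
}

def cost(f):
    """Toy cost: number of 1s in the truth table -- a crude stand-in for
    the gate/literal count a real synthesis tool would minimise."""
    return sum(bit for out in f.values() for bit in out)

inputs = list(R)
best = None
# Enumerate every function f with f(x) in R(x).  This is exponential, so
# only sensible for tiny examples; the point of Cortadella's method is to
# explore the lattice of compatible specifications far more cleverly,
# with branch-and-bound pruning.
for outputs in product(*(R[x] for x in inputs)):
    f = dict(zip(inputs, outputs))
    if best is None or cost(f) < cost(best):
        best = f

print(best)   # here: map (1,0) to 1 and everything else to 0
```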

Mathias Soeken also presented a very interesting summary of three uses of SAT within logic synthesis: Schmitt’s ASP-DAC paper on SAT-based LUT mapping, Eén’s paper on using logic synthesis for efficient SAT (rather than SAT for efficient logic synthesis), and Haaswijk’s recent PhD on making exact logic synthesis more scalable by providing partial topological information – a topic that, interestingly, has some echoes in work I’m soon to present at the Royal Society.

The workshop was an enjoyable way to end DATE, and I was disappointed to have to leave half-way through – there may well have been other interesting talks presented in the afternoon.

Highlights of CSE 2019

Over the second half of this week, I’ve been attending the SIAM Computational Science and Engineering conference in Spokane, Washington – a short flight north (and a radical change in weather) from my earlier conference in California this week.

[Photo: Spokane, WA in February. Temperatures were as low as -12℃.]

This was my first SIAM conference. I was kindly invited by Pierre Blanchard, Nick Higham and Theo Mary to speak on the topic of floating-point error analysis. I very much enjoyed the sessions they organised, and indeed the CSE conference as a whole, which I hope to be able to attend more regularly from now on.

My own talk was entitled Approximate Arithmetic – A Hardware Perspective. I spoke about the rise of architecture specialisation as a driver for closer collaboration between computer architects and numerical analysts, about some of our work on automatic error bounds (Boland and Constantinides, 2011; Magron, Constantinides and Donaldson, 2017) and on code refactoring (Gao and Constantinides, 2015), as well as some of our most recent work on machine learning. (I will blog separately about this latter topic over the next couple of months.)

The CSE conference is very large – with 30-40 small parallel sessions happening at any given moment – so I cannot begin to summarise the conference. However, I include some notes below on other talks I found particularly interesting.

Plenary Sessions

I very much enjoyed the plenary presentation by Rachel Ward on Stochastic Gradient Descent (SGD) in Theory and Practice. She introduced the SGD method very nicely and looked at various assumptions for convergence. She took a particularly illuminating approach, applying SGD to the simple special case of solving a system of linear equations by minimising F(w) = \frac{1}{2}||Aw-b||^2 in the case where \exists w^*. Aw^* = b. She showed that if the system is under-determined, then SGD converges to the solution of minimum 2-norm, and therefore has an inherent regularising effect. I was surprised by some of the results on overparameterised neural networks, showing that SGD finds global minimisers and that there really doesn’t tend to be much overfitting despite the huge number of parameters, pointing to the implicit regularisation performed by the SGD algorithm itself. I learnt a lot from this talk, and have added several papers to my “to read” list as a result.
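
As a quick sanity check of the minimum-norm claim, the following NumPy sketch (my own, not from the talk) runs plain SGD with single-row sampling on a consistent, under-determined linear system, starting from zero, and compares the result with the pseudo-inverse (minimum 2-norm) solution:

```python
import numpy as np

rng = np.random.default_rng(0)

# Under-determined but consistent system: 5 equations, 20 unknowns.
m, n = 5, 20
A = rng.standard_normal((m, n))
w_true = rng.standard_normal(n)
b = A @ w_true                         # guarantees a solution exists

w = np.zeros(n)                        # start at the origin
lr = 0.01
for _ in range(200_000):
    i = rng.integers(m)                # sample one row, i.e. one 'data point'
    r = A[i] @ w - b[i]                # residual on that row
    w -= lr * r * A[i]                 # stochastic gradient of 0.5 * r**2

w_min_norm = np.linalg.pinv(A) @ b     # minimum 2-norm solution
print(np.linalg.norm(w - w_min_norm))  # tiny: SGD converged to it
print(np.linalg.norm(w), np.linalg.norm(w_true))   # ||w|| <= ||w_true||
```

Started from the origin, the iterates never leave the row space of A, which is why SGD lands on the minimum 2-norm solution – the implicit regularisation Ward described.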

There was also an interesting plenary from Anima Anandkumar on the role of tensors in machine learning. The mathematical structure of tensors and multi-linear algebra are topics I’ve not explored before – mainly because I’ve not seen the need to spend time on them. Anandkumar certainly provided me with motivation to do that!

Floating-Point Error Analysis

Theo Mary from the University of Manchester gave a very good presentation of his work with Nick Higham on probabilistic rounding error analysis, treating numerical roundoff errors as zero-mean independent random variables of arbitrary distribution and making use of Hoeffding’s inequality to produce a backward error analysis. Their work is described in more detail in their own blog post and – in more depth – in their very interesting paper. It’s a really exciting and useful direction, I think: modern applications place greater emphasis on average-case performance, and the combination of very large data sets with very low-precision computation renders many worst-case analyses meaningless. In a similar vein, Ilse Ipsen also presented a very interesting approach: a forward error analysis, more specialised in that she only looked at inner products, but also without the assumption of independence, making use of Azuma’s inequality. The paper on this topic has not yet been finished, but I certainly look forward to reading it in due course!
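
The flavour of the result is easy to observe numerically. The sketch below (my own illustration, not the authors’ code) sums a single-precision inner product term by term and compares the observed relative error with a worst-case-style n·u quantity and a √n·u quantity of the kind these probabilistic analyses deliver:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
u = float(np.finfo(np.float32).eps) / 2        # unit roundoff, single precision

x = rng.uniform(0.0, 1.0, n).astype(np.float32)
y = rng.uniform(0.0, 1.0, n).astype(np.float32)

prods = x * y                  # element-wise products, rounded to float32
s32 = np.float32(0.0)
for p in prods:                # recursive (left-to-right) summation in float32
    s32 = s32 + p

exact = float(np.sum(prods.astype(np.float64)))   # reference sum in double
rel_err = abs(float(s32) - exact) / abs(exact)

print(f"observed relative error         : {rel_err:.2e}")
print(f"worst-case flavour, n*u         : {n * u:.2e}")
print(f"probabilistic flavour, sqrt(n)*u: {np.sqrt(n) * u:.2e}")
```

On a typical run the observed error sits orders of magnitude below n·u and in the vicinity of √n·u – this is only a numerical illustration of that gap, of course, not the Hoeffding-based analysis itself.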

Reducing Communication Costs

There were a number of interesting talks on mitigating communication costs. Researchers from Lawrence Livermore National Laboratory presented several papers relating to ZFP, the (lossily) compressed floating-point vector format they have recently proposed, at a mini-symposium organised by Alyson Fox, Jeffrey Hittinger, and James Diffenderfer. Diffenderfer’s talk developed a bound on the norm-wise relative error of vectors reconstructed from ZFP; Alyson Fox’s talk then extended this to the setting of iterative methods, noting as future work their interest in probabilistic analyses.

In the same session, Nick Higham gave a crystal-clear and well-motivated talk on his recent work with Srikara Pranesh and Mawussi Zounon – slides and paper are available. This work extends the applicability of Nick’s earlier work with Erin Carson to cases that, without the scaling technique developed and analysed here, would have over- or under-flowed or led to subnormal numbers. They use matrix equilibration – this reminded me of some work I did with my former PhD student Juan Jerez and colleague Eric Kerrigan, though in our case for a different algorithmic kernel and targeting fixed-point arithmetic, where making use of the full dynamic range is particularly important. The Higham, Pranesh and Zounon results are both interesting and practically very useful.
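
For readers who have not met equilibration before, the sketch below (a generic, illustrative Ruiz-style iteration, not the algorithm from the paper) applies two-sided diagonal scaling so that every row and column of the scaled matrix has infinity-norm close to one – which is what allows a matrix with a wide dynamic range to be rounded to half precision without overflow:

```python
import numpy as np

def equilibrate(A, iters=5):
    """Two-sided diagonal scaling (a simple Ruiz-style iteration).
    Returns R, S and B = R @ A @ S, with the rows and columns of B
    having infinity-norm close to 1.  Illustrative only."""
    m, n = A.shape
    R, S, B = np.eye(m), np.eye(n), A.copy()
    for _ in range(iters):
        r = 1.0 / np.sqrt(np.abs(B).max(axis=1))   # row scalings
        c = 1.0 / np.sqrt(np.abs(B).max(axis=0))   # column scalings
        B = np.diag(r) @ B @ np.diag(c)
        R = np.diag(r) @ R
        S = S @ np.diag(c)
    return R, S, B

rng = np.random.default_rng(0)
# Entries spanning ~16 orders of magnitude: rounding A directly to fp16
# would overflow (fp16 max is ~6.5e4) and underflow.
A = rng.standard_normal((4, 4)) * np.logspace(-8, 8, 16).reshape(4, 4)

R, S, B = equilibrate(A)
print(np.abs(B).max(axis=1), np.abs(B).max(axis=0))  # all close to 1
B16 = B.astype(np.float16)     # now safe to squeeze into half precision
# To use B16 in place of A, apply the diagonal scalings to the right-hand
# side and solution, since B = R @ A @ S.
```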

In a different session, Hartwig Anzt spoke about the work he and others have been doing to explicitly decouple storage precision from compute precision in sparse linear algebra. The idea is simple but effective: take the high-order bits of the mantissa (together with the sign and exponent) and store them in one chunk of data, and store the low-order bits separately in another chunk. Perform all arithmetic in high precision (because it’s not the computation that’s the bottleneck), converting the low-precision stored data to high precision on the fly at load time (e.g. by padding the missing low-order bits with zeros). Then, at run-time, decide whether to load the full-precision data or only the low-precision chunk, based on current estimates of convergence. This approach could also make a good case study application for the run-time adaptation methodology we developed with the University of Southampton in the PRiME project.
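
A minimal sketch of the storage idea as I understood it (my own illustration of the concept, not Anzt’s implementation): split each fp64 word into a high 32-bit chunk (sign, exponent and the leading mantissa bits) and a low 32-bit chunk, and on a ‘low-precision’ load simply zero-pad the missing tail:

```python
import numpy as np

def split(x):
    """Split fp64 values into high and low 32-bit storage chunks."""
    bits = x.view(np.uint64)
    head = (bits >> 32).astype(np.uint32)          # sign, exponent, top 20 mantissa bits
    tail = (bits & 0xFFFFFFFF).astype(np.uint32)   # low-order 32 mantissa bits
    return head, tail

def load(head, tail=None):
    """Reassemble to fp64; if the tail is not loaded, pad it with zeros."""
    hi = head.astype(np.uint64) << 32
    lo = tail.astype(np.uint64) if tail is not None else np.uint64(0)
    return (hi | lo).view(np.float64)

x = np.array([np.pi, 1.0 / 3.0, 1e-7])
head, tail = split(x)
print(load(head, tail) - x)    # exact reconstruction: all zeros
print(load(head) - x)          # truncated load: relative error around 2**-21
```

All arithmetic then proceeds in fp64 regardless of which chunks were actually fetched, so only the memory traffic changes.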

A Reflection

Beyond the technical talks, there were two things that stood out for me since I’m new to the conference. Firstly, there were many more women than in the typical engineering conferences I attend. I don’t know whether the statistics on maths versus engineering are in line with this observation, but clearly maths is doing something right from which we could learn. Secondly, there were clear sessions devoted to community building: mentoring sessions, tutorials for new research students, SIAM student chapter presentations, early career panels, presentations on funding programmes, diversity and inclusion sessions, a session on helping people improve their CV, an explicit careers fair, etc. Partly this may simply reflect the size of the conference, but even so, this seems to be something SIAM does particularly well.