# Watch Where You’re Pointing That!

This week Nadesh Ramanathan, a member of research staff in my group, will be presenting a paper at the virtual FPL 2020 conference entitled “Precise Pointer Analysis in High Level Synthesis” (jointly with John Wickerson and myself). This blog post is intended as an accessible summary of the key message of the paper.

People are now aiming to generate hardware accelerators for more complex algorithms than things like classical CNNs, low-level image processing tasks, and other bread-and-butter hardware acceleration tasks. Inevitably, this is a difficult task to get right, and the prevalence of C/C++-based high-level synthesis (HLS) tools offers a great opportunity to experiment with the design space. Sophisticated algorithms written in C/C++ often incorporate pointers, which have long been difficult for HLS tools. Previously, I proposed a relatively sophisticated analysis using separation logic, together with my PhD student Felix Winterstein, which is an intensive analysis specialised to certain data structures. Nadesh’s most recent work can, in some sense, be viewed as the opposite. He is trying to make more simple, but more generally applicable pointer analyses more widely understood and used within HLS, while trying to quantify how much this might bring to hardware accelerator design.

The basic idea is that since FPGA compile times are long, we can afford to spend a bit more time being precise about which variables can point to which other variables. The question is, what are the benefits of being more precise in the context of HLS? Nadesh has studied two different types of ‘sensitivity’ of pointer analyses – to flow and to context. Flow-sensitive analyses consider the ordering of memory operations, context sensitive analyses consider the calling context of functions. The most common form of analysis in HLS is Andersen analysis, which is neither flow- nor context-sensitive. So how much do we gain by utilising more precise analyses?

Nadesh studies this question by modifying the LegUp source code, showing that over the PTABen benchmark set, area utilisation can be halved and performance doubled by using these analyses. This suggests that as we move towards greater diversity in hardware accelerators, HLS tool developers should think carefully about their pointer analyses.

# When are Digits Correct?

Often, we compute with iterative algorithms. Start with some value, often an initial guess to be refined, and keep iterating until some stopping criterion is met. If we actually think about what goes on in a modern digital computer when we execute these algorithms, we soon see that – often – the same digits end up being computed time and again. As we converge to a value, it’s reasonable to expect that most of the time the most significant digits become stable. But we still compute them, time and again at each iteration, wasting computational resource.

In general, in standard binary representations, this re-computation may not be avoidable – most-significant digits might be stable for 1000 iterations and then flip, e.g. from 0.99999 to 1.00000. As a child, I used to play with such iterations using my HP32S calculator – a gift from fred harris – it provided endless entertainment.

There is, however, a class of number representations in which these digit flips can be avoided: redundant number representations. These representations have a long history – indeed, as my friend and colleague Miloš Ercegovac has identified, they can be traced back as far as a 1727 publication in Phil. Trans. Royal Soc by John Colson FRS. Miloš developed these ideas to a mature state between the 1970s and today, in the guise of Online Arithmetic.

Together with my PhD students He Li (now research staff at Cambridge) and Ian McInerney and collaborator James Davis, I have recently done some work on methods to detect and establish exactly when digits become stable using such schemes and what the implications might be for hardware that makes use of this stability. In our recent IEEE Transactions on Computers paper, we adapt standard forward error analyses of stationary iterative methods to this setting. We mathematically derive some conditions that can be checked at run-time to determine when you don’t need to compute certain digits in any future iteration, and also present a toy hardware implementation taking advantage of this approach using a non-standard arithmetic processor design.

We hope that – in the future – only what needs to be computed will be computed.

# Learning in Vienna

I was delighted to be asked by Axel Jantsch to speak recently at the opening of the new Christian Doppler Laboratory (CDL) for Embedded Machine Learning. This is a new collaborative laboratory between TU Wien (led by Jantsch) and TU Graz (Bischof) as well as several strong industrial partners from Germany and Austria. Its scope and content is very closely aligned to the EPSRC Center for Spatial Computational Learning, which I launched last November, the latter bringing together Imperial and Southampton in the UK with Toronto in Canada, UCLA in the USA, and – just announced – Sydney in Australia. I was therefore delighted to bring fraternal greetings from our centre, and to begin discussions over how we could work together in the future to build a truly global collaborative research effort in this space.

In addition to hearing about the work plan and objectives for the new CDL, the meeting heard from three external academics (Christoph Lampert, Bernt Schiele and myself) on their recent relevant research. I spoke about the work I recently published in my article in the Philosophical Transactions of the Royal Society. I found both Christoph and Bernt’s talks inspiring – I briefly summarise the key points of interest to me, below.

Christoph Lampert from IST spoke on Embedded and Adaptive Models for Visual Scene Understanding. He explained that while others are working on more efficient hardware and more efficient ML models, his group is focusing on matching models to problems and making models adaptive to the environment in which they’re applied. With this in mind, he focused on two recent papers, one from WACV 2020 and one from ICCV 2019. The WACV paper is on object detection, proposing compute-efficient method that avoids the twin pitfalls of proposal-based systems like Faster R-CNN and grid-based methods like YOLO. The ICCV paper is on multi-exit architectures, where latency constraints can cause an early termination of deep neural network inference computation and we want to have a useful result still in these circumstances. The paper discusses how to train such networks using ideas from Hinton’s et al.’s “Knowledge Distillation”. This also reminded me of some work from our research group [1,2] featuring early exits or a focus on getting good results quickly in latency-constrained systems.

Bernt Schiele from MPI and Saarland University spoke on Bright and Dark Sides of Computer Vision and Machine Learning. He spoke about several very interesting topics, including scene context (e.g. ‘Not Using the Car to See the Sidewalk‘, CVPR 2019), Disentangling Adversarial Robustness and Generalization (CVPR 2019) and reverse engineering and stealing deep models (e.g. ‘Knock-off Nets’, CVPR 2019). All are very interesting and timely topics, but the work on disentangling adversarial robustness and generalisation was particularly interesting to me, since I’ve given the topic of generalisation some thought in the context of efficient DNN accelerator hardware. Schiele argued for a stronger distinction to be made in the research community between different classes of adversarial examples — he focused on the idea of “on-manifold adversarial examples” where an adversarial example needs to actually be a correct instance of the class, rather than an arbitrary perturbation of an image — the latter commonly used in the literature, which Schiele referred to as a ‘regular’ adversarial example. His talk showed how on-manifold examples could be studied. The main take-home messages were that ‘regular’ robustness and generalisation are not contradictory, but that ‘on-manifold’ adversarial examples can be found, and such robustness in this instance is generalisation.

# When to Schedule?

On Tuesday, Jianyi Cheng will present our recent work Combining Dynamic and Static Scheduling in High-level Synthesis at the ACM International Symposium on FPGAs in Monterey. This is joint work between Jianyi and his two supervisors, John Wickerson and myself, as well as our collaborators from EPFL, Lana Josipović and her PhD supervisor Paolo Ienne.

As I’ve described in previous blog posts [1,2], Lana has been doing interesting work over the last few years, developing a tool flow for dynamically-scheduled high-level synthesis (HLS). Typically in modern HLS tools like VivadoHLS or LegUp, scheduling decisions are made statically – at compile time. However, Lana’s tool flow makes these decisions dynamically, at run time, using handshaking circuitry, reminiscent of Page and Luk’s work compiling occam into FPGAs.

In our paper, we have teamed up with EPFL to build a flow that can result in the best of both worlds. Static scheduling can be very efficient, sharing resources and leading to low area designs. Dynamic scheduling can be very fast, working around actual rather than potential data dependencies. Jianyi’s paper allows the definition of statically scheduled functions within a dynamically scheduled program. He shows that over a range of benchmarks, the results are about half the area of the fully dynamically-scheduled designs while about 1.7x faster than the fully statically-scheduled designs.

# Arithmetic for Neural Networks

Last month, the Royal Society Phil Trans A published my paper Rethinking Arithmetic for Deep Neural Networks, part of a special issue put together by Jack Dongarra, Laura Grigori and Nick Higham. In this blog post, I aim to briefly introduce the ideas in the paper. The paper is open access, so please read it for further detail. In addition, my slides from a talk given on an early version of this work are available from Nick Higham’s blog, and an mp3 recording of me talking to these slides has been made available by the Royal Society here.

The central theme of the paper is that hardware accelerator circuits for neural networks can actually be treated as neural networks. Consider the two graphs below. One of them represents a simple deep neural network where each node performs an inner product and a ReLU operation. The other represents the result of (i) deciding to used 4-bit fixed-point arithmetic, and then (ii) synthesising the resulting network into a circuit, technology-mapped to 2-input Boolean gates.

Although there are obvious differences (in structure, in number of nodes) – there is a core commonality: a computation described as a graph operating on parameterisable functional nodes.

So what can we gain from this perspective?

1. Binarized Neural Networks are universal. The paper proves that any network you want to compute can be computed using binarized neural networks with zero loss in accuracy. It’s simply not the case that some problems need high precision. But, as a corollary, it is necessary to not be tied too closely to the original network topology if you want to be guaranteed not to lose accuracy.

2. Boolean topologies are tricky things. So if we want to rethink topologies, what first principles should we use to do so? In the paper, I suggest looking to inspiration from the theory of metric spaces as one step towards producing networks that generalise well. Topology, node functionality, and input / output encoding interact in subtle, interesting, and under-explored ways.

3. This viewpoint pays practical dividends. My PhD student Erwei Wang and collaborators James Davis and Peter Cheung and I have developed a Neural Network flow called LUTNet, which uses Boolean lookup tables as the main computational node in a neural network, leading to very efficient FPGA implementations.

I’m hugely excited by where this work could go, as well as the breadth of the fields it draws on for inspiration. Please do get in touch if you would like to collaborate on any of the open questions in this paper, or any other topic inspired by this work.

# Machine Learning at FPT 2019

Next week, the IEEE International Conference on Field-Programmable Technology (FPT) will take place in Tianjin in China. I’m proud that my former PhD student Qiang Liu will be General Chair of the conference.

I am a coauthor of two papers to be presented at FPT, one led by my former BEng student Aaron Zhao, now a PhD student at Cambridge supervised by my colleague Rob Mullins, and one led by my former postdoc, Ameer Abdelhadi, now with COHESA / UofT. The paper by Aaron is also in collaboration with two of my former PhD students, Xitong Gao, now with the Chinese Academy of Sciences, and Junyi Liu, now with Microsoft Research.

The first paper, led by Aaron, is entitled ‘Automatic Generation of Multi-precision Multi-arithmetic CNN Accelerators for FPGAs’, and can be found on arXiv here. This paper is a serious look at getting an automated CNN flow for FPGAs that makes good use of some of the arithmetic flexibility available on these devices. Powers-of-two (“free” multiplication) and fixed-point (“cheap” multiplication) are both leveraged.

The second paper, led by Ameer, looks at the computation of a set of approximate nearest neighbours. This is useful in a number of machine learning settings, both directly as a non-neural deep learning inference algorithm and indirectly within sophisticated deep learning algorithms like Neural Turing Machines. Ameer has shown that this task can be successfully accelerated in an FPGA design, and explores some interesting ways to parameterise the algorithm to make the most of the hardware, leading to tradeoffs between algorithm accuracy and performance.

If you’re at FPT this year, please go and say hello to Aaron, Ameer and Qiang.

# Highlights of Asilomar 2019

This week, I attended my first Asilomar Conference on Circuits, Signals and Computers, a very long-running conference series of the IEEE Signal Processing Society, with a very broad range of topics. I decided to attend Asilomar after being invited to give not just one talk, but two, once by my friend and collaborator Miloš Ercegovac from UCLA, and once by my good colleague Zhiru Zhang from Cornell.

No discussion of highlights of Asilomar can go without pointing out the extraordinarily beautiful setting of a conference centre right on Asilomar Beach. I can certainly see why the conference organisers keep coming back year after year – since the 1970s for Miloš and even earlier for my old friend fred harris, who I met there by surprise.

Distinguished Lecture

The conference opened with distinguished lecture by Helmut Bölcskei from ETH Zurich, who gave a wonderful talk about the fundamental limits of deep learning. The key results he presented were about neural networks built of linear computational units and ReLU functions, and he showed how they can approximate a range of different functions. I was already familiar with asymptotic results for infinite depth or infinite width networks, but Bölcskei’s results were different – they showed how the approximation quality can be traded against a metric of neural network complexity that captured the number of bits needed to store the topology and the weights of the network. He was able to show the power of such neural networks across an extremely broad class of functions, and to explain how this comes about.

Compilation for Spatial Computing Architectures

This session was organised by Zhiru Zhang from Cornell and Hongbo Rong from Intel. The first talk, given by Yi-Hsiang Lai from Cornell, described the HeteroCL infrastructure, about which I’ve previously blogged in my description of FPGA 2019. Very closely related to this was Hongbo’s own work at Intel Labs, which makes heavy use of polyhedral methods, and work from the systolic array community on affine and uniform recurrence equations.

I then gave a talk about some of the work my research group has been doing over the past 12+ years in analysis of memory access patterns for High-Level Synthesis, taking in my early foundational work in bringing the polyhedral model to HLS with Qiang Liu (now at Tianjin University), our work on Separation Logic in HLS (now also a book by Felix Winterstein, my former PhD student who leads Xelera Technologies), and our recent work on utilising Microsoft Boogie in this context for multi-threaded HLS by my current PhD student Jianyi Cheng.

Finally, Thierry Moreau from the University of Washington presented his very interesting work on a hardware-software open-source stack for modern deep learning (see the TVM website).

Computer Arithmetic

This session was organised by Miloš Ercegovac from UCLA and Earl Swartzlander from UT Austin. The first talk in this session was from Fredrik Dahlqvist, a postdoc in my group, who spoke about our work together with Rocco Salvia marrying ideas from probabilistic programming with rounding error analysis.

Miloš Ercegovac from UCLA and James Stine from Oklahoma State University looked at how digit iteration techniques for division compare to multiplication-based techniques. Alexander Groszewski and Earl Swartzlander from UT Austin discussed their results from deterministic unary arithmetic inspired by stochastic computing; Keshab Parhi from the audience raised the interesting point of the importance of preservation of temporal structure in specially designed deterministic sequences for purposes of compositionality.

I really enjoyed the unusual talk by Keshab Parhi (U. Minnesota) on Molecular Computing Inspired by Stochastic Logic (see here for more details) via Fractional Coding, building on. If digits are encoded as relative concentrations of molecules, the problem of signal correlation, which tends to take the shine off stochastic computing work, can be avoided. He proposed computation using molecular reaction rates, and showed how to encode values as concentrations of two different molecules; his techniques have been verified in simulation – I would love to see this in a test-tube.

Theory of Deep Learning

This session was organised by Richard Baraniuk and Santiago Segarra (Rice University.)

There was a very enjoyable talk by Alessandro Achille from UCLA on studying deep neural networks from an information-theoretic perspective. He pointed out that real-valued weights appear to contain infinite information, but that by using the principle that small perturbations in weights should not throw-off the classification result completely, we can recover a finite weight encoding. He then moved on to show using a PAC-Bayes bound that good generalisation comes from low weight information. He demonstrated that Stochastic Gradient Descent implicitly minimises Fisher information, but that for generalisation performance, it is Shannon information that should be bounded – he then derived a connection between the two under some conditions.

Tom Goldstein (University of Maryland) gave a stunningly illustrated talk on Understanding Generalization in Neural Nets via Visualization, based on his co-authored paper on the topic. He sought to empirically understand how the continuous piecewise linear functions of modern DNNs, when combined with SGD-based optimisation, lead to functions that generalise well. This was done via a clever process of “poisoning” training data to obtain badly generalising minima.

AI/ML Architectures

This session was organised by Keshab Parhi (University of Minnesota.)

Danny Bankman gave a talk about Stanford’s RRAM-based DNNs. He showed that register-file access accounts for the majority of energy in standard CMOS processor-like architectures, and drew the conclusion that architectures should be “memory-like” in their design, using “conductance-mode arithmetic” with very low precision integer activations, and put the necessary ramp generator for their ADC right inside the RRAM array. Results are verified using SPICE. I know little about RRAM technology, but talking with my colleagues Themis Prodromakis and Tony Kenyon has got me intrigued.

Deep Learning Theory

This session was organised by Tom Goldstein (University of Maryland.)

My favourite talk in this session was by Tom himself, in which he presented an analysis of adversarial attacks in DNNs, again beautifully illustrated – based on his co-authored paper. He showed that due to the high dimensionality of the spaces involved, you are extremely likely to hit – at random – a point in the input space that can be adversarially perturbed. He demonstrated – using the audience as guinea pigs – that adversarial perturbation can also trick humans quite easily on the CIFAR-10 data set. Perhaps my favourite twist on the talk was that he gave the talk wearing an “invisibility cloak” which – if worn – tricks YOLO into not identifying the wearer.

Reflections on Asilomar

I’ve sent PhD students to Asilomar before, but this was the first time I attended myself. It’s a very broad conference, in a beautiful setting. It seems to be a great venue to complement the more technically homogeneous conferences like FPGA which I help to organise – they serve different purposes. Asilomar is a great conference to have your work seen by people who wouldn’t usually follow your work, and to pick up ideas from neighbouring fields.

# Approximating Circuits

Next week, Ilaria Scarabottolo, currently a visiting research student in my research group at Imperial, will present her paper “Partition and Propagate” at DAC 2019 in Las Vegas. In this post, I will provide a brief preview of her work (joint with Giovanni Ansaloni and Laura Pozzi from Lugano and me.)

I’ve been interested in approximation, and how it can be used to save resources, ever since my PhD 20 years ago, where I coined the term “lossy synthesis” to mean the synthesis of a circuit / program where error can be judiciously introduced in order to effect an improvement in performance or silicon area. Recently, this area of research has become known as “approximate computing“, and a bewildering number of ways of approximating behaviour – at the circuit and software level – have been introduced.

Some of the existing approaches for approximate circuit synthesis are point solutions for particular IP cores (e.g. our approximate multiplier work) or involve moving beyond standard digital design methodologies (e.g. our overclocking work.) However, a few pieces of work develop a systematic method for arbitrary circuits, and Ilaria’s work falls into this category.

Essentially, she studies that class of approximation that can be induced solely by removing chunks of a logic circuit, replacing dangling nets with constant values – a technique my co-authors referred to as Circuit Carving in their DATE 2018 paper.

Our DAC paper presents a methodology for bounding the error that can be induced by performing such an operation. Such error can be bounded by exhaustive simulation or SAT, but not for large circuits with many inputs due to scalability concerns. On the other hand, coarse bounds for the error can be derived very quickly. Ilaria’s work neatly explores the space between these two extremes, allowing analysis execution time to be traded for bound quality in a natural way.

Approximation’s time has definitely come, with acceptance in the current era often driven by machine-learning applications, as I explore in a previous blog post. Ilaria’s paper is an interesting and general approach to the circuit-level problem.

# Boolean Circuits are Neural Networks

On Monday, my PhD student Erwei Wang will present our work (joint also with James Davis and Peter Cheung) called LUTNet: Rethinking Inference in FPGA Soft Logic at the IEEE International Symposium on Field-Programmable Custom Computing Machines in San Diego, California.

In this paper, we take a very unusual approach to the design of a deep neural network accelerator in hardware: for us, the nodes in the neural network are Boolean lookup tables.

We were motivated initially by the fact that in very low precision FPGA neural network architectures, lookup tables are often used for arithmetic, but they are often used for very specific functions: while a $K$-LUT is capable of implementing any nonlinear Boolean function with $K$ inputs, it ends up getting used for only a tiny fraction of these $2^{2^K}$ functions. A good example is binarised neural networks (BNNs) such as FINN, where LUTs end up being used to implement XNOR gates (multiplication over $\{-1,+1\}$) and popcount functions. Our research question is therefore: rather than restricting ourselves to these functions, can we make better use of the LUTs by embracing the nonlinearity and the $K$-input support they give us?

We show that this is indeed possible. Our basic approach is to start with a weight-binarised neural network, add inputs to each node to bring them up to $K$ support, and then retrain the Boolean function implemented by that node. Retraining Boolean functions is a bit tricky, of course, because neural network training algorithms are not designed for this purpose. We generate a smooth interpolating function over the LUT entries, allowing us to use standard neural network training software (we use TensorFlow).

The end result is that the re-trained neural network is far more prunable than the original, because the extra inputs to the $K$-LUTs compensate for the removal of other nodes. Thus we end up with a much sparser neural network for the same classification accuracy. The sparsity improves our area by a factor of two or more, yet the more complex inference functions at each node are effectively provided “for free” by the FPGA architecture.

Circuit netlist? Neural network? Same thing!

# Royal Society Discussion Meeting

I was kindly invited to speak at a Royal Society Discussion Meeting before Easter, entitled “Numerical Algorithms for High-Performance Computational Science”, organised by Nick Higham, Laura Grigori and Jack Dongarra. This blog post summarises some of the discussion at the meeting.

I very much enjoyed the format of the meeting: select interesting speakers and allow them 30 minutes to talk about a topic of their choosing related to the theme of the meeting, with 10 full minutes for discussion and in-depth questions after each talk. Posters were also presented by a wide variety of researchers, with each poster presenter given a one-minute lightning-talk slot. Two of my PhD students, Erwei Wang and He Li, took this opportunity. Erwei presented a preview of our LUTNet paper appearing at FCCM very soon (separate blog post to follow), while He presented some of our work on arbitrary precision iterative compute.

Talks by others included:

• David Keyes (KAUST) on the topic “Hierarchical algorithms on hierarchical architectures”. He discussed some very interesting hierarchical low-rank decompositions and also hierarchies of numerical precision.
• Kathy Yelick (Berkeley) spoke on “Antisocial parallelism: avoiding, hiding and managing communication”, a very fruitful area of research in recent years. A few years ago, Abid Rafique, one of my former PhD students (joint with Nachiket Kapre) made use of this work, and it was good to catch up with the current state of research.
• Anna Scaife (Manchester) gave a fascinating insight into the Square Kilometre Array. The sheer volumes of data are mind boggling (zettabytes annually) and pose unique algorithmic challenges.
• Michela Taufer (UTK) discussed molecular dynamics workflows, and how we may be able to harness machine learning to reduce the human bottlenecks in such workflows in the future.
• Rick Stevens (Argonne) gave a very engaging talk about the intersection of machine learning with computational science, exemplified by the Candle project, using deep learning in cancer research. He mentioned many of the emerging architectures for deep learning and their optimisation for low-precision compute. Interested readers may also like our recent survey article on the topic.
• Jack Poulson (Hodge Star) spoke about sampling Determinantal Point Processes, and how links to matrix decomposition algorithms can be used to radically accelerate this process.
• John Shalf (LBNL) spoke about alternative computational models beyond CMOS, new materials for switches, and the growth of hardware specialisation. He proposed the strategy of: hardware-driven algorithm design, algorithm-driven hardware design, and co-design of hardware and algorithm. Having worked in the FPGA community for decades where this has been our mantra, it is great to see hardware specialisation spreading the message of co-design into the HPC community.
• Doug Kothe (ORNL) provided a very interesting insight into exascale computational motifs at the DOE.
• Tony Hey (STFC) set out a compelling argument that academics in applied deep learning should focus on deep learning for scientific data, on the basis that (i) scientific data sets are huge and open and (ii) head-to-head competition with industrial giants on profit-oriented deep-learning applications, without access to their data sets, is a poor choice for academia. I think the same argument could be made for academic computer architecture too. His team are developing benchmarks for scientific machine learning, complementary to MLPerf.
• Erin Carson (Charles University) presented an enchanting talk on iterative linear algebra in low and multiple precision compute. I’m a fan of her earlier work with Nick, and it was great to hear her current thinking and discussion of least-squares iterative refinement.
• Steve Furber (Manchester) spoke about arithmetic in the context of the SpiNNaker machine, and a particular approach they have taken to numerical solution of neural ODEs in fixed-point arithmetic, demonstrating that stochastic rounding can radically improve the quality of their results.
• Tim Palmer (Oxford) argued for low precision compute in weather and climate models, allowing the recouped computational cost to be recycled into better resolution weather models, resulting in higher overall accuracy. This reminded me of the argument I made with my PhD student Antonio Roldão Lopes and collaborator Eric Kerrigan in our paper More FLOPS or More Precision?
• Guillaume Aupy (INRIA) discussed memory-efficient approaches for automatic differentiation and back-propagation.
• Satoshi Matsuoka (RIKEN Centre) took us through the work being done on Post-K, a new Japanese supercomputer being designed to provide compute infrastructure for future workloads at the intersection of big data and AI/ML.
• Mike Heroux (Sandia) spoke about his work developing programming infrastructure for future HPC, in particular for performance portability and for system reliability.

My own talk was entitled “Rethinking Deep Learning: Architectures and Algorithms” – I will save summarising the content for a future blog post. Slides for all these talks will appear on the Manchester Numerical Linear Algebra group website. In addition, each speaker has received an invitation to author an article for a special issue of Philosophical Transactions A – this should be a very interesting read.

I was impressed by the great attendance at the meeting and by the quality of the technical interaction; I met several new and interesting people at the intersection of numerical analysis and scientific computing.

Special thanks to the organisers, Nick, Laura, and Jack for putting an excellent programme together. And congratulations to Jack for the news – a few days after the meeting – of his election to Foreign Member of the Royal Society!