This week, I attended my first Asilomar Conference on Signals, Systems, and Computers, a long-running conference series of the IEEE Signal Processing Society covering a very broad range of topics. I decided to attend Asilomar after being invited to give not just one talk but two, one by my friend and collaborator Miloš Ercegovac from UCLA and one by my good colleague Zhiru Zhang from Cornell.
No discussion of the highlights of Asilomar would be complete without mentioning its extraordinarily beautiful setting: a conference centre right on Asilomar Beach. I can certainly see why the conference organisers keep coming back year after year, since the 1970s in Miloš’s case and even earlier for my old friend fred harris, whom I bumped into there by surprise.
The conference opened with a distinguished lecture by Helmut Bölcskei from ETH Zurich, who gave a wonderful talk on the fundamental limits of deep learning. The key results he presented concerned neural networks built from linear computational units and ReLU functions, and he showed how well such networks can approximate a range of different functions. I was already familiar with asymptotic results for networks of infinite depth or infinite width, but Bölcskei’s results were different: they showed how approximation quality can be traded against a measure of network complexity that captures the number of bits needed to store the network’s topology and weights. He demonstrated the power of such networks across an extremely broad class of functions, and explained how this comes about.
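Bölcskei’s results are far more general than anything I can reproduce here, but the basic trade-off between accuracy and network size is easy to see in a toy case. The sketch below is my own illustration, not his construction: it builds a one-hidden-layer ReLU network that linearly interpolates f(x) = x² on [0, 1] at n + 1 equally spaced knots, so its worst-case error is 1/(4n²) while it uses 3n + 1 parameters. The parameter count is only a crude stand-in for the bit-level complexity measure he used, and the function names are hypothetical.

```python
def relu(z: float) -> float:
    return max(0.0, z)


def build_relu_interpolant(f, n: int):
    """Hypothetical helper: a one-hidden-layer ReLU net that linearly
    interpolates f at the knots k/n, k = 0..n, on the interval [0, 1]."""
    knots = [k / n for k in range(n + 1)]
    # Slope of the interpolant on each interval [k/n, (k+1)/n].
    slopes = [(f(knots[k + 1]) - f(knots[k])) * n for k in range(n)]
    # Hidden unit k computes relu(x - k/n); its output weight is the change
    # in slope at that knot (the first unit carries the initial slope).
    out_weights = [slopes[0]] + [slopes[k] - slopes[k - 1] for k in range(1, n)]
    biases = [-knots[k] for k in range(n)]
    out_bias = f(0.0)

    def net(x: float) -> float:
        return out_bias + sum(w * relu(x + b) for w, b in zip(out_weights, biases))

    # n input weights (all 1) + n hidden biases + n output weights + 1 output bias
    num_params = 3 * n + 1
    return net, num_params


def max_error(f, net, samples: int = 10_001) -> float:
    """Largest absolute error on a uniform grid over [0, 1]."""
    return max(abs(f(i / (samples - 1)) - net(i / (samples - 1))) for i in range(samples))


if __name__ == "__main__":
    square = lambda x: x * x
    for n in (2, 4, 8, 16, 32):
        net, p = build_relu_interpolant(square, n)
        print(f"n={n:3d} params={p:4d} max_error={max_error(square, net):.2e} "
              f"bound={1 / (4 * n * n):.2e}")
```

Doubling the number of hidden units roughly quarters the error. Deeper networks are known to approximate x² far more efficiently than this shallow construction, which is part of what makes depth-aware results like Bölcskei’s interesting.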
Compilation for Spatial Computing Architectures
This session was organised by Zhiru Zhang from Cornell and Hongbo Rong from Intel. The first talk, given by Yi-Hsiang Lai from Cornell, described the HeteroCL infrastructure, which I’ve previously blogged about in my description of FPGA 2019. Closely related to this was Hongbo’s own work at Intel Labs, which makes heavy use of polyhedral methods and of work from the systolic array community on affine and uniform recurrence equations.
I then gave a talk about some of the work my research group has done over the past 12+ years on analysing memory access patterns for High-Level Synthesis (HLS). This took in my early foundational work with Qiang Liu (now at Tianjin University) bringing the polyhedral model to HLS, our work on Separation Logic in HLS (now also a book by my former PhD student Felix Winterstein, who leads Xelera Technologies), and our recent work by my current PhD student Jianyi Cheng on using Microsoft Boogie for multi-threaded HLS.
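To give a flavour of the questions this kind of memory analysis answers, here is a toy sketch of my own, not code from any of the tools above: for a single loop that writes A[a*i + b] and reads A[c*j + d], can two iterations touch the same address, and at what iteration distance? The GCD test is a classical conservative check that ignores loop bounds; the brute-force enumeration stands in for the exact integer reasoning that polyhedral methods perform symbolically. The single-loop setting and the function names are assumptions for illustration.

```python
from math import gcd


def gcd_test(a: int, b: int, c: int, d: int) -> bool:
    """Conservative check: can a*i + b == c*j + d hold for *some* integers
    i, j? Loop bounds are ignored, so True only means 'maybe dependent'."""
    g = gcd(a, c)
    if g == 0:  # both strides are zero: both accesses hit a fixed address
        return b == d
    # a*i - c*j = d - b is solvable in integers iff gcd(a, c) divides d - b
    return (d - b) % g == 0


def dependence_distances(a: int, b: int, c: int, d: int, n: int) -> set:
    """Exact set of distances j - i, over 0 <= i, j < n, such that the write
    A[a*i + b] in iteration i and the read A[c*j + d] in iteration j touch
    the same address. A positive distance means iteration j reads a value
    written by an earlier iteration i (a loop-carried read-after-write)."""
    writes = {}
    for i in range(n):
        writes.setdefault(a * i + b, []).append(i)
    dists = set()
    for j in range(n):
        for i in writes.get(c * j + d, []):
            dists.add(j - i)
    return dists


if __name__ == "__main__":
    # A[2*i] = f(A[2*i + 1]): even writes, odd reads
    print(gcd_test(2, 0, 2, 1), dependence_distances(2, 0, 2, 1, 16))
    # A[i] = f(A[i - 1]): each iteration reads what the previous one wrote
    print(gcd_test(1, 0, 1, -1), dependence_distances(1, 0, 1, -1, 16))
```

Writing A[2i] while reading A[2i+1] can never conflict, whereas writing A[i] while reading A[i-1] gives a loop-carried dependence of distance 1, which constrains how aggressively an HLS tool can pipeline the loop.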
This session was organised by Miloš Ercegovac from UCLA and Earl Swartzlander from UT Austin. The first talk was given by Fredrik Dahlqvist, a postdoc in my group, who spoke about our joint work with Rocco Salvia on marrying ideas from probabilistic programming with rounding error analysis.
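As a crude illustration of why a probabilistic view of rounding error is attractive, the sketch below sums n random inputs in IEEE single precision and compares the observed relative errors with the classical worst-case bound γ_{n-1} = (n-1)u / (1 - (n-1)u), where u = 2^-24. This Monte Carlo experiment is my own and is not the method from the talk. Single precision is emulated by rounding every partial sum to binary32 with Python’s standard struct module, which is exact here because binary64 has enough extra precision to make the double rounding harmless for addition. The uniform input distribution and the function names are assumptions for illustration.

```python
import math
import random
import struct

U32 = 2.0 ** -24  # unit roundoff of binary32 under round-to-nearest


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def f32_sum(xs) -> float:
    """Recursive summation with every partial sum rounded to binary32."""
    s = 0.0
    for x in xs:
        s = to_f32(s + x)
    return s


def relative_errors(n: int, trials: int, seed: int = 0) -> list:
    """Relative error of binary32 summation for `trials` random input vectors."""
    rng = random.Random(seed)
    errs = []
    for _ in range(trials):
        # Inputs are made exactly representable in binary32 first.
        xs = [to_f32(rng.uniform(0.0, 1.0)) for _ in range(n)]
        exact = math.fsum(xs)  # correctly rounded binary64 reference
        errs.append(abs(f32_sum(xs) - exact) / exact)
    return errs


if __name__ == "__main__":
    n = 1000
    errs = relative_errors(n, trials=200)
    worst_case = (n - 1) * U32 / (1 - (n - 1) * U32)
    print(f"worst-case bound: {worst_case:.2e}")
    print(f"observed max    : {max(errs):.2e}")
    print(f"observed mean   : {sum(errs) / len(errs):.2e}")
```

On inputs like these, the observed errors typically sit well over an order of magnitude below the worst-case bound; that gap is what a distribution-aware analysis tries to quantify rigorously rather than by sampling.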
Miloš Ercegovac from UCLA and James Stine from Oklahoma State University looked at how digit iteration techniques for division compare with multiplication-based techniques. Alexander Groszewski and Earl Swartzlander from UT Austin discussed their results on deterministic unary arithmetic inspired by stochastic computing; Keshab Parhi, from the audience, raised the interesting point that preserving temporal structure in specially designed deterministic sequences is important for compositionality.
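For readers unfamiliar with the distinction between the two division families, here is a minimal sketch of the textbook versions under my own simplifying assumptions: exact rational arithmetic via Python’s fractions module, radix 2, no redundant digit sets, and a naive initial guess. Restoring division produces one quotient bit per iteration, whereas the Newton-Raphson reciprocal iteration x_{k+1} = x_k(2 - d·x_k) squares its error e_k = 1 - d·x_k at every step, roughly doubling the number of correct bits at the cost of two multiplications per step. Real designs (SRT division, table-based seeds, Goldschmidt iteration) are considerably more refined, and I make no claim about which trade-offs the talk favoured.

```python
from fractions import Fraction


def restoring_division(n: int, d: int, bits: int) -> Fraction:
    """Radix-2 digit recurrence: one fractional quotient bit of n/d per
    iteration. Requires 0 <= n < d; returns floor(n * 2^bits / d) / 2^bits."""
    assert 0 <= n < d
    r, q = n, 0
    for _ in range(bits):
        r <<= 1      # shift the partial remainder left
        q <<= 1
        if r >= d:   # select quotient digit 1 and subtract the divisor
            r -= d
            q |= 1
    return Fraction(q, 1 << bits)


def newton_reciprocal(d: Fraction, iterations: int, x0) -> list:
    """Multiplication-based: x_{k+1} = x_k * (2 - d * x_k) converges
    quadratically to 1/d, since 1 - d*x_{k+1} = (1 - d*x_k)**2."""
    x = Fraction(x0)
    history = []
    for _ in range(iterations):
        x = x * (2 - d * x)
        history.append(x)
    return history


if __name__ == "__main__":
    print(restoring_division(1, 3, 8))  # 8-bit approximation of 1/3
    d = Fraction(3, 4)
    for k, x in enumerate(newton_reciprocal(d, 4, 1), start=1):
        print(f"step {k}: error 1 - d*x = {1 - d * x}")
```

With d = 3/4 and x_0 = 1, the error after four Newton-Raphson steps is 2^-32, whereas the digit recurrence needs 32 iterations to reach the same precision; the trade-off lies in the cost of each iteration and in how easily the result can be correctly rounded.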
I really enjoyed the unusual talk by Keshab Parhi (U. Minnesota) on Molecular Computing Inspired by Stochastic Logic (see here for more details) via Fractional Coding, building on . If values are encoded as relative concentrations of molecules, the problem of signal correlation, which tends to take the shine off stochastic computing work, can be avoided. He proposed computation using molecular reaction rates and showed how to encode values as concentrations of two different molecules; his techniques have been verified in simulation, and I would love to see this in a test tube.
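The correlation problem is easy to demonstrate in the conventional electronic setting. In unipolar stochastic computing, a value p in [0, 1] is a bitstream whose bits are 1 with probability p, and a single AND gate multiplies two independent streams; feed it correlated streams and the answer is wrong. The sketch below (my own, with hypothetical function names) shows this, and also shows one published deterministic alternative, "clock division" of unary streams, which gives exact products. I don’t know whether this is the scheme used in the deterministic unary work from UT Austin, and it is unrelated to the molecular fractional coding in Parhi’s talk.

```python
import random


def sc_stream(p: float, length: int, rng: random.Random) -> list:
    """Unipolar stochastic bitstream: each bit is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def and_multiply(xs: list, ys: list) -> float:
    """Stochastic multiplication: bitwise AND, then the fraction of ones."""
    return sum(x & y for x, y in zip(xs, ys)) / len(xs)


def unary(ones: int, n: int) -> list:
    """Deterministic unary (thermometer) stream of length n encoding ones/n."""
    return [1] * ones + [0] * (n - ones)


def clock_division_multiply(a_ones: int, b_ones: int, n: int) -> float:
    """Exact product of a_ones/n and b_ones/n: stream A is repeated n times
    while each bit of stream B is held for n cycles, so every pair of bits
    meets exactly once and the AND output has a_ones * b_ones ones out of n^2."""
    a = unary(a_ones, n) * n
    b = [bit for bit in unary(b_ones, n) for _ in range(n)]
    return and_multiply(a, b)


if __name__ == "__main__":
    rng = random.Random(1)
    x = sc_stream(0.5, 10_000, rng)
    y = sc_stream(0.5, 10_000, rng)
    print("independent streams:", and_multiply(x, y))  # close to 0.25
    print("identical streams  :", and_multiply(x, x))  # close to 0.5, not 0.25
    print("clock division     :", clock_division_multiply(4, 4, 8))
```

Feeding the same stream into both inputs of the AND gate returns x rather than x², which is the kind of error that correlated streams introduce.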
Theory of Deep Learning
There was a very enjoyable talk by Alessandro Achille from UCLA on studying deep neural networks from an information-theoretic perspective. He pointed out that real-valued weights appear to contain infinite information, but that by using the principle that small perturbations in the weights should not throw off the classification result completely, we can recover a finite weight encoding. He then used a PAC-Bayes bound to show that good generalisation comes from low information in the weights. He demonstrated that Stochastic Gradient Descent implicitly minimises Fisher information, whereas for generalisation performance it is Shannon information that should be bounded; he then derived a connection between the two under certain conditions.
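For reference, one standard form of the PAC-Bayes bound (Maurer’s refinement of McAllester’s bound, relaxed via Pinsker’s inequality) is given below; I don’t know which variant Achille used. The KL divergence between the posterior Q over weights and a prior P fixed before seeing the data is what plays the role of the information in the weights: the fewer nats needed to describe the trained weights relative to the prior, the tighter the guarantee.

```latex
% Assumptions: training set S of m >= 8 i.i.d. samples; loss bounded in [0, 1];
% prior P chosen independently of S; posterior Q any distribution over weights w.
% L(w) is the expected loss, \hat{L}_S(w) the empirical loss on S.
% With probability at least 1 - \delta over the draw of S, for all Q simultaneously:
\mathbb{E}_{w \sim Q}\left[ L(w) \right]
  \;\le\;
\mathbb{E}_{w \sim Q}\left[ \hat{L}_S(w) \right]
  + \sqrt{ \frac{ \mathrm{KL}(Q \,\|\, P) + \ln \frac{2\sqrt{m}}{\delta} }{ 2m } }
```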
Tom Goldstein (University of Maryland) gave a stunningly illustrated talk on Understanding Generalization in Neural Nets via Visualization, based on his co-authored paper on the topic. He sought to understand empirically how the continuous piecewise-linear functions computed by modern DNNs, combined with SGD-based optimisation, lead to functions that generalise well. He did this via a clever process of “poisoning” the training data to obtain badly generalising minima.
This session was organised by Keshab Parhi (University of Minnesota).
Danny Bankman gave a talk about Stanford’s RRAM-based DNNs. He showed that register-file access accounts for the majority of energy in standard CMOS processor-like architectures, and concluded that architectures should instead be “memory-like” in their design, using “conductance-mode arithmetic” with very low-precision integer activations and placing the necessary ramp generator for the ADC right inside the RRAM array. The results were verified using SPICE. I know little about RRAM technology, but conversations with my colleagues Themis Prodromakis and Tony Kenyon have got me intrigued.
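As a rough mental model of conductance-mode arithmetic (my own idealised sketch, not Stanford’s design), a crossbar stores each weight as a conductance, applies each low-precision integer activation as a voltage, and lets Ohm’s law and Kirchhoff’s current law compute each column’s dot product as a current, which an ADC then digitises. The differential-pair weight mapping, the values of g_max and v_step, and the simple clipping ADC are all assumptions; wire resistance, device variability and the in-array ramp-based ADC are ignored.

```python
def quantize_activation(x: float, bits: int) -> int:
    """Map x in [0, 1] to an unsigned integer code in [0, 2^bits - 1],
    clipping values outside the range."""
    levels = (1 << bits) - 1
    return max(0, min(levels, round(x * levels)))


def crossbar_currents(weights, x_codes, g_max: float = 1e-4, v_step: float = 0.1):
    """Column currents (amperes) of an ideal crossbar.

    weights[i][j] in [-1, 1] is stored as a differential pair of non-negative
    conductances (G+ for the positive part, G- for the negative part, each at
    most g_max siemens). Input code x_codes[i] is applied as the voltage
    x_codes[i] * v_step volts. Each column current is sum_i G_ij * V_i
    (Ohm's law per cell, Kirchhoff's current law per column).
    """
    n_out = len(weights[0])
    currents = []
    for j in range(n_out):
        i_pos = sum(max(row[j], 0.0) * g_max * (x * v_step) for row, x in zip(weights, x_codes))
        i_neg = sum(max(-row[j], 0.0) * g_max * (x * v_step) for row, x in zip(weights, x_codes))
        currents.append(i_pos - i_neg)
    return currents


def adc_code(current: float, full_scale: float, bits: int) -> int:
    """Symmetric uniform ADC: clip to [-full_scale, full_scale] and return a
    signed integer code in [-(2^(bits-1) - 1), 2^(bits-1) - 1]."""
    max_code = (1 << (bits - 1)) - 1
    clipped = max(-full_scale, min(full_scale, current))
    return round(clipped / full_scale * max_code)


if __name__ == "__main__":
    w = [[0.5, -1.0], [1.0, 0.25]]  # 2 inputs x 2 outputs
    x = [quantize_activation(v, bits=2) for v in (1.0, 0.34)]
    currents = crossbar_currents(w, x)
    print(x, currents, [adc_code(c, full_scale=4e-5, bits=4) for c in currents])
```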
Deep Learning Theory
This session was organised by Tom Goldstein (University of Maryland).
My favourite talk in this session was by Tom himself, who presented an analysis of adversarial attacks on DNNs, again beautifully illustrated and based on his co-authored paper. He showed that, because of the high dimensionality of the spaces involved, a point chosen at random in the input space is extremely likely to be one that can be adversarially perturbed. Using the audience as guinea pigs, he demonstrated that adversarial perturbations can also fool humans quite easily on the CIFAR-10 data set. Perhaps my favourite twist was that he gave the talk wearing an “invisibility cloak” which, when worn, tricks YOLO into not detecting the wearer.
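A toy calculation conveys the flavour of the high-dimensional argument. This is my own concentration-of-measure illustration for a single linear decision boundary, not the more general argument in the paper. For points drawn uniformly from the unit sphere in R^d and classified by the sign of the first coordinate, the distance to the decision boundary is just |x_1|; as d grows, x_1 concentrates around zero, so almost every point lies within a small distance eps of the boundary and can have its label flipped by a perturbation of roughly that size. The function names are hypothetical.

```python
import math
import random


def random_unit_vector(d: int, rng: random.Random) -> list:
    """Uniform sample from the unit sphere in R^d (normalised Gaussian vector)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def fraction_near_boundary(d: int, eps: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the fraction of points on the unit sphere that
    lie within L2 distance eps of the hyperplane x_1 = 0, i.e. points whose
    label under the classifier sign(x_1) can be flipped by a perturbation of
    norm just over eps."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if abs(random_unit_vector(d, rng)[0]) < eps)
    return hits / trials


if __name__ == "__main__":
    for d in (3, 10, 100, 1000):
        print(f"d={d:5d}: fraction within 0.1 of boundary = "
              f"{fraction_near_boundary(d, 0.1, 1000):.3f}")
```

With eps = 0.1, roughly 10% of points are within reach of the boundary in three dimensions, but almost all of them are in a thousand dimensions.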
Reflections on Asilomar
I’ve sent PhD students to Asilomar before, but this was the first time I attended myself. It’s a very broad conference in a beautiful setting, and it seems a great venue to complement more technically homogeneous conferences like FPGA, which I help to organise; they serve different purposes. Asilomar is a great place to have your work seen by people who wouldn’t usually follow it, and to pick up ideas from neighbouring fields.