# Book Review: Category Theory for the Sciences

I have been reading Spivak’s book ‘Category Theory for the Sciences’ on-and-off for the last 18 months during vacations as some of my ‘holiday reading.’ I’ve wanted to learn something about category theory for a while, mainly because I feel it’s something that looks fun and different enough to my day-to-day activity not to feel like work. I found Spivak’s book on one of my rather-too-regular bookshop trips a couple of years ago, and since it seemed like a more applied introduction, I thought I’d give it a try. This blog post combines my views on the strengths of this book from the perspective of a mathematically-minded engineer, together with a set of notes on the topic which might be useful for me – and maybe other readers – to refer back to in the future.

The book has a well-defined structure. The first few chapters cover topics that should be familiar to applied mathematicians and engineers: sets, monoids, graphs, orders, etc., but often presented in an unusual way. This sets us up for the middle section of the book which presents basic category theory, showing how the structures previously discussed can be viewed as special cases within a more general framework. The final chapters of the book discuss some topics like monads that can then be explored using the machinery of this framework.

### Set

Chapter 3 covers the category $\mathbf{Set}$ of sets.

One of the definitions here that’s less common in engineering is that of coproduct. The coproduct of sets $X$ and $Y$, denoted $X \sqcup Y$ is defined as the disjoint union of $X$ and $Y$. I first came across disjoint union as a concept when doing some earlier holiday reading on topology. We also here get the first experience of the author’s use of informal ‘slogans’ to capture the ideas present in more formal definitions and theorems: “any time behavior is determined by cases, there is a coproduct involved.” I really like this approach – it’s the kind of marginalia I usually write in books myself, but pre-digested by the author!

The notation $\text{Hom}_{\mathbf{Set}}(X,Y)$ is used for the set of functions from $X$ to $Y$, which we are encouraged to refer to as morphisms, all without really explaining why at this stage – it of course all becomes clear later in the book.

To give a flavour of the kind of exercises provided, we are asked to prove an isomorphism $\text{Hom}_{\text{\bf Set}}(X \sqcup Y, A) \xrightarrow{\cong} \text{Hom}_{\text{\bf Set}}(X, A) \times \text{Hom}_{\text{\bf Set}}(Y, A)$. This reminds me of digital circuit construction: the former corresponds to a time-multiplexed circuit capable of producing outputs from $A$ with inputs from either $X$ or $Y$, together with an identifier for which (to feed a multiplexer select line); the latter corresponds to two circuits producing both the outputs rather than just the selected one.

The fibre product, $X \times_Z Y$, given functions $f : X \to Z$ and $g : Y \to Z$ is defined as the set $X \times_Z Y = \{ (x, z, y) | f(x) = z = g(y)\}$. Any set isomorphic to this fibre product is called a pullback of the diagram $X \xrightarrow{f} Z \xleftarrow{g} Y$.

The fibre sum, $X \sqcup_W Y$, given functions $f : W \to X$ and $g : W \to Y$ is the quotient of $X \sqcup W \sqcup Y$ by the equivalence relation $\sim$ generated by $w \sim f(w)$ and $w \sim g(w)$ for all $w \in W$. A pushout of $X$ and $Y$ over $W$ is any set $Z$ which is isomorphic to $X \sqcup_W Y$.

A good intuitive description in the text is to think of the fibre sum of $X$ and $Y$ as gluing the two sets together, with $W$ as the glue. A nice example from one of the exercises is $W = \mathbb{N}$, $X = \mathbb{Z}$, $Y$ a singleton set, $f : W \to X$ defined by $w \mapsto -(w+1)$. Then we’re left with $X \sqcup_W Y \cong \{ -, 0, 1, \ldots \}$, where $-$ is an element corresponding to the collapsing of all negative elements in $X$ to a single point.

The familiar notions of surjective and injective functions are recast as monomorphisms and epimorphisms, respectively. A function $f : X \to Y$ is a monomorphism if for all sets $A$ and pairs of functions $g, g' : A \to X$, if $fg = fg'$ then $g = g'$. A function $f : X \to Y$ is an epimorphism if for all sets $B$ and pairs of functions $h, h' : Y \to B$, if $hf = h'f$ then $h = h'$. By this time in the book, we get the hint that we’re learning this new language as it will generalise later beyond functions from sets to sets – the chapter ends with various slightly more structured concepts, like set-indexed functions, where the idea is always to define the object and the type of morphism that might make sense for that object.

Monoids, Groups, Graphs, Orders & Databases

In Chapter 4, the reader is introduced to several other structures. With the exception of the databases part, this was familiar material to me, but with slightly unfamiliar presentation.

We learn the classical definition of a monoid and group, and also discuss the free monoid generated by a set $X$ as being the monoid consisting of set $\mbox{List}(X)$ of lists of elements of $X$, with the empty list as the identity and list concatenation as the operation. We discuss the idea of a monoid action on a set $S$, i.e. a function $\circlearrowleft: M \times S \to S$ associated with the monoid $M$ having the properties $e \circlearrowleft s = s$ for any $s \in M$, where $e$ is the identity, and $m \circlearrowleft (n \circlearrowleft s) = (m \star n) \circlearrowleft s$, where $m$ and $n$ are any elements of $M$, $s$ is any element of $S$, and $\star$ is the monoid operation. The discussion of monoids is nice and detailed, and I enjoyed the exercises, especially those leading up to one of the author’s slogans: “A finite state machine is an action of a free monoid on a finite set.”

Graphs are dealt with in a slightly unusual way (for me) as quadruples $G := (V, A, \mbox{src}, \mbox{tgt})$ of a set of vertices $V$, a set of arrows $A$, a functions $\mbox{src}$, $\mbox{tgt}$ mapping each arrow into its source and destination vertex. This corresponds then to what I would call a multigraph, as parallel edges can exist, though it is called a graph in the book.

For each of the structures discussed, we finish by looking at their morphisms.

As an example, let ${\mathcal M} := (M, e, \star)$ and ${\mathcal M'} := (M', e', \star')$ be monoids (the tuple is of the set, identity element, and operation). Then a monoid homomorphism $f: {\mathcal M} \to {\mathcal M'}$ is a function $f: M \to M'$ preserving identity and operation, i.e. (i) $f(e) = e'$ and $f(m_1 \star m_2) = f(m_1) \star' f(m_2)$ for all $m_1, m_2 \in M$.

Similarly, for graphs, graph homomorphisms $f: G \to G'$ for $G := (V, A, \mbox{src}, \mbox{tgt})$ and $G' := (V', A', \mbox{src}', \mbox{tgt}')$ are defined as pairs of functions $f_0: V \to V'$ and $f_1: A \to A'$ preserving the endpoints of arrows, i.e. such that the diagrams below commute.

Order morphisms are similarly defined, this time as those preserving order.

Thus we are introduced to several different (somewhat) familiar mathematical structures, and in each case we’re learning about how to describe morphisms on these structures that somehow preserve the structure.

Categories, Functors, and Natural Transformations

Things really heat up in Chapter 5, where we actually begin to cover the meat of category theory. This is exciting and kept me hooked by the pool. The unusual presentation of certain aspects in previous chapters (e.g. the graphs) all made sense during this chapter.

A category $\mathcal{C}$ is defined as a collection of various things: a set of objects $\text{Ob}(\mathcal{C})$, a set of morphisms $\text{Hom}_{\mathcal{C}}(x,y)$ for every pair of objects $x$ and $y$, a specified identity morphism $\text{id}_x$ for every object $x$, and compositions $\circ: \text{Hom}_{\mathcal{C}}(y,z) \times \text{Hom}_{\mathcal{C}}(x,y) \to \text{Hom}_{\mathcal{C}}(x,z)$. To qualify as a category, such things must satisfy identity laws and also associativity of composition.

Examples of the category of Monoids, of Groups, of Sets, of Graphs, of Pre-Orders, etc., are explored. Isomorphisms between objects in a category are carefully defined.

Functors are introduced as transformations from one category to another, via a function transforming objects and a function transforming morphisms, such that identities and composition and preserved. Functors are then used to rigorously explore the connections between preorders, graphs, and categories, and I finally understood why the author had focused so much on preorders in the previous chapters!

Things started to take a surprising and exciting turn for me when the author shows, via the use of functors, that a monoid is a category with one object (the set of the monoid becomes the morphisms in the category), and that groups can therefore be understood as categories with one object for which each morphism is an isomorphism. This leads to a playful discussion, out of which pops groupoids, amongst other things. Similar gymnastics lead to preorders reconstituted as categories in which hom-sets have one or fewer elements, and graphs reconstituted as functors from a certain two-object category to $\mathbf{Set}$. Propositional logic is re-examined from a categorical perspective, though I suspect this is just scratching the surface of the interplay between category theory and logic, and I’d love to read more about this. For me, this was a highlight – that by playing with these definitions and trying things out, all these wonderful familiar structures arise.

The final part of this chapter deals with natural transformations. Just as functors transform categories to categories, natural transformations transform functors to functors. Specifically, a natural transformation between functor $F : {\mathcal C} \to {\mathcal D}$ and $G : {\mathcal C} \to {\mathcal D}$ is defined as a morphism (in ${\mathcal D}$) $\alpha_X$ for each object $X$ in ${\mathcal C}$, such that the diagram below commutes for every morphism $f : X \to Y$ in ${\mathcal C}$.

I think this section on natural transformations is where the huge number of exercises in the book really came into its own; I doubt very much I would’ve grasped some of the subtleties of natural transformations without them.

We learn about two kinds of composition of natural transformations. The first combines two natural transformations on functors $F,G,H : {\mathcal C} \to {\mathcal D}$, say one $\alpha: F \to G$ and one $\beta: G \to H$ to obtain a natural transformation $\beta \circ \alpha: F \to H$. The other combines a natural transformation $\gamma_1$ between functors $F_1, F_2 : {\mathcal C} \to {\mathcal D}$ with one $\gamma_2$ between functors $G_1, G_2 : {\mathcal D} \to {\mathcal E}$, to produce one, $\gamma_2 \Diamond \gamma_1: G_1 \circ F_1 \to G_2 \circ F_2$.

An equivalence relation ${\mathcal C}' \simeq {\mathcal C}$ – weaker than isomorphism – is introduced on categories as a functor $F : {\mathcal C} \to {\mathcal C}'$ such that there exists a functor $F' : {\mathcal C}' \to {\mathcal C}$ and natural isomorphisms $id_{\mathcal C} \xrightarrow{\cong} F' \circ F$ and $id_{{\mathcal C}'} \xrightarrow{\cong} F \circ F'$. This seemed odd to me — the lack of insistence on isomorphism for equivalence — but the examples all make sense. More importantly it is shown that this requirement is sufficient for the on-homomorphism part of the functor $F$ to be bijective for every fixed pair of objects in ${\mathcal C}$, which feels right. I think my understanding could benefit from further interesting examples of usefully equivalent but non-isomorphic categories though – suggestions welcome!

Limits and Colimits

Chapter 6 is primarily given over to the definitions and discussion of limits and colimits, generalising the ideas in $\mathbf{Set}$ to other categories. The chapter begins by looking again at product and coproduct. As an example, for a category ${\mathcal C}$, with objects $X, Y \in \text{Ob}({\mathcal C})$, a span is defined as a triple $(Z, p, q)$ where $Z$ is an object in ${\mathcal C}$ and $p : Z \to X$ and $q : Z \to Y$ are morphisms in ${\mathcal C}$. A product $X \times Y$ is then defined as as span on $X$ and $Y$ such that for any other span $(Z, p, q)$ on $X$ and $Y$, there exists a unique morphism $t_{p,q}$ such that the following diagram commutes. Again, I rather like the author’s slogan for this: “none shall map to $X$ and $Y$ except through me!”

The chapter then further generalises these ideas to the the ideas of limit and colimit. This section was a step up in abstraction and also one of the least “applied” parts of the book. It has certainly piqued my interest – what are limits and colimits in various structures I might use / encounter in my work? – but I found the examples here fairly limited and abstract compared to the copious ones in previous chapters. It’s in this chapter that universal properties begin to rear their head in a major way, and again I’m still not totally intuitively comfortable with this idea – I can crank the handle and answer the exercises, but I don’t feel like I’m yet at home with the idea of universal properties.

Categories at Work

In the final section of the book, we first study adjunctions. Given categories ${\mathcal B}$ and ${\mathcal A}$, an adjunction is defined as a pair of functors $L: {\mathcal B} \to {\mathcal A}$ and $R: {\mathcal A} \to {\mathcal B}$ together with a natural isomorphism $\alpha$ whose component for $A \in \text{Ob}({\mathcal A})$, $B \in \text{Ob}({\mathcal B})$ is $\alpha_{B,A}: \text{Hom}_{\mathcal A}(L(B), A) \xrightarrow{\cong} \text{Hom}_{\mathcal B}(B, R(A))$. Several preceding results and examples can then be re-expressed in this framework, and understood as examples of adjunctions. Adjuctions are also revisited later in the chapter to show a close link with monads.

Categories of functors are discussed, leading up to a discussion of the Yoneda lemma. This is really interesting stuff, as it seems to cross huge boundaries of abstraction. Consider a category ${\mathcal C}$. Spivak first defines a functor $Y_c := \text{Hom}_{\mathcal{C}}(c, -) : {\mathcal C} \to \mathbf{Set}$ sending objects in ${\mathcal C}$ to the set $\text{Hom}_{\mathcal C}(c,d)$ and similarly for morphisms. This is then used in the following theorem:

Let ${\mathcal C}$ be a category, $c \in \text{Ob}(\mathcal{C})$ be an object, and $I: {\mathcal C} \to \mathbf{Set}$ be a set-valued functor. There is a natural bijection $\text{Hom}_{\mathcal{C}-\mathbf{Set}}(Y_c, I) \xrightarrow{\cong} I(c)$.

This looks super weird: the right-hand side is just a plain old set and yet the left hand-side seems to be some tower of abstraction. I worked through a couple of examples I dreamt up, namely $\mathcal{C} = \mathbf{1}$ (the category with one object) with $I$ mapping that object to an arbitrary set, and $\mathcal{C} = \mathbf{Set}$ with $I = \text{id}_{\mathbf{Set}}$ and convinced myself this works just fine for those two examples. Unfortunately, the general proof is not covered in the book – readers are referred to Mac Lane – and while some discussion is provided regarding links to databases, I felt like I need a proof to properly grasp what’s going on here under the hood.

Monads are then covered. I’ve briefly seen monads in a very practical context of functional programming, specifically for dealing with I/O in Haskell, but I’ve never looked at them from a theoretical perspective before.

A monad on $\mathbf{Set}$ (monads are only defined in the book on $\mathbf{Set}$, for reasons not totally apparent to me) is defined as a triple $(T, \eta, \mu)$ consisting of a functor $T: \mathbf{Set} \to \mathbf{Set}$, a natural transformation $\eta: \text{id}_{\mathbf{Set}} \to T$, and natural transformation $\mu: T \circ T \to T$ such that the following diagrams commute:

I found this discussion enlightening and relatively straight-forward. Firstly, exception monads are discussed, like Maybe in Haskell, and a nice example of wrapping sets and their functions into power sets and their functions, for which I again enjoyed the exercises. Kleisli categories of monads are defined, and it then becomes apparent why this is such a powerful formalism to think about concepts like program state and I/O.

The Kleisli category $\mathbf{Kls}(\mathcal{T})$ of a monad $\mathcal{T} = (T, \eta, \mu)$ on $\mathbf{Set}$ is defined as a category with objects $\text{Ob}(\mathbf{Kls}(\mathcal{T}) = \text{Ob}(\mathbf{Set})$ and morphisms $\text{Hom}_{\mathbf{Kls}(\mathcal{T})}(X,Y) = \text{Hom}_{\mathbf{Set}}(X,T(Y))$. Identity is $\text{id}_X: X \to X$ in $\mathbf{Kls}(\mathcal{T})$ is exactly $\eta_X$ in $\mathbf{Set}$  and composition of morphisms $f: X \to Y$ and $g: Y \to Z$ in $\mathbf{Kls}(\mathcal{T})$ is given by $\mu_z \circ T(g) \circ f$ in $\mathbf{Set}$. You can intuitively see the ‘baggage’ of $T$ being carried through in $\mathbf{Set}$ yet hidden from view in $\mathbf{Kls}$.

Finally, the book ends with a (very) brief discussion of operads. I’ve not come across operads before, and from this brief discussion I’m not really sure what they ‘buy’ me beyond monoidal categories.

General

The exercises are copious and great, especially in the earlier chapters. They kept me engaged throughout and helped me to really understand what I was reading, rather than just think I had. In the later exercises there are many special cases that turn out to be more familiar things. This is both enlightening “oh, an X is just a Y!” and frustrating “yeah, but I want to see Xs that aren’t Ys, otherwise what’s this all heavyweight general machinery for?” I guess this comes from trying to cover a lot of ground in one textbook.

I am struck by just how much mental gymnastics I feel I had to go through to see mathematical structures from the “outside in” as well as from the “inside out”, and how category theory helped me do that. This is a pleasing feeling.

The slogans are great and often amusing, e.g. when describing a functor from the category of Monoids to the category of Sets, “you can use a smartphone as a paperweight.”

The book ends rather abruptly after the brief discussion of operads, with no conclusion. I felt it would be nice to have a summary discussion, and perhaps point out the current directions the field of applied category theory is taking.

Overall, this book has been an enlightening read. It has helped me to see the commonality behind several disparate mathematical structures, and has encouraged me to look for such commonality and relationships in my own future work. Recommended!

# Machine Learning at FPT 2019

Next week, the IEEE International Conference on Field-Programmable Technology (FPT) will take place in Tianjin in China. I’m proud that my former PhD student Qiang Liu will be General Chair of the conference.

I am a coauthor of two papers to be presented at FPT, one led by my former BEng student Aaron Zhao, now a PhD student at Cambridge supervised by my colleague Rob Mullins, and one led by my former postdoc, Ameer Abdelhadi, now with COHESA / UofT. The paper by Aaron is also in collaboration with two of my former PhD students, Xitong Gao, now with the Chinese Academy of Sciences, and Junyi Liu, now with Microsoft Research.

The first paper, led by Aaron, is entitled ‘Automatic Generation of Multi-precision Multi-arithmetic CNN Accelerators for FPGAs’, and can be found on arXiv here. This paper is a serious look at getting an automated CNN flow for FPGAs that makes good use of some of the arithmetic flexibility available on these devices. Powers-of-two (“free” multiplication) and fixed-point (“cheap” multiplication) are both leveraged.

The second paper, led by Ameer, looks at the computation of a set of approximate nearest neighbours. This is useful in a number of machine learning settings, both directly as a non-neural deep learning inference algorithm and indirectly within sophisticated deep learning algorithms like Neural Turing Machines. Ameer has shown that this task can be successfully accelerated in an FPGA design, and explores some interesting ways to parameterise the algorithm to make the most of the hardware, leading to tradeoffs between algorithm accuracy and performance.

If you’re at FPT this year, please go and say hello to Aaron, Ameer and Qiang.