Rethinking GNNs
Over-squashing, Bottlenecks, and Graph Ricci curvature
Over-squashing is a common plight of Graph Neural Networks, occurring when message passing fails to propagate information efficiently on the graph. In this post, we discuss how this phenomenon can be understood and remedied through the concept of Ricci curvature, borrowed from the field of differential geometry.

This post was co-authored with Francesco Di Giovanni and Jake Topping and is based on the paper J. Topping, F. Di Giovanni et al., Understanding over-squashing and bottlenecks on graphs via curvature (2022) ICLR, a collaboration between Twitter Cortex and the University of Oxford. This post is part of a series on Graph Neural Networks through the lens of Differential Geometry and Algebraic Topology. See also other posts from the series discussing Neural Diffusion PDEs, cellular sheaves, and topological message passing. For additional details, see a video by Aleksa Gordić explaining the paper and Francesco and Jake’s talk in the Graphs & Geometry reading group.
The majority of Graph Neural Network (GNN) architectures are based on the message-passing paradigm, in which information is propagated between the nodes of the graph along its edges. Traditionally, the input graph serves both as part of the input data (along with the node features) and as the computational structure for information propagation. Recent works have shown, however, that certain graphs tend to be “unfriendly” for message passing [1], challenging this paradigm.
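To make the paradigm concrete, here is a minimal sketch of a single message-passing step in plain NumPy: every node aggregates (here, simply averages) the features of its neighbours along the edges of the graph. The function name and the mean aggregation are our own illustrative choices, not a specific MPNN from the paper.

```python
import numpy as np

def message_passing_step(adj: np.ndarray, x: np.ndarray) -> np.ndarray:
    """One round of message passing: each node receives the features of its
    neighbours along the edges of the graph and averages them.

    adj : (n, n) binary adjacency matrix
    x   : (n, d) node feature matrix
    """
    deg = adj.sum(axis=1, keepdims=True)      # node degrees
    messages = adj @ x                        # sum of neighbour features
    return messages / np.maximum(deg, 1)      # mean aggregation

# Toy example: path graph 0 - 1 - 2 with a scalar feature on node 0.
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
x = np.array([[1.0], [0.0], [0.0]])
print(message_passing_step(adj, x))  # node 0's feature reaches its neighbour
```

After one such step, node 1 (adjacent to node 0) has received the information while node 2 has not; reaching nodes at distance r requires stacking r such steps, which is precisely what makes long-range dependencies hard.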
More specifically, Message Passing Neural Networks (MPNNs) [2] tend to perform poorly when the learned task requires long-range dependencies (i.e., the output of an MPNN depends on representations of distant nodes interacting with each other) and, at the same time, the structure of the graph produces exponentially many such long-range neighbours (i.e., the “receptive field” of a node grows exponentially with the radius of the neighbourhood). In such situations, messages coming from non-adjacent nodes need to be propagated and compressed into fixed-size vectors, causing a loss of information known as over-squashing.
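The following quick illustration (assuming NetworkX is available; the tree and the radius values are chosen by us purely for illustration) shows how fast the receptive field can grow: on a binary tree, the number of nodes within radius r of the root is 2^(r+1) − 1, yet an r-layer MPNN must compress all of their messages into a single fixed-size vector at the root.

```python
import networkx as nx

# Receptive-field growth on a balanced binary tree: the number of nodes
# within radius r of the root grows exponentially, so an MPNN with r layers
# has to squeeze all of their messages into one fixed-size node vector.
G = nx.balanced_tree(r=2, h=8)   # branching factor 2, depth 8
root = 0
for radius in range(1, 6):
    receptive_field = nx.ego_graph(G, root, radius=radius)
    print(f"radius {radius}: {receptive_field.number_of_nodes()} nodes")
# radius 1: 3, radius 2: 7, radius 3: 15, radius 4: 31, radius 5: 63
```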