A new paradigm for probabilistic neuromorphic programming

P. Michael Furlong1* and Chris Eliasmith1

1Centre for Theoretical Neuroscience, University of Waterloo, 200 University Ave., Waterloo, N2L 3G1, Ontario, Canada.

 

*Corresponding author(s). E-mail(s): michael.furlong@uwaterloo.ca;

Contributing authors: celiasmith@uwaterloo.ca;

 

Keywords: probability, Bayesian modelling, vector symbolic architecture, fractional binding, spatial semantic pointers

 

1     Introduction

Since it was first introduced, neuromorphic hardware has held the promise of capturing some of the efficiency of biological neural computation, which is three to six orders of magnitude more efficient than engineered solutions. To attempt to capture this efficiency, one key feature of biological neural computation that has been replicated in current neuromorphic hardware is its event-based nature. Neural ‘spikes’ are used to transmit information both in neuromorphic hardware and in many neurobiological systems to minimize the energy cost of long-distance communication.

However, the use of such spiking neural networks (SNNs) brings with it challenges for programming the hardware. While there are a variety of techniques for tackling this challenge (Eliasmith and Anderson, 2003; Sussillo and Abbott, 2009; Denève and Machens, 2016), none have shown convincingly how the broad and powerful class of probabilistic algorithms can be efficiently realized in a spiking neural network. In our recent work, we have been exploring the connection between hyperdimensional computing (HDC), also known as Vector Symbolic Architectures/Algebras (VSAs), and probabilistic computation in order to solve this problem. When we couple this new approach with our past work on systematically building spiking neural networks using the Neural Engineering Framework (NEF; Eliasmith and Anderson, 2003), a new paradigm for building spiking, probabilistic neural networks arises.

In this brief note, we outline these new methods and describe how they relate HDC to various probabilistic operations. Our particular algebra is based on a VSA known as holographic reduced representations (HRRs; Plate, 1994) but focuses on using it in a continuous manner, giving rise to what are called Spatial Semantic Pointers (SSPs; Komer, 2020; Dumont and Eliasmith, 2020). A more in-depth version of this work can be found in (Furlong and Eliasmith, 2022).

 

2     A Method for Probabilistic Programming

2.1    Preliminaries

Kernel Density Estimators (KDEs) estimate the probability density at a query point x from the average of its similarity to the members of a dataset of n observations, D = {x1, . . . , xn}. Similarity is measured using kernel functions, k(·, ·), which are typically valid density functions. KDEs are defined

$$f_X(x) = \frac{1}{n}\sum_{i=1}^{n} k_h(x, x_i)$$

for kernel bandwidth $h \in \mathbb{R}^+$.
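For concreteness, here is a minimal KDE sketch in NumPy; the Gaussian kernel, the synthetic dataset, and the parameter values are our own illustrative choices, not prescribed by the text above.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian density, one common choice for the kernel k."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(x, data, h):
    """f_X(x) = (1/n) * sum_i k_h(x, x_i), with k_h(x, x_i) = k((x - x_i)/h) / h."""
    return np.mean(gaussian_kernel((x - data) / h)) / h

rng = np.random.default_rng(0)
D = rng.normal(loc=0.0, scale=1.0, size=500)   # illustrative 1-D dataset, n = 500
print(kde(0.0, D, h=0.3))                      # density estimate at the query x = 0
```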

A problem with KDEs is the memory required to maintain the dataset, D, which can grow without bound, as does the time to compute a query. Rahimi et al. (2007) addressed this problem for KDEs and other kernel machines with the introduction of Random Fourier Features (RFFs). RFFs project data into vectors so that the dot product between two vectors approximates a kernel function, i.e., k(x, y) ≈ ϕ(x) · ϕ(y). The data projection is computed $\phi(x) = \left(e^{i\omega_1 x}, \ldots, e^{i\omega_d x}\right)^{T}$, where the frequency components ωi are i.i.d. samples from some probability distribution G(ω). The choice of G(ω) determines the kernel induced by the dot product.

With RFFs, linear methods can approximate nonlinear kernel methods. When a kernel can be approximated with RFFs of dimensionality d < n, the memory and time complexity of querying a KDE improves from linear in the number of samples (n) to linear in the dimensionality of the feature representation (d).
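A minimal sketch of the RFF idea, using the common real-valued (cosine) form of the feature map, which is equivalent to the complex exponential form above, and a Gaussian kernel; the dimensions and data are illustrative choices.

```python
import numpy as np

def rff(x, omegas, biases):
    """Real-valued random Fourier features: phi(x) . phi(y) ~= exp(-||x - y||^2 / 2)
    when the frequencies are drawn from G(omega) = N(0, I)."""
    d = len(biases)
    return np.sqrt(2.0 / d) * np.cos(omegas @ x + biases)

rng = np.random.default_rng(0)
m, d = 2, 2000                                   # input dimension, feature dimension
omegas = rng.normal(size=(d, m))                 # i.i.d. frequency samples from G(omega)
biases = rng.uniform(0.0, 2.0 * np.pi, size=d)   # random phase offsets

x, y = rng.normal(size=m), rng.normal(size=m)
approx = rff(x, omegas, biases) @ rff(y, omegas, biases)   # linear (dot-product) form
exact = np.exp(-0.5 * np.sum((x - y) ** 2))                # the kernel it approximates
print(approx, exact)                                       # should agree closely
```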

2.2    Probabilistic Programming with HDC

Here we describe the particular algebra we use and its mapping to probabilistic representations and computations. We use three operators, bundling, binding, and unbinding, to construct latent probabilistic representations, and a fourth, similarity, to realize quasi-probabilities, which can later be converted to exact probabilities. These are the same operators as used for the HRR VSA (Plate, 1994), but here with a mapping to continuous representations and an implementation in spiking neurons. Manipulating VSA-represented data using these operators constructs and manipulates representations that induce kernels for structured data, generalizing the method of RFFs.

2.2.1   Operators

Binding, ⊛, implemented with circular convolution, is at the core of our approach. In VSAs, binding is used to combine two symbols or state representations to produce slot-filler pairs, e.g., combining a sensing modality type with a sensor value, or an edge in a graph with the edge’s traversal cost. We employ an extension of binding, called fractional binding (Komer, 2020; Dumont and Eliasmith, 2020; Plate, 1992), to map data from a continuous domain, X ⊆ Rm, into a high-dimensional vector representation (eq. (1)).

$$\phi_{X}(x/h) = \mathcal{F}^{-1}\left\{ e^{\,i\,\Theta_{X}\, x/h} \right\} \tag{1}$$

where x ∈ X and h is a length-scale parameter, as in kernel density estimation, and ΘX are the frequency components of what we refer to as the “axis vector”, which can be selected in a number of different ways. ΘX and h define a “type” representation in the high-dimensional space for the low-dimensional domain.

Similarity between two VSA-encoded objects is computed with the vector dot product, ·. For SSPs, similarity has a strict mathematical meaning through the connection to RFFs: the dot product between two SSPs approximates a kernel function, like those used in kernel density estimation (Voelker, 2020; Frady et al., 2021). Importantly for probabilistic modelling, depending on the selection of ΘX, the dot product between SSPs will induce different kernel (or similarity) functions on the encoded data, expressed:

$$k(x, x') \approx \phi_{X}(x/h) \cdot \phi_{X}(x'/h). \tag{2}$$
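As an illustrative NumPy sketch of eqs. (1) and (2): we draw the phases ΘX uniformly, which is one possible choice and, as discussed below, makes the induced kernel approximately a normalized sinc; the dimensionality and query values are arbitrary.

```python
import numpy as np

def make_axis_phases(d, rng):
    """Phase angles Theta_X of the 'axis vector'. Drawn uniformly (one possible
    choice) and made conjugate-symmetric so the encoding below is real-valued."""
    p = rng.uniform(-np.pi, np.pi, size=d)
    p[0] = 0.0
    if d % 2 == 0:
        p[d // 2] = 0.0
    k = np.arange(1, (d + 1) // 2)
    p[d - k] = -p[k]
    return p

def encode_ssp(x, phases, h):
    """Fractional binding, eq. (1): phi_X(x/h) = F^{-1}{ exp(i * Theta_X * x / h) }."""
    return np.fft.ifft(np.exp(1j * phases * x / h)).real

rng = np.random.default_rng(0)
d, h = 1024, 0.5
theta_x = make_axis_phases(d, rng)

# Eq. (2): with uniform phases the dot product approximates a normalized sinc
# kernel in the scaled displacement (x - x')/h.
x, x_prime = 0.3, 0.5
approx = encode_ssp(x, theta_x, h) @ encode_ssp(x_prime, theta_x, h)
print(approx, np.sinc((x - x_prime) / h))   # the two values should be close
```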

Bundling is used in the VSA literature to construct vectors that represent sets of objects, where the similarity between a vector and a bundle gives a measure of membership in the set. We use bundles of fractionally-bound objects to represent a distribution, and when we compute the similarity between a query point encoded as an SSP and a bundle of SSPs, we get a quantity that approximates the probability of the query point. In math:

$$\hat{f}(x \mid D) \approx \phi_{X}(x/h) \cdot \frac{1}{nh} \sum_{x_i \in D} \phi_{X}(x_i/h) \tag{3}$$

where we can replace the normalized sum $\frac{1}{nh}\sum_{x_i \in D} \phi_{X}(x_i/h)$ with a memory vector that represents the dataset, MX,n. This memory can be updated online, allowing for changes in the distribution that reflect the experience of an agent. Depending on the choice of ΘX, similarity can take on negative values. Indeed, in the method used by Furlong and Eliasmith (2022), the dot product between SSPs approximates the quasi-kernel sinc function (Voelker, 2020). Consequently, fˆ(x | D) is not a probability distribution, but rather a special case, the Fourier Integral Estimator (FIE; Davis, 1975, 1977). However, probabilities can be recovered from FIEs through the correction developed by Glad et al. (2003):

$$f_X(x) \approx \max\left\{0,\; \phi_{X}(x/h) \cdot M_{X,n} - \xi\right\}. \tag{4}$$
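A sketch of eqs. (3) and (4) under the same uniform-phase encoding as in the earlier sketch; the dataset and the stand-in value for the correction constant ξ are illustrative only, not the procedure prescribed by Glad et al. (2003).

```python
import numpy as np

# Same helpers as in the sketch after eq. (2).
def make_axis_phases(d, rng):
    p = rng.uniform(-np.pi, np.pi, size=d)
    p[0] = 0.0
    if d % 2 == 0:
        p[d // 2] = 0.0
    k = np.arange(1, (d + 1) // 2)
    p[d - k] = -p[k]
    return p

def encode_ssp(x, phases, h):
    return np.fft.ifft(np.exp(1j * phases * x / h)).real

rng = np.random.default_rng(1)
d, h = 1024, 0.2
theta_x = make_axis_phases(d, rng)

# Eq. (3): bundle (sum) the encodings of the observations into one memory vector.
D = rng.normal(0.0, 1.0, size=2000)                            # illustrative 1-D dataset
M = sum(encode_ssp(xi, theta_x, h) for xi in D) / (len(D) * h)

# Eq. (4): clip the possibly-negative quasi-probability. xi_const stands in for
# the correction constant of Glad et al. (2003); its value here is illustrative.
xi_const = 0.0
for q in (0.0, 1.0, 3.0):
    print(q, max(0.0, encode_ssp(q, theta_x, h) @ M - xi_const))
# The readout should be largest near the data's mode (q = 0) and near zero in the tails.
```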

The conversion in eq. (4) unveils a connection between the VSA encoding we use and how individual ReLU neurons can be used to model probability.

Unbinding is the inverse of binding, and is implemented by binding with the pseudoinverse of the argument: $\phi_{X}(x) \circledast \phi_{Y}(y) \circledast \phi_{Y}^{-1}(y) \approx \phi_{X}(x)$. In cognitive modelling, unbinding can be used to select from bundles the subset of elements that match the querying vector. We have found that unbinding can be used to condition memory vectors, $M_{XY} = \sum_{(x_i, y_i) \in D} \phi_{X}(x_i) \circledast \phi_{Y}(y_i)$, selecting only those elements where yi ≈ y, that is:

$$\hat{f}(x \mid y, D) \approx \phi_{X}(x) \cdot \left(M_{XY} \circledast \phi_{Y}^{-1}(y)\right). \tag{5}$$
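A sketch of eq. (5): the joint memory is built by binding paired encodings with circular convolution, and unbinding with the approximate inverse of the observed y’s encoding conditions the estimate. The correlated data, dimensions, and parameter values are again our own illustrative choices.

```python
import numpy as np

# Same encoding helpers as in the earlier sketches.
def make_axis_phases(d, rng):
    p = rng.uniform(-np.pi, np.pi, size=d)
    p[0] = 0.0
    if d % 2 == 0:
        p[d // 2] = 0.0
    k = np.arange(1, (d + 1) // 2)
    p[d - k] = -p[k]
    return p

def encode_ssp(x, phases, h):
    return np.fft.ifft(np.exp(1j * phases * x / h)).real

def cconv(a, b):
    """Binding: circular convolution, computed with the FFT."""
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real

def inv(a):
    """Approximate inverse used for unbinding: conjugation in the Fourier domain."""
    return np.fft.ifft(np.conj(np.fft.fft(a))).real

rng = np.random.default_rng(2)
d, h = 2048, 0.3
theta_x, theta_y = make_axis_phases(d, rng), make_axis_phases(d, rng)

# Joint memory M_XY: a bundle of bound (x_i, y_i) pairs; x and y are correlated here.
xs = rng.normal(0.0, 1.0, size=1000)
ys = xs + rng.normal(0.0, 0.1, size=1000)
M_xy = sum(cconv(encode_ssp(x, theta_x, h), encode_ssp(y, theta_y, h))
           for x, y in zip(xs, ys)) / (len(xs) * h)

# Eq. (5): unbind the observed y, then query x. Because x and y are correlated,
# the readout is largest for queries near y_obs.
y_obs = 0.5
conditioned = cconv(M_xy, inv(encode_ssp(y_obs, theta_y, h)))
for q in (-1.0, 0.5, 2.0):
    print(q, encode_ssp(q, theta_x, h) @ conditioned)
```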

In addition to the above operations, we have also been able to exploit the sparsity of these representations to construct compositional kernels, as binding multiplies kernels and bundling adds them. Exploiting sparsity is a general feature of HDC. Representations of more structured data, like trajectories, graphs, or trees, constructed in VSAs consequently define kernels for those data objects, making this tool for designing neural networks a general probabilistic programming framework.
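To illustrate the compositional claim, a short check that binding two encodings induces the product of their component kernels, while bundling induces (approximately) their sum; the axis names and values are our own, and h = 1 for brevity.

```python
import numpy as np

# Same helpers as in the earlier sketches.
def make_axis_phases(d, rng):
    p = rng.uniform(-np.pi, np.pi, size=d)
    p[0] = 0.0
    if d % 2 == 0:
        p[d // 2] = 0.0
    k = np.arange(1, (d + 1) // 2)
    p[d - k] = -p[k]
    return p

def encode_ssp(x, phases, h=1.0):
    return np.fft.ifft(np.exp(1j * phases * x / h)).real

def cconv(a, b):
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real

rng = np.random.default_rng(3)
d = 4096
theta_a, theta_b = make_axis_phases(d, rng), make_axis_phases(d, rng)
a1, a2, b1, b2 = 0.2, 1.1, -0.4, 0.3     # two points on each of two axes

# Binding multiplies the component kernels ...
bound = cconv(encode_ssp(a1, theta_a), encode_ssp(b1, theta_b)) @ \
        cconv(encode_ssp(a2, theta_a), encode_ssp(b2, theta_b))
print(bound, np.sinc(a1 - a2) * np.sinc(b1 - b2))

# ... while bundling (approximately) adds them; cross terms vanish for large d.
bundled = (encode_ssp(a1, theta_a) + encode_ssp(b1, theta_b)) @ \
          (encode_ssp(a2, theta_a) + encode_ssp(b2, theta_b))
print(bundled, np.sinc(a1 - a2) + np.sinc(b1 - b2))
```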

 

2.2.2   Spiking implementation

While we will not describe the NEF method of implementing these operations in spiking neurons in detail here, this has been done elsewhere at length (Eliasmith and Anderson, 2003; Dumont and Eliasmith, 2020; Furlong and Eliasmith, 2022; Eliasmith, 2013). As well, there is a widely used neural simulator, Nengo (https://nengo.ai), that incorporates the NEF directly. To summarize, the method allows any nonlinear dynamical system defined over a vector space of any dimensionality to be embedded in a spiking neural network, to a degree of precision determined by the neural resources available. These techniques apply naturally to the operators described above.
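As a minimal, hedged sketch of what such an embedding might look like in Nengo: a spiking ensemble represents a stand-in memory vector, and the dot product with a fixed query vector is read out through a linear transform on a connection. The vectors, neuron counts, and synapse value are illustrative assumptions, not the settings of any published model.

```python
import numpy as np
import nengo

d = 64                                   # vector dimensionality (illustrative)
rng = np.random.default_rng(0)
memory = rng.normal(size=d)              # stand-in memory vector M
memory /= np.linalg.norm(memory)
query = rng.normal(size=d)               # stand-in query encoding phi(x/h)
query /= np.linalg.norm(query)

with nengo.Network(seed=0) as model:
    stim = nengo.Node(memory)                             # drive the network with M
    mem = nengo.Ensemble(n_neurons=40 * d, dimensions=d)  # spiking representation of M
    out = nengo.Ensemble(n_neurons=200, dimensions=1)     # holds the (quasi-)probability
    nengo.Connection(stim, mem)
    # A dot product with a fixed query vector is a linear map: a 1 x d transform.
    nengo.Connection(mem, out, transform=query[np.newaxis, :])
    probe = nengo.Probe(out, synapse=0.02)

with nengo.Simulator(model) as sim:
    sim.run(0.5)

print(sim.data[probe][-1], query @ memory)   # spiking readout vs. exact dot product
```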

 

3     Discussion and Conclusion

The above connection of HDC and VSAs to probability models shows that we can understand them as a probabilistic model of computation that unifies analytical models with neural networks, expanding on other approaches to modelling probability in VSAs (e.g., Frady et al., 2021; Joshi et al., 2017). VSA representations are differentiable, allowing for integration with standard machine learning methods. More importantly, descriptions of data under VSAs give practitioners, for free, kernel functions that can be used in probabilistic models using simple linear operations.

While we have only briefly outlined the approach here, we and others have successfully used it to demonstrate that, for example, Bayesian optimization can be performed much more efficiently than with standard methods while preserving, and generally improving, the accuracy of the inference on standard benchmark functions (Furlong et al., 2022). The methods have also been used for optimal path planning for drones (unpublished), for unifying biological models of simultaneous localization and mapping with a probabilistic framing of the problem (Dumont et al., under review), and for coordinating paths across multiple actors (Furlong et al., 2023). This latter example demonstrates that the technique can combine continuous and discrete probability spaces. While we have focused on the continuous case because it is newer, the methods we have used are well-established for representing symbol-like structures in spiking neurons and performing inference over them (Eliasmith, 2013; Eliasmith et al., 2012).

While we have not discussed specific neuromorphic implementations, NEF models have a long history of being implemented on a variety of neuromorphic hardware (Bekolay et al., 2014), and Nengo has been used with a number of hardware backends, including Loihi (DeWolf et al., 2020) and SpiNNaker (Davies et al., 2013). As a result, all of the critical elements for compiling from a probabilistic program to neuromorphic hardware executables are in place. We look forward to exploring a wide variety of applications using these methods, allowing us to reap the benefits of low-power neuromorphic hardware while realizing the effectiveness of probabilistic computation.

 

References

Eliasmith, C. and Anderson, C.H.: Neural engineering: Computation, representation, and dynamics in neurobiological systems: MIT Press (2003)

Sussillo, D. and Abbott, L.F.: Generating coherent patterns of activity from chaotic neural networks.: Neuron 63(4), 544–57 (2009), URL http://dx.doi.org/10.1016/j.neuron.2009.07.018

Denève, S. and Machens, C.K.: Efficient codes and balanced networks: Nature Neuroscience 19(3), 375–382 (2016), URL http://dx.doi.org/10.1038/nn.4243

Plate, T.A.: Distributed representations and nested compositional structure: University of Toronto, Department of Computer Science (1994)

Komer, B.: Biologically Inspired Spatial Representation: Ph.D. thesis, Univer- sity of Waterloo (2020)

Dumont, N. and Eliasmith, C.: Accurate representation for spatial cognition using grid cells.: In: CogSci (2020)

Furlong, P.M. and Eliasmith, C.: Fractional binding in vector symbolic architectures as quasi-probability statements: In: Proceedings of the Annual Meeting of the Cognitive Science Society, volume 44 (2022)

Rahimi, A., Recht, B. et al.: Random features for large-scale kernel machines.: In: NIPS, volume 3, 5, Citeseer (2007)

Plate, T.A.: Holographic recurrent networks: Advances in neural information processing systems 5 (1992)

Voelker, A.R.: A short letter on the dot product between rotated fourier transforms: arXiv preprint arXiv:2007.13462 (2020)

Frady, E.P., Kleyko, D., Kymn, C.J., Olshausen, B.A. and Sommer, F.T.: Computing on functions using randomized vector representations: arXiv preprint arXiv:2109.03429 (2021)

Davis, K.B.: Mean square error properties of density estimates: The Annals of Statistics 1025–1030 (1975)

Davis, K.B.: Mean integrated square error properties of density estimates: The Annals of Statistics 530–535 (1977)

Glad, I.K., Hjort, N.L. and Ushakov, N.G.: Correction of density estimators that are not densities: Scandinavian Journal of Statistics 30(2), 415–427 (2003)

Eliasmith, C.: How to build a brain: A neural architecture for biological cognition: Oxford University Press (2013)

Joshi, A., Halseth, J.T. and Kanerva, P.: Language geometry using random indexing: In: Quantum Interaction: 10th International Conference, QI 2016, San Francisco, CA, USA, July 20-22, 2016, Revised Selected Papers 10, 265–274, Springer (2017)

Furlong, P.M., Stewart, T.C. and Eliasmith, C.: Fractional binding in vector symbolic representations for efficient mutual information exploration: In: Proc. ICRA Workshop, Towards Curious Robots, Mod. Approaches Intrinsically-Motivated Intell. Behav., 1–5 (2022)

Dumont, N.S.Y., Furlong, P.M., Orchard, J. and Eliasmith, C.: Exploiting semantic information in a spiking neural SLAM system: Frontiers in Neuromorphic Engineering (under review)

Furlong, P.M., Dumont, N.S.Y., Antonova, R., Orchard, J. and Eliasmith, C.: Efficient exploration using hyperdimensional bayesian optimization on trajectory spaces (2023), in submission

Eliasmith, C., Stewart, T.C., Choo, X., Bekolay, T., DeWolf, T., Tang, Y. and Rasmussen, D.: A large-scale model of the functioning brain: Science 338(6111), 1202–1205 (2012)

Bekolay, T., Bergstra, J., Hunsberger, E., DeWolf, T., Stewart, T., Rasmussen, D., Choo, X., Voelker, A. and Eliasmith, C.: Nengo: A Python tool for building large-scale functional brain models: Frontiers in Neuroinformatics 7(JAN) (2014), URL http://dx.doi.org/10.3389/fninf.2013.00048

DeWolf, T., Jaworski, P. and Eliasmith, C.: Nengo and Low-Power AI Hardware for Robust, Embedded Neurorobotics: Frontiers in Neurorobotics 14 (2020), URL http://dx.doi.org/10.3389/fnbot.2020.568359

Davies, S., Stewart, T., Eliasmith, C. and Furber, S.: Spike-based learning of transfer functions with the SpiNNaker neuromimetic simulator: In: Proceedings of the International Joint Conference on Neural Networks (2013), ISBN 9781467361293, URL http://dx.doi.org/10.1109/IJCNN.2013.6706962