Mapper – A discrete generalization of the Reeb graph

This is the third of a series of posts on cluster-algorithms and ideas in data analysis.

Mapper is a construction that uses a given cluster-algorithm to associate a simplicial complex to a reference map on a given data set. It was introduced by Carlsson–Mémoli–Singh in [1] and lies at the core of the of the topological data analysis startup Ayasdi. A good reference, I personally enjoyed reading, is [2]. Mapper is closely related to the Reeb graph of a real-valued function on a topological space. Just as the Reeb graph it is able to capture certain topological and geometrical features of the underlying space.

The nerve of a cover

Let $X$ be a topological space. We associate to each open cover $\mathcal{U}=\{U_i\}_I$ a simplicial complex $\check N(\mathcal{U})$ called the nerve of $\mathcal{U}$ defined as follows: there is a vertex $i$ for each $U_i$, and a $k$-simplex spanned by $i_0,…,i_k$ whenever the intersection $U_{i_0} \cap \ldots \cap U_{i_k}$ of the corresponding sets is nonempty.

Reference maps, filters or lenses

Let $f:X \to Y$ be a continuous map between topological spaces $X$ and $Y$. Any open cover $\mathcal{V}$ of $Y$ induces an open cover $f^*\mathcal{V}$ of $X$ obtained by the pullback under $f$, i.e. \[ f^*\mathcal{V} := \big\{ f^{-1}(V) \ | \ {V \in \mathcal{V}} \big\}. \] For example set $Y = \mathbb{R}$ and think of $f$ as a height function on $X$. In the context of the Mapper construction these maps are called filters or lenses.

Cluster functions

For a given space $X$ let $\mathcal{P}(X)$ denote its power set. Then we call an assignment \[ X \leadsto \pi(X)\subset \mathcal{P}(X), \] that associates to a space $X$ a family $\pi(X)$ of subsets of $X$, a cluster function or cluster algorithm — note that this is not really a function in the mathematical sense, but rather in the sense of computer science and programming. We refer to an element in $\pi(X)$ as a cluster. We don’t require $\pi(X)$ to satisfy any properties at that point. From a programming perspective think of it as the signature of a function implementing a certain cluster algorithm.

Given a family of open sets $\mathcal{U}=\{ U_i \}_I$ we define another family $\pi_*(\mathcal{U})$ of open sets by applying $\pi$ to each set $U_i$ and collecting the resulting clusters, i.e. we define \[ \pi_*(\mathcal{U}) := \bigcup_{i \in I} \pi(U_i). \] Finally we have everything in place to define the Mapper-construction.

Put it all together

Let $X$ be a topological space (the data set), $\pi$ be a cluster algorithm, $f:X \to Y$ a reference map to a topological space $Y$, and $\mathcal{V}$ an open cover of $Y$. Then the result of Mapper applied to this triple is the simplicial complex $\mathcal{M}(\pi, f, \mathcal{V})$ defined by \[ \mathcal{M}(\pi, f, \mathcal{V}) := \check{N} \big( \pi_*(f^*\mathcal{V}) \big). \]

References

[1] G. Carlsson, F. Mémoli and G. Singh, Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition, Eurographics Symposium on Point Based Graphics (2007), pp. 91–100.

[2] G. Carlsson, Topology and data, Bull. Amer. Math. Soc. 46 (2009), pp. 255–308.

Google+LinkedInTwitterFacebookShare