# Mapper – A discrete generalization of the Reeb graph

This is the third of a series of posts on cluster-algorithms and ideas in data analysis.

Mapper is a construction that uses a given cluster-algorithm to associate a simplicial complex to a reference map on a given data set. It was introduced by Carlsson–Mémoli–Singh in  and lies at the core of the of the topological data analysis startup Ayasdi. A good reference, I personally enjoyed reading, is . Mapper is closely related to the Reeb graph of a real-valued function on a topological space. Just as the Reeb graph it is able to capture certain topological and geometrical features of the underlying space.

### The nerve of a cover

Let $X$ be a topological space. We associate to each open cover $\mathcal{U}=\{U_i\}_I$ a simplicial complex $\check N(\mathcal{U})$ called the nerve of $\mathcal{U}$ defined as follows: there is a vertex $i$ for each $U_i$, and a $k$-simplex spanned by $i_0,…,i_k$ whenever the intersection $U_{i_0} \cap \ldots \cap U_{i_k}$ of the corresponding sets is nonempty.

### Reference maps, filters or lenses

Let $f:X \to Y$ be a continuous map between topological spaces $X$ and $Y$. Any open cover $\mathcal{V}$ of $Y$ induces an open cover $f^*\mathcal{V}$ of $X$ obtained by the pullback under $f$, i.e. $f^*\mathcal{V} := \big\{ f^{-1}(V) \ | \ {V \in \mathcal{V}} \big\}.$ For example set $Y = \mathbb{R}$ and think of $f$ as a height function on $X$. In the context of the Mapper construction these maps are called filters or lenses.

### Cluster functions

For a given space $X$ let $\mathcal{P}(X)$ denote its power set. Then we call an assignment $X \leadsto \pi(X)\subset \mathcal{P}(X),$ that associates to a space $X$ a family $\pi(X)$ of subsets of $X$, a cluster function or cluster algorithm — note that this is not really a function in the mathematical sense, but rather in the sense of computer science and programming. We refer to an element in $\pi(X)$ as a cluster. We don’t require $\pi(X)$ to satisfy any properties at that point. From a programming perspective think of it as the signature of a function implementing a certain cluster algorithm.

Given a family of open sets $\mathcal{U}=\{ U_i \}_I$ we define another family $\pi_*(\mathcal{U})$ of open sets by applying $\pi$ to each set $U_i$ and collecting the resulting clusters, i.e. we define $\pi_*(\mathcal{U}) := \bigcup_{i \in I} \pi(U_i).$ Finally we have everything in place to define the Mapper-construction.

### Put it all together

Let $X$ be a topological space (the data set), $\pi$ be a cluster algorithm, $f:X \to Y$ a reference map to a topological space $Y$, and $\mathcal{V}$ an open cover of $Y$. Then the result of Mapper applied to this triple is the simplicial complex $\mathcal{M}(\pi, f, \mathcal{V})$ defined by $\mathcal{M}(\pi, f, \mathcal{V}) := \check{N} \big( \pi_*(f^*\mathcal{V}) \big).$

### References

 G. Carlsson, F. Mémoli and G. Singh, Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition, Eurographics Symposium on Point Based Graphics (2007), pp. 91–100.

 G. Carlsson, Topology and data, Bull. Amer. Math. Soc. 46 (2009), pp. 255–308.