Monday, December 9, 2013

On FastMap: A Technique for Dimensionality Reduction

FastMap is a technique for dimensionality reduction.  Consider a dataset consisting of n rows, where each row is a vector of p attributes; the dimensionality of that dataset is p.  The motivations behind dimensionality reduction are many.  Consider, for instance, the visualization of such a dataset.  Each row can be viewed as a point in p-dimensional space, but if p is larger than 3, the dataset becomes unwieldy to visualize, and even then a picture may not explain much on its own.  Dimensionality reduction is also practical from a computational standpoint, since many analyses of a dataset run faster in fewer dimensions.

FastMap constructs a sequence of k axes (k <= p) and projects the points from the underlying true space (p-dimensional space) onto them, producing a k-dimensional representation.  It was first introduced by Faloutsos and Lin in 1995 [1] as a fast alternative to multidimensional scaling [4], and can be seen as a generalization of Principal Components Analysis (PCA) [2].

PCA was introduced in 1901 by Pearson [3] as an analog of the principal axis theorem, and later formalized by Hotelling [2].  PCA can be computed either by an eigenvalue decomposition of the covariance matrix of the data, or by a singular value decomposition (SVD) of the (centered) data matrix itself.  The result of PCA is essentially an ordered list of p orthogonal axes: the first principal component (the first axis in the list) has the largest variance along it, accounting for most of the variance in the dataset, and each subsequent component is orthogonal to the previous ones and accounts for the most remaining variance.
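As a minimal sketch of the SVD route, assuming Python with numpy (the function name and arguments here are illustrative):

import numpy as np

def pca(X, k):
    # Center the data so each attribute has zero mean.
    Xc = X - X.mean(axis=0)
    # SVD of the centered data matrix; the rows of Vt are the
    # principal axes, already ordered by decreasing variance.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Project the points onto the first k principal components.
    return Xc @ Vt[:k].T

The same axes could be obtained from np.linalg.eigh applied to the covariance matrix of the data; the SVD route is generally preferred for numerical stability.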

FastMap employs a procedure similar in spirit to PCA.  The theoretical perspective is that each component is defined by a pair of "pivots": the two most extreme points along that axis (remember, components are axes).  Given all such components, each orthogonal to the previous ones, these pivots turn out to be vertices of the convex hull surrounding the dataset [5].

The FastMap procedure is as follows.  Given a dataset S and the Euclidean distances between each pair of members of S, first pick an arbitrary member of S and find the member furthest away from it.  This is the first of the two pivots; call it East.  The second pivot, called West, is the member furthest away from East.  With these two pivots selected, the projected coordinate of each member (onto the 1-dimensional East-West line) is computed as follows.

x = the member being projected
c = distance from East to West
a = distance from East to x
b = distance from x to West

projection(x) = (a^2 + c^2  - b^2) / (2c).
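As a quick sanity check with hypothetical points in the plane: let East = (0, 0), West = (4, 0), and x = (1, 2).  Then c = 4, a = sqrt(5), b = sqrt(13), so projection(x) = (5 + 16 - 13) / 8 = 1, which is exactly the first coordinate of x, as expected, since here the East-West axis is the horizontal axis itself.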

This projection is based on the law of cosines.  The process can be repeated k times to obtain k dimensions, with each pass appending one coordinate to every member's projection.  Each subsequent pass works implicitly in the residual space: once an axis has been extracted, the squared distance between every pair of members is reduced by the squared difference of their projections onto that axis, d'(x,y)^2 = d(x,y)^2 - (projection(x) - projection(y))^2, so the next pivots are found in the hyperplane orthogonal to all previous axes.  Each pair of pivots adds vertices to the shape that surrounds the dataset in its true space.  In this manner, FastMap is very similar to the vertex-growing method of the Simplex Search technique, in which vertices are grown with the intent of converging the convex hull of the simplex around solutions in the search space.
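A minimal sketch of the whole procedure, assuming Python with numpy (the names fastmap and dist are illustrative, not from [1]):

import numpy as np

def fastmap(S, dist, k):
    # S: the dataset members; dist: Euclidean distance between two
    # members; k: the number of output dimensions.
    n = len(S)
    # Squared pairwise distances; updated after every pass so that
    # the next pass works in the residual space.
    d2 = np.array([[dist(S[i], S[j]) ** 2 for j in range(n)]
                   for i in range(n)])
    coords = np.zeros((n, k))
    for dim in range(k):
        # Pivots: start from an arbitrary member (index 0), take the
        # furthest member as East, then the furthest from East as West.
        east = int(np.argmax(d2[0]))
        west = int(np.argmax(d2[east]))
        c2 = d2[east, west]
        if c2 <= 0:
            break  # residual distances exhausted; stop early
        # Law-of-cosines projection of every member at once.
        x = (d2[east] + c2 - d2[west]) / (2.0 * np.sqrt(c2))
        coords[:, dim] = x
        # Residual squared distances orthogonal to the new axis,
        # clamped at zero to absorb floating-point error.
        d2 = np.maximum(d2 - (x[:, None] - x[None, :]) ** 2, 0.0)
    return coords

For numeric vectors, dist could be as simple as lambda u, v: float(np.linalg.norm(np.asarray(u) - np.asarray(v))).  Note that the procedure only ever consumes pairwise distances, never the raw coordinates, which is part of FastMap's appeal.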

The connection between FastMap and Multi-Objective Optimization is as follows.  In the Simplex Search technique, the key trick is that only the exterior vertices of the simplex are evaluated.  Likewise, for FastMap to be used within a search technique, only the pivots of the convex hull surrounding the dataset need to be evaluated.

The theory is that if only the convex-hull pivots are evaluated, good search results can be attained in very few evaluations.  This is the basis of the GALE algorithm for Multi-Objective Optimization.  Evidence for this theory comes from results showing that GALE achieves results comparable to those of state-of-the-art algorithms such as NSGA-II and SPEA2.

For the reader unfamiliar with GALE: it stands for Geometric Active Learning Evolution.  Standard search tools such as NSGA-II and SPEA2 are well known and extensively referenced in the literature, but they are "blind" in the sense that their search proceeds by random perturbations of individuals in the population.  The hope for those tools is that, eventually, these random perturbations lead to better and better solutions, but only after many evaluations.  When evaluating an individual is complex and time-consuming, this becomes a huge problem.  The advantage of GALE is that it can achieve comparable performance while greatly reducing the number of model evaluations.  In results for standard modeling problems, the speedup factor was between 20 and 89 (roughly 20-80 evaluations versus 1,600-4,000 evaluations, for population sizes of MU = 100).  In terms of MU = population size, the expected number of evaluations for NSGA-II and SPEA2 is exactly MU * GENS * 2, where GENS is the number of generational iterations of the search tool.  For GALE, this expected value is no greater than GENS * 2 * log2(MU).
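To make those formulas concrete with illustrative numbers (MU = 100 and GENS = 20 are chosen for the example, not taken from the GALE results): NSGA-II and SPEA2 would perform MU * GENS * 2 = 100 * 20 * 2 = 4,000 evaluations, while GALE's bound works out to GENS * 2 * log2(MU) = 20 * 2 * log2(100), roughly 40 * 6.64, or about 266 evaluations, and the 20-80 evaluations reported above sit well below even that bound.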



[1] C. Faloutsos and K. Lin, FastMap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets, Proc. ACM SIGMOD Conference (San Jose, CA), May 1995, pp. 163–174.

[2] Harold Hotelling, Analysis of a complex of statistical variables into principal components, J. Educ. Psych. 24 (1933), 417–441, 498–520.

[3] Karl Pearson, On lines and planes of closest fit to systems of points in space, Philosophical Magazine 2 (1901), no. 11, 559–572.

[4] W. S. Torgerson, Multidimensional scaling I: Theory and method, Psychometrika 17 (1952), 401–419.

[5] George Ostrouchov and Nagiza F. Samatova, On FastMap and the convex hull of multivariate data: Toward fast and robust dimension reduction, IEEE Trans. Pattern Anal. Mach. Intell. 27 (2005), no. 8, 1340–1343, doi:10.1109/TPAMI.2005.164.