Scalable High Performance Multidimensional Scaling

Seung-Hee Bae
Seminar

We live in the so-called data deluge era: huge amounts of data are generated in many domains of modern society, driven by advances in technology and social networks. Dimension reduction is a useful tool for visualizing such high-dimensional or abstract data, making analysis of large-scale scientific data feasible. Among the known dimension reduction algorithms, multidimensional scaling (MDS) is investigated in my research because of its theoretical robustness and broad applicability. Large-scale MDS must overcome two main challenges. First, it demands huge amounts of computation and memory, since it requires O(N²) memory and computation for N data points. Second, MDS is a non-linear optimization problem, so an EM-like hill-climbing approach is easily trapped in local optima.
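As a concrete formulation (a standard one, not spelled out above), MDS is typically posed as minimizing the STRESS objective over the target-space configuration X = {x_1, ..., x_N}:

```latex
\sigma(X) \;=\; \sum_{i<j} w_{ij}\,\bigl(d_{ij}(X) - \delta_{ij}\bigr)^{2},
\qquad d_{ij}(X) = \lVert x_i - x_j \rVert,
```

where δ_ij is the given pairwise dissimilarity and w_ij a non-negative weight. The sum ranges over all N(N-1)/2 pairs, which is the source of the O(N²) computation and memory cost mentioned above.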

To tackle the two challenges mentioned above, we apply three methodologies to multidimensional scaling: i) parallelization, ii) interpolation, and iii) deterministic annealing (DA) optimization. Parallelization supplies the required computation and memory by utilizing large-scale distributed-memory systems, such as multicore clusters. In addition, we investigate an interpolation method that uses the known mappings of a subset of the given data, called the in-sample data, to generate mappings for the remaining out-of-sample points; this approach dramatically reduces the computational complexity and memory requirement. Finally, the DA optimization method is applied to the MDS problem in order to avoid local optima. Experimental results illustrate that the proposed methodologies effectively scale up the mapping capacity of MDS and improve mapping quality by avoiding local optima.
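To illustrate the interpolation idea, the sketch below (an assumption-laden simplification, not the exact algorithm of this work) places a single out-of-sample point using a SMACOF-style majorization update against the fixed mappings of its k nearest in-sample neighbors. The function name `interpolate_point` and all parameters are hypothetical.

```python
import numpy as np

def interpolate_point(delta, P, iters=100, eps=1e-12):
    """Map one out-of-sample point into the target space while holding
    the in-sample mappings P (k x d array) fixed.

    delta : (k,) original dissimilarities from the new point to its
            k in-sample neighbors.
    Iterates a majorization update for the single point, so the cost is
    O(k) per point instead of O(N^2) for a full MDS run."""
    x = P.mean(axis=0)                    # start from the neighbors' centroid
    for _ in range(iters):
        diff = x - P                      # (k, d) vectors toward x
        dist = np.linalg.norm(diff, axis=1)
        dist = np.maximum(dist, eps)      # guard against division by zero
        # Pull x toward each neighbor's target distance and average.
        x = (P + (delta / dist)[:, None] * diff).mean(axis=0)
    return x
```

Because each out-of-sample point depends only on its k in-sample neighbors, the points can be mapped independently, which is what makes the approach both cheap and trivially parallel.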