Module 3: Introduction to K-Means Algorithm and GPairs Algorithm Using Data Parallel Essentials for Python

Praveen Kundurthy and Bob Chesebrough, Intel
Webinar Beginner
Module 3

We will use K-Means and GPairs as examples, demonstrating how to implement these algorithms with live sample code on the Intel DevCloud and/or JLSE.

K-Means is a clustering algorithm that partitions the observations in a dataset into a requested number of clusters, assigning each point to the cluster whose center of mass (centroid) is closest. Starting from an initial estimate of the centroids, the algorithm iteratively updates their positions until they reach a fixed point, that is, until the centroids stop moving. Intel® Extension for Scikit-learn* provides an optimized K-Means clustering algorithm.
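The iteration described above can be sketched in plain NumPy. This is only an illustrative reference version, not the webinar's sample code and not the Intel-optimized implementation; the function name `kmeans`, the parameters `max_iter` and `tol`, and the toy data are assumptions made for the example.

```python
import numpy as np

def kmeans(points, init_centroids, max_iter=100, tol=1e-9):
    """Illustrative K-Means loop (hypothetical helper, not a library API)."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(init_centroids, dtype=float).copy()
    labels = None
    for _ in range(max_iter):
        # Squared Euclidean distance from every point to every centroid,
        # shape (n_points, n_clusters).
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        # Assign each point to its nearest centroid.
        labels = d2.argmin(axis=1)
        # Move each centroid to the center of mass of its members.
        new_centroids = centroids.copy()
        for k in range(len(centroids)):
            members = points[labels == k]
            if len(members) > 0:  # leave an empty cluster where it is
                new_centroids[k] = members.mean(axis=0)
        # Stop once no centroid moves by more than tol (the fixed point).
        shift = np.abs(new_centroids - centroids).max()
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, labels

# Toy data (made up for illustration): two well-separated groups in 2-D.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids, labels = kmeans(points, init_centroids=[[0, 0], [10, 10]])
```

With this toy input the first three points end up in one cluster and the last three in the other. The result of K-Means in general depends on the initial centroid estimate.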

The GPairs distance application takes a set of multidimensional points and computes the Euclidean distance between every pair of points. The algorithm naively counts Npairs(<r), the total number of pairs separated by a distance less than r, for each r**2 in the provided input.
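A minimal pure-Python sketch of this naive pair count follows. It is not the webinar's sample code: the function name `gpairs_naive`, the parameter name `rbins_squared`, and the toy points are assumptions, and the sketch assumes each unordered pair is counted once with self-pairs excluded. Comparing squared distances against r**2 avoids taking a square root for every pair.

```python
def gpairs_naive(points, rbins_squared):
    """Count pairs with squared distance below each r**2 (hypothetical helper).

    Assumption: each unordered pair (i, j) with i < j is counted once.
    """
    counts = [0] * len(rbins_squared)
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            # Squared Euclidean distance between points i and j.
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            # A pair contributes to every bin whose r**2 exceeds d2.
            for k, r2 in enumerate(rbins_squared):
                if d2 < r2:
                    counts[k] += 1
    return counts

# Toy input (made up for illustration): four points in 3-D, three r**2 values.
points = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (3, 3, 3)]
counts = gpairs_naive(points, rbins_squared=[2.0, 6.0, 30.0])
print(counts)  # [1, 3, 6]
```

The doubly nested loop does O(N^2) work over independent pairs, which is the kind of loop structure that JIT compilation and GPU offload are meant to accelerate.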

Hands-on Agenda

The talk covers how to implement the above algorithms using the @numba.jit decorator and the @kernel decorator.

  • Code walk-thru of writing the @numba.jit implementation and @kernel implementation of K-Means and GPairs algorithms.
  • Introduction to Intel® Extension for Scikit-learn*
  • Code walk-thru of optimized K-Means implementation using Intel® Extension for Scikit-learn*
  • Code walk-thru and visualization of the K-Means and GPairs algorithms using matplotlib.
  • Compile and execute the same code samples on the CPU and with GPU offload.