# Density-Based Spatial Clustering of Applications with Noise¶

Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm proposed in [Ester96]. It is a density-based clustering non-parametric algorithm: given a set of observations in some space, it groups together observations that are closely packed together (observations with many nearby neighbors), marking as outliers observations that lie alone in low-density regions (whose nearest neighbors are too far away).

## Details¶

Given the set $$X = \{x_1 = (x_{11}, \ldots, x_{1p}), \ldots, x_n = (x_{n1}, \ldots, x_{np})\}$$ of $$n$$ $$p$$-dimensional feature vectors (further referred as observations), a positive floating-point number epsilon and a positive integer minObservations, the problem is to get clustering assignments for each input observation, based on the definitions below [Ester96]:

core observation

An observation $$x$$ is called core observation if at least minObservations input observations (including $$x$$) are within distance epsilon from observation $$x$$;

directly reachable

An observation $$y$$ is directly reachable from $$x$$ if $$y$$ is within distance epsilon from core observation $$x$$. Observations are only said to be directly reachable from core observations.

reachable

An observation $$y$$ is reachable from an observation $$x$$ if there is a path $$x_1, \ldots, x_m$$ with $$x_1 = x$$ and $$x_m = y$$, where each $$x_{i+1}$$ is directly reachable from $$x_i$$. This implies that all observations on the path must be core observations, with the possible exception of $$y$$.

noise observation

Noise observations are observations that are not reachable from any other observation.

cluster

Two observations $$x$$ and $$y$$ are considered to be in the same cluster if there is a core observation $$z$$, and $$x$$ and $$y$$ are both reachable from $$z$$.

Each cluster gets a unique identifier, an integer number from $$0$$ to $$\text{total number of clusters } – 1$$. Each observation is assigned an identifier of the cluster it belongs to, or $$-1$$ if the observation considered to be a noise observation.

## Computation¶

The following computation modes are available:

## Examples¶

Batch Processing:

Distributed Processing:

Note

There is no support for Java on GPU.

Batch Processing:

Distributed Processing:

Batch Processing:

Batch Processing:

Distributed Processing: