Some DBSCAN advantages:
- DBSCAN does not require you to know the number of clusters in the data a priori, as opposed to k-means.
- DBSCAN can find arbitrarily shaped clusters.
- DBSCAN has a notion of noise, so outliers are not forced into a cluster.
- DBSCAN requires just two parameters and is mostly insensitive to the ordering of the points in the database.
Some DBSCAN disadvantages:
- A naive DBSCAN implementation materializes the distance matrix to find the neighbors. That takes (n^2 - n)/2 distance computations, i.e. O(n^2) time and memory, since only the upper triangle of the matrix is needed. Within the distance matrix, the nearest neighbors of a point can be found by taking minimums over its row and column. Databases solve the neighborhood problem with indexes designed specifically for this kind of query. For large-scale applications, you cannot afford to materialize the distance matrix.
- Finding neighbors is an operation based on distance (generally the Euclidean distance), so the algorithm can suffer from the curse of dimensionality.
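To make the distance-matrix point concrete, here is a minimal sketch of storing only the upper triangle and using it for a neighborhood query. The names `Point`, `CondensedDistances` and `regionQuery` are hypothetical and chosen for illustration only; they are not taken from the implementation that follows.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Point = std::vector<double>;  // hypothetical point type

// Euclidean distance between two points of equal dimension.
double euclidean(const Point& a, const Point& b) {
    double sum = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        const double diff = a[k] - b[k];
        sum += diff * diff;
    }
    return std::sqrt(sum);
}

// Strict upper triangle of the distance matrix, stored row by row:
// (n^2 - n) / 2 entries instead of n^2.
struct CondensedDistances {
    std::size_t n;
    std::vector<double> d;

    explicit CondensedDistances(const std::vector<Point>& pts)
        : n(pts.size()), d(n * (n - 1) / 2) {
        std::size_t idx = 0;
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = i + 1; j < n; ++j)
                d[idx++] = euclidean(pts[i], pts[j]);
    }

    // Distance between points i and j (symmetric, zero on the diagonal).
    double operator()(std::size_t i, std::size_t j) const {
        if (i == j) return 0.0;
        if (i > j) std::swap(i, j);
        // Row i starts at offset i*n - i*(i+1)/2; column j sits at j - i - 1.
        return d[i * n - i * (i + 1) / 2 + (j - i - 1)];
    }
};

// Indices of all points within eps of point p (p itself included).
// This is a linear scan over one row/column of the matrix.
std::vector<std::size_t> regionQuery(const CondensedDistances& dist,
                                     std::size_t p, double eps) {
    std::vector<std::size_t> neighbors;
    for (std::size_t q = 0; q < dist.n; ++q)
        if (dist(p, q) <= eps) neighbors.push_back(q);
    return neighbors;
}
```

Even with only the upper triangle stored, memory grows quadratically: 100,000 points need about 5 billion doubles, roughly 40 GB, which is why a spatial index is preferred at scale.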
Here is a DBSCAN implementation in C++, using Boost and the STL: