# Validating clusters using the Hopkins statistic

@article{Banerjee2004ValidatingCU, title={Validating clusters using the Hopkins statistic}, author={Amit Banerjee and Rajesh N. Dav{\'e}}, journal={2004 IEEE International Conference on Fuzzy Systems (IEEE Cat. No.04CH37542)}, year={2004}, volume={1}, pages={149-153 vol.1} }

A novel scheme for cluster validity using a test for random position hypothesis is proposed. The random position hypothesis is tested against an alternative clustered hypothesis on every cluster produced by a partitioning algorithm. A test statistic such as the well-known Hopkins statistic could be used as a basis to accept or reject the random position hypothesis, which is also the null hypothesis in this case. The Hopkins statistic is known to be a fair estimator of randomness in a data set…
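The idea in the abstract can be illustrated with a minimal sketch of one common formulation of the Hopkins statistic, applied to each cluster of a partition. This is not the authors' implementation: the function names `hopkins_statistic` and `hopkins_per_cluster`, the sample size `m`, the `seed` parameter, and the choice to sample random points from the axis-aligned bounding box of the data are all assumptions made for illustration. Other formulations omit the exponent `d` or invert the ratio.

```python
import numpy as np


def hopkins_statistic(X, m=None, seed=0):
    """Hopkins statistic H = sum(u^d) / (sum(u^d) + sum(w^d)).

    u: nearest-neighbour distances from m uniformly random points to the data.
    w: nearest-neighbour distances from m sampled data points to the other data.
    Assumes X has at least two rows and m <= number of rows.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    if m is None:
        m = max(1, n // 10)  # assumed default: sample about 10% of the points
    rng = np.random.default_rng(seed)

    # w: distances from sampled data points to their nearest other data point
    idx = rng.choice(n, size=m, replace=False)
    dist_real = np.linalg.norm(X[idx, None, :] - X[None, :, :], axis=2)
    dist_real[np.arange(m), idx] = np.inf  # ignore each point's distance to itself
    w = dist_real.min(axis=1)

    # u: distances from uniform random points (bounding box) to the nearest data point
    lo, hi = X.min(axis=0), X.max(axis=0)
    U = rng.uniform(lo, hi, size=(m, d))
    u = np.linalg.norm(U[:, None, :] - X[None, :, :], axis=2).min(axis=1)

    u_d, w_d = (u ** d).sum(), (w ** d).sum()
    return u_d / (u_d + w_d)


def hopkins_per_cluster(X, labels, seed=0):
    """Evaluate the statistic separately on every cluster of a partition."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    return {int(k): hopkins_statistic(X[labels == k], seed=seed)
            for k in np.unique(labels)}
```

Under this convention, values near 0.5 are consistent with the random position (null) hypothesis, while values approaching 1 suggest clustered structure. How a per-cluster value is turned into an accept or reject decision is left open here, since the abstract does not specify the thresholds used.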

#### 118 Citations

A recursive clustering methodology using a genetic algorithm

- Mathematics, Computer Science
- 2007 IEEE Congress on Evolutionary Computation
- 2007

A recursive clustering scheme that uses a genetic algorithm-based search in a dichotomous partition space for an optimal dichotomy of the dataset; results compare favorably with state-of-the-art approaches in genetic algorithm-driven clustering.

A Hybrid Heuristic with Hopkins Statistic for the Automatic Clustering Problem

- Computer Science
- IEEE Latin America Transactions
- 2019

The Silhouette Index is considered, and a newly proposed Hybrid Heuristic Algorithm (HHA) identifies the ideal number of groups, with substantially lower computational time and solution quality that is competitive with the best results reported in the literature.

An improved genetic algorithm for robust fuzzy clustering with unknown number of clusters

- Mathematics
- 2010 Annual Meeting of the North American Fuzzy Information Processing Society
- 2010

In this paper the problem of partitioning noisy data when the number of clusters c is not known a priori is revisited. The methodology proposed is a population-based search in the partition space…

The Fuzzy Mega-cluster: Robustifying FCM by Scaling Down Memberships

- Computer Science
- FSKD
- 2005

A new robust clustering scheme based on fuzzy c-means, called the mega-clustering algorithm, is shown to be robust against outliers, and its ability to distinguish between true outliers and non-outliers is interesting.

A context-sensitive crossover operator for clustering applications

- Mathematics, Computer Science
- IEEE Congress on Evolutionary Computation
- 2010

A new context-sensitive crossover operator for genetic-search-based clustering applications that compares relevant sub-regions in partitions represented by the two parents selected for mating, passing on to the child only high-fitness sub-regions in the partition space.

Giving Fuzziness to Spatial Clusters: a New Index for Choosing the Optimal Number of Clusters

- Computer Science
- Int. J. Artif. Intell. Tools
- 2013

A new index for fuzzy clustering is introduced to determine the optimal number of clusters, which is used in the fuzzy c-means algorithm for the geodemographic segmentation of 285 postal codes.

To Cluster, or Not to Cluster: An Analysis of Clusterability Methods

- Mathematics, Computer Science
- Pattern Recognit.
- 2019

An extensive comparison of measures of clusterability is performed and guidelines that clustering users can reference to select suitable measures for their applications are provided.

A Comprehensive Comparison of Different Clustering Methods for Reliability Analysis of Microarray Data

- Computer Science, Medicine
- Journal of medical signals and sensors
- 2013

This study investigates the abilities of mixture decomposition schemes and proposes the Hopkins statistic as a method for finding the intrinsic ability of a data set to be clustered, in comparison with other methods in a reliability analysis task.

To Cluster, or Not to Cluster: How to Answer the Question

- 2017

Clustering is an essential data mining tool that aims to discover inherent cluster structure in data. For most applications, applying clustering is only appropriate when cluster structure is present.…

Using Cluster Ensembles to Identify Psychiatric Patient Subgroups

- Computer Science, Psychology
- AIME
- 2019

This work applies cluster ensemble techniques, which have previously been shown to overcome drawbacks of individual clustering algorithms, to the problem of identifying subgroups of psychiatric patients, and introduces a process guide for modelling and evaluating cluster ensembles in the form of a Meta Algorithmic Model.

#### References

A test for multidimensional clustering tendency

- Mathematics, Computer Science
- Pattern Recognit.
- 1983

The Cox-Lewis statistic leads to one-sided tests for regularity having reasonable power and provides a sharper discrimination between random and clustered data than other statistics.

Cluster validity for fuzzy clustering algorithms

- Mathematics
- 1981

The proportion exponent is introduced as a measure of the validity of the clustering obtained for a data set using a fuzzy clustering algorithm. It is assumed that the output of an algorithm…

Tests of randomness based on distance methods

- Mathematics
- 1965

The most familiar method of testing the hypothesis that an observed spatial distribution of points in the Euclidean plane is a realization of a Poisson point process, or in practical terminology that…

A Validity Measure for Fuzzy Clustering

- Mathematics, Computer Science
- IEEE Trans. Pattern Anal. Mach. Intell.
- 1991

The authors present a fuzzy validity criterion based on a validity function which identifies compact and separate fuzzy c-partitions without assumptions as to the number of substructures inherent in…

Visual cluster validity (VCV) displays for prototype generator clustering methods

- Mathematics, Computer Science
- The 12th IEEE International Conference on Fuzzy Systems, 2003. FUZZ '03.
- 2003

The proposed approach uses intensity images generated from the results of any prototype generator clustering algorithm as a means for cluster validation.

A conditioned distance ratio method for analyzing spatial patterns

- Mathematics
- 1976

A new distance-based method is proposed for investigating the pattern in the plane formed by points, which may be assumed to be the positions of centres of trees in a forest stand. For each…

Cluster Validity for the Fuzzy c-Means Clustering Algorithm

- Mathematics, Medicine
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- 1982

The uniform data function is a function which assigns to the output of the fuzzy c-means (Fc-M) or fuzzy isodata algorithm a number which measures the quality or validity of the clustering produced…

Quadratic assignment as a general data analysis strategy.

- Mathematics
- 1976

The quadratic assignment paradigm developed in operations research is discussed as a general approach to data analysis tasks characterized by the use of proximity matrices. Data analysis problems are…

Validating fuzzy partitions obtained through c-shells clustering

- Mathematics, Computer Science
- Pattern Recognit. Lett.
- 1996

Validation of fuzzy partitions induced through c-shells clustering is considered, and a new set of indices are shown to be capable of validating the structure characterized by the shell clustering algorithms.

A test for spatial pattern at several scales using data from a grid of contiguous quadrats.

- Computer Science
- 1974

It is concluded that a set of tests, based on randomisation arguments, provides a fully valid method of testing simultaneously for pattern at various scales.