Choosing the appropriate distance measure for improving the agglomerative clustering accuracy: amphetamines
Abstract
The aim of this study was to compare various distance measures in terms of their effect on the accuracy of the automated detection of stimulant and hallucinogenic amphetamines based on hierarchical cluster analysis. The dendrograms generated by this unsupervised pattern recognition technique disclose the similarities found at each step of the agglomerative clustering procedure. Finding the most appropriate similarity measure is therefore very important, as it influences the discrimination power of the system. The input database comprises the GC-FTIR spectra of the modeled positives, as well as of negatives representing various compounds of forensic interest. The dendrograms were determined using the average linkage algorithm. Six similarity metrics were compared using the cophenetic correlation coefficient. The best results were obtained with the City Block distance.
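The comparison outlined in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The toy four-component vectors stand in for GC-FTIR spectra and are invented for illustration. Only the City Block distance is named in the abstract, so the Euclidean and Chebyshev functions are assumptions added for contrast, and the helper name `cophenetic_correlation` is hypothetical. The sketch builds an average-linkage hierarchy for each distance measure and then computes the cophenetic correlation coefficient, that is, the Pearson correlation between the original pairwise distances and the heights at which each pair is first merged in the dendrogram.

```python
from itertools import combinations
from math import sqrt

# Illustrative distance measures. Only City Block is named in the abstract;
# the other two are assumptions included for comparison.
def cityblock(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def cophenetic_correlation(points, dist):
    """Run average-linkage agglomerative clustering and return the Pearson
    correlation between original and cophenetic pairwise distances."""
    n = len(points)
    pairs = list(combinations(range(n), 2))
    d = {p: dist(points[p[0]], points[p[1]]) for p in pairs}
    coph = {}                           # pair -> height of first merge
    clusters = [[i] for i in range(n)]  # start with singletons

    def avg(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        total = sum(d[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the two closest clusters (a < b by construction).
        height, a, b = min(
            (avg(clusters[a], clusters[b]), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        # Every cross-cluster pair is joined for the first time at this height.
        for i in clusters[a]:
            for j in clusters[b]:
                coph[(min(i, j), max(i, j))] = height
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    x = [d[p] for p in pairs]
    y = [coph[p] for p in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((u - mx) * (v - my) for u, v in zip(x, y))
    var_x = sum((u - mx) ** 2 for u in x)
    var_y = sum((v - my) ** 2 for v in y)
    return cov / sqrt(var_x * var_y)

# Toy stand-ins for spectra: three "positives" and two "negatives" (invented).
spectra = [
    [0.90, 0.10, 0.00, 0.20],
    [0.80, 0.20, 0.10, 0.20],
    [0.85, 0.15, 0.05, 0.25],
    [0.10, 0.90, 0.70, 0.00],
    [0.20, 0.80, 0.60, 0.10],
]

scores = {
    f.__name__: cophenetic_correlation(spectra, f)
    for f in (cityblock, euclidean, chebyshev)
}
# Rank the distance measures by cophenetic correlation, highest first.
for name, c in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {c:.3f}")
```

A coefficient closer to 1 indicates that the dendrogram preserves the original pairwise distances more faithfully. The ranking printed for this toy data says nothing about the ranking reported in the study. For real spectra, SciPy's `scipy.spatial.distance.pdist` together with `scipy.cluster.hierarchy.linkage` and `scipy.cluster.hierarchy.cophenet` would be the more practical route.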