Combining PCA and agglomerative clustering: a solution for the automatic recognition of amphetamines
Abstract
We present a chemometric solution for the automated recognition of amphetamines based on infrared laser spectroscopy. The system has been trained to distinguish stimulant amphetamines and their main precursors (ephedrines) from their hallucinogenic counterparts, as well as from non-amphetamines. The training database consists of 36 spectra of compounds belonging to all four of these categories. The spectra were recorded in the infrared range covered by the UT7 quantum cascade laser (QCL) used as the radiation source, i.e. 1550–1330 cm⁻¹. To enhance the discrimination power of the system, the spectra were preprocessed with a feature weight, wME. The training database was then subjected to principal component analysis (PCA). The score plots show that, after this preprocessing, the four classes of compounds form separate clusters. However, because these clusters lie relatively close to one another and PCA does not define cluster boundaries, the classification itself was performed by hierarchical cluster analysis (HCA) of the PCA scores. Agglomerative clustering was used to build dendrograms, which indicate that combining these two unsupervised pattern recognition methods yields high correct classification rates. Dendrograms built from the PCA scores for various combinations of the first three principal components (PCs) were evaluated in terms of the main classification figures of merit, and the classification tree giving the best results is presented in detail.
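The abstract does not give implementation details, so the following is only a minimal sketch of the pipeline it describes, assuming NumPy: PCA (via SVD of the mean-centred spectra), then agglomerative clustering of the PC scores for each combination of the first three PCs, scored with a simple correct classification rate. Everything specific here is an assumption: the synthetic spectra (four hypothetical classes of 9 Gaussian bands each on a 2 cm⁻¹ grid), the choice of average linkage with Euclidean distance, the majority-vote definition of the correct classification rate, and the helper names `pca_scores`, `agglomerative` and `correct_classification_rate`. The wME feature weight is not reproduced because its definition is not given here.

```python
from itertools import combinations

import numpy as np


def pca_scores(X, n_components=3):
    """Return the scores of mean-centred spectra on the first principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes (loadings), ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def agglomerative(points, n_clusters):
    """Naive bottom-up clustering with average linkage (assumed) and Euclidean distance.

    Returns one label per sample plus the merge history, i.e. the dendrogram as a
    list of (members_a, members_b, merge_distance) tuples.
    """
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest mean pairwise distance.
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            dist = d[np.ix_(clusters[i], clusters[j])].mean()
            if best is None or dist < best[0]:
                best = (dist, i, j)
        dist, i, j = best
        merges.append((clusters[i], clusters[j], float(dist)))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    labels = np.empty(len(points), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels, merges


def correct_classification_rate(labels, truth):
    """Assumed figure of merit: each cluster is assigned its majority class."""
    correct = 0
    for c in np.unique(labels):
        _, counts = np.unique(truth[labels == c], return_counts=True)
        correct += counts.max()
    return correct / len(truth)


# Hypothetical stand-in for the training set: 4 classes x 9 spectra = 36 spectra,
# each a single noisy Gaussian band on a 1550-1330 cm-1 axis (111 points).
rng = np.random.default_rng(0)
wavenumbers = np.arange(1550, 1329, -2)
centres = np.array([1520, 1470, 1420, 1370])
truth = np.repeat(np.arange(4), 9)
X = np.exp(-((wavenumbers[None, :] - centres[truth][:, None]) / 10.0) ** 2)
X += rng.normal(scale=0.01, size=X.shape)

# Cluster the scores of every combination of two or three of the first three PCs.
scores = pca_scores(X, n_components=3)
rates = {}
for k in (2, 3):
    for pcs in combinations(range(3), k):
        labels, _ = agglomerative(scores[:, list(pcs)], n_clusters=4)
        rates[pcs] = correct_classification_rate(labels, truth)
for pcs, rate in rates.items():
    print("PCs", [p + 1 for p in pcs], f"correct classification rate: {rate:.3f}")
```

In practice, `scipy.cluster.hierarchy.linkage` (with `method="average"` or another linkage) and `scipy.cluster.hierarchy.dendrogram` would replace the naive loop. Note also that a rate computed on the same spectra used to build the model is a training-set figure, not a validation estimate.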