Construction of Prediction Models for Classification of New Psychoactive Substances Based on EI-MS Data and Machine Learning
Abstract
New psychoactive substances (NPS) have become a global health and social problem. Their structures are variable and easily modified to produce new compounds. Traditional analytical techniques rely mostly on reference standards and mass spectral databases. Because the structural diversity of NPS keeps growing, these databases cannot cover the mass spectra of all possible NPS, which makes structural identification of completely unknown compounds difficult. Advances in machine learning offer a potential solution to this dilemma. In this study, four classification models, k-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), and artificial neural network (ANN), were built on a dataset of mass spectra of 871 compounds and used to predict the structural class of NPS. The dataset was split into training and test sets at a ratio of 7:3; each model was fitted on the training set, and its generalization ability was evaluated on the test set. A grid search with 5-fold cross-validation was used to optimize the hyperparameters of the models. The performance of the four models was evaluated on the 261 test-set samples using the confusion matrix, accuracy, precision, recall, and F-score. Overall, the RF model gave the best classification of the seven NPS classes and the negative samples, with an overall accuracy of 89.27%, higher than that of the other three models. The overall accuracies of the KNN, SVM, and ANN models were 79.31%, 83.14%, and 83.52%, respectively.
In addition, the RF model achieved high accuracy for specific NPS classes: 100%, 93%, 95%, and 100% for synthetic cathinones, fentanyl, synthetic cannabinoids, and benzodiazepines, respectively, which supports reliable prediction of the structural class of unknown compounds. In conclusion, this study develops a strategy for rapid analysis of NPS that applies machine learning algorithms to mass spectral datasets to predict the structural class of unknown compounds, thus providing a basis for the structural identification of unknown psychoactive compounds.
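The workflow summarized above (a split of roughly 7:3, a grid search with 5-fold cross-validation, and evaluation by confusion matrix, accuracy, precision, recall, and F-score) can be sketched with scikit-learn. This is only an illustrative sketch, not the study's actual code: the data are synthetic stand-ins generated with `make_classification`, the feature count and the hyperparameter grid are assumptions, and only the sample count (871), the test-set size (261), and the eight classes (seven NPS classes plus negatives) are taken from the abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the spectral dataset (assumption): 871 samples,
# 100 hypothetical intensity features, 8 classes (7 NPS classes + negatives).
X, y = make_classification(
    n_samples=871, n_features=100, n_informative=30,
    n_classes=8, n_clusters_per_class=1, random_state=0,
)

# Hold out 261 samples (about 30%) as the test set, stratified by class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=261, stratify=y, random_state=0
)

# Hypothetical hyperparameter grid; the grid used in the study is not given here.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 10]}

# Grid search with 5-fold cross-validation on the training set only.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid, cv=5, scoring="accuracy",
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out test set.
y_pred = search.best_estimator_.predict(X_test)
cm = confusion_matrix(y_test, y_pred, labels=np.arange(8))
acc = accuracy_score(y_test, y_pred)

print("best parameters:", search.best_params_)
print("test accuracy:", round(acc, 4))
print(cm)
print(classification_report(y_test, y_pred, zero_division=0))
```

The same pattern applies to the other three models by swapping the estimator (for example `KNeighborsClassifier`, `SVC`, or `MLPClassifier`) and its parameter grid. The numbers this sketch prints describe the synthetic data only and are not comparable to the accuracies reported above.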