Wiley Registry of Tandem Mass Spectral Data, MSforID

MS-based identification of small molecules

The goal of any screening analysis is to provide comprehensive information on the chemical composition of a sample. With adequate analytical strategies, the presence of as many compounds as possible should be confirmed. Confirmation is a two-step process that involves detection and identification.

Detection can be seen as the collection of compound-specific data. For LC/MS, the collected data may include retention times, m/z values of molecular ions and derived fragment ions, and relative abundances of fragment ions and isotopologues.

Identification is the process of proving that a compound is the very same that it is alleged or reputed to be. Identity is inferred from a number of characteristic features. Identification is accomplished by comparing two data sets: one set of features is obtained from the analysis of an unknown compound; the other represents a reference standard of known identity. In this context, a challenging task is the definition of objective measures of identity. Proper settings can reduce the number of false positive and false negative identifications to a minimum, ideally to zero.
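One common way to turn such a comparison into an objective measure is a similarity score between the fragment ion spectrum of the unknown and that of the reference, combined with an acceptance threshold. The following is a minimal sketch of a cosine-type score, not the search algorithm used by MSforID. The function names, the m/z tolerance of 0.01, and the threshold of 0.8 are assumptions chosen for illustration only.

```python
import math

def match_peaks(unknown, reference, tol=0.01):
    """Pair each reference peak with the closest unused unknown peak within tol (m/z units).

    Spectra are lists of (m/z, intensity) tuples. Returns (unknown_int, reference_int) pairs.
    """
    pairs = []
    used = set()
    for ref_mz, ref_int in reference:
        best = None
        for i, (mz, _) in enumerate(unknown):
            if i in used or abs(mz - ref_mz) > tol:
                continue
            if best is None or abs(mz - ref_mz) < abs(unknown[best][0] - ref_mz):
                best = i
        if best is not None:
            used.add(best)
            pairs.append((unknown[best][1], ref_int))
    return pairs

def cosine_score(unknown, reference, tol=0.01):
    """Cosine similarity of two spectra; unmatched peaks lower the score."""
    pairs = match_peaks(unknown, reference, tol)
    dot = sum(u * r for u, r in pairs)
    norm_u = math.sqrt(sum(i * i for _, i in unknown))
    norm_r = math.sqrt(sum(i * i for _, i in reference))
    if norm_u == 0 or norm_r == 0:
        return 0.0
    return dot / (norm_u * norm_r)

def is_match(unknown, reference, threshold=0.8, tol=0.01):
    """Accept the identification only if the score reaches the (assumed) threshold."""
    return cosine_score(unknown, reference, tol) >= threshold
```

Raising the threshold trades false positives for false negatives, which is why the choice of settings matters for the error rates mentioned above.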

Definitive identification requires a sufficient amount of specific analytical data. Like a fingerprint, these data should form a unique identifier that excludes all other chemical entities from being the compound analyzed. For LC/MS, such a chemical fingerprint can be created in silico (e.g. m/z values of molecular ions, relative abundances of fragment ions and isotopologues) and/or by analyzing reference standards (e.g. m/z values of fragment ions). Fingerprints are often stored in databases.
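As an illustration of an in silico feature, the m/z value of a protonated molecular ion can be computed from the elemental formula alone. The sketch below is a simplified example under stated assumptions: it handles only C, H, N and O, only flat formulas without brackets, and only the singly protonated ion [M+H]+. The function names are hypothetical.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes; limited to four elements here.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.9949146196,
}
PROTON_MASS = 1.00727646688  # mass of H+ in u

def parse_formula(formula):
    """Parse a flat formula such as 'C8H10N4O2' into {'C': 8, 'H': 10, ...}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts

def monoisotopic_mass(formula):
    """Sum of monoisotopic atomic masses; raises KeyError for unsupported elements."""
    return sum(MONOISOTOPIC_MASS[el] * n for el, n in parse_formula(formula).items())

def protonated_mz(formula):
    """Theoretical m/z of the singly charged [M+H]+ ion."""
    return monoisotopic_mass(formula) + PROTON_MASS

def ppm_error(measured_mz, theoretical_mz):
    """Relative deviation of a measured m/z from the theoretical value in ppm."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6
```

For caffeine (C8H10N4O2), `protonated_mz("C8H10N4O2")` evaluates to approximately 195.0877, which can then be compared with a measured value via `ppm_error`.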

Tandem mass spectral databases

Tandem mass spectral databases are indispensable tools for compound annotation in non-targeted LC-MS workflows. Several reviews cover progress in the development and application of tandem mass spectral databases.

A tandem mass spectral database is an organized collection of tandem mass spectral data that comes bundled with a database management system. The management system is a software application that interacts with the user, other applications, and the database itself to capture and analyze data. The mass spectrometric data are often accompanied by metadata.

Traditionally, MS/MS databases are built by analyzing reference standards. State-of-the-art databases include sets of compound-specific spectra that were acquired with different collision energy settings as well as on different instruments. Usually, the spectral information obtained is processed prior to storage in a library. Curation efforts may include manual inspection of mass spectra by experienced mass spectrometrists, noise and artifact removal, recalibration of spectra and peak annotations, as well as inter-library comparisons.
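The library layout described above, with several reference spectra per compound acquired at different collision energies, can be sketched as follows. This is an illustrative data model and a deliberately simple search, not the WRTMD format or the MSforID search algorithm. The class name, field names, the peak-matching score, and the tolerance are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class ReferenceSpectrum:
    compound: str
    collision_energy_ev: float
    instrument: str
    peaks: list  # list of (m/z, relative intensity) tuples

def matched_fraction(query_peaks, reference_peaks, tol=0.01):
    """Fraction of reference peaks that have a query peak within tol (m/z units)."""
    if not reference_peaks:
        return 0.0
    hits = sum(
        any(abs(q_mz - r_mz) <= tol for q_mz, _ in query_peaks)
        for r_mz, _ in reference_peaks
    )
    return hits / len(reference_peaks)

def search_library(query_peaks, library, tol=0.01):
    """Rank compounds by their best-matching reference spectrum.

    Because each compound is stored at several collision energies, the query
    only needs to resemble one of them; the best score per compound is kept.
    Returns (compound, score, collision energy of best entry), best first.
    """
    best = {}
    for entry in library:
        score = matched_fraction(query_peaks, entry.peaks, tol)
        if score > best.get(entry.compound, (-1.0, None))[0]:
            best[entry.compound] = (score, entry)
    ranked = sorted(best.items(), key=lambda item: item[1][0], reverse=True)
    return [(name, score, entry.collision_energy_ev) for name, (score, entry) in ranked]
```

Keeping the best score per compound means a query acquired at an unknown or different collision energy can still be assigned, as long as one of the stored energies produces a comparable fragmentation pattern.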

There is a lively discussion about the robustness and transferability of tandem mass spectral libraries. For a long time, the predominant opinion was that, owing to the limited reproducibility of tandem mass spectra, a library would only be useful on the instrument used to acquire its reference spectra. The situation has changed. Databases that combine advanced library designs with tailor-made search algorithms have been shown to enable reliable compound identification with spectra acquired in different labs, on various instruments, and with diverse instrumental settings.

Wiley Registry of Tandem Mass Spectral Data, MSforID

The published version of the Wiley Registry of Tandem Mass Spectral Data (WRTMD) contains more than 10,000 spectra of about 1,200 compounds, mainly pharmaceutical compounds, illicit drugs, and their metabolites. Accordingly, the most important fields of application of the library are forensic toxicology, environmental analysis, and clinical toxicology.

Our database is the most extensively tested tandem mass spectral library available. Studies performed include cross-validation with other tandem mass spectral libraries, library searches with data extracted from the literature, and several multicenter studies covering different types of instruments, including QqQ, IT, LIT, QqLIT, QqTOF, LIT-Orbitrap and LIT-FTICR. In these studies, our database was found to be more sensitive, specific, robust and transferable than competing tandem mass spectral libraries. Thus, the WRTMD can be regarded as the “gold standard” for benchmarking library performance.


Austrian Research Promotion Agency (Österreichische Forschungsförderungsgesellschaft): dnatox – Coupling of liquid chromatography with mass spectrometry as a tool for toxin and DNA analysis, KIRAS PL 2 project 813786, 2008-2009.