Software Supporting Alignment Algorithms

Software supporting "Algorithms for alignment of mass spectrometry proteomic data" by Neal Jeffries

This website provides R- and Perl-based programs that implement the algorithms discussed in the paper. A pdf file of the paper is available from the Bioinformatics website here. R is a free statistical programming language available from http://www.r-project.org/ and Perl is a free general-purpose language available from the Comprehensive Perl Archive Network. Both are available for Windows, UNIX, and Macintosh environments, though the code presented here is Windows-based.

Four separate directories are available, each dedicated to a specific issue. One directory provides code to implement the Ciphergen-based algorithm, another provides code for data in a more general .csv (comma-separated values) format, and a third provides some sample data to work with. The fourth directory addresses the important question of how to decide which peaks in two spectra should match (i.e., how a given misaligned spectrum should be matched to a reference spectrum). This matters because both algorithms require that each file contain a list of masses thought to correspond to those in the other file; these are the m_i and p_i discussed in the paper.
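To make the idea of corresponding peaks concrete, here is a minimal sketch in Python. It is not the R or Perl code distributed on this site and not the method described in the paper. It assumes a simple nearest-neighbour rule with a relative mass tolerance; the function name match_peaks and the parameter rel_tol are hypothetical and introduced only for illustration. The output is a pair of equal-length mass lists, one per spectrum, of the kind both algorithms expect as input.

```python
from bisect import bisect_left


def match_peaks(reference_masses, test_masses, rel_tol=0.003):
    """Pair reference peaks with the nearest unused test peak.

    Assumption (not from the paper): two peaks correspond when their
    masses differ by at most rel_tol times the reference mass.
    Returns two equal-length lists: matched reference masses and the
    matched test masses, in increasing order of reference mass.
    """
    test_sorted = sorted(test_masses)
    ref_matched, test_matched = [], []
    used = set()  # indices of test peaks already paired
    for ref in sorted(reference_masses):
        i = bisect_left(test_sorted, ref)
        # Only the neighbours on either side of the insertion point
        # can be the nearest test peak.
        candidates = [j for j in (i - 1, i)
                      if 0 <= j < len(test_sorted) and j not in used]
        if not candidates:
            continue
        best = min(candidates, key=lambda j: abs(test_sorted[j] - ref))
        if abs(test_sorted[best] - ref) <= rel_tol * ref:
            ref_matched.append(ref)
            test_matched.append(test_sorted[best])
            used.add(best)
    return ref_matched, test_matched


# Hypothetical peak masses in Daltons, for illustration only.
reference = [1000.0, 2000.0, 5000.0]
test = [1002.0, 2500.0, 4990.0]
ref_matched, test_matched = match_peaks(reference, test)
```

In this made-up example the peak at 2000 is left unmatched because the nearest test peak lies well outside the tolerance. A rule this simple can pair the wrong peaks when spectra are badly misaligned, which is why the choice of corresponding peaks gets its own directory and discussion.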

Discussion and implementation of the Ciphergen format algorithm is here.
Discussion and implementation of the general format algorithm is here.
Sample data in the general format. Sample data in the Ciphergen XML format.
Discussion and implementation of how to choose corresponding peaks in two spectra is here.
Discussion of the important issue of making sure the calibrants and the m_i and p_i are chosen appropriately. The paper covers this briefly; the question is treated at greater length here.

After receiving a few requests I have made the entire data set available here in the form of a zip file. It contains all 44 .xml files as well as 44 .csv files. The data are very raw: no baseline correction has been applied and the data extend from 0 to 100,000 Daltons. There is one sample (designated by filenames containing 0171481010011972) in which the .xml file indicates it may have been run on an IMAC2-Cu chip instead of IMAC3. The individual who ran the samples has indicated it was an IMAC3 chip, but I include this information in case questions arise.
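For readers who want to inspect the raw .csv files before running the alignment code, the following Python sketch reads one spectrum and restricts it to a mass window. It is an illustration only and makes an assumption the text above does not confirm: that each row holds a mass in the first column and an intensity in the second. The function name read_spectrum and its parameters are hypothetical, and the example rows are invented rather than taken from the data set.

```python
import csv
import io


def read_spectrum(handle, min_mass=0.0, max_mass=100_000.0):
    """Read (mass, intensity) rows and keep those inside the mass window.

    Assumed layout (not confirmed by the site): column 1 is mass in
    Daltons, column 2 is intensity. Rows that cannot be parsed as two
    numbers, such as a header line, are skipped.
    """
    masses, intensities = [], []
    for row in csv.reader(handle):
        try:
            mass, intensity = float(row[0]), float(row[1])
        except (ValueError, IndexError):
            continue  # header, blank, or malformed row
        if min_mass <= mass <= max_mass:
            masses.append(mass)
            intensities.append(intensity)
    return masses, intensities


# Invented rows standing in for one of the .csv files.
example = io.StringIO("mass,intensity\n500.0,12.5\n2500.0,3.1\n150000.0,0.2\n")
masses, intensities = read_spectrum(example, min_mass=1000.0)
```

With a real file, the StringIO object would be replaced by an opened file handle, for example open(path, newline=""). No baseline correction is attempted here, in keeping with the raw state of the data.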

Since this paper was first published, one individual has asked how these algorithms differ from the alignment function available in Ciphergen Express. As the cost of Ciphergen Express is considerable, we did not have the software and were thus unaware of this function. The version of the function I have seen is presented somewhat as a black box. However, it does ask for a reference spectrum and requires a set of points at which the test and reference spectra are to be evaluated. From these facts it appears the algorithm is probably similar to what is proposed in this paper. My approach, however, is more flexible (different points can be chosen for each spectrum to be aligned) and more transparent, in that I explain what the algorithm is doing.

The software is provided free of charge and by using it the user acknowledges both the author and his employer are free of legal liability for its possible flaws or failure. Comments and questions may be directed to neal.jeffries@nih.gov.