Software supporting "Algorithms for alignment of mass spectrometry proteomic data" by Neal Jeffries
This website provides R- and Perl-based programs
that implement the algorithms discussed in the paper. A PDF
file of the paper is available from the Bioinformatics website
here.
R is a free statistical programming language available from
http://www.r-project.org/ and
Perl is a free general-purpose language available from the
Comprehensive Perl Archive Network.
Both are available for Windows, UNIX, and Macintosh environments,
though the code presented here is Windows-based.
Four separate directories are available, each dedicated to a specific
issue. One directory provides code to implement the Ciphergen-based
algorithm, another provides code for data in a more general
.csv (comma-separated values) format, and a third provides some
sample data to work with. The fourth directory addresses the
important question of how to decide which peaks in two spectra
should match (i.e. how a given misaligned spectrum should be matched
to a reference spectrum). This matters because both
algorithms require a list of masses in each file that are
thought to correspond to one another; these are the m_i and
p_i discussed in the paper.
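
To give a rough idea of what such a list of corresponding masses looks like, here is a minimal Python sketch that pairs each peak in a misaligned spectrum with the nearest peak in a reference spectrum, keeping a pair only when the two masses agree within a relative tolerance. This is an illustration only and is not the R or Perl code in the directories above. The function name `match_peaks`, the parameter `rel_tol`, and the nearest-neighbour rule are all assumptions made for the example; the fourth directory describes how the choice is actually approached.

```python
from bisect import bisect_left


def match_peaks(test_masses, ref_masses, rel_tol=0.003):
    """Pair each test-spectrum peak with the nearest reference peak.

    Hypothetical helper for illustration. Both lists must be sorted in
    ascending order. A pair is kept only if the two masses differ by at
    most rel_tol, expressed as a fraction of the reference mass.
    Returns a list of (test_mass, ref_mass) tuples.
    """
    pairs = []
    for m in test_masses:
        i = bisect_left(ref_masses, m)
        # The nearest reference peak is one of the two neighbours of the
        # insertion point (only one neighbour exists at either end).
        candidates = ref_masses[max(i - 1, 0):i + 1]
        if not candidates:
            continue
        p = min(candidates, key=lambda r: abs(r - m))
        if abs(p - m) <= rel_tol * p:
            pairs.append((m, p))
    return pairs


reference = [1000.0, 2000.0, 4000.0]
misaligned = [1002.0, 2005.0, 3000.0, 4010.0]
pairs = match_peaks(misaligned, reference)
```

In this example the peak at 3000 is left unmatched because no reference peak lies within tolerance. A rule this simple can assign two test peaks to the same reference peak, so the resulting list should still be inspected by eye.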
Discussion and implementation of the Ciphergen format algorithm is here.
Discussion and implementation of the general format algorithm is here.
Sample data of the general format. Sample data of the Ciphergen xml format.
Discussion and implementation of how to choose corresponding peaks in two spectra is here.
Discussion of the important issue of making sure the calibrants and the m_i and p_i are chosen appropriately. This is covered briefly in the paper, but more space is devoted to the question here.
After receiving a few requests, I have made the entire data set available here in the form of a zip file. It contains all 44 .xml files as well as 44 .csv files. The data are very raw: no baseline correction has been made, and the data extend from 0 to 100K Daltons. There is one sample (designated by filenames containing 0171481010011972) for which the .xml file indicates it may have been run on an IMAC2-Cu chip instead of IMAC3. The individual who ran the samples has indicated it was an IMAC3 chip, but I include this information in case questions arise.
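
Because the files cover the full 0 to 100K Dalton range, it is often convenient to restrict a spectrum to a mass window when reading it. The Python sketch below shows one way to do that. It assumes, purely for illustration, that each .csv row holds a mass in the first column and an intensity in the second; the actual column layout should be checked against the files in the zip archive. The function name `read_spectrum` and its parameters are hypothetical.

```python
import csv
import io


def read_spectrum(handle, min_mass=0.0, max_mass=100_000.0):
    """Read (mass, intensity) rows from an open CSV file-like object.

    Assumed layout: mass in column 1, intensity in column 2. Rows that
    cannot be parsed as two numbers (for example a header line) are
    skipped. Only masses inside [min_mass, max_mass] are kept.
    """
    masses, intensities = [], []
    for row in csv.reader(handle):
        try:
            m, y = float(row[0]), float(row[1])
        except (ValueError, IndexError):
            continue  # header, blank, or malformed row
        if min_mass <= m <= max_mass:
            masses.append(m)
            intensities.append(y)
    return masses, intensities


# Small in-memory example standing in for one of the .csv files.
example = io.StringIO("mass,intensity\n0.5,1.0\n1500.0,3.2\n120000.0,0.1\n")
masses, intensities = read_spectrum(example, min_mass=1000.0)
```

For a file on disk, the same function can be called as `read_spectrum(open(path, newline=""))`, where `path` is whichever .csv file is being examined. No baseline correction is applied here, in keeping with the raw state of the data.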
Since this paper was first published, one individual has asked how these algorithms differ from the alignment function available in Ciphergen Express. Because the cost of Ciphergen Express is considerable, we did not have the software and were thus unaware of this function. The version of the function I have seen is presented somewhat as a black box. It does, however, ask for a reference spectrum and requires a set of points at which the test and reference spectra are to be evaluated. From these facts, the algorithm appears to be similar to what is proposed in this paper. However, my approach is more flexible (different points can be chosen for each spectrum to be aligned) and more transparent, in that I explain what the algorithm is doing.
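
To make concrete what it means to choose different points for each spectrum, the Python sketch below remaps the mass axis of one spectrum using its own set of matched (test mass, reference mass) pairs. It uses plain piecewise-linear interpolation between the pairs, which is a generic stand-in chosen for brevity. It is not claimed to be the algorithm in the paper or the method used by Ciphergen Express, and the function name `remap_masses` is hypothetical.

```python
from bisect import bisect_right


def remap_masses(masses, anchors):
    """Map test-spectrum masses onto the reference mass scale.

    anchors is a list of (test_mass, ref_mass) pairs chosen for this
    particular spectrum. Masses between two anchors are interpolated
    linearly; masses outside the anchor range are extrapolated along
    the first or last segment. (Illustrative stand-in only.)
    """
    anchors = sorted(anchors)
    if len(anchors) < 2:
        raise ValueError("at least two anchor pairs are required")
    xs = [a for a, _ in anchors]
    ys = [b for _, b in anchors]
    remapped = []
    for m in masses:
        # Index of the segment containing m, clamped to a valid segment.
        j = min(max(bisect_right(xs, m) - 1, 0), len(xs) - 2)
        slope = (ys[j + 1] - ys[j]) / (xs[j + 1] - xs[j])
        remapped.append(ys[j] + slope * (m - xs[j]))
    return remapped


# Each spectrum can supply its own anchors.
anchors_a = [(1002.0, 1000.0), (2005.0, 2000.0), (4010.0, 4000.0)]
aligned = remap_masses([1002.0, 1503.5, 2005.0, 4010.0], anchors_a)
```

Every anchor mass is sent to its reference counterpart, and the mass halfway between the first two anchors lands near 1500 on the reference scale. A second spectrum would simply be passed a different anchor list.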
The software is provided free of charge, and by using it the user acknowledges that both the author and his employer are free of legal liability for its possible flaws or failure. Comments and questions may be directed to neal.jeffries@nih.gov.