Software for data in a general format

Software supporting the algorithm for general data

To implement this algorithm, the investigator is assumed to have the following:

A .csv file with two columns: the first contains putative m/z values and the second contains the associated intensities. The file's first row is assumed to contain header information. This .csv file contains all of the m/z and intensity data over the entire range of interest, e.g. 2,000 to 100,000 Daltons. The beginning of such a file might look like this:

M/Z	Intensity
2000.21	.02701
2000.68	-.02636
2001.16	-.04542
2001.63	-.00593
2002.10	.08831

The negative intensities are produced by baseline subtraction. An example of a complete file (called alldata.csv) is here.
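Reading and writing files in this layout takes only a few lines in most languages. The following is a minimal Python sketch rather than part of the R software described here; the helper names read_spectrum and write_spectrum are hypothetical, and it assumes comma-delimited columns with exactly one header row.

```python
import csv


def read_spectrum(path):
    """Read a two-column CSV (m/z, intensity) whose first row is a header."""
    mz, intensity = [], []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for row in reader:
            if not row:
                continue  # tolerate blank lines
            mz.append(float(row[0]))
            intensity.append(float(row[1]))
    return mz, intensity


def write_spectrum(path, mz, intensity, header=("M/Z", "Intensity")):
    """Write m/z and intensity values in the same two-column layout."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for x, y in zip(mz, intensity):
            writer.writerow([x, y])
```

Values such as .02701 and -.02636 (no leading zero) parse directly with float, so the negative, baseline-subtracted intensities need no special handling.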

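A second .csv file with two columns giving the m/z locations of a small number of peaks identified in both spectra: the first column holds the locations in the reference spectrum (m_i) and the second holds the corresponding locations in the spectrum to be adjusted (p_i). For example: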

Reference Values (m_i)	Misaligned Values (p_i)
3247.2	3238.8
5510.9	5496.0
7727.9	7708.5
11034.9	11009.5
13831.7	13800.7

The .csv form of this file (called peaks-spline.csv) is here. The content of the header row is not important as long as the file is comma-delimited.
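Before fitting any correction, it can help to look at how far apart the landmarks in each pair are. The short Python sketch below (helper names are hypothetical) computes m_i - p_i for the five pairs in the table above and checks that both columns increase, a basic sanity check since a correction that preserves peak order requires it. The offsets grow from about 8 Da to about 31 Da across the range, so a single constant shift on the m/z scale would not line these peaks up.

```python
# Landmark pairs copied from the table above: (reference m_i, misaligned p_i).
PEAKS = [
    (3247.2, 3238.8),
    (5510.9, 5496.0),
    (7727.9, 7708.5),
    (11034.9, 11009.5),
    (13831.7, 13800.7),
]


def landmark_offsets(peaks):
    """Return m_i - p_i for each landmark pair, rounded to 0.1 Da."""
    return [round(m - p, 1) for m, p in peaks]


def check_landmarks(peaks):
    """True if both the reference and misaligned columns strictly increase."""
    ms = [m for m, _ in peaks]
    ps = [p for _, p in peaks]
    return (all(a < b for a, b in zip(ms, ms[1:]))
            and all(a < b for a, b in zip(ps, ps[1:])))


print(landmark_offsets(PEAKS))  # [8.4, 14.9, 19.4, 25.4, 31.0]
```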

R code (called cubic-r-program.r) that reads and processes the information within these first two files.

Output from the R code is a third .csv file (alldata-new.csv) in the same format as the first file, containing new intensity values for the same m/z values. As the R code is presently written, files 1, 2, and 3 should all be in the same directory. Rather than editing the file names in the R code, it may be easier to copy the .csv files of interest and rename the copies alldata.csv and peaks-spline.csv, the names the R code expects. The output of the R code is named alldata-new.csv by default; this file can then be renamed as well.
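The contents of cubic-r-program.r are not reproduced here, so the following is only a Python sketch of one way a cubic-spline adjustment of this kind can work, not a transcription of the R code. It assumes a natural cubic spline through the knots (m_i, p_i) that maps each reference-scale m/z value to the location where the same feature sits in the misaligned spectrum; the misaligned intensities are then read at that location by linear interpolation, so the output keeps the original m/z grid. Beyond the outermost landmark peaks the spline is extended linearly, and positions that fall off the end of the data are clamped to the end intensities; both choices, and all function names, are assumptions for illustration.

```python
import bisect


def natural_cubic_spline(xs, ys):
    """Natural cubic spline through (xs, ys); xs must be strictly increasing.
    Outside [xs[0], xs[-1]] the curve is extended linearly (an assumption)."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    if n < 2 or any(step <= 0 for step in h):
        raise ValueError("need at least two strictly increasing knots")
    # Second derivatives M at the knots; natural ends: M[0] = M[-1] = 0.
    M = [0.0] * n
    if n > 2:
        # Tridiagonal system for M[1..n-2], solved by the Thomas algorithm.
        sub = [h[i - 1] for i in range(1, n - 1)]
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n - 1)]
        sup = [h[i] for i in range(1, n - 1)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n - 1)]
        for i in range(1, n - 2):  # forward elimination
            w = sub[i] / diag[i - 1]
            diag[i] -= w * sup[i - 1]
            rhs[i] -= w * rhs[i - 1]
        inner = [0.0] * (n - 2)
        inner[-1] = rhs[-1] / diag[-1]
        for i in range(n - 4, -1, -1):  # back substitution
            inner[i] = (rhs[i] - sup[i] * inner[i + 1]) / diag[i]
        M[1:n - 1] = inner

    def linear_coef(i):
        # First-derivative coefficient of the cubic on [xs[i], xs[i+1]].
        return (ys[i + 1] - ys[i]) / h[i] - h[i] * (2.0 * M[i] + M[i + 1]) / 6.0

    start_slope = linear_coef(0)
    end_slope = (ys[-1] - ys[-2]) / h[-1] + h[-1] * (M[-2] + 2.0 * M[-1]) / 6.0

    def evaluate(x):
        if x <= xs[0]:
            return ys[0] + start_slope * (x - xs[0])
        if x >= xs[-1]:
            return ys[-1] + end_slope * (x - xs[-1])
        i = bisect.bisect_right(xs, x) - 1
        t = x - xs[i]
        return (ys[i] + linear_coef(i) * t + 0.5 * M[i] * t * t
                + (M[i + 1] - M[i]) / (6.0 * h[i]) * t ** 3)

    return evaluate


def interp_linear(x, xs, ys):
    """Linearly interpolate ys at x; clamp to the end values outside xs."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect.bisect_right(xs, x)
    x0, x1 = xs[j - 1], xs[j]
    return ys[j - 1] + (ys[j] - ys[j - 1]) * (x - x0) / (x1 - x0)


def align_spectrum(mz, intensity, ref_peaks, mis_peaks):
    """New intensities on the original m/z grid, read from the misaligned
    spectrum at the spline-mapped locations."""
    to_misaligned = natural_cubic_spline(ref_peaks, mis_peaks)
    return [interp_linear(to_misaligned(x), mz, intensity) for x in mz]
```

With the first column of peaks-spline.csv passed as ref_peaks and the second as mis_peaks, align_spectrum returns one intensity per original m/z value, which matches the layout described for alldata-new.csv. Fitting the spline from reference to misaligned locations (rather than the reverse) avoids having to re-grid the adjusted spectrum afterwards.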

If, for some reason, an investigator would like to use a data transformation that is simpler than a cubic spline, one alternative is to estimate a constant shift on the time scale (as opposed to the m/z scale). In this case the m_i and p_i are found in the same way, but time-of-flight values are used instead of m/z values. Once the m_i and p_i are found, one can calculate the average difference m_i - p_i and add it to the misaligned times as a shift correction. The time scale is suggested because, empirically, a fixed shift in time leads to a non-linear change in m/z. Though the data are not presented here, this approach yields results that are nearly identical to those obtained with the spline correction. It is an open question whether such a simple shift structure would be appropriate when differences between spectra arise from different machines or from other factors that may not be captured by a linear shift on the time or m/z scale.
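The constant time shift can be sketched in Python as below. The source does not say how time-of-flight values relate to m/z, so the idealized relation m/z = K * t**2, with a hypothetical constant K, is assumed only to illustrate why a fixed shift in time is non-linear in m/z; real instruments use their own calibration. All function names are illustrative.

```python
# Hypothetical calibration constant; the source does not give one.
K = 1.0


def mean_time_shift(ref_times, mis_times):
    """Average of (reference - misaligned) over the landmark time pairs."""
    diffs = [r - m for r, m in zip(ref_times, mis_times)]
    return sum(diffs) / len(diffs)


def shift_times(times, shift):
    """Constant correction: add the same shift to every misaligned time."""
    return [t + shift for t in times]


def mz_from_time(t, k=K):
    """Idealized time-of-flight relation m/z = k * t**2 (an assumption)."""
    return k * t * t


def mz_change_for_shift(t, shift, k=K):
    """How far a fixed time shift moves the m/z value observed at time t."""
    return mz_from_time(t + shift, k) - mz_from_time(t, k)


# The same 1-unit time shift moves later (heavier) ions further in m/z.
print(mz_change_for_shift(10.0, 1.0), mz_change_for_shift(20.0, 1.0))  # 21.0 41.0
```

Under this assumed relation the m/z change from a time shift d is K * (2 * t * d + d**2), which grows with t, so a single time offset can produce m/z offsets that increase with mass.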