To implement this algorithm, it is assumed that the investigator has:
- A .csv file with two columns, the first with putative m/z values
and the second containing the associated intensities. It is
assumed that the file's first row contains header information. This
.csv file contains all the mass/intensity data over the entire range
of interest, e.g. 2,000 to 100,000 Daltons. The beginning of such
a file might look like this:
M/Z     | Intensity
2000.21 | .02701
2000.68 | -.02636
2001.16 | -.04542
2001.63 | -.00593
2002.10 | .08831
The negative intensities are produced by baseline subtraction. An
example of a complete file (called alldata.csv) is here.
- A second .csv file with two columns containing the mass locations
of a small number of peaks in the reference spectrum and the spectrum
to be adjusted.
Reference Values (m_i) | Misaligned Values (p_i)
3247.2                 | 3238.8
5510.9                 | 5496.0
7727.9                 | 7708.5
11034.9                | 11009.5
13831.7                | 13800.7
The .csv form of this file (called peaks-spline.csv) is here.
The content of the header information is not important as long as it is
comma-delimited.
- R code (called cubic-r-program.r) that reads and processes
the information within these first two files.
Output from the R code (alldata-new.csv) is a third .csv file
in the same format as the first file, with new intensity values for the
same m/z values. As the R code is presently written, files 1, 2,
and 3 should all be in the same directory. Rather than renaming
files in the R code, it may be easier to make copies of particular .csv
files and rename them alldata.csv and peaks-spline.csv, the names of
the files in the R code. The output of the R code is by default
named alldata-new.csv; this file could then also be renamed.
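The workflow described above can be sketched in code. The following is a minimal Python sketch, not the cubic-r-program.r code itself, which is not reproduced here. It assumes that alldata.csv holds the spectrum to be adjusted, that the correction fits a natural cubic spline mapping the misaligned peak values onto the reference values, and that the intensities are then linearly interpolated back onto the original m/z grid so that the output keeps the same m/z values. The function names, the natural boundary condition, the linear extrapolation outside the peak range, and the linear interpolation step are all assumptions made for illustration.

```python
import bisect
import csv


def read_two_columns(path):
    """Read a two-column .csv file, skipping the header row."""
    with open(path, newline="") as fh:
        rows = csv.reader(fh)
        next(rows)  # header content is ignored
        pairs = [(float(r[0]), float(r[1])) for r in rows if len(r) >= 2]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def natural_cubic_spline(xs, ys):
    """Return f(x), a natural cubic spline through (xs, ys).

    xs must be strictly increasing. Outside the knots the curve is
    extended linearly (an assumption of this sketch).
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    # Solve the tridiagonal system for the second-derivative terms c.
    l, mu, z = [1.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    b, c, d = [0.0] * n, [0.0] * (n + 1), [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])
    end_slope = b[n - 1] + 2 * c[n - 1] * h[n - 1] + 3 * d[n - 1] * h[n - 1] ** 2

    def f(x):
        if x <= xs[0]:
            return ys[0] + b[0] * (x - xs[0])
        if x >= xs[n]:
            return ys[n] + end_slope * (x - xs[n])
        j = bisect.bisect_right(xs, x) - 1
        t = x - xs[j]
        return ys[j] + t * (b[j] + t * (c[j] + t * d[j]))

    return f


def interp_linear(x, xs, ys):
    """Linear interpolation at x; end values are held outside the range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect.bisect_right(xs, x) - 1
    w = (x - xs[j]) / (xs[j + 1] - xs[j])
    return ys[j] + w * (ys[j + 1] - ys[j])


def realign(mz, intensity, ref_peaks, mis_peaks):
    """Return new intensities on the original m/z grid."""
    warp = natural_cubic_spline(mis_peaks, ref_peaks)  # misaligned -> reference
    corrected = [warp(x) for x in mz]  # assumed to remain strictly increasing
    return [interp_linear(x, corrected, intensity) for x in mz]


def realign_files(data_path, peaks_path, out_path):
    """Read both input files and write the realigned spectrum."""
    mz, intensity = read_two_columns(data_path)
    ref_peaks, mis_peaks = read_two_columns(peaks_path)  # reference column first
    new_intensity = realign(mz, intensity, ref_peaks, mis_peaks)
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["M/Z", "Intensity"])
        writer.writerows(zip(mz, new_intensity))
```

Under these assumptions, a call such as realign_files("alldata.csv", "peaks-spline.csv", "alldata-new.csv") would mirror the file names used by the R code. With only a handful of peaks, the behaviour well outside the peak range depends heavily on the extrapolation choice and should be checked against the actual spectra.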
If, for some reason, an investigator would like to use a data
transformation that is simpler than a cubic spline, one alternative
is to estimate a constant shift on the time scale (as
opposed to the m/z scale). In this instance the same approach of
finding m_i and p_i would be performed, but one would
use time-of-flight values instead of m/z. Once the m_i and p_i
are found, one can calculate the average difference between the
m_i and p_i and use this average as a shift correction for the
misaligned times. It is suggested that the time scale be used because,
empirically, a fixed shift in time leads to a non-linear change in
m/z. Though the data are not presented here, this approach yields
results that are nearly identical to those obtained with the spline
correction. There is some question as to whether such a simple
shift structure would be appropriate when differences in spectra arise
from different machines or other factors that may not be captured by
a linear shift on the time or m/z scale.
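The constant-shift alternative can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the shift is taken as the mean of (reference minus misaligned) peak times, and the conversion from time to m/z uses an idealized quadratic calibration, m/z = a * (t - t0)**2, with made-up constants a and t0. A real instrument's calibration would need to be substituted.

```python
from statistics import mean


def mean_time_shift(ref_times, mis_times):
    """Average difference between reference and misaligned peak times."""
    return mean([r - p for r, p in zip(ref_times, mis_times)])


def shift_times(times, shift):
    """Apply the same constant shift to every misaligned time."""
    return [t + shift for t in times]


def tof_to_mz(t, a=1.0, t0=0.0):
    """Idealized calibration (an assumption): m/z = a * (t - t0)**2."""
    return a * (t - t0) ** 2


# Hypothetical peak times, in arbitrary units, for illustration only.
ref_times = [10.5, 20.5, 30.5]
mis_times = [10.0, 20.0, 30.0]
shift = mean_time_shift(ref_times, mis_times)
corrected = shift_times(mis_times, shift)

# The same shift in time moves m/z by different amounts at different times.
small_t_change = tof_to_mz(50.0 + shift) - tof_to_mz(50.0)    # 50.25
large_t_change = tof_to_mz(100.0 + shift) - tof_to_mz(100.0)  # 100.25
```

Under the quadratic assumption, a shift of 0.5 time units changes m/z by 50.25 at t = 50 but by 100.25 at t = 100, which is one way to see why a fixed shift in time corresponds to a non-linear change on the m/z scale.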