Mass spectrometry data formats
From Wikipedia, the free encyclopedia
Mass spectrometry is a scientific technique for measuring the mass-to-charge ratio of ions. It is often coupled to chromatographic techniques such as gas or liquid chromatography and has found widespread adoption in analytical chemistry and biochemistry, where it can be used to identify and characterize small molecules and proteins (proteomics). The large volume of data produced in a typical mass spectrometry experiment requires computers for data storage and processing. Over the years, manufacturers of mass spectrometers have developed various proprietary data formats for handling such data, which makes it difficult for academic scientists to manipulate their data directly. To address this limitation, several open, XML-based data formats have been developed by the Trans-Proteomic Pipeline at the Institute for Systems Biology to facilitate data manipulation and innovation in the public sector. These data formats are described here.
mzXML
mzXML is an XML (eXtensible Markup Language) based common file format for proteomics mass spectrometric data.[1][2] Most mass spectrometers do not produce mzXML data directly, but several tools are available that generate mzXML files from native acquisition files. An open source project known as Sashimi[3] offers a collection of converter programs for some common mass spectrometric file formats. Converters are currently available from Sashimi for ThermoFinnigan (Xcalibur format, using ReAdW), Micromass (MassLynx format, using MassWolf) and SCIEX/ABI (Analyst format, using mzStar). Bruker's free CompassXport tool generates mzXML (and now mzData) files from many of Bruker's native file formats. A Java converter that translates between mzData, mzXML and mzML in any direction is publicly available.[4]
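As a rough illustration of what reading mzXML involves, the sketch below decodes the text content of a single peaks element into (m/z, intensity) pairs. It assumes the common layout in which the peak list is base64 encoded, stored in network (big-endian) byte order, interleaved as m/z followed by intensity, and left uncompressed. The function name `decode_mzxml_peaks` and the sample values are hypothetical and exist only for this example; a real file should be checked against the attributes it declares.

```python
import base64
import struct


def decode_mzxml_peaks(encoded: str, precision: int = 32) -> list[tuple[float, float]]:
    """Decode base64 peak text into (m/z, intensity) pairs.

    Assumptions (not taken from the article): big-endian byte order,
    interleaved m/z-intensity pairs, no compression.
    """
    if precision not in (32, 64):
        raise ValueError("precision must be 32 or 64")
    fmt_char = "f" if precision == 32 else "d"
    raw = base64.b64decode(encoded)
    count = len(raw) // (precision // 8)
    # ">" selects big-endian (network) byte order.
    values = struct.unpack(f">{count}{fmt_char}", raw)
    # Even positions hold m/z values, odd positions hold intensities.
    return list(zip(values[0::2], values[1::2]))


# Round trip with made-up values (not real instrument data).
example = base64.b64encode(struct.pack(">4f", 100.0, 10.0, 200.5, 20.0)).decode("ascii")
peaks = decode_mzxml_peaks(example)
```

Keeping the decoder separate from XML parsing lets the same function be reused with whichever XML reader extracts the element text.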
mzData
The Human Proteome Organization (HUPO) has developed a common file format called mzData which offers similar functionality to mzXML.[5]
mzML
The existence of two competing standard formats for proteomics data is undesirable. The developers of mzData and mzXML therefore worked on a joint format called mzML.[5][6] As of 2008-06-01, mzML 1.0.0 was ready. The format was officially released at the 2008 American Society for Mass Spectrometry meeting.[7][8]
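For comparison with mzXML, the following sketch decodes one mzML binary data array. It assumes that the array is base64 encoded, stored in little-endian byte order, and optionally zlib compressed, with the precision and compression declared in the file for each array. The function name `decode_mzml_array`, its parameters, and the sample values are hypothetical; this is an outline under those assumptions, not the reference implementation.

```python
import base64
import struct
import zlib


def decode_mzml_array(encoded: str, bits: int = 64, compressed: bool = False) -> list[float]:
    """Decode one base64 binary array into a list of floats.

    Assumptions (not taken from the article): little-endian byte order,
    optional zlib compression applied before base64 encoding.
    """
    if bits not in (32, 64):
        raise ValueError("bits must be 32 or 64")
    raw = base64.b64decode(encoded)
    if compressed:
        raw = zlib.decompress(raw)
    fmt_char = "d" if bits == 64 else "f"
    count = len(raw) // (bits // 8)
    # "<" selects little-endian byte order.
    return list(struct.unpack(f"<{count}{fmt_char}", raw))


# Round trip with made-up m/z values (not real instrument data).
packed = struct.pack("<3d", 400.25, 401.5, 402.75)
encoded_mz = base64.b64encode(zlib.compress(packed)).decode("ascii")
mz_values = decode_mzml_array(encoded_mz, bits=64, compressed=True)
```

Unlike the interleaved mzXML sketch, m/z and intensity values are treated here as separate arrays, so a spectrum would need one call per array.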
Viewers
Known viewers for mzXML and mzData:
Converters
Known converters for mzData to mzXML:
- Hermes: Java-based application with a graphical user interface, Institute of Molecular Systems Biology, ETH Zurich[15]
- FileConverter: A command line tool that converts to/from various mass spectrometry formats,[16] part of TOPP[17]
Known converters for mzXML:
Known converters for mzML:
- msConvert: A command line tool that converts to/from various mass spectrometry formats. The reference implementation of mzML has been provided by the ProteoWizard project.[19]
- ReAdW: The Institute for Systems Biology command line converter for Thermo RAW files, part of the Trans-Proteomic Pipeline
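Command line converters such as msConvert are often driven from scripts when many files need converting. The sketch below builds and runs such a call from Python. It assumes that ProteoWizard's executable is installed on the PATH under the name `msconvert` and that it accepts `--mzML` or `--mzXML` to select the output format and `-o` to set the output directory. The helper names and the file name `run01.raw` are hypothetical.

```python
import shutil
import subprocess


def build_msconvert_command(input_file: str, output_dir: str, fmt: str = "mzML") -> list[str]:
    """Return the argument list for one conversion (assumed msconvert flags)."""
    if fmt not in ("mzML", "mzXML"):
        raise ValueError("fmt must be 'mzML' or 'mzXML'")
    return ["msconvert", input_file, f"--{fmt}", "-o", output_dir]


def convert(input_file: str, output_dir: str, fmt: str = "mzML") -> None:
    """Run the conversion, failing early if the executable is missing."""
    if shutil.which("msconvert") is None:
        raise FileNotFoundError("msconvert was not found on PATH")
    # check=True raises CalledProcessError if the converter exits with an error.
    subprocess.run(build_msconvert_command(input_file, output_dir, fmt), check=True)


# Building the command does not require msconvert to be installed.
command = build_msconvert_command("run01.raw", "converted")
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.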
Converters for proprietary formats:
- MASSTransit: Software for converting data between proprietary formats, distributed by Scientific Instrument Services, Inc.[20]
Compressors
- mzSquash: Command line utilities and Java API to compress and uncompress mzML files.[21]
See also
References
1. Pedrioli PG, Eng JK, Hubley R, Vogelzang M, Deutsch EW, Raught B, Pratt B, Nilsson E, Angeletti RH, Apweiler R, Cheung K, Costello CE, Hermjakob H, Huang S, Julian RK, Kapp E, McComb ME, Oliver SG, Omenn G, Paton NW, Simpson R, Smith R, Taylor CF, Zhu W, Aebersold R (2004). "A common open representation of mass spectrometry data and its application to proteomics research". Nat. Biotechnol. 22 (11): 1459–66. PMID 15529173.
2. Lin SM, Zhu L, Winter AQ, Sasinowski M, Kibbe WA (2005). "What is mzXML good for?". Expert Review of Proteomics 2 (6): 839–45. PMID 17342793.
3. "Sashimi". http://sashimi.sourceforge.net. Retrieved on 2007-10-11.
4. Hermes website
5. Orchard S, Montechi-Palazzi L, Deutsch EW, Binz PA, Jones AR, Paton N, Pizarro A, Creasy DM, Wojcik J, Hermjakob H (2007). "Five years of progress in the Standardization of Proteomics Data 4th Annual Spring Workshop of the HUPO-Proteomics Standards Initiative April 23-25, 2007 Ecole Nationale Supérieure (ENS), Lyon, France". Proteomics 7 (19): 3436–40. PMID 17907277.
6. "mzML". http://www.psidev.info/index.php?q=node/257. Retrieved on 2007-10-11.
7. "mzML". http://www.psidev.info/index.php?q=node/257. Retrieved on 2008-06-30.
8. HUPO-PSI
9. Insilicos website
10. MS-Spectre website
11. OpenMS and TOPP website
12. An open source viewer developed under academic projects
13. An open source viewer developed by Matt Chambers at Vanderbilt
14. An open source viewer developed at the Fred Hutchinson Cancer Center
15. Hermes
16. FileConverter
17. TOPP
18. "mzXML". http://tools.proteomecenter.org/wiki/index.php?title=Formats:mzXML. Retrieved on 2008-06-30.
19. "ProteoWizard". http://proteowizard.sourceforge.net. Retrieved on 2008-06-30.
20. http://www.sisweb.com/software/masstransit.htm
21. mzSquash

