Abstract

Comparison of Programmatic Approaches for Efficient Accessing to mzML Files

Miroslaw J. Gilski and Rovshan G. Sadygov

The Human Proteome Organization (HUPO) Proteomics Standards Initiative has been tasked with developing file formats for proteomics experiments: mzML for storing raw data, and mzIdentML for storing the results of spectral processing (protein identification and quantification). Special data types have been designed so that complex experiments can be fully characterized. Standardized file formats will promote visualization, validation and dissemination of data independently of vendor-specific binary storage files. Innovative programmatic solutions for robust and efficient access to these formats will contribute to their more rapid, wide-scale acceptance by the proteomics community. In this work, we compare algorithms for accessing spectral data in the mzML file format. Because mzML is XML-based, its data structures can be parsed efficiently with XML-specific class types. These classes, however, provide only sequential access to files, whereas many algorithmic applications for processing proteomics datasets need random access to spectral data. Here, we demonstrate the use of memory streams to convert sequential access into random access. Our application preserves the elegant XML parsing capabilities. Benchmarking of file access times in sequential and random access modes shows that random access is more time-efficient when a small number of spectra is retrieved, while sequential access becomes more efficient when a large number of spectra is retrieved. We also provide comparisons to other file access methods from academia and industry.
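The contrast between sequential and stream-based random access can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say which language or classes they used, so `io.BytesIO` stands in for a memory stream and `xml.etree.ElementTree` for the XML-specific parser. The toy `SAMPLE` document and the helper names `build_offset_index`, `read_spectrum` and `iter_spectra` are hypothetical, and the sample omits the namespaces, binary arrays and index list of a real mzML file.

```python
import io
import re
import xml.etree.ElementTree as ET

# Toy stand-in for an mzML file (assumption: no namespaces, no binary data).
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<mzML>
  <run id="demo">
    <spectrumList count="3">
      <spectrum index="0" id="scan=1"><cvParam name="ms level" value="1"/></spectrum>
      <spectrum index="1" id="scan=2"><cvParam name="ms level" value="2"/></spectrum>
      <spectrum index="2" id="scan=3"><cvParam name="ms level" value="2"/></spectrum>
    </spectrumList>
  </run>
</mzML>
"""


def build_offset_index(data):
    """One sequential pass: map each spectrum id to its (start, end) byte range."""
    index = {}
    # \b keeps the pattern from matching the enclosing <spectrumList> element.
    for match in re.finditer(rb"<spectrum\b.*?</spectrum>", data, re.DOTALL):
        spec_id = re.search(rb'\bid="([^"]*)"', match.group(0)).group(1).decode()
        index[spec_id] = (match.start(), match.end())
    return index


def read_spectrum(stream, index, spec_id):
    """Random access: seek to one spectrum and parse only that fragment."""
    start, end = index[spec_id]
    stream.seek(start)
    # Wrap the fragment in its own memory stream so the XML parser can be reused.
    fragment = io.BytesIO(stream.read(end - start))
    return ET.parse(fragment).getroot()


def iter_spectra(stream):
    """Sequential access: stream through the whole document once."""
    stream.seek(0)
    for _, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "spectrum":
            yield elem


stream = io.BytesIO(SAMPLE)
index = build_offset_index(SAMPLE)

one = read_spectrum(stream, index, "scan=2")            # touches one spectrum
all_ids = [s.get("id") for s in iter_spectra(stream)]   # touches every spectrum
```

In this sketch the random-access path pays a fixed cost per spectrum (a seek plus a small parse), while the sequential path pays a single cost for the whole file. That is consistent with the trade-off the abstract reports, although the sketch itself measures nothing.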