Open Access Research article

Assembling proteomics data as a prerequisite for the analysis of large scale experiments

Frank Schmidt1,4, Monika Schmid1, Bernd Thiede2, Klaus-Peter Pleißner3, Martina Böhme3 and Peter R Jungblut1

Max Planck Institute for Infection Biology, Core Facility Protein Analysis, Berlin, Germany

The Biotechnology Centre of Oslo, University of Oslo, Oslo, Norway

Max Planck Institute for Infection Biology, Core Facility Bioinformatics, Berlin, Germany

Interfaculty Institute for Genetics and Functional Genomics, University of Greifswald, Greifswald, Germany


Chemistry Central Journal 2009, 3:2 doi:10.1186/1752-153X-3-2

Published: 23 January 2009

Abstract

Background

Despite the complete determination of the genome sequences of a large number of bacteria, their proteomes remain relatively poorly defined. Besides new methods to increase the number of identified proteins, new database applications are needed to store and present the results of large-scale proteomics experiments.

Results

In the present study, a database concept was developed to address these issues and to offer complete information via a web interface. In our concept, the Oracle-based data repository system SQL-LIMS plays the central role in the proteomics workflow and was applied to the proteomes of Mycobacterium tuberculosis, Helicobacter pylori and Salmonella typhimurium, and to protein complexes such as the 20S proteasome. The technical operations of our proteomics labs were used as the standard for SQL-LIMS template creation. By means of a Java-based data parser, post-processed data from different approaches, such as LC/ESI-MS, MALDI-MS and 2-D gel electrophoresis (2-DE), were stored in SQL-LIMS. A minimum set of the proteomics data was transferred into our public 2D-PAGE database using a Java-based interface (Data Transfer Tool), in line with the requirements of the PEDRo standardization. Furthermore, the stored proteomics data could be extracted from SQL-LIMS via XML.
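As a rough illustration of the parsing and XML extraction steps, the following minimal sketch maps method-specific records onto one common layout and serializes them as XML. It is written in Python for brevity, whereas the tools described here are Java-based, and it is not the authors' parser. All field names (`spot`, `hit`, `run`, `protein_id`, `gel`, `spot_label`), the XML element names and the function names are hypothetical, since the abstract does not describe the SQL-LIMS templates or the export schema.

```python
import xml.etree.ElementTree as ET


def normalize(method, raw):
    """Map a method-specific record onto one common layout.

    The input keys are hypothetical placeholders for whatever each
    instrument's post-processing software actually reports.
    """
    if method == "MALDI-MS":
        return {"method": method, "sample": raw["spot"], "protein": raw["hit"]}
    if method == "LC/ESI-MS":
        return {"method": method, "sample": raw["run"], "protein": raw["protein_id"]}
    if method == "2-DE":
        return {"method": method, "sample": raw["gel"], "protein": raw["spot_label"]}
    raise ValueError(f"unsupported method: {method}")


def to_xml(submission_id, records):
    """Serialize normalized records as one XML document per submission."""
    root = ET.Element("submission", id=submission_id)
    for rec in records:
        node = ET.SubElement(root, "result", method=rec["method"])
        ET.SubElement(node, "sample").text = rec["sample"]
        ET.SubElement(node, "protein").text = rec["protein"]
    return ET.tostring(root, encoding="unicode")
```

Calling `to_xml("SUB-1", [normalize("MALDI-MS", {"spot": "A1", "hit": "protein-A"})])` would yield a single `submission` element containing one `result` child. The identifiers and values in that call are made-up placeholders, not data from the study.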

Conclusion

The Oracle-based data repository system SQL-LIMS played the central role in the proteomics workflow concept. The technical operations of our proteomics labs served as the standard for the SQL-LIMS templates. Using a Java-based parser, post-processed data from different approaches, such as LC/ESI-MS, MALDI-MS, 1-DE and 2-DE, were stored in SQL-LIMS. In this way, the instrument-specific data formats were unified and stored in SQL-LIMS tables. Moreover, a unique submission identifier allowed fast access to all experimental data. This was the main advantage over multi-software solutions, especially when staff turnover is high. Large-scale and high-throughput experiments must be managed in a comprehensive repository system such as SQL-LIMS so that results can be queried systematically. On the other hand, such database systems are expensive and require at least one full-time administrator and a specialized lab manager. In addition, the rapid pace of technical change in proteomics may make it difficult to accommodate new data formats. In summary, SQL-LIMS met the requirements of proteomics data handling, especially for skilled processes such as gel electrophoresis and mass spectrometry, and fulfilled the PSI standardization criteria. Transferring the data into the public domain via the Data Transfer Tool (DTT) facilitated validation of the proteomics data. Additionally, evaluating mass spectra by post-processing with MS-Screener improved the reliability of mass analysis and prevented the storage of junk data.
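The role of the submission identifier can be sketched as follows, under loud assumptions: an in-memory SQLite database stands in for the Oracle-based SQL-LIMS, and the table name, column names and helper functions are hypothetical rather than taken from the actual SQL-LIMS schema. The sketch only shows how results from different methods, once stored in one table, can be retrieved together through a single indexed identifier.

```python
import sqlite3


def build_repository():
    """Create an in-memory stand-in for a unified results table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE results ("
        " submission_id TEXT NOT NULL,"
        " method TEXT NOT NULL,"
        " protein TEXT NOT NULL)"
    )
    # An index on the identifier keeps per-submission lookups fast as the table grows.
    conn.execute("CREATE INDEX idx_submission ON results (submission_id)")
    return conn


def store(conn, submission_id, method, protein):
    """Insert one normalized result row, whatever method produced it."""
    conn.execute(
        "INSERT INTO results VALUES (?, ?, ?)",
        (submission_id, method, protein),
    )


def fetch_submission(conn, submission_id):
    """Return all (method, protein) rows for one submission, ordered by method."""
    cur = conn.execute(
        "SELECT method, protein FROM results"
        " WHERE submission_id = ? ORDER BY method",
        (submission_id,),
    )
    return cur.fetchall()
```

With rows stored for several methods under the same identifier, one call to `fetch_submission` returns them together, which is the kind of access that is harder to obtain when each instrument's software keeps its own files.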

