Expression Package (EXP-PAC)



EXP-PAC is a web-based software package for the upload, management and analysis of gene expression and sequence data. Unique to this package are SQL-based querying of gene expression datasets, distributed normalization of raw gene expression data, and analysis of gene expression data across species. The EXP-PAC source code is also available and can be hosted on an Apache server running on Windows, Linux or Mac, connected to a private or public network.


Operation

The EXP-PAC system combines the features of two pre-existing software packages, MammoSapiens[1] and EST-PAC[2]. MammoSapiens is an e-research system for analysis of gene expression data. EST-PAC is a web-based software package for sequence storage and analysis. Like its predecessors, EXP-PAC is compatible with a number of web browsers, including Firefox, Internet Explorer and Opera.


The EXP-PAC system consists of a web interface which interacts with data uploaded into a MySQL database. Sequence data is stored and analyzed using software packages and databases installed on the web server, such as BLAST[3], ESTScan2[4] and HMM[5]. Gene expression data and results from sequence analysis tools can be displayed by using the query engine. A query interface allows users to search data stored in the EXP-PAC system while hiding the complexities of the query engine language. Using the SQL query engine, stored gene expression data can be filtered, and probes matching the specified query are displayed in a report format.

Features

EXP-PAC provides support for MAGE-tab[6], SOFT[7] and Affymetrix gene expression file formats. Comma-delimited statistical data, for example p-values, can be linked to uploaded gene expression data and searched using the query interface. The statistical search interface supports a number of operators, including equals, greater than and less than. Nucleotide or protein sequence data can also be uploaded to the EXP-PAC system. The sequence upload interface supports FASTA files or direct FASTA input through a text box.
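The statistical search described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the column names (`probe_id`, `p_value`), the sample values and the `filter_stats` helper are hypothetical and are not taken from EXP-PAC itself.

```python
import csv
import io
import operator

# Hypothetical mapping of the three operators named in the text
# to Python comparison functions.
OPERATORS = {
    "equals": operator.eq,
    "greater than": operator.gt,
    "less than": operator.lt,
}

def filter_stats(csv_text, column, op_name, threshold):
    """Return the rows whose numeric `column` satisfies the chosen operator."""
    compare = OPERATORS[op_name]
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if compare(float(row[column]), threshold)]

# Made-up comma-delimited statistics keyed by probe ID.
SAMPLE = "probe_id,p_value\nP1,0.001\nP2,0.2\nP3,0.04\n"

significant = filter_stats(SAMPLE, "p_value", "less than", 0.05)
print([row["probe_id"] for row in significant])
```

Here the "less than" operator keeps only the probes whose p-value falls below the chosen threshold.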

A number of data management scripts are also provided for uploaded gene expression data. Through this system, users can edit, annotate and delete their uploaded gene expression data. In addition to these features, the EXP-PAC system provides SQL-based analysis, parallel normalization and cross-species analysis of gene expression data.

SQL Analysis

EXP-PAC provides users with a web interface which allows gene expression data to be queried. Using drop-down menus and text boxes, users can filter gene expression data by fold change, intensity level, group average, probe ID and annotation.
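As a rough illustration of what a fold-change filter might look like once translated to SQL, the sketch below uses Python's built-in `sqlite3` module in place of MySQL. The `expression` table, its columns and the sample intensities are assumptions made for this example; they do not describe the actual EXP-PAC schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per probe measurement, labelled by group.
conn.execute("CREATE TABLE expression (probe_id TEXT, grp TEXT, intensity REAL)")
conn.executemany(
    "INSERT INTO expression VALUES (?, ?, ?)",
    [
        ("P1", "control", 100.0), ("P1", "control", 120.0),
        ("P1", "treated", 400.0), ("P1", "treated", 480.0),
        ("P2", "control", 200.0), ("P2", "control", 220.0),
        ("P2", "treated", 210.0), ("P2", "treated", 230.0),
    ],
)

# Fold change = treated group average / control group average, per probe.
QUERY = """
SELECT probe_id, fold_change
FROM (
    SELECT probe_id,
           AVG(CASE WHEN grp = 'treated' THEN intensity END) /
           AVG(CASE WHEN grp = 'control' THEN intensity END) AS fold_change
    FROM expression
    GROUP BY probe_id
)
WHERE fold_change >= ?
ORDER BY probe_id
"""

rows = conn.execute(QUERY, (2.0,)).fetchall()
print(rows)
```

The threshold is passed as a bound parameter, which is one way a form field from a web interface could be fed into a query without string concatenation.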

A tool called Query Builder allows users to create more complicated queries through generic interfaces that map to the SQL language. Using this tool, users create a database view by selecting tables and columns from list boxes. The resulting views can then be filtered using text or numeric values and comparison operators. The results of these SQL queries can be saved in a comma-delimited text file.
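The general idea of a form-driven query builder can be sketched in Python as follows. This is not the Query Builder implementation: the `SCHEMA` dictionary, the `probes` table, and the `create_view` and `filter_view_to_csv` helpers are hypothetical, and `sqlite3` stands in for MySQL. Table and column names are checked against a known list because identifiers cannot be passed as bound parameters.

```python
import csv
import io
import sqlite3

# Hypothetical schema that the list boxes would be populated from.
SCHEMA = {"probes": ["probe_id", "annotation", "intensity"]}
ALLOWED_OPS = {"=", "<", ">", "<=", ">=", "LIKE"}

def create_view(conn, view, table, columns):
    """Create a view from a table and columns chosen by the user."""
    if table not in SCHEMA or not columns or not set(columns) <= set(SCHEMA[table]):
        raise ValueError("unknown table or column")
    if not view.isidentifier():
        raise ValueError("invalid view name")
    column_list = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE VIEW "{view}" AS SELECT {column_list} FROM "{table}"')

def filter_view_to_csv(conn, view, column, op, value):
    """Filter a view with one operator and return the result as CSV text."""
    if op not in ALLOWED_OPS or not view.isidentifier() or not column.isidentifier():
        raise ValueError("unsupported filter")
    cursor = conn.execute(
        f'SELECT * FROM "{view}" WHERE "{column}" {op} ? ORDER BY 1', (value,)
    )
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow([d[0] for d in cursor.description])  # header row
    writer.writerows(cursor)
    return out.getvalue()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE probes (probe_id TEXT, annotation TEXT, intensity REAL)")
conn.executemany(
    "INSERT INTO probes VALUES (?, ?, ?)",
    [("P1", "casein", 950.0), ("P2", "lactalbumin", 120.0), ("P3", "kinase", 40.0)],
)

create_view(conn, "probe_view", "probes", ["probe_id", "intensity"])
csv_text = filter_view_to_csv(conn, "probe_view", "intensity", ">", 100)
print(csv_text)
```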

Normalization

EXP-PAC provides web interfaces to execute R and Bioconductor scripts [8]. The EXP-PAC normalization interface allows multiple normalization methods to be applied to a single dataset. Data can be normalized using the mas5 [9], gcrma [10], rma [11], plier [12], invariantset [13] and qspline [14] algorithms. By specifying the location of a Sun Grid Engine scheduler, normalization methods can be distributed over multiple nodes, reducing the time taken for the normalization process. A graphical interface has also been developed to view the distribution of normalized data. This interface places probe intensities into bins and allows dynamic zooming of bins.
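The binning and zooming behaviour of the distribution view can be illustrated with a short Python sketch. It assumes equal-width bins and treats zooming as re-binning a narrower range; the `bin_intensities` function and the sample intensities are hypothetical and say nothing about how the EXP-PAC interface is actually implemented.

```python
def bin_intensities(values, lo, hi, n_bins):
    """Count values falling into n_bins equal-width bins over [lo, hi)."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            # Clamp to the last bin to guard against floating-point rounding.
            index = min(int((v - lo) / width), n_bins - 1)
            counts[index] += 1
    return counts

# Made-up log2 probe intensities after normalization.
intensities = [2.1, 3.5, 3.9, 7.2, 7.8, 11.0, 12.4, 15.9]

overview = bin_intensities(intensities, 0, 16, 4)  # bins of width 4
zoomed = bin_intensities(intensities, 0, 4, 4)     # zoom into the first bin
print(overview, zoomed)
```

Zooming into one bin is simply the same counting step applied to that bin's range with a finer bin width.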

Cross-Species Analysis

Cross-species analysis allows users to query across data from different species. This analysis method creates a map which matches sequence and gene expression probes together using BLAST hits, UniGene IDs and the UniProt database. Once a map has been created, queries can be performed on the joined datasets through the query interface. This method of analysis can provide quick insight into the operation of biological systems such as lactation. When combined with distributed normalization, cross-species analysis can be used to locate key probes more accurately. This is achieved by finding probes whose high levels of gene expression are consistent over multiple normalization methods.
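One way to picture the combination of a cross-species map with multiple normalization methods is the Python sketch below. Everything in it is assumed for illustration: the probe identifiers, the expression values, the threshold, and the rule that "consistent" means at or above the threshold under every method. The real map would be built from BLAST hits, UniGene IDs and UniProt rather than written by hand.

```python
def consistently_high(by_method, threshold):
    """Return probes at or above `threshold` under every normalization method."""
    methods = list(by_method.values())
    # Only consider probes that were measured under all methods.
    probes = set.intersection(*(set(m) for m in methods))
    return {p for p in probes if all(m[p] >= threshold for m in methods)}

# Hypothetical cross-species map: probe in species A -> matching probe in species B.
probe_map = {"bt_001": "hs_101", "bt_002": "hs_102", "bt_003": "hs_103"}

# Made-up normalized expression values, keyed by method and then by probe.
species_a = {
    "rma":  {"bt_001": 11.2, "bt_002": 6.0, "bt_003": 10.5},
    "mas5": {"bt_001": 10.8, "bt_002": 9.9, "bt_003": 10.1},
}
species_b = {
    "rma":  {"hs_101": 10.9, "hs_102": 10.2, "hs_103": 5.5},
    "mas5": {"hs_101": 11.5, "hs_102": 10.0, "hs_103": 6.1},
}

high_a = consistently_high(species_a, 10.0)
high_b = consistently_high(species_b, 10.0)

# Keep mapped pairs where both probes are consistently high.
key_pairs = sorted((a, b) for a, b in probe_map.items() if a in high_a and b in high_b)
print(key_pairs)
```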

MAGE-tab and SOFT Export

Data uploaded into the EXP-PAC system can be exported in MAGE-tab and SOFT formats.

Usage

A version of EXP-PAC is hosted by the International Milk Genomic Consortium [1]. This version has been integrated with data from the International Milk Genomic Consortium web portal.

Notes and references


  1. Christophe Lefèvre, K.R.N., Amit Kumar, Yvan Strahm, David Powell, Torsten Seemann, Kerry A. and Daly, A.B., Karensa Menzies, Julie Sharp, Matthew Digby (2008) MammoSapiens: eResearch of the lactation program. Building online facilities for collaborative molecular and evolutionary analysis of lactation and other biological systems from gene sequences and gene expression data, eResearch Australasia. Sebel and Citigate Hotels, Albert Park in Melbourne, Australia.
  2. Strahm, Y., Powell, D. and Lefevre, C. (2006) EST-PAC a web package for EST annotation and protein sequence prediction, Source Code Biol. Med., 1, 2.
  3. S. F. Altschul, W.G., W. Miller, E. W. Myers, and D. J. Lipman (1990) Basic Local Alignment Search Tool, J Mol Biol, 215, 8.
  4. Iseli, C., Jongeneel, C.V. and Bucher, P. (1999) ESTScan: A Program for Detecting, Evaluating, and Reconstructing Potential Coding Regions in EST Sequences. Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology. AAAI Press.
  5. Sonnhammer, E., Eddy, S., Birney, E., Bateman, A. and Durbin, R. (1998) Pfam: multiple sequence alignments and HMM-profiles of protein domains, Nucl. Acids Res., 26, 320-322.
  6. Rayner, T., Rocca-Serra, P., Spellman, P., Causton, H., Farne, A., Holloway, E., Irizarry, R., Liu, J., Maier, D., Miller, M., Petersen, K., Quackenbush, J., Sherlock, G., Stoeckert, C., White, J., Whetzel, P., Wymore, F., Parkinson, H., Sarkans, U., Ball, C. and Brazma, A. (2006) A simple spreadsheet-based, MIAME-supportive format for microarray data: MAGE-TAB, BMC Bioinformatics, 7, 489.
  7. Barrett, T., Suzek, T.O., Troup, D.B., Wilhite, S.E., Ngau, W.-C., Ledoux, P., Rudnev, D., Lash, A.E., Fujibuchi, W. and Edgar, R. (2005) NCBI GEO: mining millions of expression profiles--database and tools, Nucl. Acids Res., 33, D562-566.
  8. Gentleman, R., Carey, V., Bates, D., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B., Gautier, L., Ge, Y., Gentry, J., Hornik, K., Hothorn, T., Huber, W., Iacus, S., Irizarry, R., Leisch, F., Li, C., Maechler, M., Rossini, A., Sawitzki, G., Smith, C., Smyth, G., Tierney, L., Yang, J. and Zhang, J. (2004) Bioconductor: open software development for computational biology and bioinformatics, Genome Biol., 5, R80.
  9. Hubbell, E., Liu, W.-M. and Mei, R. (2002) Robust estimators for expression analysis, Bioinformatics, 18, 1585-1592.
  10. Wu, Z., Irizarry, R., Gentleman, R., Martinez-Murillo, F. and Spencer, F. A Model-Based Background Adjustment for Oligonucleotide Expression Arrays, J. Am. Stat. Assoc., 99, 909.
  11. Irizarry, R.A., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U. and Speed, T.P. (2003) Exploration, normalization, and summaries of high density oligonucleotide array probe level data, Biostat, 4, 249-264.
  12. Affymetrix, I. (2005) Technical note: guide to probe logarithmic intensity error (PLIER) estimation.
  13. Li, C. and Wong, W.H. (2001) Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection, Proc Natl Acad Sci U S A, 98, 31-36.
  14. Workman, C., Jensen, L., Jarmer, H., Berka, R., Gautier, L., Nielser, H., Saxild, H.-H., Nielsen, C., Brunak, S. and Knudsen, S. (2002) A new non-linear normalization method for reducing variability in DNA microarray experiments, Genome Biol., 3, research0048.0041 - research0048.0016.