Tutorial: Linking models to data and experimental predictions via SB-Pipeline
From Q-bio
Arthur Goldsipe, Gerard Ostheimer, Julio Saez-Rodriguez (Massachusetts Institute of Technology)
- Abstract
- The process of constructing and testing models, particularly those that incorporate significant prior knowledge, involves multiple steps that are currently very poorly integrated. We have developed SB-Pipeline to create an effective workflow based on open-source code, public standards and modern software practice. SB-Pipeline is a multi-faceted software platform that brings together all of the steps involved in collecting and transforming primary data; constructing, annotating and calibrating models; and distributing and sharing simulations and analyses. First, SB-Pipeline is primarily concerned with data and model management for the purpose of calibration; in existing modeling tools, calibration is an ancillary rather than a primary activity. SB-Pipeline does not, however, recreate existing tools for model simulation. Second, SB-Pipeline implements a robust system for tracking the provenance of data, the links between data and models, and the origins of model assumptions in data or the literature. Third, SB-Pipeline is a collection of discrete but interoperable software tools rather than a single integrated system. Because SB-Pipeline incorporates standard protocols for importing and exporting data, its components can be integrated into external packages. In this workshop we will present, with hands-on exercises, three elements of SB-Pipeline: DataRail, CellNetOptimizer and SB-Wiki. Examples will be drawn from quantitative measurements and PCA-PLSR modeling of the cellular response to DNA damage.
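The provenance tracking described above can be illustrated with a minimal Python sketch. The `DataArray` and `Model` classes, their fields and the data set names are hypothetical, not SB-Pipeline's actual schema (DataRail itself is a MATLAB toolbox); the sketch only assumes that each derived data set keeps a link to its parent and to the transformation that produced it, so that the data used to calibrate a model can be traced back to the raw measurements.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataArray:
    """Hypothetical record of a data set and the step that produced it."""
    name: str
    parent: Optional[DataArray] = None
    transform: Optional[str] = None  # description of the step applied to parent

    def derive(self, name: str, transform: str) -> DataArray:
        # A derived array keeps a link to its parent and to the transformation.
        return DataArray(name=name, parent=self, transform=transform)

    def lineage(self) -> list[str]:
        # Walk back to the raw data; return the steps from oldest to newest.
        steps = []
        node = self
        while node.parent is not None:
            steps.append(f"{node.parent.name} -> {node.name}: {node.transform}")
            node = node.parent
        return steps[::-1]


@dataclass
class Model:
    """Hypothetical model record linked to the data used to calibrate it."""
    name: str
    calibrated_on: list[DataArray] = field(default_factory=list)


raw = DataArray("raw_measurements")
normalized = raw.derive("normalized", "divide by time-zero value")
scaled = normalized.derive("scaled", "rescale each signal to [0, 1]")
model = Model("example_model", calibrated_on=[scaled])

# Print how each calibration data set was derived from the raw data.
for data in model.calibrated_on:
    for step in data.lineage():
        print(step)
```

Storing the parent link on each record, rather than in a separate log, means the lineage cannot drift out of sync with the data it describes.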
- DataRail is an open-source MATLAB toolbox for managing, transforming, visualizing and modeling data, especially the varied high-throughput data encountered in Systems Biology (Saez-Rodriguez et al., Bioinformatics, 2008). It supports data-driven models, in particular Multiple Linear Regression (MLR) and Partial Least Squares Regression (PLSR). In addition, data can be exported in a minimal information standard (MIDAS; Minimal Information for Data Analysis in Systems Biology) as well as in formats specific to topology-based modeling platforms, such as the differential-equation-based PottersWheel and the Boolean-based CellNetAnalyzer.
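A minimal NumPy sketch of the two regression models named above may help readers see the arithmetic. It is not DataRail's MATLAB implementation: MLR is written as ordinary least squares with an intercept, and PLSR as single-response PLS (PLS1) using the NIPALS algorithm. The function names and the synthetic data are illustrative assumptions.

```python
import numpy as np


def fit_mlr(X, y):
    """Multiple linear regression by ordinary least squares, with an intercept."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef  # coef[0] is the intercept, coef[1:] are the slopes


def fit_pls1(X, y, n_components):
    """Single-response PLS regression (PLS1) via NIPALS.

    Returns (x_mean, y_mean, B); predictions are (X - x_mean) @ B + y_mean.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean        # centered copies, deflated below
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)          # weight vector (unit length)
        t = Xk @ w                         # scores of this latent variable
        tt = t @ t
        p = Xk.T @ t / tt                  # X loadings
        c = (yk @ t) / tt                  # y loading
        Xk = Xk - np.outer(t, p)           # remove what this component explains
        yk = yk - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)    # coefficients in original X space
    return x_mean, y_mean, B


def predict_pls1(model, X):
    x_mean, y_mean, B = model
    return (X - x_mean) @ B + y_mean


# Synthetic example: 20 samples, 3 predictors, noise-free linear response.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0

mlr_coef = fit_mlr(X, y)
pls_model = fit_pls1(X, y, n_components=2)
print("MLR coefficients:", np.round(mlr_coef, 3))
print("PLS1 (2 components) coefficients:", np.round(pls_model[2], 3))
```

With as many components as predictors (and more samples than predictors), PLS1 reproduces the least-squares fit. PLSR earns its keep when fewer components are retained, which stabilizes the fit when predictors are numerous and correlated; in practice the number of components is usually chosen by cross-validation.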
- CellNetOptimizer (CNO) is an add-on to DataRail that allows users to investigate the discrepancies between a particular network topology, defined as a Boolean network, and a set of high-throughput data. Furthermore, CNO identifies the network topology that optimally describes a data set. CNO uses CellNetAnalyzer as an engine to perform discrete simulations and is currently being extended to a continuous description based on a fuzzy logic formalism.
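The kind of comparison CNO performs can be sketched in Python under simplifying assumptions. Here a network is written as AND/OR logic: each candidate edge lists inputs that must all be ON to turn its target ON, and several edges into one node are combined with OR. Each experiment clamps stimuli ON and inhibited nodes OFF, the network is iterated to a logical steady state, and discrepancies are counted against binarized readouts. The node names, candidate edges and toy data are hypothetical, the simulator is not CellNetAnalyzer's engine, and the brute-force search over edge subsets (with a small penalty per edge) stands in for whatever optimizer CNO actually uses.

```python
from itertools import combinations

NODES = ["A", "B", "X", "Y", "Z"]  # hypothetical: A, B are stimuli; X, Y, Z readouts

# Hypothetical candidate edges as (inputs, target): all inputs must be ON (AND);
# any active edge into a node turns it ON (OR).
CANDIDATE_EDGES = [
    (("A",), "X"),
    (("B",), "X"),
    (("A", "B"), "Y"),
    (("X",), "Z"),
    (("B",), "Z"),
]

# Toy binarized data: ((stimuli, inhibited nodes), observed readouts).
DATA = [
    (({"A"}, set()), {"X": 1, "Y": 0, "Z": 1}),
    (({"B"}, set()), {"X": 0, "Y": 0, "Z": 0}),
    (({"A", "B"}, set()), {"X": 1, "Y": 1, "Z": 1}),
    (({"A", "B"}, {"X"}), {"X": 0, "Y": 1, "Z": 0}),
]


def simulate(edges, stimuli, inhibited, max_steps=50):
    """Synchronous Boolean updates until a fixed point (logical steady state)."""
    state = {n: n in stimuli for n in NODES}
    for _ in range(max_steps):
        new = {}
        for n in NODES:
            if n in inhibited:
                new[n] = False  # inhibitors clamp a node OFF
            elif n in stimuli:
                new[n] = True  # stimuli clamp a node ON
            else:
                new[n] = any(all(state[i] for i in inputs)
                             for inputs, target in edges if target == n)
        if new == state:
            return state
        state = new
    raise RuntimeError("no steady state reached")


def mismatches(edges, data):
    """Number of readouts where the simulation disagrees with the data."""
    total = 0
    for (stimuli, inhibited), observed in data:
        state = simulate(edges, stimuli, inhibited)
        total += sum(state[n] != bool(v) for n, v in observed.items())
    return total


def best_topology(candidates, data, size_penalty=0.1):
    """Exhaustive search over edge subsets; score = mismatches + penalty * size."""
    best, best_score = None, float("inf")
    for k in range(len(candidates) + 1):
        for subset in combinations(candidates, k):
            score = mismatches(subset, data) + size_penalty * len(subset)
            if score < best_score:
                best, best_score = list(subset), score
    return best, best_score


print("mismatches with all candidate edges:", mismatches(CANDIDATE_EDGES, DATA))
edges, score = best_topology(CANDIDATE_EDGES, DATA)
for inputs, target in edges:
    print(" AND ".join(inputs), "->", target)
```

On this toy data the search keeps A -> X, A AND B -> Y and X -> Z, the only subset with no mismatches. The size penalty is kept below 1 so that agreement with the data always outweighs model size. Brute force grows as 2^n in the number of candidate edges, so realistic networks need a heuristic optimizer.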
- SB-Wiki is a wiki-based system that uses Semantic Web technologies and a lightweight data entry and cataloging framework to support collaborative management of unstructured and semi-structured Systems Biology data. It aims to capture high-level knowledge about models, specifically the assumptions needed to interpret a model, which until now have not been encoded in any concrete form (particularly before publication), along with lower-level SBML-compliant RDF annotations. SB-Wiki also tracks the experimental data used to train or evaluate models, such as the biological setting (species, cell type, growth or culture conditions) and the experimental protocol. On this point SB-Wiki integrates well with DataRail, but each tool can also be used independently.
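SB-Wiki's semantic annotations are RDF, which means every fact is a subject-predicate-object triple. A minimal pure-Python sketch of how such triples could link a model to its assumptions and training data is given here; the identifiers and predicate names (`sbw:assumes`, `sbw:trainedOn`, and so on) are hypothetical placeholders, not SB-Wiki's vocabulary. A real deployment would use proper URIs and an RDF library such as rdflib, whose `Graph.triples((s, p, o))` accepts `None` as a wildcard in the same way as the `query` method below.

```python
from collections import namedtuple

# One RDF-style statement: subject, predicate, object.
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])


class TripleStore:
    """Tiny in-memory triple store (illustration only, not SB-Wiki's backend)."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add(Triple(subject, predicate, obj))

    def query(self, subject=None, predicate=None, obj=None):
        """Return matching triples in sorted order; None matches anything."""
        return sorted(
            t for t in self.triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        )


# Hypothetical annotations for a model, its assumptions and its training data.
store = TripleStore()
store.add("model:M1", "sbw:assumes", "assumption:A1")
store.add("model:M1", "sbw:trainedOn", "data:D1")
store.add("assumption:A1", "sbw:justifiedBy", "literature:ref1")
store.add("data:D1", "sbw:cellType", "cell-type:C1")
store.add("data:D1", "sbw:protocol", "protocol:P1")

# Trace each assumption of model M1 back to its justification.
for _, _, assumption in store.query("model:M1", "sbw:assumes"):
    for t in store.query(assumption, "sbw:justifiedBy"):
        print(f"{assumption} is justified by {t.obj}")
```

Because assumptions are ordinary subjects in the graph, the same query pattern that finds a model's training data also finds where each assumption came from.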
- SB-Pipeline resources can be downloaded from http://code.google.com/p/sbpipeline/
Please contact us at sbpipeline@mit.edu if you are planning to attend the course, so we can send you materials and updated information on the tutorial.
- References
- Saez-Rodriguez, J., Goldsipe, A., Muhlich, J., Alexopoulos, L. G., Millard, B., Lauffenburger, D. A., and Sorger, P. K. (2008). Flexible Informatics for Linking Experimental Data to Mathematical Models via DataRail. Bioinformatics. PMID:18218655
Back to The Second q-bio Conference.