Welcome to the q-bio Summer School and Conference!

Tutorial: Linking models to data and experimental predictions via SB-Pipeline


Arthur Goldsipe, Gerard Ostheimer, Julio Saez-Rodriguez (Massachusetts Institute of Technology)

Abstract
The process of constructing and testing models, particularly those that incorporate significant prior knowledge, involves multiple steps that are currently poorly integrated. We have developed SB-Pipeline to create an effective workflow based on open-source code, public standards and modern software practice. SB-Pipeline is a multi-faceted software platform that pulls together all of the steps involved in collecting and transforming primary data; constructing, annotating and calibrating models; and distributing and sharing simulations and analyses. It differs from existing modeling tools in three ways. First, SB-Pipeline is primarily concerned with data and model management for the purpose of calibration, whereas in existing modeling tools calibration is an ancillary rather than a primary activity; conversely, SB-Pipeline does not recreate existing tools for model simulation. Second, SB-Pipeline implements a robust system for tracking the provenance of data, links between data and models, and the origins of model assumptions in data or the literature. Third, SB-Pipeline is a collection of discrete but interoperable software tools, rather than a single integrated system. SB-Pipeline incorporates standard protocols for import and export of data, and thus its components can be integrated into external packages. In this workshop we will present three elements of SB-Pipeline, with hands-on exercises: DataRail, CellNetOptimizer and SB-Wiki. Examples will be drawn from quantitative measurements and PCA-PLSR modeling of the cellular response to DNA damage.
DataRail is an open-source MATLAB toolbox for managing, transforming, visualizing, and modeling data, in particular the varied high-throughput data encountered in Systems Biology (Saez-Rodriguez et al., Bioinformatics, 2008). It supports data-driven models, in particular Multiple Linear Regression (MLR) and Partial Least Squares Regression (PLSR). In addition, the data can be exported in a minimum-information standard format (MIDAS; Minimum Information for Data Analysis in Systems Biology), as well as in specific formats for different topology-based modeling platforms, such as the differential-equation-based PottersWheel and the Boolean-based CellNetAnalyzer.
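To give a feel for the kind of data-driven model mentioned above, here is a minimal sketch of a single-component PLSR fit (the PLS1 variant, one response variable) in plain Python. It is not DataRail code and does not use the DataRail API; the function names and the toy data are hypothetical and chosen only for illustration. The two predictor columns are deliberately identical, a case in which ordinary MLR has no unique solution but PLSR still yields a usable model.

```python
from math import sqrt


def column_means(rows):
    """Mean of each column of a row-major table."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pls1_one_component(X, y):
    """Fit a one-component PLS regression of y on X.

    Returns the weight vector w, the regression coefficient q of y on the
    score, and a predict(row) function. Illustrative only: it assumes X
    and y are not constant, so the normalization below never divides by 0.
    """
    x_mean = column_means(X)
    y_mean = sum(y) / len(y)
    Xc = [[v - m for v, m in zip(row, x_mean)] for row in X]  # centered X
    yc = [v - y_mean for v in y]                              # centered y

    # Weight vector: the direction in predictor space whose projection
    # has maximal covariance with the response.
    w = [sum(row[j] * yi for row, yi in zip(Xc, yc)) for j in range(len(x_mean))]
    norm = sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]

    # Scores: projection of each centered sample onto w.
    t = [sum(a * b for a, b in zip(row, w)) for row in Xc]
    # Regress the centered response on the scores.
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)

    def predict(row):
        score = sum((v - m) * wj for v, m, wj in zip(row, x_mean, w))
        return y_mean + q * score

    return w, q, predict


# Hypothetical toy data: two perfectly collinear signals, one response.
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
y = [2.0, 4.0, 6.0, 8.0]
w, q, predict = pls1_one_component(X, y)
print(w, q, predict([5.0, 5.0]))
```

Both predictors receive the same weight, and the prediction for the new sample `[5.0, 5.0]` continues the linear trend in the training data. A practical analysis would extract further components by deflating X and would choose their number by cross-validation.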
CellNetOptimizer (CNO) is an add-on to DataRail that allows users to investigate the discrepancies between a particular network topology, defined as a Boolean network, and a set of high-throughput data. Furthermore, CNO identifies the network topology that optimally describes a data set. CNO uses CellNetAnalyzer as an engine to perform discrete simulations and is currently being extended to a continuous description based on a fuzzy logic formalism.
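The idea of scoring a Boolean topology against data, and then searching for the topology that fits best, can be sketched in a few lines of Python. This is not CNO code: the four-node network, the OR rule for combining inputs, the mean squared error score, the size penalty and the exhaustive search are all assumptions made for this toy illustration, and every name below is hypothetical.

```python
from itertools import combinations

# Hypothetical prior-knowledge network: candidate edges (source, target).
CANDIDATE_EDGES = [("A", "C"), ("B", "C"), ("C", "D"), ("B", "D")]
STIMULI = ("A", "B")
READOUTS = ("C", "D")

# Hypothetical data: stimuli applied, and readouts normalized to [0, 1].
DATA = [
    ({"A": 1, "B": 0}, {"C": 0.9, "D": 0.8}),
    ({"A": 0, "B": 1}, {"C": 0.1, "D": 0.2}),
    ({"A": 1, "B": 1}, {"C": 0.95, "D": 0.9}),
    ({"A": 0, "B": 0}, {"C": 0.0, "D": 0.1}),
]


def simulate(edges, stimuli):
    """Boolean steady state; a node is ON if any of its inputs is ON."""
    state = {node: 0 for node in STIMULI + READOUTS}
    state.update(stimuli)
    # Repeated sweeps are enough to settle this small acyclic network.
    for _ in range(len(state)):
        for node in READOUTS:
            inputs = [src for src, dst in edges if dst == node]
            state[node] = int(any(state[src] for src in inputs))
    return state


def mismatch(edges):
    """Mean squared error between simulated states and measured readouts."""
    errors = [
        (simulate(edges, stimuli)[node] - value) ** 2
        for stimuli, measured in DATA
        for node, value in measured.items()
    ]
    return sum(errors) / len(errors)


def best_topology(size_penalty=0.01):
    """Exhaustively search all edge subsets, preferring smaller networks."""
    subsets = (
        subset
        for k in range(len(CANDIDATE_EDGES) + 1)
        for subset in combinations(CANDIDATE_EDGES, k)
    )
    return min(subsets, key=lambda e: mismatch(e) + size_penalty * len(e))


full = tuple(CANDIDATE_EDGES)
best = best_topology()
print("full network mismatch:", mismatch(full))
print("best topology:", best, "mismatch:", mismatch(best))
```

In this toy data set the readouts follow stimulus A only, so the search discards both edges leaving B and keeps the chain from A to C to D. Exhaustive enumeration is feasible only for a handful of candidate edges; realistic networks need a heuristic search.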
SB-Wiki is a wiki-based system that uses Semantic Web technologies and a lightweight data entry and cataloging framework to support collaborative management of unstructured and semi-structured Systems Biology data. It aims to capture high-level knowledge of models, specifically the set of assumptions necessary for model interpretation that have not previously been recorded in any concrete form (particularly pre-publication), along with lower-level SBML-compliant RDF annotations. SB-Wiki also tracks the experimental data used to train or evaluate models, such as details of the biological setting (species, cell type, growth or culture conditions) and experimental protocol; SB-Wiki integrates well with DataRail on this point, but both tools can also be used independently.
SB-Pipeline resources can be downloaded from http://code.google.com/p/sbpipeline/

Please contact us at sbpipeline@mit.edu if you are planning to attend the course, so we can send you materials and updated information on the tutorial.

References
  • Saez-Rodriguez, J., Goldsipe, A., Muhlich, J., Alexopoulos, L. G., Millard, B., Lauffenburger, D. A., and Sorger, P. K. (2008). Flexible Informatics for Linking Experimental Data to Mathematical Models via DataRail. Bioinformatics. PMID:18218655


Back to The Second q-bio Conference.