Welcome to the q-bio Summer School and Conference!

The Seventh q-bio Summer School: Cancer


In this theme we will address a number of biological and mathematical issues related to modeling the evolution of cancer, using leukemias as the main example. The material is organized into three core lectures, which will cover the fundamental issues of cell proliferation and mutation dynamics, molecular events affecting specific pathways in cells, and population genetics effects (see the abstracts below).

This section of the summer school will include a number of instructor-suggested group projects, in which students will apply various numerical techniques to formulate, identify and solve stochastic models of cancer evolution. Students will then apply these tools to model experimental and clinical data. This section of the summer school is organized by Marek Kimmel. Please address all questions about this section of the summer school to its organizer.

Core Instructors

Course-specific Instructors

  • Alexandra Jilkine, University of Arizona
  • Rosemary Braun, Northwestern

Core lectures

Stochastic models of proliferation of normal and leukemic cell clones

Marek Kimmel, Department of Statistics, Rice University

The importance of nondeterministic mechanisms in the proliferation of normal and cancer cells has long been recognized. An area of applied probability, the theory of branching processes, has been influenced by attempts to model phenomena such as the numerical growth, mutation and extinction of cells. This lecture will review such models, starting with the simplest, the Galton-Watson process, and continuing to multistage mutation models, which describe the evolution of cancer cells under therapy. Finally, we will introduce more sophisticated models involving uneven segregation of genetic and other contents in progeny cells. Experimental evidence for the existence and importance of stochastic effects will also be presented.
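To make the starting point concrete, here is a minimal sketch of the extinction probability of a Galton-Watson process. It assumes, purely for illustration, a binary-fission offspring law: a cell dies without dividing with probability `p0` or divides into two cells with probability `p2 = 1 - p0`. The extinction probability is the smallest fixed point in [0, 1] of the offspring generating function f(s) = p0 + p2 s², found here by iterating q ← f(q) from q = 0. The function name and offspring law are assumptions, not taken from the lecture.

```python
def extinction_probability(p0: float, tol: float = 1e-12, max_iter: int = 100_000) -> float:
    """Extinction probability of a Galton-Watson process started from one cell.

    Illustrative offspring law (an assumption): 0 daughters with probability p0,
    2 daughters with probability p2 = 1 - p0.
    """
    if not 0.0 <= p0 <= 1.0:
        raise ValueError("p0 must be a probability")
    p2 = 1.0 - p0

    def generating_function(s: float) -> float:
        # f(s) = E[s^offspring] = p0 * s^0 + p2 * s^2
        return p0 + p2 * s * s

    # q_n = P(lineage extinct by generation n) satisfies q_{n+1} = f(q_n), q_0 = 0,
    # and increases monotonically to the smallest fixed point of f on [0, 1].
    q = 0.0
    for _ in range(max_iter):
        q_next = generating_function(q)
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q  # critical case (p0 = 0.5) converges slowly; this is an approximation
```

For example, `extinction_probability(0.4)` is approximately 2/3: even a supercritical clone, with mean offspring 2 * 0.6 = 1.2, goes extinct with substantial probability.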

Molecular circuitry of granulocyte differentiation and its alterations leading to leukemia

Seth J. Corey, Department of Hematology, Northwestern University

Hematopoiesis provides the best-characterized system for cell fate decision-making in both health and disease. Yet the precise roles of external cues, intracellular signaling, gene regulatory networks, and homeostatic mechanisms have remained elusive because of their complexity and the difficulty of observing them directly. The granulocyte, the shortest-lived blood cell, is absolutely essential for host defense and survival. Its pathophysiological importance is apparent in severe congenital neutropenia (SCN). Life-threatening infections in children with SCN can be avoided through the use of recombinant granulocyte colony-stimulating factor (GCSF). However, SCN often transforms into secondary myelodysplastic syndrome (sMDS) or secondary acute myeloid leukemia (sAML). A major unresolved clinical question is whether chronic, pharmacologic doses of GCSF contribute to this transformation. Two major sets of human clinical and experimental data strongly suggest such a link, although none had been predicted by mouse models. First, a number of epidemiological clinical trials have demonstrated a strong association between exposure to GCSF and sMDS/sAML. Second, mutations in the distal domain of the GCSF receptor (GCSFR) have been identified in patients with SCN who developed sMDS/sAML and in patients with de novo MDS. Most recently, clonal evolution over 20 years was documented in a patient with SCN who developed sMDS/sAML. Particularly striking is that, of five different mutations that arose in the GCSFR gene, some persisted into the AML clone while others were lost over the course of the disease. The lecture will review the SCN → MDS → AML transitions in detail, discussing recent work and focusing on conclusions that may be relevant to a broader set of hematological disorders.

Population genetics of driver and passenger mutations in leukemic stem cells

Cristian Tomasetti, Department of Biostatistics, Harvard University

Important progress has been made in our understanding of cancer thanks to the ever-growing amount of data generated by sequencing technologies. One useful approach to understanding how somatic mutations accumulate in cancer is to integrate mathematical modeling with sequencing data from cancer tissues. In this lecture we will review some results in stochastic processes and then formulate and analyze a new mathematical model for the evolution of somatic mutations before and during cancer progression, so that all relevant phases of a tissue's history are considered. The model will provide a way to estimate in vivo, tissue-specific somatic mutation rates from tumor sequencing data. It will also give novel predictions of the expected number of somatic mutations found in tumors of self-renewing tissues. These results will then be compared with, and validated against, empirical findings that will be presented. Using these results, we will also analyze the dynamics of drug resistance in chronic myeloid leukemia, shedding light on some of the general principles behind this phenomenon. Overall, the material taught in this class has substantial implications for interpreting the large number of genome-wide cancer studies now being undertaken.

Course-specific lectures

Models of genetic and phenotypic heterogeneity in cancer growth and progression

Marek Kimmel, Department of Statistics, Rice University

This course is still under development. It is likely to cover the evolution of cancer variability in populations, as well as statistical methods for gauging this variability based on epidemiological, genetic and genomic data.

Effects of De-differentiation on Waiting Time to Carcinogenesis

Alexandra Jilkine, University of Arizona

Accumulating evidence suggests that many tumors have a hierarchical organization, with the bulk of the tumor composed of relatively differentiated, short-lived progenitor cells that are maintained by a small population of cancer stem cells with the capacity to proliferate indefinitely. It is unclear, however, whether cancer stem cells originate from normal stem cells or from de-differentiated progenitor cells. To address this, we mathematically modeled the effect of de-differentiation on carcinogenesis. We considered a hybrid stochastic-deterministic model of mutation accumulation in both stem cells and progenitors, including de-differentiation of progenitor cells to a stem cell-like state. We performed exact computer simulations of the emergence of tumor subpopulations with k mutations, and we derived semianalytical estimates for the waiting time distribution to fixation. If the stem cell population size is held strictly constant, we found that de-differentiation acts like a positive selective force in the stem cell population and thus speeds carcinogenesis. If the stem cell population size is allowed to vary stochastically with density-dependent reproduction rates, we found that de-differentiation beyond a critical threshold leads to exponential growth of the stem cell population, even if density-dependent reproduction rates are maintained in the stem cell population. Our results suggest that de-differentiation may play an important role in carcinogenesis. This role, however, depends on how stem cell homeostasis is maintained. Thus, the common modeling assumption of constant stem cell population size may not be adequate, and further progress in understanding carcinogenesis demands a more detailed mechanistic understanding of stem cell homeostasis.

Network Inference and Analysis in Cancer Systems Biology

Rosemary Braun, Northwestern University

The cellular proliferation, migration, and invasion characteristics that are the hallmarks of cancer are due to aberrant signaling in the regulatory networks that ordinarily control growth and apoptosis. These pathways can be compromised in a variety of ways, both in terms of the affected molecules and in terms of the mechanism (e.g., by mutation or by altered transcription). Modern high-throughput assays now yield genome-wide profiles of sequence variation, transcription factor binding, methylation, and expression for each sample of interest, and this exquisitely detailed information provides an unprecedented opportunity to characterize the molecular mechanisms governing malignant transformation. At the same time, the high dimensionality of the data presents analytical challenges.

Mathematical models of regulatory networks are essential for identifying pathological signaling processes in cancer cells. In this lecture, we will discuss various approaches for the systems-level analysis of high-throughput data. The lecture will be divided into two parts. In the first, we will discuss methods to analyze experimental data in the context of networks derived from expert-knowledge pathway databases (e.g., http://pid.nci.nih.gov, http://cancer.cellmap.org/cellmap). We will discuss both exploratory techniques (network visualization and summarization) and statistical analyses that make predictions for functional experimental validation. In the second, we will discuss methods for reconstructing regulatory network structure from experimental data, including graph-theoretic (SPaTO, PDM) and information-theoretic (ARACNe) network inference techniques.
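The information-theoretic route can be sketched in a few lines. What follows is a simplified, ARACNe-style procedure, not the published ARACNe implementation. It assumes already-discretized expression profiles and uses a plug-in mutual information (MI) estimate (the published tool uses its own estimator and significance thresholds). It keeps gene pairs whose MI exceeds a threshold, then applies the data processing inequality (DPI): in every fully connected triplet, the weakest edge is treated as a likely indirect interaction and removed. The names `aracne_like`, `mi_threshold`, `tolerance` and the toy generator `simulate_chain` are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import combinations


def mutual_information(a, b):
    """Plug-in estimate (in bits) of the mutual information between two discrete profiles."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = 0.0
    for (x, y), c in count_ab.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((count_a[x] / n) * (count_b[y] / n)))
    return mi


def aracne_like(profiles, mi_threshold=0.05, tolerance=0.0):
    """Return undirected edges (i, j), i < j, kept after MI thresholding and DPI pruning.

    profiles: list of equal-length sequences of discrete expression states, one per gene.
    """
    genes = range(len(profiles))
    mi = {(i, j): mutual_information(profiles[i], profiles[j])
          for i, j in combinations(genes, 2)}
    edges = {pair for pair, value in mi.items() if value > mi_threshold}

    # DPI: in each triangle the weakest edge is flagged as indirect. Flags are collected
    # first and removed together, so the result does not depend on triplet order.
    indirect = set()
    for i, j, k in combinations(genes, 3):
        triangle = [(i, j), (i, k), (j, k)]
        if all(pair in edges for pair in triangle):
            weakest = min(triangle, key=mi.get)
            weaker_of_others = min(mi[p] for p in triangle if p != weakest)
            if mi[weakest] < weaker_of_others * (1.0 - tolerance):
                indirect.add(weakest)
    return edges - indirect


def simulate_chain(n=2000, flip=0.1, seed=0):
    """Hypothetical toy data: binary genes X -> Y -> Z, each copy flipped with prob. `flip`."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    y = [v ^ int(rng.random() < flip) for v in x]
    z = [v ^ int(rng.random() < flip) for v in y]
    return [x, y, z]
```

On the toy chain X → Y → Z, all three pairs share information, but the DPI step should drop the indirect X–Z edge and keep only (0, 1) and (1, 2).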


Project 1 (MK). Which cancer mutations are “real”?

This project focuses on possible approaches to finding the mutations in cancer that not only accompany carcinogenesis but cause it (the “real” cancer mutations). Several approaches can be used. One of them is the theory of driver and passenger mutations, which is the subject of a separate project. Another is filtering of the variants identified in the germline or tumor cells of a patient. Two important elements of the filtering process are: (i) a minor allele frequency (MAF) filter, based on the principle that variants common in the population are probably not causing cancer, and (ii) a mutation functionality filter, based mostly on the principle of evolutionary conservation, by which only variants at sites normally conserved in evolution are likely to be deleterious (in this case, causing cancer) (see, e.g., Hicks et al. 2011). However, particularly for (ii), where a number of algorithms have been developed, there is considerable ambiguity in how to interpret the predictions of one algorithm versus another. Hicks et al. (2012) developed an approach to tackle this issue in a statistically rigorous manner.
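The MAF filter in (i) amounts to a threshold on population allele frequency. A minimal sketch, assuming hypothetical variant records that already carry a population MAF (for instance, looked up in a reference panel) and an illustrative cutoff of 1%; the `Variant` fields, the cutoff, and the choice to keep variants absent from the panel are assumptions, not part of any specific pipeline.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    gene: str
    change: str
    population_maf: Optional[float]  # None if absent from the (hypothetical) reference panel


def maf_filter(variants: List[Variant], max_maf: float = 0.01) -> List[Variant]:
    """Keep rare variants; common ones (MAF above max_maf) are probably not cancer-causing.

    Assumption: variants missing from the reference panel are treated as rare and kept.
    """
    return [v for v in variants
            if v.population_maf is None or v.population_maf <= max_maf]
```

The variants that survive would then go on to the functionality filter (ii).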

A more recent approach (Lawrence et al. 2013) attributes a related problem, namely the many implausible genes that appear significant in large cancer sequencing studies, largely to mutational heterogeneity. The authors provide a novel analytical methodology, MutSigCV, for resolving the problem. They apply MutSigCV to exome sequences from 3,083 tumor–normal pairs and discover extraordinary variation in mutation frequency and spectrum within cancer types, which sheds light on mutational processes and disease etiology. By incorporating mutational heterogeneity into the analyses, MutSigCV is claimed to eliminate most of the apparently artifactual findings and to enable the identification of genes truly associated with cancer.
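Why heterogeneity matters can be seen with a deliberately simplified test, which is not MutSigCV itself: treat the number of tumors with a mutation in a gene as binomial, with a per-tumor background probability `p_background`, and compute the upper-tail p-value. The same observed count looks highly significant under a low, genome-average background and unremarkable under a higher, gene-specific background. The function names and all numbers below are hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def gene_pvalue(mutated_tumors: int, n_tumors: int, p_background: float) -> float:
    """Upper-tail p-value for seeing at least `mutated_tumors` mutated tumors when each
    tumor carries a background (passenger) mutation in this gene with probability
    p_background. Simplified illustration only; not the MutSigCV model."""
    return binomial_upper_tail(mutated_tumors, n_tumors, p_background)


# Hypothetical gene mutated in 12 of 200 tumors.
p_uniform = gene_pvalue(12, 200, 0.02)  # assumed genome-average background rate
p_local = gene_pvalue(12, 200, 0.06)    # assumed higher, gene-specific background rate
```

With the assumed genome-average rate the gene would be flagged, while with the higher local rate the same 12 mutations are about what chance alone predicts, which is the kind of false positive that accounting for heterogeneity is meant to remove.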

Work in this project may be based on either of these two approaches. The methodology of Hicks et al. (2012) could be extended from binary classification (neutral vs. deleterious variants) to continuous indices using a more refined statistical model. Another possibility is to compare this method with MutSigCV, or to apply one or both to an unexplored data set. Variations on any of these topics are invited.
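To see what a Bernoulli mixture model of algorithm outputs can look like, here is a two-class latent-class model fitted by expectation-maximization (EM). Each variant is assumed to be either neutral or deleterious (unobserved), and each prediction algorithm calls it deleterious with a class-specific probability, independently of the other algorithms given the class. This is a minimal sketch in the spirit of, not a reproduction of, Hicks et al. (2012); the initialization, iteration count, clipping constant and the toy generator `simulate_calls` are assumptions. The per-variant posterior `resp` is one candidate for the kind of continuous index mentioned above.

```python
import random
from typing import List, Tuple


def simulate_calls(n_variants: int = 1000, frac_deleterious: float = 0.3,
                   sensitivity: float = 0.9, false_positive: float = 0.1,
                   n_alg: int = 4, seed: int = 1) -> Tuple[List[List[int]], List[int]]:
    """Hypothetical toy data: algorithms call each variant independently given its class."""
    rng = random.Random(seed)
    calls, truth = [], []
    for _ in range(n_variants):
        deleterious = rng.random() < frac_deleterious
        p_call = sensitivity if deleterious else false_positive
        calls.append([int(rng.random() < p_call) for _ in range(n_alg)])
        truth.append(int(deleterious))
    return calls, truth


def fit_bernoulli_mixture(calls: List[List[int]], n_iter: int = 100, eps: float = 1e-6):
    """Two-class Bernoulli mixture fitted by EM.

    calls[v][a] = 1 if algorithm a predicts variant v to be deleterious, else 0.
    Returns (prior of class 1, theta[c][a] = P(call | class c), posterior P(class 1) per variant).
    Class 1 starts with higher call rates, so it plays the role of "deleterious".
    """
    n_var, n_alg = len(calls), len(calls[0])
    prior = 0.5
    theta = [[0.3] * n_alg, [0.7] * n_alg]  # asymmetric start breaks label symmetry
    resp: List[float] = []
    for _ in range(n_iter):
        # E-step: posterior probability that each variant belongs to class 1.
        resp = []
        for row in calls:
            like = []
            for c, weight in ((0, 1.0 - prior), (1, prior)):
                value = weight
                for a, x in enumerate(row):
                    value *= theta[c][a] if x else 1.0 - theta[c][a]
                like.append(value)
            resp.append(like[1] / (like[0] + like[1]))
        # M-step: update the class prior and per-algorithm call probabilities.
        total1 = max(sum(resp), eps)
        total0 = max(n_var - total1, eps)
        prior = min(max(total1 / n_var, eps), 1.0 - eps)
        for a in range(n_alg):
            hits1 = sum(r * row[a] for r, row in zip(resp, calls))
            hits0 = sum((1.0 - r) * row[a] for r, row in zip(resp, calls))
            # Clip away from 0 and 1 so that no likelihood collapses to exactly zero.
            theta[1][a] = min(max(hits1 / total1, eps), 1.0 - eps)
            theta[0][a] = min(max(hits0 / total0, eps), 1.0 - eps)
    return prior, theta, resp
```

The conditional-independence assumption is the main simplification here: prediction algorithms that rely on the same sequence alignment are unlikely to err independently, which is one direction a "more refined statistical model" could take.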


Hicks, Stephanie, David A. Wheeler, Sharon E. Plon, and Marek Kimmel. "Prediction of missense mutation functionality depends on both the algorithm and sequence alignment employed." Human Mutation 32, no. 6 (2011): 661-668.

Hicks, Stephanie, Sharon E. Plon, and Marek Kimmel. "Bernoulli mixture models in application to the evaluation of algorithms estimating functionality of missense mutations." In BMC Proceedings, vol. 6, no. Suppl 6, p. P15. BioMed Central, 2012.

Lawrence, Michael S., Petar Stojanov, Paz Polak, Gregory V. Kryukov, Kristian Cibulskis, Andrey Sivachenko, Scott L. Carter et al. "Mutational heterogeneity in cancer and the search for new cancer-associated genes." Nature (2013)

Project 2 (MK). Identifying passenger and driver mutations

Passenger and driver mutations are concepts developed to understand the evolution of populations of cancer cells. Broadly, driver mutations are under positive selection, i.e., their presence enhances tumor cell fitness, whereas passenger mutations are neutral, i.e., they do not change fitness but accumulate at a constant rate. In this sense, passenger mutations serve as a molecular clock of cancer development. Mathematically, this concept has been explored by, among others, Bozic et al. (2010). Tests for distinguishing driver from passenger mutations were pioneered by Torkamani and Schork (2009) and, more recently and in a much more mathematical manner based on the theory of stochastic processes, in unpublished work by Wiuf (2012). However, characterizing driver mutations remains a serious problem. The project suggests either applying known methods or proposing and applying a new one.
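The molecular-clock idea can be checked with a toy simulation. Under the simplifying assumption (made here for illustration, not taken from the cited papers) that each cell division adds one neutral passenger mutation to a lineage with probability `u`, the passenger count after `d` divisions has mean `u * d`, so dividing an observed passenger count by `u` gives a crude estimate of how many divisions the lineage has undergone. The function names are hypothetical.

```python
import random
from statistics import mean


def passenger_counts(divisions: int, u: float, n_lineages: int, seed: int = 0) -> list:
    """Passenger mutations accumulated along independent lineages.

    Assumption: at each division a lineage gains one neutral passenger with probability u,
    independently of everything else (no selection acts on passengers).
    """
    rng = random.Random(seed)
    return [sum(rng.random() < u for _ in range(divisions)) for _ in range(n_lineages)]


def estimate_divisions(passengers: float, u: float) -> float:
    """Invert the clock: expected passengers = u * divisions."""
    return passengers / u


counts = passenger_counts(divisions=100, u=0.5, n_lineages=2000)
average = mean(counts)                           # expected to be close to u * divisions = 50
divisions_hat = estimate_divisions(average, u=0.5)  # expected to be close to 100
```

Drivers break this simple picture because they change the division and death rates of the lineage carrying them, which is why the driver/passenger distinction matters for the clock.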


Bozic, Ivana, Tibor Antal, Hisashi Ohtsuki, Hannah Carter, Dewey Kim, Sining Chen, Rachel Karchin, Kenneth W. Kinzler, Bert Vogelstein, and Martin A. Nowak. "Accumulation of driver and passenger mutations during tumor progression." Proceedings of the National Academy of Sciences 107, no. 43 (2010): 18545-18550.

Torkamani, Ali, and Nicholas J. Schork. "Identification of rare cancer driver mutations by network reconstruction." Genome Research 19, no. 9 (2009): 1570-1578.

Project 3 (AJ). Further exploration of the cancer stem cell hypothesis.

Develop a model that explores how cancer stem cells (CSCs) can stochastically switch between non-dividing, quiescent states and proliferating states. The proliferating cells can differentiate into other types of cancer cells that can divide a given number of times before dying. Consider cases where the dynamics of the CSC population are independent of the differentiated cells, as well as cases where the probabilities of transitioning between quiescent and proliferating states depend on the size of the tumor. Consider both inhibitory and stimulatory signals from the tumor. Suppose a drug targets only the rapidly dividing cells. What will happen to the proportion of CSCs in the tumor over time? How long will it take for drug resistance to emerge in the proliferating population?
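As a possible starting point, the expected composition of the CSC compartment can be tracked with a two-state Markov chain in the spirit of Gupta et al. (2011). The sketch below assumes, hypothetically, that per time step a quiescent CSC becomes proliferating with probability `a`, a proliferating CSC becomes quiescent with probability `b`, and a drug kills a fraction `kill` of proliferating cells. It ignores differentiation, tumor-size feedback and stochastic fluctuations, all of which the project asks you to add. The function name and parameters are illustrative.

```python
def quiescent_fraction(a: float, b: float, kill: float = 0.0,
                       steps: int = 5000, tol: float = 1e-12) -> float:
    """Long-run fraction of quiescent cells among surviving CSCs (mean-field sketch).

    a: P(quiescent -> proliferating) per step; b: P(proliferating -> quiescent) per step;
    kill: fraction of proliferating cells removed by the drug each step (an assumption).
    """
    q, p = 1.0, 0.0  # start from an all-quiescent population (arbitrary choice)
    previous = q
    for _ in range(steps):
        q_new = q * (1 - a) + p * b
        p_new = (q * a + p * (1 - b)) * (1 - kill)  # drug acts on proliferating cells only
        total = q_new + p_new
        q, p = q_new / total, p_new / total  # renormalize: we track proportions only
        if abs(q - previous) < tol:
            break
        previous = q
    return q
```

With `a = 0.1`, `b = 0.3` and no drug, the quiescent fraction settles at `b / (a + b) = 0.75`. In this toy model, a drug that kills only proliferating cells pushes the surviving population toward quiescence (about 0.92 with `kill = 0.5`), one possible answer to the question about CSC proportions.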


Gupta et al. Stochastic state transitions give rise to phenotypic equilibrium in populations of cancer cells. Cell. 2011 Aug 19;146(4):633-44.

Li L, Clevers H. Coexistence of quiescent and active adult stem cells in mammals. Science. 2010 Jan 29;327(5965):542-5.

Project 4 (AJ). Spatial extensions of population genetics cancer models.

Consider spatial generalizations of both Moran and branching process models of cancer initiation, in which cells also have a spatial position. Consider the effects of a linear (1D) structure versus a lattice (2D).
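A minimal sketch of the 1D case: a birth-death Moran process on a ring of `n` cells. At each step a cell is chosen to divide with probability proportional to its fitness (mutants have relative fitness `r`, normal cells 1), and its daughter replaces a randomly chosen neighbor. Passing a different `neighbors` function gives a well-mixed or a 2D (torus) version. All function names and parameter values are illustrative assumptions. The Monte Carlo estimate can be compared with the classical well-mixed fixation probability (1 - 1/r)/(1 - 1/r^n).

```python
import random
from typing import Callable, List


def ring_neighbors(i: int, n: int) -> List[int]:
    """1D structure: left and right neighbors on a ring."""
    return [(i - 1) % n, (i + 1) % n]


def well_mixed_neighbors(i: int, n: int) -> List[int]:
    """Non-spatial reference: every other cell is a neighbor."""
    return [j for j in range(n) if j != i]


def lattice_neighbors(side: int) -> Callable[[int, int], List[int]]:
    """2D structure: von Neumann neighbors on a side x side torus (n = side**2)."""
    def neighbors(i: int, n: int) -> List[int]:
        x, y = divmod(i, side)
        return [((x - 1) % side) * side + y, ((x + 1) % side) * side + y,
                x * side + (y - 1) % side, x * side + (y + 1) % side]
    return neighbors


def fixation_probability(n: int, r: float, neighbors: Callable[[int, int], List[int]],
                         trials: int, seed: int = 0) -> float:
    """Fraction of runs in which a single mutant (fitness r) takes over all n cells."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(trials):
        mutant = [False] * n
        mutant[rng.randrange(n)] = True
        count = 1
        while 0 < count < n:
            weights = [r if m else 1.0 for m in mutant]
            parent = rng.choices(range(n), weights=weights)[0]  # fitness-proportional birth
            child = rng.choice(neighbors(parent, n))             # daughter replaces a neighbor
            count += mutant[parent] - mutant[child]
            mutant[child] = mutant[parent]
        fixed += count == n
    return fixed / trials


def moran_fixation(n: int, r: float) -> float:
    """Well-mixed Moran fixation probability of one mutant with relative fitness r != 1."""
    return (1 - 1 / r) / (1 - 1 / r**n)
```

With this particular update rule, a regular structure such as the ring or the torus is expected to give the same fixation probability as the well-mixed population (the isothermal theorem of evolutionary graph theory). Spatial effects therefore have to be sought in other quantities, such as the time to fixation, or in other update rules, for example ones that include migration as in Thalhauser et al. (2010).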


Thalhauser et al. Selection in spatial stochastic models of cancer: migration as a key modulator of fitness. Biol Direct. 2010 Apr 20;5:21.

Nowak MA, Michor F, Iwasa Y. The linear process of somatic evolution. Proc Natl Acad Sci U S A. 2003

Enderling et al. Paradoxical dependencies of tumor dormancy and progression on basic cell kinetics. Cancer Res. 2009 Nov 15;69(22):8814-21.

Project 5 (RB). Cancer as a disease of dysregulated networks

- Looking at differentially expressed genes in cancer v. healthy cells: do we see a non-random association between differential expression and graph-theoretic vertex attributes in the pathway graph? Likewise, is there a non-random association between differential co-expression and graph-theoretic edge characteristics?

- If we do the same thing with healthy v. healthy cells (e.g., from different healthy tissues, different stages of development, or under different exposures), do we see the same associations, or are cancers "hitting" high-connectivity edges, high-betweenness nodes, etc.?

- Is there assortativity (or anti-assortativity) in pathways that contain genes that vary over the course of normal developmental processes (i.e., are those developmentally variable genes more or less likely to be connected in the pathway than expected by chance)? What about in cancer? That is, do we find evidence in the network architecture of evolved robustness to gene expression variation that is present in the normal data but absent in the cancer data?
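The assortativity question can be framed as a permutation test: count pathway edges that join two labeled genes (for example, developmentally variable genes) and compare that count with its distribution when the same number of labels is reshuffled over the pathway's genes. A minimal sketch on a hypothetical toy pathway; the function names are illustrative, and the same scheme works for the vertex-attribute questions above by swapping in a different statistic (e.g., the mean degree of differentially expressed genes).

```python
import random
from typing import Iterable, Set, Tuple


def within_label_edges(edges: Iterable[Tuple[str, str]], labeled: Set[str]) -> int:
    """Number of edges whose two endpoints both carry the label."""
    return sum(1 for u, v in edges if u in labeled and v in labeled)


def assortativity_pvalue(edges, labeled, n_perm=10000, seed=0):
    """Observed within-label edge count and a one-sided permutation p-value for
    'labeled genes are connected to each other more often than by chance'."""
    edges = list(edges)
    nodes = sorted({x for e in edges for x in e})
    observed = within_label_edges(edges, labeled)
    k = len(set(labeled) & set(nodes))  # only labels that fall inside the pathway count
    rng = random.Random(seed)
    at_least = 0
    for _ in range(n_perm):
        shuffled = set(rng.sample(nodes, k))
        if within_label_edges(edges, shuffled) >= observed:
            at_least += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least + 1) / (n_perm + 1)


# Hypothetical toy pathway: a tightly connected module A-D attached to a chain E-J.
TOY_EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"),
             ("D", "E"), ("E", "F"), ("F", "G"), ("G", "H"), ("H", "I"), ("I", "J")]
```

Comparing the p-values from normal and tumor labelings of the same pathway would be one way to look for the contrast the last bullet asks about. Note that this node-label shuffle ignores degree; a degree-preserving null model would be a natural refinement.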

Project 6 (RB). The regulatory role of miRNAs, TFs, and others

- Using both the data and the network topology of a pathway of interest to obtain a pathway summary statistic [PDM-cite], can we identify TFs or miRNAs that appear to have a systems-level regulatory effect on the pathway as a whole?

- Relatedly, if we look at individual interactions within the pathway, do we find that their "activity" and "consistency" (cf. Efroni et al. 2007) are a function of the expression of particular TFs/miRNAs? Do these relationships differ between tumor and normal cells?

Project 7 (RB). Boolean network robustness

- If we start with a simple Boolean model, e.g., of the NF-kB pathway, we can evolve it forward in time from all possible initial states, identifying the basin of attraction for each attractor. We may then examine the robustness of the network to the removal of edges (see, e.g., doi:10.1073/pnas.0914180107). Do we find that the edges that are crucial to the stability of the Boolean network are more likely to exhibit differential co-expression in cancers?

- Alternatively, if we start with time-course data and reconstruct a regulatory backbone network for a small set of genes (using SPT, time-delay ARACNE, etc. [cite]) for each phenotype of interest, do we find differences in robustness (as measured by the size of the basin of attraction of the largest attractor of each reconstructed network)?
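Enumerating the attractors and basin sizes of a small Boolean network takes only a few lines. The sketch below assumes synchronous updating and uses a made-up three-node network, not the NF-kB model. It treats "removing an edge" as clamping that input to 0, which is one of several possible conventions and an assumption here. Comparing basin sizes before and after a removal gives a crude robustness measure of the kind described in the bullets above. All names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

State = Tuple[int, ...]


def attractor_basins(update: Callable[[State], State], n: int) -> Dict[Tuple[State, ...], int]:
    """Map each attractor (a sorted tuple of its states) to the size of its basin.

    update: synchronous update rule returning the next state of all n nodes.
    Every one of the 2**n initial states is iterated until a state repeats.
    """
    basins: Dict[Tuple[State, ...], int] = {}
    for start in product((0, 1), repeat=n):
        seen: Dict[State, int] = {}
        trajectory: List[State] = []
        state = start
        while state not in seen:
            seen[state] = len(trajectory)
            trajectory.append(state)
            state = update(state)
        cycle = tuple(sorted(trajectory[seen[state]:]))  # states on the attractor
        basins[cycle] = basins.get(cycle, 0) + 1
    return basins


def toy_network(s: State) -> State:
    """Hypothetical 3-node network (not NF-kB): x0 <- x1, x1 <- x0, x2 <- x0 AND x1."""
    x0, x1, _ = s
    return (x1, x0, x0 & x1)


def toy_network_edge_removed(s: State) -> State:
    """Same network with the edge x0 -> x1 removed; the lost input is clamped to 0."""
    x0, x1, _ = s
    return (x1, 0, x0 & x1)
```

In this toy example the intact network has three attractors with basin sizes 2, 2 and 4 (two fixed points and a 2-cycle), while removing the x0 → x1 edge collapses all 8 states into the single fixed point (0, 0, 0).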

Project 8 (RB). Systems meta-analysis

- There is little gene-level concordance among the genes identified as significant in GWAS SNP studies, even of the same cancer. Looking broadly at the findings collected by NHGRI (http://www.genome.gov/gwastudies/), do we find concordance at the pathway level in cancers v. other diseases (metabolic syndrome, psychiatric illnesses, etc.)? If so, do we find that particular subnetworks, network motifs, or edges/nodes with similar graph-theoretic properties are the ones being hit? (That is, is there evidence that a tumor gains an evolutionary advantage by mutating certain classes of nodes?)
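Pathway-level concordance is often assessed with an over-representation test. A minimal sketch, assuming a gene universe of size `n_universe`, a pathway containing `n_pathway` of those genes, and `n_hits` GWAS-implicated genes of which `overlap` fall in the pathway. The one-sided hypergeometric p-value is the probability of at least that much overlap by chance. It ignores gene length, linkage disequilibrium and other confounders that a real analysis would need to handle; the function name is hypothetical.

```python
from math import comb


def hypergeometric_enrichment(overlap: int, n_hits: int, n_pathway: int, n_universe: int) -> float:
    """P(X >= overlap) where X ~ Hypergeometric(n_universe, n_pathway, n_hits):
    the chance that n_hits genes drawn at random from the universe hit the pathway
    at least `overlap` times."""
    total = comb(n_universe, n_hits)
    upper = min(n_hits, n_pathway)
    return sum(comb(n_pathway, k) * comb(n_universe - n_pathway, n_hits - k)
               for k in range(overlap, upper + 1)) / total
```

Running the test pathway by pathway and disease by disease would give a pathway-by-disease table of p-values, on which concordance between cancers and other diseases could then be compared.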

Further information forthcoming