Function prediction using Markov random fields

A large amount of data on possible protein interactions is available from a variety of experiments. We are interested in how these data can be used to predict the functions these proteins perform in the organism. Even without knowing the details of how the proteins interact, there is statistical evidence that interacting proteins are more likely to be functionally related. Based on probabilistic models such as Markov random fields, we try to estimate statistical parameters that identify candidates for new functional assignments.
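As a minimal sketch of this idea, assume a pairwise Markov random field with one binary label per protein (it has a given function or it does not). The probability for each unannotated protein can then be estimated with mean-field updates from its interaction partners. The function name `predict_function`, the parameters `alpha` and `beta`, and the toy network are illustrative assumptions, not the model or data used in this project.

```python
import math


def predict_function(edges, known, alpha=0.0, beta=1.0, n_iter=50):
    """Mean-field estimate of P(protein has the function) on an interaction graph.

    edges : iterable of (protein_a, protein_b) pairs (undirected interactions)
    known : dict protein -> 1 (has the function) or 0 (does not)
    alpha : hypothetical bias term (log-odds without informative neighbours)
    beta  : hypothetical coupling; beta > 0 favours agreeing with neighbours
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Annotated proteins stay fixed at 0 or 1; the rest start undecided at 0.5.
    prob = {p: float(known.get(p, 0.5)) for p in neighbours}

    for _ in range(n_iter):
        for p in neighbours:
            if p in known:
                continue
            # Each neighbour pulls towards its own expected label (+1 or -1).
            field = alpha + beta * sum(2.0 * prob[q] - 1.0 for q in neighbours[p])
            prob[p] = 1.0 / (1.0 + math.exp(-field))
    return prob


# Toy network (made up): A and B have the function, E and F do not.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("D", "F"), ("E", "F")]
known = {"A": 1, "B": 1, "E": 0, "F": 0}
marginals = predict_function(edges, known)
for protein in ("C", "D"):
    print(protein, round(marginals[protein], 2))
```

In this toy network, C interacts mostly with annotated proteins and ends up with a probability above 0.5, while D, whose partners mostly lack the function, ends up below 0.5. In practice the parameters would have to be estimated from the annotated part of the network rather than fixed by hand.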

Soft clustering and biclustering

Clustering gene expression data involves identifying genes that have similar expression patterns over a variety of experiments. Traditionally, such clustering assigns a discrete cluster label to each gene. We are investigating clustering methods based on multidimensional scaling. In this method, genes are assigned coordinates in a low-dimensional space in such a way that genes with similar expression patterns are placed close to each other. Applied in two dimensions, this creates a planar map in which clusters can be identified visually and relations between clusters can be investigated interactively. By using this method to map both genes and experiments, we are looking for characteristic patterns in gene expression data that can serve as input to network inference applications.

A soft clustering of genes in a subset of the compendium data set for S. cerevisiae of Hughes et al. 1999. The lines connect genes or experiments that exhibit strong correlations (red more so than black lines). The placement of the points in the plane is chosen to put correlated points close to each other. The coloring of the points expresses their correlation to the selected point (red in the large cluster).
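The multidimensional scaling step described above can be sketched in a few lines. This is an illustration under stated assumptions, not the code behind the figure: the dissimilarity between two genes is taken to be one minus their Pearson correlation, the layout is refined with the standard SMACOF update (Guttman transform), and the points start on a unit circle as an arbitrary deterministic choice. The names `pearson` and `mds_2d` and the expression values are made up for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mds_2d(dissim, n_iter=200):
    """Place items in the plane so that distances approximate `dissim`."""
    n = len(dissim)
    # Deterministic start: items evenly spaced on the unit circle.
    pos = [[math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
           for i in range(n)]
    for _ in range(n_iter):
        new_pos = []
        for i in range(n):
            x = y = 0.0
            for j in range(n):
                if i == j:
                    continue
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                d = math.hypot(dx, dy)
                if d > 1e-12:
                    # Guttman transform: rescale each offset by target/current.
                    w = dissim[i][j] / d
                    x += w * dx
                    y += w * dy
            new_pos.append([x / n, y / n])
        pos = new_pos
    return pos


# Made-up expression profiles: g1 and g2 rise together, g3 and g4 fall together.
genes = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 2.0, 3.2, 3.9, 5.1],
    [5.0, 4.0, 3.0, 2.0, 1.0],
    [5.2, 3.9, 3.1, 2.0, 0.9],
]
dissim = [[1.0 - pearson(a, b) for b in genes] for a in genes]
for name, (px, py) in zip(("g1", "g2", "g3", "g4"), mds_2d(dissim)):
    print(name, round(px, 2), round(py, 2))
```

In the resulting map, g1 and g2 lie close together and far from g3 and g4, so the two groups show up as separate clusters without any gene being given a discrete label. Applying the same function to the transposed data would place experiments instead of genes.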

This approach can be extended with biclustering and feature selection methods, in which we try to identify features that are characteristic of certain clusters and to partition the feature space in such a way that correlations and regulatory relationships become visible.

Network inference using Gaussian processes

Large-scale gene expression data provide us with information about how the expression values of different genes are correlated in a variety of experimental settings. However, such a correlation does not immediately imply a functional relation. Graphical models are tools to derive functional relations by trying to match a probabilistic model of genetic regulation to actual experimental data.

We investigate models based on linear correlations and Gaussian processes to describe the regulatory relationships between genes. Such models are conceptually simple because they describe linear regulation, which is well understood, but they become computationally expensive when a large number of genes is involved.
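To illustrate why correlation alone does not imply a direct relation, and where the computational cost comes from, the sketch below uses a simpler stand-in for the models mentioned above: partial correlations under a multivariate Gaussian assumption, obtained from the inverse of the covariance matrix. It is not a Gaussian process model and not this project's implementation. The helper names and the simulated chain A, B, C are assumptions made for the example.

```python
import math
import random


def covariance(data):
    """Sample covariance matrix; `data` is a list of genes, each a list of samples."""
    n = len(data[0])
    means = [sum(g) / n for g in data]
    return [[sum((a - mi) * (b - mj) for a, b in zip(gi, gj)) / (n - 1)
             for gj, mj in zip(data, means)]
            for gi, mi in zip(data, means)]


def invert(m):
    """Gauss-Jordan inversion with partial pivoting (cubic cost in matrix size)."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(data):
    """Correlation of each gene pair after conditioning on all other genes."""
    k = invert(covariance(data))
    n = len(k)
    return [[1.0 if i == j else -k[i][j] / math.sqrt(k[i][i] * k[j][j])
             for j in range(n)] for i in range(n)]


# Simulated chain (made up): A drives B, B drives C; A and C are linked only via B.
rng = random.Random(0)
a = [rng.gauss(0.0, 1.0) for _ in range(500)]
b = [x + rng.gauss(0.0, 0.5) for x in a]
c = [x + rng.gauss(0.0, 0.5) for x in b]
pc = partial_correlations([a, b, c])
print("A-B", round(pc[0][1], 2), "B-C", round(pc[1][2], 2), "A-C", round(pc[0][2], 2))
```

A and C are strongly correlated in the raw data, but their partial correlation is close to zero once B is accounted for, whereas the direct links A-B and B-C remain clearly positive. The matrix inversion scales cubically with the number of genes, which is one reason such models become expensive for genome-scale data.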

Biclustering and feature selection: Here a subset of experiments has been selected such that the gene clusters in the two pictures are separated. Both images show the same data set, but the correlation (indicated by lines) between genes is calculated using different sets of experiments.

2009-06-14 20:09 BST     xris