Computational Genomics Group

Genome Urbanization: Gene segregation in eukaryotes

3/27/2017

Our group's latest paper has just appeared in Nucleic Acids Research. In it we introduce a new concept related to the architecture of eukaryotic genomes and to how it may have evolved to segregate functions into different genomic compartments, in a way that is reminiscent of the semi-spontaneous social/racial segregation of cities.

Genome Urbanization is inspired by the pioneering work of Thomas Schelling on social segregation, in which weak constraints may aggregate to shape macroscopic behaviour in various systems. Using an already published experiment, in which we had studied how the chromatin structure of yeast promoters may shape gene expression under the accumulation of topological stress, we set out to investigate whether differential gene expression upon a structurally related stimulus is reflected in the spatial organization of genes. The first thing we found was that, when put under topological stress, genes tend to form clusters in space, with groups of 6 or more adjacent genes being consistently up- or down-regulated. This was not much of a surprise. DNA torsional stress acts directly on the topology of the nucleus, so genes would be expected to follow this constraint. Surprisingly, though, some clustering of gene expression also happens under other stress conditions, such as heat shock or nutrient deprivation, albeit to a much lesser extent. What this points to is that genes occupy positions in the nucleus (even in the linear dimension) that allow them to respond to stimuli in a coordinated manner. The "micro-motives" of Schelling may thus be reflected in the local prerequisites of certain genes to be close to others, either because of shared affinities for the same transcription factors or because the local environment is more favourable to their dedicated function.
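To make the idea of "groups of 6 or more adjacent genes being consistently up- or down-regulated" a bit more concrete, here is a minimal Python sketch of how such runs could be found along one chromosome. It is not the code used in the paper: the function name, the +1/-1/0 encoding of regulation and the toy data are assumptions made for illustration, and only the minimum run size of 6 comes from the text above.

```python
from itertools import groupby


def regulated_clusters(directions, min_size=6):
    """Find runs of adjacent genes that change expression in the same direction.

    directions: list with one entry per gene, in chromosomal order:
                +1 (up-regulated), -1 (down-regulated) or 0 (unchanged).
    Returns a list of (first_index, last_index, sign) tuples.
    """
    clusters = []
    position = 0
    for sign, run in groupby(directions):
        length = len(list(run))
        # Keep only runs of regulated genes that are long enough.
        if sign != 0 and length >= min_size:
            clusters.append((position, position + length - 1, sign))
        position += length
    return clusters


# Invented toy chromosome: six up-regulated genes, some noise,
# then seven down-regulated genes.
toy = [1, 1, 1, 1, 1, 1, 0, -1, -1, 1, -1, -1, -1, -1, -1, -1, -1]
print(regulated_clusters(toy))  # prints [(0, 5, 1), (10, 16, -1)]
```

A real analysis would also need to ask how often runs of this length appear by chance, for example by shuffling gene labels along the chromosome.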

Gene clustering is, of course, nothing new and many people have spotted preferences for genes that are spatially related in terms of expression, co-evolution or co-regulation. Our work, in this respect, focused on the chromatin and local gene structure of the observed gene clusters. We were thus able to define two distinct genome components with properties so different that the allusion to urban neighborhoods was almost spontaneous.

Genes that are "shut down" upon the accumulation of topological stress were predominantly found close to the centromeres and the nuclear core. They were "old" genes, in the sense that they coded for conserved, fundamental functions such as gene transcription and protein processing. More importantly, they were placed within very short distances from each other and in a rather "crammed" arrangement, in which adjacent genes were very frequently transcribed in opposing directions. On the other hand, genes that were positively regulated were found at the other extreme of the nucleus, close to subtelomeric regions. They coded for "newly" acquired stress-response functions, they had complex regulation and, surprisingly, they were aligned with a clear tendency for co-directionality. They also had significantly longer "breathing" intergenic space between them. Together, these structural properties may allow them not only to be "resistant" to topological stress but even to harness DNA supercoiling in order to propel transcription.
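The two structural properties contrasted above, the orientation of neighbouring genes and the intergenic space between them, are easy to compute from a gene annotation. The sketch below is only an illustration in Python: the function name, the (start, end, strand) tuple layout and both toy gene sets are invented, and it is not the procedure from the paper.

```python
from statistics import mean


def neighbourhood_stats(genes):
    """Summarise orientation and spacing of adjacent genes.

    genes: list of (start, end, strand) tuples for one chromosome,
           sorted by start coordinate; strand is "+" or "-".
    Returns (fraction of adjacent pairs on the same strand,
             mean intergenic distance between adjacent genes).
    """
    if len(genes) < 2:
        raise ValueError("need at least two genes")
    pairs = list(zip(genes, genes[1:]))
    same_strand = sum(1 for a, b in pairs if a[2] == b[2])
    # Overlapping genes are counted as having zero intergenic space.
    gaps = [max(0, b[0] - a[1]) for a, b in pairs]
    return same_strand / len(pairs), mean(gaps)


# Invented examples: a "crammed" region with alternating strands and
# a roomier region of mostly co-directional genes.
core = [(100, 900, "+"), (950, 1800, "-"), (1850, 2600, "+"), (2650, 3500, "-")]
suburb = [(100, 900, "+"), (1500, 2300, "+"), (2900, 3700, "+"), (4300, 5100, "-")]

print(neighbourhood_stats(core))
print(neighbourhood_stats(suburb))
```

In the toy "core" no adjacent pair shares a strand and the gaps are 50 bp, while in the toy "suburb" two out of three pairs are co-directional and the gaps are 600 bp.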

This discrepancy on so many levels led us to propose the Genome Urbanization model, according to which older, more conserved genes are preferentially located in the "old city center". The genome's core resembles the urban plan of a medieval city, with its narrow meandering streets leaving little space between houses that appear as if touching each other. At the edges of the chromosomes lies what we call the "suburban genome", where the genome's "nouveaux riches", new genes with complex functions that are not necessarily constitutively expressed and are employed only under specific conditions, have created a very different landscape. Here, genes are organized in tandem and with longer intergenic spacers between them, in a way that brings to mind the tract housing of US city suburbia.

Genome Urbanization may not be a particular property of the yeast genome, even though unicellularity and the increased gene density compared to more complex eukaryotic genomes are likely to make positional constraints more apparent. We nonetheless believe that similar tendencies may exist in bigger, mammalian genomes and that they may reflect even more intricate regulatory patterns that balance transcriptional homeostasis with expression noise and with the ability to respond to a great number of external and internal stimuli.

Given that the original data had been in our hands since 2006 and that I first presented this concept more than 2 years ago in a talk at the IMBB, FORTH in Crete, this work has been no easy task to complete. It took the combined work of two undergraduate students (Maria Tsochatzidou and Maria Malliarou), the crucial assistance of a fellow bioinformatician (Nikolas Papanikolaou) and the support of long-standing collaborator Joaquim Roca at the CSIC, Barcelona, who first introduced me to DNA topology. We are currently looking into many interesting perspectives that this work opens up regarding the evolution of genome architecture in eukaryotes and how gene positioning and chromosome structure may provide insight into the way cells employ transcriptional regulation under various conditions.

This paper is also, strictly speaking, the first paper to come entirely out of our group and thus seeing it published on my son's 5th birthday adds to a sense of accomplishment.

and then they were "fragile" (again)

12/21/2016

Just the other day we were discussing a new chemical-coupled NGS method that threatened to change our view on nucleosome positioning: according to it, a great proportion of nucleosomes are simply absent from MNase digestion maps but turn out to occupy regions where we once thought they were not supposed to be.

In a recent paper in Genome Research, Tess Jeffers and Jason Lieb bring back the notion of fragile nucleosomes, this time through conventional "good-old" MNase-Seq. In a nutshell, what Jeffers and Lieb did is not very different from early works on the concept of fragile nucleosomes (i.e. nucleosomes that can be digested by MNase and are thus "lost" from MNase maps), in the sense that they use differential timing of MNase digestion, separating the output and treating a specific set of sequence sizes that comes from short digestion times as the "fragile" fraction.

Through a rather straightforward analysis of differential nucleosome positioning in C. elegans embryos, the authors were able to recap most of what we already knew about nucleosomes. Fragile nucleosomes are AT-rich, enriched in promoters of low-expression genes and lacking enrichment in activating regulatory marks and histone variants such as H2A.Z. This reassuring(?) image is summarized in Figure 4a of their paper, where fragility and resistance correspond to the fragile and "well-positioned" fractions of the bulk nucleosomes.

[Figure: Figure 4a from Jeffers and Lieb]

A number of aspects touched upon in the paper of Voong et al., such as exon-intron nucleosomes (a matter of personal interest) or different promoter classes, remain unchallenged by this study. One, however, is challenged, and in a particularly informative way. Jeffers and Lieb show that fragile nucleosomes are extremely enriched in areas of the genome where a number of transcription factors are expected to bind, which can explain the discrepancy between previous maps of MNase-defined nucleosomes and the binding sites of pioneer factors or CTCF.

Still, as this little story develops it would be interesting to see if the chemical-mapping method was just a firework or is here to make a difference.

in the news: Why CpG islands remain unmethylated?

11/8/2013

A recent work published in Genome Research (Ginno et al. GC skew at the 5' and 3' ends of human genes links R-loop formation to epigenetic regulation and transcription termination. Genome Res. 2012) attempts to elucidate a well-known molecular paradox at the intersection of gene regulation and genomic DNA composition: CpG islands, regions of the genome that are particularly enriched in the dinucleotide 5'-CG-3' (CpG), are largely protected from DNA methylation, even though CpG is the primary substrate of the great majority of DNA methylases. This "protection" of CpG islands forms the basis of their regulatory function, since by remaining unmethylated these regions, predominantly located at the promoters of constitutive genes, allow the activation of gene expression.

In this work Ginno et al. elaborate on a previously published work, where they showed that CpG islands coincide with genomic regions that share an intrinsic pattern of "base overloading", showing increased representation of either Gs or Cs over long tracts of DNA. These patterns, known as GC (or more generally "nucleotide") skews, are representative (the authors show) not only of CpG islands but also of the formation of R-loops, hybrid molecular structures in which the newly synthesized RNA re-anneals to the DNA to form a sort of three-stranded structure. In this very interesting piece of innovative research, the authors further show a link between the location and pattern of the skews and the properties of CpG islands.

The data used: Human ES cells were profiled for R-loop formation with DRIP-Seq, a novel technique established by the same group in the aforementioned previous work. GC-skewed regions were defined with the application of a custom R-script called SkewR. DNA methylation and gene expression data were obtained from previously published works.

The analysis: The authors clustered all promoters of the human genome based on the existence of a) strong, b) weak, c) no or d) inverse GC skews (inverse meaning that Cs outnumber Gs instead of the opposite) and then went on to characterize specific properties of both the promoters and the downstream genes. Not surprisingly, they found that a strong (normal or inverse) GC-skew pattern coincided with increased expression. They were also able to demonstrate that DNA methylation occurs much less frequently in promoters with GC skew than in those with an even nucleotide composition, and to correlate GC skews with the formation of R-loops genome-wide. Their results suggest a molecular mechanism (R-loop formation) that is guided by the underlying nucleotide composition (DNA skews) and results in an epigenetic effect (resistance to DNA methylation).
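For readers who have not met GC skew before, it is commonly computed as (G - C) / (G + C) over a stretch of sequence, so that positive values mean an excess of Gs and negative values an excess of Cs on the strand being read. The Python sketch below illustrates this, together with a naive four-way labelling in the spirit of the promoter classes above. It is not SkewR and not the authors' procedure: the function names, the window sizes and the 0.2 / 0.05 thresholds are assumptions chosen purely for illustration.

```python
def gc_skew(sequence):
    """Return (G - C) / (G + C) for a sequence, or 0.0 if it has no G or C."""
    seq = sequence.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def windowed_skew(sequence, window=100, step=50):
    """GC skew in sliding windows along a sequence."""
    return [gc_skew(sequence[i:i + window])
            for i in range(0, len(sequence) - window + 1, step)]


def classify_skew(skew, strong=0.2, weak=0.05):
    """Label a skew value; the thresholds are illustrative, not from the paper."""
    if skew >= strong:
        return "strong"
    if skew >= weak:
        return "weak"
    if skew <= -weak:
        return "inverse"
    return "none"


# A G-rich stretch followed by a C-rich stretch flips the sign of the skew.
print(windowed_skew("GGGG" + "CCCC", window=4, step=4))  # prints [1.0, -1.0]
print(classify_skew(gc_skew("GGGCAT")))                  # prints strong
```

Note that this simple labelling lumps all inverse skews together, whereas the study also distinguishes strong inverse skews.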

What's next: This work highlights the (often forgotten) role of underlying DNA sequence composition in molecular genomics studies. The role of skews, and possibly of other sequence patterns, in R-loop formation and the connection of the latter to the deposition (or repression) of epigenetic marks are bound to be the focus of works to come.

Read more: A work published last year by the same group (Ginno et al. Molecular Cell 2012) introduces the concept of GC skews in the context of R-loop formation and their role in the maintenance of CpG island DNA composition.

in the news: CpG methylation is not always repressive

10/8/2013

A recent work published in eLife (Hu et al. DNA methylation presents distinct binding sites for human transcription factors. eLife 2013) challenges the notion of suppressive methylated CpG islands by reporting a significant number of transcription factors binding preferentially to methylated cytosines of CpG islands. In mammals, the methylation of CpG sites—which consist of a cytosine base next to a guanine base—is typically thought to reduce gene expression by preventing proteins called transcription factors from binding to regions of DNA called promoters. This can occur directly if methylation disrupts interactions between the DNA and the transcription factors, or indirectly if other proteins that bind to the methylated DNA compete with the transcription factors for binding sites. However, only a small number of proteins that bind to methylated DNA have so far been identified.

The data used: The authors use protein arrays for 1,300 TFs and their co-factors in order to assess their binding affinity for unmethylated or methylated DNA. In total, 154 DNA sequences were used, selected on the basis of a high probability of forming part of human promoters, being representative of known TF-binding sites and carrying at least one CpG site. TF-binding intensities were then measured for both the unmethylated and the methylated version of each sequence.

The analysis: Comparing TF binding between methylated and unmethylated DNA revealed a significant subset (47 proteins) that showed increased binding to the CpG-methylated DNA rather than the unmethylated one. The authors showed that this represents an inherent property of the proteins, by demonstrating that specific TFs bind selectively to different DNA sequences depending on whether or not these contain methylated cytosines. In this sense, the authors refer to mC (methylcytosine) as the "fifth base".

What's next: As the authors note, the number of TFs identified in this study is probably an underestimate, since only a very limited number of DNA targets was used. High-throughput techniques that couple high-resolution DNA methylation profiling (RRBS) with ChIPSeq, in order to define regions of TF binding that are effectively methylated, are probably the most effective way to probe methylated DNA binding directly.

Read more: A work published a bit earlier (Spruijt et al. Cell 2012) where specific direct binding of hydroxy-methylcytosines is assayed at genome-scale. hmC (hydroxy-methylcytosine) is probably the primary candidate for being coined as the "sixth base".


in the news: Splicing regulation of cytokine signaling

10/8/2013

A recent PNAS paper from the group of David Baltimore at Caltech provides evidence for an interesting link between splicing and the coordinated response to cytokine stimulus.

Hao and Baltimore (2013). RNA splicing regulates the temporal order of TNF-induced gene expression. PNAS.

The main idea: In many inducible systems the mRNAs of the different induced genes appear sequentially. This may be due to delayed access of transcription factors to the genes that appear later, but one cannot exclude the possibility that the pre-mRNA of all genes is readily produced and that the delay arises afterwards, during maturation by splicing. The authors test the latter hypothesis in the context of TNF-induced gene expression.

The data used: Immortalized mouse embryonic fibroblasts treated with TNF and analyzed at the levels of gene expression, transcript stability and splicing speed.

The analysis: The authors had previously shown that, upon TNF induction, different sets of genes are upregulated in three distinct "waves": an immediate one (within 30 minutes of induction), an intermediate one (2h after induction) and a late one (12h or more after induction). In this work they measure primary and mature mRNA levels to show that, although the primary (unspliced) mRNA levels appear comparable for genes of all three groups, mature (spliced) mRNA levels follow a trend that tracks the gene groups. They thus argue that the 3 "waves" of induction are regulated at the level of mRNA splicing.

What's next: It remains to be seen whether this splicing-regulated time dependence may be reflected in specific properties of the underlying sequence. It would be interesting to see whether splicing may be guided by other measurable properties of the genes' sequences.

Read more: A previous work of the same team that first proposed the "3 waves" of TNF-inducible genes. Hao and Baltimore (2009). The stability of mRNA influences the temporal order of the induction of genes encoding inflammatory molecules. Nature Immunology 10, (13), 281-288.



in the news: Bacterial Genome Architecture

10/8/2013

In a PNAS paper from back in 2010, Yin and colleagues from the University of Georgia propose a very interesting hypothesis, according to which the arrangement of operons in the genomes of bacteria is constrained by the degree of their participation in common pathways.

The main idea: Operons whose genes are involved in the same pathways tend to be located closer together in the genome, so that their coordinated expression can take place more efficiently. Moreover, genes involved in pathways that are more active belong to operons that are located at smaller distances from each other.

The data used: The complete genomes of E. coli and B. subtilis, their operon annotation (genomic coordinates and gene content) and the connections between operons and biological (mostly metabolic) pathways.

The analysis: The authors devise a simple measure of operon compactness, that is, how closely operons involved in the same pathway tend to occur in the linear genome. By comparing the actual average compactness value of E. coli and B. subtilis with that of one thousand random arrangements of operons (shuffling of the genome), they show that the natural arrangement is much more compact than one that could have been produced by chance, and therefore propose that the observed compactness is the result of selection.
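The logic of comparing an observed arrangement against shuffled ones can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: compactness is taken to be the mean linear distance between operons sharing a pathway (smaller means more compact), the circularity of bacterial chromosomes is ignored, and the operon names, coordinates and pathways are invented. The measure defined by Yin et al. may well differ.

```python
import random
from itertools import combinations
from statistics import mean


def mean_pathway_distance(positions, pathways):
    """Mean linear distance between pairs of operons sharing a pathway.

    positions: dict mapping operon name -> genomic coordinate.
    pathways:  dict mapping pathway name -> list of operon names.
    """
    distances = [abs(positions[a] - positions[b])
                 for members in pathways.values()
                 for a, b in combinations(members, 2)]
    return mean(distances)


def shuffle_test(positions, pathways, n_shuffles=1000, seed=0):
    """Compare the observed value with random reassignments of coordinates.

    Returns (observed value, empirical p-value), where the p-value is the
    share of shuffles that are at least as compact as the real arrangement.
    """
    rng = random.Random(seed)
    observed = mean_pathway_distance(positions, pathways)
    operons = list(positions)
    coords = [positions[o] for o in operons]
    at_least_as_compact = 0
    for _ in range(n_shuffles):
        rng.shuffle(coords)
        shuffled = dict(zip(operons, coords))
        if mean_pathway_distance(shuffled, pathways) <= observed:
            at_least_as_compact += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least_as_compact + 1) / (n_shuffles + 1)


# Invented toy genome: two pathways whose operons sit close together.
toy_positions = {"opA": 10, "opB": 20, "opC": 30,
                 "opD": 500, "opE": 510, "opF": 520}
toy_pathways = {"p1": ["opA", "opB", "opC"], "p2": ["opD", "opE", "opF"]}

observed, p_value = shuffle_test(toy_positions, toy_pathways)
print(observed, p_value)
```

With only six operons the shuffled arrangements often reproduce the original grouping, so the toy p-value is modest; with a full operon annotation the space of possible arrangements is far larger.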

What's next: In a more recent paper (which we will be discussing soon), this idea is further elaborated to incorporate data related to the 3-dimensional structure of the genome. It remains to be seen when and how such approaches will be extended to eukaryotic genomes with multiple chromosomes (and without operons).

Read more: Yin et al. (2010) Genomic arrangement of bacterial operons is constrained by biological pathways encoded in the genome. PNAS

