Quantitative analysis and prediction of G-quadruplex forming DNA


G-quadruplexes (GQs) are four-stranded DNA structures that can form within guanine-rich sequences and are proposed to play roles in gene expression, DNA replication, and telomere maintenance. Recent investigations have demonstrated the existence of GQ DNA in live mammalian cells and have revealed a large number of potential GQ-forming sequences in the human genome. CPLC researchers in the Myong and Song labs have carried out a systematic, quantitative analysis of GQ folding propensity for a set of 438 GQ-forming sequences in double-stranded DNA by integrating fluorescence measurements, single-molecule imaging, and computational modeling. They found that a short minimum loop length and the presence of thymine are the two main factors that lead to high GQ folding propensity. Linear and Gaussian process regression models further showed that GQ folding potential can be predicted with high accuracy from the loop-length distribution and the nucleotide content of the loop sequences. These new parameters can inform the evaluation and classification of putative GQ sequences in the human genome. See Kim, M. et al. (2016) Nucleic Acids Res. 44(10): 4807-4817.
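The prediction rests on two kinds of input: loop lengths and loop nucleotide content. The sketch below is only an illustration of how such inputs might be extracted from a candidate sequence before they are passed to a regression model. It is not the authors' pipeline. It assumes that a G-tract is any run of three or more guanines, and that the first four tracts define the quadruplex. The names `loop_features` and `feature_vector` are hypothetical, and so is the exact feature set.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Assumption: a G-tract is a run of >= 3 guanines (a common convention,
# not necessarily the definition used in the original study).
G_TRACT = re.compile(r"G{3,}")


@dataclass
class LoopFeatures:
    loops: list[str]              # the three loop sequences between G-tracts
    min_loop: int                 # shortest loop length
    max_loop: int                 # longest loop length
    total_loop: int               # summed loop length
    fractions: dict[str, float]   # per-base fraction of all loop nucleotides


def loop_features(seq: str) -> LoopFeatures:
    """Hypothetical helper: extract loop-length and loop-composition features."""
    seq = seq.upper()
    tracts = list(G_TRACT.finditer(seq))
    if len(tracts) < 4:
        raise ValueError("need at least four G-tracts of length >= 3")
    first_four = tracts[:4]
    # Loops are the gaps between consecutive tracts; the greedy G{3,} merges
    # adjacent guanines, so every gap here is at least one nucleotide long.
    loops = [seq[a.end():b.start()] for a, b in zip(first_four, first_four[1:])]
    joined = "".join(loops)
    counts = Counter(joined)
    total = len(joined)
    fractions = {base: counts[base] / total for base in "ACGT"}
    lengths = [len(loop) for loop in loops]
    return LoopFeatures(loops, min(lengths), max(lengths), sum(lengths), fractions)


def feature_vector(f: LoopFeatures) -> list[float]:
    """Flatten features into a numeric vector for a regression model."""
    return [f.min_loop, f.max_loop, f.total_loop] + [f.fractions[b] for b in "ACGT"]


if __name__ == "__main__":
    telomeric = loop_features("GGGTTAGGGTTAGGGTTAGGG")
    print(telomeric.loops, telomeric.min_loop, feature_vector(telomeric))
```

Vectors like these could then be fed to any standard regressor, for example scikit-learn's `LinearRegression` or `GaussianProcessRegressor`. Any trained weights, and any accuracy obtained with them, would depend on measured folding data, which this sketch does not include.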