Bioinformatic principles underlying the information content of transcription factor binding sites

Jan T. Kim, Thomas Martinetz, Daniel Polani

Research output: Contribution to journal, article, peer-reviewed

21 Citations (Scopus)

Abstract

Empirically, it has been observed in several cases that the information content of transcription factor binding site sequences (Rsequence) approximately equals the information content of binding site positions (Rfrequency). A general framework for formal models of transcription factors and binding sites is developed to address this issue. Measures for information content in transcription factor binding sites are revisited, and theoretical analyses are compared on this basis. These analyses do not lead to consistent results. A comparative review reveals that these inconsistent approaches do not include a transcription factor state space. Therefore, a state space for mathematically representing transcription factors with respect to their binding site recognition properties is introduced into the modelling framework. Analysis of the resulting comprehensive model shows that the structure of the genome state space does indeed favour equality of Rsequence and Rfrequency, but the relation between the two information quantities also depends on the structure of the transcription factor state space. This dependence might lead to significant deviations between Rsequence and Rfrequency. However, further investigation and biological arguments show that the effects of the structure of the transcription factor state space on the relation between Rsequence and Rfrequency are strongly limited for systems that are autonomous in the sense that all DNA-binding proteins operating on the genome are encoded in the genome itself. This provides a theoretical explanation for the empirically observed equality.
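
The abstract does not spell out how Rsequence and Rfrequency are calculated. The following is a minimal Python sketch of the commonly used definitions, given here as an assumption rather than as the authors' own formulation: Rsequence is taken as the sum over aligned site positions of (2 bits minus the Shannon entropy of the base distribution at that position), and Rfrequency as log2(G/γ), where G is the number of possible binding positions in the genome and γ is the number of sites. The function names and toy data are hypothetical, the background is assumed to be equiprobable over A, C, G and T, and no small-sample correction is applied.

```python
import math
from collections import Counter


def r_sequence(sites, alphabet="ACGT"):
    """Information content (bits) of aligned, equal-length binding site sequences.

    Assumes an equiprobable background over `alphabet` and applies no
    small-sample correction (both are simplifying assumptions).
    """
    if not sites:
        raise ValueError("at least one site is required")
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")

    max_entropy = math.log2(len(alphabet))  # 2 bits for DNA
    n = len(sites)
    total = 0.0
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        # Shannon entropy of the observed base distribution at this position
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += max_entropy - entropy
    return total


def r_frequency(genome_positions, n_sites):
    """Information (bits) needed to pick n_sites out of genome_positions."""
    if genome_positions <= 0 or n_sites <= 0:
        raise ValueError("both arguments must be positive")
    return math.log2(genome_positions / n_sites)


# Toy data, invented for illustration only: four identical 4-bp sites
# in a hypothetical genome with 1024 possible binding positions.
toy_sites = ["ACGT", "ACGT", "ACGT", "ACGT"]
print(r_sequence(toy_sites))   # 8.0 bits (4 fully conserved positions x 2 bits)
print(r_frequency(1024, 4))    # 8.0 bits (log2 of 256)
```

In this contrived case the two quantities coincide exactly; with real alignments they would only be expected to agree approximately, which is the empirical observation the abstract sets out to explain.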

Original language: English
Pages (from-to): 529-544
Number of pages: 16
Journal: Journal of Theoretical Biology
Volume: 220
Issue number: 4
Publication status: Published - Feb 2003
