Abstract
Tandem duplication is an evolutionary process whereby a segment of DNA is replicated and proximally inserted. The different configurations that can arise from this process give rise to some interesting combinatorial questions. Firstly, we introduce an algebraic formalism to represent this process as a word-producing automaton. The number of words arising from n tandem duplications can then be recursively derived. Secondly, each single word accounts for multiple evolutions. With the aid of a bicoloured 2d-tree, a Hasse diagram corresponding to a partially ordered set is constructed, from which we can count the number of evolutions corresponding to a given word. Thirdly, we implement some subtree prune and graft operations on this structure to show that the total number of possible evolutions arising from n tandem duplications is $\prod_{k=1}^n(4^k - (2k + 1))$. The space of structures arising from tandem duplication thus grows at a super-exponential rate with leading order term $\mathcal{O}(4^{\frac{1}{2}n^2})$.
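The closed-form count in the abstract can be checked numerically for small n. The following is a minimal sketch, assuming each factor of the product reads $4^k - (2k + 1)$; the function names `evolution_count` and `log4_ratio` are hypothetical and are not taken from the paper.

```python
from math import log, prod


def evolution_count(n: int) -> int:
    """Product over k = 1..n of 4**k - (2*k + 1), computed with exact integers.

    Assumption: the factor is read as 4^k - (2k + 1).
    """
    return prod(4**k - (2 * k + 1) for k in range(1, n + 1))


def log4_ratio(n: int) -> float:
    """Base-4 logarithm of the count divided by n**2.

    The dominant part of the product is 4**(n*(n+1)/2), so this ratio
    tends to 1/2 as n grows, matching the stated leading order term.
    """
    return log(evolution_count(n), 4) / n**2


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, evolution_count(n))
    print(log4_ratio(200))
```

Under this reading the first three factors are 1, 11 and 57, so the count for n = 3 is 627, and the ratio returned by `log4_ratio` drifts towards 1/2 as n increases.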
Original language  English 

Pages (from-to)  1–22 
Number of pages  22 
Journal  Discrete Applied Mathematics 
Volume  194 
Early online date  6 Jun 2015 
Publication status  Published - 30 Oct 2015 
Keywords
 Combinatorics
 Tandem duplication
 Posets
 Rearrangements
 Evolution
Profiles

Christopher Greenman
 School of Computing Sciences - Lecturer
 School of Natural Sciences - Project Module Organiser
 Computational Biology - Member
Person: Member, Research Group Member, Academic, Teaching & Research

Taoyang Wu
 School of Computing Sciences - Lecturer in Computing Sciences
 Computational Biology - Member
Person: Research Group Member, Academic, Teaching & Research