Formal Definition of Sequence Alignment

Consider an alphabet $$\Sigma$$ and two sequences $$s$$ and $$t$$ on $$\Sigma$$. Let $$\bar{\Sigma} = \Sigma\cup \{\diamond\}$$ where $$\diamond$$ is some symbol not in $$\Sigma$$, it's called the gap symbol. $$M:\bar{\Sigma}\times \bar{\Sigma}\to \Z$$. We have a gap penalties functions $$g_s,g_t:\N\to \Z$$. The functions are monotonic and are $$0$$ at $$0$$. The four boundary gap penalty coefficient $$b_s,b_t,e_s,e_t\in \{0,1\}$$. We want to find the alignment score between the two sequences.

A gap is the maximal substring consist of only $$\diamond$$'s. The gap sequence of a string is a sequence of lengths of each gap. Define $$s_\diamond$$ be the gap sequence of string $$s$$.

$$G(x,g,y,a) = xg(a_1) + \sum_{i=2}^{n-1} g(a_i) + yg(a_n)$$, where $$a$$ is a sequence of length $$n$$.

$A(u,v) = \sum_{i=1}^{|u|} M(u_i,v_i) + G(b_s,g_s,e_s,u_\diamond) + G(b_t,g_t,e_t,v_\diamond)$

Define $$S$$ and $$T$$ be the set of all strings that can be formed by inserting $$\diamond$$ into $$s$$ and $$t$$ respectively.

The alignment score is defined as $\max \{A(u,v) : |u|=|v|, u\in S, v\in T\}$

Now, once one write an algorithm for this problem, it can be used for many sequence alignment problems on Rosalind.

Posted by Chao Xu on 2013-07-10.
Tags: .