The Art Gallery Guardian

Formal Definition of Sequence Alignment


Consider an alphabet Σ\Sigma and two sequences ss and tt on Σ\Sigma. Let Σˉ=Σ{}\bar{\Sigma} = \Sigma\cup \{\diamond\} where \diamond is some symbol not in Σ\Sigma, it's called the gap symbol. M:Σˉ×ΣˉZM:\bar{\Sigma}\times \bar{\Sigma}\to \Z. We have a gap penalties functions gs,gt:NZg_s,g_t:\N\to \Z. The functions are monotonic and are 00 at 00. The four boundary gap penalty coefficient bs,bt,es,et{0,1}b_s,b_t,e_s,e_t\in \{0,1\}. We want to find the alignment score between the two sequences.

A gap is the maximal substring consist of only \diamond's. The gap sequence of a string is a sequence of lengths of each gap. Define ss_\diamond be the gap sequence of string ss.

G(x,g,y,a)=xg(a1)+i=2n1g(ai)+yg(an)G(x,g,y,a) = xg(a_1) + \sum_{i=2}^{n-1} g(a_i) + yg(a_n), where aa is a sequence of length nn.

A(u,v)=i=1uM(ui,vi)+G(bs,gs,es,u)+G(bt,gt,et,v)\displaystyle A(u,v) = \sum_{i=1}^{|u|} M(u_i,v_i) + G(b_s,g_s,e_s,u_\diamond) + G(b_t,g_t,e_t,v_\diamond)

Define SS and TT be the set of all strings that can be formed by inserting \diamond into ss and tt respectively.

The alignment score is defined as max{A(u,v):u=v,uS,vT}\displaystyle \max \{A(u,v) : |u|=|v|, u\in S, v\in T\}

Now, once one write an algorithm for this problem, it can be used for many sequence alignment problems on Rosalind.

  • Hamming Distance: M(a,a)=1M(a,a)=-1, and 00 otherwise. All other function are -\infty.
  • Finding a Shared Spliced Motif, longest common subsequence. M(a,a)=1M(a,a)=1 for aΣa\in \Sigma, 00 otherwise. Everything else are 00.
  • Edit distance: Levenshtein distance, M(a,a)=1M(a,a)=-1 for aΣˉa\in \bar{\Sigma}, 00 otherwise. All other are -\infty.
  • Global alignment: bs=es=gsb_s=e_s=g_s, bt=et=gtb_t=e_t=g_t.
  • Overlap alignment: bs=et=0b_s=e_t=0, bt=gtb_t=g_t, es=gse_s=g_s.
  • Fitting alignment: bs=es=0b_s=e_s=0, bt=et=gtb_t=e_t=g_t.
  • Semiglobal alignment: bs=es=bt=et=0b_s=e_s=b_t=e_t=0.
  • Substring Matching: bs=es=0b_s=e_s=0, gs,gt,bt,et=g_s,g_t,b_t,e_t=-\infty.
Posted by Chao Xu on .
Tags: .