skbio.alignment.local_pairwise_align

skbio.alignment.local_pairwise_align(seq1, seq2, gap_open_penalty, gap_extend_penalty, substitution_matrix)

Locally align exactly two sequences using the Smith-Waterman algorithm.

State: Experimental as of 0.4.0.

Parameters
  • seq1 (GrammaredSequence) – The first unaligned sequence.

  • seq2 (GrammaredSequence) – The second unaligned sequence.

  • gap_open_penalty (int or float) – Penalty for opening a gap. This value is subtracted from the previous best alignment score, so it is typically positive.

  • gap_extend_penalty (int or float) – Penalty for extending a gap. This value is subtracted from the previous best alignment score, so it is typically positive.

  • substitution_matrix (2D dict (or similar)) – Lookup table of substitution scores, indexed by one character from each sequence (e.g., substitution_matrix['A']['G']). These values are added to the previous best alignment score.
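To make the sign conventions above concrete, here is a minimal sketch that scores an already-gapped alignment: substitution scores are added, gap penalties are subtracted. It assumes the common affine convention in which the first position of a gap run costs gap_open_penalty and each further position costs gap_extend_penalty (some tools charge open plus extend for the first position instead; check the scikit-bio source for its exact rule). The helpers identity_matrix and score_alignment are hypothetical and are not part of scikit-bio.

```python
# Hypothetical helpers for illustration only; not part of scikit-bio.

def identity_matrix(alphabet, match, mismatch):
    """Build a 2D dict where matrix[a][b] is the score for aligning a with b."""
    return {a: {b: (match if a == b else mismatch) for b in alphabet}
            for a in alphabet}


def score_alignment(aligned1, aligned2, gap_open_penalty, gap_extend_penalty,
                    substitution_matrix):
    """Score two equal-length gapped strings ('-' marks a gap)."""
    score = 0
    prev_gap_row = None  # row (1 or 2) that had a gap in the previous column
    for a, b in zip(aligned1, aligned2):
        if a == "-" or b == "-":
            row = 1 if a == "-" else 2
            # Assumed convention: the first gap position costs the open
            # penalty, each later position in the same run the extend penalty.
            if prev_gap_row == row:
                score -= gap_extend_penalty
            else:
                score -= gap_open_penalty
            prev_gap_row = row
        else:
            # Substitution scores are added to the running score.
            score += substitution_matrix[a][b]
            prev_gap_row = None
    return score
```

For example, with match = 2 and mismatch = -3, "ACGTTACGT" against "ACG---CGT" scores six matches (+12) minus one gap of length 3 (3 + 1 + 1 = 5), giving 7.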

Returns

A three-item tuple: a TabularMSA object containing the aligned sequences, the alignment score (float), and the start/end positions of each input sequence (iterable of two-item tuples). Note that the start/end positions are indices into the unaligned sequences.

Return type

tuple
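The following is a dependency-free sketch of an affine-gap (Gotoh-style) Smith-Waterman that takes the same five parameters and returns a tuple shaped like the one described above. It is not scikit-bio's implementation: smith_waterman is a hypothetical name, plain str inputs stand in for GrammaredSequence, a pair of gapped strings stands in for TabularMSA, and the inclusive (start, end) convention and the gap rule (first gap position costs gap_open_penalty, later ones gap_extend_penalty) are assumptions of this sketch.

```python
def smith_waterman(seq1, seq2, gap_open_penalty, gap_extend_penalty,
                   substitution_matrix):
    """Hypothetical sketch of affine-gap local alignment (not scikit-bio).

    Returns (aligned_pair, score, start_end_positions), where each
    (start, end) pair holds inclusive indices into the unaligned input.
    """
    n, m = len(seq1), len(seq2)
    neg_inf = float("-inf")
    # H: best local score ending at cell (i, j).
    # E: best score ending with a gap in seq1 (seq2 residue opposite '-').
    # F: best score ending with a gap in seq2 (seq1 residue opposite '-').
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best, best_i, best_j = 0.0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - gap_open_penalty,
                          E[i][j - 1] - gap_extend_penalty)
            F[i][j] = max(H[i - 1][j] - gap_open_penalty,
                          F[i - 1][j] - gap_extend_penalty)
            diag = H[i - 1][j - 1] + substitution_matrix[seq1[i - 1]][seq2[j - 1]]
            # Clamping at 0 is what makes the alignment local.
            H[i][j] = max(0.0, diag, E[i][j], F[i][j])
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j
    if best == 0:
        return ("", ""), 0.0, []
    # Trace back from the best cell until the score drops to 0.
    out1, out2 = [], []
    i, j, state = best_i, best_j, "H"
    while True:
        if state == "H":
            if H[i][j] == 0:
                break
            diag = H[i - 1][j - 1] + substitution_matrix[seq1[i - 1]][seq2[j - 1]]
            if H[i][j] == diag:
                out1.append(seq1[i - 1])
                out2.append(seq2[j - 1])
                i, j = i - 1, j - 1
            elif H[i][j] == E[i][j]:
                state = "E"
            else:
                state = "F"
        elif state == "E":
            out1.append("-")
            out2.append(seq2[j - 1])
            # Leave the gap state once we reach the position that opened it.
            if E[i][j] == H[i][j - 1] - gap_open_penalty:
                state = "H"
            j -= 1
        else:
            out1.append(seq1[i - 1])
            out2.append("-")
            if F[i][j] == H[i - 1][j] - gap_open_penalty:
                state = "H"
            i -= 1
    aligned = ("".join(reversed(out1)), "".join(reversed(out2)))
    positions = [(i, best_i - 1), (j, best_j - 1)]
    return aligned, best, positions
```

As a usage example, aligning "TTACGTT" with "GACGG" under a match = 2, mismatch = -3 matrix yields the local match ("ACG", "ACG") with score 6.0 and positions [(2, 4), (1, 3)], which index into the unaligned inputs. Time and memory are both proportional to len(seq1) × len(seq2).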

Notes

This algorithm was originally described in [1]. The scikit-bio implementation was validated against the EMBOSS water web server [2].
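For reference, a common affine-gap (Gotoh-style) form of the Smith-Waterman recurrence is given below, writing $o$ for gap_open_penalty, $e$ for gap_extend_penalty, and $S(a_i, b_j)$ for the substitution score of the $i$-th character of seq1 against the $j$-th character of seq2. This is an assumption about the general technique, not a transcription of scikit-bio's code; the original 1981 formulation uses a general gap-length weight rather than separate open and extend penalties.

```latex
\begin{aligned}
E_{i,j} &= \max\left(H_{i,j-1} - o,\; E_{i,j-1} - e\right) \\
F_{i,j} &= \max\left(H_{i-1,j} - o,\; F_{i-1,j} - e\right) \\
H_{i,j} &= \max\left(0,\; H_{i-1,j-1} + S(a_i, b_j),\; E_{i,j},\; F_{i,j}\right) \\
H_{i,0} &= H_{0,j} = 0, \qquad E_{i,0} = F_{0,j} = -\infty
\end{aligned}
```

Under this convention a gap of length $k$ costs $o + (k-1)e$, the alignment score is $\max_{i,j} H_{i,j}$, and the traceback stops at the first cell where $H_{i,j} = 0$.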

References

[1] Smith TF, Waterman MS. Identification of common molecular subsequences. J Mol Biol. 1981 Mar 25;147(1):195-7.

[2] EMBOSS water web server. http://www.ebi.ac.uk/Tools/psa/emboss_water/