skbio.alignment.global_pairwise_align_protein

skbio.alignment.global_pairwise_align_protein(seq1, seq2, gap_open_penalty=11, gap_extend_penalty=1, substitution_matrix=None, penalize_terminal_gaps=False)

Globally align a pair of protein sequences or alignments using the Needleman-Wunsch algorithm.

Parameters:

seq1 : str, BiologicalSequence, or Alignment

The first unaligned sequence(s).

seq2 : str, BiologicalSequence, or Alignment

The second unaligned sequence(s).

gap_open_penalty : int or float, optional

Penalty for opening a gap (this is subtracted from the previous best alignment score, so it is typically positive).

gap_extend_penalty : int or float, optional

Penalty for extending a gap (this is subtracted from the previous best alignment score, so it is typically positive).

substitution_matrix : 2D dict (or similar), optional

Lookup for substitution scores (these values are added to the previous best alignment score); default is BLOSUM 50.

penalize_terminal_gaps : bool, optional

If True, gaps continue to be penalized even after one sequence has been aligned through its end. This is true Needleman-Wunsch alignment, but it produces biologically irrelevant artifacts when the sequences being aligned differ in length. The default is False, which is very likely the behavior you want in nearly all cases.
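To make the interplay of these parameters concrete, the following is a minimal sketch that scores an alignment that has already been computed. It is not scikit-bio's implementation. It assumes that the open penalty is charged for the first position of a gap and the extend penalty for each further position, and that "terminal" gaps are the gap columns before the first and after the last residue-to-residue column. The function name score_alignment and the toy match/mismatch matrix are hypothetical and used only for illustration.

```python
def score_alignment(aligned1, aligned2, matrix, gap_open=11, gap_extend=1,
                    penalize_terminal_gaps=False, gap_char="-"):
    """Score two already-aligned, equal-length strings (illustrative only)."""
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have equal length")
    n = len(aligned1)

    def is_gap_column(i):
        return aligned1[i] == gap_char or aligned2[i] == gap_char

    # Locate the first and last columns that pair two residues; gap columns
    # outside that range are treated as terminal gaps (an assumption).
    first = 0
    while first < n and is_gap_column(first):
        first += 1
    last = n - 1
    while last >= 0 and is_gap_column(last):
        last -= 1

    score = 0
    prev_gap_in = None  # sequence (1 or 2) holding the previous column's gap
    for i in range(n):
        a, b = aligned1[i], aligned2[i]
        if a == gap_char or b == gap_char:
            gap_in = 1 if a == gap_char else 2
            terminal = i < first or i > last
            if penalize_terminal_gaps or not terminal:
                # Penalties are subtracted: extend if the gap continues in
                # the same sequence, otherwise open a new gap.
                score -= gap_extend if gap_in == prev_gap_in else gap_open
            prev_gap_in = gap_in
        else:
            # Substitution scores are added.
            score += matrix[a][b]
            prev_gap_in = None
    return score


# Toy matrix (NOT BLOSUM 50): +5 for identical residues, -2 otherwise.
residues = "AEGHW"
toy_matrix = {a: {b: (5 if a == b else -2) for b in residues} for a in residues}

aligned1 = "HEAGAWGHE-"
aligned2 = "--A--W-HEA"

free_ends = score_alignment(aligned1, aligned2, toy_matrix)
strict = score_alignment(aligned1, aligned2, toy_matrix,
                         penalize_terminal_gaps=True)
print(free_ends, strict)
```

Under these assumptions the same alignment scores considerably lower once terminal gaps are penalized, which is why the strict setting tends to distort alignments of sequences with different lengths.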

Returns:

skbio.Alignment

Alignment object containing the aligned sequences as well as details about the alignment.

Notes

The default gap_open_penalty and gap_extend_penalty values are derived from the NCBI BLAST Server [R108].

The BLOSUM (blocks substitution matrices) amino acid substitution matrices were originally defined in [R109].
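As an illustration of the "2D dict" shape mentioned for substitution_matrix, the sketch below expands a compact table of residue-pair scores into a symmetric nested dict that can be read as matrix[a][b]. The assumption that lookups take this form is inferred from the parameter description, the helper name build_symmetric_matrix is hypothetical, and the scores are toy values rather than entries copied from a BLOSUM matrix.

```python
def build_symmetric_matrix(pair_scores):
    """Expand {(a, b): score} into a nested dict readable as matrix[a][b]."""
    matrix = {}
    for (a, b), score in pair_scores.items():
        # Store each score in both directions so lookups are symmetric.
        matrix.setdefault(a, {})[b] = score
        matrix.setdefault(b, {})[a] = score
    return matrix


# Toy scores for a two-letter alphabet (illustrative values only).
toy_pairs = {("A", "A"): 4, ("A", "W"): -3, ("W", "W"): 11}
toy = build_symmetric_matrix(toy_pairs)

print(toy["A"]["W"], toy["W"]["A"], toy["W"]["W"])
```

A custom matrix would need an entry for every pair of residues that can occur in the input sequences; otherwise a plain dict lookup raises KeyError.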

This function can be used to align a pair of sequences, a pair of alignments, or a sequence and an alignment.

References

[R108] http://blast.ncbi.nlm.nih.gov/Blast.cgi
[R109] Amino acid substitution matrices from protein blocks. S Henikoff and J G Henikoff. Proc Natl Acad Sci U S A. Nov 15, 1992; 89(22): 10915-10919.