skbio.alignment.Alignment

class skbio.alignment.Alignment(seqs, validate=False, score=None, start_end_positions=None)

Class for storing alignments of biological sequences.

The Alignment class extends SequenceCollection with convenience methods for working with alignments of biological sequences.

Parameters:

seqs : list of skbio.sequence.BiologicalSequence objects

The skbio.sequence.BiologicalSequence objects to load into a new Alignment object.

validate : bool, optional

If True, runs the is_valid method after construction and raises SequenceCollectionError if is_valid returns False.

score : float, optional

The score of the alignment, if applicable (usually only if the alignment was just constructed).

start_end_positions : iterable of two-item tuples, optional

The start and end positions of each input sequence in the alignment, if applicable (usually only if the alignment was just constructed using a local alignment algorithm). Note that these should be indices into the unaligned sequences, even though the Alignment object itself does not store the unaligned sequences.

Raises:

skbio.util.exception.SequenceCollectionError

If validate is True and is_valid returns False.

Notes

By definition, all of the sequences in an alignment must be of the same length. For this reason, an alignment can be thought of as a matrix of sequences (rows) by positions (columns).
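To make the matrix view concrete, here is a minimal plain-Python sketch that treats an alignment as a list of equal-length strings. It deliberately does not use skbio: check_alignment and iter_columns are hypothetical helper names, and iter_columns only loosely mirrors what iter_positions does (it yields plain strings rather than sequence objects).

```python
# Hypothetical helpers, not part of skbio: a plain-Python view of an
# alignment as a matrix of sequences (rows) by positions (columns).


def check_alignment(seqs):
    """Return the shared length, or raise ValueError if lengths differ."""
    lengths = {len(s) for s in seqs}
    if len(lengths) > 1:
        raise ValueError(f"sequences have differing lengths: {sorted(lengths)}")
    return lengths.pop() if lengths else 0


def iter_columns(seqs):
    """Yield each position (column) of the alignment as a string."""
    check_alignment(seqs)
    # zip(*seqs) walks the rows in lockstep, producing one tuple per column.
    for column in zip(*seqs):
        yield "".join(column)


aligned = ["A--CCGT", "AACCGGT"]
print(check_alignment(aligned))     # 7
print(list(iter_columns(aligned)))  # ['AA', '-A', '-C', 'CC', 'CG', 'GG', 'TT']
```

The equal-length check matters because zip silently truncates to the shortest input, which would hide a malformed alignment.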

Examples

>>> from skbio.alignment import Alignment
>>> from skbio.sequence import DNA
>>> sequences = [DNA('A--CCGT', id="seq1"),
...              DNA('AACCGGT', id="seq2")]
>>> a1 = Alignment(sequences)
>>> a1
<Alignment: n=2; mean +/- std length=7.00 +/- 0.00>

Methods

__contains__(id) The in operator.
__eq__(other) The equality operator.
__getitem__(index) The indexing operator.
__iter__() The iter operator.
__len__() The len operator.
__ne__(other) The inequality operator.
__repr__() The repr method.
__reversed__() The reversed method.
__str__() The str method.
degap() Return a new SequenceCollection with all gap characters removed.
distances([distance_fn]) Compute distances between all pairs of sequences.
distribution_stats([center_f, spread_f]) Return the sequence count and the center and spread of sequence lengths.
from_fasta_records(fasta_records, ...[, ...]) Initialize a SequenceCollection object.
get_seq(id) Return a sequence from the SequenceCollection by its id.
ids() Return the BiologicalSequence ids.
int_map([prefix]) Create an integer-based mapping of sequence ids.
is_empty() Return True if the SequenceCollection is empty.
is_valid() Return True if the Alignment is valid.
iter_positions([constructor]) Generator of Alignment positions (i.e., columns).
iteritems() Generator of (id, sequence) tuples.
k_word_frequencies(k[, overlapping, constructor]) Return frequencies of length-k words for each sequence in the Alignment.
lower() Convert all sequences to lowercase.
majority_consensus([constructor]) Return the majority consensus sequence for the Alignment.
omit_gap_positions(maximum_gap_frequency) Return an Alignment with positions filtered based on gap frequency.
omit_gap_sequences(maximum_gap_frequency) Return an Alignment with sequences filtered based on gap frequency.
position_counters() Return a collections.Counter object for each position in the Alignment.
position_entropies([base, ...]) Return the Shannon entropy of each position in the Alignment.
position_frequencies() Return the frequencies of characters at each position in the Alignment.
score() Return the score of the alignment.
sequence_count() Return the number of sequences in the SequenceCollection.
sequence_length() Return the number of positions in the Alignment.
sequence_lengths() Return the lengths of the sequences in the SequenceCollection.
start_end_positions() Return the (start, end) positions for each aligned sequence.
subalignment([seqs_to_keep, ...]) Return a new Alignment that is a subset of the current Alignment.
toFasta() Return a FASTA-formatted string representing the SequenceCollection.
to_fasta() Return a FASTA-formatted string representing the SequenceCollection.
to_phylip([map_labels, label_prefix]) Return a PHYLIP-formatted string representing the SequenceCollection.
upper() Convert all sequences to uppercase.
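Several of the column-oriented methods listed above (majority_consensus, omit_gap_positions, position_entropies) can be illustrated with a short plain-Python sketch. The functions below are hypothetical stand-ins that borrow the method names but not skbio's signatures or return types: they operate on lists of equal-length strings, assume '-' and '.' are the gap characters, break consensus ties in favor of the character seen first, and count gaps as ordinary characters when computing entropy. Any of these choices may differ from skbio's actual behavior.

```python
import math
from collections import Counter

# Hypothetical plain-Python sketches of the ideas behind majority_consensus,
# omit_gap_positions and position_entropies; not skbio's implementations.
GAP_CHARS = set("-.")  # assumption: '-' and '.' are the gap characters


def columns(seqs):
    """Return the positions (columns) as strings; assumes equal-length rows."""
    return ["".join(col) for col in zip(*seqs)]


def majority_consensus(seqs):
    """Most common character at each position; ties go to the first seen."""
    return "".join(Counter(col).most_common(1)[0][0] for col in columns(seqs))


def omit_gap_positions(seqs, maximum_gap_frequency):
    """Keep only positions whose gap fraction is <= maximum_gap_frequency."""
    keep = [
        i
        for i, col in enumerate(columns(seqs))
        if sum(c in GAP_CHARS for c in col) / len(col) <= maximum_gap_frequency
    ]
    return ["".join(s[i] for i in keep) for s in seqs]


def position_entropies(seqs, base=2.0):
    """Shannon entropy of each position; gaps are treated as characters."""
    result = []
    for col in columns(seqs):
        n = len(col)
        probs = [k / n for k in Counter(col).values()]
        result.append(sum(-p * math.log(p, base) for p in probs))
    return result


aligned = ["AC-GT", "ACCGT", "AT-GA"]
print(majority_consensus(aligned))       # AC-GT
print(omit_gap_positions(aligned, 0.5))  # ['ACGT', 'ACGT', 'ATGA']
print([round(h, 3) for h in position_entropies(aligned)])
# [0.0, 0.918, 0.918, 0.0, 0.918]
```

In this sketch the gap cutoff is inclusive (a position whose gap fraction equals maximum_gap_frequency is kept), and base 2 is an arbitrary choice; check the skbio docstrings before relying on either boundary or default.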