skbio.alignment.Alignment

class skbio.alignment.Alignment(seqs, validate=False, score=None, start_end_positions=None)

Class for storing alignments of biological sequences.

The Alignment class adds convenience methods to the SequenceCollection class to make it easy to work with alignments of biological sequences.

Parameters:

seqs : list of skbio.sequence.BiologicalSequence objects

The skbio.sequence.BiologicalSequence objects to load into a new Alignment object.

validate : bool, optional

If True, runs the is_valid method after construction and raises SequenceCollectionError if it returns False.

score : float, optional

The score of the alignment, if applicable (usually only if the alignment was just constructed).

start_end_positions : iterable of two-item tuples, optional

The start and end positions of each input sequence in the alignment, if applicable (usually only if the alignment was just constructed using a local alignment algorithm). These should be indexes into the unaligned sequences, although the Alignment object itself has no knowledge of the unaligned sequences.
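
As an illustration of what "indexes into the unaligned sequences" means, here is a minimal plain-Python sketch that makes no scikit-bio calls. The sequences, the aligned rows, the positions, and the helper names aligned_region and check_positions are all hypothetical. The end index is assumed to be inclusive, which the description above does not specify.

```python
# Plain-Python sketch; no scikit-bio calls are used here.
# All data below is hypothetical and chosen only for illustration.
unaligned = ["TTACCGTGG", "AACCGGTAA"]

# Rows of a hypothetical local alignment of the two sequences above.
aligned_rows = ["ACCG-T", "ACCGGT"]

# One (start, end) pair per input sequence, indexing the UNALIGNED sequence.
# Assumption: the end index is inclusive.
start_end_positions = [(2, 6), (1, 6)]


def aligned_region(seq, start, end):
    """Return the part of an unaligned sequence covered by the alignment."""
    return seq[start:end + 1]


def check_positions(unaligned, aligned_rows, start_end_positions):
    """Return True if every degapped row equals its unaligned slice."""
    triples = zip(unaligned, aligned_rows, start_end_positions)
    for seq, row, (start, end) in triples:
        if row.replace("-", "") != aligned_region(seq, start, end):
            return False
    return True


print(check_positions(unaligned, aligned_rows, start_end_positions))
```

Under these assumptions, removing the gap characters from each aligned row gives back exactly the slice of the unaligned sequence named by its (start, end) pair.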

Raises:

skbio.alignment.SequenceCollectionError

If validate is True and is_valid returns False.

skbio.alignment.AlignmentError

If not all the sequences have the same length.

Notes

By definition, all of the sequences in an alignment must be of the same length. For this reason, an alignment can be thought of as a matrix of sequences (rows) by positions (columns).
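
The matrix view can be sketched in plain Python, without scikit-bio. The helper name columns and the use of ValueError are assumptions made for this illustration only; they are not how the Alignment class is implemented.

```python
def columns(rows):
    """Return the positions (columns) of equal-length gapped sequences.

    Hypothetical helper: raises ValueError when the rows differ in length,
    mirroring the requirement that aligned sequences share one length.
    """
    lengths = {len(row) for row in rows}
    if len(lengths) > 1:
        raise ValueError("all sequences in an alignment must have the same length")
    # zip(*rows) transposes the matrix: one tuple per position.
    return ["".join(column) for column in zip(*rows)]


rows = ["A--CCGT", "AACCGGT"]  # sequences are the rows
for index, column in enumerate(columns(rows)):
    print(index, column)       # positions are the columns
```

Each printed line is one position of the alignment, holding the character that every sequence contributes at that index.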

Examples

>>> from skbio.alignment import Alignment
>>> from skbio.sequence import DNA
>>> sequences = [DNA('A--CCGT', id="seq1"),
...              DNA('AACCGGT', id="seq2")]
>>> a1 = Alignment(sequences)
>>> a1
<Alignment: n=2; mean +/- std length=7.00 +/- 0.00>

Methods

__contains__(id) The in operator.
__eq__(other) The equality operator.
__getitem__(index) The indexing operator.
__iter__() The iter operator.
__len__() The len operator.
__ne__(other) The inequality operator.
__repr__() The repr method.
__reversed__() The reversed method.
__str__() The str method.
degap() Return a new SequenceCollection with all gap characters removed.
distances([distance_fn]) Compute distances between all pairs of sequences
distribution_stats([center_f, spread_f]) Return sequence count, and center and spread of sequence lengths
from_fasta_records(fasta_records, ...[, ...]) Initialize a SequenceCollection object
get_seq(id) Return a sequence from the SequenceCollection by its id.
ids() Returns the BiologicalSequence ids
int_map([prefix]) Create an integer-based mapping of sequence ids
is_empty() Return True if the SequenceCollection is empty
is_valid() Return True if the SequenceCollection is valid
iter_positions([constructor]) Generator of Alignment positions (i.e., columns)
iteritems() Generator of id, sequence tuples
k_word_frequencies(k[, overlapping]) Return frequencies of length k words for sequences in Alignment
lower() Converts all sequences to lowercase
majority_consensus([constructor]) Return the majority consensus sequence for the Alignment
omit_gap_positions(maximum_gap_frequency) Returns Alignment with positions filtered based on gap frequency
omit_gap_sequences(maximum_gap_frequency) Returns Alignment with sequences filtered based on gap frequency
position_counters() Return collections.Counter object for positions in Alignment
position_entropies([base, ...]) Return Shannon entropy of positions in Alignment
position_frequencies() Return frequencies of characters for positions in Alignment
read(fp[, format]) Create a new Alignment instance from a file.
score() Returns the score of the alignment.
sequence_count() Return the count of sequences in the SequenceCollection
sequence_length() Return the number of positions in Alignment
sequence_lengths() Return lengths of the sequences in the SequenceCollection
start_end_positions() Returns the (start, end) positions for each aligned sequence.
subalignment([seqs_to_keep, ...]) Returns new Alignment that is a subset of the current Alignment
toFasta() Return fasta-formatted string representing the SequenceCollection
to_fasta() Return fasta-formatted string representing the SequenceCollection
to_phylip([map_labels, label_prefix]) Return phylip-formatted string representing the SequenceCollection
update_ids([ids, fn, prefix]) Update sequence IDs on the sequence collection.
upper() Converts all sequences to uppercase
write(fp[, format]) Write an instance of Alignment to a file.