# skbio.alignment.Alignment

class skbio.alignment.Alignment(seqs, validate=False, score=None, start_end_positions=None)

Class for storing alignments of biological sequences.

The Alignment class adds convenience methods to the SequenceCollection class to make it easy to work with alignments of biological sequences.

Parameters:

- seqs : list of skbio.sequence.BiologicalSequence objects
  The skbio.sequence.BiologicalSequence objects to load into a new Alignment object.
- validate : bool, optional
  If True, runs the is_valid method after construction and raises SequenceCollectionError if is_valid == False.
- score : float, optional
  The score of the alignment, if applicable (usually only if the alignment was just constructed).
- start_end_positions : iterable of two-item tuples, optional
  The start and end positions of each input sequence in the alignment, if applicable (usually only if the alignment was just constructed using a local alignment algorithm). Note that these should be indexes into the unaligned sequences, though the Alignment object itself doesn’t know about these.

Raises:

- skbio.util.exception.SequenceCollectionError
  If validate == True and is_valid == False.

Notes

By definition, all of the sequences in an alignment must be of the same length. For this reason, an alignment can be thought of as a matrix of sequences (rows) by positions (columns).
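The matrix view described above can be sketched in plain Python, without scikit-bio. This is only an illustration of the idea, not the library's implementation: the helper names `is_rectangular` and `columns` are hypothetical, and the aligned sequences are represented as ordinary strings.

```python
# Plain-Python illustration of "sequences (rows) by positions (columns)".
# The helper names below are hypothetical and are not part of scikit-bio.

def is_rectangular(rows):
    """Return True if every row has the same length (an empty list passes)."""
    return len({len(row) for row in rows}) <= 1


def columns(rows):
    """Return the alignment positions (columns) as a list of strings."""
    if not is_rectangular(rows):
        raise ValueError("all aligned sequences must have the same length")
    # zip(*rows) transposes the row-major strings into per-position tuples.
    return ["".join(col) for col in zip(*rows)]


rows = ["A--CCGT", "AACCGGT"]
cols = columns(rows)
print(len(rows), "sequences x", len(cols), "positions")
print("second position:", cols[1])
```

Because every row has the same length, transposing is lossless: the rows can be rebuilt from the columns with another `zip(*cols)`.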

Examples

```python
>>> from skbio.alignment import Alignment
>>> from skbio.sequence import DNA
>>> sequences = [DNA('A--CCGT', id="seq1"),
...              DNA('AACCGGT', id="seq2")]
>>> a1 = Alignment(sequences)
>>> a1
<Alignment: n=2; mean +/- std length=7.00 +/- 0.00>
```


Methods

| Method | Description |
| --- | --- |
| `__contains__(id)` | The in operator. |
| `__eq__(other)` | The equality operator. |
| `__getitem__(index)` | The indexing operator. |
| `__iter__()` | The iter operator. |
| `__len__()` | The len operator. |
| `__ne__(other)` | The inequality operator. |
| `__repr__()` | The repr method. |
| `__reversed__()` | The reversed method. |
| `__str__()` | The str method. |
| `degap()` | Return a new SequenceCollection with all gap characters removed. |
| `distances([distance_fn])` | Compute distances between all pairs of sequences. |
| `distribution_stats([center_f, spread_f])` | Return sequence count, and center and spread of sequence lengths. |
| `from_fasta_records(fasta_records, ...[, ...])` | Initialize a SequenceCollection object. |
| `get_seq(id)` | Return a sequence from the SequenceCollection by its id. |
| `ids()` | Returns the BiologicalSequence ids. |
| `int_map([prefix])` | Create an integer-based mapping of sequence ids. |
| `is_empty()` | Return True if the SequenceCollection is empty. |
| `is_valid()` | Return True if the Alignment is valid. |
| `iter_positions([constructor])` | Generator of Alignment positions (i.e., columns). |
| `iteritems()` | Generator of id, sequence tuples. |
| `k_word_frequencies(k[, overlapping, constructor])` | Return frequencies of length k words for sequences in Alignment. |
| `lower()` | Converts all sequences to lowercase. |
| `majority_consensus([constructor])` | Return the majority consensus sequence for the Alignment. |
| `omit_gap_positions(maximum_gap_frequency)` | Returns Alignment with positions filtered based on gap frequency. |
| `omit_gap_sequences(maximum_gap_frequency)` | Returns Alignment with sequences filtered based on gap frequency. |
| `position_counters()` | Return collections.Counter object for positions in Alignment. |
| `position_entropies([base, ...])` | Return Shannon entropy of positions in Alignment. |
| `position_frequencies()` | Return frequencies of characters for positions in Alignment. |
| `score()` | Returns the score of the alignment. |
| `sequence_count()` | Return the count of sequences in the SequenceCollection. |
| `sequence_length()` | Return the number of positions in Alignment. |
| `sequence_lengths()` | Return lengths of the sequences in the SequenceCollection. |
| `start_end_positions()` | Returns the (start, end) positions for each aligned sequence. |
| `subalignment([seqs_to_keep, ...])` | Returns new Alignment that is a subset of the current Alignment. |
| `toFasta()` | Return fasta-formatted string representing the SequenceCollection. |
| `to_fasta()` | Return fasta-formatted string representing the SequenceCollection. |
| `to_phylip([map_labels, label_prefix])` | Return phylip-formatted string representing the SequenceCollection. |
| `upper()` | Converts all sequences to uppercase. |
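To give a feel for what the column-oriented methods compute, here is a minimal plain-Python sketch of gap-frequency filtering, majority consensus, and per-position Shannon entropy. It does not call scikit-bio and is not its implementation. Several details are assumptions made for illustration: the gap alphabet (`-` and `.`), the inclusive `<=` threshold, counting gaps as ordinary characters in the entropy, and the function names `omit_gap_columns`, `consensus`, and `column_entropies`.

```python
from collections import Counter
from math import log

GAP_CHARS = "-."  # assumed gap alphabet for this sketch


def _columns(rows):
    """Transpose equal-length row strings into column strings."""
    return ["".join(col) for col in zip(*rows)]


def gap_frequency(column):
    """Fraction of characters in a column that are gaps."""
    return sum(ch in GAP_CHARS for ch in column) / len(column)


def omit_gap_columns(rows, maximum_gap_frequency):
    """Keep columns whose gap fraction is <= maximum_gap_frequency (assumed inclusive)."""
    kept = [c for c in _columns(rows) if gap_frequency(c) <= maximum_gap_frequency]
    # Transpose the surviving columns back into rows.
    return ["".join(r) for r in zip(*kept)] if kept else ["" for _ in rows]


def consensus(rows):
    """Most common character per column; tie-breaking is left to Counter."""
    return "".join(Counter(c).most_common(1)[0][0] for c in _columns(rows))


def column_entropies(rows, base=2):
    """Shannon entropy of each column, with gaps counted as characters."""
    result = []
    for c in _columns(rows):
        n = len(c)
        result.append(-sum((k / n) * log(k / n, base) for k in Counter(c).values()))
    return result


rows = ["A--CCGT", "AACCGGT"]
print(omit_gap_columns(rows, 0.0))
print(consensus(rows + ["AACCGGT"]))
print(column_entropies(rows))
```

With a threshold of 0.0 only fully ungapped columns survive, while a threshold of 0.5 keeps every column of this two-sequence example, since no column is more than half gaps.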