Class for storing alignments of biological sequences.
The Alignment class adds convenience methods to the SequenceCollection class to make it easy to work with alignments of biological sequences.
Parameters
seqs : list of skbio.sequence.BiologicalSequence objects
validate : bool, optional
score : float, optional
start_end_positions : iterable of two-item tuples, optional
Raises
skbio.util.exception.SequenceCollectionError
See also
skbio.sequence.BiologicalSequence, skbio.sequence.NucleotideSequence, skbio.sequence.DNASequence, skbio.sequence.RNASequence, SequenceCollection, skbio.parse.sequences, skbio.parse.sequences.parse_fasta
Notes
By definition, all of the sequences in an alignment must be of the same length. For this reason, an alignment can be thought of as a matrix of sequences (rows) by positions (columns).
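The matrix view can be illustrated with a minimal plain-Python sketch. It deliberately does not use scikit-bio; the variable names `rows` and `columns` are chosen here for illustration only and are not part of the library.

```python
# Plain-Python illustration of the "rows by columns" view of an alignment.
# This is a sketch of the idea, not scikit-bio code.
rows = ["A--CCGT", "AACCGGT"]  # one aligned sequence per row

# Every row of an alignment has the same length.
lengths = {len(row) for row in rows}
assert len(lengths) == 1, "aligned sequences must share one length"

# Transposing the rows yields the alignment positions (columns).
columns = ["".join(column) for column in zip(*rows)]

for index, column in enumerate(columns):
    print(index, column)
```

Each entry of `columns` holds the characters that all sequences contribute at one position, which is the unit the position-based methods of an alignment operate on.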
Examples
>>> from skbio.alignment import Alignment
>>> from skbio.sequence import DNA
>>> sequences = [DNA('A--CCGT', id="seq1"),
...              DNA('AACCGGT', id="seq2")]
>>> a1 = Alignment(sequences)
>>> a1
<Alignment: n=2; mean +/- std length=7.00 +/- 0.00>
Methods
__contains__(id) | The in operator. |
__eq__(other) | The equality operator. |
__getitem__(index) | The indexing operator. |
__iter__() | The iter operator. |
__len__() | The len operator. |
__ne__(other) | The inequality operator. |
__repr__() | The repr method. |
__reversed__() | The reversed method. |
__str__() | The str method. |
degap() | Return a new SequenceCollection with all gap characters removed. |
distances([distance_fn]) | Compute distances between all pairs of sequences |
distribution_stats([center_f, spread_f]) | Return sequence count, and center and spread of sequence lengths |
from_fasta_records(fasta_records, ...[, ...]) | Initialize a SequenceCollection object |
get_seq(id) | Return a sequence from the SequenceCollection by its id. |
ids() | Returns the BiologicalSequence ids |
int_map([prefix]) | Create an integer-based mapping of sequence ids |
is_empty() | Return True if the SequenceCollection is empty |
is_valid() | Return True if the Alignment is valid |
iter_positions([constructor]) | Generator of Alignment positions (i.e., columns) |
iteritems() | Generator of id, sequence tuples |
k_word_frequencies(k[, overlapping, constructor]) | Return frequencies of length k words for sequences in Alignment |
lower() | Converts all sequences to lowercase |
majority_consensus([constructor]) | Return the majority consensus sequence for the Alignment |
omit_gap_positions(maximum_gap_frequency) | Returns Alignment with positions filtered based on gap frequency |
omit_gap_sequences(maximum_gap_frequency) | Returns Alignment with sequences filtered based on gap frequency |
position_counters() | Return collections.Counter object for positions in Alignment |
position_entropies([base, ...]) | Return Shannon entropy of positions in Alignment |
position_frequencies() | Return frequencies of characters for positions in Alignment |
score() | Returns the score of the alignment. |
sequence_count() | Return the count of sequences in the SequenceCollection |
sequence_length() | Return the number of positions in Alignment |
sequence_lengths() | Return lengths of the sequences in the SequenceCollection |
start_end_positions() | Returns the (start, end) positions for each aligned sequence. |
subalignment([seqs_to_keep, ...]) | Returns new Alignment that is a subset of the current Alignment |
toFasta() | Return fasta-formatted string representing the SequenceCollection |
to_fasta() | Return fasta-formatted string representing the SequenceCollection |
to_phylip([map_labels, label_prefix]) | Return phylip-formatted string representing the SequenceCollection |
upper() | Converts all sequences to uppercase |
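
Several of the position-based methods listed above (majority consensus, gap-frequency filtering, position entropies) can be understood as simple per-column computations. The following plain-Python sketch illustrates those ideas only. It is not scikit-bio's implementation, and the helper names, the gap alphabet `GAP_CHARS`, the `<=` threshold comparison, the tie-breaking behaviour, and the choice to count gaps as ordinary characters in the entropy are all assumptions made for this example.

```python
from collections import Counter
from math import log2

GAP_CHARS = "-."  # assumed gap alphabet for this sketch

rows = ["AC-T", "ACGT", "A-GA"]   # toy alignment, one sequence per row
columns = list(zip(*rows))        # positions (columns) of the alignment


def gap_frequency(column):
    """Fraction of characters in one column that are gaps."""
    return sum(char in GAP_CHARS for char in column) / len(column)


def consensus_sketch(cols):
    """Most common character per column (ties broken by first occurrence)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in cols)


def entropy_sketch(column, base=2):
    """Shannon entropy of one column; gaps count as ordinary characters."""
    total = len(column)
    counts = Counter(column)
    bits = -sum((n / total) * log2(n / total) for n in counts.values())
    return bits / log2(base)


def omit_gap_columns_sketch(seqs, maximum_gap_frequency):
    """Keep only columns whose gap frequency is <= the threshold."""
    keep = [i for i, col in enumerate(zip(*seqs))
            if gap_frequency(col) <= maximum_gap_frequency]
    return ["".join(seq[i] for i in keep) for seq in seqs]


print(consensus_sketch(columns))
print(omit_gap_columns_sketch(rows, 0.0))
print([round(entropy_sketch(col), 3) for col in columns])
```

A fully conserved column has entropy zero, while mixed columns score higher; filtering with a threshold of 0.0 drops every column that contains at least one gap.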