class skbio.alignment.SequenceCollection(seqs, validate=False)

Class for storing collections of biological sequences.


seqs : list of skbio.sequence.BiologicalSequence objects

The skbio.sequence.BiologicalSequence objects to load into a new SequenceCollection object.

validate : bool, optional

If True, runs the is_valid method after construction and raises SequenceCollectionError if it returns False.



Raises SequenceCollectionError if validate is True and is_valid returns False.
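The validate-on-construction behaviour described above can be illustrated with a small stand-in. This is a sketch only, not scikit-bio code: the names TinyCollection and CollectionError are hypothetical, and it assumes that "valid" means every character belongs to an allowed alphabet.

```python
class CollectionError(Exception):
    """Hypothetical stand-in for SequenceCollectionError."""


class TinyCollection:
    """Hypothetical stand-in for a collection with optional validation."""

    def __init__(self, seqs, alphabet="ACGT-.", validate=False):
        self._seqs = list(seqs)
        self._alphabet = set(alphabet)
        # Validation is opt-in: it only runs when validate is True.
        if validate and not self.is_valid():
            raise CollectionError("collection contains invalid sequences")

    def is_valid(self):
        # Assumed rule: every character must be in the allowed alphabet.
        return all(set(s) <= self._alphabet for s in self._seqs)


# Valid input passes with validation switched on.
ok = TinyCollection(["ACCGT", "AACCGGT"], validate=True)

# Invalid input is accepted silently when validate is left at False.
unchecked = TinyCollection(["ACXGT"])

# The same input raises once validation is requested.
try:
    TinyCollection(["ACXGT"], validate=True)
    raised = False
except CollectionError:
    raised = True
```

Because validate defaults to False, invalid data is only detected if is_valid is called explicitly or validation is requested at construction.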


>>> from skbio.alignment import SequenceCollection
>>> from skbio.sequence import DNA
>>> sequences = [DNA('ACCGT', id="seq1"),
...              DNA('AACCGGT', id="seq2")]
>>> s1 = SequenceCollection(sequences)
>>> s1
<SequenceCollection: n=2; mean +/- std length=6.00 +/- 1.00>
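The numbers in the repr above can be checked by hand. The following is a minimal pure-Python sketch, not the library's implementation: the helper name length_summary is hypothetical, and it assumes the spread is a population standard deviation, which is consistent with the 1.00 shown for sequence lengths 5 and 7.

```python
from statistics import mean, pstdev


def length_summary(seqs):
    """Return (count, mean length, population std of lengths)."""
    lengths = [len(s) for s in seqs]
    return len(lengths), mean(lengths), pstdev(lengths)


# Same two sequences as in the example: lengths 5 and 7.
count, center, spread = length_summary(["ACCGT", "AACCGGT"])
summary = "n=%d; mean +/- std length=%.2f +/- %.2f" % (count, center, spread)
print(summary)
```

A sample standard deviation would give about 1.41 for these two lengths, so the population form is the one that matches the output shown.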


__contains__(id) The in operator.
__eq__(other) The equality operator.
__getitem__(index) The indexing operator.
__iter__() The iter operator.
__len__() The len operator.
__ne__(other) The inequality operator.
__repr__() The repr method.
__reversed__() The reversed method.
__str__() The str method.
degap() Return a new SequenceCollection with all gap characters removed.
distances(distance_fn) Compute distances between all pairs of sequences
distribution_stats([center_f, spread_f]) Return sequence count, and center and spread of sequence lengths
from_fasta_records(fasta_records, ...[, ...]) Initialize a SequenceCollection object
get_seq(id) Return a sequence from the SequenceCollection by its id.
ids() Return the BiologicalSequence ids
int_map([prefix]) Create an integer-based mapping of sequence ids
is_empty() Return True if the SequenceCollection is empty
is_valid() Return True if the SequenceCollection is valid
iteritems() Generator of id, sequence tuples
k_word_frequencies(k[, overlapping, constructor]) Return frequencies of length k words for sequences in Alignment
lower() Convert all sequences to lowercase
sequence_count() Return the count of sequences in the SequenceCollection
sequence_lengths() Return lengths of the sequences in the SequenceCollection
toFasta() Return fasta-formatted string representing the SequenceCollection
to_fasta() Return fasta-formatted string representing the SequenceCollection
upper() Convert all sequences to uppercase
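To make two of the summaries above concrete, here is a rough pure-Python sketch of what gap removal and k-word frequency counting mean for a single sequence string. It is not scikit-bio code: the helper names degap_text and k_word_freqs are hypothetical, the gap characters "-" and "." are an assumption, and the real methods operate on sequence objects and may return different types.

```python
from collections import Counter

GAP_CHARS = "-."  # assumed gap alphabet for this sketch


def degap_text(seq):
    """Return seq with all assumed gap characters removed."""
    return "".join(c for c in seq if c not in GAP_CHARS)


def k_word_freqs(seq, k, overlapping=True):
    """Return the relative frequency of each length-k word in seq."""
    # Overlapping words advance one position at a time; otherwise k at a time.
    step = 1 if overlapping else k
    words = [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]
    total = len(words)
    return {word: n / total for word, n in Counter(words).items()}


degapped = degap_text("AC-CG.T")
freqs = k_word_freqs("AACCGGT", 2)
non_overlapping = k_word_freqs("AACCGGT", 2, overlapping=False)
```

With overlapping words, "AACCGGT" yields six distinct 2-mers, each with frequency 1/6; without overlap, only AA, CC and GG are counted and the trailing T is dropped.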