class skbio.core.sequence.BiologicalSequence(sequence, id='', description='', validate=False)

Base class for biological sequences.


sequence : python Sequence (e.g., str, list or tuple)

The biological sequence.

id : str, optional

The sequence id (e.g., an accession number).

description : str, optional

A description or comment about the sequence (e.g., “green fluorescent protein”).

validate : bool, optional

If True, runs the is_valid method after construction and raises BiologicalSequenceError if it returns False.



Raises BiologicalSequenceError if validate == True and is_valid == False.
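The opt-in validation described above can be sketched in plain Python. This is a minimal illustration, not the skbio implementation: `SketchSequence`, `SequenceError` and the alphabet used here are hypothetical stand-ins chosen for the example.

```python
class SequenceError(ValueError):
    """Hypothetical stand-in for BiologicalSequenceError."""


class SketchSequence:
    """Toy sequence class; not the skbio implementation."""

    # Assumed alphabet, for illustration only.
    _alphabet = frozenset("ACGU-.")

    def __init__(self, sequence, id="", description="", validate=False):
        self._sequence = "".join(sequence)
        self.id = id
        self.description = description
        # Validation is opt-in, mirroring the documented `validate` flag.
        if validate and not self.is_valid():
            raise SequenceError("sequence contains unsupported characters")

    def is_valid(self):
        # Valid when every character belongs to the allowed alphabet.
        return set(self._sequence) <= self._alphabet


ok = SketchSequence("GGUCGUGAAGGA", validate=True)
unchecked = SketchSequence("GGUXX")  # no error: validate defaults to False

try:
    SketchSequence("GGUXX", validate=True)
    raised = False
except SequenceError:
    raised = True
```

Because `validate` defaults to False, an invalid sequence can be constructed silently; under this sketch, calling `is_valid()` afterwards is the only way to detect it.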


BiologicalSequence objects are immutable. Where applicable, methods return a new object of the same class. Subclasses are typically distinguished by methods that are relevant to only one type of biological sequence, and by restricting their characters to the IUPAC standard character set [R19] for that molecule type.
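One common way to get both properties (immutability, and methods that return an object of the caller's own class) is sketched below. The class names `ImmutableSeq` and `ToyRNA` are hypothetical, and the mechanism shown is an assumption for illustration rather than how skbio enforces it.

```python
class ImmutableSeq:
    """Toy immutable sequence; not the skbio implementation."""

    def __init__(self, sequence):
        # Bypass our own __setattr__ guard during construction.
        object.__setattr__(self, "_sequence", "".join(sequence))

    def __setattr__(self, name, value):
        raise AttributeError("ImmutableSeq objects are immutable")

    def __str__(self):
        return self._sequence

    def upper(self):
        # type(self) preserves the subclass: ToyRNA in, ToyRNA out.
        return type(self)(self._sequence.upper())


class ToyRNA(ImmutableSeq):
    """Hypothetical subclass standing in for a molecule-specific class."""


rna = ToyRNA("ggucgu")
shouted = rna.upper()  # a new ToyRNA; `rna` itself is unchanged
```

Constructing the result with `type(self)` rather than a hard-coded class name is what lets a subclass inherit `upper()` without overriding it.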


[R19] A. Cornish-Bowden. Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984. Nucleic Acids Res. May 10, 1985; 13(9): 3021-3030.


>>> from skbio.core.sequence import BiologicalSequence
>>> s = BiologicalSequence('GGUCGUGAAGGA')
>>> t = BiologicalSequence('GGUCCUGAAGGU')
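To make the comparison methods listed on this page more concrete, here is a plain-Python sketch of what "fraction of positions that differ" means for the two example sequences. The helper `fraction_diff` below is a standalone function written for this illustration; it operates on strings and is not the library method, and the length check is an assumption about sensible behavior.

```python
def fraction_diff(a, b):
    """Fraction of aligned positions at which a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    if not a:
        return 0.0
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a)


s = "GGUCGUGAAGGA"
t = "GGUCCUGAAGGU"

diff = fraction_diff(s, t)  # 2 mismatches (indices 4 and 11) out of 12
same = 1.0 - diff           # the complementary "fraction same"
```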


description Return the description of the BiologicalSequence
id Return the id of the BiologicalSequence


__contains__(other) The in operator.
__eq__(other) The equality operator.
__getitem__(i) The indexing operator.
__hash__() The hash operator.
__iter__() The iter operator.
__len__() The len operator.
__ne__(other) The inequality operator.
__repr__() The repr method.
__reversed__() The reversed operator.
__str__() The str operator
alphabet() Return the set of characters allowed in a BiologicalSequence.
count(subsequence) Returns the number of occurrences of subsequence.
degap() Returns a new BiologicalSequence with gap characters removed.
distance(other[, distance_fn]) Returns the distance to other
fraction_diff(other) Return fraction of positions that differ relative to other
fraction_same(other) Return fraction of positions that are the same relative to other
gap_alphabet() Return the set of characters defined as gaps.
gap_maps() Return tuples mapping between gapped and ungapped positions
gap_vector() Return list indicating positions containing gaps
has_unsupported_characters() Return bool indicating presence/absence of unsupported characters
index(subsequence) Return the position where subsequence first occurs
is_gap(char) Return True if char is in the gap_alphabet set
is_gapped() Return True if char(s) in gap_alphabet are present
is_valid() Return True if the sequence is valid
iupac_characters() Return the non-degenerate and degenerate characters.
iupac_degeneracies() Return the mapping of degenerate to non-degenerate characters.
iupac_degenerate_characters() Return the degenerate IUPAC characters.
iupac_standard_characters() Return the non-degenerate IUPAC characters.
k_word_counts(k[, overlapping, constructor]) Get the counts of words of length k
k_word_frequencies(k[, overlapping, constructor]) Get the frequencies of words of length k
k_words(k[, overlapping, constructor]) Get the list of words of length k
lower() Convert the BiologicalSequence to lowercase
nondegenerates() Yield all nondegenerate versions of the sequence.
to_fasta([field_delimiter, terminal_character]) Return the sequence as a fasta-formatted string
unsupported_characters() Return the set of unsupported characters in the BiologicalSequence
upper() Convert the BiologicalSequence to uppercase
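Several of the methods listed above are easier to picture with a small example. The sketch below uses plain strings and standalone functions that borrow the method names; it is not the skbio implementation. The gap characters, the step rule for non-overlapping words, and the three-entry degeneracy table (a subset of the IUPAC DNA codes) are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

GAP_CHARS = frozenset("-.")  # assumed gap alphabet
DEGENERACIES = {"R": "AG", "Y": "CT", "N": "ACGT"}  # subset of IUPAC DNA codes


def k_words(seq, k, overlapping=True):
    """Return words of length k; step 1 if overlapping, else step k."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    step = 1 if overlapping else k
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]


def k_word_counts(seq, k, overlapping=True):
    """Count how often each k-word occurs."""
    return Counter(k_words(seq, k, overlapping))


def degap(seq):
    """Return seq with gap characters removed."""
    return "".join(c for c in seq if c not in GAP_CHARS)


def gap_vector(seq):
    """Return one bool per position: True where the character is a gap."""
    return [c in GAP_CHARS for c in seq]


def nondegenerates(seq):
    """Yield every sequence obtained by expanding degenerate characters."""
    choices = [DEGENERACIES.get(c, c) for c in seq]
    for combo in product(*choices):
        yield "".join(combo)


words = k_words("ACGTAC", 3)                     # sliding window, step 1
blocks = k_words("ACGTAC", 3, overlapping=False)  # adjacent blocks, step 3
expanded = list(nondegenerates("ARY"))            # R -> A/G, Y -> C/T
```

Note that the number of nondegenerate expansions is the product of the degeneracy sizes at each position, so it grows quickly; yielding results lazily, as the listed `nondegenerates()` method is described as doing, avoids building the full list up front.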