class skbio.sequence.NucleotideSequence(sequence, id='', description='', validate=False)

Base class for nucleotide sequences.

A NucleotideSequence is a BiologicalSequence that contains only characters from the IUPAC DNA or RNA lexicon and that provides additional methods applicable only to nucleotide sequences.


All uppercase and lowercase IUPAC DNA/RNA characters are supported.
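To illustrate what "supported characters" means here, the following is a minimal pure-Python sketch that does not call skbio. The character sets are the standard IUPAC nucleotide codes. The gap characters and the helper name `find_unsupported` are assumptions made for illustration and are not taken from the library.

```python
# Standard IUPAC nucleotide codes (general IUPAC notation, not read from skbio).
IUPAC_STANDARD = set("ACGTU")
IUPAC_DEGENERATE = set("RYMKWSBDHVN")
# Assumption: '-' and '.' are treated as gap characters.
GAP_CHARS = set("-.")


def find_unsupported(seq):
    """Return the set of characters in seq outside the IUPAC DNA/RNA lexicon.

    Hypothetical helper: case is ignored, so both uppercase and lowercase
    IUPAC characters count as supported.
    """
    allowed = IUPAC_STANDARD | IUPAC_DEGENERATE | GAP_CHARS
    return {c for c in seq if c.upper() not in allowed}


print(find_unsupported("ACGTacgu-N"))     # set()
print(sorted(find_unsupported("ACGXZ")))  # ['X', 'Z']
```

A sequence for which this helper returns an empty set would be considered valid under these assumptions.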


description Return the description of the BiologicalSequence
id Return the id of the BiologicalSequence


__contains__(other) The in operator.
__eq__(other) The equality operator.
__getitem__(i) The indexing operator.
__hash__() The hash operator.
__iter__() The iter operator.
__len__() The len operator.
__ne__(other) The inequality operator.
__repr__() The repr method.
__reversed__() The reversed operator.
__str__() The str operator.
alphabet() Return the set of characters allowed in a BiologicalSequence.
complement() Return the complement of the NucleotideSequence
complement_map() Return the mapping of characters to their complements.
count(subsequence) Returns the number of occurrences of subsequence.
degap() Returns a new BiologicalSequence with gap characters removed.
distance(other[, distance_fn]) Returns the distance to other
fraction_diff(other) Return fraction of positions that differ relative to other
fraction_same(other) Return fraction of positions that are the same relative to other
gap_alphabet() Return the set of characters defined as gaps.
gap_maps() Return tuples mapping between gapped and ungapped positions
gap_vector() Return list indicating positions containing gaps
has_unsupported_characters() Return bool indicating presence/absence of unsupported characters
index(subsequence) Return the position where subsequence first occurs
is_gap(char) Return True if char is in the gap_alphabet set
is_gapped() Return True if char(s) in gap_alphabet are present
is_reverse_complement(other) Return True if other is the reverse complement of self
is_valid() Return True if the sequence is valid
iupac_characters() Return the non-degenerate and degenerate characters.
iupac_degeneracies() Return the mapping of degenerate to non-degenerate characters.
iupac_degenerate_characters() Return the degenerate IUPAC characters.
iupac_standard_characters() Return the non-degenerate IUPAC nucleotide characters.
k_word_counts(k[, overlapping, constructor]) Get the counts of words of length k
k_word_frequencies(k[, overlapping, constructor]) Get the frequencies of words of length k
k_words(k[, overlapping, constructor]) Get the list of words of length k
lower() Convert the BiologicalSequence to lowercase
nondegenerates() Yield all nondegenerate versions of the sequence.
rc() Return the reverse complement of the NucleotideSequence
reverse_complement() Return the reverse complement of the NucleotideSequence
to_fasta([field_delimiter, terminal_character]) Return the sequence as a fasta-formatted string
unsupported_characters() Return the set of unsupported characters in the BiologicalSequence
upper() Convert the BiologicalSequence to uppercase
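
The nucleotide-specific operations listed above (complement, reverse complement, k-words, and expansion of degenerate characters) can be sketched in plain Python as follows. This is an illustrative sketch, not the skbio implementation: the functions work on uppercase DNA strings rather than NucleotideSequence objects, the mappings are the standard IUPAC DNA complements and degeneracies, and treating '-' and '.' as self-complementing gap characters is an assumption.

```python
from itertools import product

# Standard IUPAC DNA complements, including degenerate codes.
# Assumption: gap characters '-' and '.' map to themselves.
COMPLEMENT = {
    "A": "T", "T": "A", "C": "G", "G": "C",
    "R": "Y", "Y": "R", "S": "S", "W": "W", "K": "M", "M": "K",
    "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
    "-": "-", ".": ".",
}

# Standard IUPAC mapping of degenerate to non-degenerate DNA characters.
DEGENERACIES = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def complement(seq):
    """Complement each character without changing the order."""
    return "".join(COMPLEMENT[c] for c in seq)


def reverse_complement(seq):
    """Complement the sequence, then reverse it."""
    return complement(seq)[::-1]


def is_reverse_complement(seq, other):
    """Return True if other is the reverse complement of seq."""
    return seq == reverse_complement(other)


def k_words(seq, k, overlapping=True):
    """List the words of length k, stepping by 1 (overlapping) or by k."""
    step = 1 if overlapping else k
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]


def nondegenerates(seq):
    """Yield every non-degenerate sequence that seq could represent."""
    choices = [DEGENERACIES.get(c, c) for c in seq]
    for combo in product(*choices):
        yield "".join(combo)


print(reverse_complement("GATTACA"))             # TGTAATC
print(is_reverse_complement("GATTACA", "TGTAATC"))  # True
print(k_words("ACGTAC", 3))                      # ['ACG', 'CGT', 'GTA', 'TAC']
print(k_words("ACGTAC", 3, overlapping=False))   # ['ACG', 'TAC']
print(list(nondegenerates("ARC")))               # ['AAC', 'AGC']
```

Note that the number of sequences yielded by `nondegenerates` is the product of the degeneracy of each position, so it grows quickly for sequences containing many degenerate characters.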