class skbio.sequence.DNASequence(sequence, id='', description='', quality=None, validate=False)

Base class for DNA sequences.

A DNASequence is a NucleotideSequence that is restricted to containing only characters in the IUPAC DNA lexicon.


All uppercase and lowercase IUPAC DNA characters are supported.
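
To illustrate what "restricted to the IUPAC DNA lexicon" means in practice, here is a minimal pure-Python sketch of an alphabet check. It does not use scikit-bio and is not the library's implementation: the helper name `unsupported_dna_characters` is hypothetical, and treating `-` and `.` as gap characters is an assumption made for this example.

```python
# Standalone sketch; none of these names come from scikit-bio.
IUPAC_DNA_STANDARD = set("ACGT")           # non-degenerate bases
IUPAC_DNA_DEGENERATE = set("RYSWKMBDHVN")  # IUPAC ambiguity codes
GAP_CHARS = set("-.")                      # assumption: both count as gaps


def unsupported_dna_characters(seq):
    """Return the set of characters in seq outside the IUPAC DNA lexicon.

    Uppercase and lowercase IUPAC characters are both accepted, as are
    the gap characters assumed above.
    """
    allowed = IUPAC_DNA_STANDARD | IUPAC_DNA_DEGENERATE
    allowed = allowed | {c.lower() for c in allowed} | GAP_CHARS
    return set(seq) - allowed


if __name__ == "__main__":
    print(unsupported_dna_characters("ACGTacgtRYN-."))  # empty set
    print(unsupported_dna_characters("ACGU"))           # contains 'U'
```

An empty result corresponds to what a valid sequence would look like under these assumptions; a non-empty result lists the offending characters, such as the RNA base `U`.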


description Description of the biological sequence.
id ID of the biological sequence.
quality Quality scores of the characters in the biological sequence.
sequence String containing underlying biological sequence characters.


__contains__(other) The in operator.
__eq__(other) The equality operator.
__getitem__(i) The indexing operator.
__hash__() The hash operator.
__iter__() The iter operator.
__len__() The len operator.
__ne__(other) The inequality operator.
__repr__() The repr method.
__reversed__() The reversed operator.
__str__() The str operator.
alphabet() Return the set of characters allowed in a BiologicalSequence.
complement() Return the complement of the NucleotideSequence
complement_map() Return the mapping of characters to their complements.
copy(**kwargs) Return a copy of the current biological sequence.
count(subsequence) Returns the number of occurrences of subsequence.
degap() Returns a new BiologicalSequence with gap characters removed.
distance(other[, distance_fn]) Returns the distance to other
equals(other[, ignore]) Compare two biological sequences for equality.
find_features(feature_type[, min_length, ...]) Search the sequence for features
fraction_diff(other) Return fraction of positions that differ relative to other
fraction_same(other) Return fraction of positions that are the same relative to other
gap_alphabet() Return the set of characters defined as gaps.
gap_maps() Return tuples mapping between gapped and ungapped positions
gap_vector() Return list indicating positions containing gaps
has_quality() Return bool indicating presence of quality scores in the sequence.
has_unsupported_characters() Return bool indicating presence/absence of unsupported characters
index(subsequence) Return the position where subsequence first occurs
is_gap(char) Return True if char is in the gap_alphabet set
is_gapped() Return True if char(s) in gap_alphabet are present
is_reverse_complement(other) Return True if other is the reverse complement of self
is_valid() Return True if the sequence is valid
iupac_characters() Return the non-degenerate and degenerate characters.
iupac_degeneracies() Return the mapping of degenerate to non-degenerate characters.
iupac_degenerate_characters() Return the degenerate IUPAC characters.
iupac_standard_characters() Return the non-degenerate IUPAC DNA characters.
k_word_counts(k[, overlapping]) Get the counts of words of length k
k_word_frequencies(k[, overlapping]) Get the frequencies of words of length k
k_words(k[, overlapping]) Get the list of words of length k
lower() Convert the BiologicalSequence to lowercase
nondegenerates() Yield all nondegenerate versions of the sequence.
rc() Return the reverse complement of the NucleotideSequence
read(fp[, format]) Create a new DNASequence instance from a file.
regex_iter(regex[, retrieve_group_0]) Find patterns specified by regular expression
reverse_complement() Return the reverse complement of the NucleotideSequence
to_fasta([field_delimiter, terminal_character]) Return the sequence as a fasta-formatted string
unsupported_characters() Return the set of unsupported characters in the BiologicalSequence
upper() Convert the BiologicalSequence to uppercase
write(fp[, format]) Write an instance of DNASequence to a file.
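
The ideas behind several of the methods listed above (complement_map(), reverse_complement(), k_words(), and nondegenerates()) can be sketched in plain Python. The functions below are hypothetical stand-ins written for illustration. They operate on ordinary strings rather than DNASequence objects, and their return types and edge-case behavior are assumptions that may differ from scikit-bio's.

```python
from itertools import product

# Standard IUPAC complement pairs; gap characters map to themselves
# (an assumption made for this sketch).
COMPLEMENT = {
    "A": "T", "T": "A", "C": "G", "G": "C",
    "R": "Y", "Y": "R", "K": "M", "M": "K",
    "B": "V", "V": "B", "D": "H", "H": "D",
    "S": "S", "W": "W", "N": "N",
    "-": "-", ".": ".",
}
# Add lowercase entries so mixed-case input keeps its case.
COMPLEMENT.update({k.lower(): v.lower() for k, v in list(COMPLEMENT.items())})

# Degenerate IUPAC codes and the bases each one stands for.
DEGENERACIES = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def reverse_complement(seq):
    """Complement every character, reading the sequence back to front."""
    return "".join(COMPLEMENT[c] for c in reversed(seq))


def k_words(seq, k, overlapping=True):
    """List the words of length k, stepping by 1 (overlapping) or by k."""
    step = 1 if overlapping else k
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]


def nondegenerates(seq):
    """Yield every non-degenerate expansion of an uppercase sequence.

    Characters without an entry in DEGENERACIES (including lowercase
    ones in this sketch) are passed through unchanged.
    """
    pools = [DEGENERACIES.get(c, c) for c in seq]
    for combo in product(*pools):
        yield "".join(combo)


if __name__ == "__main__":
    print(reverse_complement("GATTACA"))           # TGTAATC
    print(k_words("ACGTAC", 3))                    # ['ACG', 'CGT', 'GTA', 'TAC']
    print(k_words("ACGTAC", 3, overlapping=False)) # ['ACG', 'TAC']
    print(list(nondegenerates("ARC")))             # ['AAC', 'AGC']
```

Under these definitions, a check in the spirit of is_reverse_complement(other) reduces to comparing `reverse_complement(a)` with `b`.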