class skbio.sequence.GrammaredSequence(sequence, metadata=None, positional_metadata=None, lowercase=False, validate=True)

Store sequence data conforming to a character set.

This is an abstract base class (ABC) that cannot be instantiated.

Subclass it to create grammared sequences with custom alphabets.



Raises

ValueError: if sequence characters are not in the character set [R202].
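
The character-set check described above can be pictured with a small plain-Python sketch. This is not skbio's implementation: `validate_chars` is a hypothetical helper, the toy alphabet is made up for illustration, and raising `ValueError` on failure is an assumption of the sketch.

```python
def validate_chars(sequence, alphabet):
    """Return `sequence` unchanged, or raise if it uses characters outside `alphabet`.

    Hypothetical helper for illustration only; not part of skbio.
    """
    invalid = sorted(set(sequence) - set(alphabet))
    if invalid:
        raise ValueError(f"Invalid characters in sequence: {invalid}")
    return sequence


# Toy alphabet (assumption): three definite characters plus two gap characters.
toy_alphabet = set("ABC") | set("-.")

validate_chars("AB-C", toy_alphabet)  # passes silently

try:
    validate_chars("ABZ", toy_alphabet)
except ValueError as err:
    message = str(err)
    print(message)
```

The `validate` parameter in the signature suggests the real check can be switched off; the sketch has no such switch.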

See also

DNA, RNA, Protein


References

[R202] A. Cornish-Bowden. Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984. Nucleic Acids Res. May 10, 1985; 13(9): 3021-3030.


Examples

Note that in the example below, each property must either be static or use skbio’s classproperty decorator.

>>> from skbio.sequence import GrammaredSequence
>>> from skbio.util import classproperty
>>> class CustomSequence(GrammaredSequence):
...     @classproperty
...     def degenerate_map(cls):
...         return {"X": set("AB")}
...     @classproperty
...     def definite_chars(cls):
...         return set("ABC")
...     @classproperty
...     def default_gap_char(cls):
...         return '-'
...     @classproperty
...     def gap_chars(cls):
...         return set('-.')
>>> seq = CustomSequence('ABABACAC')
>>> seq
    length: 8
    has gaps: False
    has degenerates: False
    has definites: True
>>> seq = CustomSequence('XXXXXX')
>>> seq
    length: 6
    has gaps: False
    has degenerates: True
    has definites: False
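
To see why the properties in the example are declared at class level, it can help to look at how such a decorator might work. The sketch below is a hypothetical stand-in written with Python's descriptor protocol, not skbio's own `classproperty`; the names `class_level_property` and `ToyAlphabet` are invented for illustration.

```python
class class_level_property:
    """Hypothetical read-only property that is computed from the class itself."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner=None):
        # Called for both `Cls.attr` and `Cls().attr`; always pass the class.
        if owner is None:
            owner = type(instance)
        return self.func(owner)


class ToyAlphabet:
    @class_level_property
    def definite_chars(cls):
        return set("ABC")


# The value is available without creating an instance...
print(sorted(ToyAlphabet.definite_chars))    # ['A', 'B', 'C']
# ...and instances see the same value.
print(sorted(ToyAlphabet().definite_chars))  # ['A', 'B', 'C']
```

A regular `property` would only resolve on instances, so code that inspects the alphabet on the class itself would get the property object back instead of the character set.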


Attributes

values               Array containing underlying sequence characters.
metadata             dict containing metadata which applies to the entire object.
positional_metadata  pd.DataFrame containing metadata along an axis.
alphabet             Return valid characters.
gap_chars            Return characters defined as gaps.
default_gap_char     Gap character to use when constructing a new gapped sequence.
definite_chars       Return definite characters.
degenerate_chars     Return degenerate characters.
degenerate_map       Return mapping of degenerate to definite characters.
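
The character-set attributes relate to one another in a simple way, which the following plain-Python sketch tries to make concrete using the values from the CustomSequence example. It rests on two assumptions that the listing above does not state: that the degenerate characters are the keys of `degenerate_map`, and that the alphabet is the union of definite, degenerate, and gap characters. The sanity checks at the end are suggestions, not documented skbio behavior.

```python
# Values copied from the CustomSequence example.
degenerate_map = {"X": set("AB")}
definite_chars = set("ABC")
gap_chars = set("-.")
default_gap_char = "-"

# Assumption: degenerate characters are the keys of the degenerate map.
degenerate_chars = set(degenerate_map)

# Assumption: the full alphabet is the union of the three character sets.
alphabet = definite_chars | degenerate_chars | gap_chars
print(sorted(alphabet))  # ['-', '.', 'A', 'B', 'C', 'X']

# Sanity checks a subclass author might want to run on a custom grammar.
assert default_gap_char in gap_chars
assert not (definite_chars & degenerate_chars)
assert all(expansion <= definite_chars for expansion in degenerate_map.values())
```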


Built-in methods

bool(gs)           Returns truth value (truthiness) of sequence.
x in gs            Determine if a subsequence is contained in this sequence.
copy.copy(gs)      Return a shallow copy of this sequence.
copy.deepcopy(gs)  Return a deep copy of this sequence.
gs1 == gs2         Determine if this sequence is equal to another.
gs[x]              Slice this sequence.
iter(gs)           Iterate over positions in this sequence.
len(gs)            Return the number of characters in this sequence.
gs1 != gs2         Determine if this sequence is not equal to another.
reversed(gs)       Iterate over positions in this sequence in reverse order.
str(gs)            Return sequence characters as a string.

Methods

concat(sequences[, how])                         Concatenate an iterable of Sequence objects.
copy([deep])                                     Return a copy of this sequence.
count(subsequence[, start, end])                 Count occurrences of a subsequence in this sequence.
definites()                                      Find positions containing definite characters in the sequence.
degap()                                          Return a new sequence with gap characters removed.
degenerates()                                    Find positions containing degenerate characters in the sequence.
distance(other[, metric])                        Compute the distance to another sequence.
expand_degenerates()                             Yield all possible definite versions of the sequence.
find_motifs(motif_type[, min_length, ignore])    Search the biological sequence for motifs.
find_with_regex(regex[, ignore])                 Generate slices for patterns matched by a regular expression.
frequencies([chars, relative])                   Compute frequencies of characters in the sequence.
gaps()                                           Find positions containing gaps in the biological sequence.
has_definites()                                  Determine if sequence contains one or more definite characters.
has_degenerates()                                Determine if sequence contains one or more degenerate characters.
has_gaps()                                       Determine if the sequence contains one or more gap characters.
has_metadata()                                   Determine if the object has metadata.
has_nondegenerates()                             Determine if sequence contains one or more non-degenerate characters.
has_positional_metadata()                        Determine if the object has positional metadata.
index(subsequence[, start, end])                 Find position where subsequence first occurs in the sequence.
iter_contiguous(included[, min_length, invert])  Yield contiguous subsequences based on included.
iter_kmers(k[, overlap])                         Generate kmers of length k from this sequence.
kmer_frequencies(k[, overlap, relative])         Return counts of words of length k from this sequence.
lowercase(lowercase)                             Return a case-sensitive string representation of the sequence.
match_frequency(other[, relative])               Return count of positions that are the same between two sequences.
matches(other)                                   Find positions that match with another sequence.
mismatch_frequency(other[, relative])            Return count of positions that differ between two sequences.
mismatches(other)                                Find positions that do not match with another sequence.
nondegenerates()                                 Find positions containing non-degenerate characters in the sequence.
read(file[, format])                             Create a new Sequence instance from a file.
replace(where, character)                        Replace values in this sequence with a different character.
to_regex()                                       Return regular expression object that accounts for degenerate chars.
write(file[, format])                            Write an instance of Sequence to a file.
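
Two of the methods listed above, expand_degenerates() and to_regex(), both follow directly from the degenerate map. The plain-Python sketch below illustrates the idea on strings, using the map from the CustomSequence example. The helpers `expand` and `to_pattern` are hypothetical and are not skbio's implementation; the real methods operate on sequence objects and may order or format their results differently.

```python
import itertools
import re

# Degenerate map copied from the CustomSequence example.
degenerate_map = {"X": set("AB")}


def expand(sequence, degenerate_map):
    """Yield every definite string obtained by resolving degenerate characters."""
    # Each position contributes its possible characters; definite ones contribute themselves.
    choices = [sorted(degenerate_map.get(char, char)) for char in sequence]
    for combination in itertools.product(*choices):
        yield "".join(combination)


def to_pattern(sequence, degenerate_map):
    """Compile a regex in which each degenerate character becomes a character class."""
    parts = []
    for char in sequence:
        if char in degenerate_map:
            options = "".join(re.escape(c) for c in sorted(degenerate_map[char]))
            parts.append(f"[{options}]")
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts))


print(list(expand("XCX", degenerate_map)))  # ['ACA', 'ACB', 'BCA', 'BCB']

pattern = to_pattern("XCX", degenerate_map)
print(pattern.pattern)                   # [AB]C[AB]
print(bool(pattern.fullmatch("BCA")))    # True
print(bool(pattern.fullmatch("CCA")))    # False
```

The number of expansions is the product of the option counts at each position, so it grows quickly with the number of degenerate characters; yielding lazily keeps memory use flat.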