skbio.sequence.GrammaredSequence

class skbio.sequence.GrammaredSequence(sequence, metadata=None, positional_metadata=None, lowercase=False, validate=True)

Store sequence data conforming to a character set.

This is an abstract base class (ABC) that cannot be instantiated.

Subclass it to create grammared sequences with custom alphabets.

Raises:

ValueError

If sequence characters are not in the character set [R202].

See also

DNA, RNA, Protein

References

[R202] Cornish-Bowden A. Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984. Nucleic Acids Res. May 10, 1985; 13(9): 3021-3030.

Examples

Note that in the example below, each property must either be static or use skbio’s classproperty decorator.

>>> from skbio.sequence import GrammaredSequence
>>> from skbio.util import classproperty
>>> class CustomSequence(GrammaredSequence):
...     @classproperty
...     def degenerate_map(cls):
...         return {"X": set("AB")}
...
...     @classproperty
...     def definite_chars(cls):
...         return set("ABC")
...
...     @classproperty
...     def default_gap_char(cls):
...         return '-'
...
...     @classproperty
...     def gap_chars(cls):
...         return set('-.')
>>> seq = CustomSequence('ABABACAC')
>>> seq
CustomSequence
--------------------------
Stats:
    length: 8
    has gaps: False
    has degenerates: False
    has definites: True
--------------------------
0 ABABACAC
>>> seq = CustomSequence('XXXXXX')
>>> seq
CustomSequence
-------------------------
Stats:
    length: 6
    has gaps: False
    has degenerates: True
    has definites: False
-------------------------
0 XXXXXX
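The degenerate character X in the example above stands for either A or B. As a rough illustration of what that mapping implies for expand_degenerates() and to_regex(), the following plain-Python sketch works on ordinary strings. It is not skbio's implementation: the helpers `expand` and `degenerate_pattern` are hypothetical names introduced here, and the sorted ordering is an assumption made only to keep the output stable.

```python
import re
from itertools import product

# Same mapping as CustomSequence.degenerate_map in the example above.
DEGENERATE_MAP = {"X": set("AB")}


def expand(sequence, degenerate_map):
    """Yield every definite string that a degenerate string can stand for.

    Hypothetical helper for illustration; not part of skbio.
    """
    # Each position offers either its own character or every definite
    # character that its degenerate code maps to (sorted for stable order).
    choices = [sorted(degenerate_map.get(char, char)) for char in sequence]
    for combination in product(*choices):
        yield "".join(combination)


def degenerate_pattern(sequence, degenerate_map):
    """Compile a regex in which each degenerate character becomes a class.

    Hypothetical helper for illustration; not part of skbio.
    """
    parts = []
    for char in sequence:
        if char in degenerate_map:
            parts.append("[" + "".join(sorted(degenerate_map[char])) + "]")
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts))


expanded = list(expand("AXC", DEGENERATE_MAP))
pattern = degenerate_pattern("AXC", DEGENERATE_MAP)
```

Here `expanded` holds the two definite strings "AAC" and "ABC", and `pattern` matches exactly those two strings.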

Attributes

values Array containing underlying sequence characters.
metadata dict containing metadata which applies to the entire object.
positional_metadata pd.DataFrame containing metadata along an axis.
alphabet Return valid characters.
gap_chars Return characters defined as gaps.
default_gap_char Gap character to use when constructing a new gapped sequence.
definite_chars Return definite characters.
degenerate_chars Return degenerate characters.
degenerate_map Return mapping of degenerate to definite characters.
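The character-set attributes above relate to one another in a simple way. The sketch below uses plain Python sets and the values from CustomSequence, and assumes that degenerate_chars is the set of keys of degenerate_map and that alphabet is the union of the definite, degenerate, and gap characters. That relationship is an assumption of this sketch and may differ between skbio versions; `invalid_characters` is a hypothetical helper that mirrors the kind of check behind the ValueError described under Raises.

```python
# Values taken from the CustomSequence example.
definite_chars = set("ABC")
degenerate_map = {"X": set("AB")}
gap_chars = set("-.")

# Assumption: degenerate characters are the keys of the degenerate map.
degenerate_chars = set(degenerate_map)

# Assumption: the alphabet is the union of the three character sets.
alphabet = definite_chars | degenerate_chars | gap_chars


def invalid_characters(sequence, alphabet):
    """Return the sorted characters of `sequence` that are not in `alphabet`.

    Hypothetical helper for illustration; not part of skbio.
    """
    return sorted(set(sequence) - alphabet)


ok = invalid_characters("AXB-.", alphabet)   # nothing outside the alphabet
bad = invalid_characters("ABZ", alphabet)    # "Z" is not a valid character
```

A non-empty result such as `bad` corresponds to input that a validating constructor would reject.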

Methods

bool(gs) Returns truth value (truthiness) of sequence.
x in gs Determine if a subsequence is contained in this sequence.
copy.copy(gs) Return a shallow copy of this sequence.
copy.deepcopy(gs) Return a deep copy of this sequence.
gs1 == gs2 Determine if this sequence is equal to another.
gs[x] Slice this sequence.
iter(gs) Iterate over positions in this sequence.
len(gs) Return the number of characters in this sequence.
gs1 != gs2 Determine if this sequence is not equal to another.
reversed(gs) Iterate over positions in this sequence in reverse order.
str(gs) Return sequence characters as a string.
concat(sequences[, how]) Concatenate an iterable of Sequence objects.
copy([deep]) Return a copy of this sequence.
count(subsequence[, start, end]) Count occurrences of a subsequence in this sequence.
definites() Find positions containing definite characters in the sequence.
degap() Return a new sequence with gap characters removed.
degenerates() Find positions containing degenerate characters in the sequence.
distance(other[, metric]) Compute the distance to another sequence.
expand_degenerates() Yield all possible definite versions of the sequence.
find_motifs(motif_type[, min_length, ignore]) Search the biological sequence for motifs.
find_with_regex(regex[, ignore]) Generate slices for patterns matched by a regular expression.
frequencies([chars, relative]) Compute frequencies of characters in the sequence.
gaps() Find positions containing gaps in the biological sequence.
has_definites() Determine if sequence contains one or more definite characters.
has_degenerates() Determine if sequence contains one or more degenerate characters.
has_gaps() Determine if the sequence contains one or more gap characters.
has_metadata() Determine if the object has metadata.
has_nondegenerates() Determine if sequence contains one or more non-degenerate characters.
has_positional_metadata() Determine if the object has positional metadata.
index(subsequence[, start, end]) Find position where subsequence first occurs in the sequence.
iter_contiguous(included[, min_length, invert]) Yield contiguous subsequences based on included.
iter_kmers(k[, overlap]) Generate kmers of length k from this sequence.
kmer_frequencies(k[, overlap, relative]) Return counts of words of length k from this sequence.
lowercase(lowercase) Return a case-sensitive string representation of the sequence.
match_frequency(other[, relative]) Return count of positions that are the same between two sequences.
matches(other) Find positions that match with another sequence.
mismatch_frequency(other[, relative]) Return count of positions that differ between two sequences.
mismatches(other) Find positions that do not match with another sequence.
nondegenerates() Find positions containing non-degenerate characters in the sequence.
read(file[, format]) Create a new Sequence instance from a file.
replace(where, character) Replace values in this sequence with a different character.
to_regex() Return regular expression object that accounts for degenerate chars.
write(file[, format]) Write an instance of Sequence to a file.
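To make the one-line summaries of degap() and kmer_frequencies() more concrete, here is a minimal plain-Python sketch of the behavior they describe, written against ordinary strings rather than skbio objects. The helper names `degap` and `kmer_counts` are hypothetical, and the sketch assumes that overlapping k-mers advance one position at a time while non-overlapping k-mers advance k positions. skbio's own handling of gaps and edge cases may differ.

```python
from collections import Counter

# Gap characters from the CustomSequence example.
GAP_CHARS = set("-.")


def degap(sequence, gap_chars=GAP_CHARS):
    """Return `sequence` with every gap character removed.

    Hypothetical helper for illustration; not part of skbio.
    """
    return "".join(char for char in sequence if char not in gap_chars)


def kmer_counts(sequence, k, overlap=True):
    """Count the words of length `k` in `sequence`.

    Hypothetical helper for illustration; not part of skbio.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Assumption: step by 1 for overlapping k-mers, by k otherwise.
    step = 1 if overlap else k
    starts = range(0, len(sequence) - k + 1, step)
    return Counter(sequence[i:i + k] for i in starts)


ungapped = degap("AB-AB.AC")
overlapping = kmer_counts(ungapped, 2)
non_overlapping = kmer_counts(ungapped, 2, overlap=False)
```

With overlap enabled, every window of length 2 is counted. With it disabled, the string is cut into consecutive, non-shared windows.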