class skbio.sequence.ProteinSequence(sequence, id='', description='', quality=None, validate=False)

Base class for protein sequences.

A ProteinSequence is a BiologicalSequence containing only characters used in the IUPAC protein lexicon.


All uppercase and lowercase IUPAC protein characters are supported.
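As a rough illustration of what "containing only characters used in the IUPAC protein lexicon" can mean in practice, the sketch below checks a plain string against a set of allowed characters. It is standalone Python and does not call scikit-bio. The exact character sets are assumptions made for this example: the 20 standard amino acid letters, the degenerate codes B, Z and X, and `-` and `.` as gap characters. The helper names `unsupported_chars` and `is_valid_protein` are hypothetical.

```python
# Standalone sketch; these sets are assumptions, not read from scikit-bio.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acid letters
DEGENERATE = {
    "B": set("DN"),       # aspartic acid or asparagine
    "Z": set("EQ"),       # glutamic acid or glutamine
    "X": set(STANDARD),   # any amino acid
}
GAPS = set("-.")          # assumed gap characters


def unsupported_chars(seq):
    """Return the set of characters in seq that fall outside the lexicon."""
    allowed = STANDARD | set(DEGENERATE) | GAPS
    # Accept lowercase as well, since both cases are described as supported.
    allowed = allowed | {c.lower() for c in allowed}
    return set(seq) - allowed


def is_valid_protein(seq):
    """Return True if every character of seq is in the assumed lexicon."""
    return not unsupported_chars(seq)


print(is_valid_protein("PAW-x"))
print(unsupported_chars("PAW1"))
```

This mirrors the roles that `is_valid()` and `unsupported_characters()` play in the method summary, without claiming to reproduce their implementation.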


description Description of the biological sequence.
id ID of the biological sequence.
quality Quality scores of the characters in the biological sequence.
sequence String containing underlying biological sequence characters.


__contains__(other) The in operator.
__eq__(other) The equality operator.
__getitem__(i) The indexing operator.
__hash__() The hash operator.
__iter__() The iter operator.
__len__() The len operator.
__ne__(other) The inequality operator.
__repr__() The repr method.
__reversed__() The reversed operator.
__str__() The str operator.
alphabet() Return the set of characters allowed in a BiologicalSequence.
copy(**kwargs) Return a copy of the current biological sequence.
count(subsequence) Returns the number of occurrences of subsequence.
degap() Returns a new BiologicalSequence with gap characters removed.
distance(other[, distance_fn]) Returns the distance to other.
equals(other[, ignore]) Compare two biological sequences for equality.
fraction_diff(other) Return the fraction of positions that differ relative to other.
fraction_same(other) Return the fraction of positions that are the same relative to other.
gap_alphabet() Return the set of characters defined as gaps.
gap_maps() Return tuples mapping between gapped and ungapped positions.
gap_vector() Return a list indicating positions containing gaps.
has_quality() Return bool indicating presence of quality scores in the sequence.
has_unsupported_characters() Return bool indicating presence of unsupported characters.
index(subsequence) Return the position where subsequence first occurs.
is_gap(char) Return True if char is in the gap_alphabet set.
is_gapped() Return True if any characters in gap_alphabet are present.
is_valid() Return True if the sequence is valid.
iupac_characters() Return the non-degenerate and degenerate characters.
iupac_degeneracies() Return the mapping of degenerate to non-degenerate characters.
iupac_degenerate_characters() Return the degenerate IUPAC characters.
iupac_standard_characters() Return the non-degenerate IUPAC protein characters.
k_word_counts(k[, overlapping]) Get the counts of words of length k.
k_word_frequencies(k[, overlapping]) Get the frequencies of words of length k.
k_words(k[, overlapping]) Get the list of words of length k.
lower() Convert the BiologicalSequence to lowercase.
nondegenerates() Yield all nondegenerate versions of the sequence.
read(fp[, format]) Create a new ProteinSequence instance from a file.
regex_iter(regex[, retrieve_group_0]) Find patterns specified by a regular expression.
to_fasta([field_delimiter, terminal_character]) Return the sequence as a FASTA-formatted string.
unsupported_characters() Return the set of unsupported characters in the BiologicalSequence.
upper() Convert the BiologicalSequence to uppercase.
write(fp[, format]) Write an instance of ProteinSequence to a file.
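
Several of the methods summarized above describe simple string operations. The following is a minimal sketch of the ideas behind `k_words`, `k_word_counts`, `fraction_diff` and `degap`, written against plain Python strings rather than the library's classes. The function bodies are illustrative assumptions, not scikit-bio's code. In particular, the gap characters (`-` and `.`), the error raised on a length mismatch, and the result for empty input are choices made for this example.

```python
from collections import Counter


def k_words(seq, k, overlapping=True):
    """List the words of length k in seq, stepping by 1 or by k."""
    step = 1 if overlapping else k
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]


def k_word_counts(seq, k, overlapping=True):
    """Count how often each word of length k occurs in seq."""
    return Counter(k_words(seq, k, overlapping))


def fraction_diff(a, b):
    """Fraction of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")  # assumed policy
    if not a:
        return 0.0  # assumed result for empty input
    return sum(x != y for x, y in zip(a, b)) / len(a)


def degap(seq, gaps="-."):
    """Return seq with the assumed gap characters removed."""
    return "".join(c for c in seq if c not in gaps)


print(k_words("PAWPA", 2))
print(k_words("PAWPA", 2, overlapping=False))
print(k_word_counts("PAWPA", 2)["PA"])
print(degap("P-A.W"))
```

Under these assumptions, `fraction_same` would simply be `1 - fraction_diff(a, b)`, and `k_word_frequencies` would divide each count by the total number of words.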