skbio.sequence.RNA.kmer_frequencies

RNA.kmer_frequencies(k, overlap=True, relative=False)

Return counts of words of length k from this sequence.

State: Stable as of 0.4.0.

Parameters
  • k (int) – The word length.

  • overlap (bool, optional) – If True (the default), count overlapping kmers, advancing one position at a time; if False, count non-overlapping kmers, advancing k positions at a time.

  • relative (bool, optional) – If True, return the relative frequency of each kmer (its count divided by the total number of kmers counted) instead of its count. Defaults to False.

Returns

Frequencies of words of length k contained in this sequence.

Return type

dict

Raises

ValueError – If k is less than 1.

Examples

>>> from pprint import pprint
>>> from skbio import Sequence
>>> s = Sequence('ACACATTTATTA')
>>> freqs = s.kmer_frequencies(3, overlap=False)
>>> pprint(freqs) # using pprint to display dict in sorted order
{'ACA': 1, 'CAT': 1, 'TTA': 2}
>>> freqs = s.kmer_frequencies(3, relative=True, overlap=False)
>>> pprint(freqs)
{'ACA': 0.25, 'CAT': 0.25, 'TTA': 0.5}
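
The examples above only exercise overlap=False. To show how the default overlap=True differs, here is a minimal pure-Python sketch of the counting logic. The function name kmer_frequencies_sketch is hypothetical and the code is not scikit-bio's implementation. It assumes that overlapping windows advance by one position, that non-overlapping windows advance by k, and that trailing characters which do not fill a whole window are dropped.

```python
from collections import Counter


def kmer_frequencies_sketch(seq, k, overlap=True, relative=False):
    """Hypothetical stand-in for illustration; not scikit-bio's code."""
    if k < 1:
        raise ValueError("k must be at least 1.")
    # Assumption: overlapping windows step by 1, non-overlapping by k.
    step = 1 if overlap else k
    # Only full-length windows are counted; a short tail is ignored.
    counts = Counter(seq[i:i + k] for i in range(0, len(seq) - k + 1, step))
    if relative:
        total = sum(counts.values())
        return {kmer: n / total for kmer, n in counts.items()}
    return dict(counts)


seq = "ACACATTTATTA"

# Four non-overlapping windows: ACA, CAT, TTA, TTA
non_overlapping = kmer_frequencies_sketch(seq, 3, overlap=False)
# {'ACA': 1, 'CAT': 1, 'TTA': 2}

# Ten overlapping windows, one starting at each of positions 0 through 9
overlapping = kmer_frequencies_sketch(seq, 3)
# {'ACA': 2, 'CAC': 1, 'CAT': 1, 'ATT': 2, 'TTT': 1, 'TTA': 2, 'TAT': 1}

# Each count divided by the four non-overlapping windows
relative = kmer_frequencies_sketch(seq, 3, overlap=False, relative=True)
# {'ACA': 0.25, 'CAT': 0.25, 'TTA': 0.5}
```

With overlap left at its default, the same 12-character sequence yields ten windows rather than four, so kmers such as 'CAC' and 'ATT' that straddle the non-overlapping boundaries also appear in the result.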