skbio.sequence.DNA.kmer_frequencies

DNA.kmer_frequencies(k, overlap=True, relative=False)

Return counts of words of length k from this sequence.

State: Stable as of 0.4.0.

Parameters:

k : int

The word length.

overlap : bool, optional

If True (the default), count overlapping kmers. If False, count only consecutive, non-overlapping kmers.

relative : bool, optional

If True, return the relative frequency of each kmer instead of its count.

Returns:

dict

Maps each word of length k contained in this sequence to its count, or to its relative frequency if relative is True.

Raises:

ValueError

If k is less than 1.

Examples

>>> from pprint import pprint
>>> from skbio import Sequence
>>> s = Sequence('ACACATTTATTA')
>>> freqs = s.kmer_frequencies(3, overlap=False)
>>> pprint(freqs) # using pprint to display dict in sorted order
{'ACA': 1, 'CAT': 1, 'TTA': 2}
>>> freqs = s.kmer_frequencies(3, relative=True, overlap=False)
>>> pprint(freqs)
{'ACA': 0.25, 'CAT': 0.25, 'TTA': 0.5}
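
The effect of overlap and relative can also be illustrated without scikit-bio. The following is a minimal pure-Python sketch, not the library's implementation: kmer_counts_sketch is a hypothetical helper, and it assumes that a trailing partial word shorter than k is dropped and that relative frequencies are counts divided by the total number of words counted.

```python
from collections import Counter


def kmer_counts_sketch(seq, k, overlap=True, relative=False):
    """Hypothetical helper (not part of scikit-bio): count words of length k."""
    if k < 1:
        raise ValueError("k must be greater than 0.")
    # Overlapping windows advance one position at a time;
    # non-overlapping windows advance k positions.
    step = 1 if overlap else k
    # Assumption: only full-length windows are counted, so a trailing
    # partial word is dropped.
    counts = Counter(seq[i:i + k] for i in range(0, len(seq) - k + 1, step))
    if not relative:
        return dict(counts)
    # Assumption: relative frequency = count / total number of words counted.
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


overlapping = kmer_counts_sketch("ACACATTTATTA", 3)
non_overlapping = kmer_counts_sketch("ACACATTTATTA", 3, overlap=False)
relative = kmer_counts_sketch("ACACATTTATTA", 3, overlap=False, relative=True)
```

With overlap left at True, the 12-character sequence yields 10 words instead of 4, so words such as 'ACA' and 'ATT' that straddle the non-overlapping boundaries are counted as well.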