skbio.alignment.Alignment.k_word_frequencies

Alignment.k_word_frequencies(k, overlapping=True)

Return the k-word frequencies of each sequence in the SequenceCollection.

Parameters:

k : int

The word length.

overlapping : bool, optional

Whether the k-words should overlap. This is only relevant when k > 1, since words of length 1 cannot overlap.

Returns:

list

List of collections.defaultdict objects, one per sequence in the SequenceCollection, each mapping a k-word to its relative frequency in that sequence.

Examples

>>> from skbio import SequenceCollection, DNA
>>> sequences = [DNA('A', id="seq1"),
...              DNA('AT', id="seq2"),
...              DNA('TTTT', id="seq3")]
>>> s1 = SequenceCollection(sequences)
>>> for freqs in s1.k_word_frequencies(1):
...     print(freqs)
defaultdict(<type 'float'>, {'A': 1.0})
defaultdict(<type 'float'>, {'A': 0.5, 'T': 0.5})
defaultdict(<type 'float'>, {'T': 1.0})
>>> for freqs in s1.k_word_frequencies(2):
...     print(freqs)
defaultdict(<type 'float'>, {})
defaultdict(<type 'float'>, {'AT': 1.0})
defaultdict(<type 'float'>, {'TT': 1.0})
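
The effect of the overlapping parameter can be illustrated with a minimal pure-Python sketch that works on plain strings. The helper name k_word_frequencies_sketch is hypothetical and is not part of scikit-bio. The sketch assumes that non-overlapping words are read from the start of the sequence in steps of k and that a trailing fragment shorter than k is dropped; the library's actual handling of such fragments is not stated above.

```python
from collections import defaultdict


def k_word_frequencies_sketch(sequence, k, overlapping=True):
    """Relative frequency of each k-word in a plain string.

    Hypothetical helper for illustration only; not the scikit-bio
    implementation.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    # Overlapping words start at every position; non-overlapping words
    # start every k positions (assumption: trailing fragments are dropped).
    step = 1 if overlapping else k
    words = [sequence[i:i + k]
             for i in range(0, len(sequence) - k + 1, step)]
    freqs = defaultdict(float)
    if not words:
        # The sequence is shorter than k, so there are no k-words.
        return freqs
    increment = 1.0 / len(words)
    for word in words:
        freqs[word] += increment
    return freqs


print(dict(k_word_frequencies_sketch("AATT", 2)))
print(dict(k_word_frequencies_sketch("AATT", 2, overlapping=False)))
```

With overlapping words, "AATT" yields three 2-words (AA, AT, TT), each with frequency 1/3. Without overlap it yields only AA and TT, each with frequency 0.5. A sequence shorter than k produces an empty mapping, matching the seq1 result for k = 2 above.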