skbio.sequence.Sequence.frequencies
Sequence.frequencies(chars=None, relative=False)
Compute frequencies of characters in the sequence.
State: Experimental as of 0.4.1.
- Parameters
chars (str or set of str, optional) – Characters to compute the frequencies of. May be a str containing a single character or a set of single-character strings. If None, frequencies will be computed for all characters present in the sequence.
relative (bool, optional) – If True, return the relative frequency of each character instead of its count. If chars is provided, relative frequencies will be computed with respect to the number of characters in the sequence, not the total count of characters observed in chars. Thus, the relative frequencies will not necessarily sum to 1.0 if chars is provided.
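The relative-frequency rule above (divide by the sequence length, not by the total count of the selected characters) can be illustrated with a minimal pure-Python sketch. The function `char_frequencies` is hypothetical and built on `collections.Counter`; it mirrors the documented behavior, including the empty-sequence case from the Notes (using `math.nan` in place of `np.nan` to avoid a NumPy dependency), but it is not scikit-bio's implementation and it omits input validation.

```python
from collections import Counter
import math


def char_frequencies(seq, chars=None, relative=False):
    """Hypothetical sketch of the documented counting semantics.

    Not the scikit-bio implementation; `seq` is a plain str here.
    """
    counts = Counter(seq)  # count every character present in seq
    if chars is None:
        # No restriction: report only the characters actually observed.
        result = dict(counts)
    else:
        if isinstance(chars, str):
            chars = {chars}  # a single character is treated as a one-element set
        # Restrict to the requested characters; absent ones get a count of 0.
        result = {c: counts.get(c, 0) for c in chars}
    if relative:
        n = len(seq)
        # Divide by the sequence length, NOT by the sum of the selected counts.
        # An empty sequence with chars given yields NaN for each character.
        result = {c: (k / n if n else math.nan) for c, k in result.items()}
    return result
```

For example, `char_frequencies('AGAAGACC', chars={'A', 'C', 'T'}, relative=True)` returns `{'A': 0.5, 'C': 0.25, 'T': 0.0}` (in some key order), whose values sum to 0.75 rather than 1.0.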
- Returns
Frequencies of characters in the sequence.
- Return type
dict
- Raises
TypeError – If chars is not a str or set of str.
ValueError – If chars is not a single-character str or a set of single-character strings.
ValueError – If chars contains characters outside the allowable range of characters in a Sequence object.
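The three documented error conditions can be sketched as a small validation helper. The name `validate_chars` is hypothetical, and the "allowable range" is assumed here to be 7-bit ASCII (code points 0 to 127); that range is an assumption for illustration, not taken from scikit-bio's source. The sketch also only accepts a built-in `set`, which may be stricter than the real method.

```python
def validate_chars(chars):
    """Hypothetical sketch of the documented checks on `chars`.

    ASSUMPTION: the allowable range is 7-bit ASCII (code points 0-127).
    Returns the characters as a set.
    """
    if isinstance(chars, str):
        chars = {chars}  # a lone str is wrapped, then checked like a set member
    elif not isinstance(chars, set):
        # TypeError: chars is neither a str nor a set.
        raise TypeError("chars must be a str or set of str, not %s"
                        % type(chars).__name__)
    for c in chars:
        if not isinstance(c, str):
            # TypeError: a set that is not a set of str.
            raise TypeError("chars must contain only str, found %s"
                            % type(c).__name__)
        if len(c) != 1:
            # ValueError: not a single-character string.
            raise ValueError("expected a single-character string, got %r" % c)
        if ord(c) > 127:
            # ValueError: outside the (assumed ASCII) allowable range.
            raise ValueError("character outside the allowable range: %r" % c)
    return chars
```

Under these assumptions, `validate_chars('AC')` raises `ValueError` (two characters), while `validate_chars(['A'])` raises `TypeError` (a list, not a set).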
See also
Notes
If the sequence is empty (i.e., length zero), relative=True, and chars is provided, the relative frequency of each specified character will be np.nan.
If chars is not provided, this method is equivalent to, but faster than, seq.kmer_frequencies(k=1).
If chars is not provided, it is also equivalent to, but faster than, passing chars=seq.observed_chars.
Examples
Compute character frequencies of a sequence:
>>> from pprint import pprint
>>> from skbio import Sequence
>>> seq = Sequence('AGAAGACC')
>>> freqs = seq.frequencies()
>>> pprint(freqs) # using pprint to display dict in sorted order
{'A': 4, 'C': 2, 'G': 2}
Compute relative character frequencies:
>>> freqs = seq.frequencies(relative=True)
>>> pprint(freqs)
{'A': 0.5, 'C': 0.25, 'G': 0.25}
Compute relative frequencies of characters A, C, and T:
>>> freqs = seq.frequencies(chars={'A', 'C', 'T'}, relative=True)
>>> pprint(freqs)
{'A': 0.5, 'C': 0.25, 'T': 0.0}
Note that since character T is not in the sequence, its relative frequency is 0.0. The relative frequencies of A and C are relative to the number of characters in the sequence (8), not the number of A and C characters (4 + 2 = 6).
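The arithmetic behind this example can be checked with the standard library alone. This sketch uses only `collections.Counter` (not scikit-bio) and contrasts the documented denominator, the sequence length of 8, with the alternative the text rules out, the combined count of the selected characters (6).

```python
from collections import Counter

seq = 'AGAAGACC'
selected = ('A', 'C', 'T')
counts = Counter(seq)  # A: 4, G: 2, C: 2; missing keys such as 'T' read as 0

# Documented behavior: divide by the sequence length (8).
by_length = {c: counts[c] / len(seq) for c in selected}

# The alternative the documentation rules out: divide by the selected total.
selected_total = sum(counts[c] for c in selected)  # 4 + 2 + 0 = 6
by_selected = {c: counts[c] / selected_total for c in selected}

print(by_length)                   # {'A': 0.5, 'C': 0.25, 'T': 0.0}
print(sum(by_length.values()))     # 0.75
print(sum(by_selected.values()))   # 1.0 up to floating-point rounding
```

Only the first denominator matches the output of `seq.frequencies(chars={'A', 'C', 'T'}, relative=True)` shown above; the second would always sum to 1.0 whenever any selected character is present.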