skbio.sequence.RNA.frequencies

RNA.frequencies(chars=None, relative=False)

Compute frequencies of characters in the sequence.

State: Experimental as of 0.4.1.

Parameters
  • chars (str or set of str, optional) – Characters to compute the frequencies of. May be a str containing a single character or a set of single-character strings. If None, frequencies will be computed for all characters present in the sequence.

  • relative (bool, optional) – If True, return the relative frequency of each character instead of its count. If chars is provided, relative frequencies are computed with respect to the number of characters in the sequence, not the combined count of the characters listed in chars. Thus, the relative frequencies will not necessarily sum to 1.0 if chars is provided.

Returns

Frequencies of characters in the sequence.

Return type

dict

Raises
  • TypeError – If chars is not a str or set of str.

  • ValueError – If chars is not a single-character str or a set of single-character strings.

  • ValueError – If chars contains characters outside the allowable range of characters in a Sequence object.
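
The first two conditions can be illustrated in plain Python. This is only a sketch under stated assumptions: check_chars and error_name are hypothetical helpers, not scikit-bio's implementation, and the third condition is omitted because the allowable character range is not specified here.

```python
def check_chars(chars):
    """Hypothetical helper mirroring the documented TypeError/ValueError rules."""
    # A lone str is treated as a one-element set.
    if isinstance(chars, str):
        chars = {chars}
    elif not isinstance(chars, set):
        raise TypeError(
            f"chars must be a str or set of str, not {type(chars).__name__}")
    for char in chars:
        if not isinstance(char, str):
            raise TypeError(
                f"each element of chars must be a str, not {type(char).__name__}")
        if len(char) != 1:
            raise ValueError(
                f"each element of chars must be a single character, got {char!r}")
    return chars


def error_name(chars):
    """Return the name of the exception raised for chars, or None if valid."""
    try:
        check_chars(chars)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None


print(error_name({'A', 'C'}))  # None
print(error_name(['A', 'C']))  # TypeError
print(error_name('AC'))        # ValueError
```

A list is rejected with TypeError even though its elements are valid, while a multi-character str has an acceptable type and therefore fails with ValueError.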

Notes

If the sequence is empty (i.e., length zero), relative=True, and chars is provided, the relative frequency of each specified character will be np.nan.

If chars is not provided, this method is equivalent to, but faster than, seq.kmer_frequencies(k=1).

If chars is not provided, this method is equivalent to, but faster than, passing chars=seq.observed_chars.
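
The notes above can be mirrored with a standard-library sketch operating on a plain str. The function sketch_frequencies is a hypothetical stand-in written for illustration, not the library's code. It uses float NaN in place of np.nan, set(seq) in place of seq.observed_chars, and a Counter of length-1 slices in place of seq.kmer_frequencies(k=1).

```python
import math
from collections import Counter


def sketch_frequencies(seq, chars=None, relative=False):
    """Hypothetical stand-in for the documented behaviour, for a plain str."""
    counts = Counter(seq)
    if chars is None:
        # Default: every character observed in the sequence.
        chars = set(seq)
    elif isinstance(chars, str):
        chars = {chars}
    # Counter returns 0 for characters that never occur.
    freqs = {char: counts[char] for char in chars}
    if relative:
        # The denominator is the sequence length; an empty sequence yields NaN.
        n = len(seq)
        freqs = {c: (v / n if n else math.nan) for c, v in freqs.items()}
    return freqs


seq = 'AGAAGACC'
one_mers = dict(Counter(seq[i:i + 1] for i in range(len(seq))))

print(sketch_frequencies(seq) == one_mers)                            # True
print(sketch_frequencies(seq) == sketch_frequencies(seq, set(seq)))   # True

empty = sketch_frequencies('', chars={'A'}, relative=True)
print(math.isnan(empty['A']))                                         # True
```

The sketch demonstrates only that the results agree. The speed difference mentioned in the notes is a property of the library's implementation and is not reproduced here.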

Examples

Compute character frequencies of a sequence:

>>> from pprint import pprint
>>> from skbio import Sequence
>>> seq = Sequence('AGAAGACC')
>>> freqs = seq.frequencies()
>>> pprint(freqs) # using pprint to display dict in sorted order
{'A': 4, 'C': 2, 'G': 2}

Compute relative character frequencies:

>>> freqs = seq.frequencies(relative=True)
>>> pprint(freqs)
{'A': 0.5, 'C': 0.25, 'G': 0.25}

Compute relative frequencies of characters A, C, and T:

>>> freqs = seq.frequencies(chars={'A', 'C', 'T'}, relative=True)
>>> pprint(freqs)
{'A': 0.5, 'C': 0.25, 'T': 0.0}

Because character T does not occur in the sequence, its relative frequency is 0.0. The relative frequencies of A and C are computed relative to the number of characters in the sequence (8), not the number of A and C characters (4 + 2 = 6), so the returned values sum to 0.75 rather than 1.0.
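
The choice of denominator can be checked with exact arithmetic. The sketch below uses only the standard library on a plain str; the by_subset calculation is shown purely for contrast and is not what the method returns.

```python
from fractions import Fraction

seq = 'AGAAGACC'
chars = ['A', 'C', 'T']
counts = {c: seq.count(c) for c in chars}  # {'A': 4, 'C': 2, 'T': 0}

# Documented behaviour: divide by the sequence length (8).
by_length = {c: Fraction(n, len(seq)) for c, n in counts.items()}

# For contrast only: divide by the combined A + C + T count (6).
subset_total = sum(counts.values())
by_subset = {c: Fraction(n, subset_total) for c, n in counts.items()}

print(sum(by_length.values()))  # 3/4
print(sum(by_subset.values()))  # 1
```

Dividing by the sequence length keeps each value comparable across different choices of chars, at the cost of the values no longer summing to 1.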