skbio.sequence.DNA.gc_frequency

DNA.gc_frequency(relative=False)

Calculate frequency of G’s and C’s in the sequence.

State: Stable as of 0.4.0.

This calculates the minimum GC frequency: only the IUPAC characters G, C, and S (the degenerate code for G or C) are counted. Other degenerate characters that could represent G or C are not counted.

Parameters:

relative : bool, optional

If False, return the frequency of G, C, and S characters (i.e., the count). If True, return the relative frequency, i.e., the proportion of G, C, and S characters in the sequence. In this case, the sequence is degapped before the operation, so gap characters are not included when calculating the length of the sequence.

Returns:

int or float

Either frequency (count) or relative frequency (proportion), depending on relative.

See also

gc_content

Examples

>>> from skbio import DNA
>>> DNA('ACGT').gc_frequency()
2
>>> DNA('ACGT').gc_frequency(relative=True)
0.5
>>> DNA('ACGT--..').gc_frequency(relative=True)
0.5
>>> DNA('--..').gc_frequency(relative=True)
0

S means G or C, so it counts:

>>> DNA('ASST').gc_frequency()
2

Other degenerate characters don’t count, even those that could represent G or C:

>>> DNA('RYKMBDHVN').gc_frequency()
0
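
The counting rule described above can be sketched in plain Python. This is only an illustration of the documented behavior, not scikit-bio's actual implementation: the function name `gc_frequency_sketch` is hypothetical, and the gap characters `-` and `.` are an assumption taken from the examples on this page.

```python
# Characters counted as GC: definite G and C, plus degenerate S (G or C).
GC_CHARS = frozenset("GCS")
# Gap characters, assumed from the examples on this page.
GAP_CHARS = frozenset("-.")


def gc_frequency_sketch(seq, relative=False):
    """Hypothetical stand-in for the documented behavior of gc_frequency."""
    count = sum(1 for ch in seq if ch in GC_CHARS)
    if not relative:
        return count
    # Relative mode: gaps are excluded from the sequence length.
    degapped_len = sum(1 for ch in seq if ch not in GAP_CHARS)
    if degapped_len == 0:
        # An all-gap (or empty) sequence has no characters to divide by.
        return 0
    return count / degapped_len
```

Called on plain strings, this sketch reproduces the results shown in the examples above: `gc_frequency_sketch('ACGT--..', relative=True)` gives `0.5`, and `gc_frequency_sketch('RYKMBDHVN')` gives `0`.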