skbio.alignment.TabularMSA.gap_frequencies

TabularMSA.gap_frequencies(axis='sequence', relative=False)

Compute the frequency of gap characters across an axis.

State: Experimental as of 0.4.1.

Parameters:

axis : {‘sequence’, ‘position’}, optional

Axis to compute gap character frequencies across. If ‘sequence’ or 0, frequencies are computed for each position in the MSA. If ‘position’ or 1, frequencies are computed for each sequence.

relative : bool, optional

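If True, return the relative frequency of gap characters (the count divided by the number of sequences when axis='sequence', or by the number of positions when axis='position') instead of the count.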
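To make the axis convention concrete, the sketch below counts gaps in the example alignment from the Examples section using plain Python strings rather than scikit-bio. It assumes the gap characters are '-' and '.' (the default gap characters for scikit-bio's DNA), and the variable names are illustrative only. Collapsing the sequence axis sums down each column (one value per position); collapsing the position axis sums along each row (one value per sequence).

```python
# Illustrative only: the MSA as a list of equal-length strings.
# Assumes '-' and '.' are the gap characters (as for scikit-bio's DNA).
GAP_CHARS = {'-', '.'}
rows = ['ACG', 'A--', 'AC.', 'AG.']

# Gap indicator matrix: one row per sequence, one column per position.
is_gap = [[char in GAP_CHARS for char in row] for row in rows]

# axis='sequence' (0): collapse across sequences -> one count per position.
per_position = [sum(column) for column in zip(*is_gap)]

# axis='position' (1): collapse across positions -> one count per sequence.
per_sequence = [sum(row) for row in is_gap]

print(per_position)  # [0, 1, 3]
print(per_sequence)  # [0, 2, 1, 1]
```

These match the integer results shown in the Examples for the default axis and for axis='position'.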

Returns:

1D np.ndarray (int or float)

Vector of gap character frequencies across the specified axis. Will have int dtype if relative=False and float dtype if relative=True.
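The float dtype follows from dividing each integer count by the length of the collapsed axis (here, 4 sequences). A minimal plain-Python sketch that reproduces the relative values from the Examples, starting from the per-position counts; the variable names are illustrative, not scikit-bio API.

```python
# Per-position gap counts for the 4-sequence example MSA (axis='sequence').
counts = [0, 1, 3]
n_sequences = 4  # length of the collapsed (sequence) axis

# relative=True: each count becomes a fraction of the collapsed axis length,
# which is why the result switches from int to float.
relative = [count / n_sequences for count in counts]
print(relative)  # [0.0, 0.25, 0.75]
```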

Raises:

ValueError

If axis is not 'sequence', 0, 'position', or 1.
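The accepted axis values can be sketched as a small normalization step. This `normalize_axis` helper is hypothetical, not part of scikit-bio's public API, and its error message is illustrative; it only shows which inputs are valid and which raise ValueError.

```python
def normalize_axis(axis):
    """Map an axis label to 0 ('sequence') or 1 ('position').

    Hypothetical helper; scikit-bio's own validation and message may differ.
    """
    if axis in ('sequence', 0):
        return 0
    if axis in ('position', 1):
        return 1
    # Anything else is rejected, mirroring the documented ValueError.
    raise ValueError(
        f"axis must be 'sequence' (0) or 'position' (1), not {axis!r}")
```

For example, `normalize_axis('position')` returns 1, while `normalize_axis('column')` raises ValueError.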

Notes

If there are no positions in the MSA, axis='position', and relative=True, the relative frequency of gap characters in each sequence will be np.nan.
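The NaN arises because each per-sequence count (necessarily 0) is divided by a position count of 0. A plain-Python sketch of that edge case, assuming a hypothetical alignment of three empty sequences. Python's `/` raises ZeroDivisionError for 0 / 0, so the explicit check stands in for NumPy's float division, which yields nan.

```python
import math

# Hypothetical MSA with three sequences and no positions.
rows = ['', '', '']
n_positions = 0

# axis='position': one gap count per sequence (all zero here).
counts = [sum(char in '-.' for char in row) for row in rows]

# relative=True: 0 / 0 is undefined, so report NaN as NumPy would.
relative = [count / n_positions if n_positions else math.nan
            for count in counts]
print(relative)  # [nan, nan, nan]
```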

Examples

Compute the frequency of gap characters for each position in the MSA (i.e., across the sequence axis):

>>> from skbio import DNA, TabularMSA
>>> msa = TabularMSA([DNA('ACG'),
...                   DNA('A--'),
...                   DNA('AC.'),
...                   DNA('AG.')])
>>> msa.gap_frequencies()
array([0, 1, 3])

Compute relative frequencies across the same axis:

>>> msa.gap_frequencies(relative=True)
array([ 0.  ,  0.25,  0.75])

Compute the frequency of gap characters for each sequence (i.e., across the position axis):

>>> msa.gap_frequencies(axis='position')
array([0, 2, 1, 1])