skbio.alignment.TabularMSA.gap_frequencies

TabularMSA.gap_frequencies(axis='sequence', relative=False)

Compute frequency of gap characters across an axis.

State: Experimental as of 0.4.1.

Parameters
  • axis ({'sequence', 'position'}, optional) – Axis to compute gap character frequencies across. If 'sequence' or 0, frequencies are computed for each position in the MSA (one value per column). If 'position' or 1, frequencies are computed for each sequence (one value per row).

  • relative (bool, optional) – If True, return the relative frequency of gap characters (the count divided by the number of characters along the chosen axis) instead of the count.

Returns

Vector of gap character frequencies across the specified axis. Will have int dtype if relative=False and float dtype if relative=True.

Return type

1D np.ndarray (int or float)

Raises

ValueError – If axis is invalid.
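The axis semantics above can be easy to mix up, so here is a minimal pure-Python sketch of the same computation on plain strings. It is not scikit-bio's implementation: the function name gap_frequencies_sketch and the GAP_CHARS constant are hypothetical, the gap characters '-' and '.' are taken from the DNA examples on this page, and the sketch returns lists rather than NumPy arrays.

```python
GAP_CHARS = frozenset("-.")  # assumed gap characters, matching the DNA examples


def gap_frequencies_sketch(seqs, axis="sequence", relative=False):
    """Count gap characters in a list of equal-length strings (illustrative only)."""
    if axis in ("sequence", 0):
        # Walk down each column: one value per position in the alignment.
        groups = list(zip(*seqs))
    elif axis in ("position", 1):
        # Walk along each row: one value per sequence in the alignment.
        groups = list(seqs)
    else:
        raise ValueError(f"invalid axis: {axis!r}")

    counts = [sum(ch in GAP_CHARS for ch in group) for group in groups]
    if relative:
        # Assumes a non-empty alignment, so len(group) is never zero here.
        return [count / len(group) for count, group in zip(counts, groups)]
    return counts


seqs = ["ACG", "A--", "AC.", "AG."]
per_position = gap_frequencies_sketch(seqs)
per_sequence = gap_frequencies_sketch(seqs, axis="position")
per_position_relative = gap_frequencies_sketch(seqs, relative=True)
```

With the four sequences used in the examples on this page, per_position has one entry per column and per_sequence has one entry per row, which is the distinction the axis parameter controls.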

Notes

If there are no positions in the MSA, axis='position', and relative=True, the relative frequency of gap characters in each sequence will be np.nan.
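The edge case in this note is a 0 / 0 division: each sequence has zero gap characters and zero positions. The following sketch reproduces the documented result with plain strings, under the assumption that '-' and '.' are the gap characters. The helper relative_gap_frequency is hypothetical and only illustrates why the value is undefined and how downstream code might detect it.

```python
import math

GAP_CHARS = frozenset("-.")  # assumed gap characters, matching the DNA examples


def relative_gap_frequency(seq):
    """Relative gap frequency of one sequence given as a plain string."""
    n_gaps = sum(ch in GAP_CHARS for ch in seq)
    if len(seq) == 0:
        # Plain Python raises ZeroDivisionError for 0 / 0, so return nan
        # explicitly to mirror the documented result.
        return math.nan
    return n_gaps / len(seq)


empty_alignment = ["", ""]  # two sequences, zero positions
freqs = [relative_gap_frequency(seq) for seq in empty_alignment]

# nan never compares equal to itself, so test for it with math.isnan.
undefined = [math.isnan(freq) for freq in freqs]
```

Code that consumes relative gap frequencies may want a similar isnan check before filtering or averaging, since comparisons against nan always evaluate to False.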

Examples

Compute frequency of gap characters for each position in the MSA (i.e., across the sequence axis):

>>> from skbio import DNA, TabularMSA
>>> msa = TabularMSA([DNA('ACG'),
...                   DNA('A--'),
...                   DNA('AC.'),
...                   DNA('AG.')])
>>> msa.gap_frequencies()
array([0, 1, 3])

Compute relative frequencies across the same axis:

>>> msa.gap_frequencies(relative=True)
array([ 0.  ,  0.25,  0.75])

Compute frequency of gap characters for each sequence (i.e., across the position axis):

>>> msa.gap_frequencies(axis='position')
array([0, 2, 1, 1])