skbio.alignment.Alignment.position_entropies

Alignment.position_entropies(base=None, nan_on_non_standard_chars=True)

Return the Shannon entropy of each position in the Alignment.

Parameters:

base : float, optional

Log base for the entropy calculation. If not passed, the default is e (i.e., the natural log is used).

nan_on_non_standard_chars : bool, optional

If True, the entropy at any position containing a character outside of the first sequence’s iupac_standard_characters will be np.nan. This is the default behavior because it is not clear how a gap or degenerate character should contribute to a positional entropy. This issue is described in [R86].

Returns:

list

List of floats, one per Alignment position, giving the Shannon entropy at that position. Shannon entropy is defined in [R87].
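The calculation described above can be sketched in plain Python, without scikit-bio. This is a minimal illustration, not the library's implementation: the function name `position_entropies_sketch`, the `standard` parameter, and the `DNA_STANDARD` set are hypothetical stand-ins (the real method takes the allowed characters from the first sequence's iupac_standard_characters). It also assumes that, when `nan_on_non_standard_chars` is False, gaps and degenerate characters are simply counted like any other character.

```python
import math
from collections import Counter

# Hypothetical stand-in for the first sequence's iupac_standard_characters.
DNA_STANDARD = frozenset("ACGT")


def position_entropies_sketch(seqs, base=None,
                              nan_on_non_standard_chars=True,
                              standard=DNA_STANDARD):
    """Shannon entropy of each column of equal-length aligned strings."""
    # Natural log when no base is given, mirroring the documented default.
    log_base = math.e if base is None else base
    result = []
    for column in zip(*seqs):  # iterate over alignment positions
        if nan_on_non_standard_chars and any(c not in standard for c in column):
            # A gap or degenerate character makes the entropy undefined here.
            result.append(float("nan"))
            continue
        total = len(column)
        # Relative frequency of each character observed at this position.
        freqs = [n / total for n in Counter(column).values()]
        # H = -sum(p * log_b(p)), computed via a change of base.
        result.append(sum(-p * math.log(p) / math.log(log_base) for p in freqs))
    return result


entropies = position_entropies_sketch(["AC--", "AT-C", "TT-C"])
print(entropies)
```

For the three sequences used in the Examples section, the first two positions each hold one character twice and another once, so they share the same entropy, and the last two positions contain gaps and come out as nan.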

References

[R86] Hertz GZ, Stormo GD. Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics. 1999 Jul-Aug;15(7-8):563-77.
[R87] Shannon CE. A Mathematical Theory of Communication. The Bell System Technical Journal. 1948.

Examples

>>> from skbio.alignment import Alignment
>>> from skbio.sequence import DNA
>>> sequences = [DNA('AC--', id="seq1"),
...              DNA('AT-C', id="seq2"),
...              DNA('TT-C', id="seq3")]
>>> a1 = Alignment(sequences)
>>> print(a1.position_entropies())
[0.63651416829481278, 0.63651416829481278, nan, nan]
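The first value in this output can be checked by hand. At the first position the characters are A, A and T, so the observed frequencies are 2/3 and 1/3. The short check below uses only the standard library. The variable names are illustrative, and the conversion to bits assumes that passing `base=2` is equivalent to dividing the natural-log result by ln 2.

```python
import math

# Frequencies at the first position of the example alignment: A, A, T.
p = [2 / 3, 1 / 3]

# Shannon entropy in nats (natural log, the documented default base).
h_nats = -sum(x * math.log(x) for x in p)

# The same quantity in bits, via a change of base.
h_bits = h_nats / math.log(2)

print(round(h_nats, 6))  # 0.636514
print(round(h_bits, 6))  # 0.918296
```

The second position (C, T, T) has the same frequency profile, which is why the first two entries of the output are identical.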