skbio.core.sequence.DNASequence.gap_maps

DNASequence.gap_maps()

Return a tuple of two lists mapping between gapped and ungapped positions

Returns:

tuple containing two lists

The first list has the same length as the ungapped sequence, and each entry is the position of that base in the gapped sequence. The second list has the same length as the gapped sequence, and each entry is either None (if that position is a gap) or the position of that base in the ungapped sequence.

See also

gap_vector

Notes

A visual aid is useful here. Imagine we have BiologicalSequence('-ACCGA-TA-'). The position numbers in the ungapped sequence (top) and the gapped sequence (bottom) are as follows:

 0123456
 ACCGATA
 |||||\\
-ACCGA-TA-
0123456789

So, in the first list (ungapped to gapped), position 0 maps to position 1, position 1 maps to position 2, position 5 maps to position 7, and so on. In the second list (gapped to ungapped), position 0 doesn’t map to anything (so it’s None), position 1 maps to position 0, and so on.
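
The mapping described above can be reproduced in plain Python. The following is a minimal sketch, not scikit-bio's implementation: the helper name gap_maps_sketch is hypothetical, and treating '-' and '.' as the gap characters is an assumption made for illustration.

```python
def gap_maps_sketch(seq, gap_chars="-."):
    """Build (ungapped_to_gapped, gapped_to_ungapped) for a gapped string.

    Hypothetical helper for illustration only; not part of scikit-bio.
    The default gap characters are an assumption.
    """
    ungapped_to_gapped = []
    gapped_to_ungapped = []
    for gapped_pos, char in enumerate(seq):
        if char in gap_chars:
            # A gap has no counterpart in the ungapped sequence.
            gapped_to_ungapped.append(None)
        else:
            # The ungapped index equals the number of bases seen so far.
            gapped_to_ungapped.append(len(ungapped_to_gapped))
            ungapped_to_gapped.append(gapped_pos)
    return ungapped_to_gapped, gapped_to_ungapped


first, second = gap_maps_sketch("-ACCGA-TA-")
print(first)
print(second)
```

A single pass is enough because both lists grow together: every non-gap character appends one entry to each list, and every gap character appends only a None to the second.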

Examples

>>> from skbio.core.sequence import BiologicalSequence
>>> s = BiologicalSequence('-ACCGA-TA-')
>>> m = s.gap_maps()
>>> m[0]
[1, 2, 3, 4, 5, 7, 8]
>>> m[1]
[None, 0, 1, 2, 3, 4, None, 5, 6, None]
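
As a usage sketch, the two lists can translate coordinates in either direction. The values below are copied from the example output rather than computed, so the snippet runs without scikit-bio; all variable names are illustrative.

```python
gapped = "-ACCGA-TA-"
ungapped = "ACCGATA"

# Values copied from the documented output of gap_maps().
ungapped_to_gapped = [1, 2, 3, 4, 5, 7, 8]
gapped_to_ungapped = [None, 0, 1, 2, 3, 4, None, 5, 6, None]

for ungapped_pos, gapped_pos in enumerate(ungapped_to_gapped):
    # Each ungapped base sits at the gapped position named by the first list.
    assert ungapped[ungapped_pos] == gapped[gapped_pos]
    # Mapping forward and then back returns the starting position.
    assert gapped_to_ungapped[gapped_pos] == ungapped_pos

# Gap positions are the entries of the second list that are None.
gap_positions = [
    pos for pos, target in enumerate(gapped_to_ungapped) if target is None
]
print(gap_positions)
```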