GrammaredSequence.iter_contiguous(included, min_length=1, invert=False)
Yield contiguous subsequences based on included.
State: Stable as of 0.4.0.
Parameters:
    included : 1D array_like (bool) or iterable (slices or ints)
    min_length : int, optional
    invert : bool, optional
Yields:
    Sequence
Notes
If slices describe adjacent ranges, they are treated as a single contiguous subsequence.
Examples
Here we use iter_contiguous to find all of the contiguous ungapped sequences using a boolean vector derived from our DNA sequence.
>>> from skbio import DNA
>>> s = DNA('AAA--TT-CCCC-G-')
>>> no_gaps = ~s.gaps()
>>> for ungapped_subsequence in s.iter_contiguous(no_gaps,
...                                               min_length=2):
...     print(ungapped_subsequence)
AAA
TT
CCCC
Note how the last potential subsequence (G) was skipped because it is shorter than min_length, which was set to 2.
We can also use iter_contiguous with a generator of slices, such as the one produced by find_motifs (or find_with_regex).
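To make the role of the boolean vector and min_length concrete, here is a minimal pure-Python sketch of the same idea. It is not scikit-bio's implementation: contiguous_runs is a hypothetical helper written for illustration, it works on a plain str rather than a DNA object, and its handling of invert (flipping the mask so that it marks what to skip) is an assumption about the intended semantics.

```python
from itertools import groupby


def contiguous_runs(text, included, min_length=1, invert=False):
    """Yield runs of `text` where `included` is True (hypothetical helper)."""
    if invert:
        # Assumption: invert means the mask marks positions to skip.
        included = [not flag for flag in included]
    position = 0
    for flag, group in groupby(included):
        run_length = len(list(group))
        # Keep only runs that are included and long enough.
        if flag and run_length >= min_length:
            yield text[position:position + run_length]
        position += run_length


text = 'AAA--TT-CCCC-G-'
no_gaps = [char != '-' for char in text]

runs = list(contiguous_runs(text, no_gaps, min_length=2))
all_runs = list(contiguous_runs(text, no_gaps))
print(runs)
print(all_runs)
```

With min_length=2 the single-character run G is dropped, while the default min_length=1 keeps it.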
>>> from skbio import Protein
>>> s = Protein('ACDFNASANFTACGNPNRTESL')
>>> for subseq in s.iter_contiguous(s.find_motifs('N-glycosylation')):
...     print(subseq)
NASANFTA
NRTE
Note how the first subsequence contains two N-glycosylation sites. This happened because the two sites are adjacent, so they were merged into a single contiguous subsequence.
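The merging of adjacent slices can be sketched in plain Python. This is an illustration only, not the library's code: merge_adjacent is a hypothetical helper, and the slice positions are written by hand here on the assumption that they correspond to the three motif matches in the sequence above.

```python
def merge_adjacent(slices):
    """Merge slices whose ranges touch or overlap (hypothetical helper)."""
    merged = []
    for current in sorted(slices, key=lambda sl: sl.start):
        if merged and current.start <= merged[-1].stop:
            # Current slice starts where the previous one ends (or earlier):
            # extend the previous slice instead of starting a new one.
            previous = merged[-1]
            merged[-1] = slice(previous.start, max(previous.stop, current.stop))
        else:
            merged.append(slice(current.start, current.stop))
    return merged


text = 'ACDFNASANFTACGNPNRTESL'
# Assumed positions of the motif matches, written by hand for illustration.
sites = [slice(4, 8), slice(8, 12), slice(16, 20)]

merged = merge_adjacent(sites)
pieces = [text[sl] for sl in merged]
print(pieces)
```

The first two slices share the boundary at position 8, so they collapse into slice(4, 12), which is why a single subsequence covers both sites.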