# skbio.sequence.Protein.iter_contiguous

Protein.iter_contiguous(included, min_length=1, invert=False)

Yield contiguous subsequences based on included.

State: Stable as of 0.4.0.

Parameters
• included (1D array_like (bool) or iterable (slices or ints)) – Positions to include. included is converted into a flat boolean vector with one entry per sequence position, marking each position as either included or skipped. Each run of contiguous included positions is yielded as a single region.

• min_length (int, optional) – The minimum length of a subsequence for it to be yielded. Default is 1.

• invert (bool, optional) – Whether to invert included such that it describes what should be skipped instead of included. Default is False.
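The way the three parameters interact can be sketched in plain Python. This is a minimal illustration and not scikit-bio's implementation: the helper name contiguous_runs is hypothetical, and it works on a plain list of booleans and returns index pairs rather than sequence objects.

```python
def contiguous_runs(included, min_length=1, invert=False):
    """Yield (start, stop) index pairs for each run of included positions.

    Hypothetical helper for illustration only; not part of scikit-bio.
    """
    # Flip every flag when invert is True, so the mask describes
    # the positions to skip instead of the positions to keep.
    mask = [bool(flag) != invert for flag in included]
    start = None
    for i, keep in enumerate(mask):
        if keep and start is None:
            start = i  # a new run begins here
        elif not keep and start is not None:
            if i - start >= min_length:
                yield start, i  # the run ended just before position i
            start = None
    # A run that reaches the end of the mask still has to be reported.
    if start is not None and len(mask) - start >= min_length:
        yield start, len(mask)


seq = "AAA--TT-CCCC-G-"
not_gap = [ch != "-" for ch in seq]

# Ungapped regions of length 2 or more: the single G is dropped.
regions = [seq[a:b] for a, b in contiguous_runs(not_gap, min_length=2)]

# With invert=True the same mask selects the gap runs instead.
gaps = [seq[a:b] for a, b in contiguous_runs(not_gap, invert=True)]
```

Here regions holds the three ungapped stretches of length 2 or more, and gaps holds the four runs of gap characters.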

Yields

Sequence – Contiguous subsequence as indicated by included.

Notes

If slices cover adjacent ranges, those ranges are merged and yielded as a single contiguous subsequence.
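The merging of adjacent slices follows from flattening them into one boolean vector before looking for runs. The sketch below assumes that model; the helper names to_mask and runs are hypothetical and are not scikit-bio functions.

```python
from itertools import groupby


def to_mask(included, length):
    """Flatten slices and ints into a boolean list (hypothetical helper)."""
    mask = [False] * length
    for item in included:
        if isinstance(item, slice):
            # slice.indices resolves None and negative bounds for this length
            indices = range(*item.indices(length))
        else:
            indices = [item]
        for i in indices:
            mask[i] = True
    return mask


def runs(mask):
    """Return (start, stop) pairs for each maximal run of True values."""
    out, pos = [], 0
    for flag, group in groupby(mask):
        n = len(list(group))
        if flag:
            out.append((pos, pos + n))
        pos += n
    return out


seq = "ACGTACGTAC"

# slice(0, 3) and slice(3, 6) touch, so the mask has no False between them
# and they come out as one run.
adjacent = runs(to_mask([slice(0, 3), slice(3, 6)], len(seq)))

# Leaving position 3 uncovered keeps the two ranges apart.
separated = runs(to_mask([slice(0, 3), slice(4, 6)], len(seq)))
```

Once the slices are written into the mask, the boundary between them is no longer visible, which is why two touching slices cannot be told apart from one longer slice.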

Examples

Here we use iter_contiguous to find all of the contiguous ungapped sequences using a boolean vector derived from our DNA sequence.

>>> from skbio import DNA
>>> s = DNA('AAA--TT-CCCC-G-')
>>> no_gaps = ~s.gaps()
>>> for ungapped_subsequence in s.iter_contiguous(no_gaps,
...                                               min_length=2):
...     print(ungapped_subsequence)
AAA
TT
CCCC


Note how the last potential subsequence, G, was skipped because its length (1) is smaller than our min_length, which was set to 2.

We can also use iter_contiguous on a generator of slices, such as the one produced by find_motifs (or find_with_regex).

>>> from skbio import Protein
>>> s = Protein('ACDFNASANFTACGNPNRTESL')
>>> for subseq in s.iter_contiguous(s.find_motifs('N-glycosylation')):
...     print(subseq)
NASANFTA
NRTE


Note how the first subsequence contains two N-glycosylation sites. The two matches were contiguous, so they were yielded as a single subsequence.
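The same result can be reproduced with the standard library alone, which makes the merging step visible. This is a sketch under assumptions: the pattern is the widely cited PROSITE-style motif N-{P}-[ST]-{P}, and whether scikit-bio's 'N-glycosylation' motif uses exactly this expression is not confirmed here. The helper merged_spans is hypothetical.

```python
import re

# Assumed pattern: PROSITE-style N-{P}-[ST]-{P}. It may differ from the
# expression scikit-bio uses internally for 'N-glycosylation'.
MOTIF = re.compile(r"N[^P][ST][^P]")


def merged_spans(seq, pattern):
    """Return the matched regions of seq, merging spans that touch."""
    merged = []
    for match in pattern.finditer(seq):
        start, stop = match.span()
        if merged and start <= merged[-1][1]:
            # This match begins where the previous region ends: extend it.
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    return [seq[a:b] for a, b in merged]


hits = merged_spans("ACDFNASANFTACGNPNRTESL", MOTIF)
for hit in hits:
    print(hit)
```

The matches NASA (positions 4 to 8) and NFTA (positions 8 to 12) share a boundary, so they merge into NASANFTA, while NRTE stays separate.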