skbio.parse.sequences.SequenceIterator

class skbio.parse.sequences.SequenceIterator(seq, qual=None, transform=None, valid_id=True, valid_length=True, **kwargs)

Provide a standard API for interacting with sequence data

Provide a common interface for iterating over sequence data, including support for quality scores and transforms.

A transform is a function that takes the state dict and modifies it in place. For instance, to reverse sequences, you could pass the following function as transform:

    def reverse(st):
        st['Sequence'] = st['Sequence'][::-1]
        st['Qual'] = st['Qual'][::-1] if st['Qual'] is not None else None

The primary intention is to support reverse complementing of sequences.
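Since reverse complementing is the stated purpose of transforms, here is a minimal sketch of such a function. The COMPLEMENT table and the reverse_complement name are illustrative assumptions and not part of skbio. The example handles only unambiguous uppercase DNA bases and uses a plain list in place of an np.array for Qual.

```python
# Hypothetical complement table (an assumption for this sketch);
# it covers only the four unambiguous uppercase DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(st):
    """Reverse complement st['Sequence'] and reverse st['Qual'] in place."""
    st["Sequence"] = "".join(COMPLEMENT[base] for base in reversed(st["Sequence"]))
    if st["Qual"] is not None:
        # Quality scores follow their bases, so they are only reversed.
        st["Qual"] = st["Qual"][::-1]


# A stand-in for the state dict that an iterator would pass to the transform.
state = {
    "SequenceID": "seq1",
    "Sequence": "AACG",
    "QualID": "seq1",
    "Qual": [10, 20, 30, 40],  # a plain list stands in for an np.array here
}
reverse_complement(state)
```

The function returns nothing because the transform contract described above is to modify the state dict in place.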

All subclasses of this object are expected to update the following in state:

SequenceID : str, the sequence identifier
Sequence : str, the sequence itself
QualID : str or None, the quality ID (for completeness)
Qual : np.array or None, the quality scores

state is preallocated once to avoid repeated allocations, so the object being yielded is updated in place. If an individual record needs to be tracked over time, make a copy of the yielded data.

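WARNING: The yielded object is not safe for use with Python 2.7’s built-in zip function, because the state is updated in place. zip in Python 2.7 builds its whole result list eagerly, so every entry ends up referring to the same state.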
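The consequence of in-place updates can be illustrated with a toy generator. The toy_iterator function below is a hypothetical stand-in that mimics the preallocated state behaviour described above. It is not skbio code.

```python
def toy_iterator(records):
    """Yield one preallocated dict that is overwritten for each record.

    Hypothetical stand-in for an iterator that reuses its state.
    """
    state = {"SequenceID": None, "Sequence": None, "QualID": None, "Qual": None}
    for seq_id, seq in records:
        state["SequenceID"] = seq_id
        state["Sequence"] = seq
        yield state


records = [("a", "AAAA"), ("b", "CCCC")]

# Keeping the yielded object itself: every entry is the same dict,
# so all entries show the values of the last record.
aliased = list(toy_iterator(records))

# Copying each record as it is yielded preserves the individual values.
copied = [dict(rec) for rec in toy_iterator(records)]
```

A shallow copy with dict() is enough in this sketch because the toy iterator rebinds each value. If an iterator were to mutate a quality array in place, copy.deepcopy would be the safer choice.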

Parameters:

seq : list of open file-like objects

qual : list of open file-like objects or None

transform : function or None

If provided, this function will be passed state

valid_id : bool

If true, verify sequence and qual IDs are identical (if relevant)

valid_length : bool

If true, verify the length of the sequence and qual are the same (if relevant)
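As a rough illustration of what the valid_id and valid_length checks describe, the sketch below expresses both conditions against a state dict. The helper names ids_match and lengths_match are hypothetical, and this is not the library's implementation. The "if relevant" qualifier is assumed to mean that the checks only apply when quality data is present.

```python
def ids_match(state):
    """Return True if there is no quality ID or it equals the sequence ID."""
    return state["QualID"] is None or state["SequenceID"] == state["QualID"]


def lengths_match(state):
    """Return True if there are no quality scores or there is one score per base."""
    return state["Qual"] is None or len(state["Sequence"]) == len(state["Qual"])


# Consistent record: same IDs, one quality score per base.
good = {"SequenceID": "s1", "Sequence": "ACGT", "QualID": "s1", "Qual": [30, 30, 30, 30]}

# Inconsistent record: IDs differ and a quality score is missing.
bad = {"SequenceID": "s1", "Sequence": "ACGT", "QualID": "s2", "Qual": [30, 30, 30]}

# Record without quality data: both checks are treated as not relevant.
no_qual = {"SequenceID": "s1", "Sequence": "ACGT", "QualID": None, "Qual": None}
```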

Attributes

seq  
qual  
state  
options  

Methods

initialize_state(item) Do nothing here as the subclassed iterators update state directly
transform(dec_self) Transform state if necessary
valid_lengths(dec_self)
validate_ids(dec_self)