skbio.alignment.Alignment.update_ids

Alignment.update_ids(ids=None, fn=None, prefix='')

Update sequence IDs on the sequence collection.

IDs can be updated by providing a sequence of new IDs (ids) or a function that maps current IDs to new IDs (fn).

If neither ids nor fn is provided, the default behavior is to create new IDs that are unique positive integers (starting at 1) cast as strings, optionally preceded by prefix. For example, with an empty prefix the new IDs are ('1', '2', '3', ...).
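The default naming scheme described above can be sketched in plain Python. This is only an illustration of the documented behavior, not the library's implementation; the helper name default_ids and the 'seq_' prefix are hypothetical.

```python
def default_ids(n, prefix=''):
    """Return n integer-based IDs as strings, starting at 1.

    Hypothetical helper that mirrors the documented default behavior.
    """
    # Integers 1..n are cast to strings, each preceded by the optional prefix.
    return ['%s%d' % (prefix, i) for i in range(1, n + 1)]


print(default_ids(3))                 # ['1', '2', '3']
print(default_ids(3, prefix='seq_'))  # ['seq_1', 'seq_2', 'seq_3']
```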

Parameters:

ids : sequence of str, optional

New IDs to update on the sequence collection.

fn : function, optional

Function accepting a sequence of current IDs and returning a sequence of new IDs to update on the sequence collection.

prefix : str, optional

If ids and fn are both None, prefix is prepended to each new integer-based ID (see description of default behavior above).

Returns:

SequenceCollection

New SequenceCollection (or subclass) containing sequences with updated IDs.

dict

Mapping of new IDs to old IDs.

Raises:

SequenceCollectionError

If both ids and fn are provided, if prefix is provided together with either ids or fn, or if the number of new IDs does not match the number of sequences in the sequence collection.
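The three error conditions can be summarized as a small standalone check. This is a sketch under stated assumptions: check_update_args is a hypothetical function, and it raises ValueError as a stand-in for SequenceCollectionError so that it runs without scikit-bio installed.

```python
def check_update_args(n_sequences, ids=None, fn=None, prefix=''):
    """Reject the argument combinations listed under Raises.

    Hypothetical stand-in: ValueError is used here in place of
    SequenceCollectionError.
    """
    # ids and fn are mutually exclusive ways of supplying new IDs.
    if ids is not None and fn is not None:
        raise ValueError('ids and fn cannot both be provided.')
    # prefix only applies to the default integer-based IDs.
    if prefix and (ids is not None or fn is not None):
        raise ValueError('prefix cannot be combined with ids or fn.')
    # One new ID is needed per sequence. When fn is used, the same count
    # check would apply to the sequence of IDs that fn returns.
    if ids is not None and len(ids) != n_sequences:
        raise ValueError('expected %d new IDs, got %d.'
                         % (n_sequences, len(ids)))


try:
    check_update_args(2, ids=['ghi', 'jkl'], prefix='seq_')
except ValueError as error:
    print(error)  # prefix cannot be combined with ids or fn.
```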

Notes

The default behavior can be useful when writing sequences out for use with programs that are picky about their sequence IDs (e.g., RAxML [R88]).
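The returned mapping of new IDs to old IDs makes it possible to restore the original IDs after an external program has run. The following is a minimal sketch in plain Python: the ID values, the restore_ids helper, and the order of reported_ids are all hypothetical, and the mapping is built by hand to resemble the dict returned by update_ids.

```python
# Original IDs that a strict external program might reject.
old_ids = ['sample A (rep 1)', 'sample B (rep 2)']

# Integer-based replacements with a prefix, in the style of the default IDs.
new_ids = ['seq%d' % i for i in range(1, len(old_ids) + 1)]

# Same shape as the dict returned by update_ids: new ID -> old ID.
new_to_old_ids = dict(zip(new_ids, old_ids))


def restore_ids(reported_ids, new_to_old_ids):
    """Translate IDs reported by an external program back to the originals."""
    return [new_to_old_ids[id_] for id_ in reported_ids]


# Hypothetical order in which an external program reports the sequences.
reported_ids = ['seq2', 'seq1']
print(restore_ids(reported_ids, new_to_old_ids))
# ['sample B (rep 2)', 'sample A (rep 1)']
```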

References

[R88] “RAxML Version 8: A tool for Phylogenetic Analysis and Post-Analysis of Large Phylogenies”. In Bioinformatics, 2014.

Examples

Define a sequence collection containing two sequences with IDs “abc” and “def”:

>>> from skbio import DNA, SequenceCollection
>>> sequences = [DNA('A--CCGT.', id="abc"),
...              DNA('.AACCG-GT.', id="def")]
>>> s1 = SequenceCollection(sequences)
>>> s1.ids()
['abc', 'def']

Update the IDs in the sequence collection, obtaining a new sequence collection with IDs that are integer-based:

>>> s2, new_to_old_ids = s1.update_ids()
>>> s2.ids()
['1', '2']

Alternatively, we can specify a function to map the current IDs to new IDs. Let’s define a function that appends '-new' to each ID:

>>> def id_mapper(ids):
...     return [id_ + '-new' for id_ in ids]
>>> s3, new_to_old_ids = s1.update_ids(fn=id_mapper)
>>> s3.ids()
['abc-new', 'def-new']

We can also directly update the IDs with a new sequence of IDs:

>>> s4, new_to_old_ids = s1.update_ids(ids=['ghi', 'jkl'])
>>> s4.ids()
['ghi', 'jkl']