skbio.alignment.TabularMSA.extend

TabularMSA.extend(sequences, minter=None, index=None, reset_index=False)

Extend this MSA with sequences without recomputing the alignment.

State: Experimental as of 0.4.1.

Parameters
  • sequences (iterable of GrammaredSequence) – Sequences to be appended. Must match the dtype of the MSA and the number of positions in the MSA.

  • minter (callable or metadata key, optional) – Used to create index labels for the sequences being appended. If callable, it is called with each sequence and returns that sequence's label. Otherwise, it is treated as a key into each sequence's metadata. Note that minter cannot be combined with index or reset_index.

  • index (pd.Index consumable, optional) – Index labels to use for the appended sequences. Must be the same length as sequences and must be accepted directly by the pd.Index constructor. Note that index cannot be combined with minter or reset_index.

  • reset_index (bool, optional) – If True, this MSA’s index is reset to the TabularMSA constructor’s default after extending. Note that reset_index cannot be combined with minter or index.
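To make the two kinds of minter concrete, here is a minimal, stdlib-only sketch of how labels could be derived from them. It is not scikit-bio's code: `ToySequence` and `mint_labels` are hypothetical stand-ins, and the sketch only assumes that a sequence exposes a `metadata` dict, as skbio sequences do.

```python
from typing import Any, Callable, Dict, Hashable, Iterable, List, Optional, Union


class ToySequence:
    """Hypothetical stand-in for a sequence: characters plus a metadata dict."""

    def __init__(self, chars: str, metadata: Optional[Dict[str, Any]] = None):
        self.chars = chars
        self.metadata = dict(metadata or {})


def mint_labels(
    sequences: Iterable[ToySequence],
    minter: Union[Callable[[ToySequence], Hashable], Hashable],
) -> List[Hashable]:
    """Return one index label per sequence (hypothetical helper)."""
    if callable(minter):
        # A callable is applied to each sequence and returns its label.
        return [minter(seq) for seq in sequences]
    # Anything else is treated as a key into each sequence's metadata.
    return [seq.metadata[minter] for seq in sequences]


seqs = [ToySequence('AG-T', {'id': 'seq2'}), ToySequence('-G-T', {'id': 'seq3'})]
print(mint_labels(seqs, 'id'))                                # ['seq2', 'seq3']
print(mint_labels(seqs, lambda s: s.chars.replace('-', '')))  # ['AGT', 'GT']
```

With real skbio objects, the metadata-key form would correspond to a call such as msa.extend(seqs, minter='id'), assuming each appended sequence carries an 'id' entry in its metadata.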

Raises
  • ValueError – If not exactly one of minter, index, or reset_index is provided.

  • ValueError – If index is not the same length as sequences.

  • TypeError – If sequences contains an object that isn’t a GrammaredSequence.

  • TypeError – If sequences contains a sequence whose type does not match the dtype of the MSA.

  • ValueError – If the length of a sequence does not match the number of positions in the MSA.
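The checks listed under Raises can be modeled in plain Python as follows. This is an illustrative sketch under stated assumptions, not scikit-bio's implementation: `ToyGrammaredSequence`, `ToyDNA`, `ToyRNA`, and `check_extend_args` are hypothetical, str subclasses stand in for sequence objects, and the order in which the checks run is a guess.

```python
from typing import Hashable, Iterable, List, Optional, Sequence


# Hypothetical stand-ins; these are not scikit-bio classes.
class ToyGrammaredSequence(str):
    """Stand-in for skbio's GrammaredSequence."""


class ToyDNA(ToyGrammaredSequence):
    """Stand-in for the DNA dtype."""


class ToyRNA(ToyGrammaredSequence):
    """Stand-in for the RNA dtype."""


def check_extend_args(
    sequences: Iterable[object],
    dtype: type,
    n_positions: int,
    minter: Optional[object] = None,
    index: Optional[Sequence[Hashable]] = None,
    reset_index: bool = False,
) -> List[object]:
    """Validate extend()-style arguments as described under Raises (sketch)."""
    # Exactly one labeling strategy must be chosen.
    n_chosen = sum([minter is not None, index is not None, bool(reset_index)])
    if n_chosen != 1:
        raise ValueError("Provide exactly one of minter, index, or reset_index.")
    sequences = list(sequences)
    if index is not None and len(index) != len(sequences):
        raise ValueError("index must be the same length as sequences.")
    for seq in sequences:
        if not isinstance(seq, ToyGrammaredSequence):
            raise TypeError(f"{seq!r} is not a grammared sequence.")
        if type(seq) is not dtype:
            raise TypeError(f"{type(seq).__name__} does not match dtype {dtype.__name__}.")
        if len(seq) != n_positions:
            raise ValueError(f"Length {len(seq)} does not match {n_positions} positions.")
    return sequences


# Valid call: two DNA-like rows of length 4, labeled via index.
ok = check_extend_args([ToyDNA('AG-T'), ToyDNA('-G-T')], ToyDNA, 4, index=['seq2', 'seq3'])

# Invalid calls: wrong dtype, then wrong length.
errors = []
for bad in ([ToyRNA('ACGU')], [ToyDNA('ACG')]):
    try:
        check_extend_args(bad, ToyDNA, 4, reset_index=True)
    except (TypeError, ValueError) as err:
        errors.append(type(err).__name__)
print(errors)  # ['TypeError', 'ValueError']
```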

Notes

The MSA is not automatically realigned when sequences are appended. The result is therefore only meaningful if the appended sequences are already aligned to the existing ones, or if the MSA is realigned afterward.
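The following stdlib-only sketch illustrates this caveat. `naive_extend` and `columns` are hypothetical helpers that model an MSA as a list of equal-length strings. Like extend, the sketch checks only the row length and never realigns, so a shifted sequence is accepted even though its columns no longer correspond to the existing rows.

```python
from typing import List, Sequence


def columns(rows: Sequence[str]) -> List[str]:
    """Transpose equal-length rows into one string per alignment position."""
    return [''.join(col) for col in zip(*rows)]


def naive_extend(rows: Sequence[str], new_rows: Sequence[str]) -> List[str]:
    """Hypothetical sketch: append rows as-is, without any realignment."""
    width = len(rows[0])
    for row in new_rows:
        # Only the number of positions is checked, not the alignment itself.
        if len(row) != width:
            raise ValueError("row length must match the number of positions")
    return list(rows) + list(new_rows)


rows = ['ACGT', 'AG-T']
print(columns(rows))      # ['AA', 'CG', 'G-', 'TT']

# 'CGTA' is 'ACGT' rotated by one position. It has the right length, so it
# is accepted, but its characters now sit in the wrong columns.
extended = naive_extend(rows, ['CGTA'])
print(columns(extended))  # ['AAC', 'CGG', 'G-T', 'TTA']
```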

Examples

Create an MSA with a single sequence labeled 'seq1':

>>> from skbio import DNA, TabularMSA
>>> msa = TabularMSA([DNA('ACGT')], index=['seq1'])
>>> msa
TabularMSA[DNA]
---------------------
Stats:
    sequence count: 1
    position count: 4
---------------------
ACGT
>>> msa.index
Index(['seq1'], dtype='object')

Extend the MSA with sequences, providing their index labels via index:

>>> msa.extend([DNA('AG-T'), DNA('-G-T')], index=['seq2', 'seq3'])
>>> msa
TabularMSA[DNA]
---------------------
Stats:
    sequence count: 3
    position count: 4
---------------------
ACGT
AG-T
-G-T
>>> msa.index
Index(['seq1', 'seq2', 'seq3'], dtype='object')

Extend with more sequences, this time resetting the MSA’s index labels to the default with reset_index. Because the MSA’s index is reset, no labels are given for the new sequences (reset_index cannot be combined with index or minter):

>>> msa.extend([DNA('ACGA'), DNA('AC-T'), DNA('----')],
...            reset_index=True)
>>> msa
TabularMSA[DNA]
---------------------
Stats:
    sequence count: 6
    position count: 4
---------------------
ACGT
AG-T
...
AC-T
----
>>> msa.index
RangeIndex(start=0, stop=6, step=1)
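The index bookkeeping in the two examples above can be mirrored with plain lists. This is a simplified sketch, not scikit-bio's code: `extend_labels` is a hypothetical helper, lists stand in for pandas Index and RangeIndex objects, and minter handling and the mutual-exclusion check are omitted.

```python
from typing import Hashable, List, Optional, Sequence


def extend_labels(
    labels: Sequence[Hashable],
    n_new: int,
    index: Optional[Sequence[Hashable]] = None,
    reset_index: bool = False,
) -> List[Hashable]:
    """Hypothetical sketch of how row labels change after extending."""
    if reset_index:
        # Default labels are positional, 0 .. n-1, like a pandas RangeIndex.
        return list(range(len(labels) + n_new))
    if index is None or len(index) != n_new:
        raise ValueError("index must supply exactly one label per new sequence.")
    # Otherwise the new labels are appended after the existing ones.
    return list(labels) + list(index)


after_index = extend_labels(['seq1'], 2, index=['seq2', 'seq3'])
print(after_index)  # ['seq1', 'seq2', 'seq3']

after_reset = extend_labels(after_index, 3, reset_index=True)
print(after_reset)  # [0, 1, 2, 3, 4, 5]
```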