skbio.alignment.TabularMSA.iloc

TabularMSA.iloc

Slice the MSA on either axis by index position.

State: Experimental as of 0.4.1.

This will return an object with the following interface:

msa.iloc[seq_idx]
msa.iloc[seq_idx, pos_idx]
msa.iloc(axis='sequence')[seq_idx]
msa.iloc(axis='position')[pos_idx]

Parameters:

seq_idx : int, slice, iterable (int and slice), 1D array_like (bool)

Slice the first axis of the MSA. When this value is a scalar, a sequence of msa.dtype will be returned. This may be further sliced by pos_idx.

pos_idx : (same as seq_idx), optional

Slice the second axis of the MSA. When this value is a scalar, a sequence of type skbio.sequence.Sequence will be returned. This represents a column of the MSA and may have been additionally sliced by seq_idx.

axis : {'sequence', 'position', 0, 1, None}, optional

Limit the axis to slice on. When set, a tuple as the argument will no longer be split into seq_idx and pos_idx.

Returns:

TabularMSA, GrammaredSequence, Sequence

A TabularMSA is returned when both seq_idx and pos_idx are non-scalar. A GrammaredSequence of type msa.dtype is returned when seq_idx is a scalar, whether or not pos_idx is given. A Sequence is returned when seq_idx is non-scalar and pos_idx is a scalar.
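
The return-type rule above can be restated as a small decision function. The sketch below is a plain-Python illustration only: is_scalar and result_kind are hypothetical helpers, not part of scikit-bio, and they model nothing beyond the rule described here. An omitted pos_idx is assumed to behave like a full slice.

```python
def is_scalar(idx):
    """Treat a plain integer as a scalar index; slices, lists and ... are not."""
    return isinstance(idx, int) and not isinstance(idx, bool)


def result_kind(seq_idx, pos_idx=slice(None)):
    """Name the kind of object the documented rule predicts (illustrative only)."""
    if is_scalar(seq_idx):
        # A scalar on the first axis selects one sequence of the MSA's dtype,
        # which pos_idx may then slice further.
        return "GrammaredSequence (msa.dtype)"
    if is_scalar(pos_idx):
        # Non-scalar rows with a scalar column select one column.
        return "Sequence"
    # Both axes non-scalar: the result is still an alignment.
    return "TabularMSA"


print(result_kind(1))               # like msa.iloc[1]
print(result_kind(..., 1))          # like msa.iloc[..., 1]
print(result_kind([0, 2]))          # like msa.iloc[[0, 2]]
print(result_kind(0, 0))            # like msa.iloc[0, 0]
```

Each call mirrors one of the indexing forms shown in the Examples section, so the predicted kind can be compared against the output printed there.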

See also

__getitem__, loc

Notes

If the slice operation results in a TabularMSA without any sequences, the MSA’s positional_metadata will be unset.

Examples

First we need to set up an MSA to slice:

>>> from skbio import TabularMSA, DNA
>>> msa = TabularMSA([DNA("ACGT"), DNA("A-GT"), DNA("AC-T"),
...                   DNA("ACGA")])
>>> msa
TabularMSA[DNA]
---------------------
Stats:
    sequence count: 4
    position count: 4
---------------------
ACGT
A-GT
AC-T
ACGA

When we slice by a scalar we get the original sequence back out of the MSA:

>>> msa.iloc[1]
DNA
--------------------------
Stats:
    length: 4
    has gaps: True
    has degenerates: False
    has definites: True
    GC-content: 33.33%
--------------------------
0 A-GT

Similarly, when we slice the second axis by a scalar we get a column of the MSA:

>>> msa.iloc[..., 1]
Sequence
-------------
Stats:
    length: 4
-------------
0 C-CC

Note: we return an skbio.Sequence object because the column of an alignment has no biological meaning and many operations defined for the MSA’s sequence dtype would be meaningless.

When we slice both axes by a scalar, operations are applied left to right:

>>> msa.iloc[0, 0]
DNA
--------------------------
Stats:
    length: 1
    has gaps: False
    has degenerates: False
    has definites: True
    GC-content: 0.00%
--------------------------
0 A

In other words, it exactly matches slicing the resulting sequence object directly:

>>> msa.iloc[0][0]
DNA
--------------------------
Stats:
    length: 1
    has gaps: False
    has degenerates: False
    has definites: True
    GC-content: 0.00%
--------------------------
0 A

When our slice is non-scalar we get back an MSA of the same dtype:

>>> msa.iloc[[0, 2]]
TabularMSA[DNA]
---------------------
Stats:
    sequence count: 2
    position count: 4
---------------------
ACGT
AC-T

We can similarly slice out a column of that:

>>> msa.iloc[[0, 2], 2]
Sequence
-------------
Stats:
    length: 2
-------------
0 G-

Slice syntax works as well:

>>> msa.iloc[:3]
TabularMSA[DNA]
---------------------
Stats:
    sequence count: 3
    position count: 4
---------------------
ACGT
A-GT
AC-T

We can also use boolean vectors:

>>> msa.iloc[[True, False, False, True], 2:3]
TabularMSA[DNA]
---------------------
Stats:
    sequence count: 2
    position count: 1
---------------------
G
G

Here we sliced the first axis by a boolean vector, but then restricted the columns to a single column. Because the second axis was given a non-scalar, we still receive an MSA even though only one column is present.