skbio.alignment.TabularMSA

class skbio.alignment.TabularMSA(sequences, metadata=None, positional_metadata=None, minter=None, index=None)

Store a multiple sequence alignment in tabular (row/column) form.

Parameters:

sequences : iterable of GrammaredSequence, TabularMSA

Aligned sequences in the MSA. Sequences must all be the same type and length. For example, sequences could be an iterable of DNA, RNA, or Protein sequences. If sequences is a TabularMSA, its metadata, positional_metadata, and index will be used unless overridden by parameters metadata, positional_metadata, and minter/index, respectively.

metadata : dict, optional

Arbitrary metadata which applies to the entire MSA. A shallow copy of the dict will be made.

positional_metadata : pd.DataFrame consumable, optional

Arbitrary metadata which applies to each position in the MSA. Must be able to be passed directly to pd.DataFrame constructor. Each column of metadata must be the same length as the number of positions in the MSA. A shallow copy of the positional metadata will be made.

minter : callable or metadata key, optional

If provided, defines an index label for each sequence in sequences. Can be either a callable accepting a single argument (each sequence) or a key into each sequence’s metadata attribute. Note that minter cannot be combined with index.

index : pd.Index consumable, optional

Index containing labels for sequences. Must be the same length as sequences. Must be able to be passed directly to pd.Index constructor. Note that index cannot be combined with minter.

Raises:

ValueError

If minter and index are both provided.

ValueError

If index is not the same length as sequences.

TypeError

If sequences contains an object that isn’t a GrammaredSequence.

TypeError

If sequences does not contain exactly the same type of GrammaredSequence objects.

ValueError

If sequences does not contain GrammaredSequence objects of the same length.

Notes

If neither minter nor index is provided, default index labels will be used: pd.RangeIndex(start=0, stop=len(sequences), step=1).

Examples

Create a TabularMSA object with three DNA sequences and four positions:

>>> from skbio import DNA, TabularMSA
>>> seqs = [
...     DNA('ACGT'),
...     DNA('AG-T'),
...     DNA('-C-T')
... ]
>>> msa = TabularMSA(seqs)
>>> msa
TabularMSA[DNA]
---------------------
Stats:
    sequence count: 3
    position count: 4
---------------------
ACGT
AG-T
-C-T

Since neither minter nor index was provided, the MSA has default index labels:

>>> msa.index
RangeIndex(start=0, stop=3, step=1)

Create an MSA with metadata, positional metadata, and non-default index labels:

>>> msa = TabularMSA(seqs, index=['seq1', 'seq2', 'seq3'],
...                  metadata={'id': 'msa-id'},
...                  positional_metadata={'prob': [3, 4, 2, 2]})
>>> msa
TabularMSA[DNA]
--------------------------
Metadata:
    'id': 'msa-id'
Positional metadata:
    'prob': <dtype: int64>
Stats:
    sequence count: 3
    position count: 4
--------------------------
ACGT
AG-T
-C-T
>>> msa.index
Index(['seq1', 'seq2', 'seq3'], dtype='object')

Attributes

dtype                Data type of the stored sequences.
iloc                 Slice the MSA on either axis by index position.
index                Index containing labels along the sequence axis.
loc                  Slice the MSA on the first axis by index label and on the second axis by position.
metadata             dict containing metadata which applies to the entire object.
positional_metadata  pd.DataFrame containing metadata for each position in the MSA.
shape                Number of sequences (rows) and positions (columns).

Methods

bool(msa)                                          Boolean indicating whether the MSA is empty or not.
x in msa                                           Determine if an index label is in this MSA.
copy.copy(msa)                                     Return a shallow copy of this MSA.
copy.deepcopy(msa)                                 Return a deep copy of this MSA.
msa1 == msa2                                       Determine if this MSA is equal to another.
msa[x]                                             Slice the MSA on either axis.
iter(msa)                                          Iterate over sequences in the MSA.
len(msa)                                           Number of sequences in the MSA.
msa1 != msa2                                       Determine if this MSA is not equal to another.
reversed(msa)                                      Iterate in reverse order over sequences in the MSA.
str(msa)                                           String summary of this MSA.
append(sequence[, minter, index, reset_index])     Append a sequence to the MSA without recomputing alignment.
consensus()                                        Compute the majority consensus sequence for this MSA.
conservation([metric, degenerate_mode, gap_mode])  Apply metric to compute conservation for all alignment positions.
extend(sequences[, minter, index, reset_index])    Extend this MSA with sequences without recomputing alignment.
from_dict(dictionary)                              Create a TabularMSA from a dict.
gap_frequencies([axis, relative])                  Compute frequency of gap characters across an axis.
has_metadata()                                     Determine if the object has metadata.
has_positional_metadata()                          Determine if the object has positional metadata.
iter_positions([reverse, ignore_metadata])         Iterate over positions (columns) in the MSA.
join(other[, how])                                 Join this MSA with another by sequence (horizontally).
read(file[, format])                               Create a new TabularMSA instance from a file.
reassign_index([mapping, minter])                  Reassign index labels to sequences in this MSA.
sort([level, ascending])                           Sort sequences by index label in place.
to_dict()                                          Create a dict from this TabularMSA.
write(file[, format])                              Write an instance of TabularMSA to a file.