TabularMSA.join(other, how='strict')
Join this MSA with another by sequence (horizontally).
Sequences will be joined by index labels. MSA positional_metadata will be joined by columns. Use how to control join behavior. The alignment is not recomputed during the join operation (see the Notes section for details).
Parameters:
    other : TabularMSA
    how : {'strict', 'inner', 'outer', 'left', 'right'}, optional
Returns:
    TabularMSA
Raises:
    ValueError
    TypeError
Notes
The join operation does not automatically perform re-alignment; sequences are simply joined together. Therefore, this operation is not necessarily meaningful on its own.
The index labels of this MSA must be unique. Likewise, the index labels of other must be unique.
The MSA-wide and per-sequence metadata (TabularMSA.metadata and Sequence.metadata) are not retained on the joined TabularMSA.
The positional metadata of the sequences will be outer-joined, regardless of how (using Sequence.concat(how='outer')).
If the join operation results in a TabularMSA without any sequences, the MSA's positional_metadata will not be set.
Examples
Join MSAs by sequence:
>>> from skbio import DNA, TabularMSA
>>> msa1 = TabularMSA([DNA('AC'),
...                    DNA('A-')])
>>> msa2 = TabularMSA([DNA('G-T'),
...                    DNA('T--')])
>>> joined = msa1.join(msa2)
>>> joined
TabularMSA[DNA]
---------------------
Stats:
    sequence count: 2
    position count: 5
---------------------
ACG-T
A-T--
Sequences are joined based on MSA index labels:
>>> msa1 = TabularMSA([DNA('AC'),
...                    DNA('A-')], index=['a', 'b'])
>>> msa2 = TabularMSA([DNA('G-T'),
...                    DNA('T--')], index=['b', 'a'])
>>> joined = msa1.join(msa2)
>>> joined
TabularMSA[DNA]
---------------------
Stats:
    sequence count: 2
    position count: 5
---------------------
ACT--
A-G-T
>>> joined.index
Index(['a', 'b'], dtype='object')
By default, both MSA indexes must match. Use how to specify an inner join:
>>> msa1 = TabularMSA([DNA('AC'),
...                    DNA('A-'),
...                    DNA('-C')], index=['a', 'b', 'c'],
...                   positional_metadata={'col1': [42, 43],
...                                        'col2': [1, 2]})
>>> msa2 = TabularMSA([DNA('G-T'),
...                    DNA('T--'),
...                    DNA('ACG')], index=['b', 'a', 'z'],
...                   positional_metadata={'col2': [3, 4, 5],
...                                        'col3': ['f', 'o', 'o']})
>>> joined = msa1.join(msa2, how='inner')
>>> joined
TabularMSA[DNA]
--------------------------
Positional metadata:
    'col2': <dtype: int64>
Stats:
    sequence count: 2
    position count: 5
--------------------------
A-G-T
ACT--
>>> joined.index
Index(['b', 'a'], dtype='object')
>>> joined.positional_metadata
   col2
0     1
1     2
2     3
3     4
4     5
When performing an outer join ('outer', 'left', or 'right'), unshared sequences are padded with gaps and unshared positional_metadata columns are padded with NaN:
>>> joined = msa1.join(msa2, how='outer')
>>> joined
TabularMSA[DNA]
----------------------------
Positional metadata:
    'col1': <dtype: float64>
    'col2': <dtype: int64>
    'col3': <dtype: object>
Stats:
    sequence count: 4
    position count: 5
----------------------------
ACT--
A-G-T
-C---
--ACG
>>> joined.index
Index(['a', 'b', 'c', 'z'], dtype='object')
>>> joined.positional_metadata
   col1  col2 col3
0  42.0     1  NaN
1  43.0     2  NaN
2   NaN     3    f
3   NaN     4    o
4   NaN     5    o
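The gap padding in the outer join above can be modelled in plain Python. This is an illustrative sketch only, not scikit-bio's implementation: outer_join_rows is a hypothetical helper that works on dicts of strings, assumes both inputs are non-empty with equal-length rows, and sorts the labels, which matches the index shown above but is not guaranteed in general.

```python
def outer_join_rows(left, right, gap='-'):
    """Join two {label: sequence} dicts, padding missing rows with gaps."""
    # Every row within one alignment has the same width.
    left_width = len(next(iter(left.values())))
    right_width = len(next(iter(right.values())))

    joined = {}
    for label in sorted(set(left) | set(right)):
        # A label missing from one side contributes an all-gap row.
        left_part = left.get(label, gap * left_width)
        right_part = right.get(label, gap * right_width)
        joined[label] = left_part + right_part
    return joined


# Same labels and sequences as msa1 and msa2 in the example above.
left = {'a': 'AC', 'b': 'A-', 'c': '-C'}
right = {'b': 'G-T', 'a': 'T--', 'z': 'ACG'}
rows = outer_join_rows(left, right)
```

Restricting the loop to the labels of left (or of right) instead of the union gives the corresponding model of a 'left' (or 'right') join.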