skbio.alignment.StockholmAlignment

class skbio.alignment.StockholmAlignment(seqs, gf=None, gs=None, gr=None, gc=None, validate=False)

Contains the sequences and metadata of a Stockholm-format alignment.

Parameters:

seqs : list of skbio.sequence.BiologicalSequence objects

The skbio.sequence.BiologicalSequence objects to load.

gf : dict, optional

GF (per-file) annotation in the format {feature: info}.

gs : dict of dicts, optional

GS (per-sequence) annotation in the format {feature: {seqlabel: info}}.

gr : dict of dicts, optional

GR (per-residue) annotation in the format {feature: {seqlabel: info}}.

gc : dict, optional

GC (per-column) annotation in the format {feature: info}.
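To make the four annotation dictionaries concrete, the sketch below renders them as the Stockholm markup lines they correspond to (#=GF, #=GS, #=GR, #=GC). The function annotation_lines is hypothetical and not part of skbio; it only illustrates the mapping from dictionary structure to markup and does not model where each line type is placed within a real file.

```python
def annotation_lines(gf=None, gs=None, gr=None, gc=None):
    """Render annotation dicts as Stockholm markup lines (hypothetical helper).

    gf: {feature: info}               -> "#=GF feature info"
    gs: {feature: {seqlabel: info}}   -> "#=GS seqlabel feature info"
    gr: {feature: {seqlabel: info}}   -> "#=GR seqlabel feature info"
    gc: {feature: info}               -> "#=GC feature info"
    """
    lines = []
    # Per-file annotation: one line per feature.
    for feature, info in (gf or {}).items():
        lines.append(f"#=GF {feature} {info}")
    # Per-sequence annotation: the sequence label comes before the feature.
    for feature, per_seq in (gs or {}).items():
        for label, info in per_seq.items():
            lines.append(f"#=GS {label} {feature} {info}")
    # Per-residue annotation: same nesting as GS, one character per residue.
    for feature, per_seq in (gr or {}).items():
        for label, info in per_seq.items():
            lines.append(f"#=GR {label} {feature} {info}")
    # Per-column annotation, e.g. a consensus secondary structure.
    for feature, info in (gc or {}).items():
        lines.append(f"#=GC {feature} {info}")
    return lines
```

For example, annotation_lines(gc={"SS_cons": "(((.....)))"}) yields the single line "#=GC SS_cons (((.....)))".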

Notes

The Stockholm format is described in [R95] and [R96].

If there are multiple references, store the information for each R* line (for example RM, RT, RA and RL) as a list, with the information for reference 0 at position 0 of every list, reference 1 at position 1, and so on. When the object is formatted as a string, these lists are split into one block per reference.
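The splitting of parallel R* lists into per-reference blocks can be sketched as follows. format_references is a hypothetical helper, not skbio's implementation; it handles only the RM, RT, RA and RL fields, in the order shown in the printed example on this page, and numbers each block with an RN line.

```python
def format_references(gf):
    """Split parallel R* lists into one block per reference (hypothetical sketch)."""
    # Field order follows the printed example on this page.
    order = ["RM", "RT", "RA", "RL"]
    present = [key for key in order if key in gf]
    if not present:
        return []
    # Position i of every list describes reference i, so lengths must agree.
    lengths = {len(gf[key]) for key in present}
    if len(lengths) != 1:
        raise ValueError("all R* lists must have the same length")
    (n_refs,) = lengths
    lines = []
    for i in range(n_refs):
        lines.append(f"#=GF RN [{i + 1}]")  # RN numbering starts at 1
        for key in present:
            lines.append(f"#=GF {key} {gf[key][i]}")
    return lines
```

With the gf dictionary from the second example, this produces the same ten #=GF lines that appear in that example's printed output.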

If multiple trees are included, use lists to store the tree identifiers and the trees, with the identifier at position 0 belonging to the tree at position 0, and so on.
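The position-based pairing of tree identifiers and trees can be sketched like this. The sketch assumes the identifiers and trees are stored under the Stockholm GF features TN (tree ID) and NH (New Hampshire tree); whether skbio uses exactly these keys is an assumption here, and format_trees is a hypothetical helper. It writes each tree on a single NH line, although the format allows a tree to span several.

```python
def format_trees(gf):
    """Pair TN identifiers with NH trees by list position (hypothetical sketch).

    Assumes gf["TN"] holds tree identifiers and gf["NH"] holds Newick trees.
    """
    ids = gf.get("TN", [])
    trees = gf.get("NH", [])
    if len(ids) != len(trees):
        raise ValueError("TN and NH lists must have the same length")
    lines = []
    for tree_id, tree in zip(ids, trees):
        # In Stockholm files a TN line names the NH tree that follows it.
        lines.append(f"#=GF TN {tree_id}")
        lines.append(f"#=GF NH {tree}")
    return lines
```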

References

[R95] http://sonnhammer.sbc.su.se/Stockholm.html
[R96] http://en.wikipedia.org/wiki/Stockholm_format

Examples

Assume we have a basic Stockholm file with the following contents:

# STOCKHOLM 1.0
seq1         ACC--G-GGGU
seq2         TCC--G-GGGA
#=GC SS_cons (((.....)))
//
>>> from skbio.sequence import RNA
>>> from skbio.alignment import StockholmAlignment
>>> from StringIO import StringIO
>>> sto_in = StringIO("# STOCKHOLM 1.0\n"
...                   "seq1     ACC--G-GGGU\nseq2     TCC--G-GGGA\n"
...                   "#=GC SS_cons (((.....)))\n//")
>>> sto_records = StockholmAlignment.from_file(sto_in, RNA)
>>> sto = next(sto_records)
>>> print(sto)
# STOCKHOLM 1.0
seq1          ACC--G-GGGU
seq2          TCC--G-GGGA
#=GC SS_cons  (((.....)))
//
>>> sto.gc
{'SS_cons': '(((.....)))'}
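To show roughly what reading a record like the one above involves, here is a minimal, stdlib-only parsing sketch. parse_basic_stockholm is hypothetical and is not skbio's parser: it reads sequence lines and #=GC lines, concatenates residues for a label that appears in more than one block, and skips all other markup (GF, GS, GR).

```python
def parse_basic_stockholm(text):
    """Parse sequences and #=GC lines from one Stockholm record (hypothetical sketch)."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("# STOCKHOLM 1.0"):
        raise ValueError("missing '# STOCKHOLM 1.0' header")
    seqs, gc = {}, {}
    for line in lines[1:]:
        line = line.strip()
        if not line:
            continue
        if line == "//":  # end-of-record marker
            break
        if line.startswith("#=GC"):
            # "#=GC <feature> <value>"
            _, feature, value = line.split(None, 2)
            gc[feature] = gc.get(feature, "") + value
        elif line.startswith("#"):
            continue  # other markup is ignored in this sketch
        else:
            # "<label> <aligned residues>"; append so multi-block files work.
            label, residues = line.split(None, 1)
            seqs[label] = seqs.get(label, "") + residues
    return seqs, gc
```

Applied to the file contents shown above, it returns the two aligned sequences keyed by label and the same gc dictionary as sto.gc.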

We can also produce Stockholm-formatted output by instantiating a StockholmAlignment object directly and printing it.

>>> from skbio.sequence import RNA
>>> from skbio.alignment import StockholmAlignment
>>> seqs = [RNA("ACC--G-GGGU", id="seq1"),
...         RNA("TCC--G-GGGA", id="seq2")]
>>> gf = {
...     "RT": ["TITLE1", "TITLE2"],
...     "RA": ["Auth1;", "Auth2;"],
...     "RL": ["J Mol Biol", "Cell"],
...     "RM": ["11469857", "12007400"]}
>>> sto = StockholmAlignment(seqs, gf=gf)
>>> print(sto)
# STOCKHOLM 1.0
#=GF RN [1]
#=GF RM 11469857
#=GF RT TITLE1
#=GF RA Auth1;
#=GF RL J Mol Biol
#=GF RN [2]
#=GF RM 12007400
#=GF RT TITLE2
#=GF RA Auth2;
#=GF RL Cell
seq1          ACC--G-GGGU
seq2          TCC--G-GGGA
//
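The sketch below assembles a simple Stockholm record from plain Python data to illustrate the layout seen in the printed output: header, #=GF lines, sequence lines with labels padded to a shared column, #=GC lines, and the // terminator. format_stockholm is hypothetical and handles only plain string GF values (no reference blocks). Its padding rule (longest label plus two spaces) is an assumption; it reproduces the first printed record on this page, but the second example is padded to a wider column, so skbio does not follow this exact rule.

```python
def format_stockholm(seqs, gf=None, gc=None):
    """Assemble one Stockholm record as text (hypothetical sketch).

    seqs: list of (label, aligned_sequence) pairs.
    gf:   {feature: info} with plain string values.
    gc:   {feature: info}.
    """
    gf = gf or {}
    gc = gc or {}
    gc_labels = {feature: f"#=GC {feature}" for feature in gc}
    labels = [label for label, _ in seqs] + list(gc_labels.values())
    # Assumed padding rule: longest label in the alignment block plus 2 spaces.
    width = max((len(label) for label in labels), default=0) + 2
    lines = ["# STOCKHOLM 1.0"]
    lines += [f"#=GF {feature} {info}" for feature, info in gf.items()]
    lines += [label.ljust(width) + residues for label, residues in seqs]
    lines += [gc_labels[feature].ljust(width) + info for feature, info in gc.items()]
    lines.append("//")  # end-of-record marker
    return "\n".join(lines)
```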

Methods

__contains__(id) The in operator.
__eq__(other) The equality operator.
__getitem__(index) The indexing operator.
__iter__() The iter operator.
__len__() The len operator.
__ne__(other) The inequality operator.
__repr__() The repr method.
__reversed__() The reversed method.
degap() Return a new SequenceCollection with all gap characters removed.
distances([distance_fn]) Compute distances between all pairs of sequences
distribution_stats([center_f, spread_f]) Return sequence count, and center and spread of sequence lengths
from_fasta_records(fasta_records, ...[, ...]) Initialize a SequenceCollection object
from_file(infile, seq_constructor[, strict]) Yield StockholmAlignment objects from a Stockholm file.
get_seq(id) Return a sequence from the SequenceCollection by its id.
ids() Returns the BiologicalSequence ids
int_map([prefix]) Create an integer-based mapping of sequence ids
is_empty() Return True if the SequenceCollection is empty
is_valid() Return True if the SequenceCollection is valid
iter_positions([constructor]) Generator of Alignment positions (i.e., columns)
iteritems() Generator of id, sequence tuples
k_word_frequencies(k[, overlapping]) Return frequencies of length k words for sequences in Alignment
lower() Converts all sequences to lowercase
majority_consensus([constructor]) Return the majority consensus sequence for the Alignment
omit_gap_positions(maximum_gap_frequency) Returns Alignment with positions filtered based on gap frequency
omit_gap_sequences(maximum_gap_frequency) Returns Alignment with sequences filtered based on gap frequency
position_counters() Return collections.Counter object for positions in Alignment
position_entropies([base, ...]) Return Shannon entropy of positions in Alignment
position_frequencies() Return frequencies of characters for positions in Alignment
read(fp[, format]) Create a new Alignment instance from a file.
score() Returns the score of the alignment.
sequence_count() Return the count of sequences in the SequenceCollection
sequence_length() Return the number of positions in Alignment
sequence_lengths() Return lengths of the sequences in the SequenceCollection
start_end_positions() Returns the (start, end) positions for each aligned sequence.
subalignment([seqs_to_keep, ...]) Returns new Alignment that is a subset of the current Alignment
toFasta() Return fasta-formatted string representing the SequenceCollection
to_fasta() Return fasta-formatted string representing the SequenceCollection
to_file(out_f) Save the alignment to file in text format.
to_phylip([map_labels, label_prefix]) Return phylip-formatted string representing the SequenceCollection
update_ids([ids, fn, prefix]) Update sequence IDs on the sequence collection.
upper() Converts all sequences to uppercase
write(fp[, format]) Write an instance of Alignment to a file.
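Several of the methods above (position_counters, position_entropies, majority_consensus) work column by column over the aligned sequences. The standalone functions below are a hypothetical stdlib sketch of that idea, not skbio's implementations: they take plain strings, count gap characters like any other character, and break consensus ties in favour of the character seen first in a column. skbio's own handling of gaps and ties may differ.

```python
import math
from collections import Counter


def position_counters(seqs):
    """Return one Counter of characters per alignment column."""
    # zip(*seqs) walks equal-length aligned sequences column by column.
    return [Counter(column) for column in zip(*seqs)]


def position_entropies(seqs, base=2):
    """Return the Shannon entropy of each column (gaps counted as characters)."""
    entropies = []
    for counts in position_counters(seqs):
        total = sum(counts.values())
        entropy = -sum((n / total) * math.log(n / total, base)
                       for n in counts.values())
        entropies.append(entropy)
    return entropies


def majority_consensus(seqs):
    """Return the most common character in each column.

    Counter.most_common keeps first-encountered order for equal counts,
    so ties go to the character that appears first in the column.
    """
    return "".join(counts.most_common(1)[0][0]
                   for counts in position_counters(seqs))
```

For the two example sequences ACC--G-GGGU and TCC--G-GGGA, the first column (A versus T) has an entropy of 1 bit, while fully conserved columns have an entropy of 0.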