# Clustal format (skbio.io.clustal)

Clustal format (clustal) stores multiple sequence alignments. This format was originally introduced in the Clustal package [R151].

## Format Support

Has Sniffer: Yes

| Reader | Writer | Object Class              |
|--------|--------|---------------------------|
| Yes    | Yes    | skbio.alignment.Alignment |

## Format Specification

A clustal-formatted file is a plain text file. It can optionally begin with a header that states the clustal version number. The header is followed by the multiple sequence alignment and, optionally, by information about the degree of conservation at each position in the alignment [R152].

### Alignment Section

Each sequence in the alignment is divided into subsequences, each at most 60 characters long. Every subsequence is preceded by the identifier of the sequence it belongs to, and can optionally be followed by the cumulative number of non-gap characters up to that point in the full sequence. A line containing conservation information about each position in the alignment can optionally follow all of the subsequences. Neither the cumulative counts nor the conservation lines are included in the examples below.
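To make the block layout concrete, here is a minimal pure-Python sketch of how the subsequences could be stitched back together. It is not scikit-bio's reader: the function name `parse_clustal_blocks` is hypothetical, and it assumes that a header (if present) starts with the word `CLUSTAL`, that fields are separated by whitespace, and that conservation lines begin with whitespace because they carry no identifier.

```python
def parse_clustal_blocks(lines):
    """Concatenate the subsequences of each identifier (illustrative only)."""
    seqs = {}
    for raw in lines:
        line = raw.rstrip("\n")
        # Assumption: an optional header precedes all sequence lines and
        # starts with the word "CLUSTAL".
        if not seqs and line.startswith("CLUSTAL"):
            continue
        # Blank lines separate blocks. Lines that start with whitespace are
        # assumed to be conservation lines and are skipped.
        if not line.strip() or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            raise ValueError("expected an identifier and a subsequence: %r" % line)
        seq_id, chunk = fields[0], fields[1]
        # A third field, when present, is the optional cumulative count of
        # non-gap characters; this sketch ignores it.
        seqs[seq_id] = seqs.get(seq_id, "") + chunk
    return seqs
```

Calling `parse_clustal_blocks(open_file)` on a file handle, or on `text.splitlines()`, yields a dictionary that maps each identifier to its full aligned sequence, in order of first appearance.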

Note

scikit-bio does not support writing conservation information.

Note

scikit-bio will only write a clustal-formatted file if the alignment’s sequence characters are valid IUPAC characters, as defined in skbio.sequence. The specific lexicon that is validated against depends on the type of sequences stored in the alignment.
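The writing side can be sketched in the same spirit. The following pure-Python function is an assumption-laden illustration, not scikit-bio's writer: `format_clustal_blocks` and its `alphabet` parameter are hypothetical names, the bare `CLUSTAL` header and the two-space gap after the longest identifier are modeled on the write example in this document, and the character check only stands in for the IUPAC validation described in the note above.

```python
def format_clustal_blocks(records, width=60, alphabet=None):
    """Lay out (identifier, aligned sequence) pairs in clustal-style blocks."""
    records = list(records)
    if not records:
        raise ValueError("at least one sequence is required")
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("all aligned sequences must have the same length")
    # Hypothetical stand-in for lexicon validation: reject any character
    # that is not in the caller-supplied alphabet.
    if alphabet is not None:
        for seq_id, seq in records:
            bad = set(seq) - set(alphabet)
            if bad:
                raise ValueError("%s contains invalid characters: %s"
                                 % (seq_id, "".join(sorted(bad))))
    # Pad identifiers so that every subsequence starts in the same column.
    pad = max(len(seq_id) for seq_id, _ in records) + 2
    lines = ["CLUSTAL", ""]
    # Emit one block per slice of at most `width` alignment columns.
    for start in range(0, length, width):
        for seq_id, seq in records:
            lines.append(seq_id.ljust(pad) + seq[start:start + width])
        lines.append("")
    return "\n".join(lines)
```

No conservation line is emitted, matching the first note above. Passing a small `width` makes the block splitting easy to see on short test alignments.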

## Examples

Assume we have a clustal-formatted file with the following contents:

CLUSTAL W (1.82) multiple sequence alignment

abc   GCAUGCAUCUGCAUACGUACGUACGCAUGCAUCA
def   ----------------------------------
xyz   ----------------------------------

abc   GUCGAUACAUACGUACGUCGUACGUACGU-CGAC
def   ---------------CGCGAUGCAUGCAU-CGAU
xyz   -----------CAUGCAUCGUACGUACGCAUGAC


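We can use the following code to read clustal-formatted data. Here the data comes from an in-memory file-like object holding a shorter alignment without the optional header: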

>>> from StringIO import StringIO
>>> from skbio import Alignment
>>> from skbio.io import read
>>> clustal_f = StringIO('abc   GCAUGCAUCUGCAUACGUACGUACGCAUGCA\n'
...                      'def   -------------------------------\n'
...                      'xyz   -------------------------------\n'
...                      '\n'
...                      'abc   GUCGAUACAUACGUACGUCGGUACGU-CGAC\n'
...                      'def   ---------------CGUGCAUGCAU-CGAU\n'
...                      'xyz   -----------CAUUCGUACGUACGCAUGAC\n')
>>> for dna in read(clustal_f, format="clustal", into=Alignment):
...     print(dna.id)
...     print(dna.sequence)
abc
GCAUGCAUCUGCAUACGUACGUACGCAUGCAGUCGAUACAUACGUACGUCGGUACGU-CGAC
def
----------------------------------------------CGUGCAUGCAU-CGAU
xyz
------------------------------------------CAUUCGUACGUACGCAUGAC


We can use the following code to write to a clustal-formatted file:

>>> from skbio import Alignment, DNA
>>> from skbio.io import write
>>> seqs = [DNA('ACCGTTGTA-GTAGCT', id='seq1'),
...         DNA('A--GTCGAA-GTACCT', id='sequence-2'),
...         DNA('AGAGTTGAAGGTATCT', id='3')]
>>> aln = Alignment(seqs)
>>> from StringIO import StringIO
>>> fh = StringIO()
>>> aln.write(fh, format='clustal')
>>> print(fh.getvalue())
CLUSTAL

seq1        ACCGTTGTA-GTAGCT
sequence-2  A--GTCGAA-GTACCT
3           AGAGTTGAAGGTATCT