skbio.sequence.GeneticCode

class skbio.sequence.GeneticCode(amino_acids, starts, name='')

Genetic code for translating codons to amino acids.

Parameters:

amino_acids : consumable by skbio.Protein constructor

64-character vector containing IUPAC amino acid characters. The order of the amino acids should correspond to NCBI’s codon order (see Notes section below). amino_acids is the “AAs” field in NCBI’s genetic code format [R193].

starts : consumable by skbio.Protein constructor

64-character vector containing only M and - characters, with start codons indicated by M. The order of the characters should correspond to NCBI’s codon order (see Notes section below). starts is the “Starts” field in NCBI’s genetic code format [R193].

name : str, optional

Genetic code name. This is simply metadata and does not affect the functionality of the genetic code itself.

Notes

The genetic codes available via GeneticCode.from_ncbi and used throughout the examples are defined in [R193]. The genetic code strings defined there are directly compatible with the GeneticCode constructor.

The order of amino_acids and starts should correspond to NCBI’s codon order, defined in [R193]:

UUUUUUUUUUUUUUUUCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
UUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGG
UCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAG

Note that scikit-bio displays this ordering using the IUPAC RNA alphabet (U), while NCBI displays the same ordering using the IUPAC DNA alphabet (T) for historical reasons.
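The codon order can be reproduced with a few lines of plain Python. The following is a minimal sketch, not scikit-bio code: it assumes only that each position cycles through the bases in the order U, C, A, G, with the first base varying slowest and the third fastest, as the three rows above show. The variable names are illustrative.

```python
from itertools import product

# Base order used at every codon position (RNA alphabet, as scikit-bio displays it).
BASES = "UCAG"

# product() varies the last position fastest, which matches the three rows above.
codons = ["".join(bases) for bases in product(BASES, repeat=3)]

# Rebuild the three rows: first, second, and third base of each codon.
base1 = "".join(codon[0] for codon in codons)
base2 = "".join(codon[1] for codon in codons)
base3 = "".join(codon[2] for codon in codons)

# The same ordering in the DNA alphabet, as NCBI displays it.
dna_codons = [codon.replace("U", "T") for codon in codons]

print(codons[:4])      # ['UUU', 'UUC', 'UUA', 'UUG']
print(dna_codons[:4])  # ['TTT', 'TTC', 'TTA', 'TTG']
```

Position i in amino_acids and starts therefore describes codons[i]; for example, codons[35] is AUG.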

References

[R193] http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi

Examples

Get NCBI’s standard genetic code (table ID 1, the default genetic code in scikit-bio):

>>> from skbio import GeneticCode
>>> GeneticCode.from_ncbi()
GeneticCode (Standard)
-------------------------------------------------------------------------
  AAs  = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Starts = ---M---------------M---------------M----------------------------
Base1  = UUUUUUUUUUUUUUUUCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  = UUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGG
Base3  = UCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAG
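To see which codons the Starts line marks, the string can be paired with the codon order position by position. This is an illustrative pure-Python sketch rather than a scikit-bio API; the Starts string is copied from the output above, and the variable names are assumptions made for the example.

```python
from itertools import product

# "Starts" string of NCBI's standard genetic code, copied from the output above.
STANDARD_STARTS = "---M---------------M---------------M----------------------------"

# Codons in NCBI order: U, C, A, G at each position, third base varying fastest.
codons = ["".join(bases) for bases in product("UCAG", repeat=3)]

# A codon is a start codon where the Starts string has an M at the same index.
start_codons = [codon for codon, flag in zip(codons, STANDARD_STARTS) if flag == "M"]

print(start_codons)  # ['UUG', 'CUG', 'AUG']
```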

Get a different NCBI genetic code (25):

>>> GeneticCode.from_ncbi(25)
GeneticCode (Candidate Division SR1 and Gracilibacteria)
-------------------------------------------------------------------------
  AAs  = FFLLSSSSYY**CCGWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Starts = ---M-------------------------------M---------------M------------
Base1  = UUUUUUUUUUUUUUUUCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  = UUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGG
Base3  = UCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAG

Define a custom genetic code:

>>> GeneticCode('M' * 64, '-' * 64)
GeneticCode
-------------------------------------------------------------------------
  AAs  = MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
Starts = ----------------------------------------------------------------
Base1  = UUUUUUUUUUUUUUUUCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  = UUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGG
Base3  = UCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAG

Translate an RNA sequence to protein using NCBI’s standard genetic code:

>>> from skbio import RNA
>>> rna = RNA('AUGCCACUUUAA')
>>> GeneticCode.from_ncbi().translate(rna)
Protein
-----------------------------
Stats:
    length: 4
    has gaps: False
    has degenerates: False
    has non-degenerates: True
    has stops: True
-----------------------------
0 MPL*
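The lookup behind this result can be sketched without scikit-bio. The code below is not the library's implementation; it is a simplified illustration that assumes reading frame 1, unambiguous RNA characters only, and no special handling of start or stop codons. The helper name translate_sketch is hypothetical.

```python
from itertools import product

# "AAs" string of NCBI's standard genetic code (table ID 1), as shown in the examples.
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each codon, in NCBI order, to the amino acid at the same index.
CODON_TO_AA = {
    "".join(bases): aa
    for bases, aa in zip(product("UCAG", repeat=3), STANDARD_AAS)
}


def translate_sketch(rna):
    """Translate an RNA string in reading frame 1, dropping any trailing partial codon."""
    usable = len(rna) - len(rna) % 3
    return "".join(CODON_TO_AA[rna[i:i + 3]] for i in range(0, usable, 3))


print(translate_sketch("AUGCCACUUUAA"))  # MPL*
```

The output matches the MPL* sequence returned by GeneticCode.from_ncbi().translate(rna) above; the stop codon UAA is reported as *.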

Attributes

name            Genetic code name.
reading_frames  Six possible reading frames.

Methods

gc1 == gc2                                          Determine if the genetic code is equal to another.
gc1 != gc2                                          Determine if the genetic code is not equal to another.
str(gc)                                             Return string representation of the genetic code.
from_ncbi([table_id])                               Return NCBI genetic code specified by table ID.
translate(sequence[, reading_frame, start, stop])   Translate RNA sequence into protein sequence.
translate_six_frames(sequence[, start, stop])       Translate RNA into protein using six possible reading frames.
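The six reading frames can be pictured as three offsets on the sequence and three on its reverse complement. The sketch below is a plain-Python illustration under that assumption, with frames numbered 1, 2, 3 (forward) and -1, -2, -3 (reverse complement); it does not call scikit-bio, handles only the unambiguous bases A, C, G, and U, and the helper name six_frames_sketch is hypothetical.

```python
# Watson-Crick complements for the unambiguous RNA bases.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def six_frames_sketch(rna):
    """Return {reading_frame: nucleotides read in that frame} for frames 1, 2, 3, -1, -2, -3."""
    reverse_complement = rna.translate(COMPLEMENT)[::-1]
    frames = {}
    # Forward frames start at offsets 0, 1, and 2 of the sequence itself.
    for offset in range(3):
        frames[offset + 1] = rna[offset:]
    # Reverse frames start at the same offsets of the reverse complement.
    for offset in range(3):
        frames[-(offset + 1)] = reverse_complement[offset:]
    return frames


for frame, nucleotides in six_frames_sketch("AUGCCACUUUAA").items():
    print(frame, nucleotides)
```

Each of the six strings would then be translated codon by codon, with any trailing partial codon left over.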