class skbio.sequence.RNA(sequence, metadata=None, positional_metadata=None, lowercase=False, validate=True)
Store RNA sequence data and optional associated metadata.
Only characters in the IUPAC RNA character set [R228] are supported.
Parameters:
sequence : str, Sequence, or 1D np.ndarray (np.uint8 or '|S1')
metadata : dict, optional
positional_metadata : Pandas DataFrame consumable, optional
lowercase : bool or str, optional
validate : bool, optional
Notes
Subclassing is disabled for RNA, because subclassing makes
it possible to change the alphabet, and certain methods rely on the
IUPAC alphabet. If a custom sequence alphabet is needed, inherit directly
from GrammaredSequence.
References
[R228] A. Cornish-Bowden. Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984. Nucleic Acids Res. May 10, 1985; 13(9): 3021-3030.
Examples
>>> from skbio import RNA
>>> RNA('ACCGAAU')
RNA
--------------------------
Stats:
    length: 7
    has gaps: False
    has degenerates: False
    has definites: True
    GC-content: 42.86%
--------------------------
0 ACCGAAU
Convert lowercase characters to uppercase:
>>> RNA('AcCGaaU', lowercase=True)
RNA
--------------------------
Stats:
    length: 7
    has gaps: False
    has degenerates: False
    has definites: True
    GC-content: 42.86%
--------------------------
0 ACCGAAU
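The effect of the lowercase and validate options can be pictured with a small standard-library sketch. This is not scikit-bio's implementation: the helper name check_rna is hypothetical, and the alphabet is assumed to be the IUPAC definite and degenerate RNA codes plus '-' and '.' as gap characters.

```python
# Illustrative sketch only; not scikit-bio's implementation.
# Assumed alphabet: IUPAC RNA definite and degenerate codes, plus '-' and '.' as gaps.
DEFINITE = set("ACGU")
DEGENERATE = set("RYSWKMBDHVN")
GAPS = set("-.")
ALPHABET = DEFINITE | DEGENERATE | GAPS


def check_rna(sequence, lowercase=False):
    """Return the (optionally uppercased) sequence, or raise ValueError.

    Hypothetical helper that mimics the idea of validating against the
    IUPAC RNA character set.
    """
    if lowercase:
        # Convert lowercase characters to uppercase before validating.
        sequence = sequence.upper()
    invalid = sorted(set(sequence) - ALPHABET)
    if invalid:
        raise ValueError(f"Invalid RNA characters: {invalid}")
    return sequence


print(check_rna("ACCGAAU"))
print(check_rna("AcCGaaU", lowercase=True))
try:
    check_rna("AcCGaaU")  # lowercase characters are rejected by default
except ValueError as err:
    print(err)
```

In this sketch, lowercase input is only accepted when lowercase=True, which mirrors the second example above.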
Attributes
| Attribute | Description |
|---|---|
| values | Array containing underlying sequence characters. |
| metadata | dict containing metadata which applies to the entire object. |
| positional_metadata | pd.DataFrame containing metadata along an axis. |
| alphabet | Return valid characters. |
| gap_chars | Return characters defined as gaps. |
| default_gap_char | Gap character to use when constructing a new gapped sequence. |
| definite_chars | Return definite characters. |
| degenerate_chars | Return degenerate characters. |
| degenerate_map | Return mapping of degenerate to definite characters. |
| complement_map | Return mapping of nucleotide characters to their complements. |
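To make the idea of a degenerate-to-definite mapping concrete, here is a minimal standard-library sketch. The table is the standard IUPAC assignment written out by hand, and the function name expand is hypothetical; it illustrates the concept rather than reproducing scikit-bio's degenerate_map attribute or its expand_degenerates method.

```python
from itertools import product

# IUPAC degenerate codes for RNA, written out by hand for illustration.
# Each code maps to the definite characters it can stand for.
DEGENERATE_MAP = {
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}


def expand(sequence):
    """Yield every definite sequence consistent with the degenerate codes."""
    # Definite characters (and gaps) stand only for themselves.
    choices = [DEGENERATE_MAP.get(char, char) for char in sequence]
    for combo in product(*choices):
        yield "".join(combo)


print(list(expand("ARU")))  # ['AAU', 'AGU']
```

The number of expansions grows multiplicatively with each degenerate position (for example, "NN" yields 16 sequences), which is why a generator is used here.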
Methods
| Method | Description |
|---|---|
| bool(rna) | Returns truth value (truthiness) of sequence. |
| x in rna | Determine if a subsequence is contained in this sequence. |
| copy.copy(rna) | Return a shallow copy of this sequence. |
| copy.deepcopy(rna) | Return a deep copy of this sequence. |
| rna1 == rna2 | Determine if this sequence is equal to another. |
| rna[x] | Slice this sequence. |
| iter(rna) | Iterate over positions in this sequence. |
| len(rna) | Return the number of characters in this sequence. |
| rna1 != rna2 | Determine if this sequence is not equal to another. |
| reversed(rna) | Iterate over positions in this sequence in reverse order. |
| str(rna) | Return sequence characters as a string. |
| complement([reverse]) | Return the complement of the nucleotide sequence. |
| concat(sequences[, how]) | Concatenate an iterable of Sequence objects. |
| copy([deep]) | Return a copy of this sequence. |
| count(subsequence[, start, end]) | Count occurrences of a subsequence in this sequence. |
| definites() | Find positions containing definite characters in the sequence. |
| degap() | Return a new sequence with gap characters removed. |
| degenerates() | Find positions containing degenerate characters in the sequence. |
| distance(other[, metric]) | Compute the distance to another sequence. |
| expand_degenerates() | Yield all possible definite versions of the sequence. |
| find_motifs(motif_type[, min_length, ignore]) | Search the biological sequence for motifs. |
| find_with_regex(regex[, ignore]) | Generate slices for patterns matched by a regular expression. |
| frequencies([chars, relative]) | Compute frequencies of characters in the sequence. |
| gaps() | Find positions containing gaps in the biological sequence. |
| gc_content() | Calculate the relative frequency of G's and C's in the sequence. |
| gc_frequency([relative]) | Calculate frequency of G's and C's in the sequence. |
| has_definites() | Determine if sequence contains one or more definite characters. |
| has_degenerates() | Determine if sequence contains one or more degenerate characters. |
| has_gaps() | Determine if the sequence contains one or more gap characters. |
| has_metadata() | Determine if the object has metadata. |
| has_nondegenerates() | Determine if sequence contains one or more non-degenerate characters. |
| has_positional_metadata() | Determine if the object has positional metadata. |
| index(subsequence[, start, end]) | Find position where subsequence first occurs in the sequence. |
| is_reverse_complement(other) | Determine if a sequence is the reverse complement of this sequence. |
| iter_contiguous(included[, min_length, invert]) | Yield contiguous subsequences based on included. |
| iter_kmers(k[, overlap]) | Generate kmers of length k from this sequence. |
| kmer_frequencies(k[, overlap, relative]) | Return counts of words of length k from this sequence. |
| lowercase(lowercase) | Return a case-sensitive string representation of the sequence. |
| match_frequency(other[, relative]) | Return count of positions that are the same between two sequences. |
| matches(other) | Find positions that match with another sequence. |
| mismatch_frequency(other[, relative]) | Return count of positions that differ between two sequences. |
| mismatches(other) | Find positions that do not match with another sequence. |
| nondegenerates() | Find positions containing non-degenerate characters in the sequence. |
| read(file[, format]) | Create a new RNA instance from a file. |
| replace(where, character) | Replace values in this sequence with a different character. |
| reverse_complement() | Return the reverse complement of the nucleotide sequence. |
| reverse_transcribe() | Reverse transcribe RNA into DNA. |
| to_regex() | Return regular expression object that accounts for degenerate chars. |
| translate([genetic_code]) | Translate RNA sequence into protein sequence. |
| translate_six_frames([genetic_code]) | Translate RNA into protein using six possible reading frames. |
| write(file[, format]) | Write an instance of RNA to a file. |
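As a rough illustration of what a few of the methods listed above compute, the following standard-library sketch works on plain strings. The function names echo the method names, but the bodies are assumptions written for this example, not scikit-bio's code; in particular, this gc_content counts only the G and C characters and does not handle empty input.

```python
# Illustrative sketch on plain strings; not scikit-bio's implementation.
# Complement pairs for RNA, including IUPAC degenerate codes (assumed table).
COMPLEMENT = str.maketrans("ACGURYSWKMBDHVN", "UGCAYRSWMKVHDBN")


def gc_content(seq):
    """Fraction of positions that are G or C (non-empty input assumed)."""
    return sum(seq.count(char) for char in "GC") / len(seq)


def reverse_complement(seq):
    """Complement each character, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


def reverse_transcribe(seq):
    """Replace U with T to obtain the corresponding DNA string."""
    return seq.replace("U", "T")


def kmers(seq, k, overlap=True):
    """List substrings of length k, stepping by 1 (overlapping) or by k."""
    step = 1 if overlap else k
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]


seq = "ACCGAAU"
print(f"{gc_content(seq):.2%}")       # 42.86%
print(reverse_complement(seq))        # AUUCGGU
print(reverse_transcribe(seq))        # ACCGAAT
print(kmers(seq, 3))                  # ['ACC', 'CCG', 'CGA', 'GAA', 'AAU']
print(kmers(seq, 3, overlap=False))   # ['ACC', 'GAA']
```

The GC figure agrees with the GC-content line shown in the Examples output for the same sequence.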