skbio.io
This package provides I/O functionality for skbio.
For details on what objects are supported by each format, see the associated documentation.
blast6 | BLAST+6 format (skbio.io.format.blast6)
blast7 | BLAST+7 format (skbio.io.format.blast7)
clustal | Clustal format (skbio.io.format.clustal)
embl | EMBL format (skbio.io.format.embl)
fasta | FASTA/QUAL format (skbio.io.format.fasta)
fastq | FASTQ format (skbio.io.format.fastq)
genbank | GenBank format (skbio.io.format.genbank)
gff3 | GFF3 format (skbio.io.format.gff3)
lsmat | Labeled square matrix format (skbio.io.format.lsmat)
newick | Newick format (skbio.io.format.newick)
ordination | Ordination results format (skbio.io.format.ordination)
phylip | PHYLIP multiple sequence alignment format (skbio.io.format.phylip)
qseq | QSeq format (skbio.io.format.qseq)
stockholm | Stockholm format (skbio.io.format.stockholm)
write(self, obj, format, into, **kwargs) | Write obj as format into a file.
read(self, file[, format, into, verify]) | Read file as format into an object.
sniff(self, file, **kwargs) | Detect the format of a given file and suggest kwargs for reading.
FormatIdentificationWarning | Warn when the sniffer of a format cannot confirm the format.
ArgumentOverrideWarning | Warn when a user provided kwarg differs from a guessed kwarg.
UnrecognizedFormatError | Raised when a file’s format is unknown, ambiguous, or unidentifiable.
IOSourceError | Raised when a file source cannot be resolved.
FileFormatError | Raised when a file cannot be parsed.
BLAST7FormatError | Raised when a blast7 formatted file cannot be parsed.
ClustalFormatError | Raised when a clustal formatted file cannot be parsed.
EMBLFormatError | Raised when an EMBL formatted file cannot be parsed.
FASTAFormatError | Raised when a fasta formatted file cannot be parsed.
FASTQFormatError | Raised when a fastq formatted file cannot be parsed.
GenBankFormatError | Raised when a genbank formatted file cannot be parsed.
GFF3FormatError | Raised when a GFF3 formatted file cannot be parsed.
LSMatFormatError | Raised when an lsmat formatted file cannot be parsed.
NewickFormatError | Raised when a newick formatted file cannot be parsed.
OrdinationFormatError | Raised when an ordination formatted file cannot be parsed.
PhylipFormatError | Raised when a phylip formatted file cannot be parsed.
QSeqFormatError | Raised when a qseq formatted file cannot be parsed.
QUALFormatError | Raised when a qual formatted file cannot be parsed.
StockholmFormatError | Raised when a stockholm formatted file cannot be parsed.
registry | I/O Registry (skbio.io.registry)
util | I/O utils (skbio.io.util)
For developer documentation on extending I/O, see skbio.io.registry.
Reading and writing files (I/O) can be a complicated task:
A single file format can often be read into more than one kind of object, such as skbio.alignment.TabularMSA or skbio.sequence.DNA, depending on what operations you’d like to perform on your data.
A single object can often be written to more than one file format. For example, an skbio.alignment.TabularMSA object could be written to FASTA, FASTQ, CLUSTAL, or PHYLIP formats, just to name a few.
To address these issues (and others), scikit-bio provides a simple, powerful interface for dealing with I/O. We accomplish this by using a single I/O registry.
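To make the idea of a single registry concrete, here is a minimal pure-Python sketch. It is not scikit-bio's implementation: ToyRegistry, Labels, and the "lines" and "commas" formats are hypothetical names invented for illustration. It only shows how one lookup table keyed by (format, class) lets the same object be written to several formats.

```python
from io import StringIO


class ToyRegistry:
    """Hypothetical registry mapping (format, class) pairs to functions."""

    def __init__(self):
        self._readers = {}
        self._writers = {}

    def reader(self, fmt, cls):
        # Decorator: register the function as the reader for (fmt, cls).
        def decorator(func):
            self._readers[(fmt, cls)] = func
            return func
        return decorator

    def writer(self, fmt, cls):
        # Decorator: register the function as the writer for (fmt, cls).
        def decorator(func):
            self._writers[(fmt, cls)] = func
            return func
        return decorator

    def read(self, file, format, into):
        # Look up the reader by the requested format and target class.
        return self._readers[(format, into)](file)

    def write(self, obj, format, into):
        # Look up the writer by the requested format and the object's class.
        self._writers[(format, type(obj))](obj, into)
        return into


registry = ToyRegistry()


class Labels(list):
    """Toy in-memory object: an ordered list of labels."""


@registry.reader("lines", Labels)
def _lines_to_labels(fh):
    return Labels(line.strip() for line in fh if line.strip())


@registry.writer("lines", Labels)
def _labels_to_lines(obj, fh):
    fh.write("".join(label + "\n" for label in obj))


@registry.writer("commas", Labels)
def _labels_to_commas(obj, fh):
    fh.write(",".join(obj) + "\n")


# One object, read once, then written to two different formats.
labels = registry.read(StringIO("a\nb\nc\n"), format="lines", into=Labels)
as_lines = registry.write(labels, format="lines", into=StringIO()).getvalue()
as_commas = registry.write(labels, format="commas", into=StringIO()).getvalue()
```

Keying the tables on both the format and the class is what allows one format to map to several objects and one object to several formats without any special cases.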
To see a complete list of file-like inputs that can be used for reading, writing, and sniffing, see the documentation for skbio.io.util.open().
There are two ways to read files. The first way is to use the procedural interface:
my_obj = skbio.io.read(file, format='someformat', into=SomeSkbioClass)
The second is to use the object-oriented (OO) interface which is automatically constructed from the procedural interface:
my_obj = SomeSkbioClass.read(file, format='someformat')
For example, to read a newick file using both interfaces you would type:
>>> from skbio import read
>>> from skbio import TreeNode
>>> from io import StringIO
>>> open_filehandle = StringIO('(a, b);')
>>> tree = read(open_filehandle, format='newick', into=TreeNode)
>>> tree
<TreeNode, name: unnamed, internal node count: 0, tips count: 2>
For the OO interface:
>>> open_filehandle = StringIO('(a, b);')
>>> tree = TreeNode.read(open_filehandle, format='newick')
>>> tree
<TreeNode, name: unnamed, internal node count: 0, tips count: 2>
With skbio.io.registry.read(), if into is not provided, a generator is returned instead of a single object. What the generator yields depends on the format being read.
When into is provided, format may be omitted and the registry will use its knowledge of the available formats for the requested class to infer the correct format. This format inference is also available in the OO interface, meaning that format may be omitted there as well.
As an example:
>>> open_filehandle = StringIO('(a, b);')
>>> tree = TreeNode.read(open_filehandle)
>>> tree
<TreeNode, name: unnamed, internal node count: 0, tips count: 2>
We call format inference sniffing, much like the csv.Sniffer
class of Python’s standard library. The goal of a sniffer is twofold: to
identify if a file is a specific format, and if it is, to provide **kwargs
which can be used to better parse the file.
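Since the comparison is to csv.Sniffer, a short standard-library sketch may help show the two goals in practice: first identify the dialect, then reuse what was learned to parse the file. This uses only Python's csv module, not scikit-bio, and the sample data is made up for illustration.

```python
import csv
from io import StringIO

# Illustrative semicolon-delimited sample.
sample = "id;length\nseq1;10\nseq2;12\nseq3;15\n"

# Goal 1: identify the format (here, the CSV dialect) from the content.
# Candidate delimiters are restricted to keep detection unambiguous.
dialect = csv.Sniffer().sniff(sample, delimiters=",;\t")

# Goal 2: use the sniffer's findings as parsing parameters.
rows = list(csv.reader(StringIO(sample), dialect))
```

A scikit-bio sniffer plays the same role: it reports whether a file matches a format and suggests keyword arguments for the reader.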
Note
There is a built-in sniffer which results in a useful error message if an empty file is provided as input and the format was omitted.
Just as when reading files, there are two ways to write files.
Procedural Interface:
skbio.io.write(my_obj, format='someformat', into=file)
OO Interface:
my_obj.write(file, format='someformat')
In the procedural interface, format is required. Without it, scikit-bio does not know how you want to serialize an object. The OO interface defines a default format for each class, so format can often be omitted there.