scikit-bio is an open-source, BSD-licensed Python package providing data structures, algorithms, and educational resources for bioinformatics.

Installation of release version (recommended for most users)

To install the latest release version of scikit-bio you should run:

pip install numpy
pip install scikit-bio

If you'd like to install the dependencies manually (or some other way than using pip), you can find those here:

If you have trouble getting these dependencies installed (scipy, in particular, can be tricky), you should try installing Canopy Express, which includes all of these dependencies. You should then be able to easily install scikit-bio by running:

pip install scikit-bio

After installing with pip, you can run the scikit-bio unit test suite as follows:

nosetests --with-doctest skbio
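
As a quick sanity check alongside the test suite, a minimal sketch like the following confirms that a package can be imported and reports its version. The helper name `installed_version` is made up for this illustration, and it assumes the package exposes a `__version__` attribute (it falls back to "unknown" if not).

```python
import importlib


def installed_version(package_name):
    """Return the package's version string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(package_name)
    except ImportError:
        return None
    # Assumption: the package exposes __version__; fall back if it does not.
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    version = installed_version("skbio")
    if version is None:
        print("skbio is not importable; check the installation steps above")
    else:
        print("skbio version:", version)
```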

Installation of development version

If you're interested in working with the latest development release of scikit-bio (recommended for developers only, as the development code can be unstable and less documented than the release code), you can clone the repository and install as follows. This will require that you have git installed.

git clone
cd scikit-bio
pip install .

After this completes, you can run the scikit-bio unit test suite as follows. You must first cd out of the scikit-bio directory for the tests to pass (here we cd to the home directory).

cd
nosetests --with-doctest skbio

For developers of scikit-bio, if you don't want to be forced to re-install after every change, you can modify the above pip install command to:

pip install -e .

This will build scikit-bio's Cython extensions and create a link in the site-packages directory to the scikit-bio source directory. Changes you then make to code in the source directory will be picked up (e.g., by the unit tests) without re-installing.
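
One way to confirm that an editable install is in effect is to look at where Python imports the package from: it should resolve to your source checkout rather than to a copy inside site-packages. The sketch below is illustrative only; `package_location` is a hypothetical helper, not part of scikit-bio.

```python
import importlib
import os


def package_location(package_name):
    """Return the directory a package is imported from, or None if unavailable."""
    try:
        module = importlib.import_module(package_name)
    except ImportError:
        return None
    module_file = getattr(module, "__file__", None)
    if module_file is None:
        # Some modules (e.g., built-in ones) have no file on disk.
        return None
    return os.path.dirname(os.path.abspath(module_file))


if __name__ == "__main__":
    # With an editable install, this is expected to print a path inside
    # the cloned scikit-bio directory.
    print(package_location("skbio"))
```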

Finally, if you don't want to use pip to install scikit-bio and prefer to just put scikit-bio in your $PYTHONPATH, at a minimum you should run:

python setup.py build_ext --inplace

This will build scikit-bio's Cython extensions, but will not create a link to the scikit-bio source directory in site-packages. If the extensions are not built, certain components of scikit-bio will run inefficiently and will produce an EfficiencyWarning.
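
To notice such warnings programmatically instead of relying on console output, Python's standard `warnings` module can record them. The following is a minimal sketch under stated assumptions: the `EfficiencyWarning` class here is a local stand-in (in practice you would use the class scikit-bio provides), and `slow_fallback` and `emits_warning` are hypothetical names used only for illustration.

```python
import warnings


class EfficiencyWarning(Warning):
    """Local stand-in for scikit-bio's EfficiencyWarning (assumption)."""


def slow_fallback():
    """Hypothetical function that warns because a compiled extension is missing."""
    warnings.warn("falling back to a slower implementation", EfficiencyWarning)
    return 42


def emits_warning(func, category):
    """Return True if calling func() emits at least one warning of `category`."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure no warning is suppressed
        func()
    return any(issubclass(w.category, category) for w in caught)


if __name__ == "__main__":
    if emits_warning(slow_fallback, EfficiencyWarning):
        print("An EfficiencyWarning was raised; consider building the extensions")
```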


scikit-bio is available under the new BSD license. See COPYING.txt for scikit-bio's license, and the licenses directory for the licenses of third-party software that is (either partially or entirely) distributed with scikit-bio.

Projects using scikit-bio

Some of the projects that we know of that are using scikit-bio are:

If you're using scikit-bio in your own projects, you can issue a pull request to add them to this list.

scikit-bio development

If you're interested in getting involved in or learning about scikit-bio development, see

See the list of all of scikit-bio's contributors.

Summaries of our weekly developer meetings, including the meeting notes for 2014, are posted on HackPad.

The pre-history of scikit-bio

scikit-bio began from code derived from PyCogent and QIIME, and the contributors and/or copyright holders have agreed to make the code they wrote for PyCogent and/or QIIME available under the BSD license. The contributors to PyCogent and/or QIIME modules that have been ported to scikit-bio are: Rob Knight (@rob-knight), Gavin Huttley (@gavin-huttley), Daniel McDonald (@wasade), Micah Hamady, Antonio Gonzalez (@antgonza), Sandra Smit, Greg Caporaso (@gregcaporaso), Jai Ram Rideout (@ElBrogrammer), Cathy Lozupone (@clozupone), Mike Robeson (@mikerobeson), Marcin Cieslik, Peter Maxwell, Jeremy Widmann, Zongzhi Liu, Michael Dwan, Logan Knecht (@loganknecht), Andrew Cochran, Jose Carlos Clemente (@cleme), Damien Coy, Levi McCracken, Andrew Butterfield, Will Van Treuren (@wdwvt1), Justin Kuczynski (@justin212k), and Jose Antonio Navas Molina (@josenavas).


scikit-bio's logo was created by @ebolyen. scikit-bio's ASCII art tree was created by @gregcaporaso. Our text logo was created at