skbio.math.stats.ordination.CCA

class skbio.math.stats.ordination.CCA(Y, X, site_ids, species_ids)

Compute constrained (also known as canonical) correspondence analysis.

Canonical (or constrained) correspondence analysis is a multivariate ordination technique. It originated in community ecology [R77] and relates community composition to variation in the environment (or in other factors). It takes abundances or counts of individuals together with environmental variables as input, and outputs ordination axes that maximize niche separation among species.

It is better suited to extracting the niches of taxa than linear multivariate methods are, because it assumes unimodal response curves (habitat preferences are often unimodal functions of habitat variables [R78]).
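The unimodal assumption can be illustrated with a minimal sketch. It is not part of this class: the Gaussian response function, the gradient values, and the species parameters below are all hypothetical, chosen only to show that abundance peaks at an optimum and that an abundance-weighted average of the gradient recovers that optimum.

```python
import math


def gaussian_response(x, optimum, tolerance, maximum):
    """Expected abundance at gradient value x under a Gaussian (unimodal) model."""
    return maximum * math.exp(-0.5 * ((x - optimum) / tolerance) ** 2)


# Hypothetical environmental gradient sampled at five sites.
gradient = [1.0, 3.0, 5.0, 7.0, 9.0]

# Hypothetical species whose abundance peaks at x = 5.
abundances = [
    gaussian_response(x, optimum=5.0, tolerance=2.0, maximum=100.0)
    for x in gradient
]

# Abundance-weighted average of the gradient: an estimate of the optimum.
estimated_optimum = (
    sum(x * y for x, y in zip(gradient, abundances)) / sum(abundances)
)

print([round(y, 1) for y in abundances])
print(round(estimated_optimum, 6))
```

Abundance rises and then falls along the gradient, so a straight-line fit would describe it poorly, while the weighted average lands on the peak because the sites are placed symmetrically around it.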

As more environmental variables are added, the result gets more similar to unconstrained ordination, so only the variables that are deemed explanatory should be included in the analysis.

Parameters:

Y : array_like
    Community data matrix of shape (n, m): a contingency table for m species at n sites.

X : array_like
    Constraining matrix of shape (n, q): q quantitative environmental variables at n sites.
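As a rough illustration of the expected shapes, the sketch below builds a small Y and X and validates them with NumPy. The helper `check_cca_inputs` and the data are hypothetical and are not part of scikit-bio. The non-zero totals check is an assumption of this sketch, motivated by the fact that correspondence analysis divides by row and column totals.

```python
import numpy as np


def check_cca_inputs(Y, X):
    """Hypothetical helper: validate shapes and contents before ordination."""
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    if Y.ndim != 2 or X.ndim != 2:
        raise ValueError("Y and X must both be 2-D")
    if Y.shape[0] != X.shape[0]:
        raise ValueError("Y and X must have one row per site")
    if (Y < 0).any():
        raise ValueError("Y must contain non-negative abundances or counts")
    # Assumption of this sketch: empty sites or species are rejected.
    if (Y.sum(axis=0) == 0).any() or (Y.sum(axis=1) == 0).any():
        raise ValueError("every site and every species needs a non-zero total")
    return Y, X


# n = 4 sites, m = 3 species (made-up counts).
Y = [[10, 0, 3],
     [2, 5, 0],
     [0, 8, 1],
     [4, 4, 4]]

# n = 4 sites, q = 2 environmental variables (made-up measurements).
X = [[1.2, 7.0],
     [3.4, 6.5],
     [5.1, 6.8],
     [2.0, 7.2]]

Y_arr, X_arr = check_cca_inputs(Y, X)
print(Y_arr.shape, X_arr.shape)
```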

See also

CA, RDA

Notes

The algorithm is based on [R79], § 11.2, and is expected to give the same results as cca(Y, X) in R’s package vegan, except that this implementation won’t drop constraining variables due to perfect collinearity: the user needs to choose which ones to input.
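Because collinear constraining variables are not dropped automatically, it may help to check X before the analysis. The following is a minimal sketch of one possible check, not something this class provides: the helper `has_perfect_collinearity` and the example matrices are hypothetical. It centers the columns and compares the matrix rank with the number of variables, which is only meaningful when there are more sites than variables.

```python
import numpy as np


def has_perfect_collinearity(X):
    """Hypothetical helper: True if some column of X equals a linear
    combination of the other columns plus a constant."""
    X = np.asarray(X, dtype=float)
    # Centering removes the constant term, so the rank test also catches
    # columns that are collinear with an intercept.
    centered = X - X.mean(axis=0)
    return bool(np.linalg.matrix_rank(centered) < X.shape[1])


# Third column is the sum of the first two, so it carries no new information.
X_redundant = np.array([[1.0, 2.0, 3.0],
                        [2.0, 1.0, 3.0],
                        [3.0, 5.0, 8.0],
                        [4.0, 0.0, 4.0]])

# Dropping the redundant column is one way to resolve the collinearity.
X_reduced = X_redundant[:, :2]

print(has_perfect_collinearity(X_redundant))
print(has_perfect_collinearity(X_reduced))
```

Which variable to drop is a modelling decision. The check only reports that a redundancy exists.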

Canonical correspondence analysis shouldn’t be confused with canonical correlation analysis (CCorA, though sometimes also called CCA), a different technique that searches for multivariate relationships between two datasets. Canonical correlation analysis is a statistical tool that, given two vectors of random variables, finds linear combinations that have maximum correlation with each other. In some sense, it assumes linear responses of “species” to “environmental variables” and is not well suited to analyzing ecological data.

In data analysis, ordination (or multivariate gradient analysis) complements clustering by arranging objects (species, samples, ...) along gradients so that similar ones are closer together and dissimilar ones are farther apart. There’s a good overview of the available techniques at http://ordination.okstate.edu/overview.htm.

References

[R77] Cajo J. F. Ter Braak, “Canonical Correspondence Analysis: A New Eigenvector Technique for Multivariate Direct Gradient Analysis”, Ecology 67.5 (1986), pp. 1167-1179.
[R78] Cajo J. F. Ter Braak and Piet F. M. Verdonschot, “Canonical correspondence analysis and related multivariate methods in aquatic ecology”, Aquatic Sciences 57.3 (1995), pp. 255-289.
[R79] P. Legendre and L. Legendre, Numerical Ecology, Elsevier, Amsterdam (1998).

Methods

scores(scaling) Compute site and species scores for different scalings.