skbio.math.stats.spatial.procrustes

skbio.math.stats.spatial.procrustes(data1, data2)

Procrustes analysis, a similarity test for two data sets

Each input matrix is a set of points or vectors (the rows of the matrix). The dimension of the space is the number of columns of each matrix. Given two identically sized matrices, procrustes standardizes both such that:

  • trace(AA’) = 1 (A’ is the transpose, and the product is a standard matrix product).
  • Both sets of points are centered around the origin.
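The two standardization conditions above can be sketched with NumPy as follows. This is a minimal illustration, not the library's own code; the helper name `standardize` is hypothetical.

```python
import numpy as np

def standardize(points):
    """Hypothetical helper: center the rows and scale so trace(A A') == 1."""
    a = np.asarray(points, dtype=float)
    a = a - a.mean(axis=0)        # center every column on zero
    norm = np.linalg.norm(a)      # Frobenius norm; norm**2 equals trace(A A')
    if norm == 0:
        raise ValueError("input must contain more than one unique point")
    return a / norm

a = standardize([[1, 3], [1, 2], [1, 1], [2, 1]])
print(np.trace(a @ a.T))          # approximately 1
print(a.mean(axis=0))             # approximately [0, 0]
```

Dividing by the Frobenius norm works because the squared Frobenius norm of a matrix equals trace(AA’).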

Procrustes ([R71], [R72]) then applies the optimal transform to the second matrix (including scaling/dilation, rotations, and reflections) to minimize M^2 = sum(square(mtx1 - mtx2)), that is, the sum of the squares of the pointwise differences between the standardized first matrix and the transformed second matrix.
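One common way to obtain such an optimal transform is the SVD-based solution of the orthogonal Procrustes problem. The sketch below assumes that approach and is not necessarily how this function is implemented; the names `standardize` and `fit_second_to_first` are hypothetical.

```python
import numpy as np

def standardize(points):
    """Hypothetical helper: center the rows and scale so trace(A A') == 1."""
    a = np.asarray(points, dtype=float)
    a = a - a.mean(axis=0)
    return a / np.linalg.norm(a)

def fit_second_to_first(mtx1, mtx2):
    """Rotate/reflect and scale standardized mtx2 to best match mtx1."""
    # SVD of the cross-product matrix gives the optimal orthogonal map.
    u, s, vt = np.linalg.svd(mtx2.T @ mtx1)
    rotation = u @ vt             # orthogonal: may include a reflection
    scale = s.sum()               # optimal dilation when trace(mtx2 mtx2') == 1
    fitted = scale * (mtx2 @ rotation)
    disparity = np.sum(np.square(mtx1 - fitted))   # M^2
    return fitted, disparity

mtx1 = standardize([[1, 3], [1, 2], [1, 1], [2, 1]])
mtx2 = standardize([[4, -2], [4, -4], [4, -6], [2, -6]])
fitted, disparity = fit_second_to_first(mtx1, mtx2)
print(disparity)                  # close to zero: the shapes match exactly
```

Here the second data set is a shifted, reflected, and scaled copy of the first, so the fitted matrix coincides with `mtx1` up to floating-point error.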

If two data sets have different dimensionality (different numbers of columns), simply add columns of zeros to the smaller of the two.
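A minimal sketch of that zero-padding step, using `numpy.pad`; the helper name `pad_columns` and the sample arrays are illustrative assumptions.

```python
import numpy as np

def pad_columns(data1, data2):
    """Hypothetical helper: append zero columns so both inputs share a width."""
    a = np.asarray(data1, dtype=float)
    b = np.asarray(data2, dtype=float)
    width = max(a.shape[1], b.shape[1])
    # Pad only on the right of the column axis; rows are left untouched.
    a = np.pad(a, ((0, 0), (0, width - a.shape[1])), mode="constant")
    b = np.pad(b, ((0, 0), (0, width - b.shape[1])), mode="constant")
    return a, b

a3 = [[1, 3, 2], [1, 2, 0], [1, 1, 1], [2, 1, 4]]   # 4 points in 3-D
b2 = [[4, -2], [4, -4], [4, -6], [2, -6]]           # 4 points in 2-D
a_padded, b_padded = pad_columns(a3, b2)
print(a_padded.shape, b_padded.shape)
```

The padded arrays have the same shape and can then be passed to procrustes as data1 and data2.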

This function was not designed to handle datasets with different numbers of datapoints (rows).

Parameters:

data1 : array_like

Matrix whose n rows represent points in k-dimensional space (k columns). data1 is the reference data: after it is standardized, the data from data2 will be transformed to fit the pattern in data1. Must contain more than one unique point.

data2 : array_like

n rows of data in k-dimensional space to be fit to data1. Must have the same shape (numrows, numcols) as data1 and contain more than one unique point.

Returns:

mtx1 : array_like

a standardized version of data1

mtx2 : array_like

the orientation of data2 that best fits data1. It is centered, but trace(mtx2*mtx2’) is not necessarily 1

disparity : array_like

M^2 as defined above

Notes

  • The disparity should not depend on the order of the input matrices, but the output matrices will, as only the first output matrix is guaranteed to be scaled such that trace(AA') = 1.
  • Duplicate data points are generally OK; duplicating a data point will increase its effect on the procrustes fit.
  • The disparity scales as the number of points per input matrix.

References

[R71] Krzanowski, W. J. (2000). “Principles of Multivariate Analysis”.
[R72] Gower, J. C. (1975). “Generalized procrustes analysis”.

Examples

>>> import numpy as np
>>> from skbio.math.stats.spatial import procrustes
>>> a = np.array([[1, 3], [1, 2], [1, 1], [2, 1]], 'd')
>>> b = np.array([[4, -2], [4, -4], [4, -6], [2, -6]], 'd')
>>> p = procrustes(a, b)
>>> print(p)
(array([[-0.13363062,  0.6681531 ],
       [-0.13363062,  0.13363062],
       [-0.13363062, -0.40089186],
       [ 0.40089186, -0.40089186]]), array([[-0.13363062,  0.6681531 ],
       [-0.13363062,  0.13363062],
       [-0.13363062, -0.40089186],
       [ 0.40089186, -0.40089186]]), 1.6177811532852781e-32)