`skbio.stats.distance`

)¶This subpackage provides functionality for serializing, deserializing, and manipulating dissimilarity and distance matrices in memory. It also contains various statistical methods that operate on distance matrices, often relating distances (e.g., community distances) to categorical and/or continuous variables of interest (e.g., gender or age). Methods are also provided for comparing distance matrices (e.g., computing the correlation between two or more distance matrices using the Mantel test).

This package provides two matrix classes, DissimilarityMatrix and DistanceMatrix. Both classes can store measures of difference/distinction between objects. A dissimilarity/distance matrix includes both a matrix of dissimilarities/distances (floats) between objects, as well as unique IDs (object labels; strings) identifying each object in the matrix.

DissimilarityMatrix can be used to store measures of dissimilarity between
objects, and does not require that the dissimilarities are symmetric (e.g.,
dissimilarities obtained using the *Gain in PD* measure [R260]).
DissimilarityMatrix is a more general container to store differences than
DistanceMatrix.

DistanceMatrix has the additional requirement that the differences it stores are symmetric (e.g., Euclidean or Hamming distances).

Note

DissimilarityMatrix can be used to store distances, but it is
recommended to use DistanceMatrix to store this type of data as it
provides an additional check for symmetry. A distance matrix *is a*
dissimilarity matrix; this is modeled in the class design by having
DistanceMatrix subclass DissimilarityMatrix.

`DissimilarityMatrix` (data[, ids]) |
Store dissimilarities between objects. |

`DistanceMatrix` (data[, ids]) |
Store distances between objects. |

`randdm` (num_objects[, ids, constructor, ...]) |
Generate a distance matrix populated with random distances. |

`DissimilarityMatrixError` |
General error for dissimilarity matrix validation failures. |

`DistanceMatrixError` |
General error for distance matrix validation failures. |

`MissingIDError` (missing_id) |
Error for ID lookup that doesn’t exist in the dissimilarity matrix. |

Assume we have the following delimited text file storing distances between
three objects with IDs `a`

, `b`

, and `c`

:

```
\ta\tb\tc
a\t0.0\t0.5\t1.0
b\t0.5\t0.0\t0.75
c\t1.0\t0.75\t0.0
```

Load a distance matrix from the file:

```
>>> from io import StringIO
>>> from skbio import DistanceMatrix
>>> dm_fh = StringIO("\ta\tb\tc\n"
... "a\t0.0\t0.5\t1.0\n"
... "b\t0.5\t0.0\t0.75\n"
... "c\t1.0\t0.75\t0.0\n")
>>> dm = DistanceMatrix.read(dm_fh)
>>> print(dm)
3x3 distance matrix
IDs:
'a', 'b', 'c'
Data:
[[ 0. 0.5 1. ]
[ 0.5 0. 0.75]
[ 1. 0.75 0. ]]
```

Access the distance (scalar) between objects `'a'`

and `'c'`

:

```
>>> dm['a', 'c']
1.0
```

Get a row vector of distances between object `'b'`

and all other objects:

```
>>> dm['b']
array([ 0.5 , 0. , 0.75])
```

numpy indexing/slicing also works as expected. Extract the third column:

```
>>> dm[:, 2]
array([ 1. , 0.75, 0. ])
```

Serialize the distance matrix to delimited text file:

```
>>> out_fh = StringIO()
>>> _ = dm.write(out_fh)
>>> out_fh.getvalue() == dm_fh.getvalue()
True
```

A distance matrix object can also be created from an existing `numpy.array`

(or an array-like object, such as a nested Python list):

```
>>> import numpy as np
>>> data = np.array([[0.0, 0.5, 1.0],
... [0.5, 0.0, 0.75],
... [1.0, 0.75, 0.0]])
>>> ids = ["a", "b", "c"]
>>> dm_from_np = DistanceMatrix(data, ids)
>>> print(dm_from_np)
3x3 distance matrix
IDs:
'a', 'b', 'c'
Data:
[[ 0. 0.5 1. ]
[ 0.5 0. 0.75]
[ 1. 0.75 0. ]]
>>> dm_from_np == dm
True
```

IDs may be omitted when constructing a dissimilarity/distance matrix. Monotonically-increasing integers (cast as strings) will be automatically used:

```
>>> dm = DistanceMatrix(data)
>>> dm.ids
('0', '1', '2')
```

In addition to the data structures described above, this package provides the following distance-based statistical methods.

`anosim` (distance_matrix, grouping[, column, ...]) |
Test for significant differences between groups using ANOSIM. |

`permanova` (distance_matrix, grouping[, ...]) |
Test for significant differences between groups using PERMANOVA. |