skbio.stats.distance.DissimilarityMatrix

class skbio.stats.distance.DissimilarityMatrix(data, ids=None)

Store dissimilarities between objects.

A DissimilarityMatrix instance stores a square, hollow, two-dimensional matrix of dissimilarities between objects. Objects could be, for example, samples or DNA sequences. A sequence of IDs accompanies the dissimilarities.

Methods are provided to read dissimilarity matrices from disk and write them to disk, as well as to perform common operations such as extracting dissimilarities by object ID.

Parameters
  • data (array_like or DissimilarityMatrix) – Square, hollow, two-dimensional numpy.ndarray of dissimilarities (floats), or any structure that numpy.asarray can convert to one. Can also be a one-dimensional vector of dissimilarities (floats), as defined by scipy.spatial.distance.squareform. Can instead be a DissimilarityMatrix (or subclass) instance, in which case the instance’s data will be used. Data will be converted to a float dtype if necessary. No copy is made if data is already a numpy.ndarray with a float dtype.

  • ids (sequence of str, optional) – Sequence of strings to be used as object IDs. The number of IDs must match the number of rows/columns in data. If None (the default), IDs will be monotonically increasing integers cast as strings, with numbering starting from zero, e.g., ('0', '1', '2', '3', ...).

See also

DistanceMatrix, scipy.spatial.distance.squareform

Notes

The dissimilarities are stored in redundant (square-form) format [1].

The data are not checked for symmetry, nor guaranteed/assumed to be symmetric.
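To make the storage format concrete, the sketch below uses scipy.spatial.distance.squareform directly (not scikit-bio) to convert between a one-dimensional condensed vector and the redundant square form. The values are made up. Note that the condensed form can only represent symmetric data, whereas the redundant form can also hold asymmetric dissimilarities:

```python
import numpy as np
from scipy.spatial.distance import squareform

# Condensed vector for three objects, ordered d(0,1), d(0,2), d(1,2).
condensed = np.array([0.5, 1.0, 0.75])

# Redundant (square-form) layout: every pair appears twice, diagonal is zero.
redundant = squareform(condensed)

# Converting a symmetric, hollow square matrix back gives the condensed vector.
round_trip = squareform(redundant)
```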

References

[1] http://docs.scipy.org/doc/scipy/reference/spatial.distance.html

Attributes

T

Transpose of the dissimilarity matrix.

data

Array of dissimilarities.

default_write_format

dtype

Data type of the dissimilarities.

ids

Tuple of object IDs.

png

Display heatmap in IPython Notebook as PNG.

shape

Two-element tuple containing the dissimilarity matrix dimensions.

size

Total number of elements in the dissimilarity matrix.

svg

Display heatmap in IPython Notebook as SVG.

Built-ins

x in dm

Check if the specified ID is in the dissimilarity matrix.

dm1 == dm2

Compare this dissimilarity matrix to another for equality.

dm[x]

Slice into dissimilarity data by object ID or numpy indexing.

dm1 != dm2

Determine whether two dissimilarity matrices are not equal.

str(dm)

Return a string representation of the dissimilarity matrix.

Methods

between(from_, to_[, allow_overlap])

Obtain the distances between two groups of IDs.

copy()

Return a deep copy of the dissimilarity matrix.

filter(ids[, strict])

Filter the dissimilarity matrix by IDs.

from_iterable(iterable, metric[, key, keys])

Create DissimilarityMatrix from an iterable given a metric.

index(lookup_id)

Return the index of the specified ID.

plot([cmap, title])

Create a heatmap of the dissimilarity matrix.

read(file[, format])

Create a new DissimilarityMatrix instance from a file.

redundant_form()

Return an array of dissimilarities in redundant format.

to_data_frame()

Create a pandas.DataFrame from this DissimilarityMatrix.

transpose()

Return the transpose of the dissimilarity matrix.

within(ids)

Obtain all the distances among a set of IDs.

write(file[, format])

Write an instance of DissimilarityMatrix to a file.