skbio.diversity.block_beta_diversity

skbio.diversity.block_beta_diversity(metric, counts, ids, validate=True, k=64, reduce_f=None, map_f=None, **kwargs)

Perform a block-decomposition beta diversity calculation.

State: Experimental as of 0.5.1.

Parameters
  • metric (str or callable) – The pairwise distance function to apply. It must be either a string that scikit-bio can resolve to a metric (e.g., the UniFrac methods) or a callable.

  • counts (2D array_like of ints or floats) – Matrix containing count/abundance data where each row contains counts of OTUs in a given sample.

  • ids (iterable of strs) – Identifiers for each sample in counts.

  • validate (bool, optional) – See skbio.diversity.beta_diversity for details.

  • reduce_f (function, optional) –

    A method to reduce PartialDistanceMatrix objects into a single DistanceMatrix. The expected signature is:

    f(Iterable of DistanceMatrix) -> DistanceMatrix

    Note: this is the reduce step of a map/reduce.

  • map_f (function, optional) –

    A method that accepts _block_compute and maps it over the blocks (the map step of a map/reduce). The expected signature is:

    f(**kwargs) -> DistanceMatrix

    NOTE: ipyparallel’s map_async will not work here because **kwargs must be passed through to each call.

  • k (int, optional) – The block size used when computing distances.

  • kwargs (kwargs, optional) – Metric-specific parameters.
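
The map_f and reduce_f hooks follow a map/reduce pattern. The sketch below is a minimal, self-contained illustration in plain Python, not scikit-bio's implementation. It assumes (our assumption, check the scikit-bio source for your version) that map_f is called as map_f(compute, iterable_of_kwargs) and that each partial result covers a disjoint set of sample pairs. The names toy_block_compute, threaded_map and sum_partials are hypothetical, and partial matrices are modelled as plain dicts rather than skbio DistanceMatrix objects.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_block_compute(pairs, values):
    """Hypothetical per-block worker: distance for each (i, j) pair in one block.

    Stands in for the real block computation; the "metric" here is |x - y|.
    """
    return {(i, j): abs(values[i] - values[j]) for i, j in pairs}


def threaded_map(func, kwargs_iter, max_workers=4):
    """Hypothetical map_f: call func(**kw) for every kwargs dict, in threads.

    Executor.submit forwards keyword arguments, so each block's kwargs
    dict can be unpacked directly. Results keep submission order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(func, **kw) for kw in kwargs_iter]
        return [f.result() for f in futures]


def sum_partials(partials):
    """Hypothetical reduce_f: merge partial results into one full matrix.

    Assumes every unordered pair appears in exactly one partial result and
    that every sample index occurs in at least one pair, so the matrix size
    can be inferred from the largest index seen.
    """
    parts = list(partials)
    n = 1 + max(max(i, j) for part in parts for (i, j) in part)
    mat = [[0.0] * n for _ in range(n)]
    for part in parts:
        for (i, j), d in part.items():
            # Mirror each distance to keep the matrix symmetric and hollow.
            mat[i][j] = d
            mat[j][i] = d
    return mat


values = [0.0, 3.0, 7.0, 10.0]
# Two "blocks" that together cover every pair i < j exactly once.
block_kwargs = [
    {"pairs": [(0, 1), (0, 2), (1, 2)], "values": values},
    {"pairs": [(0, 3), (1, 3), (2, 3)], "values": values},
]
matrix = sum_partials(threaded_map(toy_block_compute, block_kwargs))
print(matrix[0])  # [0.0, 3.0, 7.0, 10.0]
```

A wrapper like threaded_map forwards keyword arguments explicitly, which is the capability the NOTE about ipyparallel's map_async says is required.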

Returns

A distance matrix relating all samples represented by counts to each other.

Return type

DistanceMatrix

Notes

This method is designed to facilitate computing beta diversity in parallel. In general, if you are processing a few hundred samples or fewer, skbio.diversity.beta_diversity will likely be faster. The original need that motivated this method was processing the Earth Microbiome Project [1] dataset, which at the time spanned over 25,000 samples and 7.5 million open-reference OTUs.
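
The idea of block decomposition can be sketched in plain Python. This is a simplified, serial illustration under our own assumptions, not scikit-bio's code: samples are split into consecutive blocks of at most k, only blocks on or above the diagonal are visited because the distance matrix is symmetric, and each block could in principle be handed to a separate worker. The helper names and the euclidean metric are hypothetical.

```python
import math


def euclidean(u, v):
    """Hypothetical callable metric on two count vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def block_ranges(n, k):
    """Split sample indices 0..n-1 into consecutive blocks of at most k."""
    return [range(start, min(start + k, n)) for start in range(0, n, k)]


def upper_block_pairs(n, k):
    """Yield (row_block, col_block) with row block index <= col block index.

    Blocks below the diagonal are skipped because d(i, j) == d(j, i).
    """
    blocks = block_ranges(n, k)
    for a in range(len(blocks)):
        for b in range(a, len(blocks)):
            yield blocks[a], blocks[b]


def block_distances(counts, metric, k):
    """Compute a full distance matrix one block at a time (serially here)."""
    n = len(counts)
    mat = [[0.0] * n for _ in range(n)]
    for rows, cols in upper_block_pairs(n, k):
        # Each block is independent, so this loop body could run in parallel.
        for i in rows:
            for j in cols:
                if i < j:  # skip the diagonal and the lower triangle
                    d = metric(counts[i], counts[j])
                    mat[i][j] = mat[j][i] = d
    return mat


counts = [[1, 0, 2], [0, 3, 1], [4, 1, 0], [2, 2, 2], [0, 0, 5]]
blocked = block_distances(counts, euclidean, k=2)
direct = [[euclidean(u, v) for v in counts] for u in counts]
n_blocks = len(list(upper_block_pairs(len(counts), 2)))
print(n_blocks)  # 6
```

With 5 samples and k=2 there are 3 blocks and therefore 6 block pairs on or above the diagonal. For small inputs this bookkeeping only adds overhead, which is consistent with the advice that beta_diversity is likely faster for a few hundred samples or fewer.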

References

[1] http://www.earthmicrobiome.org/