skbio.tree.majority_rule

skbio.tree.majority_rule(trees, weights=None, cutoff=0.5, support_attr='support')[source]

Determines consensus trees from a list of rooted trees

State: Experimental as of 0.4.0.

Parameters:

trees : list of TreeNode

The trees to operate on

weights : list or np.array of {int, float}, optional

If provided, the list must be in index order with trees. Each tree will receive the corresponding weight. If omitted, all trees will be equally weighted.

cutoff : float, 0.0 <= cutoff <= 1.0

Any clade that has <= cutoff support will be dropped. If cutoff is < 0.5, then it is possible that ties will result. If so, ties are broken arbitrarily depending on list sort order.

support_attr : str

The attribute to be decorated onto the resulting trees that contain the consensus support.

Returns:

list of TreeNode

Multiple trees can be returned in the case of two or more disjoint sets of tips represented on input.

Notes

This code was adapted from PyCogent’s majority consensus code originally written by Matthew Wakefield. The method is based off the original description of consensus trees in [R264]. An additional description can be found in the Phylip manual [R265]. This method does not support majority rule extended.

Support is computed as a weighted average of the tree weights in which the clade was observed in. For instance, if {A, B, C} was observed in 5 trees all with a weight of 1, its support would then be 5.

References

[R264](1, 2) Margush T, McMorris FR. (1981) “Consensus n-trees.” Bulletin for Mathematical Biology 43(2) 239-44.
[R265](1, 2) http://evolution.genetics.washington.edu/phylip/doc/consense.html

Examples

Computing the majority consensus, using the example from the Phylip manual with the exception that we are computing majority rule and not majority rule extended.

>>> from skbio.tree import TreeNode
>>> from io import StringIO
>>> trees = [
... TreeNode.read(StringIO("(A,(B,(H,(D,(J,(((G,E),(F,I)),C))))));")),
... TreeNode.read(StringIO("(A,(B,(D,((J,H),(((G,E),(F,I)),C)))));")),
... TreeNode.read(StringIO("(A,(B,(D,(H,(J,(((G,E),(F,I)),C))))));")),
... TreeNode.read(StringIO("(A,(B,(E,(G,((F,I),((J,(H,D)),C))))));")),
... TreeNode.read(StringIO("(A,(B,(E,(G,((F,I),(((J,H),D),C))))));")),
... TreeNode.read(StringIO("(A,(B,(E,((F,I),(G,((J,(H,D)),C))))));")),
... TreeNode.read(StringIO("(A,(B,(E,((F,I),(G,(((J,H),D),C))))));")),
... TreeNode.read(StringIO("(A,(B,(E,((G,(F,I)),((J,(H,D)),C)))));")),
... TreeNode.read(StringIO("(A,(B,(E,((G,(F,I)),(((J,H),D),C)))));"))]
>>> consensus = majority_rule(trees, cutoff=0.5)[0]
>>> for node in sorted(consensus.non_tips(),
...                    key=lambda k: k.count(tips=True)):
...     support_value = node.support
...     names = ' '.join(sorted(n.name for n in node.tips()))
...     print("Tips: %s, support: %s" % (names, support_value))
Tips: F I, support: 9.0
Tips: D H J, support: 6.0
Tips: C D H J, support: 6.0
Tips: C D F G H I J, support: 6.0
Tips: C D E F G H I J, support: 9.0
Tips: B C D E F G H I J, support: 9.0

In the next example, multiple trees will be returned which can happen if clades are not well supported across the trees. In addition, this can arise if not all tips are present across all trees.

>>> trees = [
...     TreeNode.read(StringIO("((a,b),(c,d),(e,f));")),
...     TreeNode.read(StringIO("(a,(c,d),b,(e,f));")),
...     TreeNode.read(StringIO("((c,d),(e,f),b);")),
...     TreeNode.read(StringIO("(a,(c,d),(e,f));"))]
>>> consensus_trees = majority_rule(trees)
>>> len(consensus_trees)
4