Blog

MDAnalysis is a NumFOCUS affiliated project

NumFOCUS Foundation

We are glad to announce that, as of February 2017, MDAnalysis is officially a NumFOCUS affiliated project. With this affiliation, MDAnalysis establishes itself as part of the wider scientific Python ecosystem, and we hope it will open up new opportunities in the future.

NumFOCUS is a 501(c)(3) nonprofit that supports and promotes world-class, innovative, open source scientific computing, reproducible research, and education in data science.

@MDAnalysis/coredevs

Our Code of Conduct

Dear MDAnalysis Community,

We feel the MDAnalysis community is a happy one, and we hope all members feel the same. As the community grows, we want it to stay as welcoming as it is today. In that spirit, MDAnalysis has adopted a Code of Conduct (CoC) to make explicit that we as a community value all our present and future members and embrace our diversity. We are committed to providing a productive, harassment-free environment for everyone.

Please read the full text at http://mdanalysis.org/pages/conduct/.

All core developers absolutely stand behind the CoC and are fully committed to ensuring and enforcing that everyone (founders, developers, anyone communicating on the lists or on GitHub, …) respects each other as outlined in the CoC.

But it’s not only the core developers who are responsible for making ours an inclusive community in which everyone feels they can productively participate. It needs to be all of us. A community is based on the interactions of individuals, and the sum of their values defines the community. The CoC means that everyone in MDAnalysis agrees to share a common set of core values and, importantly, to act according to these values in all interactions in our community.

If you feel you have suffered harassment or intimidation, or were made to feel uncomfortable, then please speak up: the CoC has links for communicating safely with a subset of the core developers who have taken on the responsibility of following up on CoC violations. If you observe that someone else is being harassed, please speak up – during the incident, if possible, or through the channels above.

Establishing the CoC was not prompted by any incident. Rather, the core developers recognized that making the rules and values of our community explicit is simply a good idea, and in this we follow many of the major open source projects such as Django and Jupyter (from which we heavily borrowed – thank you!).

@MDAnalysis/coredevs

ENCORE ensemble similarity

We are happy to announce that the ENCORE ensemble similarity library has been integrated into the next version of MDAnalysis as MDAnalysis.analysis.encore.

ENCORE implements a variety of techniques for calculating similarity measures between structural ensembles in the form of trajectories, as described in:

Tiberti M, Papaleo E, Bengtsen T, Boomsma W, Lindorff-Larsen K (2015), ENCORE: Software for Quantitative Ensemble Comparison. PLoS Comput Biol 11(10): e1004415. doi:10.1371/journal.pcbi.1004415.

The similarity measures are all based on the same fundamental principle: estimating the probability density of the conformational states of proteins from the available ensemble data and comparing these densities using measures of distance between probability distributions, such as the Jensen-Shannon divergence. ENCORE implements three similarity measures: HES (harmonic ensemble similarity), CES (clustering ensemble similarity) and DRES (dimensionality reduction ensemble similarity). In HES, the structures of the ensembles are treated as samples from a multivariate normal distribution whose parameters are estimated from the available data. CES partitions the conformational space of all the ensembles into clusters and uses the relative occurrence of the ensembles in the clusters to estimate the probability density. DRES performs a kernel-density estimate on a dimensionally reduced version of the conformational space.
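To make this principle concrete, the following minimal sketch computes the Jensen-Shannon divergence between two discretized probability distributions (this helper is purely illustrative and is not part of ENCORE; it assumes SciPy is available):

>>> import numpy as np
>>> from scipy.stats import entropy  # entropy(p, q) is the Kullback-Leibler divergence
>>> def js_divergence(p, q):
...     """Jensen-Shannon divergence between discrete distributions p and q."""
...     m = 0.5 * (p + q)
...     return 0.5 * entropy(p, m) + 0.5 * entropy(q, m)
...
>>> p = np.array([0.1, 0.4, 0.5])
>>> q = np.array([0.3, 0.3, 0.4])
>>> jsd = js_divergence(p, q)  # 0 for identical distributions, at most ln(2)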

The ENCORE package implements the similarity measures themselves together with a number of other algorithms and features, which are also available standalone. In particular, ENCORE provides:

  • the three similarity measures, HES, CES and DRES, each accessible with a single function call
  • trajectory convergence estimation based on CES and DRES, also accessible with a single function call (see the sketch after this list)
  • covariance estimators (maximum likelihood and shrinkage), the Affinity Propagation clustering algorithm, and the stochastic proximity embedding dimensionality reduction method
  • plug-in interfaces that allow clustering and dimensionality reduction algorithms from the scikit-learn package to be used instead of our own implementations, which makes it easy to switch between algorithms when using the ces and dres functions
  • a parallel multi-core RMSD matrix calculator
  • tools to estimate the robustness of the methods via bootstrapping of the ensembles or similarity matrices. This can be useful to estimate the error associated with the size of the ensembles used as well as to assess the robustness of the implemented algorithms.
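For example, the convergence estimators mentioned above can be applied to a single trajectory, comparing windows of the trajectory against the full ensemble; a minimal sketch, assuming the encore.ces_convergence() function with a window size given in frames (check the encore documentation for the exact signature):

>>> from MDAnalysis import Universe
>>> import MDAnalysis.analysis.encore as encore
>>> from MDAnalysis.tests.datafiles import PSF, DCD
>>> u = Universe(PSF, DCD)
>>> # similarity of trajectory windows to the full trajectory;
>>> # values approaching zero indicate convergence
>>> ces_conv = encore.ces_convergence(u, 10)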

Details on implementation, use cases, and expected performance can be found in the paper cited above (doi:10.1371/journal.pcbi.1004415).

The HES method is the fastest but least general of the three, as its performance depends on how well the probability distribution underlying the ensembles can be modeled as a simple multivariate normal, which is not necessarily guaranteed for simulation trajectories. CES and DRES do not rely on this assumption; however, they both require the calculation of a full RMSD matrix over all the ensembles to be compared, as well as clustering or dimensionality reduction, respectively, on the conformational space, and thus have higher computation time and memory requirements.

Using the similarity measures is simply a matter of loading the trajectories or experimental ensembles that one would like to compare as MDAnalysis.Universe objects:

>>> from MDAnalysis import Universe
>>> import MDAnalysis.analysis.encore as encore
>>> from MDAnalysis.tests.datafiles import PSF, DCD, DCD2
>>> u1 = Universe(PSF, DCD)
>>> u2 = Universe(PSF, DCD2)

and then running the similarity measures on them, for instance the Harmonic Ensemble Similarity measure (encore.hes()):

>>> hes_similarities, details = encore.hes([u1, u2])
>>> print(hes_similarities)
[[        0.         38279683.9587939]
 [ 38279683.9587939         0.       ]]

Similarities are returned as a square symmetric matrix with the same dimensions and ordering as the input list; each element is the similarity value for a pair of the input ensembles. Other available measures are CES (encore.ces()) and DRES (encore.dres()). The details variable contains extra information about the calculation that was performed: with HES, it contains the parameters of the estimated probability distributions; with CES, it contains the output of the clustering; with DRES, it contains the embedded space.
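The other two measures follow the same calling convention, returning a similarity matrix and a details object; for instance:

>>> ces_similarities, ces_details = encore.ces([u1, u2])
>>> dres_similarities, dres_details = encore.dres([u1, u2])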

The clustering and dimensionality reduction functionality is also directly available through the encore.cluster() and encore.reduce_dimensionality() functions.

For instance, to cluster the conformations from the two universes defined above, we can write:

>>> cluster_collection = encore.cluster([u1, u2])
>>> print(cluster_collection)
0 (size:5,centroid:1): array([ 0,  1,  2,  3, 98])
1 (size:5,centroid:6): array([4, 5, 6, 7, 8])
2 (size:7,centroid:12): array([ 9, 10, 11, 12, 13, 14, 15])

Here each cluster element is a conformation belonging to one of the ensembles; the cluster_collection object keeps track, for each element, of both the standard cluster membership information and the ensemble it belongs to, making it possible to evaluate how the different trajectories are represented in each cluster.
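As a minimal sketch of how this information can be inspected (iterating over the collection and the id, size, and elements attributes are assumptions based on the printed representation above, not a confirmed API):

>>> for cluster in cluster_collection:
...     # numeric cluster id, number of members, and the indices of the
...     # member conformations across the concatenated input ensembles
...     print(cluster.id, cluster.size, cluster.elements)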

By default ENCORE uses its own implementation of the Affinity Propagation algorithm, but this can be changed to any of the clustering algorithms available in scikit-learn, which are automatically made available in ENCORE if scikit-learn is installed.

For instance:

>>> cluster_collection = encore.cluster(
...     [u1, u2],
...     method=encore.DBSCAN())

In the same way, it is possible to use a dimensionality reduction algorithm other than the default stochastic proximity embedding:

>>> coordinates, details = encore.reduce_dimensionality(
...     [u1, u2],
...     method=encore.PrincipalComponentAnalysis(dimension=2))

Similar options in encore.ces() and encore.dres() make it easy to change the algorithms used by these methods on the fly.
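A minimal sketch of what this looks like (the clustering_method and dimensionality_reduction_method keyword names are assumptions based on the pattern above; check the function documentation for the exact signatures):

>>> ces_similarities, details = encore.ces(
...     [u1, u2],
...     clustering_method=encore.DBSCAN())
>>> dres_similarities, details = encore.dres(
...     [u1, u2],
...     dimensionality_reduction_method=encore.PrincipalComponentAnalysis(dimension=3))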

For further details, see the documentation of the individual functions within ENCORE.

@mtiberti & @kain88-de