01 Mar 2017
We are glad to announce that, since February 2017, MDAnalysis is officially
a NumFOCUS affiliated
project. With this affiliation, MDAnalysis establishes itself as part of the
wider scientific python ecosystem, and we hope it will open up new opportunities
in the future.
NumFOCUS is a 501(c)(3) nonprofit that supports and
promotes world-class, innovative, open source scientific computing, reproducible
research, and education in data science.
— @MDAnalysis/coredevs
09 Feb 2017
Dear MDAnalysis Community,
We feel the MDAnalysis community is a happy one, and we hope all members
feel the same. As the community grows we want it to stay as welcoming.
In that spirit, MDAnalysis has adopted a
Code of Conduct (CoC) to make
explicit that we as a community value all our present and future
members and embrace our diversity. We are committed to providing a
productive, harassment-free environment for everyone
Please read the full text at
http://mdanalysis.org/pages/conduct/.
All core developers absolutely stand behind the CoC and are fully
committed to ensuring and enforcing that everyone (founders,
developers, anyone communicating on the lists or on GitHub, …)
respects each other as outlined in the CoC.
But it’s not only core developers who are responsible for making our’s
an inclusive community in which everyone feels they can productively
participate. It actually need to be all of us. A community is based on
the interactions of individuals and the sum of their values defines
their community. The CoC means that everyone in MDAnalysis agrees to
share a common set of core values and, importantly, to act according
to these values in all interactions in our community.
If you feel you suffered from harassment or intimidation or were made
feel uncomfortable then please speak up: the CoC has links to
communicate safely with a
subset of the core developers who have taken on the responsibility to
follow up on CoC violations. If you observe that someone else is being
harassed please speak up – during the incident, if possible, or
through the above channels.
Establishing the CoC was not prompted by any incident. Rather, the
core developers recognized that making explicit the rules and values
of our community is simply a good idea, and we follow many of the
major open source projects such as Django and Jupyter (from which we
heavily borrowed –
thank you!).
— @MDAnalysis/coredevs
17 Dec 2016
We are happy to announce that the ENCORE ensemble similarity library has
been integrated in the next version of MDAnalysis as
MDAnalysis.analysis.encore.
ENCORE implements a variety of techniques for calculating similarity measures
between structural ensembles in the form of trajectories, as described in:
Tiberti M, Papaleo E, Bengtsen T, Boomsma W, Lindorff-Larsen K (2015), ENCORE:
Software for Quantitative Ensemble Comparison. PLoS Comput Biol 11(10):
e1004415.
doi:10.1371/journal.pcbi.1004415
.
The similarity measures are based on the same fundamental principle, i.e.
estimating the probability density of conformational states of proteins from
the available ensemble data and comparing such densities using measures of
distance between probability distributions, such as the Jensen-Shannon
divergence. ENCORE implements three similarity measures: HES (Harmonic
ensemble similarity), CES (Clustering ensemble similarity) and DRES (
Dimensionality reduction ensemble similarity). In HES the structures of the
ensembles are seen as samples from a multivariate normal distribution, whose
parameters are estimated based on the available data. CES partitions the
conformational space of all the ensembles in clusters and uses the relative
occurrence of the ensembles in the clusters to estimate the probability
density. DRES uses a kernel-density estimate from the ensembles, which is run
on a dimensionally-reduced version of the conformational space.
The ENCORE package implements the similarity measures themselves together with
a number of other algorithms and features, also available standalone. ENCORE
implements:
- the three similarity measures, HES, CES and DRES, each accessible with a
single function call
- trajectory convergence estimation based on CES and DRES, also accessible
with a single function call
- the Covariance estimators (Maximum likelihood and Shrinkage), Affinity
Propagation clustering algorithm and the Stochastic proximity embedding
dimensionality reduction method
- plug-in interfaces that support the use of other clustering or
dimensionality reduction algorithms from the scikit-learn package to be used instead of our own implementations,
which makes it easy to switch between clustering and dimensionality reduction
algorithms when using the
ces
and dres
functions.
- a parallel multi-core RMSD matrix calculator
- tools to estimate the robustness of the methods via bootstrapping of the
ensembles or similarity matrices. This can be useful to estimate the error
associated with the size of the ensembles used as well as to assess the
robustness of the implemented algorithms.
Details on implementation, use-cases and expected performance
can be found in 10.1371/journal.pcbi.1004415.
The HES method is the fastest and least general of the three, as its
performance depends on how well the probability of distribution underlying
the ensembles can be modeled as a simple multivariate normal, which is not
necessarily guaranteed for simulation trajectories. CES and DRES don’t rely on
this assumption, however they both require the calculation of a full RMSD
matrix for all the ensembles to be compared as well as clustering or
dimensionality reduction, respectively, on the conformational space, and have
thus higher requirements in terms of computation time and memory.
Using the similarity measures is simply a matter of loading the trajectories
or experimental ensembles that one would like to compare as
MDAnalysis.Universe objects:
>>> from MDAnalysis import Universe
>>> import MDAnalysis.analysis.encore as encore
>>> from MDAnalysis.tests.datafiles import PSF, DCD, DCD2
>>> u1 = Universe(PSF, DCD)
>>> u2 = Universe(PSF, DCD2)
and running the similarity measures on them, as for instance using the
Harmonic Ensemble Similarity measure (encore.hes()):
>>> hes_similarities, details = encore.hes([u1, u2])
>>> print hes_similarities
[[ 0. 38279683.9587939]
[ 38279683.9587939 0. ]]
Similarities are written in a square symmetric matrix having the same
dimensions and ordering as the input list, with each element being the
similarity value for a pair of the input ensembles. Other available measures
are CES (encore.ces())
and DRES (encore.dres()).
The details
variable contains extra information about the calculation that
has been performed: with HES, it contains the parameters of the estimated
probability distributions; with CES, it contains the output of clustering;
with DRES, it contains the embedded space.
The clustering and dimensionality reduction functionality is also directly
available through the cluster
and reduce_dimensionality
functions.
For instance, to cluster the conformations
from the two universes defined above, we can write:
>>> cluster_collection = encore.cluster([u1,u2])
>>> print cluster_collection
0 (size:5,centroid:1): array([ 0, 1, 2, 3, 98])
1 (size:5,centroid:6): array([4, 5, 6, 7, 8])
2 (size:7,centroid:12): array([ 9, 10, 11, 12, 13, 14, 15])
…
Here each cluster element is a conformation belonging to an ensemble; the cluster_collection
object keeps track, for each element, both of the standard cluster membership information and of the ensemble it belongs to, making it possible to evaluate how the different trajectories are represented in each cluster.
By default ENCORE uses our implementation of the Affinity Propagation
algorithm, but that can be changed as desired by the user to the others
available in scikit-learn, which are automatically loaded into ENCORE if
available.
For instance:
>>> cluster_collection =
encore.cluster(
[ens1,ens2],
method=encore.DBSCAN())
in the same way, it is possible use dimensionality reduction algorithm other
than the default Stochastic proximity embedding:
>>> coordinates, details =
encore.reduce_dimensionality(
[ens1,ens2],
method=encore.PrincipalComponentAnalysis(dimension=2))
Similar options in encore.ces()
and encore.dres() make it easy to change the algorithm that will be used by the methods on the
fly.
For further details, see the documentation of the individual functions within
ENCORE:
@mtiberti & @kain88-de