09 Feb 2017
Dear MDAnalysis Community,
We feel the MDAnalysis community is a happy one, and we hope all members
feel the same. As the community grows, we want it to stay just as welcoming.
In that spirit, MDAnalysis has adopted a Code of Conduct (CoC) to make
explicit that we as a community value all our present and future
members and embrace our diversity. We are committed to providing a
productive, harassment-free environment for everyone.
Please read the full text at
http://mdanalysis.org/pages/conduct/.
All core developers stand firmly behind the CoC and are fully
committed to ensuring and enforcing that everyone (founders,
developers, anyone communicating on the lists or on GitHub, …)
treats others with respect, as outlined in the CoC.
But it’s not only core developers who are responsible for making ours
an inclusive community in which everyone feels they can productively
participate. It needs to be all of us. A community is built on
the interactions of individuals, and the sum of their values defines
the community. The CoC means that everyone in MDAnalysis agrees to
share a common set of core values and, importantly, to act according
to these values in all interactions in our community.
If you feel you have suffered harassment or intimidation, or were made
to feel uncomfortable, please speak up: the CoC has links to
communicate safely with a
subset of the core developers who have taken on the responsibility to
follow up on CoC violations. If you observe that someone else is being
harassed, please speak up, during the incident if possible, or
through the above channels.
Establishing the CoC was not prompted by any incident. Rather, the
core developers recognized that making the rules and values of our
community explicit is simply a good idea, and in doing so we follow many
major open source projects such as Django and Jupyter (from which we
borrowed heavily; thank you!).
— @MDAnalysis/coredevs
17 Dec 2016
We are happy to announce that the ENCORE ensemble similarity library has
been integrated into the next version of MDAnalysis as
MDAnalysis.analysis.encore.
ENCORE implements a variety of techniques for calculating similarity measures
between structural ensembles in the form of trajectories, as described in:
Tiberti M, Papaleo E, Bengtsen T, Boomsma W, Lindorff-Larsen K (2015). ENCORE:
Software for Quantitative Ensemble Comparison. PLoS Comput Biol 11(10):
e1004415. doi:10.1371/journal.pcbi.1004415.
The similarity measures all rest on the same fundamental principle:
estimating the probability density of the conformational states of a protein from
the available ensemble data, and comparing such densities using measures of
distance between probability distributions, such as the Jensen-Shannon
divergence. ENCORE implements three similarity measures: HES (Harmonic
ensemble similarity), CES (Clustering ensemble similarity) and DRES
(Dimensionality reduction ensemble similarity). In HES, the structures of the
ensembles are treated as samples from a multivariate normal distribution whose
parameters are estimated from the available data. CES partitions the
conformational space of all the ensembles into clusters and uses the relative
occurrence of each ensemble in the clusters to estimate the probability
density. DRES computes a kernel density estimate for each ensemble on a
dimensionality-reduced version of the conformational space.
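To make the shared principle concrete, here is a minimal, standalone sketch of the Jensen-Shannon divergence between two discrete distributions, for example the fraction of each ensemble's frames that falls into each cluster in CES. The function names are our own, and this is not ENCORE's implementation; it uses the natural logarithm, so values lie between 0 (identical distributions) and ln 2 (no overlap at all).

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q), in nats, for discrete distributions."""
    # Terms with p_i == 0 contribute nothing by convention (0 * log 0 = 0).
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Symmetric, bounded divergence: 0 <= JSD(p, q) <= ln 2."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    # Mixture distribution; m_i > 0 wherever p_i > 0 or q_i > 0, so both KL terms are finite.
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical cluster occupancies of two ensembles over three clusters
p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]
print(jensen_shannon_divergence(p, q))  # 0.5 * ln 2, about 0.347
```

Using the mixture m, rather than comparing p and q directly with the Kullback-Leibler divergence, is what keeps the result finite when one ensemble visits states the other never does.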
Besides the similarity measures themselves, the ENCORE package provides a
number of other algorithms and features that are also available standalone.
ENCORE implements:
- the three similarity measures, HES, CES and DRES, each accessible with a
single function call
- trajectory convergence estimation based on CES and DRES, also accessible
with a single function call
- covariance estimators (maximum likelihood and shrinkage), the Affinity
Propagation clustering algorithm and the Stochastic Proximity Embedding
dimensionality reduction method
- plug-in interfaces that allow clustering or dimensionality reduction
algorithms from the scikit-learn package to be used instead of our own
implementations, which makes it easy to switch algorithms when using the
ces and dres functions.
- a parallel multi-core RMSD matrix calculator
- tools to estimate the robustness of the methods via bootstrapping of the
ensembles or similarity matrices. This can be useful to estimate the error
associated with the size of the ensembles used as well as to assess the
robustness of the implemented algorithms.
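The RMSD matrix listed above is the most expensive ingredient of CES and DRES. As an illustration of what it contains, here is a minimal serial sketch that computes the pairwise RMSD between all frames of a coordinate array. It is an assumption-laden simplification, not ENCORE's parallel calculator: it performs no structural superposition, runs on a single core, and the function name is hypothetical.

```python
import numpy as np


def rmsd_matrix(coords):
    """Pairwise RMSD between all frames, WITHOUT structural superposition.

    coords: array-like of shape (n_frames, n_atoms, 3).
    Returns a symmetric (n_frames, n_frames) array with zeros on the diagonal.
    """
    coords = np.asarray(coords, dtype=float)
    n_frames, n_atoms, _ = coords.shape
    rmsd = np.zeros((n_frames, n_frames))
    for i in range(n_frames):
        # Deviations of every later frame from frame i: shape (n_frames - i - 1, n_atoms, 3)
        d = coords[i + 1:] - coords[i]
        # Mean squared displacement per atom, then square root
        values = np.sqrt((d ** 2).sum(axis=(1, 2)) / n_atoms)
        # Fill the upper and lower triangles with the same values
        rmsd[i, i + 1:] = values
        rmsd[i + 1:, i] = values
    return rmsd
```

Only the upper triangle is actually computed, since RMSD is symmetric; the work still grows quadratically with the total number of frames, which is why this step dominates the cost of CES and DRES. Coordinates for a Universe could be collected, for instance, by stacking copies of `u1.atoms.positions` while iterating over `u1.trajectory`.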
Details on the implementation, use cases and expected performance
can be found in the ENCORE paper (doi:10.1371/journal.pcbi.1004415).
HES is the fastest and least general of the three methods, since its
accuracy depends on how well the probability distribution underlying
the ensembles can be modeled as a simple multivariate normal, which is not
necessarily the case for simulation trajectories. CES and DRES don’t rely on
this assumption; however, they both require the calculation of a full RMSD
matrix for all the ensembles to be compared, as well as clustering or
dimensionality reduction, respectively, of the conformational space, and thus
have higher requirements in terms of computation time and memory.
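To show why HES is cheap once each ensemble is reduced to a multivariate normal, here is a minimal sketch. It assumes that HES compares the two fitted Gaussians with the symmetrized Kullback-Leibler divergence (our reading of the ENCORE paper; check the paper for the exact definition), and it uses a plain maximum-likelihood covariance. The function names are hypothetical, and frames are not superimposed here.

```python
import numpy as np


def gaussian_from_frames(frames):
    """Maximum-likelihood mean and covariance of flattened frames.

    frames: array-like of shape (n_frames, n_features), e.g. n_features = 3 * n_atoms.
    """
    frames = np.asarray(frames, dtype=float)
    mean = frames.mean(axis=0)
    # bias=True divides by n (maximum likelihood) instead of n - 1
    cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
    return mean, cov


def harmonic_similarity(mean1, cov1, mean2, cov2):
    """Symmetrized KL divergence between N(mean1, cov1) and N(mean2, cov2).

    Equals (D(1 || 2) + D(2 || 1)) / 2; the log-determinant terms cancel.
    """
    mean1, mean2 = np.asarray(mean1, dtype=float), np.asarray(mean2, dtype=float)
    cov1 = np.atleast_2d(np.asarray(cov1, dtype=float))
    cov2 = np.atleast_2d(np.asarray(cov2, dtype=float))
    inv1, inv2 = np.linalg.inv(cov1), np.linalg.inv(cov2)
    d = mean1 - mean2
    k = cov1.shape[0]
    mean_term = d @ (inv1 + inv2) @ d
    cov_term = np.trace(inv1 @ cov2 + inv2 @ cov1 - 2.0 * np.eye(k))
    return 0.25 * (mean_term + cov_term)


# Two 2-D Gaussians with identity covariance whose means are 2 apart
print(harmonic_similarity([0.0, 0.0], np.eye(2), [2.0, 0.0], np.eye(2)))  # 2.0
```

No RMSD matrix, clustering or embedding is needed, only one mean and one covariance per ensemble. The catch is that a maximum-likelihood covariance becomes singular when an ensemble has fewer frames than coordinates, which is where a shrinkage estimator, as listed above, can help.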
Using the similarity measures is simply a matter of loading the trajectories
or experimental ensembles that one would like to compare as
MDAnalysis.Universe objects:
```python
>>> from MDAnalysis import Universe
>>> import MDAnalysis.analysis.encore as encore
>>> from MDAnalysis.tests.datafiles import PSF, DCD, DCD2
>>> u1 = Universe(PSF, DCD)
>>> u2 = Universe(PSF, DCD2)
```
and then running a similarity measure on them, for instance the
Harmonic Ensemble Similarity measure (encore.hes()):
```python
>>> hes_similarities, details = encore.hes([u1, u2])
>>> print(hes_similarities)
[[ 0. 38279683.9587939]
[ 38279683.9587939 0. ]]
```
Similarities are written in a square symmetric matrix having the same
dimensions and ordering as the input list, with each element being the
similarity value for a pair of the input ensembles. Other available measures
are CES (encore.ces())
and DRES (encore.dres()).
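The layout of that output matrix can be illustrated with a small, generic sketch: entry `[i][j]` holds the value for the pair made of the i-th and j-th input ensembles, in input order. The builder below is hypothetical and takes any symmetric pairwise function; it assumes, as in the HES output above, that an ensemble compared with itself gives 0.

```python
def similarity_matrix(ensembles, pairwise):
    """Square symmetric matrix with entry [i][j] = pairwise(ensembles[i], ensembles[j]).

    The diagonal is left at 0.0, i.e. each ensemble is assumed identical to itself.
    """
    n = len(ensembles)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            value = pairwise(ensembles[i], ensembles[j])
            # The measure is symmetric, so (i, j) and (j, i) share one computation
            matrix[i][j] = matrix[j][i] = value
    return matrix


# Toy stand-ins for ensembles: plain numbers compared by absolute difference
print(similarity_matrix([1.0, 4.0, 6.0], lambda a, b: abs(a - b)))
# -> [[0.0, 3.0, 5.0], [3.0, 0.0, 2.0], [5.0, 2.0, 0.0]]
```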
The details
variable contains extra information about the calculation that
has been performed: with HES, it contains the parameters of the estimated
probability distributions; with CES, it contains the output of clustering;
with DRES, it contains the embedded space.
The clustering and dimensionality reduction functionality is also directly
available through the cluster
and reduce_dimensionality
functions.
For instance, to cluster the conformations
from the two universes defined above, we can write:
```python
>>> cluster_collection = encore.cluster([u1, u2])
>>> print(cluster_collection)
0 (size:5,centroid:1): array([ 0, 1, 2, 3, 98])
1 (size:5,centroid:6): array([4, 5, 6, 7, 8])
2 (size:7,centroid:12): array([ 9, 10, 11, 12, 13, 14, 15])
…
```
Here each cluster element is a conformation belonging to one of the ensembles; the cluster_collection
object keeps track of both the cluster membership of each element and the ensemble it belongs to, making it possible to evaluate how well each trajectory is represented in each cluster.
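Combining cluster membership with ensemble membership yields, for each ensemble, the relative occurrence of its conformations in each cluster, which is the quantity CES compares. The sketch below works on plain Python lists rather than on ENCORE's cluster objects; the function name and input layout are hypothetical, and it assumes elements are numbered consecutively across the input ensembles.

```python
def cluster_occupancies(clusters, ensemble_of_element, n_ensembles):
    """For each ensemble, the fraction of its elements that falls in each cluster.

    clusters: list of lists of element indices (one list per cluster).
    ensemble_of_element: sequence mapping element index -> ensemble index.
    Returns one probability vector (over clusters) per ensemble.
    """
    counts = [[0] * len(clusters) for _ in range(n_ensembles)]
    for k, members in enumerate(clusters):
        for element in members:
            counts[ensemble_of_element[element]][k] += 1
    occupancies = []
    for row in counts:
        total = sum(row)
        # float() keeps true division even under Python 2
        occupancies.append([c / float(total) if total else 0.0 for c in row])
    return occupancies


# Hypothetical example: 4 frames from ensemble 0 followed by 2 frames from ensemble 1
ensemble_of_element = [0] * 4 + [1] * 2
clusters = [[0, 1, 2, 4], [3, 5]]
print(cluster_occupancies(clusters, ensemble_of_element, 2))
# -> [[0.75, 0.25], [0.5, 0.5]]
```

Each row sums to 1, so a pair of rows can be compared directly with a divergence such as Jensen-Shannon.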
By default, ENCORE uses our implementation of the Affinity Propagation
algorithm, but users can switch to any of the other clustering algorithms
available in scikit-learn, which are automatically loaded into ENCORE if
scikit-learn is installed.
For instance:
```python
>>> cluster_collection = encore.cluster(
...     [u1, u2],
...     method=encore.DBSCAN())
```
In the same way, it is possible to use a dimensionality reduction algorithm
other than the default Stochastic Proximity Embedding:
```python
>>> coordinates, details = encore.reduce_dimensionality(
...     [u1, u2],
...     method=encore.PrincipalComponentAnalysis(dimension=2))
```
Similar options in encore.ces() and encore.dres() make it easy to change,
on the fly, the algorithm these methods use.
For further details, see the documentation of the individual functions within
ENCORE.
— @mtiberti & @kain88-de
30 Nov 2016
The upcoming 0.16.0 release of MDAnalysis will have
support for MMTF!
MMTF (Macromolecular Transmission Format) is a new format designed to provide
compact, efficient and fast browsing of the Protein Data Bank.
Support for MMTF within MDAnalysis covers both reading locally stored MMTF files
and fetching files directly from the PDB archive with the new
MDAnalysis.fetch_mmtf
function.
```python
import MDAnalysis as mda

# Load a local file
u = mda.Universe('myfile.mmtf')
# Or download directly from the PDB by providing a PDB ID
u = mda.fetch_mmtf('3J3Q')
```
Loading MMTF files is a large improvement over loading traditional ASCII PDB files:
the 3J3Q system above, with approximately 2.4M atoms, loads in under 10 seconds.
Because the format is compressed and uses efficient algorithms to store the data,
downloading structures also requires much less bandwidth, making this practical
even on slow connections.
MMTF files can contain multiple models for a given structure, and these are made available
through the .models
attribute of an MDAnalysis Universe. This provides a list of AtomGroup
objects, each representing one model. Each model can have a different topology.
```python
from __future__ import print_function
import MDAnalysis as mda

u = mda.fetch_mmtf('4P3R')
print("This file has {} models".format(len(u.models)))

# Iterate over all models
for model in u.models:
    # analyse each model! Here we just report its number of atoms
    print(len(model))

# Select atoms in a given model (alpha carbons of model 4)
ag = u.select_atoms('model 4 and name CA')
```
Finally, MDAnalysis provides full interoperability between formats, so
MMTF files can be written out to any of our
supported formats.
For example, to download an MMTF file and write it out as a GROMACS GRO file:
```python
import MDAnalysis as mda

u = mda.fetch_mmtf('4AKE')
u.atoms.write('4ake.gro')
```
Trying this out today
These features will all be in the upcoming 0.16.0 release of MDAnalysis, but if you can’t wait for that,
it is also possible to install the latest development version.
For full instructions on how to install the development version, see our guide on
installing MDAnalysis from source.
— @richardjgowers