Support for MMTF has arrived in MDAnalysis!

Macromolecular Transmission Format support

The upcoming 0.16.0 release of MDAnalysis will have support for MMTF! MMTF is a new format designed to provide compact, efficient and fast browsing of the Protein Data Bank. Support for MMTF within MDAnalysis is offered through reading locally stored MMTF files, or through fetching the file directly from the PDB archive through the new MDAnalysis.fetch_mmtf function.

import MDAnalysis as mda

# Load a local file
u = mda.Universe('myfile.mmtf')

# Or download directly from the PDB by providing a PDB ID
u = mda.fetch_mmtf('3J3Q')

Loading MMTF files is much faster than loading traditional ASCII PDB files: the 3J3Q structure above, with approximately 2.4 million atoms, loads in under 10 seconds. Because the format is compressed and stores its data efficiently, downloading structures also requires much less bandwidth, which makes fetching them practical even on slow connections.

MMTF files can contain many different models of a given structure, and these are made available through the .models attribute of an MDAnalysis Universe. This attribute provides a list of AtomGroup objects, each representing a different model. Each model can have a different topology.

from __future__ import print_function
import MDAnalysis as mda

u = mda.fetch_mmtf('4P3R')

print("This file has {} models".format(len(u.models)))

# Iterate over all models
for model in u.models:
    # analyse each model! For example, report how many atoms it contains
    print(model.n_atoms)

# Select the alpha carbons (atom name CA; selections are case-sensitive) in a given model
ag = u.select_atoms('model 4 and name CA')

Finally, MDAnalysis provides full interoperability between formats, so MMTF files can be written out to any of our supported formats. For example, to download an MMTF file and write it out as a GROMACS GRO file:

import MDAnalysis as mda

u = mda.fetch_mmtf('4AKE')
# Write all atoms; the output format is chosen from the file extension
u.atoms.write('4AKE.gro')

Trying this out today

These features will all be in the upcoming 0.16.0 release of MDAnalysis, but if you can’t wait for that, it is also possible to install the latest development version. For full instructions on how to install the development version, see our guide on installing MDAnalysis from source.


Release 0.15.0

We have just released MDAnalysis version 0.15.0. This release contains new features as well as bug fixes. Highlights are listed below but for more details see the release notes.

We also had a lot of contributions from GSoC applicants. Thanks to our GSoC students @fiona-naughton and @jdetle as well as the other applicants @saxenauts, @endle, @abhinavgupta94 and @pedrishi.


You can upgrade with pip install --upgrade MDAnalysis

Noticeable Changes

Deprecations in anticipation of new topology system

The next release (0.16.0) will bring a very big change to the internal workings of MDAnalysis. The topology system (how atom, residue and segment attributes are stored and manipulated) has been completely redesigned, which brings many advantages and resolves a number of longstanding issues. The new system is also faster (up to 40x) and uses less memory (up to 60% less), and it makes the core of MDAnalysis easier to maintain in the future. You can read more about this at the original issue, or see a short summary of the new system on the wiki.

In preparation for this change, we have introduced deprecation warnings for all components of the existing topology system suggesting the corresponding usage under the new system. Please adjust existing code as you encounter these warnings.
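Python hides DeprecationWarning in many contexts by default, so some of these warnings may never reach your terminal. The following is a minimal, standard-library-only sketch for surfacing them while you port a script. `old_api` is a hypothetical stand-in for a deprecated call, not a real MDAnalysis function.

```python
import warnings

# Show every DeprecationWarning, including repeats, instead of relying on
# the default filters, which hide many of them.
warnings.simplefilter("always", DeprecationWarning)


def old_api():
    """Hypothetical stand-in for a deprecated call (not part of MDAnalysis)."""
    warnings.warn("old_api is deprecated; use new_api instead",
                  DeprecationWarning, stacklevel=2)
    return 42


# Record the warnings raised while running the code you are porting
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", DeprecationWarning)
    old_api()

for w in caught:
    print("{}:{}: {}".format(w.filename, w.lineno, w.message))
```

In a test suite, `warnings.simplefilter("error", DeprecationWarning)` turns each warning into an exception, so no deprecated call slips through unnoticed.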

Revamped Contact Analysis

Contact analysis has been completely rewritten in the new Contacts class. This class offers a standard native contact analysis as well as the contact analysis developed by Best & Hummer. A Q1-Q2 analysis is now available directly as q1q2. We have also made Contacts extensible, so you can pass it your own cut-off functions. In fact, the q1q2 analysis is just a wrapper around Contacts that makes use of this flexibility. More information can be found in the documentation.

The old ContactAnalysis1 and ContactAnalysis classes will be removed in the next release.
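To illustrate what a cut-off function computes, here is a small NumPy sketch of two ways to turn per-contact distances into a fraction of native contacts Q. The first is a hard cut-off, where a contact counts if it is at or below a cut-off radius. The second is a smooth switching function of the kind used by Best & Hummer, where each contact contributes 1 / (1 + exp(beta (r - lambda r0))). The function names, the 4.5 Å cut-off and the beta = 5 Å⁻¹, lambda = 1.8 defaults are assumptions made for this sketch. Functions of this shape are the sort of cut-off you could pass to Contacts, but check the Contacts documentation for the exact signature it expects.

```python
import numpy as np


def hard_cut_q(r, cutoff=4.5):
    """Fraction of native contacts whose current distance r (Å) is <= cutoff."""
    r = np.asarray(r, dtype=float)
    return float(np.mean(r <= cutoff))


def soft_cut_q(r, r0, beta=5.0, lambda_constant=1.8):
    """Smooth Q: each contact contributes a sigmoid between 0 and 1.

    r  : current distances of the native contacts (Å)
    r0 : the same contacts' distances in the reference (native) structure
    A contact contributes exactly 0.5 when r == lambda_constant * r0.
    """
    r = np.asarray(r, dtype=float)
    r0 = np.asarray(r0, dtype=float)
    return float(np.mean(1.0 / (1.0 + np.exp(beta * (r - lambda_constant * r0)))))


# One contact formed (3 Å), one broken (5 Å) with a 4.5 Å cut-off
print(hard_cut_q([3.0, 5.0]))
print(soft_cut_q([4.0, 20.0], [4.0, 4.0]))
```

The smooth version avoids the jumps in Q that occur when a distance fluctuates around a hard cut-off.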

RMSD Calculation

The rmsd function no longer superimposes the given coordinates by default. The coordinates are now compared as given, and you control centering and superposition with the new center and superposition keywords.
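To make these keyword semantics concrete, here is a small NumPy sketch of an RMSD function with the same `center` and `superposition` keywords. It is not MDAnalysis's own implementation. It assumes two `(n_atoms, 3)` arrays with matching atom order and uses the Kabsch algorithm for the optimal rotation. In this sketch, `superposition=True` also implies centering. The name `rmsd_sketch` is hypothetical.

```python
import numpy as np


def rmsd_sketch(a, b, center=False, superposition=False):
    """RMSD between two (n_atoms, 3) coordinate arrays.

    By default the coordinates are compared exactly as given.
    center=True removes the centroid of each set first;
    superposition=True also applies the optimal (Kabsch) rotation.
    The caller's arrays are never modified.
    """
    a = np.array(a, dtype=float)  # np.array copies the input
    b = np.array(b, dtype=float)
    if center or superposition:
        a -= a.mean(axis=0)
        b -= b.mean(axis=0)
    if superposition:
        # Kabsch: rotation R minimising sum_i |R a_i - b_i|^2
        u, _, vt = np.linalg.svd(a.T @ b)
        d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against reflections
        rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
        a = a @ rot.T
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))


rng = np.random.default_rng(0)
ref = rng.normal(size=(10, 3))
shifted = ref + np.array([1.0, 2.0, 3.0])
print(rmsd_sketch(ref, shifted))               # translation counts: sqrt(14)
print(rmsd_sketch(ref, shifted, center=True))  # translation removed
```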

PDB Format

Our own PDB parser has seen a lot of love over the last year. It has been the default for a long time now, and all PDB-related problems are fixed only in this parser. Because of that, we have removed the Biopython PDB parser. This means the permissive keyword argument for Universe is no longer used.

We have also spent time tuning the performance of the PDB trajectory reader, and reading long PDB trajectories is now significantly faster. The exact speed-up depends on the length of the trajectory. For 1000 frames we measured a speed-up of 10000%. See PR #849 and Issue #848 for more details.

Additionally, we have added the .ent file extension to the list of supported PDB file formats.

Other Changes

A list of all changes can be found in the CHANGELOG.

GSoC Projects

Python Software Foundation Google Summer of Code 2016

We are happy to announce that MDAnalysis is hosting two GSoC students for the PSF this year, Fiona Naughton and John Detlefs.

Fiona Naughton: Umbrella simulations with MDAnalysis


Umbrella sampling is an MD technique that involves running a series of simulations in which a reaction coordinate, such as the distance between two molecules, is restrained to different values. One method for combining and analysing these simulations is the weighted histogram analysis method (WHAM). Fiona will work on adding WHAM to the analysis module to calculate different observables.
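The core of WHAM is a pair of self-consistent equations. The first is the unbiased distribution P(x) = Σᵢ nᵢ(x) / Σᵢ Nᵢ fᵢ exp(-β Uᵢ(x)). The second defines the window normalisations through 1/fᵢ = Σₓ P(x) exp(-β Uᵢ(x)). These are iterated until they stop changing. The NumPy sketch below is a generic one-dimensional, binned version of that textbook iteration, not the implementation Fiona will write. It assumes `hist[i, x]` holds the sample counts of window i in bin x and `bias[i, x]` the bias energy at the bin centre (for example a harmonic restraint), expressed in units of kT when `beta=1`. The name `wham_1d` is hypothetical.

```python
import numpy as np


def wham_1d(hist, bias, beta=1.0, tol=1e-10, max_iter=10000):
    """Unbiased probability per bin from umbrella-window histograms.

    hist : (n_windows, n_bins) sample counts n_i(x)
    bias : (n_windows, n_bins) bias energy U_i(x) at each bin centre
    Returns (p, F): normalised P(x) and window free energies F_i
    (F_i is defined only up to an additive constant).
    """
    hist = np.asarray(hist, dtype=float)
    bias = np.asarray(bias, dtype=float)
    n_samples = hist.sum(axis=1)       # N_i: total samples in window i
    boltz = np.exp(-beta * bias)       # exp(-beta U_i(x))
    f = np.ones(hist.shape[0])         # f_i = exp(beta F_i), initial guess
    for _ in range(max_iter):
        # P(x) = sum_i n_i(x) / sum_i N_i f_i exp(-beta U_i(x))
        denom = ((n_samples * f)[:, None] * boltz).sum(axis=0)
        p = hist.sum(axis=0) / denom
        p /= p.sum()
        # 1 / f_i = sum_x P(x) exp(-beta U_i(x))
        f_new = 1.0 / (boltz @ p)
        converged = np.max(np.abs(np.log(f_new / f))) < tol
        f = f_new
        if converged:
            break
    return p, np.log(f) / beta
```

The windows must overlap along the reaction coordinate for the iteration to couple them. Bins that no window samples would need special handling, which this sketch omits.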

Fiona Naughton is doing a PhD in Biochemistry at the University of Oxford with the Structural Bioinformatics and Computational Biochemistry Unit, studying protein/membrane interactions through molecular dynamics simulations. She has started her own blog and can be found on GitHub under @fiona-naughton. She plans to continue to a career in the same field and, outside academia, enjoys reading, baking and handicraft.

John Detlefs: Dimensionality Reduction


MD simulations produce data with several thousand dimensions, from which we want to learn something new about how proteins work and how they interact with each other. Fortunately, in many cases one or two dimensions are enough to describe a specific function of a protein. Dimensionality reduction algorithms help us find projections onto low-dimensional subspaces that capture the relevant motions. John has chosen to implement principal component analysis and diffusion maps.
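Principal component analysis can be sketched in a few lines of NumPy. Flatten each frame's coordinates into one vector, diagonalise the covariance matrix, and project every frame onto the eigenvectors with the largest eigenvalues. This is a generic sketch, not John's implementation, and it does not cover diffusion maps. It assumes `coords` is an `(n_frames, n_atoms, 3)` array whose frames have already been aligned to a common reference. The name `pca` is hypothetical.

```python
import numpy as np


def pca(coords, n_components=2):
    """Project aligned MD frames onto their leading principal components.

    coords : (n_frames, n_atoms, 3) array of aligned coordinates
    Returns (projection, explained): an (n_frames, n_components) array and
    the fraction of total variance captured by each returned component.
    """
    n_frames = coords.shape[0]
    x = coords.reshape(n_frames, -1)           # one 3N-dimensional vector per frame
    x = x - x.mean(axis=0)                     # fluctuations about the mean structure
    cov = x.T @ x / (n_frames - 1)             # (3N, 3N) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: ascending order for symmetric matrices
    order = np.argsort(eigvals)[::-1]          # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    projection = x @ eigvecs[:, :n_components]
    explained = eigvals / eigvals.sum()
    return projection, explained[:n_components]


# Synthetic "trajectory": 5 atoms moving collectively along one direction
rng = np.random.default_rng(0)
direction = rng.normal(size=(5, 3))
t = np.linspace(-1.0, 1.0, 50)
coords = t[:, None, None] * direction[None] + 1e-6 * rng.normal(size=(50, 5, 3))
proj, explained = pca(coords)
print(explained)  # the first component should carry almost all the variance
```

For real trajectories with many atoms, the 3N × 3N covariance matrix grows quickly, which is one reason practical implementations often restrict the analysis to a subset such as backbone atoms.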

John Detlefs is a Mathematics and Chemistry double major at California Polytechnic State University, San Luis Obispo. His blog can be found on his website, and on GitHub he is @jdetle. When not contributing to MDAnalysis, John can be found reading a good book or enjoying one of the many outdoor activities California has to offer. After graduating in June 2016, John plans to pursue a career in scientific computing.

In the coming weeks they will both further refine their projects and set up personal blogs to keep anyone interested updated on their progress during the summer. We plan to hold all discussions publicly on the devel mailing list.