Blog

Google Summer of Code Student 2017

We are happy to announce that MDAnalysis is hosting a GSoC student for NumFOCUS this year, Utkarsh Bansal (@utkbansal on GitHub), with his project “Port to pytest”.

Utkarsh Bansal: Port unit tests to pytest

Utkarsh Bansal

Utkarsh will port our complete unit test suite from nose to pytest. This is a massive undertaking for MDAnalysis, with over 4000 individual tests. But we have great confidence in him, and he has already started work to ensure that we don’t have a drop in code coverage during the transition. Newer projects under the MDAnalysis umbrella all use pytest, and we are happy to see the switch happening for MDAnalysis as well. Utkarsh will blog regularly during the summer to let you know how far the transition has come and how best to write unit tests in Python.

Utkarsh is currently pursuing a bachelor’s degree in Computer Science and Engineering and will be graduating this summer. He hopes to learn new things about Python and testing in general this summer and is planning to continue his career as a software developer.

Other NumFOCUS students

NumFOCUS is hosting 12 students this year for several of their supported and affiliated projects. You can find out about the other students here.

NumFOCUS small grant for Python 3 support

NumFOCUS Foundation

NumFOCUS has generously awarded us a small development grant to fully support Python 3. To do this, Richard Gowers and Tyler Reddy will be hosted at Oliver Beckstein’s lab at Arizona State University in the summer for a week of hacking.

MDAnalysis started almost 10 years ago, when Python was around version 2.4 and interfacing with existing C code was mostly done by writing C wrappers that directly used the CPython API. This legacy code has hampered a speedy full transition to Python 3, and consequently MDAnalysis lags behind the rest of the scientific Python community in fully supporting Python 3. Although about 80% of the code passes unit tests in Python 3, we urgently need to close the remaining 20% gap in order to support our user base and to safeguard the long-term viability of the project.

In the meantime we are busy porting our last Python 2.7-only C extension, the DCD Reader and Writer, to Cython. We now have a working Cython version that can be used without MDAnalysis, similar to our XTC and TRR readers. Only a clean-up of the new Cython / DCD handling code and updated documentation are still required. You can check our progress here.

Release 0.16.0

We have just released MDAnalysis version 0.16.0. This release contains new features as well as bug fixes. Highlights are listed below but for more details see the release notes.

This is the biggest release we have ever had, with tons of new features. It includes a rewrite of the topology system, the work of our GSoC students Fiona Naughton (@fiona-naughton) and John Detlefs (@jdetle), a completely new ensemble analysis with encore, and much more. In total, 28 people contributed 1904 commits to this release. We closed 162 issues and merged 199 pull requests.

Upgrade

You can upgrade with pip install --upgrade MDAnalysis. If you use the conda package manager, run conda update -c conda-forge mdanalysis instead.

Noticeable Changes

You can find a notebook with code examples of the changes here.

Rewrite of our internal topology representation

This is the change we are most excited about for this release. It brings performance enhancements, makes the code easier to maintain and to extend, and allowed us to simplify the interface. We have previously written about the details of the new topology system.

With this change, however, a lot of deprecated functions have also been removed. For all of the deprecated functions, replacements already exist, and you will get a warning with a suggested change. The easiest way to check whether your scripts will run without a problem after the update is to include the following at the top of each script.

import warnings
warnings.filterwarnings('always', module='MDAnalysis')

This will print a warning for every deprecated function you are using, together with a short code snippet showing how your code has to be changed. If you do not want to upgrade your existing scripts, we have posted a guide on how to use conda and Python environments to run different versions of MDAnalysis on the same computer.
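To illustrate what an 'always' filter does, here is a minimal sketch that uses only the Python standard library. The function old_center is hypothetical and merely stands in for a deprecated library call; the warnings are recorded rather than printed so that they can be inspected.

```python
import warnings


def old_center(values):
    """Hypothetical deprecated helper; stands in for a deprecated library call."""
    warnings.warn("old_center() is deprecated, use new_center() instead",
                  DeprecationWarning, stacklevel=2)
    return sum(values) / len(values)


# Record warnings instead of printing them so we can inspect them.
with warnings.catch_warnings(record=True) as caught:
    # 'always' reports every occurrence, not only the first one per location.
    warnings.simplefilter('always')
    old_center([1.0, 2.0, 3.0])
    old_center([4.0, 5.0, 6.0])

messages = [str(w.message) for w in caught]
n_deprecations = sum(issubclass(w.category, DeprecationWarning) for w in caught)
print(n_deprecations, messages[0])
```

Without such a filter, Python may show a given warning only once per code location or, for deprecation warnings, hide it entirely depending on where it is triggered.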

Attach arbitrary time series to your trajectories

Our GSoC student @fiona-naughton has implemented an auxiliary reader to add arbitrary time series to a universe. The time series are kept in sync with the trajectory, so it is possible to iterate through the trajectory and access the auxiliary data corresponding to the current time step.

import MDAnalysis as mda
from MDAnalysisTests.datafiles import PDB_sub_sol, XTC_sub_sol, XVG_BZ2

# Create your universe as usual
universe = mda.Universe(PDB_sub_sol, XTC_sub_sol)
# Attach an auxiliary time series with the name `forces`
# In this example, the XVG file contains the force that applies to each atom.
universe.trajectory.add_auxiliary('forces', XVG_BZ2)
# Iterate through your trajectory; the time series is kept in sync
for time_step in universe.trajectory:
    print(time_step.aux.forces)
# The first element of each array is the time in picoseconds.
# The next elements are the other columns of the XVG file.

@fiona-naughton worked on offering several convenient ways to iterate through your data. Read the documentation or Fiona’s blog posts to learn more about the feature.

This feature is still in its early stages and will be expanded in future releases. You can follow the conversation on the initial issue or on the pull request. So far, only the XVG format used by Gromacs and Grace is supported. Open an issue if you need support for other time series formats.
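As a rough picture of what such a file contains, the sketch below parses XVG-style text with NumPy. The data are made up, and this is not how the MDAnalysis auxiliary reader is implemented; it only assumes the usual XVG convention that lines starting with # or @ are comments or plotting directives, followed by whitespace-separated numeric columns.

```python
import io

import numpy as np

# Made-up XVG-style content: a time column followed by two data columns.
xvg_text = """\
# Comment lines start with a hash
@ title "Made-up forces"
@ xaxis label "Time (ps)"
0.0  1.5  -0.2
1.0  1.7  -0.1
2.0  1.6   0.0
"""

# Skip both comment styles and load the remaining rows as floats.
data = np.loadtxt(io.StringIO(xvg_text), comments=['#', '@'])

times = data[:, 0]    # first column: time
values = data[:, 1:]  # remaining columns: the recorded quantities
print(times, values.shape)
```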

Do a dimension reduction with PCA and Diffusion Maps

@jdetle has implemented two new dimension reduction algorithms, Principal Component Analysis and Diffusion Maps. Both can be found in the analysis submodule. As an example, let’s look at the first two PCA dimensions of ADK from our test files.

import matplotlib.pyplot as plt
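import MDAnalysis as mda
from MDAnalysis.analysis.pca import PCA
from MDAnalysisTests.datafiles import PSF, DCD

plt.style.use('ggplot')

u = mda.Universe(PSF, DCD)
# Use the same C-alpha selection for the PCA and for the projection
ca = u.select_atoms('protein and name CA')
pca = PCA(u, select='protein and name CA', verbose=True).run()
# Project the trajectory onto the first two principal components
reduced_data = pca.transform(ca, n_components=2)

f, ax = plt.subplots()
ax.plot(reduced_data[:, 0], reduced_data[:, 1], 'o')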
ax.set(xlabel=r'PC$_1$ [$\AA$]', ylabel=r'PC$_2$ [$\AA$]', title='PCA of ADK')

PCA projection
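For readers who want to see what such a projection computes, here is a minimal NumPy-only sketch of PCA on synthetic data. It is not the MDAnalysis implementation: the random "trajectory" and all variable names are made up for illustration, and no structural alignment is performed.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "trajectory": 200 frames of 5 atoms, flattened to 15 coordinates.
n_frames, n_atoms = 200, 5
coords = rng.normal(size=(n_frames, n_atoms * 3))
# Stretch one direction so that a dominant component exists.
coords[:, 0] *= 10.0

# Centre the data and diagonalise the covariance matrix.
centered = coords - coords.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# eigh returns eigenvalues in ascending order; sort by descending variance.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Project each frame onto the first two principal components.
projection = centered @ eigenvectors[:, :2]
print(projection.shape)
```

Each column of projection plays the role of one axis in a scatter plot like the one above, and the variance along each column equals the corresponding eigenvalue.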

Convenience functions to create a new analysis

A while back we introduced a new framework for analysis to unify the API for the different analysis methods we offer. With this release we also add a new class, AnalysisFromFunction, to make it easier to calculate observables from a simulation. Until now, you would write code with a handwritten loop like this.

import numpy as np

results = []
for ts in u.trajectory:
    results.append(u.atoms.center_of_geometry())
results = np.asarray(results)

The same calculation can now be written like this.

from MDAnalysis.analysis.base import AnalysisFromFunction
cog = AnalysisFromFunction(lambda ag : ag.center_of_geometry(), u.atoms).run()
cog.results

This class also takes arguments to adjust the iteration (start, stop, step), and you can add verbosity with verbose=True. You will also profit from any future performance improvements in the analysis class without changing your code. If you have a specific observable that you want to calculate several times, you can also create a new analysis class with analysis_class like this.

from MDAnalysis.analysis.base import analysis_class

def cog(ag):
    return ag.center_of_geometry()

COG = analysis_class(cog)

cog_results = COG(u.atoms, step=2, verbose=True).run()
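The general pattern behind such a class factory can be sketched in plain Python and NumPy. The names make_analysis, FunctionAnalysis and ToyCOG below are hypothetical, and a NumPy array of frames stands in for a real trajectory; this is a toy illustration of the idea, not the MDAnalysis implementation or its API.

```python
import numpy as np


def make_analysis(function):
    """Toy class factory (hypothetical, not the MDAnalysis implementation)."""

    class FunctionAnalysis:
        def __init__(self, frames, start=None, stop=None, step=None):
            self._frames = frames
            self._slice = slice(start, stop, step)
            self.results = None

        def run(self):
            # Apply the function to each selected frame and collect the output.
            self.results = np.asarray(
                [function(frame) for frame in self._frames[self._slice]])
            return self

    return FunctionAnalysis


# Fake trajectory: 10 frames, 4 atoms, xyz coordinates.
frames = np.arange(10 * 4 * 3, dtype=float).reshape(10, 4, 3)

# The centre of geometry is the mean position over all atoms in a frame.
ToyCOG = make_analysis(lambda frame: frame.mean(axis=0))
toy_cog = ToyCOG(frames, step=2).run()
print(toy_cog.results.shape)
```

Keeping the frame loop inside run() is what lets every analysis built this way share options such as start, stop and step.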

Speed improvements in RMSD

Thanks to work by our NSF REU student @rbrtdlgd, our RMSD calculations are now about 40% faster. If you are using the low-level qcprot algorithm yourself instead of our provided wrappers, you have to change your code because the API has changed. For more details, see the release notes.
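As a reminder of what an RMSD after optimal superposition measures, here is a small NumPy sketch. Note that it uses the Kabsch approach based on a singular value decomposition, not the qcprot algorithm mentioned above, and the function name superposed_rmsd and the test coordinates are made up for illustration.

```python
import numpy as np


def superposed_rmsd(mobile, reference):
    """Minimum RMSD between two (N, 3) coordinate sets after optimal fitting."""
    # Remove the translation by centring both sets on their centroids.
    p = mobile - mobile.mean(axis=0)
    q = reference - reference.mean(axis=0)
    # Kabsch: the SVD of the 3x3 covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so that the result is a proper rotation.
    d = 1.0 if np.linalg.det(u @ vt) >= 0 else -1.0
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt
    diff = p @ rotation - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))


rng = np.random.default_rng(42)
reference = rng.normal(size=(20, 3))

# Rotate the reference by 30 degrees about z and shift it.
theta = np.deg2rad(30.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
mobile = reference @ rot_z.T + np.array([1.0, -2.0, 0.5])

rmsd_rigid = superposed_rmsd(mobile, reference)
# A mirror image cannot be superposed by a rotation alone.
rmsd_mirror = superposed_rmsd(reference * np.array([1.0, 1.0, -1.0]), reference)
print(rmsd_rigid, rmsd_mirror)
```

A rigidly moved copy gives an RMSD of essentially zero, while the mirrored copy does not, because reflections are excluded from the fit.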

MemoryReader: Reading trajectories from memory

MDAnalysis typically reads trajectories from files on demand, so that it can efficiently deal with large trajectories, even those that do not fit in memory. However, in some cases, both for convenience and for efficiency, it can be an advantage to work with trajectories directly in memory. In this release, we have introduced a MemoryReader, which makes this possible. This reader was originally implemented in the encore package.

The MemoryReader works with numpy arrays, using the same format as that used by, for instance, DCDReader.timeseries(). You can create a Universe directly from such an array:

import numpy as np
from MDAnalysis import Universe
from MDAnalysisTests.datafiles import DCD, PSF
from MDAnalysis.coordinates.memory import MemoryReader

# Create a Universe using a DCD reader
universe = Universe(PSF, DCD)

# Create a numpy array with random coordinates (100 frames) for the same topology
coordinates = np.random.uniform(size=(100, universe.atoms.n_atoms, 3)).cumsum(0)

# Create a new Universe directly from these coordinates
universe2 = Universe(PSF, coordinates, format=MemoryReader)

The MemoryReader works just like any other reader. In particular, you can iterate over it as usual, or use the .timeseries() method to retrieve a reference to the raw array:

coordinates_fac = universe2.trajectory.timeseries(format='fac')
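The axis order of such an array can be illustrated with NumPy alone. The sketch below assumes that the letters in a format string such as 'fac' stand for frame, atom and coordinate, and it uses a small made-up array instead of a real trajectory.

```python
import numpy as np

n_frames, n_atoms = 4, 3
# 'fac' layout: axis 0 = frame, axis 1 = atom, axis 2 = x/y/z coordinate.
coordinates_fac = np.arange(n_frames * n_atoms * 3, dtype=float).reshape(
    n_frames, n_atoms, 3)

# Swapping the first two axes gives an atom-major ('afc') view of the same data.
coordinates_afc = coordinates_fac.transpose(1, 0, 2)

# The position of atom 2 in frame 1 is the same element in both layouts.
print(coordinates_fac[1, 2], coordinates_afc[2, 1])
# The transpose is a view on the same memory, not a copy.
print(np.shares_memory(coordinates_fac, coordinates_afc))
```

Because the transposed array is only a view, switching between the two layouts does not duplicate the coordinates in memory.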

Certain operations can be sped up by moving a trajectory to memory, and we have therefore added functionality to directly transfer any existing trajectory to a MemoryReader using Universe.transfer_to_memory:

universe = Universe(PSF, DCD)
# Switches to a MemoryReader representation
universe.transfer_to_memory() 

You can also do this directly upon construction of a Universe, by using the in_memory flag:

universe = Universe(PSF, DCD, in_memory=True)

Likewise, the AlignTraj class in the analysis/align.py module also has an in_memory flag, allowing it to do in-place alignments.

Others

Since the start of the year, we have also blogged about features of this release.

Minor Enhancements

  • No more deprecation warning spam when MDAnalysis is imported
  • analysis.align has a new AlignTraj class following the analysis class style
  • All new analysis classes now print additional information with verbose=True
  • RMSD has been ported to the new analysis class style

Other Changes

A list of all changes can be found in the CHANGELOG.