Blog

On-the-fly transformations

On-the-fly transformations have been introduced in version 0.19.0 of MDAnalysis. This feature is part of @davidercruz ‘s Google Summer of Code 2018 project and brings to MDAnalysis a whole new level of functionality, allowing for new and more efficient workflows when analyzing and visualizing simulation trajectories. The documentation for these new functions can be found in the docs for MDAnalysis.transformations

Why do we need transformations?

When visualizing and analyzing trajectories from molecular dynamics simulations, some prior modifications are often required. Examples of the most usual modifications or transformations are removing artifacts from periodic boundary conditions, which cause some issues with some molecular viewers (PyMol for example), removing the rotation and translation of a particular molecule and/or centering it in the unit cell, which helps focus on the its actual conformational changes by removing their natural movement in solution. These transformations help us better identify patterns in the behavior of our biological systems, and, more importantly, show them to the world.

The advantage of using MDAnalysis for trajectory transformations

Many simulation packages often contain tools to transform and analyze trajectories, such as Gromacs gmx trjconv command. However, most of the times, the user is required to apply all the intended transformations to the whole trajectory (or the portion of interest) prior to visualization and analysis. This often requires processing huge files, sometimes more than once. Moreover, some tools such as trjconv do not support frame indexing for the most popular trajectory formats, requiring iterating over frames that are not needed for that particular analysis. Trajectory transformations in MDAnalysis, on the other end, have one great advantage - they are performed on-the-fly for each frame that is read. Transformations are added to a universe as a transformation workflow containing one or more transformations. The API also makes it easy to add new transformations for your own projects. Another things that really makes the “on-the-fly” aspect of the MDAnalysis transformations shine is coupling it to a visualization widget such as NGL Viewer.

Using MDAnalysis transformations

Now it’s time to learn how to use the trajectory transformations in MDAnalysis. During the following steps, we will apply some transformations on a 1 ns trajectory of a simple 19-residue peptide embeded in a 128-DMPC membrane, showing the Gromacs gmx trjconv command and the equivalent MDAnalysis code and output. To keep things lightweight, frames are were taken every 100 ps, and water molecules were removed. This can be easily done with MDAnalysis.

Preparation: Example trajectory

We get our example trajectory from MDAnalysisData.membrane_peptide:

import MDAnalysis as mda
import MDAnalysisData

peptide = MDAnalysisData.datasets.fetch_membrane_peptide()
u = mda.Universe(peptide.topology, peptide.trajectory)
u.transfer_to_memory(step=10)

The above commands will download the peptide.topology (a Gromacs TPR file named “memb_pept.tpr”) and the peptide.trajectory “memb_pept.xtc” in XTC format.

The original trajectory has 1000 frames but for making the visualizations in this post shorter, we will only keep every 10th frame by using an in-memory representation (see Universe.transfer_to_memory()); when trying these examples yourself you can omit the line u.transfer_to_memory(step=10). In the following we just write

u = mda.Universe(peptide.topology, peptide.trajectory)

Visualization

We use nglview for visualizing our trajectory in the jupyter notebook. In all cases we add a unit cell representation and rotate the view with commands such as

import nglview as nv
import numpy as np

view = nv.show_mdanalysis(u)
view.add_unitcell()
view.control.rotate(
    mda.lib.transformations.quaternion_from_euler(
        -np.pi/2, np.pi/3, np.pi/6, 'rzyz').tolist())
view.control.zoom(-0.3)
view

but for simplicity, in the following we only write

nv.show_mdanalysis(u)

The movies were rendered as animated GIFs with

from nglview.contrib.movie import MovieMaker
movie = MovieMaker(view, fps=24, output='movie.gif')
movie.make()

Example 1: making everything whole again

When performing MD simulations using periodic boundary conditions, molecules will often cross the limits of the unit cell. When this happens, some atoms of the molecule will show up on the the opposing side of the unit cell and some molecular viewers will show stretched bonds and other visual artifacts depending on the visual representation of the system. This is the case of our system. Without any modifications, when we look at the trajectory of our system, things become more cluttered and confusing:

import warnings
warnings.filterwarnings('ignore') # nglview is missing some PDB-only attributes and complains 

import MDAnalysis as mda
import nglview as nv

u = mda.Universe(peptide.topology, peptide.trajectory)

nv.show_mdanalysis(u)

raw trajectory

Using trjconv, one way to make every molecules whole again would be:

gmx trjconv -f pept_in_memb.xtc -s pept_in_memb.tpr -pbc mol -o output.xtc

In MDAnalysis this can be done with the unwrap transformation, which takes an AtomGroup as argument. This can be done as follows:

from MDAnalysis import transformations

# a custom atom group can be passed as an argument. In this case we will use all the atoms
# in the Universe u
u = mda.Universe(peptide.topology, peptide.trajectory)

# we define the transformation
workflow = [transformations.unwrap(u.atoms)]

Now that we have a workflow - in this case it is only a single transformation - we add it to the trajectory object so it can be applied in each frame that we want to read.

u.trajectory.add_transformations(*workflow)

If we want to, we can do other things with the trajectory without having to generate a new file with the transformed trajectory.

This is how our trajectory looks like:

nv.show_mdanalysis(u)

unwrapped trajectory

As you can see, the artifacts caused by the atoms crossing the boundaries of the unit cell are now gone.

Example 2: what if we also want to center the peptide in the unit cell?

In that case, using trjconv we would do something like this:

gmx trjconv -f pept_in_memb.xtc -s pept_in_memb.tpr -pbc mol -center -o output.xtc

And we choose Protein as the group to be centered.

In MDAnalysis we use the center_in_box transformation. As the name says, this transformation will move all the atoms of the frame, so that a given AtomGroup is centered in the unit cell. center_in_box takes an AtomGroup as a mandatory argument. Optional arguments include weights, which is used to calculate the weighted center of the given AtomGroup (if weights=’mass’ then the center of mass is calculated), center_to which is used when the user needs to center the AtomGroup in a custom point instead of the center of the unit cell, and wrap which, if True, causes all the atoms of the AtomGroup to be moved to the unit cell before calculating the weighted center.

You can see that the transformations workflow below has three steps:

  • make everything molecule whole again with unwrap ;
  • center the protein in the unit cell with center_in_box - this causes some of the phospholipids to fall outside the unit cell ;
  • shift the molecules (compound='fragments') back to the unit cell using wrap

This is how it looks:

u = mda.Universe(peptide.topology, peptide.trajectory)
prot = u.select_atoms("protein")
ag = u.atoms
# we will use mass as weights for the center calculation
workflow = (transformations.unwrap(ag),
                   transformations.center_in_box(prot, center='mass'),
                   transformations.wrap(ag, compound='fragments'))
u.trajectory.add_transformations(*workflow)
nv.show_mdanalysis(u)

centered and unwrapped trajectory

Example 3: what if we want to do a fitting of the protein?

Fitting is useful when processing trajectories for visualization and analyses - it removes the translations and rotations of the molecule, allowing us to have a better look at the structural changes that happen in our simulations. If we want to do this using trjconv we would do have to do this in two steps:

gmx trjconv -f pept_in_memb.xtc -s pept_in_memb.tpr -pbc mol -center -o midstep.xtc
gmx trjconv -f midstep.xtc -s pept_in_memb.tpr -fit rot+trans -o output.xtc

And we choose Protein as the group to be centered and for the least squares fitting.

In MDAnalysis we just add another transformation to our workflow - fit_rot_trans. This transformation takes the AtomGroup to be fitted as argument, an AtomGroup to be used as reference and, by default, it behaves just as the option -fit rot+trans. If given a plane argument, the fitting is performed on a given plane. If plane=xy then the transformation will behave as -fit rotxy+transxy, but the xz and yz planes are also supported. Just as in center_in_box, a weights argument can be passed to the function, and it will dictate how much each atom of the molecule contributes to the least squares fitting. Here’s what the workflow looks like:

u = mda.Universe(peptide.topology, peptide.trajectory)
prot = u.select_atoms("protein")
# we load another universe to define the reference
# it uses the same input files, but this doesn't have to be always the case
ref_u = u.copy()
reference = ref_u.select_atoms("protein")
ag = u.atoms
workflow = (transformations.unwrap(ag),
                   transformations.center_in_box(prot, center='mass'),
                   transformations.wrap(ag, compound='fragments'),
                   transformations.fit_rot_trans(prot, reference))
u.trajectory.add_transformations(*workflow)
nv.show_mdanalysis(u)

fitted on protein and unwrapped trajectory

It looks a bit confusing with the membrane so we can also look at only the protein

view = nv.show_mdanalysis(prot)
view.w.add_line()
view

fitted on protein and unwrapped trajectory (protein only)

This transformation is good when we want to see how the conformation of the protein evolves with time.

But, in this case, we also have a membrane. How does the protein behave in the membrane? Doing a least squares fitting in the xy plane can help us have a better look. Here’s how it goes:

u = mda.Universe(peptide.topology, peptide.trajectory)
prot = u.select_atoms("protein")
ref_u = u.copy()
reference = ref_u.select_atoms("protein")
ag = u.atoms
workflow = (transformations.unwrap(ag),
                   transformations.center_in_box(prot),
                   transformations.wrap(ag, compound='fragments'),
                   transformations.fit_rot_trans(prot, reference, plane='xy', weights="mass"))
u.trajectory.add_transformations(*workflow)

For the visualization we will hide the lipid tails and only indicate the phosphorous atoms:

protein_P = u.select_atoms("protein or name P")
view = nv.show_mdanalysis(protein_P)
view.add_line()
view

fitted on protein in x-y plane and unwrapped trajectory

This transformation keeps the membrane horizontal, while the protein rotation in the z-axis is removed, and it becomes particularly useful when observing protein insertion.

Example 4: I want to do my own transformations…

The beauty of MDAnalysis transformations is the ability to easily create custom transformations. All transformations must have the following structure:

def custom_transform(args): # arguments at this point are not mandatory
    #do some things
        
    def wrapped(ts): 
        # This wrapped function must only take a Timestep as argument
        # and perform the actual changes to the timestep
        
        return ts
        
    return wrapped

Let’s create one here:

def up_by_2():
    def wrapped(ts):
        # here's where the magic happens 
        # we create a numpy float32 array to avoid reduce floating
        # point errors
        ts.positions += np.asarray([0,0,20])
        return ts
    return wrapped

Now lets add our transformation to a workflow.

import numpy as np
u = mda.Universe(peptide.topology, peptide.trajectory)

# loading another universe to better see the changes made by our transformation
previous = u.copy()
# making the unmodified universe whole accross the trajectory
previous.trajectory.add_transformations(mda.transformations.unwrap(previous.atoms))

ag = u.atoms

workflow = (transformations.unwrap(ag),
                   up_by_2())
u.trajectory.add_transformations(*workflow)

All atoms in the u Universe are shifted up by 20 angstroms but this does not look much different from what we have seen before. So let’s do something more interesting and just move a selection such as the peptide.

The transformations can accept arguments. Let’s modify up_by_2 so that only the peptide is translated in the z coordinate:

def protein_up_by_2(ag):
    def wrapped(ts):
        # here's where the magic happens 
        # we create a numpy float32 array to avoid reduce floating
        # point errors
        ag.positions += np.asarray([0,0,20])
        return ts
    return wrapped

We’ll add the new transformation to the workflow and see what happens.

u = mda.Universe(peptide.topology, peptide.trajectory)
ag = u.atoms
prot = u.select_atoms("protein")
workflow = (transformations.unwrap(ag),
                   protein_up_by_2(prot),
                   transformations.wrap(ag, compound='fragments'))
u.trajectory.add_transformations(*workflow)
nv.show_mdanalysis(u)

peptide translated by 20 Å upwards

The two examples of custom transformations shown here are very simple. But more complex things can be done, and we encourage you to try them!

Final remarks

These transformations are a new feature in MDAnalysis and some transformations such as wrap/unwrap are still comparatively slow but this post should have given you some ideas what you will now be able to do. In particular, one can transform a trajectory before any analysis code sees it so one could implement trajectory smoothing or projections and then directly analyze the pre-processed trajectory without having to write any intermediate files or change any of the existing analysis functions.

Transformations behave differently when used with “out of core” trajectories (the normal approach in MDAnalysis, where each trajectory frame is read from disk into memory when needed) and “in core” trajectories (generated with Universe.transfer_to_memory(), also known as the “MemoryReader”). For on-disk trajectories, the transformations are performed whenever a frame is read from disk. For in-memory trajectories, the transformations are applied once to and the modified trajectory is stored in memory. Therefore, in-memory trajectories with transformations can appear to take a long time to load because all calculations are done immediately.

This has been a quick demonstration of the power of the new on-the-fly transformations of MDAnalysis. There are more transformations available for you to explore and a whole lot more for you to create for your own molecular system. More information on trajectory transformations can be found in the online docs of MDAnalysis.

@davidercruz

Google Summer of Code 2020

Google Summer of Code 2020

MDAnalysis has been accepted as an organization into Google Summer of Code 2020! If you are a student who is interested in working with us this summer, please read the advice and links below and write to us on the mailing list.

We are looking forward to all applications from interested students (undergraduate and postgraduate).

The application window deadline is March 31, 2020 at 18:00 (UTC). As part of the application process you must familiarize yourself with Google Summer of Code 2020.

If you are interested in working with us please read on and contact us on our mailing list. Apply as soon as possible; the application window opens on March 16, 2020.

Project Ideas

If you have your own idea about a potential project we’d love to work with you to develop this idea; please write to us on the developer list to discuss it there.

We also have listed several possible projects for you to work on, on our wiki. Our initial list of ideas (see summaries in the table below) contains 6 projects at different levels of difficulties and with different skill requirements. However, check the ideas page — we might add more ideas after the posting date of this post.

You don’t need to have all the skills that we are listing, although that helps, of course. But you need to demonstrate to us that you’re able and keen to learn anything that you don’t know yet, and we will be happy to help you learn during your project with us.

project name difficulty description skills mentors
1 Molecular volume and surface analysis easy use an existing package for molecular surface area calculations to build a new analysis module Python, MDAnalysis.analysis @orbeckst @IAlibay @richardjgowers
2 Command Line Interface (CLI) easy create a new package with MDAnalysis-based command line tools Python, bash, Python packages, CI, MD @PicoCentauri @orbeckst @fiona-naughton
3 Interoperability with RDKit medium add capability to MDAnalysis to use the RDKit API to convert data structures between MDAnalysis and RDKit MDAnalysis, RDKit, Python, C++ (?) @richardjgowers @IAlibay
4 Improved atom selections medium replace the selection parsing code with a more flexible parser Python @orbeckst, @IAlibay, @fiona-naughton
5 Serialize Universes for parallel computation medium/challenging make the central Universe data structure serializable with Pickle to enable simple parallelization with Dask or MPI Python, MDAnalysis I/O, task-based parallelization @richardjgowers, @orbeckst, @IAlibay, @fiona-naughton
6 Implement TNG support challenging write Cython bindings to the TNG library and write classes to bring this information into MDAnalysis Cython, C, MD @richardjgowers

Information for Students

You must meet our own requirements if you want to be a student with MDAnalysis this year (read all the docs behind these links!). You must also meet the eligibility criteria.

The MDAnalysis community values diversity and is committed to providing a productive, harassment-free to every member. Our Code of Conduct explains the values that we as a community uphold. Every member (and every GSoC student) agrees to follow the Code of Conduct.

As a start to get familiar with MDAnalysis and open source development you should follow these steps:

Complete the Quick Start Guide

We have a Quick Start Guide explaining the basics of MDAnalysis. You should go through it at least once to understand how MDAnalysis is used. Continue reading the User Guide to learn more.

Introduce yourself to us

Introduce yourself on the mailing list. Tell us what you plan to work on during the summer or what you have already done with MDAnalysis.

Close an issue of MDAnalysis

You must have at least one commit in the development branch of MDAnalysis in order to be eligible, i.e. you must demonstrate that you have been seriously engaged with the MDAnalysis project. We have a list of easy bugs and suggested GSOC Starter issues to work on in our issue tracker on GitHub. We also appreciate if you write more tests or update/improve our documentation.

We recommend you start your application by working on an issue. It will give you a better understanding of MDAnalysis as a project and improve the quality of your application.

To start developing for MDAnalysis have a look at our guide on contributing to MDAnalysis and write to us on the mailing list if you have more questions about setting up a development environment or how to contribute.

Updates to this announcement

  • 2020-03-04: We also have a GSoC FAQ that answers questions that come up repeatedly. The FAQ is being updated with new questions and answers as we are moving towards the application deadline.

@orbeckst, @richardjgowers, @IAlibay, @PicoCentauri, @fiona-naughton, @kain88-de

MDAnalysis is a NumFOCUS Sponsored Project!

NumFOCUS Sponsored

MDAnalysis is now a NumFOCUS Sponsored Project! Let us explain who MDAnalysis and NumFOCUS are and what being a sponsored project means:

MDAnalysis is an open source software project that produces the MDAnalysis Python library for the analysis of computer simulations of many-body systems at the molecular scale, spanning use cases from interactions of drugs with proteins to novel materials. It is widely used in the scientific community and is written by scientists for scientists. The MDAnalysis Project is represented by the MDAnalysis Core Developers.

NumFOCUS is an organization that promotes open practices in research, data, and scientific computing by serving as a fiscal sponsor for open source projects and organizing community-driven educational programs. In brief, they provide financial and logistic infrastructure to many open source projects that fulfill important roles in their communities. NumFOCUS is a 501(c)(3) public charity in the United States and it is able to accept donations for and on-behalf of projects and to act as a grantee.

NumFOCUS will accept funds on behalf of the MDAnalysis Project and then make them available to us. By becoming a fiscally sponsored project, we agreed that all software produced under the MDAnalysis umbrella is and will be free Open Source software and that any funds will be spent compliant with NumFOCUS’ tax-exempt status: what this means in practice is that all funds will go to further the growth and well-being of the project.

So if you want to donate to MDAnalysis: you can now do this easily by clicking this button:

Donate Now

More importantly, with NumFOCUS as a partner, MDAnalysis gains a long-term perspective for stable development that becomes less dependent on individuals and academic institutions. It allows the project to obtain its own funding and move in directions that best benefit its community. For example, it will become easier to organize workshops, arrange for developers to travel, and to hire developers.

For users and developers, the way the MDAnalysis Project operates will not change: it remains a friendly and respectful community that welcomes everyone’s contributions and that is committed to producing good software that helps us all do interesting science. But together with NumFOCUS, we will be able to do more than before… stay tuned!

@MDAnalysis/coredevs