Interoperability roadmap

On June 18, 2020, MDAnalysis was pleased to release its first major version, 1.0.0. As described in our 2019 roadmap, this is the last version that supports Python 2.7. We will continue backporting relevant bug fixes where feasible (e.g. the upcoming 1.0.1), but the next major release will be 2.0.0, which will support Python 3.6+.

As we look forward to this next milestone, it is time to consider the next directions of MDAnalysis. The development of MDAnalysis has always been driven by the growing need for standardised, accessible analysis tools for open, reproducible, and collaborative research. While many major packages for molecular dynamics simulation provide their own set-up and analysis software, these are necessarily targeted to their own particular standards. MDAnalysis aims to provide analysis tools for simulation data in general, so historically a key objective has been to expand the number of supported package-specific data formats. As of version 1.0.0, we support over 40 file formats used in major packages for both molecular dynamics and quantum chemistry.

In 1.0.0 we also began to explore an exciting new approach: direct interoperability with other popular packages for molecular analysis by becoming API-compatible rather than only file-format compatible. This approach was reinforced by discussions at the 2019 MolSSI Workshop: Molecular Dynamics Software Interoperability. Our new converters are distinct from topology parsers and coordinate readers and form a third avenue for loading data into MDAnalysis. In 1.0.0 we added converters for two libraries: the molecular editor ParmEd, and chemfiles, a library for reading data from computational chemistry formats.
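To make the converter idea more concrete, here is a minimal, self-contained sketch of the general pattern: an in-memory object from another library is translated directly into native objects, with no file written or parsed in between. Everything in it (`Atom`, `ForeignStructure`, `CONVERTERS`, `register`, `load`, `export`) is a hypothetical stand-in invented for illustration; it is not the MDAnalysis, ParmEd, or chemfiles API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Coordinate = Tuple[float, float, float]


@dataclass
class Atom:
    """Hypothetical native representation of one atom."""
    name: str
    position: Coordinate


@dataclass
class ForeignStructure:
    """Hypothetical stand-in for another library's in-memory object."""
    atom_names: List[str]
    coords: List[Coordinate]


# Registry mapping a converter name to a function that builds native atoms.
CONVERTERS: Dict[str, Callable[[object], List[Atom]]] = {}


def register(name: str):
    """Decorator that adds a converter function to the registry."""
    def decorator(func):
        CONVERTERS[name] = func
        return func
    return decorator


@register("FOREIGN")
def from_foreign(obj: ForeignStructure) -> List[Atom]:
    # Translate field by field; no intermediate file is involved.
    return [Atom(n, xyz) for n, xyz in zip(obj.atom_names, obj.coords)]


def load(obj: object, fmt: str) -> List[Atom]:
    """Look up the converter by name and apply it to the object."""
    return CONVERTERS[fmt](obj)


def export(atoms: List[Atom]) -> ForeignStructure:
    """Convert native atoms back into the foreign representation."""
    return ForeignStructure([a.name for a in atoms],
                            [a.position for a in atoms])


water = ForeignStructure(
    ["O", "H1", "H2"],
    [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)],
)
atoms = load(water, "FOREIGN")
round_trip = export(atoms)
```

The point of the design is that both directions work on live objects, so a workflow can move between libraries without agreeing on a shared file format first.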

The 2019 report of the NSF MolSSI on Molecular Dynamics Software Interoperability highlighted the general lack of interoperability between software packages in the molecular modelling community. It noted consequences such as great duplication of effort in developing and maintaining similar tools across different formats; significant barriers to collaborating and transferring data; and the need for scientists to learn multiple packages and languages to access the full breadth of available analysis algorithms.

Moving forward, our plan is to increase the range of analyses and formats accessible to users by becoming interoperable with other relevant libraries. This reduces the need to duplicate and support existing tools within our own framework, and allows MDAnalysis to become a general-purpose analysis toolkit. We are already in the process of expanding compatible libraries in 2.0.0 by adding support for the widely popular RDKit cheminformatics toolkit through a Google Summer of Code project being carried out by Cédric Bouysset (@cbouy).

By the end of 2021, we aim to have expanded the range of our Converters framework to include packages in three categories: widely-used analysis libraries, such as MDTraj and pytraj; libraries that can expand the range of formats we can support, such as OpenBabel; and direct interfaces with computational chemistry engines such as OpenMM and Psi4.

Ensuring robust interoperability is best done as a community effort. If you are interested in contributing, or have comments or suggestions on our future directions, please get in touch!


Google Summer of Code Students 2020

We are happy to announce that MDAnalysis is hosting three GSoC students this year – @hmacdope, @cbouy, and @yuxuanzhuang. This is the first year that MDAnalysis has been accepted as its own organization with GSoC and we are grateful to Google for granting us three student slots so that we can have three exciting GSoC projects.

Hugo MacDermott-Opeskin: Trajectory New Generation: the trajectory format for the future of simulation

Trajectory storage has always proved problematic for the molecular simulation community, as large volumes of data can be generated quickly. Traditional trajectory formats suffer from poor portability, large file sizes and limited ability to include metadata relevant to simulation. The Trajectory New Generation (TNG) format developed by the GROMACS team represents the first trajectory format with small file sizes, metadata storage, archive integrity verification and user/software signatures. The primary goal of this project is for @hmacdope to refactor the existing TNG code into C++ to provide clarity and usability for GROMACS, other simulation packages and analysis tools. Thin FORTRAN and Python layers are also desirable to encourage widespread adoption and are a secondary goal of the project. An efficient and transferable implementation of the TNG format will represent a major step forward for the computational molecular sciences community, enabling easy storage and replication of simulations.

This project is a collaboration with the GROMACS developer team with @acmnpv from GROMACS serving as a co-mentor.

Hugo MacDermott-Opeskin is a PhD student in computational chemistry at the Australian National University. His work focuses on studying membrane biophysics through molecular dynamics simulations coupled with enhanced sampling techniques. Hugo can be found on GitHub as @hmacdope and on Twitter as @hugomacdermott. When not hard at work, Hugo can be found running or mountain biking in the Canberra hills.

Through GSoC Hugo aims to bring the TNG next generation trajectory format to the simulation community and he will document his experience at his “Biophysics Bonanza” blog.

Cédric Bouysset: From RDKit to the Universe and back

The aim of the RDKit interoperability project is to give MDAnalysis the ability to use RDKit’s Chem.Mol structure as an input to an MDAnalysis Universe, and to convert a Universe or AtomGroup to an RDKit molecule. RDKit is one of the most complete and most commonly used chemoinformatics packages, yet it lacks file readers for formats typically encountered in MD simulations. @cbouy will implement in MDAnalysis the ability to switch back and forth between a Universe and an RDKit molecule to perform typical chemoinformatics calculations, adding a lot of value to both packages.

Cédric is a PhD student in molecular modelling at Université Côte D’Azur, France. His research aims to decipher the molecular basis of chemosensory perception (smell and taste) using computational tools. His day-to-day work includes modelling bitter taste receptors, building machine-learning models to search for molecules with interesting olfactive or sapid properties, maintaining the website of the Global Consortium of Chemosensory Researchers, and a bit of teaching. In his free time he enjoys cooking and playing video games. Cédric can be found on GitHub as @cbouy and on Twitter as @cedricbouysset.

Cédric will describe his progress in his blog.

Yuxuan Zhuang: Serialize Universes for parallel

As we approach the exascale barrier, researchers are handling increasingly large volumes of molecular dynamics (MD) data. Whilst MDAnalysis is a flexible and relatively fast framework for complex analysis tasks in MD simulations, implementing a parallel computing framework would play a pivotal role in reducing the time to solution for such large datasets. As a foundation for robust parallelism, @yuxuanzhuang will implement serialization support for Universe, the core of MDAnalysis. Furthermore, he will adapt this new serialization functionality to accelerate MDAnalysis’ analysis modules using parallel and distributed computing frameworks, e.g. Dask, multiprocessing, or MPI.
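Why serialization matters for parallel analysis can be illustrated with a small standard-library sketch. Frameworks such as multiprocessing and Dask typically send objects to workers by pickling them, and an object that holds an open file handle cannot be pickled as-is. One common approach, shown below under the assumption of a toy text "trajectory" with one frame per line, is to pickle only the file path and the current frame, then reopen the file when the object is restored. The class `LineTrajectoryReader` is hypothetical and is not how MDAnalysis implements its readers.

```python
import os
import pickle
import tempfile


class LineTrajectoryReader:
    """Toy reader: each line of a text file is one frame (illustration only)."""

    def __init__(self, filename):
        self.filename = filename
        self.frame = -1  # index of the most recently read frame
        self._file = open(filename)

    def read_next(self):
        line = self._file.readline()
        if not line:
            raise EOFError("end of trajectory")
        self.frame += 1
        return line.strip()

    def close(self):
        self._file.close()

    def __getstate__(self):
        # Open file handles cannot be pickled, so keep only what is
        # needed to rebuild the reader: the path and the frame index.
        return {"filename": self.filename, "frame": self.frame}

    def __setstate__(self, state):
        # Reopen the file and replay reads up to the stored frame.
        self.filename = state["filename"]
        self.frame = -1
        self._file = open(self.filename)
        for _ in range(state["frame"] + 1):
            self.read_next()


# Write a four-frame toy trajectory to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("frame0\nframe1\nframe2\nframe3\n")
    path = tmp.name

reader = LineTrajectoryReader(path)
reader.read_next()
reader.read_next()  # the reader is now at frame 1

# Round trip through pickle, as a worker process would receive it.
clone = pickle.loads(pickle.dumps(reader))
next_from_clone = clone.read_next()
next_from_reader = reader.read_next()

reader.close()
clone.close()
os.remove(path)
```

Because the restored copy owns its own file handle, advancing it does not disturb the original, which is the property a per-worker reader needs.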

Yuxuan is a PhD student at Stockholm University. He mainly works on understanding pentameric ligand-gated ion channels from MD simulations. His daily workflow involves setting up and running simulations on lab clusters or at HPC centers, and performing various analyses on the MD trajectories in Jupyter notebooks. Yuxuan can be found on GitHub as @yuxuanzhuang.

Yuxuan will chronicle his work on his blog.

@richardjgowers @IAlibay @acmnpv @fiona-naughton @orbeckst (mentors)

GSoD 2019: The New User Guide

The inaugural Google Season of Docs (GSoD) 2019 has wrapped up. Through this program, Google sponsored technical writers to work with open source projects on their documentation. MDAnalysis was one of the GSoD projects, with technical writer @lilyminium.

She successfully completed her project, A user guide structured by topic, and shared her thoughts in her blog post, Project report: A user guide for MDAnalysis.

Quick Start Guide

Especially for new users, @lilyminium created the new Quick Start Guide, which is now the recommended first tutorial when learning MDAnalysis.

Screenshot of the new Quick Start Guide

User Guide

The new User Guide is meant to make it easy for all users to quickly become productive with MDAnalysis.

It starts with a Getting Started section with installation instructions, examples, the Quick Start Guide, and a FAQ. A discussion of the key data structures follows because understanding how to work with Universe and AtomGroup is fundamental to MDAnalysis. A section on selections explains how to create AtomGroups. The next chapters explain working with trajectories (including the new on-the-fly transformations) and general input/output. Most analysis classes are described and explained with examples, making the analysis section especially useful for anyone who “quickly wants to run analysis X” on their own trajectories.

The User Guide also documents a number of important internals and usage patterns as well as the development process, which makes it a key reference for intermediate users and developers.

As one seasoned core developer said: “Amazing, reading this I can still learn new things about MDAnalysis!”

Screenshot of the new User Guide

You can already see the pre-1.0 version of the new User Guide on our website; an expanded version of the User Guide will be released together with the upcoming 1.0 release of MDAnalysis.

More to come…

Furthermore, the new MDAnalysis docs will follow the layout and style of the User Guide.

Finally, @lilyminium will continue working with MDAnalysis as our newest MDAnalysis Core Developer!

@richardjgowers, @orbeckst