Blog

MDAnalysis Command Line Interface (CLI)

The Logo of the MDAnalysis command line interface

We are happy to announce the first version of a command-line interface (CLI) of MDAnalysis! The CLI allows users to calculate radial distribution functions (RDF), Root-mean-square deviations (RMSD), and many more directly from the command line. For example, extracting the RDF between the water oxygens is just one line of code

mda interrdf -s topol.tpr -f traj.trr -g1 "name OW" -g2 "name OW"

For a detailed synopsis and a list of all available modules check the sections below as well as the documentation or the repository.

Providing easier access to the library for day-to-day analysis, mdacli is not only for the MDAnalysis community but also perfect for non-Python users. The project evolved from the experiences of @joaomcteixeira in TaurenMD and from @PiCoCentauri in MAICoS.

Installation

Install mdacli from PyPI

pip install mdacli

A Conda build will be released soon.

Examples

To use mdacli, after installation open your terminal and run

mda -h

This command provides a list of all available modules similar to the in the documentation. The synopsis for calculating a radial distribution function (RDF) between two groups is given by

mda interrdf -h

A sample water trajectory is provided of rigid SPC/E water is provided testfiles. topol.tpr is a GROMACS topolgy file and traj.trr the corresponding trajectory. The oxygen-oxygen RDF can be calculated using

mda interrdf -s topol.tpr -f traj.trr -g1 "name OW" -g2 "name OW"

The oxygens are selected with the -g1 and -g2 flags. More verbose output is achieved by using the -v flag. Even more, information is provided with the --debug flag. All warnings of the run can also be directly stored to a log file using the --logfile flag

mda --debug --logfile rdf.log interrdf -v -s topol.tpr -f traj.trr -g1 "name OW" -g2 "name OW"

The results of each analysis are stored by default in the current directory. The output directory can be changed with the -o flag and an additional prefix can be set using the -pre flag.

The results of the RDF calculations are two .csv files. The actual RDF is saved in the 2nd and 3rd columns of InterRDF_count_bins_rdf.csv. The header rows of each file provide information about the stored data. Simple results such as bare numbers or strings are stored as JSON dumps. More complex data such as 4 or higher-dimensional arrays are saved as a bunch of CSV files zipped together.

Contributions

Contributions are always welcome on Github. Not only in the form of code contributions via pull requests but also issues, especially since the project is in an early development phase.

Motivation and Background

The CLI was developed since we were facing the same problem in the labs regularly. New students/scientists start writing their analysis scripts and face the same challenges and problems i.e.

  • How to initialize the universe and loop through frames without copying many lines of code?
  • How to write a CLI parser to analyze several of their simulations?
  • How to process and save their trajectories in a clever way?

Some of these problems can be solved by using the MDAnalysis.analysis.base.AnalysisBase class of MDA. However, this class is limited to Python. A generic CLI wrapper for all classes based on the AnalysisBase could therefore help people to analyze their simulation data with the least effort. With this approach, it is easier to use for MDA-users since they just stay within their known universe with known selection commands and results structures. An existing framework makes it also more attractive for users and developers to write their analysis using the base.AnalysisBase. For more technical details visit the documentation.

@PicoCentauri and @joaomcteixeira

GSoC 2021 wrap-up

Google Summer of Code 2021

Another year, another successful Google Summer of Code! We had two fantastic projects this year, including the first trials of our MDAKit model for building an ecosystem of analysis modules. Both projects were a resounding success and represent an exciting new direction for MDAnalysis. None of this would have been possible without our wonderful students: Estefania Barreto-Ojeda (@ojeda-e), and Orion Cohen (@orioncohen).

You can read more about their projects and packages in their blog posts:

We have immensely enjoyed working with Estefania and Orion and look forward to seeing how they continue to contribute to the MDAnalysis community, and where they take their science next.

— Mentors @hmacdope, @lilyminium, @richardjgowers, @fiona-naughton, @orbeckst, @IAlibay

GSoC Report: Solvation Analysis

Motivation

My research seeks to discover new materials for battery electrolytes. In particular, I use molecular dynamics and ab initio methods to computationally probe new electrolyte compositions. In my work, a core challenge is scaling up. A molecular dynamics trajectory is a rich and complex data structure, thorough analysis requires interactive visualization and data labeling. Automating this sort of interactive analysis is difficult, but to scale up to hundreds or thousands of electrolytes, it needs to be done. workflow

I applied for Google Summer of Code because I felt that MDAnalysis was the best available tool for the automated analysis of MD trajectories. My aim was to build an abstraction around solvation that would make writing automated analysis methods much easier. Over the past few months, I’ve done that. I am now working to refine the API and implement new methods to extract more information from MD simulations. In time, this will fit into a broader workflow that will enable the autonomous exploration of novel electrolytes.

Retrospective

At the outset of my GSoC project, I hoped to create a package with:

  • Functions for selecting the solvation shells of ions.
  • Automated analysis of ion coordination environment.
  • Documentation, testing, and well-written code.

After 10 weeks of working with the MDAnalysis team, the solvation_analysis package vastly exceeds that original intent. There is a robust and simple API, the functionality generalizes to non-ionic systems, solvation distances are automatically identified, and it’s really fast. Further, I’ve learned a huge amount about software development, the package release cycle, and the world of open-source.

In this post, I will summarize my contributions to solvation_analysis and several features of the package.

My Summer in Pull Requests

Template for a snazzy package

PR #7 and #20

My summer began with a duo of pull requests. PRs #7 and #20 filled out the template generated by the MolSSI cookiecutter, creating a fashionable scaffold that my code could fit into. In essence, the PRs set up the trappings of a real package, assuring users (and myself) of a certain core competence in the development.

Testing, testing, one-two-three

PR #6, #16, and #17

Robust testing is part of a finished package. PR #6 introduced the initial code base and some rudimentary testing. In PR #16, I started using pytest, which made writing and using tests much easier. Finally, PR #17 integrated testing with GitHub CI, finalizing a critical component of my development workflow.

Cutting off radial distribution functions

PR #25

PR #25 was a science challenge in addition to a software challenge. I needed to robustly identify the first minima of noisy RDF data so I could automate the solvation analysis. My approach was to perform a polynomial interpolation of the RDF and then take the 2nd-derivative to find the minima. This worked well for smooth RDFs but failed for RDFs with noise. As a quick fix, I added in several heuristics to get the solvation distances identified on my noisy test data. For now, the RDF parser works reasonably well for smooth data and warns users when it fails. An interpolation of an RDF is shown below.

interpolation

Analysis: the mega-PR

PR #29

PR #29 is responsible for the majority of functionality in solvation_analysis. It implemented the Solution class and all of the analysis functions, reaching 250 comments from myself and my mentors. Much of the discourse focused on iterating through API designs, which eventually led to a very clean Solution-centered package. The underlying data structure uses the ‘tidy data’ concept, implementing a tidy Pandas.DataFrame that allows analyses to be easily vectorized. As shown below, the DataFrame is indexed by step in the analysis (frame), the solvated_atom from the solute, and the solvent atoms atom_id. This solution provides a unique identifier for each atom participating in solvation.

solvation dataframe

In pursuit of user number one

PR #36

PR #36 introduced a tutorial to walk users through the basic functionality of solvation_analysis and how to respond to common errors. If you are interested in using the solvation_analysis, this is definitely the best place to start!

Release and versioning

PR #37

I was shocked by how easy it is to release a package to PyPI. For this PR, I learned to use versioneer and twine for Python release management. With just a few terminal commands and file edits, I was able to set up a functioning release management system and put out version 0.1.1.

Solvation Analysis in Practice

Since solvation analysis is up on PyPI, it can be installed with a pip!

>>> pip install solvation_analysis

As in any MDAnalysis workflow, we start by importing the package and instantiating an MDAnalysis Universe.

# imports
import MDAnalysis as mda

# define paths to data
data = "../solvation_analysis/tests/data/bn_fec.data"
traj = "../solvation_analysis/tests/data/bn_fec_short_unwrap.dcd"

# instantiate Universe
u = mda.Universe(data, traj)

Next, we need to define the solute and solvents of interest. The example system is a LiPF6 electrolyte with BN and FEC as solvents. Here, we treat the Li+ ion as the solute and all other molecules as solvents.

# define solute AtomGroup
li_atoms = u.atoms.select_atoms("type 22")

# define solvent AtomGroups
PF6 = u.atoms.select_atoms("byres type 21")
BN = u.atoms.select_atoms("byres type 5")
FEC = u.atoms.select_atoms("byres type 19")

Once we have the universe, solute, and solvents defined, it is quite simple to instantiate and run a solution! Solution, like all MDA analysis classes, subclasses the AnalysisBase class and requires a Solution.run() call to initiate the analysis.

# instantiate solution
from solvation_analysis.solution import Solution

solution = Solution(li_atoms, {'PF6': PF6, 'BN': BN, 'FEC': FEC})

solution.run()

Over the course of a few seconds, the solvation radii and the solvation shell of each ion are automatically identified and stored in a pandas.DataFrame (as described above). We can then examine how well the solvation radii were characterized by calling solution.plot_solvation_radius('<<solvent name>>'). For example,

# we just need this to display our plot
import matplotlib.pyplot as plt

solution.plot_solvation_radius('BN')
plt.show()

bn plot

Now that we have a completed solution, we can easily print the ion pairing, coordination numbers, and solvation radii.

>>> solution.pairing.pairing_dict
{'BN': 1.0, 'FEC': 0.21, 'PF6': 0.12}
>>> solution.coordination.cn_dict
{'BN': 4.33, 'FEC': 0.25, 'PF6': 0.12}
>>> solution.radii
{'PF6': 2.60, 'BN': 2.61, 'FEC': 2.43}

We can see that the Li+ ions coordinate most strongly with BN. The pairing shows that 100% of Li+ are coordinated with BN and on average there are 4.33 BN coordinated to each Li+. Similar information is available for FEC and PF6.

We can also use solvation_analysis and a simple Pandas operation to identify the most common solvation shell compositions and find examples of them for visualization. Below, we are using the speciation class to find the most common compositions of solvation shells and then filtering out all shells with a frequency less than 2%.

solution.speciation.speciation_percent.query("count > 0.02")

speciation dataframe

We can pick out a specific examples of that shell for visualization. First we find all shells with a given composition.

# find all shells matching the given dictionary
solution.speciation.find_shells({'BN': 4, 'FEC': 0, 'PF6': 1})

selection dataframe

Then we select the solvation shell and use our favorite visualization package to view it!

# get the shell atoms
atoms = solution.solvation_shell(6, 0)

# visualization is not covered here. I suggest nglview!
visualize(atoms)

shell visualization

Using that approach, you can find all the common solvation shells and generate visualizations for each. For example, see this speciation plot of the BN:FEC system from the example. I’ve plotted the fraction of each solvation shell at two temperatures and then visualized examples of those shells in the margin.

speciation

Together, the tools described above are a powerful and convenient workflow for analyzing the solvation structure of a liquid. Several analyses are pre-implemented and the core solvation data structure makes it easy to create new methods. I sincerely hope that it is useful to the community!


The Future

PR #39 and #40

While GSoC has ended, development of solvation_analysis has not. PR #39 adds solvent correlation analysis, statistics on uncoordinated solvents, and a new Valency class. PR #40 introduces a tutorial on interactive visualization. These are all features I am excited for and look forward to developing. As a teaser, I’ll include a correlation plot below:

correlation


Acknowledgements

As a whole, the MDAnalysis was incredibly welcoming and helpful. I am genuinly inspired by the dedication and competence of the core developers of the package. I am especially grateful for the help of @hmacdope (Hugo), @richardjgowers (Richard), and @IAlibay (Irfan), who were excellent mentors and are generally talented and kind individuals.

I am also thankful for Google’s generous support of open source software.

It has been an amazing ride 🎢

@orioncohen