Blog

Luna Morrow GSoC 2024 Final Submission Blog

I can’t believe I am at the end of GSoC! These past 6 months have absolutely flown by, but I have learnt so much. I am really grateful for the amazing support I have had along the way from my mentors Hugo ( @hmacdope ), Cédric ( @cbouy ) and Xu ( @xhgchen ). My work can be found on the MDAnalysis GitHub at mda-openbabel-converter.

Why a converter

Direct interoperability between molecular dynamics software is critical for enabling collaboration, data transfer and straightforward use by scientists. OpenBabel is a popular toolbox for chemical molecular modelling research as it enables searching, conversions, analysis and data storage. In particular, OpenBabel can read and write over 100 chemical data file formats, so interconverting formats with it makes it possible to work with many other packages. Therefore, enabling an MDAnalysis Universe to be interconverted with an OpenBabel OBMol would greatly increase the data formats available to MDAnalysis. Furthermore, it would encourage greater adoption of MDAnalysis as an “all-in-one” package for molecular dynamics analysis.

From OpenBabel to MDAnalysis

The first aim of my GSoC project was to convert OpenBabel OBMols to MDAnalysis Universes. This required the extraction of both molecule and atom data, alongside their 3D positions. I hit many roadblocks in this section, with the two most difficult being the installation of the OpenBabel 3.1.1.1 Python bindings and understanding some of the more complex parts of the OpenBabel API. Around half of the coding period was spent writing the two classes for converting the atom attributes and positions respectively. Time was also committed to developing tests for both classes.

First, I created the OpenBabelParser class (converts the atom attributes) and some basic unit tests for it in #12. Once this PR was opened, it revealed that the CI was not set up correctly. With additional help from Irfan ( @IAlibay ), this was corrected in #13. Next, I made the OpenBabelReader class (converts the atom positions) and a whole suite of tests for it in #16, and then applied some of these ‘extended’ test conditions to the OpenBabelParser in #18. During this time, some issues were found with compatibility between attributes from OpenBabel and with accessing them. I opened an issue on the OpenBabel GitHub to obtain assistance (see here) and also opened an issue, #17, to flag an attribute that was proving difficult but will need to be supported later.

Documentation

Once the OpenBabelParser and OpenBabelReader were developed, I set up ReadTheDocs in #19 and wrote documentation for these two classes in #20. The documentation is readily available on ReadTheDocs here. It is functional but currently quite sparse, so I have created an issue (#21) to improve the landing page and getting started page once functionality is complete.

From MDAnalysis to OpenBabel

The next stage of this project was to implement the OpenBabelConverter class to convert an MDAnalysis Universe to an OpenBabel OBMol. Unfortunately, I ran out of time to implement this class within the GSoC period. There is an open issue detailing this class at #22, which I will be expanding on shortly. While my time with GSoC has ended, I am keen to complete this project in my own time and continue on with MDAnalysis as a developer.

Next Steps

I will be finishing the OpenBabelConverter, including tests and documentation. The next steps will then be to add the attributes that were left for later, develop the documentation further with worked examples, and deploy a release of this package. I would also like to integrate this converter into the MDAnalysis package so that it can be used alongside the other converters, as this would increase its visibility and usage.

What can we do now?

With what I have implemented by the end of the GSoC coding period, users can very easily convert an OBMol to an MDAnalysis Universe.

An example of how to use the converter for this is shown below:


from openbabel import openbabel as ob
import MDAnalysis as mda

# Read a PDB file into an OpenBabel OBMol
obconversion = ob.OBConversion()
obconversion.SetInFormat("pdb")
mol = ob.OBMol()
obconversion.ReadFile(mol, "1crn.pdb")

# Build an MDAnalysis Universe from the OBMol
u = mda.Universe(mol)

What I’ve Learned

Things I have learned during GSoC include:

  • the importance of good documentation
  • how to develop the ‘backbone’ of a Python package so it can be installed, tested and used
  • developing tests with pytest
  • managing myself and being able to make decisions about when something is out of scope or unviable to implement
  • how to use git and GitHub
  • developing classes that inherit from classes that were not developed by me

My experience

I have had an amazing time during GSoC. I had a lot of support and felt very welcomed and encouraged. I can’t wait for this converter to be up and running, so that the community can benefit from it. I am also really grateful for the experience and technical growth GSoC granted me, as I know it will be incredibly beneficial for my future in this field.

NumFOCUS Small Development Grant: Advancing Molecular Visualization with MolecularNodes

We are thrilled to announce that MDAnalysis has been awarded a Small Development Grant by NumFOCUS to enhance scientific molecular rendering with MolecularNodes in 2025. This initiative is a collaborative effort between Yuxuan Zhuang and Brady Johnston.

MolecularNodes facilitates the seamless import and visualization of structural biology data within Blender, leveraging Blender’s industry-leading visualization and animation tools. Molecular Nodes has garnered widespread excitement among scientists for its ability to create stunning and informative molecular visualizations. However, the current version of Molecular Nodes has an under-developed scripting interface, which limits the potential for automated molecular rendering. Our project aims to address these limitations by developing a robust API, enabling users to render molecular structures with straightforward and customizable code.

Project Overview

The development will proceed in three key stages:

  1. API Development: We will create a stable API for Molecular Nodes, empowering users to automate molecular rendering with minimal effort.

  2. Interactive Jupyter Integration: A Jupyter widget will be built to integrate with MDAnalysis, providing an interactive environment for controlling and rendering molecular objects directly within notebooks via Blender.

  3. Advanced Visualization Tools: We will develop tools for visualizing basic geometric features and even complex analysis results from MDAnalysis.

Be Part of the Process!

We invite you to join our Discord channel to share your ideas and feedback as we build these tools. If you’d like to be a beta user, let us know. Your input will help shape the future of Molecular Nodes! Stay tuned for updates and sneak peeks of our progress.

Thank you to NumFOCUS for supporting this exciting project!

Release 2.8.0 of MDAnalysis

We are happy to release version 2.8.0 of MDAnalysis!

This is a minor release of the MDAnalysis library, which means that it contains enhancements, bug fixes, deprecations, and other backwards-compatible changes.

However, in this case minor does not quite do justice to what is happening in this release, given that we have (at least) three big changes/additions:

  1. The license was changed to the GNU Lesser General Public License so that MDAnalysis can be used by packages under any license while keeping the source code itself free and protected.
  2. We introduce the Guesser API for guessing missing topology attributes such as element or mass in a context-dependent manner. Until release 3.0, you should not notice any differences but under the hood we are getting ready to make it easier to work with simulations in a different context (e.g., with the MARTINI force field or experimental PDB files). With consistent attributes, such as elements, it becomes a lot easier to interface with tools like the cheminformatics RDKit (via the converters).

    The guessers are the GSoC 2022 project of @aya9aladdin with help from @lilyminium, @IAlibay, and @jbarnoud. [1]

  3. We are introducing parallel analysis for tools in MDAnalysis.analysis following the simple split-apply-combine paradigm that we originally prototyped in PMDA [2]. What’s really exciting is that any analysis code that is based on MDAnalysis.analysis.base.AnalysisBase can enable parallelization with a few lines of extra code: all the hard work is done behind the scenes in the base class (in a way that is fully backwards compatible!).

    This new feature is the work of @marinegor who brought his GSoC 2023 project to completion, with great contributions by @p-j-smith, @yuxuanzhuang and @RMeli. [3]

    Not all MDAnalysis analysis classes have parallelization enabled yet, but @talagayev has already been working tirelessly to update GNMAnalysis, BAT, Dihedral, Ramachandran, Janin, DSSP (yes, MDAnalysis has finally got DSSP, based on pydssp, also thanks to @marinegor) and HydrogenBondAnalysis, in addition to RMSD.
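
To make the split-apply-combine idea from point 3 more concrete, here is a minimal sketch in plain Python. It is not the MDAnalysis implementation and does not use the MDAnalysis API: the function names (split, analyse_chunk, run_parallel) and the toy "trajectory" are made up for illustration, and a thread pool stands in for whichever parallel backend a real analysis would use.

```python
from concurrent.futures import ThreadPoolExecutor


def split(n_frames, n_workers):
    """Split: divide frame indices 0..n_frames-1 into contiguous chunks."""
    base, extra = divmod(n_frames, n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        stop = start + base + (1 if i < extra else 0)
        chunks.append(range(start, stop))
        start = stop
    return chunks


def analyse_chunk(trajectory, frames, single_frame):
    """Apply: run the per-frame function on one chunk of frames."""
    return [single_frame(trajectory[i]) for i in frames]


def run_parallel(trajectory, single_frame, n_workers=2):
    """Split the frames, analyse each chunk in a worker, then combine."""
    chunks = split(len(trajectory), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial = pool.map(
            lambda frames: analyse_chunk(trajectory, frames, single_frame),
            chunks,
        )
        # Combine: concatenate the partial results in frame order.
        return [value for part in partial for value in part]


# Toy "trajectory": each frame is a list of 1D coordinates (made-up numbers).
trajectory = [[0.0, 1.0], [1.0, 3.0], [2.0, 2.0], [4.0, 0.0], [5.0, 5.0]]


def mean_position(frame):
    """Per-frame analysis: the mean coordinate of the frame."""
    return sum(frame) / len(frame)


serial = [mean_position(frame) for frame in trajectory]
parallel = run_parallel(trajectory, mean_position, n_workers=2)
print(parallel == serial)
```

The key property is that each frame is analysed independently, so the combined result matches what a serial loop over all frames would give.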

Read on for more details on the license change and the usual information on supported environments, upgrading your version of MDAnalysis, and a summary of the most important changes.

License change to LGPL

This is the first release of MDAnalysis under the Lesser General Public License. We have been working towards this license change for the last 3 years; this release (almost) concludes the process that we described in our licensing update blog post.

  • All code is now under the LGPLv2.1 license or any later version.
  • The package is under the LGPLv3 license or any later version. However, once we have removed the dependencies that currently prevent licensing under LGPLv2.1+, we will also license the package under the same LGPLv2.1+ as the code itself.

We would like to thank all our contributors who granted us permission to change the license. We would also like to thank a number of institutions that were especially supportive of our open source efforts, namely Arizona State University, Australian National University, Johns Hopkins University, and the Open Molecular Science Foundation. We are also grateful to NumFOCUS for legal support. The relicensing team was led by @IAlibay and @orbeckst.

Supported environments

The minimum required NumPy version is 1.23.3; MDAnalysis now builds against NumPy 2.0.

Supported Python versions: 3.10, 3.11, 3.12, 3.13. Support for version 3.13 has been added in this release and support for 3.9 has been dropped (following SPEC 0).

Please note that Python 3.13 support is limited to PyPI for now; the conda-forge channel installs only provide support for Python 3.10 to 3.12.

Supported Operating Systems:

Upgrading to MDAnalysis version 2.8.0

To update with mamba (or conda) from the conda-forge channel run

mamba update -c conda-forge mdanalysis

To update from PyPI with pip run

python -m pip install --upgrade MDAnalysis

For more help with installation see the installation instructions in the User Guide. Make sure you are using a Python version compatible with MDAnalysis before upgrading (Python >= 3.10).
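
Before upgrading, a quick check along the following lines can confirm that the running interpreter falls in the supported range listed above (3.10 to 3.13). This is only a convenience sketch: the helper name python_is_supported and the two constants are made up here and are not part of MDAnalysis.

```python
import sys

# Supported Python versions for MDAnalysis 2.8.0, as listed in this post.
MIN_PYTHON = (3, 10)
MAX_PYTHON = (3, 13)


def python_is_supported(version=None):
    """Return True if (major, minor) falls in the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


print("Python", sys.version.split()[0], "supported:", python_is_supported())
```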

Notable changes

For a full list of changes, bugfixes and deprecations see the CHANGELOG.

Enhancements:

  • Added guess_TopologyAttrs() API to the Universe to handle attribute guessing (PR #3753)
  • Added the DefaultGuesser class, which is a general-purpose guesser with the same functionalities as the existing guesser.py methods (PR #3753)
  • Introduce parallelization API to AnalysisBase and to analysis.rms.RMSD class (Issue #4158, PR #4304)
  • Add analysis.DSSP module for protein secondary structure assignment, based on pydssp
  • Improved performance of PDBWriter (Issue #2785, PR #4472)
  • Added parsing of arbitrary columns to the LAMMPS dump parser. (Issue #3504)
  • Implement average structures with iterative algorithm from DOI 10.1021/acs.jpcb.7b11988. (Issue #2039, PR #4524)
  • Add support for TPR files produced by Gromacs 2024.1 (PR #4523)
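
To illustrate what "context-dependent" guessing means for the Guesser API entries above, here is a toy sketch in plain Python. It does not use or reproduce MDAnalysis code: the registry GUESSERS, the context names "protein" and "ion", and both rules are hypothetical and deliberately simplistic. The point is only that the same atom name (here "CA") can call for a different guess depending on the context.

```python
import re


def _letters(name):
    """Drop digits and other non-letters from an atom name."""
    return re.sub(r"[^A-Za-z]", "", name)


def guess_protein_element(name):
    """Toy rule for protein atoms: the first letter is the element."""
    return _letters(name)[:1].upper()


def guess_ion_element(name):
    """Toy rule for monatomic ions: the whole name is the element symbol."""
    return _letters(name).capitalize()


# Hypothetical registry mapping a context name to its guessing rule.
GUESSERS = {"protein": guess_protein_element, "ion": guess_ion_element}


def guess_elements(names, context):
    """Guess one element per atom name using the rule for `context`."""
    guess = GUESSERS[context]
    return [guess(name) for name in names]


# "CA" is an alpha carbon in a protein but calcium as an ion.
print(guess_elements(["N", "CA", "HB1"], context="protein"))
print(guess_elements(["CA", "NA"], context="ion"))
```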

Fixes:

  • Fix Bohrium (Bh) atomic mass in tables.py (PR #3753)
  • Catch higher dimensional indexing in GroupBase & ComponentBase (Issue #4647)
  • Do not raise an Error reading H5MD files with datasets like observables/<particle>/<property> (part of Issue #4598, PR #4615)
  • Fix failure in double-serialization of TextIOPicklable file reader. (Issue #3723, PR #3722)
  • Fix failure to preserve modification of coordinates after serialization, e.g. with transformations (Issue #4633, PR #3722)
  • Fix PSFParser error when encountering string-like resids (Issue #2053, Issue #4189, PR #4582)
  • Convert openmm Quantity to raw value for KE and PE in OpenMMSimulationReader.
  • Atomname methods can handle empty groups (Issue #2879, PR #4529)
  • Fix bug in PCA preventing use of frames=... syntax (PR #4423)
  • Fix analysis/diffusionmap.py to iterate over self._sliced_trajectory instead of the full trajectory, hence supporting DistanceMatrix.run(frames=...) (PR #4433)

Changes:

  • Relicense code contributions from GPLv2+ to LGPLv2.1+ and the package from GPLv3+ to LGPLv3+ (PR #4794)
  • Only use distopia < 0.3.0 due to API changes (Issue #4739)
  • The fetch_mmtf method has been removed as the REST API service for MMTF files has ceased to exist (Issue #4634)
  • MDAnalysis now builds against numpy 2.0 rather than the minimum supported numpy version (PR #4620)

Deprecations:

  • Deprecations of old guessing functionality (in favor of the new Guesser API)
    • MDAnalysis.topology.guessers is deprecated in favour of the new Guessers API and will be removed in version 3.0 (PR #4752)
    • The guess_bonds, vdwradii, fudge_factor, and lower_bound kwargs are deprecated for bond guessing during Universe creation. Instead, pass ("bonds", "angles", "dihedrals") into to_guess or force_guess during Universe creation, and the associated vdwradii, fudge_factor, and lower_bound kwargs into Guesser creation. Alternatively, if vdwradii, fudge_factor, and lower_bound are passed into Universe.guess_TopologyAttrs, they will override the previous values of those kwargs. (Issue #4756, PR #4757)
    • MDAnalysis.topology.tables is deprecated in favour of MDAnalysis.guesser.tables and will be removed in version 3.0 (PR #4752)
    • Element guessing in the ITPParser is deprecated and will be removed in version 3.0 (Issue #4698)
    • Unknown masses are still set to 0.0 in the current version; this will change in version 3.0.0, where they will be replaced by the Masses “no_value_label” attribute (np.nan) (PR #3753)
  • A number of analysis modules have been moved into their own MDAKits, following the 3.0 roadmap towards a trimmed down core library. Until release 3.0, these modules are still available through MDAnalysis.analysis (either as an import of the MDAKit as an automatically installed dependency of the MDAnalysis package or as the original code) but from 3.0 onwards, users must install the MDAKit explicitly and then import it by themselves.

    • The MDAnalysis.analysis.encore module has been deprecated in favour of the mdaencore MDAKit and will be removed in version 3.0.0 (PR #4737)
    • The MDAnalysis.analysis.waterdynamics module has been deprecated in favour of the waterdynamics MDAKit and will be removed in version 3.0.0 (PR #4404)
    • The MDAnalysis.analysis.psa module has been deprecated in favour of the PathSimAnalysis MDAKit and will be removed in version 3.0.0 (PR #4403)
  • The MMTF Reader is deprecated and will be removed in version 3.0 as the MMTF format is no longer supported (Issue #4634).
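
The deprecation note on unknown masses above matters for any mass-weighted quantity. The following plain-Python sketch (not MDAnalysis code; the center_of_mass helper and all numbers are made up for illustration) shows the difference between storing an unknown mass as 0.0 and storing it as NaN.

```python
import math


def center_of_mass(positions, masses):
    """Mass-weighted mean of 1D positions."""
    total = sum(masses)
    return sum(m * x for m, x in zip(masses, positions)) / total


positions = [0.0, 1.0, 2.0]  # made-up 1D coordinates

# Current behaviour: the unknown third mass is stored as 0.0.
com_zero = center_of_mass(positions, [12.0, 12.0, 0.0])

# Planned behaviour: the unknown mass is stored as NaN instead.
com_nan = center_of_mass(positions, [12.0, 12.0, math.nan])

print(com_zero)             # a plausible-looking number that ignores atom 3
print(math.isnan(com_nan))  # the missing mass propagates and is easy to spot
```

With 0.0 the third atom is silently dropped from the average, whereas NaN propagates into the result and makes the missing information visible.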

Author statistics

This release was the work of 22 contributors, 10 of whom are new contributors.

Our new contributors are:

Acknowledgements

MDAnalysis thanks NumFOCUS for its continued support as our fiscal sponsor and the Chan Zuckerberg Initiative for supporting MDAnalysis under EOSS4 and EOSS5 awards.

@IAlibay (release manager) on behalf of the MDAnalysis Team


  1. The Guesser API was a big undertaking: her merged PR #3753 totalled 668 (!) comments. We look forward to the community providing Guessers for specific contexts.

  2. Shujie Fan, Max Linke, Ioannis Paraskevakos, Richard J. Gowers, Michael Gecht, and Oliver Beckstein. PMDA - Parallel Molecular Dynamics Analysis. In Chris Calloway, David Lippa, Dillon Niederhut, and David Shupe, editors, Proceedings of the 18th Python in Science Conference, 134 – 142. Austin, TX, 2019. SciPy. doi:10.25080/Majora-7ddc1dd1-013

  3. Adding parallelization in a transparent manner was quite a difficult undertaking that touched many parts of the analysis code and required a lot of thought. Feedback is very welcome! See PR #4162 with 713 (!) comments.