Blog

MDAnalysis 2.0 is here

We are happy to release version 2.0.0 of MDAnalysis!

This is a new major version and includes major API-breaking changes in addition to a large number of updates and enhancements. This release concludes our current roadmap; an updated roadmap detailing our future steps will be published shortly.

This version supports Python 3.6 through to 3.9 on Windows (32- and 64-bit), MacOS, and Linux. It also offers preliminary support for ppc64le and ARM64, but does not currently support Apple M1 processors.

Support for legacy versions of Python (e.g. 2.7 and 3.5) is only available in MDAnalysis 1.x releases.

Upgrading to MDAnalysis version 2.0.0

To install with conda from the conda-forge channel run

conda update -c conda-forge mdanalysis

To install from PyPI with pip run

pip install --upgrade MDAnalysis

For more help with installation see the installation instructions in the User Guide.
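
After upgrading, it can be useful to confirm which version is actually active in the current environment. The snippet below is a minimal sketch that uses only the standard library. The helper names installed_version and major_version are ours, and it assumes the package exposes a __version__ attribute, as MDAnalysis does.

```python
import importlib
import importlib.util


def installed_version(package_name):
    """Return the package's __version__ string, or None if it cannot be found."""
    if importlib.util.find_spec(package_name) is None:
        return None
    module = importlib.import_module(package_name)
    return getattr(module, "__version__", None)


def major_version(version_string):
    """Extract the leading integer of a dotted version string such as '2.0.0'."""
    return int(version_string.split(".")[0])


version = installed_version("MDAnalysis")
if version is None:
    print("MDAnalysis is not installed in this environment")
else:
    print("MDAnalysis", version, "(major version", str(major_version(version)) + ")")
```

A major version of 1 here means the environment is still on the legacy 1.x series.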

Notable new additions

There were many awesome contributions to MDAnalysis for version 2.0.0. Here we list a few notable examples. For more information please see the CHANGELOG.

RDKit converter

Google Summer of Code student @cbouy implemented a new RDKit converter which allows for seamless conversions from RDKit to MDAnalysis structures. See @cbouy’s GSoC RDKit report for more information.

The converter currently handles 90% of chemical space accurately (based on ChEMBL27), and future changes should improve on this.

OpenMM converter

@ahy3nz implemented an OpenMM converter which can read OpenMM objects directly into MDAnalysis.

Universe serialization

Google Summer of Code student @yuxuanzhuang implemented a means to serialize MDAnalysis.Universe objects, paving the way for parallel analysis. See @yuxuanzhuang’s GSoC Universe serialization report for more information.
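
As a rough illustration of what serialization enables, a Universe can be passed through Python's pickle module, which is also what multiprocessing uses to send objects to worker processes. The helpers below are a sketch, not part of MDAnalysis: roundtrip and universe_survives_pickling are hypothetical names, and the topology and trajectory arguments stand for whatever files you have at hand.

```python
import pickle


def roundtrip(obj):
    """Serialize obj with pickle and rebuild an independent copy from the bytes."""
    return pickle.loads(pickle.dumps(obj))


def universe_survives_pickling(topology, trajectory):
    """Pickle a Universe and check that the copy has the same number of atoms.

    MDAnalysis (>= 2.0) is imported lazily so that roundtrip() above can be
    used without it.
    """
    import MDAnalysis as mda

    universe = mda.Universe(topology, trajectory)
    copy = roundtrip(universe)
    return copy.atoms.n_atoms == universe.atoms.n_atoms


# The same round trip works for any picklable object:
print(roundtrip({"frames": [0, 1, 2]}))
```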

Mean Square Displacement

@hmacdope implemented a new Mean Square Displacement analysis class.
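
For readers unfamiliar with the quantity, the mean square displacement at a given lag time is the squared distance a particle has travelled over that lag, averaged over particles and time origins. The function below is a plain-Python sketch of that definition only. It is not the MDAnalysis implementation, and the name mean_square_displacement and the nested-list input layout are assumptions made for the example.

```python
def mean_square_displacement(positions):
    """Return msd[lag] averaged over all particles and all time origins.

    positions is a list of frames; each frame is a list of (x, y, z) tuples,
    one per particle, with coordinates that are not wrapped into the box.
    """
    n_frames = len(positions)
    n_particles = len(positions[0])
    msd = []
    for lag in range(n_frames):
        total = 0.0
        count = 0
        for start in range(n_frames - lag):
            for particle in range(n_particles):
                before = positions[start][particle]
                after = positions[start + lag][particle]
                total += sum((b - a) ** 2 for a, b in zip(before, after))
                count += 1
        msd.append(total / count)
    return msd


# One particle moving at constant velocity along x: the MSD grows as lag squared.
trajectory = [[(float(t), 0.0, 0.0)] for t in range(5)]
print(mean_square_displacement(trajectory))  # [0.0, 1.0, 4.0, 9.0, 16.0]
```

This direct double loop scales quadratically with the number of frames, so it is only suitable for short trajectories.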

Results class

@PicoCentauri implemented a Results class to store results from analysis classes. This change streamlines the API for analysis classes and paves the way for a soon-to-be-released command line interface for MDAnalysis.
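
The idea of a results container can be sketched as a dictionary whose entries are also reachable as attributes. The class below is an illustration of that pattern only. AttrResults is a hypothetical name, and the real Results class in MDAnalysis may differ in its details.

```python
class AttrResults(dict):
    """A dict whose keys can also be read and written as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value


results = AttrResults()
results.timeseries = [0.0, 1.0, 4.0]   # attribute-style write
print(results["timeseries"])           # dictionary-style read
print(sorted(results))                 # all stored result names
```

Keeping every output in one container gives tools a single place to look for whatever an analysis produced.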

H5MD support

NSF REU student @edisj implemented both a reader and writer for the H5MD format.

Bond-Angle-Torsion coordinates

@daveminh implemented a means to translate from Cartesian to Bond-Angle-Torsion coordinates.
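
To make the terminology concrete, the three internal coordinates for a chain of four atoms can be computed from Cartesian positions as below. This is a geometry sketch in plain Python, not the MDAnalysis class, which handles whole molecules. The function names are ours and the angles are returned in degrees.

```python
import math


def _sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))


def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def bond(p0, p1):
    """Distance between two atoms."""
    return _norm(_sub(p1, p0))


def angle(p0, p1, p2):
    """Angle at p1 between the bonds p1-p0 and p1-p2, in degrees."""
    u, v = _sub(p0, p1), _sub(p2, p1)
    cosine = _dot(u, v) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def torsion(p0, p1, p2, p3):
    """Dihedral angle about the p1-p2 bond, in degrees."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = _dot(_cross(n1, n2), b2) / _norm(b2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


atoms = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
print(bond(atoms[0], atoms[1]), angle(*atoms[:3]), torsion(*atoms))
```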

Important fixes

For a full list of bugfixes see the CHANGELOG. The following are selected fixes for issues that may have led to wrong results depending on your use case.

Core

  • Fixes an issue where select_atoms, AtomGroup.unique, ResidueGroup.unique, and SegmentGroup.unique did not sort the output atoms (Issues #3364 #2977)
  • Fixes the sometimes wrong sorting of atoms into fragments when unwrapping (Issue #3352)
  • AtomGroup.center now works correctly for compounds + unwrapping (Issue #2984)

File formats

  • Fixes support for DL_POLY HISTORY files that contain cell information even if there are no periodic boundary conditions (Issue #3314)
  • PDBWriter will use chainID instead of segID (Issue #3144)
  • PDBParser and PDBWriter now assign and use the element attribute (Issues #2422 #2423)

Analyses

  • Fixes issue when attempting to use/pass mean positions to PCA analysis (Issue #2728)
  • Fixes issue with WaterBridgeAnalysis double counting waters (Issue #3119)
  • Documents and fixes the density keyword for rdf.InterRDF_s (Issue #2811)
  • Fixed Janin analysis residue filtering, including CYSH (Issue #2898)

Changes to functionality

  • Deprecated hbonds.hbond_analysis has been removed in favour of hydrogenbonds.hbond_analysis (Issues #2739, #2746)
  • TPRParser now loads TPR files with tpr_resid_from_one=True by default, which starts TPR resid indexing from 1 (instead of 0 as in previous MDAnalysis versions) (Issue #2364, PR #3152)
  • analysis.hole has now been removed in favour of analysis.hole2.hole (Issue #2739)
  • Writer.write(Timestep) and Writer.write_next_timestep have been removed. Please use write() instead (Issue #2739)
  • Removes deprecated density_from_Universe, density_from_PDB, Bfactor2RMSF, and notwithin_coordinates_factory from MDAnalysis.analysis.density (Issue #2739)
  • Removes deprecated waterdynamics.HydrogenBondLifetimes (PR #2842)
  • hbonds.WaterBridgeAnalysis has been moved to hydrogenbonds.WaterBridgeAnalysis (Issue #2739 PR #2913)
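
The TPR change is the one most likely to silently shift results in existing scripts, because hard-coded residue numbers now point one residue further along. The sketch below shows two ways to cope, under stated assumptions: load_tpr and to_one_based are hypothetical helper names, and the only MDAnalysis feature used is the tpr_resid_from_one keyword named above, passed through Universe.

```python
def load_tpr(path, legacy_zero_based=False):
    """Load a TPR file, optionally keeping the pre-2.0 zero-based resids.

    MDAnalysis is imported lazily so that to_one_based() below can be used
    without it.
    """
    import MDAnalysis as mda

    return mda.Universe(path, tpr_resid_from_one=not legacy_zero_based)


def to_one_based(zero_based_resids):
    """Translate resids written for the old default to the new numbering."""
    return [resid + 1 for resid in zero_based_resids]


old_resids = [0, 1, 2, 41]
print(to_one_based(old_resids))  # [1, 2, 3, 42]
```

Updating the numbers in the script is usually preferable to keeping the legacy flag, since the new numbering matches what other GROMACS tools report.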

Deprecations

  • The bfactors attribute is now aliased to tempfactors and will be removed in 3.0.0 (Issue #1901)
  • WaterBridgeAnalysis.generate_table() now returns table information, with the table attribute being deprecated
  • Various analysis result attributes which are now stored in Results will be deprecated in 3.0.0 (Issue #3261)
  • In 3.0.0 the ParmEd classes will only be accessible from the MDAnalysis.converters module
  • In 2.1.0 the TRZReader will default to a dt of 1.0 ps when failing to read it from the input TRZ trajectory

Author statistics

Altogether, this represents the work of 35 authors, 17 of whom were new contributors.

MDAnalysis thanks NumFOCUS for its continued support as the organisation’s fiscal sponsor.

  • The MDAnalysis Team

Google Summer of Code Students 2021

We are happy to announce that MDAnalysis is hosting two GSoC students this year – @ojeda-e, and @orioncohen. MDAnalysis has been accepted as its own organization with GSoC for a second year running and we are grateful to Google for granting us two student slots for two exciting projects. Both the students and mentors have a very exciting few months ahead!

Estefania Barreto-Ojeda: Curvature analysis of biological membranes


Interested in contributing to an open-source initiative, Estefania will expand the capabilities of MDAnalysis with a new module that calculates membrane curvature, so that curvature profiles of protein-membrane and membrane-only systems from Molecular Dynamics (MD) simulations can be derived and visualized. With this analysis module, users will be able to rapidly extract the mean and Gaussian curvature of biological membranes and visualize them in 2D profile maps.

Estefania is a Ph.D. candidate in Biophysical Chemistry at The University of Calgary, Canada in the research group of Peter Tieleman. Her research work is focused on membrane curvature induced by ABC transporters, a superfamily of transmembrane proteins involved in cancer and antibiotic resistance. A typical day for Estefania includes running Coarse-Grained (CG) MD simulations using the Martini force field, reading literature on ABCs, and working on cool data visualization workflows. In her free time, she enjoys camping and road tripping in the Canadian Rocky Mountains and going for long bike rides.

Estefania can be found on GitHub as @ojeda-e and on Twitter as @ebojeda.

Her journey will be documented on the blog Le Mirroir.

Orion Cohen: A Solvation Module for MDAnalysis


The macroscopic behavior of a liquid is determined by its microscopic structure. For ionic systems, like batteries and many enzymes, the solvation environment surrounding ions is especially important. By studying the solvation of interesting materials, scientists can better understand, engineer, and design new technologies. The aim of this project is to implement a robust and cohesive set of methods for solvation analysis that would be widely useful in both biomolecular and battery electrolyte simulations. The core of the solvation module will be a set of functions for easily working with ionic solvation shells. Building from that core functionality, the module will implement several analysis methods for analyzing ion pairing, ion speciation, residence times, and shell association and dissociation.

Orion is a Ph.D. student at the University of California Berkeley, working with Dr. Kristin Persson at the Lawrence Berkeley National Laboratory. His work leverages high-throughput chemical simulations and machine learning to discover new materials for Lithium-ion batteries. Orion is passionate about making science more accessible, reproducible, and efficient with powerful open-source software. If he isn’t toiling at his computer, Orion is probably playing board games, camping, or relaxing in the temperate Berkeley sun.

Orion is on GitHub as @orioncohen and on Twitter as @orion__archer.

He will be sharing his experience with GSoC on his website.

@richardjgowers @IAlibay @fiona-naughton @orbeckst @lilyminium @hmacdope (mentors)

Version 1

We are happy to release version 1 of MDAnalysis!

MDAnalysis 1.x is a stable legacy platform that supports

  • Python 2.7 (on Linux and MacOS only),
  • Python 3.5, 3.6, 3.7 and 3.8 (on Windows (32- and 64-bit), MacOS, and Linux).

The API will not change for any upcoming 1.x versions. This release completes the first phase of our roadmap. From here on, we will focus on developing version 2 of MDAnalysis, which will support Python 3.6+ only, and will include major API changes and improvements.

The 1.0.0, 1.0.1 and 1.1.1 versions of MDAnalysis are the product of more than 18 months of effort. They contain a multitude of fixes, deprecations, and new features. We highlight important changes below, but users are strongly encouraged to read the CHANGELOG for full details. All users are recommended to upgrade to 1.1.1, as it includes important fixes.

Upgrading to MDAnalysis version 1.1.1

In Dependencies we list the required dependencies of MDAnalysis version 1; when installing with pip or conda, the correct dependencies should be automatically pulled in by the package manager.

To install with conda from the conda-forge channel run

conda update -c conda-forge mdanalysis

(Note that for Python 2.7 and 3.5, no conda packages are currently available due to the difficulty of building packages for legacy Python versions. Please install with pip.)

To install from PyPI with pip run

pip install --upgrade MDAnalysis

(This will likely build the package from source, so you need the appropriate compilers.)

For more help with installation see the Installation instructions in the User Guide.

Notable new additions

We now support several new formats:

We have also added converters to and from other popular analysis packages. We plan to expand in this exciting direction in future versions, as laid out in our interoperability roadmap. For now, we support:

New additions to analysis include:

  • frames and times attributes to AnalysisBase to capture the frames and times that the analysis was run() on. This is accessible to all analyses that subclass AnalysisBase.
  • a correlations module for computing the discrete autocorrelation function
  • a new HydrogenBondAnalysis class for improved and more efficient analysis of hydrogen bonds, which replaces the now deprecated hbond_analysis code
  • an AverageStructure class for computing the average structure of a trajectory out of memory
  • a hole2 module for improved interfacing with the HOLE2 program, which replaces the now deprecated hole module
  • a DensityAnalysis class for improved density analysis, replacing the now deprecated density_from_Universe() code
  • a method to compute the root-mean-square-inner-product of subspaces
  • a method to calculate the cumulative overlap of a vector in a subspace
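
The last two quantities in this list have compact definitions, sketched here in plain Python under the usual convention that each subspace is given as a list of orthonormal vectors. The function names mirror the descriptions above but the code is our own illustration, not the MDAnalysis implementation.

```python
import math


def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def rmsip(subspace_a, subspace_b):
    """Root mean square inner product of two sets of orthonormal vectors.

    1.0 means the subspaces coincide, 0.0 means they are orthogonal.
    """
    total = sum(_dot(a, b) ** 2 for a in subspace_a for b in subspace_b)
    return math.sqrt(total / len(subspace_a))


def cumulative_overlap(vector, subspace):
    """How much of a single vector is captured by the vectors of a subspace."""
    norm_v = math.sqrt(_dot(vector, vector))
    total = 0.0
    for basis_vector in subspace:
        norm_b = math.sqrt(_dot(basis_vector, basis_vector))
        total += (_dot(vector, basis_vector) / (norm_v * norm_b)) ** 2
    return math.sqrt(total)


xy_plane = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
yz_plane = [(0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
print(rmsip(xy_plane, xy_plane))                     # identical subspaces
print(rmsip(xy_plane, yz_plane))                     # one shared direction
print(cumulative_overlap((1.0, 1.0, 0.0), xy_plane)) # vector lies in the plane
```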

Other additions to core functionality include:

Miscellaneous performance improvements include:

  • Dihedral selection in the Ramachandran class has been sped up ~700x.
  • TPR parsing has been sped up 2–30x.

Notable improvements

We have improved the flexibility of our atom selection language, allowing for advanced pattern matching operators.

For example, we now support ? for single character matching, so using resname T?R in a selection string for a protein would yield both residues THR and TYR. More information can be found in our selection documentation.
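
The matching rule can be tried out without loading a structure. The sketch below assumes the selection wildcards behave like shell-style patterns and uses Python's fnmatch module as a stand-in; match_names is a hypothetical helper, not an MDAnalysis function.

```python
from fnmatch import fnmatchcase


def match_names(names, pattern):
    """Return the names that match a shell-style pattern, preserving order."""
    return [name for name in names if fnmatchcase(name, pattern)]


residue_names = ["THR", "TYR", "TRP", "ALA", "THRX"]
print(match_names(residue_names, "T?R"))  # ['THR', 'TYR']
```

Here ? stands for exactly one character, so TRP (wrong last letter) and THRX (too long) are both rejected.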

Notable changes to analysis include:

  • The argument order of AnalysisFromFunction is now as specified in the documentation
  • The select keyword has been standardized by removing selection, atomselection, and ref_select in the contacts, gnm, helanal, hole, encore, and hydrogenbonds modules
  • The save() functions have been removed from contacts, diffusionmap, hole, lineardensity, and rms modules
  • Progress bars have been replaced with an improved version from tqdm
  • The radius_cut_q method has been added to contacts.Contacts

Other notable improvements to the core functionality include:

  • AtomGroup.guess_bonds now uses periodic boundary information when available
  • The TPRParser now supports GROMACS 2020
  • When reading PDB and XYZ files, MDAnalysis now adds an elements attribute if the provided elements are valid

Important fixes

For the full list of fixes please see the CHANGELOG. The following is a selection of fixes for issues that could either have led to wrong results or were often reported by users as problematic:

Core

  • Neighbor searching, which is a fundamental component of many analyses in MDAnalysis (such as hydrogenbonds and RDF calculation) had a number of bugs in 1.0.0 that could lead to wrong results, in particular with triclinic unit cells. The buggy code was disabled in 1.0.1 and fixed in 1.1.1. See issues #2229, #2345, #2919, #2670, #2930 for details.
  • Fixed a SegmentationFault for the selection “around 0.0 SELECTION” (Issue #2656)
  • AtomGroup.center() now works correctly for compounds and unwrapping (Issue #2984)
  • When bonds are guessed from distances (AtomGroup.guess_bonds), periodic boundary information is properly taken into account. Bonds that were split across the periodic boundary would not have been guessed correctly previously. (Issue #2350)
  • The testsuite no longer fails with newer versions of matplotlib (Issue #2191)
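
Both the neighbor search and the bond guessing fixes come down to measuring distances across periodic boundaries. For the simplest case, an orthorhombic box, the minimum image convention can be sketched as follows. This is our own illustration with a hypothetical function name; triclinic cells, where the neighbor search bugs appeared, need a more careful treatment than this.

```python
import math


def minimum_image_distance(a, b, box_lengths):
    """Shortest distance between a and b in a periodic orthorhombic box."""
    total = 0.0
    for ai, bi, length in zip(a, b, box_lengths):
        delta = bi - ai
        # Shift by whole box lengths so that |delta| is at most length / 2.
        delta -= length * round(delta / length)
        total += delta * delta
    return math.sqrt(total)


box = (10.0, 10.0, 10.0)
left, right = (0.5, 0.0, 0.0), (9.5, 0.0, 0.0)
print(minimum_image_distance(left, right, box))  # 1.0, not 9.0
```

Two atoms on opposite faces of the box are therefore close neighbors, which is exactly the case a bond guesser without periodic information would miss.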

File formats

  • PDB files
    • Better handling of cryo-electron microscopy box dimensions in PDB files:
      • When a PDB file is read, a cryo-em 1 Å³ default CRYST1 record will be interpreted as “no dimensions” and the box dimension in MDAnalysis is set to None (Issue #2599)
      • When box dimensions are missing (u.dimensions is None or np.zeros(6)) then a unitary CRYST1 record (cubic box with sides of 1 Å) is written (Issue #2679)
    • PDB files no longer lose chainIDs when reading files without segIDs (Issue #2389)
    • PDBWriter now uses last character of segid as ChainID (Issue #2224)
  • In GRO files, unit cells with box vectors larger than 1000 nm are now correctly handled (Issue #2371)
  • Reading of XTC and TRR files will no longer fail with an IOError when the hidden offset files cannot be read; instead, the offsets are recalculated from the trajectory (Issue #1893)
  • Masses and charges in HooMD XML files are now correctly read (Issue #2888)
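
The two CRYST1 rules above mirror each other, which a short round trip makes easier to see. The sketch below follows the fixed-column layout of the PDB CRYST1 record. The function names and the hard-coded P 1 space group are assumptions for the example, and the actual MDAnalysis reader and writer are more thorough.

```python
UNITARY_BOX = [1.0, 1.0, 1.0, 90.0, 90.0, 90.0]


def box_for_writing(dimensions):
    """Fall back to the unitary box when no usable dimensions are available."""
    if dimensions is None or all(value == 0 for value in dimensions):
        return UNITARY_BOX
    return list(dimensions)


def format_cryst1(dimensions):
    """Build a CRYST1 line: three lengths (9.3f) and three angles (7.2f)."""
    box = box_for_writing(dimensions)
    template = "CRYST1{:9.3f}{:9.3f}{:9.3f}{:7.2f}{:7.2f}{:7.2f} P 1           1"
    return template.format(*box)


def parse_cryst1(line):
    """Read a CRYST1 line; the unitary placeholder box means 'no dimensions'."""
    columns = [(6, 15), (15, 24), (24, 33), (33, 40), (40, 47), (47, 54)]
    box = [float(line[start:stop]) for start, stop in columns]
    return None if box == UNITARY_BOX else box


print(format_cryst1(None))
print(parse_cryst1(format_cryst1(None)))                       # None
print(parse_cryst1(format_cryst1([50, 60, 70, 90, 90, 90])))
```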

Analysis

  • PCA analysis:
    • PCA(align=True) now correctly aligns the trajectory and computes the correct means and covariance matrix (Issue #2561)
    • Specifying n_components now correctly selects the PCA components (Issue #2623)
  • Contact Analysis class respects PBC (Issue #2368)

Deprecations

This release brings several deprecations as the package heads towards version 2.0.0.

The following parts of the analysis code will be removed/changed in version 2.0.0:

  • analysis.hole is deprecated in favor of analysis.hole2.
  • analysis.hbonds.HydrogenBondsAnalysis is deprecated in favor of analysis.hydrogenbonds.hbond_analysis.
  • analysis.density.density_from_Universe() is deprecated in favor of analysis.density.DensityAnalysis.
  • The notwithin_coordinates_factory() and density_from_PDB() methods of analysis.density are deprecated.
  • analysis.waterdynamics.HydrogenBondLifetimes is deprecated in favor of analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis.lifetime() (to be implemented in version 2.0.0)
  • analysis.leaflets.LeafletFinder() will no longer accept a filename; in 2.0.0 only Universes will be supported as inputs.
  • analysis.helanal is deprecated and will be replaced by analysis.helix_analysis in 2.0.0.
  • analysis.hbonds.WaterBridgeAnalysis will be moved to analysis.hydrogenbonds.WaterBridgeAnalysis.

The following parts of the readers/writers will be removed/changed in version 2.0.0:

  • Writer.write_next_timestep() is deprecated in favor of Writer.write().
  • Passing Timestep objects to Writer.write() is deprecated. In 2.0.0 only Universe or AtomGroup objects will be accepted.
  • The way in which the NCDFWriter handles scale factors will change in version 2.x (see Issue #2327 for more details).
  • When writing PDB files, MDAnalysis will no longer be using the last letter of the SegID to set the chainID in version 2.0.0.
  • The bfactors and tempfactors attributes (set by the PDB and MMTF parsers respectively), will be aliased in version 2.0.0.
  • When parsing TPR files, resids will be indexed from 1 rather than the current default of 0.

The following parts of the core and library components will be removed/changed in version 2.0.0:

  • lib.log.echo() is deprecated in favor of the new lib.log.ProgressBar.
  • core.universe.as_Universe() is deprecated.

Dependencies

We list below the core dependencies and versions that MDAnalysis has been tested on. They are provided as a string for easy use with conda or pip:

  • Python 2.7: biopython==1.76 cython==0.29.15 griddataformats==0.5.0 gsd==1.7.0 matplotlib==2.2.5 mmtf-python==1.1.2 netcdf4==1.3.1 numpy==1.16.5 scipy==1.2.1 tqdm==4.60.0
  • Python 3.5: biopython==1.72 cython==0.28.5 griddataformats==0.5.0 gsd==1.5.3 matplotlib==3.0.0 mmtf-python==1.1.2 netcdf4==1.3.1 numpy==1.15.2 scipy==1.1.0 tqdm==4.60.0
  • Python 3.6: biopython==1.78 cython==0.29.23 griddataformats==0.5.0 gsd==2.1.2 matplotlib==3.3.2 mmtf-python==1.1.2 netcdf4==1.5.4 numpy==1.16.0 scipy==1.5.1 tqdm==4.60.0
  • Python 3.7: biopython==1.78 cython==0.29.23 griddataformats==0.5.0 gsd==2.4.2 matplotlib==3.4.1 mmtf-python==1.1.2 netcdf4==1.5.6 numpy==1.20.2 scipy==1.6.3 tqdm==4.60.0
  • Python 3.8: biopython==1.78 cython==0.29.23 griddataformats==0.5.0 gsd==2.4.2 matplotlib==3.4.1 mmtf-python==1.1.2 netcdf4==1.5.6 numpy==1.20.2 scipy==1.6.3 tqdm==4.60.0

Author statistics

Altogether this represents the work of 42 contributors from around the world, and featured the work of 25 new contributors:

The MDAnalysis Project gratefully acknowledges a Small Development Grant provided by NumFOCUS.

— The MDAnalysis Team