Blog

Google Summer of Code Students 2021

We are happy to announce that MDAnalysis is hosting two GSoC students this year – @ojeda-e, and @orioncohen. MDAnalysis has been accepted as its own organization with GSoC for a second year running and we are grateful to Google for granting us two student slots for two exciting projects. Both the students and mentors have a very exciting few months ahead!

Estefania Barreto-Ojeda: Curvature analysis of biological membranes

Estefania Barreto-Ojeda

Interested in contributing to an open-source initiative, Estefania will expand the capabilities of MDAnalysis by integrating a new MDAnalysis module to calculate membrane curvature to derive and visualize membrane curvature profiles of protein-membrane/membrane-only systems obtained from Molecular Dynamics (MD) simulations. With the introduction of this analysis module, users will rapidly extract mean and gaussian curvature of biological membranes and their respective visualization in 2D-profile maps.

Estefania is a Ph.D. candidate in Biophysical Chemistry at The University of Calgary, Canada in the research group of Peter Tieleman. Her research work is focused on membrane curvature induced by ABC transporters, a superfamily of transmembrane proteins involved in cancer and antibiotic resistance. A typical day for Estefania includes running Coarse-Grained (CG) MD simulations using the Martini force field, reading literature on ABCs, and working on cool data visualization workflows. In her free time, she enjoys camping and road tripping in the Canadian Rockie Mountains and going for long bike rides.

Estefania can be found on github as @ojeda-e and on twitter as @ebojeda.

Her journey will be documented on the blog Le Mirroir.

Orion Cohen: A Solvation Module for MDAnalysis

Orion Cohen

The macroscopic behavior of a liquid is determined by its microscopic structure. For ionic systems, like batteries and many enzymes, the solvation environment surrounding ions is especially important. By studying the solvation of interesting materials, scientists can better understand, engineer, and design new technologies. The aim of this project is to implement a robust and cohesive set of methods for solvation analysis that would be widely useful in both biomolecular and battery electrolyte simulations. The core of the solvation module will be a set of functions for easily working with ionic solvation shells. Building from that core functionality, the module will implement several analysis methods for analyzing ion pairing, ion speciation, residence times, and shell association and dissociation.

Orion is a Ph.D. student at the University of California Berkeley, working with Dr. Kristin Persson at the Lawrence Berkeley National Laboratory. His work leverages high-throughput chemical simulations and machine learning to discover new materials for Lithium-ion batteries. Orion is passionate about making science more accessible, reproducible, and efficient with powerful open-source software. If he isn’t toiling at his computer, Orion is probably playing board games, camping, or relaxing in the temperate Berkeley sun.

Orion is on Github as @orioncohen and on twitter as @orion__archer.

He will be sharing his experience with GSoC on his website.

@richardjgowers @IAlibay @fiona-naughton @orbeckst @lilyminium @hmacdope (mentors)

Version 1

We are happy to release version 1 of MDAnalysis!

MDAnalysis 1.x is a stable legacy platform that supports

  • Python 2.7 (on Linux and MacOS only),
  • Python 3.5, 3.6, 3.7 and 3.8 (on Windows (32- and 64-bit), MacOS, and Linux).

The API will not change for any upcoming 1.x versions. This release completes the first phase of our roadmap. From here on, we will focus on developing version 2 of MDAnalysis, which will support Python 3.6+ only, and will include major API changes and improvements.

The 1.0.0, 1.0.1 and 1.1.1 versions of MDAnalysis are the product of more than 18 months effort. They contain a multitude of fixes, deprecations, and new features. We highlight important changes below, but users are strongly encouraged to read the CHANGELOG for full details. All users are recommended to upgrade to 1.1.1, as it includes important fixes.

Upgrading to MDAnalysis version 1.1.1

In Dependencies we list the required dependencies of MDAnaysis version 1; when installing with pip or conda, the correct dependencies should be automatically pulled in by the package manager:

To install with conda from the conda-forge channel run

conda update -c conda-forge mdanalysis

(Note that for Python 2.7 and 3.5, no conda packages are currently available due to the difficulty of building packages for legacy Python versions. Please install with pip.)

To install from PyPi with pip run

pip install --upgrade MDAnalysis

(This will likely build the package from source so you need the appropriate compilers. )

For more help with installation see the Installation instructions in the User Guide.

Notable new additions

We now support several new formats:

We have also added converters to and from other popular analysis packages. We plan to expand in this exciting direction in future versions, as laid out in our interoperability roadmap. For now, we support:

New additions to analysis include:

  • frames and times attributes to AnalysisBase to capture the frames and times that the analysis was run() on. This is accessible to all analyses that subclass AnalysisBase.
  • a correlations module for computing the discrete autocorrelation function
  • a new HydrogenBondAnalysis class for improved and more efficient analysis of hydrogen bonds, which replaces the now deprecated hbond_analysis code
  • an AverageStructure class for computing the average structure of a trajectory out of memory
  • a hole2 module for improved interfacing with the HOLE2 program, which replaces the now deprecated hole module
  • a DensityAnalysis class for improved density analysis, replacing the now deprecated density_from_Universe() code
  • a method to compute the root-mean-square-inner-product of subspaces
  • a method to calculate the cumulative overlap of a vector in a subspace

Other additions to core functionality include:

Miscellaneous performance improvements include:

  • Dihedral selection in the Ramachandran class has been sped up ~700x.
  • TPR parsing has been sped up 2–30x.

Notable improvements

We have improved the flexibility to our atom selection language, allowing for advanced pattern matching operators.

For example, we now support ? for single character matching, so using resname T?R in a selection string for a protein would yield both residues THR and TYR. More information can be found in our selection documentation.

Notable changes to analysis include:

  • The argument order to AnalysisFromFunction are now as specified in the documentation
  • The select keyword has been standardized by removing selection, atomselection, and ref_select in the contact, gnm, helanal, hole, encore, and hydrogenbonds modules
  • The save() functions have been removed from contacts, diffusionmap, hole, lineardensity, and rms modules
  • Progress bars have been replaced with an improved version from tqdm
  • The radius_cut_q method has been added to contacts.Contacts

Other notable improvements to the core functionality include:

  • AtomGroup.guess_bonds now uses periodic boundary information when available
  • The TPRParser now supports GROMACS 2020
  • When reading PDB and XYZ files, MDAnalysis now adds an elements attribute if the provided elements are valid

Important fixes

For the full list of fixed please see the CHANGELOG. The following are selection of fixes that could have either lead to wrong results or were often reported by users as problematic:

Core

  • Neighbor searching, which is a fundamental component of many analyses in MDAnalysis (such as hydrogenbonds and RDF calculation) had a number of bugs in 1.0.0 that could lead to wrong results, in particular with triclinic unit cells. The buggy code was disabled in 1.0.1 and fixed in 1.1.1. See issues #2229, #2345, #2919, #2670, #2930 for details.
  • Fixed a SegmentationFault for the selection “around 0.0 SELECTION” (Issue #2656)
  • AtomGroup.center() now works correctly for compounds and unwrapping (Issue #2984)
  • When bonds are guessed from distances (AtomGroup.guess_bonds), periodic boundary information is properly taken into account. Bonds that were split across the periodic boundary would have not beend correctly guessed previously. (Issue #2350)
  • The testsuite does not fail anymore with newer version of matplotlib (Issue #2191)

File formats

  • PDB files
    • Better handling of cryo-electron microscopy box dimensions in PDB files:
      • When a PDB file is read, a cryo-em 1 Å3 default CRYST1 record will be interpreted as “no dimensions” and the box dimension in MDAnalysis is set to None (Issue #2599)
      • When box dimensions are missing (u.dimensions is None or np.zeros(6)) then a unitary CRYST1 record (cubic box with sides of 1 Å) is written (Issue #2679)
    • PDB files no longer lose chainIDs when reading files without segIDs (Issue #2389)
    • PDBWriter now uses last character of segid as ChainID (Issue #2224)
  • In GRO files, unit cells with box vectors larger than 1000 nm are now correctly handled (Issue #2371)
  • Reading of XTC and TRR files will not anymore fail with an IOError when the hidden offset files cannot be read; instead, the offsets are recalculated from the trajectory (Issue #1893)
  • Masses and charges in HooMD XML files are now correctly read (#2888)

Analysis

  • PCA analysis:
    • PCA(align=True) now correctly aligns the trajectory and computes the correct means and covariance matrix (Issue #2561)
    • Specifying n_components now correctly selects the PCA components (Issue #2623)
  • Contact Analysis class respects PBC (Issue #2368)

Deprecations

This release brings several deprecations as the package heads towards version 2.0.0.

The following parts of the analysis code will be removed/changed in version 2.0.0:

  • analysis.hole is deprecated in favor of analysis.hole2.
  • analysis.hbonds.HydrogenBondsAnalysis is deprecated in favor of analysis.hydrogenbonds.hbond_analysis.
  • analysis.density.density_from_Universe() is deprecated in favor of analysis.density.DensityAnalysis.
  • The notwithin_coordinates_factory() and density_from_PDB() methods of analysis.density are deprecated.
  • analysis.waterdynamics.HydrogenBondLifetimes is deprecated in favor of analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis.lifetime() (to be implemented in version 2.0.0)
  • analysis.leaflets.LeafletFinder() will no longer accept a filename, in 2.0.0 only Universes will be supported as inputs.
  • analysis.helanal is deprecated and will be replaced by analysis.helix_analysis in 2.0.0.
  • analysis.hbonds.WaterBridgeAnalysis will be moved to analysis.hydrogenbonds.WaterBridgeAnalysis.

The following parts of the readers/writers will be removed/changed in version 2.0.0:

  • Writer.write_next_timestep() is deprecated in favor of Writer.write().
  • Passing Timestep objects to Writer.write() is deprecated. In 2.0.0 only Universe or AtomGroup objects will be accepted.
  • The way in which the NCDFWriter handles scale factors will change in version 2.x (see Issue #2327 for more details).
  • When writing PDB files, MDAnalysis will no longer be using the last letter of the SegID to set the chainID in version 2.0.0.
  • The bfactors and tempfactors attributes (set by the PDB and MMTF parsers respectively), will be aliased in version 2.0.0.
  • When parsing TPR files, resids will be indexed from 1 rather than the current default of 0.

The following part of the core and library components will be removed/changed in version 2.0.0:

  • lib.log.echo() is deprecated in favor of the new lib.log.ProgressBar.
  • core.universe.as_Universe() is deprecated.

Dependencies

We list below the core dependencies and versions that MDAnalysis has been tested on. They are provided as a string for easy use with conda or pip:

  • Python 2.7: biopython==1.76 cython==0.29.15 griddataformats==0.5.0 gsd==1.7.0 matplotlib==2.2.5 mmtf-python==1.1.2 netcdf4==1.3.1 numpy==1.16.5 scipy==1.2.1 tqdm==4.60.0
  • Python 3.5: biopython==1.72 cython==0.28.5 griddataformats==0.5.0 gsd==1.5.3 matplotlib==3.0.0 mmtf-python==1.1.2 netcdf4==1.3.1 numpy==1.15.2 scipy==1.1.0 tqdm==4.60.0
  • Python 3.6: biopython==1.78 cython==0.29.23 griddataformats==0.5.0 gsd==2.1.2 matplotlib==3.3.2 mmtf-python==1.1.2 netcdf4==1.5.4 numpy==1.16.0 scipy==1.5.1 tqdm==4.60.0
  • Python 3.7: biopython==1.78 cython==0.29.23 griddataformats==0.5.0 gsd==2.4.2 matplotlib==3.4.1 mmtf-python==1.1.2 netcdf4==1.5.6 numpy==1.20.2 scipy==1.6.3 tqdm==4.60.0
  • Python 3.8: biopython==1.78 cython==0.29.23 griddataformats==0.5.0 gsd==2.4.2 matplotlib==3.4.1 mmtf-python==1.1.2 netcdf4==1.5.6 numpy==1.20.2 scipy==1.6.3 tqdm==4.60.0

Author statistics

Altogether this represents the work of 42 contributors from around the world, and featured the work of 25 new contributors:

The MDAnalysis Project gratefully acknowledges a Small Development Grant provided by NumFOCUS.

— The MDAnalysis Team

Powered by MDAnalysis

Powered by MDAnalysis We have created this nifty Shields.io badge for you to display on your MDAnalysis-using projects!

By showcasing your use of MDAnalysis you are

  • helping to spread the love ❤️ and strengthening the MDAnalysis user community,
  • rewarding our open-source effort and boosting our dev team’s motivation 😊,
  • providing a non-intrusive way for us to know our user-base — increasingly relevant in raising funds for development.

The above points all lead to happier users and devs, and ultimately to a better MDAnalysis overall 💪.

So just choose from our available badge-embedding markup examples and start spreading the word today!