Google Summer of Code Students 2020

We are happy to announce that MDAnalysis is hosting three GSoC students this year – @hmacdope, @cbouy, and @yuxuanzhuang. This is the first year that MDAnalysis has been accepted as its own organization with GSoC and we are grateful to Google for granting us three student slots so that we can have three exciting GSoC projects.

Hugo MacDermott-Opeskin: Trajectory New Generation: the trajectory format for the future of simulation

Trajectory storage has always proved problematic for the molecular simulation community, as large volumes of data can be generated quickly. Traditional trajectory formats suffer from poor portability, large file sizes and limited ability to include metadata relevant to simulation. The Trajectory New Generation (TNG) format developed by the GROMACS team represents the first trajectory format with small file sizes, metadata storage, archive integrity verification and user/software signatures. The primary goal of this project is for @hmacdope to refactor the existing TNG code into C++ to provide clarity and usability for GROMACS, other simulation packages and analysis tools. Thin FORTRAN and Python layers are also desirable to encourage widespread adoption and are a secondary goal of the project. An efficient and transferable implementation of the TNG format will represent a major step forward for the computational molecular sciences community, enabling easy storage and replication of simulations.

This project is a collaboration with the GROMACS developer team with @acmnpv from GROMACS serving as a co-mentor.

Hugo MacDermott-Opeskin is a PhD student in computational chemistry at the Australian National University. His work focuses on studying membrane biophysics through molecular dynamics simulations coupled with enhanced sampling techniques. Hugo can be found on GitHub as @hmacdope and on Twitter as @hugomacdermott. When not hard at work, Hugo can be found running or mountain biking in the Canberra hills.

Through GSoC, Hugo aims to bring the TNG trajectory format to the simulation community, and he will document his experience on his “Biophysics Bonanza” blog.

Cédric Bouysset: From RDKit to the Universe and back

The aim of the RDKit interoperability project is to give MDAnalysis the ability to use RDKit’s Chem.Mol structure as an input to an MDAnalysis Universe, and also to convert a Universe or AtomGroup to an RDKit molecule. RDKit is one of the most complete and most commonly used chemoinformatics packages, yet it lacks file readers for formats typically encountered in MD simulations. @cbouy will implement in MDAnalysis the ability to switch back and forth between a Universe and an RDKit molecule to perform typical chemoinformatics calculations, adding a lot of value to both packages.

Cédric is a PhD student in molecular modelling at Université Côte d’Azur, France. His research aims to decipher the molecular basis of chemosensory perception (smell and taste) using computational tools. His day-to-day work includes modelling bitter taste receptors, building machine-learning models to search for molecules with interesting olfactive or sapid properties, maintaining the website of the Global Consortium of Chemosensory Researchers, and a bit of teaching. In his free time he enjoys cooking and playing video games. Cédric can be found on GitHub as @cbouy and on Twitter as @cedricbouysset.

Cédric will describe his progress in his blog.

Yuxuan Zhuang: Serialize Universes for parallel

As we approach the exascale barrier, researchers are handling increasingly large volumes of molecular dynamics (MD) data. Whilst MDAnalysis is a flexible and relatively fast framework for complex analysis tasks in MD simulations, a parallel computing framework would play a pivotal role in reducing the time to solution for such large datasets. To lay the groundwork for robust parallelism, @yuxuanzhuang will implement serialization support for Universe, the core of MDAnalysis. Furthermore, he will use this new serialization functionality to accelerate MDAnalysis’ analysis modules with parallel and distributed computing frameworks such as Dask, multiprocessing, or MPI.
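To illustrate why serialization matters for parallel analysis, here is a minimal sketch using only the Python standard library. It is not MDAnalysis code: `ToyReader` and `demo` are hypothetical names invented for this example, and the approach shown (storing the file name and position instead of the open handle) is one common pattern, not necessarily the design the project will adopt. Objects that hold open file handles cannot be pickled by default, which is what prevents them from being sent to worker processes.

```python
import os
import pickle
import tempfile


class ToyReader:
    """Hypothetical stand-in for a trajectory reader holding an open file."""

    def __init__(self, filename):
        self.filename = filename
        self._file = open(filename, "rb")

    def read_frame(self):
        # Each line of the toy file represents one "frame".
        return self._file.readline().decode().strip()

    def close(self):
        self._file.close()

    def __getstate__(self):
        # Open file handles cannot be pickled, so store the current
        # position in the file instead of the handle itself.
        state = self.__dict__.copy()
        state["_position"] = self._file.tell()
        del state["_file"]
        return state

    def __setstate__(self, state):
        # Reopen the file and seek back to where the original reader was.
        position = state.pop("_position")
        self.__dict__.update(state)
        self._file = open(self.filename, "rb")
        self._file.seek(position)


def demo():
    """Pickle a reader mid-file and check that the copy resumes in place."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy.traj")
        with open(path, "w") as handle:
            handle.write("frame0\nframe1\nframe2\n")

        reader = ToyReader(path)
        first = reader.read_frame()

        # Round-trip through pickle, as a parallel framework would do
        # when shipping the object to a worker process.
        clone = pickle.loads(pickle.dumps(reader))

        result = (first, reader.read_frame(), clone.read_frame())
        reader.close()
        clone.close()
        return result


if __name__ == "__main__":
    print(demo())
```

The original reader and its unpickled copy each own an independent file handle, so both can continue reading from the same position without interfering with one another.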

Yuxuan is a PhD student at Stockholm University. He mainly works on understanding pentameric ligand-gated ion channels from MD simulations. His daily workflow involves setting up and running simulations on lab clusters or at HPC centers, and performing various analyses of the MD trajectories in Jupyter notebooks. Yuxuan can be found on GitHub as @yuxuanzhuang.

Yuxuan will chronicle his work on his blog.

@richardjgowers @IAlibay @acmnpv @fiona-naughton @orbeckst (mentors)