MDAnalysis started almost 10 years ago
when Python was around version 2.4 and interfacing with existing C code was
mostly done by writing C wrappers that directly used CPython. This legacy code
has hampered a speedy, full transition to Python 3, and consequently MDAnalysis
lags behind the rest of the scientific Python community in fully supporting
Python 3.
Although about 80% of the code passes unit tests under Python 3, we urgently need
to close the remaining 20% gap in order to support our user base and to safeguard
the long-term viability of the project.
In the meantime, we are busy porting our last Python-2.7-only C extension, the
DCD reader and writer, to Cython. We now have a working Cython version that can
be used without MDAnalysis, similar to our XTC and TRR readers. All that remains
is a clean-up of the new Cython DCD-handling code and updated documentation. You
can check our progress here.
We have just released MDAnalysis version 0.16.0. This release contains new
features as well as bug fixes. Highlights are listed below, but for more details
see the release notes.
This is the biggest release we have ever had, with tons of new features. It
includes a rewrite of the topology system, the work of our GSoC students Fiona
Naughton (@fiona-naughton) and John Detlefs (@jdetle), a completely new ensemble
analysis module, encore, and much more. In total, 28 people contributed 1904
commits to this release. We closed 162 issues and merged 199 pull requests.
Upgrade
You can upgrade with pip install --upgrade MDAnalysis. If you use the conda
package manager, run conda update -c conda-forge mdanalysis.
Notable Changes
You can find a notebook with code examples of the changes here.
Rewrite of our internal topology representation
This is the change we are most excited about in this release. It brings
performance enhancements, makes the code easier to maintain and extend, and
allowed us to simplify the interface. We have previously written about the
details of the new topology system.
With this change, a number of deprecated functions have also been removed. For
each removed function a replacement already exists, and you will get a warning
with a suggested change. The easiest way to check whether your scripts will run
without problems after the upgrade is to include the following at the top of
them.
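A minimal sketch of such a preamble, using only the standard library warnings
module, is:

```python
import warnings

# Show every DeprecationWarning instead of silencing repeats
warnings.simplefilter('always', DeprecationWarning)

import MDAnalysis as mda
```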
This will print a warning for every deprecated function you use, together with a
short code snippet showing how your code has to be changed. If you do not want
to upgrade your existing scripts, we have posted a guide on how to use conda and
Python environments to run different versions of MDAnalysis on the same
computer.
Attach arbitrary time series to your trajectories
Our GSoC student @fiona-naughton has implemented an auxiliary reader to add
arbitrary time series to a universe. The time series are kept in sync with the
trajectory, so it is possible to iterate through the trajectory and access the
auxiliary data corresponding to the current time step.
```python
import MDAnalysis as mda
from MDAnalysisTests.datafiles import PDB_sub_sol, XTC_sub_sol, XVG_BZ2

# Create your universe as usual
universe = mda.Universe(PDB_sub_sol, XTC_sub_sol)

# Attach an auxiliary time series with the name `forces`.
# In this example, the XVG file contains the force that applies to each atom.
universe.trajectory.add_auxiliary('forces', XVG_BZ2)

# Iterate through your trajectory; the time series is kept in sync
for time_step in universe.trajectory:
    print(time_step.aux.forces)

# The first element of each array is the time in picoseconds.
# The next elements are the other columns of the XVG file.
```
This feature is still in its early stages and will be expanded in future
releases. You can follow the conversation on the initial issue or on the pull
request. So far, only the XVG format used by GROMACS and Grace is supported.
Open an issue if you need support for other time-series formats.
Dimension reduction with PCA and Diffusion Maps
@jdetle has implemented two new dimension-reduction algorithms, Principal
Component Analysis and Diffusion Maps. Both can be found in the analysis
submodule. As an example, let's look at the first two PCA dimensions of ADK from
our test files.
```python
import matplotlib.pyplot as plt
import MDAnalysis as mda
from MDAnalysis.analysis.pca import PCA
from MDAnalysisTests.datafiles import PSF, DCD

plt.style.use('ggplot')

u = mda.Universe(PSF, DCD)
pca = PCA(u, select='protein and name CA', verbose=True).run()

ca = u.select_atoms('protein and name CA')
reduced_data = pca.transform(ca, n_components=2)

f, ax = plt.subplots()
ax.plot(reduced_data[:, 0], reduced_data[:, 1], 'o')
ax.set(xlabel=r'PC$_1$ [$\AA$]',
       ylabel=r'PC$_2$ [$\AA$]',
       title='PCA of ADK')
```
Convenience functions to create a new analysis
A while back we introduced a new framework for analysis to unify the API of the
different analysis methods we offer. With this release we also add a new class,
AnalysisFromFunction, that makes it easier to calculate observables from a
simulation and replaces handwritten loops like the one sketched below.
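As an illustration, here is a minimal sketch of the before and after; the
radius-of-gyration observable is our own example, and the call signature follows
the documented analysis framework (a sketch, not a verbatim excerpt):

```python
import MDAnalysis as mda
from MDAnalysis.analysis.base import AnalysisFromFunction
from MDAnalysisTests.datafiles import PSF, DCD

u = mda.Universe(PSF, DCD)

# Before: a handwritten loop over the trajectory
radii = []
for ts in u.trajectory:
    radii.append(u.atoms.radius_of_gyration())

# After: the same observable through the analysis framework;
# the collected values are available on the instance after run()
rgyr = AnalysisFromFunction(lambda ag: ag.radius_of_gyration(),
                            u.trajectory, u.atoms).run()
```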
This class also takes arguments to adjust the iteration (start, stop, step), and
you can enable progress output with verbose=True. You will also profit from any
future performance improvements in the analysis class without changing your
code. If you have a specific observable that you want to calculate several
times, you can also create a new analysis class with analysis_class, as sketched
below.
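A minimal sketch of this pattern, again using radius of gyration as the example
observable (the exact layout of the collected results is left to the framework):

```python
from MDAnalysis.analysis.base import analysis_class

def radius_of_gyration(atomgroup):
    return atomgroup.radius_of_gyration()

# Wrap the function into a reusable analysis class
RadiusOfGyration = analysis_class(radius_of_gyration)

# Instances behave like any other analysis class
rgyr = RadiusOfGyration(u.trajectory, u.atoms).run()
```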
Thanks to the work of our NSF REU student @rbrtdlgd, our RMSD calculations are
now about 40% faster. If you use the low-level qcprot algorithm yourself instead
of our provided wrappers, you have to update your code since the API has
changed. For more details, see the release notes.
MemoryReader: Reading trajectories from memory
MDAnalysis typically reads trajectories from files on demand, so that it can efficiently deal with large trajectories, even those that do not fit in memory. However, in some cases, both for convenience and for efficiency, it can be an advantage to work with trajectories directly in memory. In this release we have introduced a MemoryReader, which makes this possible. The reader was originally implemented in the encore package.
The MemoryReader works with numpy arrays, using the same format as, for instance, DCDReader.timeseries(). You can create a Universe directly from such an array:
```python
import numpy as np
from MDAnalysis import Universe
from MDAnalysisTests.datafiles import DCD, PSF
from MDAnalysis.coordinates.memory import MemoryReader

# Create a Universe using a DCD reader
universe = Universe(PSF, DCD)

# Create a numpy array with random coordinates (100 frames) for the same topology
coordinates = np.random.uniform(size=(100, universe.atoms.n_atoms, 3)).cumsum(0)

# Create a new Universe directly from these coordinates
universe2 = Universe(PSF, coordinates, format=MemoryReader)
```
The MemoryReader works just like any other reader. In particular, you can iterate over it as usual, or use the .timeseries() method to retrieve a reference to the raw array:
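For example, continuing from universe2 above (a minimal sketch):

```python
# Iterate over the in-memory trajectory as usual
for ts in universe2.trajectory:
    print(ts.frame)

# Retrieve a reference to the underlying coordinate array
raw_coordinates = universe2.trajectory.timeseries()
```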
Certain operations can be sped up by moving a trajectory to memory, and we have
therefore added functionality to transfer any existing trajectory to a
MemoryReader using Universe.transfer_to_memory:
```python
universe = Universe(PSF, DCD)

# Switch to a MemoryReader representation
universe.transfer_to_memory()
```
You can also do this directly upon construction of a Universe, by using the in_memory flag:
```python
universe = Universe(PSF, DCD, in_memory=True)
```
Likewise, the AlignTraj class in the MDAnalysis.analysis.align module has an in_memory flag, allowing it to perform in-place alignments.
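A minimal sketch of such an in-place alignment (the selection string is our own
example):

```python
from MDAnalysis import Universe
from MDAnalysis.analysis.align import AlignTraj
from MDAnalysisTests.datafiles import PSF, DCD

mobile = Universe(PSF, DCD, in_memory=True)
reference = Universe(PSF, DCD)

# Align the in-memory trajectory onto the reference, modifying it in place
AlignTraj(mobile, reference, select='protein and name CA',
          in_memory=True).run()
```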
Others
Since the start of the year, we have also blogged about individual features of this release.
Research projects can take months or years to complete, but the precise versions
of the software they use often have a shorter lifetime. New versions often
include features that might be of great use, but they might also introduce
changes that break your existing work or cause compatibility issues with other
pieces of software. So whilst for newer projects we might want to use the most
up-to-date versions, for existing projects we want to be able to freeze the
versions in use. This leads to us needing to install and manage multiple
versions of software across our various research projects.
With release 0.16.0 of MDAnalysis, alongside various improvements, we are also
introducing some changes that could break existing code. In this post we explain
how conda environments or Python virtual environments can be used to manage this
transition, allowing you to finish existing projects with version 0.15.0 while
also enjoying the benefits provided by version 0.16.0.
Conda Environments
Conda is a general package manager for
scientific applications. It is mostly used for Python packages, but the system
can be used with any program. The conda-forge
community also provides a large collection of scientific software for Python, R,
Perl, and others.
In this guide we will concentrate only on creating and managing environments
with conda. For more general information on installing conda please refer
to the official documentation.
Software is made available through different conda channels, each of which acts
as a package source. When attempting to install packages into a conda
environment, these channels are searched. In this post we will be using the
conda-forge channel, which you can add to your configuration like so:
conda config --add channels conda-forge
For each research project, it is advisable to create a new environment so that
the software used in one project does not interfere with that of another.
To create a new environment for your next project that uses MDAnalysis version
0.15.0, run:
conda create -n myproject mdanalysis=0.15.0 -y
This creates a new software environment called myproject but does not affect
anything yet. To have access to all the software installed within it, we
must first activate it:
source activate myproject
To list your available environments:
conda env list
A nice feature of conda environments is that they are easy to share with
colleagues or to transfer to other computers. This allows all collaborators on a
project to use an identical set of software and makes your research projects
reproducible. To store the state of the environment we created in a file called
myproject-environment, run:
conda list --explicit --md5 -n myproject > myproject-environment
You can now copy this file to a colleague or onto another computer. The first
three lines also contain instructions on how the file can be used with conda:
```
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
```
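Following these instructions, a colleague can recreate the environment with:

conda create --name myproject --file myproject-environment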
Virtual Environments
Like conda, virtual environments managed with virtualenv allow you to use
different versions of Python and Python packages for your different projects.
Unlike conda, however, virtualenv is not a general-purpose package manager; it
leverages what is available on your system and lets you install Python packages
using pip.
To use virtual environments you have to install the virtualenv package first.
This can be done with either pip or the package manager of your system:
```
pip install virtualenv
# or on ubuntu
sudo apt install virtualenv
# or on fedora
sudo dnf install python-virtualenv
```
Virtual environments can be created per project directory.
```
cd myproject
# if you want to use the default python interpreter
virtualenv myproject-env
# or if you want to explicitly use python 2.7
virtualenv -p /usr/bin/python2.7 myproject-env
# or if you want to use python 3.5
virtualenv -p /usr/bin/python3.5 myproject-env
# or maybe you want to use pypy
virtualenv -p /usr/bin/pypy myproject-env
```
This creates a new folder, myproject-env, which contains the virtual environment
and all packages you install into it. To activate it in the current terminal,
run:
source myproject-env/bin/activate
Now you can install packages via pip without affecting your global environment.
The packages you install while the environment is activated will be available in
any terminal session that has the environment activated. You can deactivate the
virtual environment by running
deactivate
The virtualenvwrapper package
makes virtual environments easier to use. It provides several very useful features:
it organizes all virtual environments in a single directory, avoiding having
them scattered throughout the file system;
it defines commands to ease the creation, deletion, and copying of virtual
environments;
it defines a command to activate a virtual environment using its name;
all commands defined by virtualenvwrapper have tab completion for virtual
environment names.
To use virtualenvwrapper you first need to install it outside of a virtual
environment:
```
pip install virtualenvwrapper
# or on ubuntu
sudo apt install virtualenvwrapper
# or on fedora
sudo dnf install python-virtualenvwrapper
```
Then you need to have it loaded in your terminal session. Add the following
lines to ~/.bashrc; they will be executed every time you open a new terminal
session:
```
# Decide where to store the virtual environments
export WORKON_HOME=~/Envs
# Make sure the directory exists
mkdir -p ${WORKON_HOME}
# Load virtualenvwrapper
source /usr/local/bin/virtualenvwrapper.sh
```
Open a new terminal or run source ~/.bashrc to update your session. You can
now create a virtual environment with
mkvirtualenv my-project
Like the virtualenv command we saw earlier, mkvirtualenv lets you choose
your Python interpreter with the -p option. Regardless of your current
working directory, the virtual environment is created in ~/Envs/ and is
immediately loaded in your terminal session.
You can load your virtual environments by running workon my-project, and
exit them by running deactivate.
Virtual environments, especially with virtualenvwrapper, can do much more.
The Hitchhiker's Guide to Python has a good tutorial that gives a more in-depth
explanation of virtual environments. The virtualenvwrapper documentation is also
a good resource to read.