# Calculating the Harmonic Ensemble Similarity between ensembles

Here we compare the conformational ensembles of proteins in four trajectories, using the harmonic ensemble similarity method.

Last updated: January 2020

Minimum version of MDAnalysis: 0.21.0

Packages required:

* MDAnalysis
* MDAnalysisTests

Optional packages for visualisation:

* matplotlib

Note

The metrics and methods in the encore module are from ([TPB+15]). Please cite them when using the MDAnalysis.analysis.encore module in published work.


import MDAnalysis as mda
from MDAnalysis.tests.datafiles import (PSF, DCD, DCD2, GRO, XTC,
                                        PSF_NAMD_GBIS, DCD_NAMD_GBIS,
                                        PDB_small, CRD)
from MDAnalysis.analysis import encore

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

WARNING:root:sklearn.cluster could not be imported: some functionality will not be available in encore.fit_clusters()


The test files we will be working with here feature adenylate kinase (AdK), a phosphotransferase enzyme (#beckstein_zipping_2009).


u1 = mda.Universe(PSF, DCD)
u2 = mda.Universe(PSF, DCD2)
u3 = mda.Universe(GRO, XTC)
u4 = mda.Universe(PSF_NAMD_GBIS, DCD_NAMD_GBIS)

labels = ['DCD', 'DCD2', 'XTC', 'NAMD']


The trajectories can have different lengths, as seen below.


print(len(u1.trajectory), len(u2.trajectory), len(u3.trajectory))

98 102 10


## Calculating harmonic similarity

The harmonic ensemble similarity method treats the conformational ensemble within each trajectory as a high-dimensional Gaussian distribution $$N(\mu, \Sigma)$$. The mean $$\mu$$ is estimated as the average over the ensemble. The covariance matrix $$\Sigma$$ is estimated using either a shrinkage estimator (cov_estimator='shrinkage') or a maximum-likelihood estimator (cov_estimator='ml').
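To illustrate why the choice of covariance estimator matters, here is a minimal NumPy sketch on synthetic data. It is not the estimator that encore uses: the helper names, the diagonal shrinkage target, and the fixed `intensity` value are all assumptions made for illustration. It shows that when there are fewer frames than coordinates (as is likely for the 10-frame trajectory above), the maximum-likelihood covariance matrix is rank-deficient and so cannot be inverted, whereas a shrunk estimate is full rank.

```python
import numpy as np


def ml_covariance(samples):
    """Maximum-likelihood covariance of samples with shape (n_frames, n_coords)."""
    centred = samples - samples.mean(axis=0)
    return centred.T @ centred / samples.shape[0]


def shrunk_covariance(samples, intensity=0.1):
    """Blend the ML covariance with its own diagonal.

    Hypothetical helper: the diagonal target and the fixed intensity are
    illustrative assumptions, not the data-driven choice made by encore.
    """
    cov = ml_covariance(samples)
    target = np.diag(np.diag(cov))
    return (1.0 - intensity) * cov + intensity * target


# Synthetic "trajectory": 10 frames, 30 coordinates
rng = np.random.default_rng(0)
samples = rng.normal(size=(10, 30))

cov_ml = ml_covariance(samples)
cov_shrunk = shrunk_covariance(samples, intensity=0.2)

# With fewer frames than coordinates the ML estimate is singular
rank_ml = np.linalg.matrix_rank(cov_ml)
rank_shrunk = np.linalg.matrix_rank(cov_shrunk)
print(rank_ml, rank_shrunk)
```

Setting `intensity=0` recovers the maximum-likelihood estimate, so the parameter controls how far the estimate is pulled towards the better-conditioned target.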

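The harmonic ensemble similarity is then calculated using the symmetrised version of the Kullback-Leibler divergence. A value of 0 means the two distributions are identical, and larger values mean the ensembles are less similar. Because the divergence has no upper bound, very different ensembles can give very large values.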
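As a rough sketch of the idea, the symmetrised Kullback-Leibler divergence between two Gaussians can be written in a few lines of NumPy using the standard closed-form expression. The function names are hypothetical, and the symmetrisation here is the plain average of the two directions, so the normalisation is an assumption and may differ from what encore.hes returns.

```python
import numpy as np


def gaussian_kl(mean_p, cov_p, mean_q, cov_q):
    """Closed-form KL(P || Q) for multivariate Gaussians P and Q."""
    n_dim = mean_p.shape[0]
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mean_q - mean_p
    # slogdet returns (sign, log|det|) and avoids overflow in det()
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (np.trace(cov_q_inv @ cov_p)
                  + diff @ cov_q_inv @ diff
                  - n_dim
                  + logdet_q - logdet_p)


def symmetrised_kl(mean_a, cov_a, mean_b, cov_b):
    """Average of KL(A || B) and KL(B || A); assumed normalisation."""
    return 0.5 * (gaussian_kl(mean_a, cov_a, mean_b, cov_b)
                  + gaussian_kl(mean_b, cov_b, mean_a, cov_a))


identity = np.eye(2)
origin = np.zeros(2)

same = symmetrised_kl(origin, identity, origin, identity)
near = symmetrised_kl(origin, identity, np.array([3.0, 4.0]), identity)
far = symmetrised_kl(origin, identity, np.array([30.0, 40.0]), identity)
print(same, near, far)
```

Moving the second mean ten times further away increases the divergence a hundredfold in this toy case, which is why values for dissimilar ensembles can become very large.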

We recommend aligning your trajectories before computing the harmonic similarity. You can either do this yourself with align.AlignTraj, or pass align=True into encore.hes. The latter option aligns each of your Universes to the current timestep of the first Universe. Note that encore.hes pulls your trajectories into memory to do this, so the positions in your Universes are changed as a side effect.


hes, details = encore.hes([u1, u2, u3, u4],
                          selection='backbone',
                          align=True,
                          cov_estimator='shrinkage',
                          weights='mass')


hes


array([[      0.        ,   24955.71870601, 1879874.4652541 ,
145622.25409916],
[  24955.71870601,       0.        , 1659867.54594567,
161102.33620648],
[1879874.4652541 , 1659867.54594567,       0.        ,
9900092.71845526],
[ 145622.25409916,  161102.33620648, 9900092.71845526,
0.        ]])
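One way to read this matrix is to pick out the most and least similar pairs of ensembles. The following sketch hard-codes the values printed above (rounded to two decimal places) so that it runs on its own. The variable names are illustrative and not part of encore.

```python
import numpy as np

labels = ['DCD', 'DCD2', 'XTC', 'NAMD']

# Values copied from the output above, rounded to two decimal places
hes_values = np.array([
    [0.00, 24955.72, 1879874.47, 145622.25],
    [24955.72, 0.00, 1659867.55, 161102.34],
    [1879874.47, 1659867.55, 0.00, 9900092.72],
    [145622.25, 161102.34, 9900092.72, 0.00],
])

# The matrix is symmetric, so only look at entries above the diagonal
rows, cols = np.triu_indices(len(labels), k=1)
pair_values = hes_values[rows, cols]

closest = np.argmin(pair_values)
farthest = np.argmax(pair_values)

most_similar = (labels[rows[closest]], labels[cols[closest]])
least_similar = (labels[rows[farthest]], labels[cols[farthest]])
print(most_similar, least_similar)
```

Here the two DCD trajectories are the most similar pair, and the XTC and NAMD trajectories are the least similar.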


The mean and covariance matrices for each Universe are saved in details.

### Plotting


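fig, ax = plt.subplots()
# Show the pairwise similarity matrix as a heat map
im = ax.imshow(hes)
# Label each row and column with its trajectory name
ax.set_xticks(np.arange(4))
ax.set_xticklabels(labels)
ax.set_yticks(np.arange(4))
ax.set_yticklabels(labels)
ax.set_title('Harmonic ensemble similarity')
cbar = fig.colorbar(im)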