# 6. Trajectory analysis¶

The Universe binds together the static topology (which atoms, how are they connected, what un-changing properties do the atoms possess (such as partial charge), …) and the changing coordinate information, which is stored in the trajectory.

The length of a trajectory (number of frames) is

>>> len(u.trajectory)
98


The standard way to assess each time step (or frame) in a trajectory is to iterate over the Universe.trajectory attribute (which is an instance of Reader class):

>>> for ts in u.trajectory:
...    print("Frame: {0:5d}, Time: {1:8.3f} ps".format(ts.frame, u.trajectory.time))
...    print("Rgyr: {0:g} A".format(u.atoms.radius_of_gyration()))

Frame:     0, Time:    0.000 ps
Rgyr: 16.669 A
Frame:     1, Time:    1.000 ps
Rgyr: 16.6732 A
Frame:     2, Time:    2.000 ps
...


The time attribute contains the current time step. The Reader only contains information about one time step: imagine a cursor or pointer moving along the trajectory file. Where the cursor points, there’s you current coordinates, frame number, and time.

Normally you will collect the data in a list or array, e.g.

Rgyr = []
protein = u.select_atoms("protein")
for ts in u.trajectory:
Rgyr = np.array(Rgyr)


The data can be plotted to give the graph below:

# quick plot
import matplotlib.pyplot as plt
ax = plt.subplot(111)
ax.plot(Rgyr[:,0], Rgyr[:,1], 'r--', lw=2, label=r"$R_G$")
ax.set_xlabel("time (ps)")
ax.set_ylabel(r"radius of gyration $R_G$ ($\AA$)")
ax.figure.savefig("Rgyr.pdf")
plt.draw()


The $$R_G(t)$$ increases over time, indicating an opening up of the AdK enzyme. Note

It is important to note that the coordinates and related properties calculated from the coordinates such as the radius of gyration change while selections on static properties (such as protein in the example) do not change when moving through a trajectory: You can define the selection once and then recalculate the property of interest for each frame of the trajectory.

However, if selections contain distance-dependent queries (such as around or point, see selection keywords for more details) then the selection should be updated for each time step using a dynamic selection (also known as an “updating selection”) by setting the keyword updating=True.

## 6.1. Exercises 4¶

1. Take the functions to calculate $$\theta_\text{NMP}$$ and $$\theta_\text{LID}$$

import numpy as np
from numpy.linalg import norm

def theta_NMP(u):
"""Calculate the NMP-CORE angle for E. coli AdK in degrees"""
C = u.select_atoms("resid 115-125 and (backbone or name CB)").center_of_geometry()
B = u.select_atoms("resid 90-100 and (backbone or name CB)").center_of_geometry()
A = u.select_atoms("resid 35-55 and (backbone or name CB)").center_of_geometry()
BA = A - B
BC = C - B
theta = np.arccos(np.dot(BA, BC)/(norm(BA)*norm(BC)))

def theta_LID(u):
"""Calculate the LID-CORE angle for E. coli AdK in degrees"""
C = u.select_atoms("resid 179-185 and (backbone or name CB)").center_of_geometry()
B = u.select_atoms("resid 115-125 and (backbone or name CB)").center_of_geometry()
A = u.select_atoms("resid 125-153 and (backbone or name CB)").center_of_geometry()
BA = A - B
BC = C - B
theta = np.arccos(np.dot(BA, BC)/(norm(BA)*norm(BC)))


and calculate the time series $$\theta_\text{NMP}(t)$$ and $$\theta_\text{LID}(t)$$.

Plot them together in one plot.

2. Plot $$\theta_\text{NMP}(t)$$ against $$\theta_\text{LID}(t)$$.

What does the plot show?

Why could such a plot be useful? The code to generate the figure contains theta_LID() and theta_NMP().

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 import numpy as np from numpy.linalg import norm def theta_NMP(u): """Calculate the NMP-CORE angle for E. coli AdK in degrees""" C = u.select_atoms("resid 115-125 and (backbone or name CB)").center_of_geometry() B = u.select_atoms("resid 90-100 and (backbone or name CB)").center_of_geometry() A = u.select_atoms("resid 35-55 and (backbone or name CB)").center_of_geometry() BA = A - B BC = C - B theta = np.arccos(np.dot(BA, BC)/(norm(BA)*norm(BC))) return np.rad2deg(theta) def theta_LID(u): """Calculate the LID-CORE angle for E. coli AdK in degrees""" C = u.select_atoms("resid 179-185 and (backbone or name CB)").center_of_geometry() B = u.select_atoms("resid 115-125 and (backbone or name CB)").center_of_geometry() A = u.select_atoms("resid 125-153 and (backbone or name CB)").center_of_geometry() BA = A - B BC = C - B theta = np.arccos(np.dot(BA, BC)/(norm(BA)*norm(BC))) return np.rad2deg(theta) if __name__ == "__main__": import MDAnalysis from MDAnalysis.tests.datafiles import PSF, DCD import matplotlib import matplotlib.pyplot as plt u = MDAnalysis.Universe(PSF, DCD) data = np.array([(u.trajectory.time, theta_NMP(u), theta_LID(u)) for ts in u.trajectory]) time, NMP, LID = data.T # plotting degreeFormatter = matplotlib.ticker.FormatStrFormatter(r"%g$^\circ$") fig = plt.figure(figsize=(6,3)) ax1 = fig.add_subplot(121) ax1.plot(time, NMP, 'b-', lw=2, label=r"$\theta_{\mathrm{NMP}}$") ax1.plot(time, LID, 'r-', lw=2, label=r"$\theta_{\mathrm{LID}}$") ax1.set_xlabel(r"time $t$ (ps)") ax1.set_ylabel(r"angle $\theta$") ax1.yaxis.set_major_formatter(degreeFormatter) ax1.legend(loc="best") ax2 = fig.add_subplot(122) ax2.plot(NMP, LID, 'k-', lw=3) ax2.set_xlabel(r"NMP-CORE angle $\theta_{\mathrm{NMP}}$") ax2.set_ylabel(r"LID-CORE angle $\theta_{\mathrm{LID}}$") ax2.xaxis.set_major_formatter(degreeFormatter) ax2.yaxis.set_major_formatter(degreeFormatter) ax2.yaxis.tick_right() ax2.yaxis.set_label_position("right") fig.subplots_adjust(left=0.12, right=0.88, bottom=0.2, wspace=0.15) for ext in ('svg', 'pdf', 'png'): fig.savefig("NMP_LID_angle_projection.{0}".format(ext)) 

Note that one would normally write the code more efficiently and generate the atom groups once and then pass them to a simple function to calculate the angle

def theta(A, B, C):
"""Calculate the angle between BA and BC for AtomGroups A, B, C"""
B_center = B.centroid()
BA = A.centroid() - B_center
BC = C.centroid() - B_center
theta = np.arccos(np.dot(BA, BC)/(norm(BA)*norm(BC)))


## 6.2. Bells and whistles¶

Quick data acquisition

Especially useful for interactive analysis in ipython using list comprehensions (implicit for loops):

protein = u.select_atoms("protein")
data = np.array([(u.trajectory.time, protein.radius_of_gyration()) for ts in u.trajectory])
time, RG = data.T
plt.plot(time, RG)


More on the trajectory iterator

One can directly jump to a frame by using “indexing syntax”:

>>> u.trajectory
< Timestep 50 with unit cell dimensions array([  0.,   0.,   0.,  90.,  90.,  90.], dtype=float32) >
>>> ts.frame
50


Note

Frame indices are 0-based, i.e., the first frame in a trajectory is 0 and the last one is u.trajectory.n_frames - 1.

You can also slice trajectories, e.g. if you want to start at the 10th frame and go to 10th before the end, and only use every 5th frame:

for ts in u.trajectory[9:-10:5]:
print(ts.frame)
...


Note

Trajectory indexing and slicing uses 0-based indices (as in standard Python) and MDAnalysis also numbers frames starting with 0. Thus the “tenth frame” in a trjectory has ts.frame == 9.

One can iterate through multiple trajectories in parallel with the help of itertools.izip() from the itertools module, which also provide other interesting ways to work with trajectory iterators.