5. Working with AtomGroups

A AtomGroup has a large number of methods attributes defined that provide information about the atoms such as names, indices, or the coordinates in the positions attribute:

>>> CA = u.select_atoms("protein and name CA")
>>> r = CA.positions
>>> r.shape
(214, 3)

The resulting output is a numpy.ndarray. The main purpose of MDAnalysis is to get trajectory data into numpy arrays!

5.1. Important methods and attributes of AtomGroup

The coordinates positions attribute is probably the most important information that you can get from an AtomGroup.

Other quantities that can be easily calculated for a AtomGroup are

  • the center of mass center_of_mass() and the center of geoemtry (or centroid) center_of_geometry() (equivalent to centroid());

  • the total mass total_mass();

  • the total charge total_charge() (if partial charges are defined in the topology);

  • the radius of gyration

    \[R_\mathrm{gyr} = \sqrt{\frac{1}{M}\sum_{i=1}^{N} m_i(\mathbf{r}_i - \mathbf{R})^2}\]

    with radius_of_gyration();

  • the principal axes \(\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_1\) from principal_axes() 1 via a diagonalization of the tensor of inertia moment_of_inertia(),

    \[\Lambda = U^T I U, \quad \text{with}\ U=(\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3)\]

    where \(U\) is a rotation matrix whose columns are the eigenvectors that form the principal axes, \(\Lambda\) is the diagonal matrix of eigenvalues (sorted from largest to smallest) known as the principal moments of inertia, and \(I = \sum_{i=1}^{N} m_i [(\mathbf{r}_i\cdot\mathbf{r}_i)\sum_{\alpha=1}^3 \mathbf{e}_\alpha \otimes \mathbf{e}_\alpha - \mathbf{r}_i \otimes \mathbf{r}_i]\) is the tensor of inertia.

5.2. Exercises 3

_images/angle_defs.png

AdK consists of three domains:

  • CORE residues 1-29, 60-121, 160-214 (gray)

  • NMP residues 30-59 (blue)

  • LID residues 122-159 (yellow)

  1. Calculate the center of mass and the center of geometry for each of the three domains.

    >>> domains = {
    >>>   'CORE': u.select_atoms("protein and (resid 1-29 60-121 160-214)"),
    >>>   'NMP': u.select_atoms("protein and resid 30-59"),
    >>>   'LID': u.select_atoms("protein and resid 122-159")
    >>>   }
    >>> cg = dict((name, dom.centroid()) for name,dom in domains.items())
    >>> cm = dict((name, dom.center_of_mass()) for name,dom in domains.items())
    >>> print(cg)
    {'LID': array([-15.16074944,   2.11599636,  -4.37305355], dtype=float32),
     'CORE': array([ 4.43884087,  2.05389476,  1.63895261], dtype=float32),
     'NMP': array([ -2.99990702, -13.62531662,  -2.93235731], dtype=float32)}
    >>> print(cm)
    {'LID': array([-15.11337499,   2.12292226,  -4.40910485]),
     'CORE': array([ 4.564116  ,  2.08700105,  1.54992649]),
     'NMP': array([ -3.20330174, -13.60247613,  -3.06221538])}
    
    • What are the distances between the centers of mass?

      (Hint: you can use numpy.linalg.norm() or calculate it manually.)

      >>> from numpy.linalg import norm
      >>> print(norm(cm['CORE'] - cm['NMP']))
      18.1042626244
      >>> print(norm(cm['CORE'] - cm['LID']))
      20.5600339602
      >>> print(norm(cm['NMP'] - cm['LID']))
      19.7725089609
      
    • Does it matter to use center of mass vs center of geometry?

      >>> print(norm(cg['CORE'] - cg['NMP']))
      17.9463
      >>> print(norm(cg['CORE'] - cg['LID']))
      20.501
      >>> print(norm(cg['NMP'] - cg['LID']))
      19.9437
      
_images/6b_angle_def_open.png

AdK undergoes a conformational transition during which the NMP and LID domain move relative to the CORE domain. The movement can be characterized by two angles, \(\theta_\text{NMP}\) and \(\theta_\text{LID}\), which are defined between the centers of geometry of the backbone and \(\text{C}_\beta\) atoms between groups of residues [Beckstein2009]:

definition of \(\theta_\text{NMP}\)

A: 115-125, B: 90-100, C: 35-55

definition of \(\theta_\text{LID}\)

A: 179-185, B: 115-125, C: 125-153

The angle between vectors \(\vec{BA}\) and \(\vec{BC}\) is

\[\theta = \arccos\left(\frac{\vec{BA}\cdot\vec{BC}}{|\vec{BA}||\vec{BC}|}\right)\]
  1. Write a function theta_NMP() that takes a Universe as an argument and computes \(\theta_\text{NMP}\):

    theta_NMP(u)

    Calculate the NMP-CORE angle for E. coli AdK in degrees from Universe u

    Use the following incomplete code as a starting point:

    import numpy as np
    from numpy.linalg import norm
    
    def theta_NMP(u):
       """Calculate the NMP-CORE angle for E. coli AdK in degrees"""
       A = u.select_atoms("resid 115-125 and (backbone or name CB)").center_of_geometry()
       B =
       C =
       BA = A - B
       BC =
       theta = np.arccos(
       return np.rad2deg(theta)
    

    Write the function to file adk.py and inside ipython run the file with %run adk.py to load the function while working on it.

    Test it on the AdK simulation (actually, the first frame):

    >>> theta_NMP(u)
    44.124821
    
  2. Add the corresponding function theta_LID() to adk.py.

    Test it:

    >>> theta_LID(u)
    107.00881
    

(See below for a solution.)

Calculation of the domain angles of AdK

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
import numpy as np
from numpy.linalg import norm

def theta_NMP(u):
    """Calculate the NMP-CORE angle for E. coli AdK in degrees"""
    C = u.select_atoms("resid 115-125 and (backbone or name CB)").center_of_geometry()
    B = u.select_atoms("resid 90-100 and (backbone or name CB)").center_of_geometry()
    A = u.select_atoms("resid 35-55 and (backbone or name CB)").center_of_geometry()
    BA = A - B
    BC = C - B
    theta = np.arccos(np.dot(BA, BC)/(norm(BA)*norm(BC)))
    return np.rad2deg(theta)

def theta_LID(u):
    """Calculate the LID-CORE angle for E. coli AdK in degrees"""
    C = u.select_atoms("resid 179-185 and (backbone or name CB)").center_of_geometry()
    B = u.select_atoms("resid 115-125 and (backbone or name CB)").center_of_geometry()
    A = u.select_atoms("resid 125-153 and (backbone or name CB)").center_of_geometry()
    BA = A - B
    BC = C - B
    theta = np.arccos(np.dot(BA, BC)/(norm(BA)*norm(BC)))
    return np.rad2deg(theta)

5.3. Processing AtomGroups

You can directly write a AtomGroup to a file with the write() method:

CORE = u.select_atoms("resid 1-29 60-121 160-214")
CORE.write("AdK_CORE.pdb")

(The extension determines the file type. Writing a PDB file will result in a number of harmless warnings regarding quantities such as altLocs or occupancies that are needed for a PDB file but are not provided by the MD files 2; these warnings can be ignored.)

You can do fairly complicated things on the fly, such as writing the hydration shell around a protein to a file 2

from MDAnalysis.tests.datafiles import TPR, XTC
w = MDAnalysis.Universe(TPR, XTC)
w.select_atoms("byres (name OW and around 4.0 protein)").write("hydration_shell.pdb")

for further analysis or visualization.

You can also write Gromacs index files (in case you don’t like make_ndx…) with the write() method when selecting a format for an index file (see the supported index file formats):

CORE.write("CORE.ndx", name="CORE")

See also

The lists of supported formats:

Footnotes

1

The principal_axes() method returns an array [p1, p2, p3] where the principal axes p1, p2, p3 are arrays of length 3; this layout as row vectors was chosen for convenience (so that one can extract the vectors with p1, p2, p3 = ag.principal_axes()). To form a matrix U where the principal axes are the column vectors as in the usual treatment of the principal axes one has to transpose. For example:

import MDAnalysis
from MDAnalysis.tests.datafiles import PSF, DCD
import numpy as np

u = MDAnalysis.Universe(PSF, DCD)

CA = u.select_atoms("protein and name CA")

I = CA.moment_of_inertia()
UT = CA.principal_axes()

# transpose the row-vector layout UT = [p1, p2, p3]
U = UT.T

# test that U diagonalizes I
Lambda = U.T.dot(I.dot(U))
print(Lambda)

# check that it is diagonal (to machine precision)
print(np.allclose(Lambda - np.diag(np.diagonal(Lambda)), 0))

The matrix Lambda should be diagonal, i.e., the off-diagnonal elements should be close to machine precision, and hence the last print() should show True:

[[ 5.20816990e+05 -6.56706349e-10 -2.83491351e-12]
[-6.62283524e-10  4.74131234e+05 -2.06979926e-11]
[-6.56687024e-12 -2.07159142e-11  3.93536829e+05]]
True

Finally, if you want to calculate “by hand”:

values, evecs = np.linalg.eigh(I)
indices = np.argsort(values)
U = evecs[:, indices]
2(1,2)

PDB format files contain various data fields that are not necessarily used in a typical MD simulation such as altLocs, icodes, occupancies, or tempfactor. When you write a PDB file without providing values for these parameters, MDAnalysis has to set them to default values. When MDAnalysis does that, it warns you with output like

~/anaconda3/envs/mda3/lib/python3.6/site-packages/MDAnalysis/coordinates/PDB.py:892: UserWarning: Found no information for attr: 'altLocs' Using default value of ' '
"".format(attrname, default))

~/anaconda3/envs/mda3/lib/python3.6/site-packages/MDAnalysis/coordinates/PDB.py:892: UserWarning: Found no information for attr: 'icodes' Using default value of ' '
"".format(attrname, default))

~/anaconda3/envs/mda3/lib/python3.6/site-packages/MDAnalysis/coordinates/PDB.py:892: UserWarning: Found no information for attr: 'occupancies' Using default value of '1.0'
"".format(attrname, default))

~/anaconda3/envs/mda3/lib/python3.6/site-packages/MDAnalysis/coordinates/PDB.py:892: UserWarning: Found no information for attr: 'tempfactors' Using default value of '0.0'
"".format(attrname, default))

These warnings are for your information and in the context of this tutorial they are expected and do not indicate a problem.