# 5. Working with AtomGroups¶

A `AtomGroup`

has a large number of
methods attributes defined that provide information about the atoms
such as names, indices, or the coordinates in the
`positions`

attribute:

```
>>> CA = u.select_atoms("protein and name CA")
>>> r = CA.positions
>>> r.shape
(214, 3)
```

The resulting output is a `numpy.ndarray`

. The main purpose of
MDAnalysis is to get trajectory data into numpy arrays!

## 5.1. Important methods and attributes of AtomGroup¶

The coordinates `positions`

attribute is probably the most important information that you can get
from an `AtomGroup`

.

Other quantities that can be easily calculated for a
`AtomGroup`

are

the center of mass

`center_of_mass()`

and the center of geoemtry (or centroid)`center_of_geometry()`

(equivalent to`centroid()`

);the total mass

`total_mass()`

;the total charge

`total_charge()`

(if partial charges are defined in the topology);the radius of gyration

\[R_\mathrm{gyr} = \sqrt{\frac{1}{M}\sum_{i=1}^{N} m_i(\mathbf{r}_i - \mathbf{R})^2}\]with

`radius_of_gyration()`

;the principal axes \(\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_1\) from

`principal_axes()`

1 via a diagonalization of the tensor of inertia`moment_of_inertia()`

,\[\Lambda = U^T I U, \quad \text{with}\ U=(\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3)\]where \(U\) is a rotation matrix whose columns are the eigenvectors that form the principal axes, \(\Lambda\) is the diagonal matrix of eigenvalues (sorted from largest to smallest) known as the principal moments of inertia, and \(I = \sum_{i=1}^{N} m_i [(\mathbf{r}_i\cdot\mathbf{r}_i)\sum_{\alpha=1}^3 \mathbf{e}_\alpha \otimes \mathbf{e}_\alpha - \mathbf{r}_i \otimes \mathbf{r}_i]\) is the tensor of inertia.

## 5.2. Exercises 3¶

AdK consists of three domains:

*CORE*residues 1-29, 60-121, 160-214 (gray)*NMP*residues 30-59 (blue)*LID*residues 122-159 (yellow)

Calculate the center of mass and the center of geometry for each of the three domains.

>>> domains = { >>> 'CORE': u.select_atoms("protein and (resid 1-29 60-121 160-214)"), >>> 'NMP': u.select_atoms("protein and resid 30-59"), >>> 'LID': u.select_atoms("protein and resid 122-159") >>> } >>> cg = dict((name, dom.centroid()) for name,dom in domains.items()) >>> cm = dict((name, dom.center_of_mass()) for name,dom in domains.items()) >>> print(cg) {'LID': array([-15.16074944, 2.11599636, -4.37305355], dtype=float32), 'CORE': array([ 4.43884087, 2.05389476, 1.63895261], dtype=float32), 'NMP': array([ -2.99990702, -13.62531662, -2.93235731], dtype=float32)} >>> print(cm) {'LID': array([-15.11337499, 2.12292226, -4.40910485]), 'CORE': array([ 4.564116 , 2.08700105, 1.54992649]), 'NMP': array([ -3.20330174, -13.60247613, -3.06221538])}

What are the distances between the centers of mass?

(Hint: you can use

`numpy.linalg.norm()`

or calculate it manually.)>>> from numpy.linalg import norm >>> print(norm(cm['CORE'] - cm['NMP'])) 18.1042626244 >>> print(norm(cm['CORE'] - cm['LID'])) 20.5600339602 >>> print(norm(cm['NMP'] - cm['LID'])) 19.7725089609

Does it matter to use center of mass vs center of geometry?

>>> print(norm(cg['CORE'] - cg['NMP'])) 17.9463 >>> print(norm(cg['CORE'] - cg['LID'])) 20.501 >>> print(norm(cg['NMP'] - cg['LID'])) 19.9437

AdK undergoes a conformational transition during which the NMP and LID domain
move relative to the CORE domain. The movement can be characterized by two
angles, \(\theta_\text{NMP}\) and \(\theta_\text{LID}\), which are
defined between the *centers of geometry* of the *backbone and*
\(\text{C}_\beta\) atoms between groups of residues [Beckstein2009]:

- definition of \(\theta_\text{NMP}\)
A: 115-125, B: 90-100, C: 35-55

- definition of \(\theta_\text{LID}\)
A: 179-185, B: 115-125, C: 125-153

The angle between vectors \(\vec{BA}\) and \(\vec{BC}\) is

Write a function

`theta_NMP()`

that takes a`Universe`

as an argument and computes \(\theta_\text{NMP}\):Use the following

*incomplete*code as a starting point:import numpy as np from numpy.linalg import norm def theta_NMP(u): """Calculate the NMP-CORE angle for E. coli AdK in degrees""" A = u.select_atoms("resid 115-125 and (backbone or name CB)").center_of_geometry() B = C = BA = A - B BC = theta = np.arccos( return np.rad2deg(theta)

Write the function to file

`adk.py`

and inside**ipython**run the file with`%run adk.py`

to load the function while working on it.Test it on the AdK simulation (actually, the first frame):

>>> theta_NMP(u) 44.124821

Add the corresponding function

`theta_LID()`

to`adk.py`

.Test it:

>>> theta_LID(u) 107.00881

(See below for a solution.)

Calculation of the domain angles of AdK

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | ```
import numpy as np
from numpy.linalg import norm
def theta_NMP(u):
"""Calculate the NMP-CORE angle for E. coli AdK in degrees"""
C = u.select_atoms("resid 115-125 and (backbone or name CB)").center_of_geometry()
B = u.select_atoms("resid 90-100 and (backbone or name CB)").center_of_geometry()
A = u.select_atoms("resid 35-55 and (backbone or name CB)").center_of_geometry()
BA = A - B
BC = C - B
theta = np.arccos(np.dot(BA, BC)/(norm(BA)*norm(BC)))
return np.rad2deg(theta)
def theta_LID(u):
"""Calculate the LID-CORE angle for E. coli AdK in degrees"""
C = u.select_atoms("resid 179-185 and (backbone or name CB)").center_of_geometry()
B = u.select_atoms("resid 115-125 and (backbone or name CB)").center_of_geometry()
A = u.select_atoms("resid 125-153 and (backbone or name CB)").center_of_geometry()
BA = A - B
BC = C - B
theta = np.arccos(np.dot(BA, BC)/(norm(BA)*norm(BC)))
return np.rad2deg(theta)
``` |

## 5.3. Processing AtomGroups¶

You can directly write a `AtomGroup`

to a file with the `write()`

method:

```
CORE = u.select_atoms("resid 1-29 60-121 160-214")
CORE.write("AdK_CORE.pdb")
```

(The extension determines the file type. Writing a PDB file will result in a
number of harmless warnings regarding quantities such as *altLocs* or
*occupancies* that are needed for a PDB file but are not provided by the MD
files 2; these warnings can be ignored.)

You can do fairly complicated things on the fly, such as writing the hydration shell around a protein to a file 2

```
from MDAnalysis.tests.datafiles import TPR, XTC
w = MDAnalysis.Universe(TPR, XTC)
w.select_atoms("byres (name OW and around 4.0 protein)").write("hydration_shell.pdb")
```

for further analysis or visualization.

You can also write Gromacs index files (in case you don’t like
**make_ndx**…) with the
`write()`

method when selecting
a format for an index file (see the supported index file formats):

```
CORE.write("CORE.ndx", name="CORE")
```

Footnotes

- 1
The

`principal_axes()`

method returns an array`[p1, p2, p3]`

where the principal axes`p1`

,`p2`

,`p3`

are arrays of length 3; this layout as row vectors was chosen for convenience (so that one can extract the vectors with`p1, p2, p3 = ag.principal_axes()`

). To form a matrix`U`

where the principal axes are the column vectors as in the usual treatment of the principal axes one has to transpose. For example:import MDAnalysis from MDAnalysis.tests.datafiles import PSF, DCD import numpy as np u = MDAnalysis.Universe(PSF, DCD) CA = u.select_atoms("protein and name CA") I = CA.moment_of_inertia() UT = CA.principal_axes() # transpose the row-vector layout UT = [p1, p2, p3] U = UT.T # test that U diagonalizes I Lambda = U.T.dot(I.dot(U)) print(Lambda) # check that it is diagonal (to machine precision) print(np.allclose(Lambda - np.diag(np.diagonal(Lambda)), 0))

The matrix

`Lambda`

should be diagonal, i.e., the off-diagnonal elements should be close to machine precision, and hence the last`print()`

should show`True`

:[[ 5.20816990e+05 -6.56706349e-10 -2.83491351e-12] [-6.62283524e-10 4.74131234e+05 -2.06979926e-11] [-6.56687024e-12 -2.07159142e-11 3.93536829e+05]] True

Finally, if you want to calculate “by hand”:

values, evecs = np.linalg.eigh(I) indices = np.argsort(values) U = evecs[:, indices]

- 2(1,2)
PDB format files contain various data fields that are not necessarily used in a typical MD simulation such as

*altLocs*,*icodes*,*occupancies*, or*tempfactor*. When you write a PDB file without providing values for these parameters, MDAnalysis has to set them to default values. When MDAnalysis does that, it warns you with output like~/anaconda3/envs/mda3/lib/python3.6/site-packages/MDAnalysis/coordinates/PDB.py:892: UserWarning: Found no information for attr: 'altLocs' Using default value of ' ' "".format(attrname, default)) ~/anaconda3/envs/mda3/lib/python3.6/site-packages/MDAnalysis/coordinates/PDB.py:892: UserWarning: Found no information for attr: 'icodes' Using default value of ' ' "".format(attrname, default)) ~/anaconda3/envs/mda3/lib/python3.6/site-packages/MDAnalysis/coordinates/PDB.py:892: UserWarning: Found no information for attr: 'occupancies' Using default value of '1.0' "".format(attrname, default)) ~/anaconda3/envs/mda3/lib/python3.6/site-packages/MDAnalysis/coordinates/PDB.py:892: UserWarning: Found no information for attr: 'tempfactors' Using default value of '0.0' "".format(attrname, default))

These warnings are for your information and in the context of this tutorial they are expected and do not indicate a problem.