4. Basics¶
We discuss the fundamental objects in MDAnalysis, the
Universe and AtomGroup, and the facilities for
Selections of atoms. These selections themselves are again an
AtomGroup
.
4.1. Universe and AtomGroup¶
MDAnalysis is object oriented. Molecular systems consist of
Atom
objects (instances of the
class MDAnalysis.core.groups.Atom
), which are grouped in
AtomGroup
instances. You build the
AtomGroup
of your system by
loading a topology (list of atoms and possibly their connectivity)
together with a trajectory (coordinate information) into the
central data structure, the
Universe
object:
>>> u = MDAnalysis.Universe(PSF, DCD)
>>> print(u)
<Universe with 3341 atoms>
The atoms are stored in the attribute
atoms
of the
MDAnalysis.core.universe.Universe
:
>>> print(u.atoms)
<AtomGroup with 3341 atoms>
>>> list(u.atoms[:5])
[< Atom 1: name 'N' of type '56' of resname 'MET', resid 1 and segid '4AKE'>,
< Atom 2: name 'HT1' of type '2' of resname 'MET', resid 1 and segid '4AKE'>,
< Atom 3: name 'HT2' of type '2' of resname 'MET', resid 1 and segid '4AKE'>,
< Atom 4: name 'HT3' of type '2' of resname 'MET', resid 1 and segid '4AKE'>,
< Atom 5: name 'CA' of type '22' of resname 'MET', resid 1 and segid '4AKE'>]
Any AtomGroup
knows the residues
that the atoms belong to via the attribute
residues
, which produces a
ResidueGroup
. A
ResidueGroup
acts like a list of
Residue
objects:
>>> u.atoms[100:130].residues
<ResidueGroup with 3 residues>
>>> list(u.atoms[100:130].residues)
[<Residue LEU, 6>, <Residue GLY, 7>, <Residue ALA, 8>]
Larger organizational units are
Segment
instances, for example one
protein or all the solvent molecules or simply the whole
system. Atom
,
AtomGroup
,
Residue
, and
ResidueGroup
have an
attribute segments
that
will list the segment IDs (“segids”) as a
SegmentGroup
:
>>> u.atoms.segments
<SegmentGroup with 1 segment>
>>> list(u.atoms.segments)
[<Segment 4AKE>]
The converse is also true: each “higher” level in the hierarchy also
know about the Residue
and
Atom
instances it contains. For
example, to list the atoms of the
ResidueGroup
we had before:
>>> r = u.atoms[100:130].residues
>>> r.atoms
<AtomGroup with 36 atoms>
4.1.1. Exercise 1¶
What residue (“resname”) does the last atom belong to in the above example?
>>> r = u.atoms[100:130].residues >>> r.atoms[-1] <Atom 136: O of type 70 of resname ALA, resid 8 and segid 4AKE>
Why does the expression
len(u.atoms[100:130]) == len(u.atoms[100:130].residues.atoms)
return
False
?Because the complete residues contain more atoms than the arbitrary slice of atoms.
How many residues are in the
Universe
u
?>>> len(u.atoms.residues) 214 >>> u.atoms.n_residues 214
How do you get a list of the residue names (such as
["Ala", "Gly", "Gly", "Asp", ...]
) and residue numbers (“resid”) for atoms 1000 to 1300? And as a list of tuples(resname, resid)
(Hint:zip()
)?:>>> resnames = u.atoms[999:1300].residues.resnames >>> resids = u.atoms[999:1300].residues.resids >>> list(zip(resnames, resids))
How do you obtain the resid and the resname for the 100th residue? (Hint: investigate the
Residue
object interactively with TAB completion)>>> r100 = u.atoms.residues[99] >>> print(r100.resid, r100.resname) 100 GLY
How many segments are there?
>>> len(u.segments) 1 >>> len(u.atoms.segments) 1 >>> u.atoms.n_segments 1
What is the segment identifier of the first
Segment
?>>> s1 = u.segments[0] >>> s1.segid '4AKE'
See also
Methods of AtomGroup
,
ResidueGroup
, and
SegmentGroup
n_residues
andn_atoms
resids
resnames
4.2. Selections¶
MDAnalysis comes with a fairly complete atom selection
facility. Primarily, one uses the method
select_atoms()
of a
Universe
:
>>> CA = u.select_atoms("protein and name CA")
>>> CA
>>> <AtomGroup with 214 atoms>
but really any AtomGroup
has a
select_atoms()
method:
>>> acidic = CA.select_atoms("resname ASP or resname GLU")
>>> acidic
<AtomGroup with 35 atoms>
>>> list(acidic.residues)
[<Residue GLU, 22>,
<Residue ASP, 33>,
<Residue GLU, 44>,
...
<Residue GLU, 210>]
See also
All the selection keywords are described in the documentation.
Numerical ranges can be written as first-last
(or equivalently
first:last
1), where the range is inclusive. For example, get
residues with residue IDs 5 to 100:
>>> u.select_atoms("resid 5-100")
<AtomGroup with 1439 atoms>
>>> u.select_atoms("resid 5-100").n_residues
96
Selections can be combined with boolean expressions. For example, to select the \(\text{C}_\alpha\) atoms of all acidic residues [aspartic acid (“ASP”), glutamic acid (“GLU”), and histidines (named “HIS”, “HSD”, or “HSE”, depending on what force field is being used and what protonation state it is in)]:
>>> u.select_atoms("(resname ASP or resname GLU or resname HS*) and name CA")
<AtomGroup with 38 atoms>
We group with or
separate selections by residue name (keyword
resname
). First either ASP, GLU, or any histidines are selected
(we use “stemming” HS*
to match any residue name that starts with
“HS”). Then only those atoms whose name is “CA” are taken from the
first set by an and
selection. For convenience, the or
in the
first part of the selection can be taken implicitly with the shortcut
syntax
>>> u.select_atoms("resname ASP GLU HS* and name CA")
<AtomGroup with 38 atoms>
The implicit or syntax also works well for range selections such as
resid 1-5 20 45-99 101-199
.
It is also possible to select by geometric criteria, e.g. with the
around distance selection
keyword:
>>> u.select_atoms("((resname ASP or resname GLU) and not (backbone or name CB or name CG)) \
... and around 4.0 ((resname LYS or resname ARG) \
... and not (backbone or name CB or name CG))").residues
<ResidueGroup with 30 residues>
This selection will find atoms potentially involved in salt bridges between acidic and basic residues.
4.2.1. Exercises 2¶
Select the range of resids 100 to 200 (“100-200”) with a selection. Compare the result to what you get by slicing the
u.atoms.residues
appropriately.Which approach would you prefer to use in a analysis script?
Solution:
>>> u.select_atoms("resid 100-200") <AtomGroup with 1609 atoms>
Compare to the slicing solution (doing an element-wise comparison, i.e. residue by residue in each
list()
):>>> list(u.select_atoms("resid 100-200").residues) == list(u.atoms.residues[99:200]) True
If one wants to get specific residues in scripts one typically uses selections instead of slicing because the index in the slice might not correspond to the actual residue ids (minus 1): If a number of residues (e.g. 150-160) are missing from the structure then the selection will simply give you residues 100-149 and 161-200 but the slice 99:200 would give you residues 100-149 and 161-209.
Select all residues that do not contain a \(\mathrm{C}_\beta\) (“CB”) atom. How many are there? What residue names did you find?
Solution:
>>> sel = u.select_atoms("(byres name CA) and not (byres name CB)").residues >>> len(sel) 20
These are all Glycines, as can be seen by comparing the residue groups element-wise:
>>> glycines = u.select_atoms("resname GLY") >>> list(sel) == list(glycines.residues) True
Footnotes
- 1
For index ranges in atom selections,
first-last
andfirst:last
are completely equivalent. In this tutorial we prefer the formfirst-last
to reduce confusion with Python slicinggroup[first:last]
because in the atom selection syntax,last
is included in the selection whereas in Python slicing it is excluded.