6.12. PDB structure files in MDAnalysis — MDAnalysis.coordinates.PDB

MDAnalysis reads coordinates from PDB files and additional optional data such as B-factors. It is also possible to substitute a PDB file instead of PSF file in order to define the list of atoms (but no connectivity information will be available in this case).

PDB files contain both coordinate and atom information. It is also possible to write trajectories as multi-frame (or multi-model) PDB files. This is not very space efficient but is sometimes the lowest common denominator for exchanging trajectories. Single frame and multi-frame PDB files are automatically recognized; the only difference is thath the single-frame PDB is represented as a trajectory with only one frame.

In order to write a selection to a PDB file one typically uses the MDAnalysis.core.groups.AtomGroup.write() method of the selection:

calphas = universe.select_atoms("name CA")
calphas.write("calpha_only.pdb")

This uses the coordinates from the current timestep of the trajectory.

In order to write a PDB trajectory one needs to first obtain a multi-frame writer (keyword multiframe = True) and then iterate through the trajectory, while writing each frame:

calphas = universe.select_atoms("name CA")
with MDAnalysis.Writer("calpha_traj.pdb", multiframe=True) as W:
    for ts in u.trajectory:
        W.write(calphas)

It is important to always close the trajectory when done because only at this step is the final END record written, which is required by the PDB 3.2 standard. Using the writer as a context manager ensures that this always happens.

6.12.1. Capabilities

A pure-Python implementation for PDB files commonly encountered in MD simulations comes under the names PDBReader and PDBWriter. It only implements a subset of the PDB 3.2 standard and also allows some typical enhancements such as 4-letter resids (introduced by CHARMM/NAMD).

The PDBReader can read multi-frame PDB files and represents them as a trajectory. The PDBWriter can write single and multi-frame PDB files as specified by the multiframe keyword. By default, it writes single frames. On the other hand, the MultiPDBWriter is set up to write a PDB trajectory by default (equivalent to using multiframe = True).

6.12.2. Examples for working with PDB files

A single frame PDB can be written with the write() method of any AtomGroup:

protein = u.select_atoms("protein")
protein.write("protein.pdb")

Alternatively, get the single frame writer and supply the AtomGroup:

protein = u.select_atoms("protein")
with MDAnalysis.Writer("protein.pdb") as pdb:
    pdb.write(protein)

In order to write a multi-frame PDB trajectory from a universe u one can do the following:

with MDAnalysis.Writer("all.pdb", multiframe=True) as pdb:
    for ts in u.trajectory:
        pdb.write(u)

Similarly, writing only a protein:

protein = u.select_atoms("protein")
with MDAnalysis.Writer("protein.pdb", multiframe=True) as pdb:
    for ts in u.trajectory:
        pdb.write(protein)

6.12.3. Classes

Changed in version 0.16.0: PDB readers and writers based on Bio.PDB.PDBParser were retired and removed.

class MDAnalysis.coordinates.PDB.PDBReader(filename, **kwargs)[source]

PDBReader that reads a PDB-formatted file, no frills.

The following PDB records are parsed (see PDB coordinate section for details):

  • CRYST1 for unitcell A,B,C, alpha,beta,gamma
  • ATOM or HETATM for serial,name,resName,chainID,resSeq,x,y,z,occupancy,tempFactor
  • HEADER (header), TITLE (title), COMPND (compound), REMARK (remarks)
  • all other lines are ignored

Reads multi-MODEL PDB files as trajectories.

COLUMNS DATA TYPE FIELD DEFINITION
1 - 6 Record name “CRYST1”  
7 - 15 Real(9.3) a a (Angstroms).
16 - 24 Real(9.3) b b (Angstroms).
25 - 33 Real(9.3) c c (Angstroms).
34 - 40 Real(7.2) alpha alpha (degrees).
41 - 47 Real(7.2) beta beta (degrees).
48 - 54 Real(7.2) gamma gamma (degrees).
1 - 6 Record name “ATOM “  
7 - 11 Integer serial Atom serial number.
13 - 16 Atom name Atom name.
17 Character altLoc Alternate location indicator.
18 - 21 Residue name resName Residue name.
22 Character chainID Chain identifier.
23 - 26 Integer resSeq Residue sequence number.
27 AChar iCode Code for insertion of residues.
31 - 38 Real(8.3) x Orthogonal coordinates for X in Angstroms.
39 - 46 Real(8.3) y Orthogonal coordinates for Y in Angstroms.
47 - 54 Real(8.3) z Orthogonal coordinates for Z in Angstroms.
55 - 60 Real(6.2) occupancy Occupancy.
61 - 66 Real(6.2) tempFactor Temperature factor.
67 - 76 String segID (unofficial CHARMM extension ?)
77 - 78 LString(2) element Element symbol, right-justified.
79 - 80 LString(2) charge Charge on the atom.

See also

PDBWriter

PDBReader
implements a larger subset of the header records, which are accessible as PDBReader.metadata.

Changed in version 0.11.0: * Frames now 0-based instead of 1-based * New title (list with all TITLE lines).

Read coordinates from filename.

filename can be a gzipped or bzip2ed compressed PDB file.

If the pdb file contains multiple MODEL records then it is read as a trajectory where the MODEL numbers correspond to frame numbers.

Writer(filename, **kwargs)[source]

Returns a PDBWriter for filename.

Parameters:filename (str) – filename of the output PDB file
Returns:
Return type:PDBWriter
class MDAnalysis.coordinates.PDB.PDBWriter(filename, bonds=’conect’, n_atoms=None, start=0, step=1, remarks=’Created by PDBWriter’, convert_units=None, multiframe=None)[source]

PDB writer that implements a subset of the PDB 3.2 standard .

PDB format as used by NAMD/CHARMM: 4-letter resnames and segID are allowed, altLoc is written.

The PDBWriter can be used to either dump a coordinate set to a PDB file (operating as a “single frame writer”, selected with the constructor keyword multiframe = False, the default) or by writing a PDB “movie” (multi frame mode, multiframe = True), consisting of multiple models (using the MODEL and ENDMDL records).

Note

This class is identical to MultiPDBWriter with the one exception that it defaults to writing single-frame PDB files as if multiframe = False was selected.

Changed in version 0.7.5: Initial support for multi-frame PDB files.

Changed in version 0.7.6: The multiframe keyword was added to select the writing mode. The writing of bond information (CONECT records) is now disabled by default but can be enabled with the bonds keyword.

Changed in version 0.11.0: Frames now 0-based instead of 1-based

Changed in version 0.14.0: PDB doesn’t save charge information

Create a new PDBWriter

Parameters:
  • filename (str) – name of output file
  • start (int (optional)) – starting timestep
  • step (int (optional)) – skip between subsequent timesteps
  • remarks (str (optional)) – comments to annotate pdb file (added to the TITLE record); note that any remarks from the trajectory that serves as input are written to REMARK records with lines longer than remark_max_length (66 characters) being wrapped.
  • convert_units (str (optional)) – units are converted to the MDAnalysis base format; None selects the value of MDAnalysis.core.flags [‘convert_lengths’]
  • bonds ({"conect", "all", None} (optional)) – If set to “conect”, then only write those bonds that were already defined in an input PDB file as PDB CONECT record. If set to “all”, write all bonds (including guessed ones) to the file. None does not write any bonds. The default is “conect”.
  • multiframe (bool (optional)) – False: write a single frame to the file; True: create a multi frame PDB file in which frames are written as MODELENDMDL records. If None, then the class default is chosen. [None]

Note

Writing bonds currently only works when writing a whole Universe and if bond information is available in the topology. (For selections smaller than the whole Universe, the atom numbering in the CONECT records would not match the numbering of the atoms in the new PDB file and therefore a NotImplementedError is raised.)

_check_pdb_coordinates()[source]

Check if the coordinate values fall within the range allowed for PDB files.

Deletes the output file if this is the first frame or if frames have already been written (in multi-frame mode) adds a REMARK instead of the coordinates and closes the file.

Raises ValueError if the coordinates fail the check.

_write_pdb_bonds()[source]

Writes out all the bond records

_update_frame(obj)[source]

Method to initialize important attributes in writer from a AtomGroup or Universe obj.

Attributes initialized/updated:

  • PDBWriter.obj (the entity that provides topology information and coordinates, either a AtomGroup or a whole Universe)
  • PDBWriter.trajectory (the underlying trajectory Reader)
  • PDBWriter.timestep (the underlying trajectory Timestep)

Before calling write_next_timestep() this method must be called at least once to enable extracting topology information from the current frame.

_write_timestep(ts, multiframe=False)[source]

Write a new timestep ts to file

Does unit conversion if necessary.

By setting multiframe = True, MODELENDMDL records are written to represent trajectory frames in a multi-model PDB file. (At the moment we do not write the NUMMDL record.)

The multiframe = False keyword signals that the PDBWriter is in single frame mode and no MODEL records are written.

Changed in version 0.7.6: The multiframe keyword was added, which completely determines if MODEL records are written. (Previously, this was decided based on the underlying trajectory and only if len(traj) > 1 would MODEL records have been written.)

CONECT(conect)[source]

Write CONECT record.

CRYST1(dimensions, spacegroup=’P 1’, zvalue=1)[source]

Write CRYST1 record.

END()[source]

Write END record.

Only a single END record is written. Calling END multiple times has no effect. Because close() also calls this method right before closing the file it is recommended to not call END() explicitly.

ENDMDL()[source]

Write the ENDMDL record.

HEADER(trajectory)[source]

Write HEADER record.

MODEL(modelnumber)[source]

Write the MODEL record.

REMARK(*remarks)[source]

Write generic REMARK record (without number).

Each string provided in remarks is written as a separate REMARK record.

See also REMARK (update).

TITLE(*title)[source]

Write TITLE record.

close()[source]

Close PDB file and write END record

remark_max_length = 66

wrap comments into REMARK records that are not longer than

write(obj)[source]

Write object obj at current trajectory frame to file.

obj can be a selection (i.e. a AtomGroup) or a whole Universe.

The last letter of the segid is used as the PDB chainID (but see ATOM() for details).

Parameters:obj – The AtomGroup or Universe to write.
write_all_timesteps(obj)[source]

Write all timesteps associated with obj to the PDB file.

obj can be a AtomGroup or a Universe.

The method writes the frames from the one specified as start until the end, using a step of skip (start and skip are set in the constructor). Thus, if u is a Universe then

u.trajectory[-2]
pdb = PDBWriter("out.pdb", u.atoms.n_atoms)
pdb.write_all_timesteps(u)

will write a PDB trajectory containing the last 2 frames and

pdb = PDBWriter("out.pdb", u.atoms.n_atoms, start=12, skip=2)
pdb.write_all_timesteps(u)

will be writing frames 12, 14, 16, …

Changed in version 0.11.0: Frames now 0-based instead of 1-based

write_next_timestep(ts=None, **kwargs)[source]

write a new timestep to the PDB file

Keywords:
ts

base.Timestep object containing coordinates to be written to trajectory file; if None then PDBWriter.ts` is tried.

multiframe

False: write a single frame (default); True behave as a trajectory writer

Note

Before using this method with another base.Timestep in the ts argument, PDBWriter._update_frame() must be called with the Universe as its argument so that topology information can be gathered.

class MDAnalysis.coordinates.PDB.MultiPDBWriter(filename, bonds=’conect’, n_atoms=None, start=0, step=1, remarks=’Created by PDBWriter’, convert_units=None, multiframe=None)[source]

PDB writer that implements a subset of the PDB 3.2 standard .

PDB format as used by NAMD/CHARMM: 4-letter resnames and segID, altLoc is written.

By default, MultiPDBWriter writes a PDB “movie” (multi frame mode, multiframe = True), consisting of multiple models (using the MODEL and ENDMDL records).

See also

This, exception, single

New in version 0.7.6.

Create a new PDBWriter

Parameters:
  • filename (str) – name of output file
  • start (int (optional)) – starting timestep
  • step (int (optional)) – skip between subsequent timesteps
  • remarks (str (optional)) – comments to annotate pdb file (added to the TITLE record); note that any remarks from the trajectory that serves as input are written to REMARK records with lines longer than remark_max_length (66 characters) being wrapped.
  • convert_units (str (optional)) – units are converted to the MDAnalysis base format; None selects the value of MDAnalysis.core.flags [‘convert_lengths’]
  • bonds ({"conect", "all", None} (optional)) – If set to “conect”, then only write those bonds that were already defined in an input PDB file as PDB CONECT record. If set to “all”, write all bonds (including guessed ones) to the file. None does not write any bonds. The default is “conect”.
  • multiframe (bool (optional)) – False: write a single frame to the file; True: create a multi frame PDB file in which frames are written as MODELENDMDL records. If None, then the class default is chosen. [None]

Note

Writing bonds currently only works when writing a whole Universe and if bond information is available in the topology. (For selections smaller than the whole Universe, the atom numbering in the CONECT records would not match the numbering of the atoms in the new PDB file and therefore a NotImplementedError is raised.)

class MDAnalysis.coordinates.PDB.ExtendedPDBReader(filename, **kwargs)[source]

PDBReader that reads a PDB-formatted file with five-digit residue numbers.

This reader does not conform to the PDB 3.2 standard because it allows five-digit residue numbers that may take up columns 23 to 27 (inclusive) instead of being confined to 23-26 (with column 27 being reserved for the insertion code in the PDB standard). PDB files in this format are written by popular programs such as VMD.

See also

PDBReader

New in version 0.8.

Read coordinates from filename.

filename can be a gzipped or bzip2ed compressed PDB file.

If the pdb file contains multiple MODEL records then it is read as a trajectory where the MODEL numbers correspond to frame numbers.

OtherWriter(filename, **kwargs)

Returns a writer appropriate for filename.

Sets the default keywords start, step and dt (if available). n_atoms is always set from Reader.n_atoms.

See also

Reader.Writer()

Writer(filename, **kwargs)

Returns a PDBWriter for filename.

Parameters:filename (str) – filename of the output PDB file
Returns:
Return type:PDBWriter
add_auxiliary(auxname, auxdata, format=None, **kwargs)

Add auxiliary data to be read alongside trajectory.

Auxiliary data may be any data timeseries from the trajectory additional to that read in by the trajectory reader. auxdata can be an AuxReader instance, or the data itself as e.g. a filename; in the latter case an appropriate AuxReader is guessed from the data/file format. An appropriate format may also be directly provided as a key word argument.

On adding, the AuxReader is initially matched to the current timestep of the trajectory, and will be updated when the trajectory timestep changes (through a call to next() or jumping timesteps with trajectory[i]).

The representative value(s) of the auxiliary data for each timestep (as calculated by the AuxReader) are stored in the current timestep in the ts.aux namespace under auxname; e.g. to add additional pull force data stored in pull-force.xvg:

u = MDAnalysis.Universe(PDB, XTC)
u.trajectory.add_auxiliary('pull', 'pull-force.xvg')

The representative value for the current timestep may then be accessed as u.trajectory.ts.aux.pull or u.trajectory.ts.aux['pull'].

Note

Auxiliary data is assumed to be time-ordered, with no duplicates. See the Auxiliary API.

aux_list

Lists the names of added auxiliary data.

check_slice_indices(start, stop, step)

Check frame indices are valid and clip to fit trajectory.

The usage follows standard Python conventions for range() but see the warning below.

Parameters:
  • start (int or None) – Starting frame index (inclusive). None corresponds to the default of 0, i.e., the initial frame.
  • stop (int or None) – Last frame index (exclusive). None corresponds to the default of n_frames, i.e., it includes the last frame of the trajectory.
  • step (int or None) – step size of the slice, None corresponds to the default of 1, i.e, include every frame in the range start, stop.
Returns:

start, stop, step – Integers representing the slice

Return type:

tuple (int, int, int)

Warning

The returned values start, stop and step give the expected result when passed in range() but gives unexpected behavior when passed in a slice when stop=None and step=-1

This can be a problem for downstream processing of the output from this method. For example, slicing of trajectories is implemented by passing the values returned by check_slice_indices() to range()

range(start, stop, step)

and using them as the indices to randomly seek to. On the other hand, in MDAnalysis.analysis.base.AnalysisBase the values returned by check_slice_indices() are used to splice the trajectory by creating a slice instance

slice(start, stop, step)

This creates a discrepancy because these two lines are not equivalent:

range(10, -1, -1)             # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
range(10)[slice(10, -1, -1)]  # []
convert_forces_from_native(force, inplace=True)

Conversion of forces array force from native to base units

Parameters:
  • force (array_like) – Forces to transform
  • inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data

Note

By default, the input force is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.

New in version 0.7.7.

convert_forces_to_native(force, inplace=True)

Conversion of force array force from base to native units.

Parameters:
  • force (array_like) – Forces to transform
  • inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data

Note

By default, the input force is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.

New in version 0.7.7.

convert_pos_from_native(x, inplace=True)

Conversion of coordinate array x from native units to base units.

Parameters:
  • x (array_like) – Positions to transform
  • inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data

Note

By default, the input x is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.

Changed in version 0.7.5: Keyword inplace can be set to False so that a modified copy is returned unless no conversion takes place, in which case the reference to the unmodified x is returned.

convert_pos_to_native(x, inplace=True)

Conversion of coordinate array x from base units to native units.

Parameters:
  • x (array_like) – Positions to transform
  • inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data

Note

By default, the input x is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.

Changed in version 0.7.5: Keyword inplace can be set to False so that a modified copy is returned unless no conversion takes place, in which case the reference to the unmodified x is returned.

convert_time_from_native(t, inplace=True)

Convert time t from native units to base units.

Parameters:
  • t (array_like) – Time values to transform
  • inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data

Note

By default, the input t is modified in place and also returned (although note that scalar values t are passed by value in Python and hence an in-place modification has no effect on the caller.) In-place operations improve performance because allocating new arrays is avoided.

Changed in version 0.7.5: Keyword inplace can be set to False so that a modified copy is returned unless no conversion takes place, in which case the reference to the unmodified x is returned.

convert_time_to_native(t, inplace=True)

Convert time t from base units to native units.

Parameters:
  • t (array_like) – Time values to transform
  • inplace (bool, optional) – Whether to modify the array inplace, overwriting previous data

Note

By default, the input t is modified in place and also returned. (Also note that scalar values t are passed by value in Python and hence an in-place modification has no effect on the caller.)

Changed in version 0.7.5: Keyword inplace can be set to False so that a modified copy is returned unless no conversion takes place, in which case the reference to the unmodified x is returned.

convert_velocities_from_native(v, inplace=True)

Conversion of velocities array v from native to base units

Parameters:
  • v (array_like) – Velocities to transform
  • inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data

Note

By default, the input v is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.

New in version 0.7.5.

convert_velocities_to_native(v, inplace=True)

Conversion of coordinate array v from base to native units

Parameters:
  • v (array_like) – Velocities to transform
  • inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data

Note

By default, the input v is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.

New in version 0.7.5.

dt

Time between two trajectory frames in picoseconds.

frame

Frame number of the current time step.

This is a simple short cut to Timestep.frame.

get_aux_attribute(auxname, attrname)

Get the value of attrname from the auxiliary auxname

Parameters:
  • auxname (str) – Name of the auxiliary to get value for
  • attrname (str) – Name of gettable attribute in the auxiliary reader
get_aux_descriptions(auxnames=None)

Get descriptions to allow reloading the specified auxiliaries.

If no auxnames are provided, defaults to the full list of added auxiliaries.

Passing the resultant description to add_auxiliary() will allow recreation of the auxiliary. e.g., to duplicate all auxiliaries into a second trajectory:

descriptions = trajectory_1.get_aux_descriptions()
for aux in descriptions:
    trajectory_2.add_auxiliary(**aux)
Returns:List of dictionaries of the args/kwargs describing each auxiliary.
Return type:list
iter_as_aux(auxname)

Iterate through timesteps for which there is at least one assigned step from the auxiliary auxname within the cutoff specified in auxname.

iter_auxiliary(auxname, start=None, stop=None, step=None, selected=None)

Iterate through the auxiliary auxname independently of the trajectory.

Will iterate over the specified steps of the auxiliary (defaults to all steps). Allows to access all values in an auxiliary, including those out of the time range of the trajectory, without having to also iterate through the trajectory.

After interation, the auxiliary will be repositioned at the current step.

Parameters:
  • auxname (str) – Name of the auxiliary to iterate over.
  • stop, step) ((start,) – Options for iterating over a slice of the auxiliary.
  • selected (lst | ndarray, optional) – List of steps to iterate over.
Yields:

AuxStep object

See also

iter_as_aux()

next()

Forward one step to next frame.

next_as_aux(auxname)

Move to the next timestep for which there is at least one step from the auxiliary auxname within the cutoff specified in auxname.

This allows progression through the trajectory without encountering NaN representative values (unless these are specifically part of the auxiliary data).

If the auxiliary cutoff is not set, where auxiliary steps are less frequent (auxiliary.dt > trajectory.dt), this allows progression at the auxiliary pace (rounded to nearest timestep); while if the auxiliary steps are more frequent, this will work the same as calling next().

See the Auxiliary API.

See also

iter_as_aux()

remove_auxiliary(auxname)

Clear data and close the AuxReader for the auxiliary auxname.

See also

add_auxiliary()

rename_aux(auxname, new)

Change the name of the auxiliary auxname to new.

Provided there is not already an auxiliary named new, the auxiliary name will be changed in ts.aux namespace, the trajectory’s list of added auxiliaries, and in the auxiliary reader itself.

Parameters:
  • auxname (str) – Name of the auxiliary to rename
  • new (str) – New name to try set
Raises:

ValueError – If the name new is already in use by an existing auxiliary.

rewind()

Position at beginning of trajectory

set_aux_attribute(auxname, attrname, new)

Set the value of attrname in the auxiliary auxname.

Parameters:
  • auxname (str) – Name of the auxiliary to alter
  • attrname (str) – Name of settable attribute in the auxiliary reader
  • new – New value to try set attrname to
time

Time of the current frame in MDAnalysis time units (typically ps).

This is either read straight from the Timestep, or calculated as time = Timestep.frame * Timestep.dt

totaltime

Total length of the trajectory

The time is calculated as (n_frames - 1) * dt, i.e., we assume that the first frame no time as elapsed. Thus, a trajectory with two frames will be considered to have a length of a single time step dt and a “trajectory” with a single frame will be reported as length 0.