5.1. Topology readers — MDAnalysis.topology

This submodule contains the topology readers. A topology file supplies the list of atoms in the system, their connectivity and possibly additional information such as B-factors, partial charges, etc. The details depend on the file format and not every topology file provides all (or even any) additional data. This data is made accessible through AtomGroup properties.

At a minimum, all topology parsers provide atom ids, atom types, masses, resids, resnums and segids, and assign all atoms to residues and all residues to segments. For systems without residues and segments, this results in a single residue and a single segment to which all atoms belong. When data are not provided by a file, they are often guessed from other data in the file; whenever this happens, a UserWarning is issued.
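The fallback behaviour described above can be illustrated with a small, self-contained sketch. It does not use MDAnalysis itself: guess_masses, fill_minimal_attributes, the tiny mass table and the default segment label "SYSTEM" are all hypothetical choices made for this illustration. It shows only the pattern, namely that missing data are guessed from other data and that a UserWarning is issued when this happens.

```python
import warnings

# Hypothetical, deliberately tiny mass table (atomic mass units).
APPROX_MASSES = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def guess_masses(atom_types):
    """Guess one mass per atom type and warn that guessing took place."""
    warnings.warn("masses were not in the file and have been guessed",
                  UserWarning)
    # Unknown types get 0.0 so the result always has one entry per atom.
    return [APPROX_MASSES.get(t.upper(), 0.0) for t in atom_types]


def fill_minimal_attributes(atom_types, masses=None, resids=None, segids=None):
    """Return a dict holding a minimal set of per-atom data."""
    n_atoms = len(atom_types)
    if masses is None:
        masses = guess_masses(atom_types)
    # No residue/segment data: put every atom in one residue and one segment.
    if resids is None:
        resids = [1] * n_atoms
    if segids is None:
        segids = ["SYSTEM"] * n_atoms  # assumed placeholder label
    return {
        "ids": list(range(1, n_atoms + 1)),
        "types": list(atom_types),
        "masses": list(masses),
        "resids": list(resids),
        "segids": list(segids),
    }
```

Calling fill_minimal_attributes(["O", "H", "H"]) with no further arguments triggers the warning once and places all three atoms in a single residue and a single segment.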

The following table lists the currently supported topology formats along with the attributes they provide.

Table of Supported Topology Formats
Name | Extension | Attributes | Remarks
CHARMM/XPLOR PSF | psf | resnames, names, types, charges, bonds, angles, dihedrals, impropers | MDAnalysis.topology.PSFParser
CHARMM CARD [1] | crd | names, tempfactors, resnames | “CARD” coordinate output from CHARMM; deals with either standard or EXTended format; MDAnalysis.topology.CRDParser
Brookhaven [1] | pdb/ent | names, bonds, resids, resnums, types, chainids, occupancies, bfactors, icodes, resnames, segids | a simplified PDB format (as used in MD simulations) is read by default
XPDB [1] | pdb | As PDB except icodes | Extended PDB format (can use 5-digit residue numbers). To use, specify the format “XPDB” explicitly: Universe(..., topology_format="XPDB"). Module MDAnalysis.coordinates.PDB
PQR [1] | pqr | names, charges, types, radii, resids, resnames, icodes, segids | PDB-like but whitespace-separated files with charge and radius information; MDAnalysis.topology.PQRParser
PDBQT [1] | pdbqt | names, types, altLocs, charges, resnames, resids, icodes, occupancies, tempfactors, segids | file format used by AutoDock with atom types and partial charges. Module: MDAnalysis.topology.PDBQTParser
GROMOS96 [1] | gro | names, resids, resnames | GROMOS96 coordinate file; MDAnalysis.topology.GROParser
AMBER | top, prmtop, parm7 | names, charges, type_indices, types, resnames | simple AMBER format reader (only supports a subset of flags); MDAnalysis.topology.TOPParser
DESRES [1] | dms | names, numbers, masses, charges, chainids, resids, resnames, segids, radii | DESRES molecular structure reader (only supports the atom and bond records); MDAnalysis.topology.DMSParser
TPR [2] | tpr | names, types, resids, resnames, charges, bonds, masses, moltypes, molnums | Gromacs portable run input reader (limited experimental support for some of the more recent versions of the file format); MDAnalysis.topology.TPRParser
ITP | itp | names, types, resids, resnames, charges, bonds, masses, segids, moltypes, chargegroups | Gromacs include topology file; MDAnalysis.topology.ITPParser
MOL2 [1] | mol2 | ids, names, types, resids, charges, bonds, resnames | Tripos MOL2 molecular structure format; MDAnalysis.topology.MOL2Parser
LAMMPS [1] | data | ids, types, masses, charges, resids, bonds, angles, dihedrals | LAMMPS Data file parser; MDAnalysis.topology.LAMMPSParser
LAMMPS [1] | lammpsdump | id, masses | LAMMPS ascii dump file reader; MDAnalysis.topology.LAMMPSParser
XYZ [1] | xyz | names | XYZ File Parser. Reads only the labels from atoms and constructs minimal topology data. MDAnalysis.topology.XYZParser
TXYZ [1] | txyz, arc | names, atomids, masses, types, bonds | Tinker XYZ File Parser. Reads atom labels, numbers and connectivity; masses are guessed from atom names. MDAnalysis.topology.TXYZParser
GAMESS [1] | gms, log | names, atomic charges | GAMESS output parser. Reads only the atoms of the assembly section (atom, elems and coords) and constructs the topology. MDAnalysis.topology.GMSParser
DL_Poly [1] | config, history | ids, names | DL_Poly CONFIG or HISTORY file. Reads only the atom names. If atoms are written out of order, will correct the order. MDAnalysis.topology.DLPolyParser
Hoomd XML | xml | types, charges, radii, masses, bonds, angles, dihedrals | HOOMD XML topology file. Reads atom types, masses, and charges if possible. Also reads bonds, angles, and dihedrals. MDAnalysis.topology.HoomdXMLParser
GSD [1] | gsd | types, charges, radii, masses, bonds, angles, dihedrals | GSD topology file. Reads atom types, masses, and charges if possible. Also reads bonds, angles, and dihedrals. MDAnalysis.topology.GSDParser
Macromolecular transmission format | mmtf | altLocs, bfactors, bonds, charges, masses, names, occupancies, types, icodes, resnames, resids, segids, models | Macromolecular Transmission Format (MMTF). An efficient compact format for biomolecular structures.
[1] This format can also be used to provide coordinates so that it is possible to create a full Universe by simply providing a file of this format as the sole argument to Universe: u = Universe(filename)
[2] The Gromacs TPR format contains coordinate information but parsing coordinates from a TPR file is currently not implemented in TPRParser.
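The distinction drawn by footnote [1] can be expressed as a simple lookup. The following is a minimal sketch and not part of MDAnalysis: can_be_sole_argument and the dictionary are hypothetical, the dictionary holds only a subset of the extensions in the table, and its values simply record whether the corresponding table row carries footnote [1].

```python
import os

# Subset of the table above, keyed by file extension. True means the row
# carries footnote [1]: the file can also supply coordinates.
ALSO_PROVIDES_COORDINATES = {
    "psf": False, "top": False, "prmtop": False, "parm7": False,
    "tpr": False, "itp": False,
    "crd": True, "pdb": True, "ent": True, "pqr": True, "pdbqt": True,
    "gro": True, "dms": True, "mol2": True, "xyz": True, "gsd": True,
}


def can_be_sole_argument(filename):
    """Hypothetical helper: True if the table marks this format with [1]."""
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    if extension not in ALSO_PROVIDES_COORDINATES:
        raise ValueError(
            f"extension {extension!r} is not in this sketch's table")
    return ALSO_PROVIDES_COORDINATES[extension]
```

Under these assumptions, a file such as protein.pdb could be passed to Universe on its own, whereas a file such as topol.tpr would need a separate coordinate file.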

5.1.1. Developer Notes

New in version 0.8.

Changed in version 0.16.0: The new array-based topology system completely replaced the old system that was based on a list of Atom instances.

Topology information consists of data that do not change over time, i.e. information that is the same for all time steps of a trajectory. This includes

  • identity of atoms (name, type, number, partial charge, …) and to which residue and segment they belong; atoms are identified in MDAnalysis by their index, an integer number starting at 0 and incremented in the order of atoms found in the topology.
  • bonds (pairs of atoms)
  • angles (triplets of atoms)
  • dihedral angles (quadruplets of atoms) — proper and improper dihedrals should be treated separately

Topology readers are generally called “parsers” in MDAnalysis (for historical reasons and in order to distinguish them from coordinate “readers”). All parsers are derived from MDAnalysis.topology.base.TopologyReaderBase and have a parse() method that returns an MDAnalysis.core.topology.Topology instance.

atoms

The atoms appear to the user as an array of Atom instances. However, under the hood this is essentially only an array of atom indices that are used to index the various components of the topology database Topology. The parser needs to initialize the Topology with the data read from the topology file.
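The idea that a group of atoms is only an array of indices into a shared topology database can be sketched in plain Python. TinyTopology and TinyAtomGroup are hypothetical stand-ins written for this illustration; they are not the MDAnalysis classes and make no claim about how those classes are implemented.

```python
class TinyTopology:
    """Hypothetical topology database: one list per attribute,
    each indexed by atom index (starting at 0)."""

    def __init__(self, names, masses, resindices):
        self.names = list(names)
        self.masses = list(masses)
        self.resindices = list(resindices)  # atom index -> residue index


class TinyAtomGroup:
    """Stores only atom indices; attribute values are looked up on demand."""

    def __init__(self, topology, indices):
        self.topology = topology
        self.indices = list(indices)

    @property
    def names(self):
        return [self.topology.names[i] for i in self.indices]

    @property
    def masses(self):
        return [self.topology.masses[i] for i in self.indices]

    def total_mass(self):
        return sum(self.masses)


# A "parser" would fill the database once from the topology file.
top = TinyTopology(
    names=["OW", "HW1", "HW2", "NA"],
    masses=[15.999, 1.008, 1.008, 22.990],
    resindices=[0, 0, 0, 1],
)
hydrogens = TinyAtomGroup(top, [1, 2])  # holds two integers, no atom data
```

Because the group keeps only indices, several groups can share one database without copying any per-atom data.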

See also

Topology system

bonds

Bonds are represented as a tuple of tuples. Each tuple contains two atom numbers, which indicate the atoms between which the bond is formed. Only one of the two permutations is stored, typically the one with the lower atom number first.

bondorder

Some bonds have additional information called order. When available this is stored in a dictionary of format {bondtuple:order}. This extra information is then passed to Bond initialisation in u._init_bonds().

angles

Angles are represented by a list of tuples. Each tuple contains three atom numbers. The second of these numbers represents the apex of the angle.

dihedrals

Proper dihedral angles are represented by a list of tuples. Each tuple contains four atom numbers. The angle of the torsion is defined by the angle between the planes formed by atoms 1, 2, and 3, and 2, 3, and 4.

impropers

Improper dihedral angles are represented by a list of tuples. Each tuple contains four atom numbers. The angle of the improper torsion is again defined by the angle between the planes formed by atoms 1, 2, and 3, and 2, 3, and 4. Improper dihedrals differ from regular dihedrals as the four atoms need not be sequentially bonded, and are instead often all bonded to the second atom.
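The conventions for bonds, angles, dihedrals and impropers can be made concrete with a short pure-Python sketch. All function names here are hypothetical and none of this is MDAnalysis code. The sketch assumes atom numbers index a list of (x, y, z) coordinates, and it returns the unsigned angle between the two planes (0 to 180 degrees), which is a simplification chosen for this example.

```python
import math


def canonical_bonds(pairs):
    """Tuple of 2-tuples, lower atom number first, duplicates removed."""
    return tuple(sorted({tuple(sorted(p)) for p in pairs}))


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _angle_between(u, v):
    cos_t = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def angle_deg(coords, triplet):
    """Angle at the apex, which is the second atom of the triplet."""
    i, j, k = triplet
    return _angle_between(_sub(coords[i], coords[j]),
                          _sub(coords[k], coords[j]))


def dihedral_deg(coords, quad):
    """Unsigned angle between the planes (1, 2, 3) and (2, 3, 4)."""
    a, b, c, d = (coords[n] for n in quad)
    normal_123 = _cross(_sub(b, a), _sub(c, b))
    normal_234 = _cross(_sub(c, b), _sub(d, c))
    return _angle_between(normal_123, normal_234)


def is_sequentially_bonded(quad, bonds):
    """True if 1-2, 2-3 and 3-4 are all bonds (typical proper dihedral)."""
    bonded = set(canonical_bonds(bonds))
    return all(tuple(sorted(p)) in bonded for p in zip(quad, quad[1:]))


def is_centred_on_second(quad, bonds):
    """True if atoms 1, 3 and 4 are all bonded to atom 2 (typical improper)."""
    bonded = set(canonical_bonds(bonds))
    centre = quad[1]
    return all(tuple(sorted((centre, other))) in bonded
               for other in quad if other != centre)
```

The same four-atom tuple format serves both kinds of dihedral; only the bonding pattern among the four atoms differs, which is what the last two helpers check.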