11.2.8. Helper functions — MDAnalysis.lib.util

Small helper functions that don’t fit anywhere else.

Changed in version 0.11.0: Moved mathematical functions into lib.mdamath

11.2.8.1. Files and directories

MDAnalysis.lib.util.filename(name, ext=None, keep=False)[source]

Return a new name that has suffix attached; replaces other extensions.

Parameters:
  • name (str or NamedStream) – filename; extension is replaced unless keep=True; name can also be a NamedStream (and its NamedStream.name will be changed accordingly)
  • ext (None or str) – extension to use in the new filename
  • keep (bool) –
    • False: replace existing extension with ext;
    • True: keep old extension if one existed

Changed in version 0.9.0: Also permits NamedStream to pass through.

MDAnalysis.lib.util.openany(*args, **kwds)[source]

Context manager for anyopen().

Open the datasource and close it when the context of the with statement exits.

datasource can be a filename or a stream (see isstream()). A stream is reset to its start if possible (via seek() or reset()).

The advantage of this function is that very different input sources (“streams”) can be used for a “file”, ranging from files on disk (including compressed files) to open file objects to sockets and strings—as long as they have a file-like interface.

Parameters:
  • datasource (a file or a stream) –
  • mode ({'r', 'w'} (optional)) – open in r(ead) or w(rite) mode
  • reset (bool (optional)) – try to read (mode ‘r’) the stream from the start [True]

Examples

Open a gzipped file and process it line by line:

with openany("input.pdb.gz") as pdb:
    for line in pdb:
        if line.startswith('ATOM'):
            print(line)

Open a URL and read it:

import urllib2
with openany(urllib2.urlopen("http://www.mdanalysis.org/")) as html:
    print(html.read())

See also

anyopen()

MDAnalysis.lib.util.anyopen(datasource, mode=’r’, reset=True)[source]

Open datasource (gzipped, bzipped, uncompressed) and return a stream.

datasource can be a filename or a stream (see isstream()). By default, a stream is reset to its start if possible (via seek() or reset()).

If possible, the attribute stream.name is set to the filename or “<stream>” if no filename could be associated with the datasource.

Parameters:
  • datasource – a file (from file or open()) or a stream (e.g. from urllib2.urlopen() or cStringIO.StringIO)
  • mode ({'r', 'w', 'a'} (optional)) – Open in r(ead), w(rite) or a(ppen) mode. More complicated modes (‘r+’, ‘w+’, …) are not supported; only the first letter of mode is used and thus any additional modifiers are silently ignored.
  • reset (bool (optional)) – try to read (mode ‘r’) the stream from the start
Returns:

Return type:

file-like object

See also

openany()
to be used with the with statement.

Changed in version 0.9.0: Only returns the stream and tries to set stream.name = filename instead of the previous behavior to return a tuple (stream, filename).

MDAnalysis.lib.util.greedy_splitext(p)[source]

Split extension in path p at the left-most separator.

Extensions are taken to be separated from the filename with the separator os.extsep (as used by os.path.splitext()).

Parameters:p (str) – path
Returns:(root, extension) – where root is the full path and filename with all extensions removed whereas extension is the string of all extensions.
Return type:tuple

Example

>>> greedy_splitext("/home/joe/protein.pdb.bz2")
('/home/joe/protein', '.pdb.bz2')
MDAnalysis.lib.util.which(program)[source]

Determine full path of executable program on PATH.

(Jay at http://stackoverflow.com/questions/377017/test-if-executable-exists-in-python)

Parameters:programe (str) – name of the executable
Returns:path – absolute path to the executable if it can be found, else None
Return type:str or None
MDAnalysis.lib.util.realpath(*args)[source]

Join all args and return the real path, rooted at /.

Expands ‘~’, ‘~user’, and environment variables such as $HOME.

Returns None if any of the args is None.

MDAnalysis.lib.util.guess_format(filename)[source]

Return the format of filename

The current heuristic simply looks at the filename extension and can work around compressed format extensions.

Parameters:filename (str or stream) – path to the file or a stream, in which case filename.name is looked at for a hint to the format
Returns:format – format specifier (upper case)
Return type:str
Raises:ValueError – if the heuristics are insufficient to guess a supported format

New in version 0.11.0: Moved into lib.util

11.2.8.2. Streams

Many of the readers are not restricted to just reading files. They can also use gzip-compressed or bzip2-compressed files (through the internal use of openany()). It is also possible to provide more general streams as inputs, such as a cStringIO.StringIO() instances (essentially, a memory buffer) by wrapping these instances into a NamedStream. This NamedStream can then be used in place of an ordinary file name (typically, with a class:~MDAnalysis.core.universe.Universe but it is also possible to write to such a stream using MDAnalysis.Writer()).

In the following example, we use a PDB stored as a string pdb_s:

import MDAnalysis
from MDAnalysis.lib.util import NamedStream
import cStringIO

pdb_s = "TITLE     Lonely Ion\nATOM      1  NA  NA+     1      81.260  64.982  10.926  1.00  0.00\n"
u = MDAnalysis.Universe(NamedStream(cStringIO.StringIO(pdb_s), "ion.pdb"))
print(u)
#  <Universe with 1 atoms>
print(u.atoms.positions)
# [[ 81.26000214  64.98200226  10.92599964]]

It is important to provide a proper pseudo file name with the correct extension (“.pdb”) to NamedStream because the file type recognition uses the extension of the file name to determine the file format or alternatively provide the format="pdb" keyword argument to the Universe.

The use of streams becomes more interesting when MDAnalysis is used as glue between different analysis packages and when one can arrange things so that intermediate frames (typically in the PDB format) are not written to disk but remain in memory via e.g. cStringIO buffers.

Note

A remote connection created by urllib2.urlopen() is not seekable and therefore will often not work as an input. But try it…

class MDAnalysis.lib.util.NamedStream(stream, filename, reset=True, close=False)[source]

Stream that also provides a (fake) name.

By wrapping a stream stream in this class, it can be passed to code that uses inspection of the filename to make decisions. For instance. os.path.split() will work correctly on a NamedStream.

The class can be used as a context manager.

NamedStream is derived from io.IOBase (to indicate that it is a stream). Many operations that normally expect a string will also work with a NamedStream; for instance, most of the functions in os.path will work with the exception of os.path.expandvars() and os.path.expanduser(), which will return the NamedStream itself instead of a string if no substitutions were made.

Example

Wrap a cStringIO.StringIO() instance to write to:

import cStringIO
import os.path
stream = cStringIO.StringIO()
f = NamedStream(stream, "output.pdb")
print(os.path.splitext(f))

Wrap a file instance to read from:

stream = open("input.pdb")
f = NamedStream(stream, stream.name)

Use as a context manager (closes stream automatically when the with block is left):

with NamedStream(open("input.pdb"), "input.pdb") as f:
   # use f
   print f.closed  # --> False
   # ...
print f.closed     # --> True

Note

This class uses its own __getitem__() method so if stream implements stream.__getitem__() then that will be masked and this class should not be used.

Warning

By default, NamedStream.close() will not close the stream but instead reset() it to the beginning. [1] Provide the force=True keyword to NamedStream.close() to always close the stream.

Initialize the NamedStream from a stream and give it a name.

The constructor attempts to rewind the stream to the beginning unless the keyword reset is set to False. If rewinding fails, a MDAnalysis.StreamWarning is issued.

Parameters:
  • stream (stream) – an open stream (e.g. file or cStringIO.StringIO())
  • filename (str) – the filename that should be associated with the stream
  • reset (bool (optional)) – start the stream from the beginning (either reset() or seek()) when the class instance is constructed
  • close (bool (optional)) – close the stream when a with block exits or when close() is called; note that the default is not to close the stream

Notes

By default, this stream will not be closed by with and close() (see there) unless the close keyword is set to True.

New in version 0.9.0.

close(force=False)[source]

Reset or close the stream.

If NamedStream.close_stream is set to False (the default) then this method will not close the stream and only reset() it.

If the force = True keyword is provided, the stream will be closed.

Note

This close() method is non-standard. del NamedStream always closes the underlying stream.

closed

True if stream is closed.

fileno()[source]

Return the underlying file descriptor (an integer) of the stream if it exists.

An IOError is raised if the IO object does not use a file descriptor.

flush()[source]

Flush the write buffers of the stream if applicable.

This does nothing for read-only and non-blocking streams. For file objects one also needs to call os.fsync() to write contents to disk.

readable()[source]

Return True if the stream can be read from.

If False, read() will raise IOError.

reset()[source]

Move to the beginning of the stream

seek(offset, whence=0)[source]

Change the stream position to the given byte offset .

Parameters:
  • offset (int) – offset is interpreted relative to the position indicated by whence.
  • whence ({0, 1, 2} (optional)) –

    Values for whence are:

    • io.SEEK_SET or 0 – start of the stream (the default); offset should be zero or positive
    • io.SEEK_CUR or 1 – current stream position; offset may be negative
    • io.SEEK_END or 2 – end of the stream; offset is usually negative
Returns:

the new absolute position in bytes.

Return type:

int

seekable()[source]

Return True if the stream supports random access.

If False, seek(), tell() and truncate() will raise IOError.

tell()[source]

Return the current stream position.

truncate(*size)[source]

Truncate the stream’s size to size.

Parameters:size (int (optional)) – The size defaults to the current position (if no size argument is supplied). The current file position is not changed.
writable()[source]

Return True if the stream can be written to.

If False, write() will raise IOError.

MDAnalysis.lib.util.isstream(obj)[source]

Detect if obj is a stream.

We consider anything a stream that has the methods

  • close()

and either set of the following

  • read(), readline(), readlines()
  • write(), writeline(), writelines()
Parameters:obj (stream or str) –
Returns:True if obj is a stream, False otherwise
Return type:bool

See also

io

New in version 0.9.0.

11.2.8.3. Containers and lists

MDAnalysis.lib.util.iterable(obj)[source]

Returns True if obj can be iterated over and is not a string nor a NamedStream

MDAnalysis.lib.util.asiterable(obj)[source]

Returns obj so that it can be iterated over.

A string is not detected as and iterable and is wrapped into a list with a single element.

See also

iterable()

MDAnalysis.lib.util.hasmethod(obj, m)[source]

Return True if object obj contains the method m.

class MDAnalysis.lib.util.Namespace[source]

Class to allow storing attributes in new namespace.

11.2.8.4. File parsing

class MDAnalysis.lib.util.FORTRANReader(fmt)[source]

FORTRANReader provides a method to parse FORTRAN formatted lines in a file.

The contents of lines in a file can be parsed according to FORTRAN format edit descriptors (see Fortran Formats for the syntax).

Only simple one-character specifiers supported here: I F E A X (see FORTRAN_format_regex).

Strings are stripped of leading and trailing white space.

Set up the reader with the FORTRAN format string.

The string fmt should look like ‘2I10,2X,A8,2X,A8,3F20.10,2X,A8,2X,A8,F20.10’.

Parameters:fmt (str) – FORTRAN format edit descriptor for a line as described in Fortran Formats

Example

Parsing of a standard CRD file:

atomformat = FORTRANReader('2I10,2X,A8,2X,A8,3F20.10,2X,A8,2X,A8,F20.10')
for line in open('coordinates.crd'):
    serial,TotRes,resName,name,x,y,z,chainID,resSeq,tempFactor = atomformat.read(line)
number_of_matches(line)[source]

Return how many format entries could be populated with legal values.

parse_FORTRAN_format(edit_descriptor)[source]

Parse the descriptor.

Parameters:edit_descriptor (str) – FORTRAN format edit descriptor
Returns:dict with totallength (in chars), repeat, length, format, decimals
Return type:dict
Raises:ValueError – The edit_descriptor is not recognized and cannot be parsed

Note

Specifiers: L ES EN T TL TR / r S SP SS BN BZ are not supported, and neither are the scientific notation Ew.dEe forms.

read(line)[source]

Parse line according to the format string and return list of values.

Values are converted to Python types according to the format specifier.

Parameters:line (str) –
Returns:list of entries with appropriate types
Return type:list
Raises:ValueError – Any of the conversions cannot be made (e.g. space for an int)
MDAnalysis.lib.util.FORTRAN_format_regex = ‘(?P<repeat>\d+?)(?P<format>[IFEAX])(?P<numfmt>(?P<length>\d+)(\.(?P<decimals>\d+))?)?’

Regular expresssion (see re) to parse a simple FORTRAN edit descriptor. (?P<repeat>\d?)(?P<format>[IFELAX])(?P<numfmt>(?P<length>\d+)(\.(?P<decimals>\d+))?)?

11.2.8.5. Data manipulation and handling

MDAnalysis.lib.util.fixedwidth_bins(delta, xmin, xmax)[source]

Return bins of width delta that cover xmin, xmax (or a larger range).

The bin parameters are computed such that the bin size delta is guaranteed. In order to achieve this, the range [xmin, xmax] can be increased.

Bins can be calculated for 1D data (then all parameters are simple floats) or nD data (then parameters are supplied as arrays, with each entry correpsonding to one dimension).

Parameters:
  • delta (float or array_like) – desired spacing of the bins
  • xmin (float or array_like) – lower bound (left boundary of first bin)
  • xmax (float or array_like) – upper bound (right boundary of last bin)
Returns:

The dict contains ‘Nbins’, ‘delta’, ‘min’, and ‘max’; these are either floats or arrays, depending on the input.

Return type:

dict

Example

Use with numpy.histogram():

B = fixedwidth_bins(delta, xmin, xmax)
h, e = np.histogram(data, bins=B['Nbins'], range=(B['min'], B['max']))

11.2.8.6. Strings

MDAnalysis.lib.util.convert_aa_code(x)[source]

Converts between 3-letter and 1-letter amino acid codes.

Parameters:x (str) – 1-letter or 3-letter amino acid code
Returns:3-letter or 1-letter amino acid code
Return type:str
Raises:ValueError – No conversion can be made; the amino acid code is not defined.

Note

Data are defined in amino_acid_codes and inverse_aa_codes.

MDAnalysis.lib.util.parse_residue(residue)[source]

Process residue string.

Parameters:residue (str) – The residue must contain a 1-letter or 3-letter or 4-letter residue string, a number (the resid) and optionally an atom identifier, which must be separate from the residue with a colon (“:”). White space is allowed in between.
Returns:(3-letter aa string, resid, atomname); known 1-letter aa codes are converted to 3-letter codes
Return type:tuple

Examples

  • “LYS300:HZ1” –> (“LYS”, 300, “HZ1”)
  • “K300:HZ1” –> (“LYS”, 300, “HZ1”)
  • “K300” –> (“LYS”, 300, None)
  • “4GB300:H6O” –> (“4GB”, 300, “H6O”)
  • “4GB300” –> (“4GB”, 300, None)
MDAnalysis.lib.util.conv_float(s)[source]

Convert an object s to float if possible.

Function to be passed into map() or a list comprehension. If the argument can be interpreted as a float it is converted, otherwise the original object is passed back.

11.2.8.7. Class decorators

MDAnalysis.lib.util.cached(key)[source]

Cache a property within a class.

Requires the Class to have a cache dict called _cache.

Example

How to add a cache for a variable to a class by using the @cached decorator:

class A(object):
    def__init__(self):
        self._cache = dict()

    @property
    @cached('keyname')
    def size(self):
        # This code gets ran only if the lookup of keyname fails
        # After this code has been ran once, the result is stored in
        # _cache with the key: 'keyname'
        size = 10.0

New in version 0.9.0.

Footnotes

[1]The reason why NamedStream.close() does not close a stream by default (but just rewinds it to the beginning) is so that one can use the class NamedStream as a drop-in replacement for file names, which are often re-opened (e.g. when the same file is used as a topology and coordinate file or when repeatedly iterating through a trajectory in some implementations). The close=True keyword can be supplied in order to make NamedStream.close() actually close the underlying stream and NamedStream.close(force=True) will also close it.