Source code for gridData.OpenDX

# gridData --- python modules to read and write gridded data
# Copyright (c) 2009-2014 Oliver Beckstein <[email protected]>
# Released under the GNU Lesser General Public License, version 3 or later.

r"""
:mod:`~gridData.OpenDX` --- routines to read and write simple OpenDX files
==========================================================================

The OpenDX format for multi-dimensional grid data. OpenDX is a free
visualization software, see http://www.opendx.org.

.. Note:: This module only implements a primitive subset, sufficient
          to represent n-dimensional regular grids.

The OpenDX scalar file format is specified in Appendix `B.2 Data
Explorer Native Files`_ [#OpenDXformat]_.

If you want to build a dx object from your data you can either use the
convenient :class:`~gridData.core.Grid` class from the top level
module (:class:`gridData.Grid`) or see the lower-level methods
described below.


.. _opendx-read-write:

Reading and writing OpenDX files
--------------------------------

If you have OpenDX files from other software and you just want to
**read** it into a Python array then you do not really need to use the
interface in :mod:`gridData.OpenDX`: just use
:class:`~gridData.core.Grid` and load the file::

  from gridData import Grid
  g = Grid("data.dx")

This should work for files produced by common visualization programs
(VMD_, PyMOL_, Chimera_). The documentation for :mod:`gridData` tells
you more about what to do with the :class:`~gridData.core.Grid`
object.

If you want to **write** an OpenDX file then you just use the
:meth:`gridData.core.Grid.export` method with `file_format="dx"` (or
just use a filename with extension ".dx")::

  g.export("data.dx")

However, some visualization programs do not implement full OpenDX
specifications and only read very specific, "OpenDX-like"
files. :mod:`gridData.OpenDX` tries to be compatible with these
formats. However, sometimes additional help is needed to write an
OpenDX file that can be read by a specific software, as described
below:

Known issues for writing OpenDX files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* APBS require the delta to be written to the seventh significant figure.
  The delta is now written to reflect this increase in precision.

  .. versionchanged:: 0.6.0

* PyMOL_ requires OpenDX files with the type specification "double" in
  the `class array` section (see issue `#35`_). By default (since
  release 0.4.0), the type is set to the one that most closely
  approximates the dtype of the numpy array :attr:`Grid.grid`, which
  holds all data. This is often :class:`numpy.float64`, which will
  create an OpenDX type "double", which PyMOL will read.

  However, if you want to *force* a specific OpenDX type (such as
  "float" or "double", see :attr:`gridData.OpenDX.array.dx_types` for
  available values) then you can use the ``type`` keyword argument::

    g.export("for_pymol.dx", type="double")

  If you always want to be able to read OpenDX files with PyMOL, it is
  suggested to always export with ``type="double"``.

  .. versionadded:: 0.4.0



.. _VMD: http://www.ks.uiuc.edu/Research/vmd/
.. _PyMOL: http://www.pymol.org/
.. _Chimera: https://www.cgl.ucsf.edu/chimera/
.. _`#35`: https://github.com/MDAnalysis/GridDataFormats/issues/35




Building a dx object from a numpy array ``A``
---------------------------------------------

If you have a numpy array ``A`` that represents a density in cartesian
space then you can construct a dx object (named a *field* in OpenDX
parlance) if you provide some additional information that fixes the
coordinate system in space and defines the units along the axes.

The following data are required:

grid
    numpy nD array (typically a nD histogram)
grid.shape
    the shape of the array
origin
    the cartesian coordinates of the center of the (0,0,..,0) grid cell
delta
    :math:`n \times n` array with the length of a grid cell along
    each axis; for regular rectangular grids the off-diagonal
    elements are 0 and the diagonal ones correspond to the
    'bin width' of the histogram, eg ``delta[0,0] = 1.0`` (Angstrom)

The DX data type ("type" in the DX file) is determined from the
:class:`numpy.dtype` of the :class:`numpy.ndarray` that is provided as
the *grid* (or with the *type* keyword argument to
:class:`gridData.OpenDX.array`).

For example, to build a :class:`field`::

  dx = OpenDX.field('density')
  dx.add('positions', OpenDX.gridpositions(1, grid.shape, origin, delta))
  dx.add('connections', OpenDX.gridconnections(2, grid.shape))
  dx.add('data', OpenDX.array(3, grid))

or all with the constructor::

  dx = OpenDX.field('density', components=dict(
            positions=OpenDX.gridpositions(1,grid.shape, d.origin, d.delta),
            connections=OpenDX.gridconnections(2, grid.shape),
            data=OpenDX.array(3, grid)))


Building a dx object from a dx file
-----------------------------------

One can also read data from an existing dx file::

 dx = OpenDX.field(0)
 dx.read('file.dx')

Only simple arrays are read and initially stored as a 1-d
:class:`numpy.ndarray` in the `dx.components['data'].array` with the
:class:`numpy.dtype` determined by the DX type in the file.

The dx :class:`field` object has a method
:meth:`~OpenDX.field.histogramdd` that produces output identical to
the :func:`numpy.histogramdd` function by taking the stored dimension
and deltas into account. In this way, one can store nD histograms in a
portable and universal manner::

  histogram, edges = dx.histogramdd()

.. rubric:; Footnotes

.. [#OpenDXformat] The original link to the OpenDX file format specs
   http://opendx.sdsc.edu/docs/html/pages/usrgu068.htm#HDREDF is dead so I am linking
   to an archived copy at the Internet Archive , `B.2 Data Explorer Native Files`_.

.. _`B.2 Data Explorer Native Files`:
   https://web.archive.org/web/20080808140524/http://opendx.sdsc.edu/docs/html/pages/usrgu068.htm
.. http://opendx.sdsc.edu/docs/html/pages/usrgu068.htm#HDREDF

Classes and functions
---------------------

"""
import numpy
import re
import gzip

import warnings

# Python 2/3 compatibility (see issue #99)
# and https://bugs.python.org/issue30012
import sys
if sys.version_info >= (3, ):
    def _gzip_open(filename, mode="rt"):
        return gzip.open(filename, mode)
else:
    def _gzip_open(filename, mode="rt"):
        return gzip.open(filename)
del sys


[docs]
class DXclass(object):
    """'class' object as defined by OpenDX"""
    def __init__(self,classid):
        """id is the object number"""
        self.id = classid  # serial number of the object
        self.name = None   # name of the DXclass
        self.component = None   # component type
        self.D = None      # dimensions


[docs]
    def write(self, stream, optstring="", quote=False):
        """write the 'object' line; additional args are packed in string"""
        classid = str(self.id)
        if quote: classid = '"'+classid+'"'
        # Only use a *single* space between tokens; both chimera's and pymol's DX parser
        # does not properly implement the OpenDX specs and produces garbage with multiple
        # spaces. (Chimera 1.4.1, PyMOL 1.3)
        to_write = 'object '+classid+' class '+str(self.name)+' '+optstring+'\n'
        self._write_line(stream, to_write)


    @staticmethod
    def _write_line(stream, line="", quote=False):
        """write a line to the file"""
        if isinstance(stream, gzip.GzipFile):
            line = line.encode()
        stream.write(line)

    def read(self, stream):
        raise NotImplementedError('Reading is currently not supported.')


[docs]
    def ndformat(self,s):
        """Returns a string with as many repetitions of s as self
        has dimensions (derived from shape)"""
        return s * len(self.shape)


    def __repr__(self):
        return '<OpenDX.'+str(self.name)+' object, id='+str(self.id)+'>'




[docs]
class gridpositions(DXclass):
    """OpenDX gridpositions class.

    shape     D-tuplet describing size in each dimension
    origin    coordinates of the centre of the grid cell with index 0,0,...,0
    delta     DxD array describing the deltas
    """
    def __init__(self,classid,shape=None,origin=None,delta=None,**kwargs):
        if shape is None or origin is None or delta is None:
            raise ValueError('all keyword arguments are required')
        self.id = classid
        self.name = 'gridpositions'
        self.component = 'positions'
        self.shape = numpy.asarray(shape)      # D dimensional shape
        self.origin = numpy.asarray(origin)    # D vector
        self.rank = len(self.shape)            # D === rank

        self.delta = numpy.asarray(delta)      # DxD array of grid spacings
        # gridDataFormats  actually provides a simple 1D array with the deltas because only
        # regular grids are used but the following is a reminder that OpenDX should be able
        # to handle more complicated volume elements
        if len(self.delta.shape) == 1:
            self.delta = numpy.diag(delta)
        if self.delta.shape != (self.rank, self.rank):
            # check OpenDX specs for irreg spacing if we want to implement
            # anything more complicated
            raise NotImplementedError('Only regularly spaced grids allowed, '
                                      'not delta={}'.format(self.delta))

[docs]
    def write(self, stream):
        super(gridpositions, self).write(
            stream, ('counts '+self.ndformat(' %d')) % tuple(self.shape))
        self._write_line(stream, 'origin %f %f %f\n' % tuple(self.origin))
        for delta in self.delta:
            self._write_line(
                stream, ('delta ' +
                         self.ndformat(' {:.7g}').format(*delta) +
                         '\n'))



[docs]
    def edges(self):
        """Edges of the grid cells, origin at centre of 0,0,..,0 grid cell.

        Only works for regular, orthonormal grids.
        """
        return [self.delta[d,d] * numpy.arange(self.shape[d]+1) + self.origin[d]\
                - 0.5*self.delta[d,d]     for d in range(self.rank)]





[docs]
class gridconnections(DXclass):
    """OpenDX gridconnections class"""
    def __init__(self,classid,shape=None,**kwargs):
        if shape is None:
            raise ValueError('all keyword arguments are required')
        self.id = classid
        self.name = 'gridconnections'
        self.component = 'connections'
        self.shape = numpy.asarray(shape)      # D dimensional shape


[docs]
    def write(self, stream):
        super(gridconnections, self).write(
            stream, ('counts '+self.ndformat(' %d')) % tuple(self.shape))





[docs]
class array(DXclass):
    """OpenDX array class.

    See `Array Objects`_ for details.

    .. _Array Objects:
       https://web.archive.org/web/20080808140524/http://opendx.sdsc.edu/docs/html/pages/usrgu068.htm#Header_440
    """
    #: conversion from :attr:`numpy.dtype.name` to closest OpenDX array type
    #: (round-tripping is not guaranteed to produce identical types); not all
    #: types are supported (e.g., strings are missing)
    np_types = {
        "uint8": "byte",         # DX "unsigned byte" equivalent
        "int8": "signed byte",
        "uint16": "unsigned short",
        "int16": "short",         # DX "signed short" equivalent
        "uint32": "unsigned int",
        "int32": "int",           # DX "signed int"   equivalent
        "uint64": "unsigned int", # not explicit in DX, for compatibility
        "int64": "int",           # not explicit in DX, for compatibility
        # "hyper",                # ?
        "float32": "float",       # default
        "float64": "double",
        "float16": "float",       # float16 not available in DX, use float
        # numpy "float128 not available, raise error
        # "string" not automatically supported
    }
    #: conversion from OpenDX type to closest :class:`numpy.dtype`
    #: (round-tripping is not guaranteed to produce identical types); not all
    #: types are supported (e.g., strings and conversion to int64 are missing)
    dx_types = {
        "byte": "uint8",
        "unsigned byte": "uint8",
        "signed byte": "int8",
        "unsigned short": "uint16",
        "short": "int16",
        "signed short": "int16",
        "unsigned int": "uint32",
        "int": "int32",
        "signed int": "int32",
        # "hyper",                # ?
        "float": "float32",       # default
        "double": "float64",
        # "string" not automatically supported
    }

    def __init__(self, classid, array=None, type=None, typequote='"',
                 **kwargs):
        """
        Parameters
        ----------
        classid : int
        array : array_like
        type : str (optional)
             Set the DX type in the output file and cast `array` to
             the closest numpy dtype.  `type` must be one of the
             allowed types in DX files as defined under `Array
             Objects`_.  The default ``None`` tries to set the type
             from the :class:`numpy.dtype` of `array`.

             .. versionadded:: 0.4.0

        Raises
        ------
        ValueError
             if `array` is not provided; or if `type` is not of the correct
             DX type
        """
        if array is None:
            raise ValueError('array keyword argument is required')
        self.id = classid
        self.name = 'array'
        self.component = 'data'
        # detect type https://github.com/MDAnalysis/GridDataFormats/issues/35
        if type is None:
            self.array = numpy.asarray(array)
            try:
                self.type = self.np_types[self.array.dtype.name]
            except KeyError:
                warnings.warn(("array dtype.name = {0} can not be automatically "
                               "converted to a DX array type. Use the 'type' keyword "
                               "to manually specify the correct type.").format(
                                   self.array.dtype.name))
                self.type = self.array.dtype.name  # will raise ValueError on writing
        else:
            try:
                self.array = numpy.asarray(array, dtype=self.dx_types[type])
            except KeyError:
                raise ValueError(("DX type {0} cannot be converted to an "
                                  "appropriate numpy dtype. Available "
                                  "types are: {1}".format(type,
                                                          list(self.dx_types.values()))))
            self.type = type
        self.typequote = typequote


[docs]
    def write(self, stream):
        """Write the *class array* section.

        Parameters
        ----------
        stream : stream

        Raises
        ------
        ValueError
             If the `dxtype` is not a valid type, :exc:`ValueError` is
             raised.

        """
        if self.type not in self.dx_types:
            raise ValueError(("DX type {} is not supported in the DX format. \n"
                              "Supported valus are: {}\n"
                              "Use the type=<type> keyword argument.").format(
                                  self.type, list(self.dx_types.keys())))
        typelabel = (self.typequote+self.type+self.typequote)
        super(array, self).write(stream, 'type {0} rank 0 items {1} data follows'.format(
            typelabel, self.array.size))

        # grid data, serialized as a C array (z fastest varying)
        # (flat iterator is equivalent to: for x: for y: for z: grid[x,y,z])
        # VMD's DX reader requires exactly 3 values per line
        fmt_string = "{:d}"
        if (self.array.dtype.kind == 'f' or self.array.dtype.kind == 'c'):
            precision = numpy.finfo(self.array.dtype).precision
            fmt_string = "{:."+"{:d}".format(precision)+"f}"
        values_per_line = 3
        values = self.array.flat
        while 1:
            try:
                for i in range(values_per_line):
                    self._write_line(stream, fmt_string.format(next(values)) + "\t")
                self._write_line(stream, '\n')
            except StopIteration:
                self._write_line(stream, '\n')
                break
        self._write_line(stream, 'attribute "dep" string "positions"\n')




[docs]
class field(DXclass):
    """OpenDX container class

    The *field* is the top-level object and represents the whole
    OpenDX file. It contains a number of other objects.

    Instantiate a DX object from this class and add subclasses with
    :meth:`add`.

    """
    # perhaps this should not derive from DXclass as those are
    # objects in field but a field cannot contain itself
    def __init__(self,classid='0',components=None,comments=None):
        """OpenDX object, which is build from a list of components.

        Parameters
        ----------

        id : str
               arbitrary string
        components : dict
               dictionary of DXclass instances (no sanity check on the
               individual ids!) which correspond to

               * positions
               * connections
               * data

        comments : list
               list of strings; each string becomes a comment line
               prefixed with '#'. Avoid newlines.


        A field must have at least the components 'positions',
        'connections', and 'data'. Those components are associated
        with objects belonging to the field. When writing a dx file
        from the field, only the required objects are dumped to the file.

        (For a more general class that can use field:
        Because there could be more objects than components, we keep a
        separate object list. When dumping the dx file, first all
        objects are written and then the field object describes its
        components. Objects are referenced by their unique id.)

        .. Note:: uniqueness of the *id* is not checked.


        Example
        -------
        Create a new dx object::

           dx = OpenDX.field('density',[gridpoints,gridconnections,array])

        """
        if components is None:
            components = dict(positions=None,connections=None,data=None)
        if comments is None:
            comments = ['OpenDX written by gridData.OpenDX',
                        'from https://github.com/MDAnalysis/GridDataFormats']
        elif type(comments) is not list:
            comments = [str(comments)]
        self.id = classid       # can be an arbitrary string
        self.name = 'field'
        self.component = None   # cannot be a component of a field
        self.components = components
        self.comments= comments

    def _openfile_writing(self, filename):
        """Returns a regular or gz file stream for writing"""
        if filename.endswith('.gz'):
            return gzip.open(filename, 'wb')
        else:
            return open(filename, 'w')


[docs]
    def write(self, filename):
        """Write the complete dx object to the file.

        This is the simple OpenDX format which includes the data into
        the header via the 'object array ... data follows' statement.

        Only simple regular arrays are supported.

        The format should be compatible with VMD's dx reader plugin.
        """
        # comments (VMD chokes on lines of len > 80, so truncate)
        maxcol = 80
        with self._openfile_writing(str(filename)) as outfile:
            for line in self.comments:
                comment = '# '+str(line)
                self._write_line(outfile, comment[:maxcol]+'\n')
            # each individual object
            for component, object in self.sorted_components():
                object.write(outfile)
            # the field object itself
            super(field, self).write(outfile, quote=True)
            for component, object in self.sorted_components():
                self._write_line(outfile, 'component "%s" value %s\n' % (
                    component, str(object.id)))



[docs]
    def read(self, stream):
        """Read DX field from file.

            dx = OpenDX.field.read(dxfile)

        The classid is discarded and replaced with the one from the file.
        """
        DXfield = self
        p = DXParser(stream)
        p.parse(DXfield)



[docs]
    def add(self,component,DXobj):
        """add a component to the field"""
        self[component] = DXobj



[docs]
    def add_comment(self,comment):
        """add comments"""
        self.comments.append(comment)



[docs]
    def sorted_components(self):
        """iterator that returns (component,object) in id order"""
        for component, object in \
                sorted(self.components.items(),
                       key=lambda comp_obj: comp_obj[1].id):
            yield component, object



[docs]
    def histogramdd(self):
        """Return array data as (edges,grid), i.e. a numpy nD histogram."""
        shape = self.components['positions'].shape
        edges = self.components['positions'].edges()
        hist = self.components['data'].array.reshape(shape)
        return (hist,edges)


    def __getitem__(self,key):
        return self.components[key]

    def __setitem__(self,key,value):
        self.components[key] = value

    def __repr__(self):
        return '<OpenDX.field object, id='+str(self.id)+', with '+\
               str(len(self.components))+' components and '+\
               str(len(self.components))+' objects>'



#------------------------------------------------------------
# DX file parsing
#------------------------------------------------------------


[docs]
class DXParseError(Exception):
    """general exception for parsing errors in DX files"""
    pass


[docs]
class DXParserNoTokens(DXParseError):
    """raised when the token buffer is exhausted"""
    pass


class Token:
    # token categories (values of dx_regex must match up with these categories)
    category = {'COMMENT': ['COMMENT'],
                'WORD': ['WORD'],
                'STRING': ['QUOTEDSTRING','BARESTRING','STRING'],
                'WHITESPACE': ['WHITESPACE'],
                'INTEGER': ['INTEGER'],
                'REAL': ['REAL'],
                'NUMBER': ['INTEGER','REAL']}
    # cast functions
    cast = {'COMMENT': lambda s:re.sub(r'#\s*','',s),
            'WORD': str,
            'STRING': str, 'QUOTEDSTRING': str, 'BARESTRING': str,
            'WHITESPACE': None,
            'NUMBER': float, 'INTEGER': int, 'REAL': float}

    def __init__(self,code,text):
        self.code = code    # store raw code
        self.text = text
    def equals(self,v):
        return self.text == v
    def iscode(self,code):
        return self.code in self.category[code]  # use many -> 1 mappings
    def value(self,ascode=None):
        """Return text cast to the correct type or the selected type"""
        if ascode is None:
            ascode = self.code
        return self.cast[ascode](self.text)
    def __repr__(self):
        return '<token '+str(self.code)+','+str(self.value())+'>'


[docs]
class DXInitObject(object):
    """Storage class that holds data to initialize one of the 'real'
    classes such as OpenDX.array, OpenDX.gridconnections, ...

    All variables are stored in args which will be turned into the
    arguments for the DX class.
    """
    DXclasses = {'gridpositions':gridpositions,
                 'gridconnections':gridconnections,
                 'array':array, 'field':field,
                 }

    def __init__(self,classtype,classid):
        self.type = classtype
        self.id = classid
        self.args = dict()

[docs]
    def initialize(self):
        """Initialize the corresponding DXclass from the data.

        class = DXInitObject.initialize()
        """
        return self.DXclasses[self.type](self.id,**self.args)

    def __getitem__(self,k):
        return self.args[k]
    def __setitem__(self,k,v):
        self.args[k] = v
    def __repr__(self):
        return '<DXInitObject instance type='+str(self.type)+', id='+str(self.id)+'>'



[docs]
class DXParser(object):
    """Brain-dead baroque implementation to read a simple (VMD) dx file.

    Requires a OpenDX.field instance.

    1) scan for 'object' lines:
       'object' id 'class' class  [data]
       [data ...]
    2) parse data according to class
    3) construct dx field from classes
    """

    # the regexes must match with the categories defined in the Token class
    # REAL regular expression will catch both integers and floats.
    # Taken from
    # https://docs.python.org/3/library/re.html#simulating-scanf
    dx_regex = re.compile(r"""
    (?P<COMMENT>\#.*$)            # comment (until end of line)
    |(?P<WORD>(object|class|counts|origin|delta|type|counts|rank|items|data))
    |"(?P<QUOTEDSTRING>[^\"]*)"   # string in double quotes  (quotes removed)
    |(?P<WHITESPACE>\s+)          # white space
    |(?P<REAL>[-+]?               # true real number (decimal point or
    (\d+(\.\d*)?|\.\d+)           # scientific notation) and integers
    ([eE][-+]?\d+)?)
    |(?P<BARESTRING>[a-zA-Z_][^\s\#\"]+) # unquoted strings, starting with non-numeric
    """, re.VERBOSE)


    def __init__(self, filename):
        """Setup a parser for a simple DX file (from VMD)

        >>> DXfield_object = OpenDX.field(id)
        >>> p = DXparser('bulk.dx')
        >>> p.parse(DXfield_object)

        The field object will be completely rewritten (including the
        id if one is found in the input file. The input files
        component layout is currently ignored.

        Note that quotes are removed from quoted strings.
        """
        self.filename = str(filename)
        self.field = field('grid data',comments=['filename: {0}'.format(self.filename)])
        # other variables are initialised every time parse() is called

        self.parsers = {'general':self.__general,
                        'comment':self.__comment, 'object':self.__object,
                        'gridpositions':self.__gridpositions,
                        'gridconnections':self.__gridconnections,
                        'array':self.__array, 'field':self.__field,
                        }



[docs]
    def parse(self, DXfield):
        """Parse the dx file and construct a DX field object with component classes.

        A :class:`field` instance *DXfield* must be provided to be
        filled by the parser::

           DXfield_object = OpenDX.field(*args)
           parse(DXfield_object)

        A tokenizer turns the dx file into a stream of tokens. A
        hierarchy of parsers examines the stream. The level-0 parser
        ('general') distinguishes comments and objects (level-1). The
        object parser calls level-3 parsers depending on the object
        found. The basic idea is that of a 'state machine'. There is
        one parser active at any time. The main loop is the general
        parser.

        * Constructing the dx objects with classtype and classid is
          not implemented yet.
        * Unknown tokens raise an exception.
        """

        self.DXfield = DXfield              # OpenDX.field (used by comment parser)
        self.currentobject = None           # containers for data
        self.objects = []                   # |
        self.tokens = []                    # token buffer

        if self.filename.endswith('.gz'):
            with _gzip_open(self.filename, 'rt') as self.dxfile:
                self.use_parser('general')
        else:
            with open(self.filename, 'r') as self.dxfile:
                self.use_parser('general')      # parse the whole file and populate self.objects

        # assemble field from objects
        for o in self.objects:
            if o.type == 'field':
                # Almost ignore the field object; VMD, for instance,
                # does not write components. To make this work
                # seamlessly I have to think harder how to organize
                # and use the data, eg preping the field object
                # properly and the initializing. Probably should also
                # check uniqueness of ids etc.
                DXfield.id = o.id
                continue
            c = o.initialize()
            self.DXfield.add(c.component,c)

        # free space
        del self.currentobject, self.objects




    def __general(self):
        """Level-0 parser and main loop.

        Look for a token that matches a level-1 parser and hand over control."""
        while 1:                            # main loop
            try:
                tok = self.__peek()         # only peek, apply_parser() will consume
            except DXParserNoTokens:
                # save previous DXInitObject
                # (kludge in here as the last level-2 parser usually does not return
                # via the object parser)
                if self.currentobject and self.currentobject not in self.objects:
                    self.objects.append(self.currentobject)
                return                      # stop parsing and finish
            # decision branches for all level-1 parsers:
            # (the only way to get out of the lower level parsers!)
            if tok.iscode('COMMENT'):
                self.set_parser('comment')  # switch the state
            elif tok.iscode('WORD') and tok.equals('object'):
                self.set_parser('object')   # switch the state
            elif self.__parser is self.__general:
                # Either a level-2 parser screwed up or some level-1
                # construct is not implemented.  (Note: this elif can
                # be only reached at the beginning or after comments;
                # later we never formally switch back to __general
                # (would create inifinite loop)
                raise DXParseError('Unknown level-1 construct at '+str(tok))

            self.apply_parser()     # hand over to new parser
                                    # (possibly been set further down the hierarchy!)

    # Level-1 parser
    def __comment(self):
        """Level-1 parser for comments.

        pattern: #.*
        Append comment (with initial '# ' stripped) to all comments.
        """
        tok = self.__consume()
        self.DXfield.add_comment(tok.value())
        self.set_parser('general')   # switch back to general parser

    def __object(self):
        """Level-1 parser for objects.

        pattern: 'object' id 'class' type ...

        id ::=   integer|string|'"'white space string'"'
        type ::= string
        """
        self.__consume()                    # 'object'
        classid = self.__consume().text
        word = self.__consume().text
        if word != "class":
            raise DXParseError("reserved word %s should have been 'class'." % word)
        # save previous DXInitObject
        if self.currentobject:
            self.objects.append(self.currentobject)
        # setup new DXInitObject
        classtype = self.__consume().text
        self.currentobject = DXInitObject(classtype=classtype,classid=classid)

        self.use_parser(classtype)

    # Level-2 parser (object parsers)
    def __gridpositions(self):
        """Level-2 parser for gridpositions.

        pattern:
        object 1 class gridpositions counts 97 93 99
        origin -46.5 -45.5 -48.5
        delta 1 0 0
        delta 0 1 0
        delta 0 0 1
        """
        try:
            tok = self.__consume()
        except DXParserNoTokens:
            return

        if tok.equals('counts'):
            shape = []
            try:
                while True:
                    # raises exception if not an int
                    self.__peek().value('INTEGER')
                    tok = self.__consume()
                    shape.append(tok.value('INTEGER'))
            except (DXParserNoTokens, ValueError):
                pass
            if len(shape) == 0:
                raise DXParseError('gridpositions: no shape parameters')
            self.currentobject['shape'] = shape
        elif tok.equals('origin'):
            origin = []
            try:
                while (self.__peek().iscode('INTEGER') or
                       self.__peek().iscode('REAL')):
                    tok = self.__consume()
                    origin.append(tok.value())
            except DXParserNoTokens:
                pass
            if len(origin) == 0:
                raise DXParseError('gridpositions: no origin parameters')
            self.currentobject['origin'] = origin
        elif tok.equals('delta'):
            d = []
            try:
                while (self.__peek().iscode('INTEGER') or
                       self.__peek().iscode('REAL')):
                    tok = self.__consume()
                    d.append(tok.value())
            except DXParserNoTokens:
                pass
            if len(d) == 0:
                raise DXParseError('gridpositions: missing delta parameters')
            try:
                self.currentobject['delta'].append(d)
            except KeyError:
                self.currentobject['delta'] = [d]
        else:
            raise DXParseError('gridpositions: '+str(tok)+' not recognized.')


    def __gridconnections(self):
        """Level-2 parser for gridconnections.

        pattern:
        object 2 class gridconnections counts 97 93 99
        """
        try:
            tok = self.__consume()
        except DXParserNoTokens:
            return

        if tok.equals('counts'):
            shape = []
            try:
                while True:
                    # raises exception if not an int
                    self.__peek().value('INTEGER')
                    tok = self.__consume()
                    shape.append(tok.value('INTEGER'))
            except (DXParserNoTokens, ValueError):
                pass
            if len(shape) == 0:
                raise DXParseError('gridconnections: no shape parameters')
            self.currentobject['shape'] = shape
        else:
            raise DXParseError('gridconnections: '+str(tok)+' not recognized.')


    def __array(self):
        """Level-2 parser for arrays.

        pattern:
        object 3 class array type double rank 0 items 12 data follows
        0 2 0
        0 0 3.6
        0 -2.0 1e-12
        +4.534e+01 .34534 0.43654
        attribute "dep" string "positions"
        """
        try:
            tok = self.__consume()
        except DXParserNoTokens:
            return

        if tok.equals('type'):
            tok = self.__consume()
            if not tok.iscode('STRING'):
                raise DXParseError('array: type was "%s", not a string.'%\
                                   tok.text)
            self.currentobject['type'] = tok.value()
        elif tok.equals('rank'):
            tok = self.__consume()
            try:
                self.currentobject['rank'] = tok.value('INTEGER')
            except ValueError:
                raise DXParseError('array: rank was "%s", not an integer.'%\
                                   tok.text)
        elif tok.equals('items'):
            tok = self.__consume()
            try:
                self.currentobject['size'] = tok.value('INTEGER')
            except ValueError:
                raise DXParseError('array: items was "%s", not an integer.'%\
                                   tok.text)
        elif tok.equals('data'):
            tok = self.__consume()
            if not tok.iscode('STRING'):
                raise DXParseError('array: data was "%s", not a string.'%\
                                   tok.text)
            if tok.text != 'follows':
                raise NotImplementedError(\
                            'array: Only the "data follows header" format is supported.')
            if not self.currentobject['size']:
                raise DXParseError("array: missing number of items")
            # This is the slow part.  Once we get here, we are just
            # reading in a long list of numbers.  Conversion to floats
            # will be done later when the numpy array is created.

            # Don't assume anything about whitespace or the number of elements per row
            self.currentobject['array'] = []
            while len(self.currentobject['array']) <self.currentobject['size']:
                 self.currentobject['array'].extend(self.dxfile.readline().strip().split())

            # If you assume that there are three elements per row
            # (except the last) the following version works and is a little faster.
            # for i in range(int(numpy.ceil(self.currentobject['size']/3))):
            #     self.currentobject['array'].append(self.dxfile.readline())
            # self.currentobject['array'] = ' '.join(self.currentobject['array']).split()
        elif tok.equals('attribute'):
            # not used at the moment
            attribute = self.__consume().value()
            if not self.__consume().equals('string'):
                raise DXParseError('array: "string" expected.')
            value = self.__consume().value()
        else:
            raise DXParseError('array: '+str(tok)+' not recognized.')

    def __field(self):
        """Level-2 parser for a DX field object.

        pattern:
        object "site map 1" class field
        component "positions" value 1
        component "connections" value 2
        component "data" value 3
        """
        try:
            tok = self.__consume()
        except DXParserNoTokens:
            return

        if tok.equals('component'):
            component = self.__consume().value()
            if not self.__consume().equals('value'):
                raise DXParseError('field: "value" expected')
            classid = self.__consume().value()
            try:
                self.currentobject['components'][component] = classid
            except KeyError:
                self.currentobject['components'] = {component:classid}
        else:
            raise DXParseError('field: '+str(tok)+' not recognized.')

    # parser routines independent of the dx classes
    # (with ideas from MDAnalysis.Selection and
    # http://effbot.org/zone/xml-scanner.htm)


[docs]
    def use_parser(self,parsername):
        """Set parsername as the current parser and apply it."""
        self.__parser = self.parsers[parsername]
        self.__parser()


[docs]
    def set_parser(self,parsername):
        """Set parsername as the current parser."""
        self.__parser = self.parsers[parsername]


[docs]
    def apply_parser(self):
        """Apply the current parser to the token stream."""
        self.__parser()


    def __tokenize(self,string):
        """Split s into tokens and update the token buffer.

        __tokenize(string)

        New tokens are appended to the token buffer, discarding white
        space.  Based on http://effbot.org/zone/xml-scanner.htm
        """
        for m in self.dx_regex.finditer(string.strip()):
            code = m.lastgroup
            text = m.group(m.lastgroup)
            tok = Token(code,text)
            if not tok.iscode('WHITESPACE'):
                 self.tokens.append(tok)
                 # print "DEBUG tokenize: "+str(tok)

    def __refill_tokenbuffer(self):
        """Add a new tokenized line from the file to the token buffer.

        __refill_tokenbuffer()

        Only reads a new line if the buffer is empty. It is safe to
        call it repeatedly.

        At end of file, method returns empty strings and it is up to
        __peek and __consume to flag the end of the stream.
        """
        if len(self.tokens) == 0:
            self.__tokenize(self.dxfile.readline())

    def __peek(self):
        self.__refill_tokenbuffer()
        try:
            return self.tokens[0]
        except IndexError:
            raise DXParserNoTokens

    def __consume(self,):
        """Get the next token from the buffer and remove it/them.

        try:
          while 1:
             token = __consume()
        except DXParserNoTokens:
          pass
        """
        self.__refill_tokenbuffer()
        #print "DEBUG consume: "+str(self.__parser)+' '+str(self.__peek())
        try:
            return self.tokens.pop(0)  # singlet
        except IndexError:
            raise DXParserNoTokens