Utility functions — pmda.util

This module contains helper functions and classes that can be used throughout pmda.

pmda.util.fold_second_order_moments(*args)[source]

Fold action for second_order_moments() calculation.

Takes in data that can be combined associatively (order doesn’t matter) and applies a combining function in a recursive fashion. In this case, it takes in a list of lists that each contain the total number of time steps, an n x m array of mean positions for n atoms, and an n x m array of second order moments for n atoms, for a given partition of a trajectory. It takes the first partition, combines it with the second, combines that result with the third, and that result with the fourth, etc. using second_order_moments(). The final result is a list of the summed time steps, combined mean positions, and combined second order moments of all atoms in the combined trajectory.

New in version 0.3.0.

pmda.util.make_balanced_slices(n_frames, n_blocks, start=None, stop=None, step=None)[source]

Divide n_frames into n_blocks balanced blocks.

The blocks are generated in such a way that they contain equal numbers of frames when possible, but there are also no empty blocks (which can happen with a naive distribution of ceil(n_frames/n_blocks) per block and a remainder block).

If the trajectory is sliced in any way (u.trajectory[start:stop:step]) then the appropriate values for start, stop, and step must be passed to this function. Defaults can be set to None. Only a subset of legal values for slices is supported: 0 start stop and step 1.

Parameters
  • n_frames (int) – number of frames in the trajectory (≥0). This must be the number of frames after the trajectory has been sliced, i.e. len(u.trajectory[start:stop:step]). If any of start, stop, and `step are not the defaults (left empty or set to None) they must be provided as parameters.

  • n_blocks (int) – number of blocks (>0 and <n_frames)

  • start (int or None) – The first index of the trajectory (default is None, which is interpreted as “first frame”, i.e., 0).

  • stop (int or None) – The index of the last frame + 1 (default is None, which is interpreted as “up to and including the last frame”.

  • step (int or None) – Step size by which the trajectory is sliced; the default is None which corresponds to step=1.

Returns

slices – List of length n_blocks with one slice for each block.

If n_frames = 0 then an empty list [] is returned.

Return type

list of slice

Example

For a trajectory with 5 frames and 4 blocks we get block sizes [2, 1, 1, 1] (instead of [2, 2, 1, 0] with the naive algorithm).

The slices will be [slice(0, 2, None), slice(2, 3, None), slice(3, 4, None), slice(4, 5, None)].

The indices can be used to slice a trajectory into blocks:

n_blocks = 5
n_frames = len(u.trajectory[start:stop:step])

slices = make_balanced_slices(n_frames, n_blocks,
                              start=start, stop=stop, step=step)
for i_block, block in enumerate(slices):
   for ts in u.trajectory[block]:
       # do stuff for block number i_block

Notes

Explanation of the algorithm: For M frames in the trajectory and N blocks (or processes), where i with 0 ≤ iN - 1 is the block number and m[i] is the number of frames for block i we get a balanced distribution (one that does not contain blocks of size 0) with the algorithm

m[i] = M // N     # initial frames for block i
r = M % N         # remaining frames 0 ≤ r < N
for i in range(r):
    m[i] += 1     # distribute the remaining frames
                  # over the first r blocks

For a step > 1, we use m[i] *= step. The last slice will never go beyond the original stop if a value was provided.

New in version 0.2.0.

Changed in version 0.3.0: raise ValueError when n_blocks is larger than n_frames

pmda.util.second_order_moments(S1, S2)[source]

Calculates the combined second order moment.

Given the partial centered moments of two partitions (S1 and S2) of a data set S, calculates the second order moments of S = S1 ∪ S2.

Parameters
  • S1 (array) – Contains (T1, mu1, M1) where T1 is an integer (number of elements in the partition, e.g., the number of time frames), mu1 is an n x m array of the means for n atoms (and for example, m=3 for the center of geometry), M1 is also an n x m array of the sum of squares.

  • S2 (array) – Contains (T2, mu2, M2) where T2 is an integer (number of elements in the partition, e.g., the number of time frames), mu2 is an n x m array of the means for n atoms (and for example, m=3 for the center of geometry), M2 is also an n x m array of the sum of squares.

Returns

S – The returned tuple contains the total number of elements in the partition T, the mean mu and the “second moment” M (sum of squares) for the combined data.

Return type

(T, mu, M)

Notes

For a given set of data, the sum of squares, also known as the second order moment about the mean, is defined as the sum of the squares of the differences between each data point and the mean (sum of the individual deviations from the mean) of the data set,

\[M_{2, S_{i}} = \sum_{t=t_{0}}^{T}(x_{t} - \mu_{i})^2,\]

where \(\mu_{i}\) is the time average of \(x_{t}\). If the average of the squares of the individual deviations is taken (instead of the sum), this yields the variance:

\[\sigma_{i}^{2} = \frac{1}{T}\sum_{t=t_{0}}^{T}(x_{t} - \mu_{i})^2\]

In order to combine the mean and second order moments of two separate partitions, [CGL1979] derived the following formulae:

\[\mu = \frac{T_{1}\mu_{1} + T_{2}\mu_{2}}{T}\]

and

\[M_{2, S} = M_{2, S_{1}} + M_{2, S_{2}} + \ \frac{T_{1}T_{2}}{T}(\mu_{2} - \mu_{1})^{2},\]

where \(T\), \(T_{1}\), and \(T_{2}\) are the respective cardinalities of \(S\), \(S_{1}\), and \(S_{2}\), \(\mu\), \(\mu_{1}\), and \(\mu_{2}\) are the respective means of \(S\), \(S_{1}\), and \(S_{2}\), and \(M_{2, S}\), \(M_{2, S_{1}}\), and \(M_{2, S_{2}}\) are the respective second order moments of \(S\), \(S_{1}\), and \(S_{2}\). This is similar notation to [Pebay2008]. With a combined sum of squares and mean, it is possible to calculate the root-mean-square fluctuations, otherwise known as the population standard deviation:

\[\sigma_{i} = \sqrt{\frac{1}{T} \ \sum_{t=t_{0}}^{T}(x_{t} - \mu_{i})^2}\]

References

CGL1979

T. F. Chan, G. H. Golub, and R. J. LeVeque. “Updating formulae and a pairwise algorithm for computing sample variances.” Technical Report STAN-CS-79-773, Stanford University, Department of Computer Science, 1979.

Pebay2008

P. Pebay. “Formulas for robust one-pass parallel computation of co-variances and arbitrary-order statistical moments.” Technical Report SAND2008-6212, 2008.

New in version 0.3.0.

class pmda.util.timeit[source]

measure time spend in context

timeit is a context manager (to be used with the with statement) that records the execution time for the enclosed context block in elapsed.

elapsed

Time in seconds that elapsed between entering and exiting the context.

Type

float

Example

Use as a context manager:

with timeit() as total:
   # code to be timed

print(total.elapsed, "seconds")

See also

time.time()