Get minimum distances#

MolSysMT includes a very versatile function to calculate distances between atoms of molecular systems, and/or centers of groups of atoms, and/or points in the coordinates space. The molsysmt.structure.get_distances() has many possibilities, that’s why one of the longest sections in MolSysMT documentation. But let’s go step by step.

import molsysmt as msm
from molsysmt import pyunitwizard as puw
import numpy as np
import matplotlib.pyplot as plt

MolSysMT includes a very versatile method to calculate distances between points in space, atoms and/or groups of atoms. As many other methods of this multitool, the method molsysmt.distance() has an input argument to choose the engine in charge of getting the result. For instance, molsysmt.distance() currently offers two engines MolSysMT and MDTraj. At this moment only MolSysMT will be reported in this guide.

The different options of the method molsysmt.distance() will be shown, little by little, along with the following examples.

The XYZ molecular system form#

The first case, and the most simple one, is getting distances from a points distribution in space. MolSysMT accepts a molecular system form where only spatial coordinates are described, with out topological information: the XYZ form.

molecular_system = np.zeros([6,3]) * puw.unit('nm')
msm.get_form(molecular_system)
'XYZ'

The XYZ form accepts numpy arrays with length units of the shape \([n\_structures, n\_atoms, 3]\) or \([n\_atoms, 3]\). In case of having an array of rank 2, MolSysMT always understands \(n\_structures=1\) and the first rank as the number of atoms:

msm.get(molecular_system, n_structures=True, n_atoms=True)
[1, 6]

Lets create a couple of XYZ molecular systems with more than a structure. These two systems will help us illustrate the firts distance calculations:

# Molecular system with three atoms and three structures.

molecular_system = np.zeros([3,4,3], dtype='float64') * puw.unit('nm')

## First atom
molecular_system[0,0,:] = [0, 2, -1] * puw.unit('nm')
molecular_system[1,0,:] = [1, 2, -1] * puw.unit('nm')
molecular_system[2,0,:] = [0, 2, -1] * puw.unit('nm')

## Second atom
molecular_system[0,1,:] = [-1, 1, 1] * puw.unit('nm')
molecular_system[1,1,:] = [-1, 0, 1] * puw.unit('nm')
molecular_system[2,1,:] = [0, 0, 1] * puw.unit('nm')

## Third atom
molecular_system[0,2,:] = [-2, 0, 1] * puw.unit('nm')
molecular_system[1,2,:] = [-2, 0, 0] * puw.unit('nm')
molecular_system[2,2,:] = [-1, 1, 0] * puw.unit('nm')

## Fourth atom
molecular_system[0,3,:] = [-2, -2, -2] * puw.unit('nm')
molecular_system[1,3,:] = [0, 0, 0] * puw.unit('nm')
molecular_system[2,3,:] = [2, 2, 2] * puw.unit('nm')
molecular_system
Magnitude
[[[0.0 2.0 -1.0]  [-1.0 1.0 1.0]  [-2.0 0.0 1.0]  [-2.0 -2.0 -2.0]] [[1.0 2.0 -1.0]  [-1.0 0.0 1.0]  [-2.0 0.0 0.0]  [0.0 0.0 0.0]] [[0.0 2.0 -1.0]  [0.0 0.0 1.0]  [-1.0 1.0 0.0]  [2.0 2.0 2.0]]]
Unitsnanometer
msm.info(molecular_system)
form n_atoms n_groups n_components n_chains n_molecules n_entities n_structures
XYZ 4 None None None None None 3

Minimum and Maximum distance#

Sometimes the minimum and maximum distance between two sets of atoms needs to be obtained. Although this step could be done with the method molsysmt.distance() and a little coding, MolSysMT includes two methods to make it even easier: molsysmt.minimum_distance() and molsysmt.maximum_distance(). Lets see in the following cells how they work.

As first example, lets get the minimum distance between two atoms selection:

min_pairs, min_distances = msm.structure.get_minimum_distances(molecular_system)

The result is offered as two numpy arrays: the list of atoms pairs minimizing the distance for each structure, and the corresponding value of the minimum distance (for each structure also).

min_pairs.shape
(3, 2)
min_pairs
array([[0, 0],
       [0, 0],
       [0, 0]])
min_distances.shape
(3,)
min_distances
Magnitude
[0.0 0.0 0.0]
Unitsnanometer
print('The minimum distance in structure 2-th is given by atom {}-th and atom {}-th: {}'.format(min_pairs[2,0], min_pairs[2,1], min_distances[2]))
The minimum distance in structure 2-th is given by atom 0-th and atom 0-th: 0.0 nanometer

All input arguments described in previous subsections can also be used with molsysmt.minimum_distance() and molsysmt.maximum_distance(). Lets see an example:

min_pairs, min_distances = msm.structure.get_minimum_distances(molecular_system, selection=[0,1,2], selection_2=[0,1,2],
                                               structure_indices=[0,1], structure_indices_2=[1,2], pairs=True)

Remember that with pairs=True, the output does not longer refer atoms indices but pairs indices. That is the reason why the shape of min_pairs is now:

min_pairs.shape
(2,)

While,

min_distances.shape
(2,)

Which means that the minimum displacement between consecutive structures was observed for:

for ii in range(2):
    print('Atom {}-th had the minimum displacement of A between structures {}-th and {}-th: {}'.format(min_pairs[ii], ii, ii+1, min_distances[ii]))
Atom 0-th had the minimum displacement of A between structures 0-th and 1-th: 1.0 nanometer
Atom 0-th had the minimum displacement of A between structures 1-th and 2-th: 1.0 nanometer

There are situations in which we have a list of atoms in selection and the minimum distance with a second set of atoms selection_2 needs to be known for every single atom of the first set. In this case the first set has to be considered not as entity (as set) in view of getting a single minimum distance. Lets illustrate this with an example:

min_pairs, min_distances = msm.structure.get_minimum_distances(molecular_system, selection=[1,2], structure_indices=[0,1,2],
                                                selection_2=[0,1], as_entity=False, as_entity_2=True)

The output corresponds to the minimum distance of atom 1-th of A to any atom of B and the minimum distance of atom 2-th of A to any atom of B, at every structure:

min_pairs.shape
(3, 2)
min_distances.shape
(3, 2)
selection_2=[0,1]
print('Atom 1-th of A has the minimum distance to B with its atom {}-th in structure 1-th: {}'.format(selection_2[min_pairs[1,0]], min_distances[1,0]))
Atom 1-th of A has the minimum distance to B with its atom 1-th in structure 1-th: 0.0 nanometer
for ii in range(3):
    print('The {}-th is the closest atom of B to atom 1-th of A at structure {}-th with {}'.format(selection_2[min_pairs[ii,0]],ii, min_distances[ii,0]))
The 1-th is the closest atom of B to atom 1-th of A at structure 0-th with 0.0 nanometer
The 1-th is the closest atom of B to atom 1-th of A at structure 1-th with 0.0 nanometer
The 1-th is the closest atom of B to atom 1-th of A at structure 2-th with 0.0 nanometer

Minimum and Maximum distance of atom groups#

Sometimes the pair of atom groups with the shortest distance between their geometric centers, or centers of mass, needs to be determined. Lets work to illustrate this case with a dimeric protein complex:

molecular_system = msm.convert('1TCD', 'molsysmt.MolSys')
msm.info(molecular_system, element='component', selection='molecule_type=="protein"')
index n atoms n groups chain index molecule index molecule type entity index entity name
0 1906 248 0 0 protein 0 TRIOSEPHOSPHATE ISOMERASE
1 1912 249 1 1 protein 0 TRIOSEPHOSPHATE ISOMERASE

Lets find out the closest pairs of distance from different components:

atoms_groups_component_0 = msm.get(molecular_system, element='group',
                                   selection='component_index==0', atom_index=True)
atoms_groups_component_1 = msm.get(molecular_system, element='group',
                                   selection='component_index==1', atom_index=True)
min_pairs, min_distances = msm.structure.get_minimum_distances(molecular_system,
                                                selection=atoms_groups_component_0,
                                                center_of_atoms=True,
                                                selection_2=atoms_groups_component_1,
                                                center_of_atoms_2=True)

There is a single structure in our molecular system, thats why the shape of the numpy array with the pair of groups is the following:

min_pairs.shape
(1, 2)

Where the indices found in min_pairs correspond to the n-th and m-th atoms group of the first list and the second list respectively:

min_pairs[0]
array([69, 12])
group_index_in_component_0 = msm.get(molecular_system, element='group',
                                     selection='component_index==0', index=True)[69]
group_index_in_component_1 = msm.get(molecular_system, element='group',
                                     selection='component_index==1', index=True)[12]
msm.info(molecular_system, element='group', selection=[group_index_in_component_0,
                                                       group_index_in_component_1])
index id name type n atoms component index chain index molecule index molecule type entity index entity name
69 73 GLY amino acid 4 0 0 0 protein 0 TRIOSEPHOSPHATE ISOMERASE
260 15 CYS amino acid 6 1 1 1 protein 0 TRIOSEPHOSPHATE ISOMERASE

And the corresponding minimum distance between both residues from the two components is:

min_distances[0]
0.38221311908773786 nanometer

On the other hand, if the maximum distance needs to be obtained, the method to be used is molsysmt.maximum_distance(). Lets show how this method works with a short trajectory of the pentalanine peptide.

molecular_system = msm.systems['pentalanine']['traj_pentalanine.h5']
molecular_system = msm.convert(molecular_system, to_form='molsysmt.MolSys')
msm.info(molecular_system)
form n_atoms n_groups n_components n_chains n_molecules n_entities n_peptides n_structures
molsysmt.MolSys 62 7 1 1 1 1 1 5000

The trajectory has 5000 structures. Lets see, for each structure, whats the pair of residues with the longest distance between their geometric centers:

list_atom_groups = msm.get(molecular_system, element='group', selection='all', atom_index=True)
max_pairs, max_distances = msm.structure.get_maximum_distances(molecular_system, selection=list_atom_groups,
                                                               center_of_atoms=True)

This time we have 5000 pairs of group, one for each structure; and 5000 maximum distances:

max_pairs.shape
(5000, 2)
max_pairs[0]
array([0, 6])
max_distances
Magnitude
[1.4298022125781427 1.5060004722702334 1.6863101013212674 ... 1.8300611459902283 1.245815434905033 1.392606254657962]
Unitsnanometer

To give a last example on this methods, lets wonder: what is the residue of the peptide with the largest displacement between each structure and the next one? (at each structure, of course)

list_atom_groups = msm.get(molecular_system, element='group', selection='all', atom_index=True)
structures=np.arange(msm.get(molecular_system, n_structures=True))
max_group, max_distances = msm.structure.get_maximum_distances(molecular_system,
                                                selection=list_atom_groups,
                                                center_of_atoms=True,
                                                selection_2=list_atom_groups,
                                                center_of_atoms_2=True,
                                                structure_indices=structures[:-1],
                                                structure_indices_2=structures[1:],
                                                pairs=True)

Since we are using the option pairs=True, the output this time corresponds to the index of the pair made by the elements in both groups_of_atoms_1 and groups_of_atoms_2 with the maximum distance, or displacement in this case, for each i-th structure with the consecutive (i+1)-th structure:

max_group.shape
(4999,)
max_group
array([2, 0, 6, ..., 6, 6, 6])

And the value of this maximum displacements are:

This way:

print('The {}-th is the group with the maximum displacement between structures 200-th and 201-th: {}'.format(max_group[200], max_distances[200]))
The 3-th is the group with the maximum displacement between structures 200-th and 201-th: 0.636721010809142 nanometer