Get minimum distances#
MolSysMT includes a very versatile function to calculate distances between atoms of molecular systems, and/or centers of groups of atoms, and/or points in the coordinates space. The molsysmt.structure.get_distances()
has many possibilities, that’s why one of the longest sections in MolSysMT documentation. But let’s go step by step.
import molsysmt as msm
from molsysmt import pyunitwizard as puw
import numpy as np
import matplotlib.pyplot as plt
MolSysMT includes a very versatile method to calculate distances between points in space, atoms and/or groups of atoms. As many other methods of this multitool, the method molsysmt.distance()
has an input argument to choose the engine in charge of getting the result. For instance, molsysmt.distance()
currently offers two engines MolSysMT
and MDTraj
. At this moment only MolSysMT
will be reported in this guide.
The different options of the method molsysmt.distance()
will be shown, little by little, along with the following examples.
The XYZ molecular system form#
The first case, and the most simple one, is getting distances from a points distribution in space. MolSysMT accepts a molecular system form where only spatial coordinates are described, with out topological information: the XYZ
form.
molecular_system = np.zeros([6,3]) * puw.unit('nm')
msm.get_form(molecular_system)
'XYZ'
The XYZ
form accepts numpy arrays with length units of the shape \([n\_structures, n\_atoms, 3]\) or \([n\_atoms, 3]\). In case of having an array of rank 2, MolSysMT always understands \(n\_structures=1\) and the first rank as the number of atoms:
msm.get(molecular_system, n_structures=True, n_atoms=True)
[1, 6]
Lets create a couple of XYZ
molecular systems with more than a structure. These two systems will help us illustrate the firts distance calculations:
# Molecular system with three atoms and three structures.
molecular_system = np.zeros([3,4,3], dtype='float64') * puw.unit('nm')
## First atom
molecular_system[0,0,:] = [0, 2, -1] * puw.unit('nm')
molecular_system[1,0,:] = [1, 2, -1] * puw.unit('nm')
molecular_system[2,0,:] = [0, 2, -1] * puw.unit('nm')
## Second atom
molecular_system[0,1,:] = [-1, 1, 1] * puw.unit('nm')
molecular_system[1,1,:] = [-1, 0, 1] * puw.unit('nm')
molecular_system[2,1,:] = [0, 0, 1] * puw.unit('nm')
## Third atom
molecular_system[0,2,:] = [-2, 0, 1] * puw.unit('nm')
molecular_system[1,2,:] = [-2, 0, 0] * puw.unit('nm')
molecular_system[2,2,:] = [-1, 1, 0] * puw.unit('nm')
## Fourth atom
molecular_system[0,3,:] = [-2, -2, -2] * puw.unit('nm')
molecular_system[1,3,:] = [0, 0, 0] * puw.unit('nm')
molecular_system[2,3,:] = [2, 2, 2] * puw.unit('nm')
molecular_system
Magnitude | [[[0.0 2.0 -1.0] [-1.0 1.0 1.0] [-2.0 0.0 1.0] [-2.0 -2.0 -2.0]] [[1.0 2.0 -1.0] [-1.0 0.0 1.0] [-2.0 0.0 0.0] [0.0 0.0 0.0]] [[0.0 2.0 -1.0] [0.0 0.0 1.0] [-1.0 1.0 0.0] [2.0 2.0 2.0]]] |
---|---|
Units | nanometer |
msm.info(molecular_system)
form | n_atoms | n_groups | n_components | n_chains | n_molecules | n_entities | n_structures |
---|---|---|---|---|---|---|---|
XYZ | 4 | None | None | None | None | None | 3 |
Minimum and Maximum distance#
Sometimes the minimum and maximum distance between two sets of atoms needs to be obtained. Although this step could be done with the method molsysmt.distance()
and a little coding, MolSysMT includes two methods to make it even easier: molsysmt.minimum_distance()
and molsysmt.maximum_distance()
. Lets see in the following cells how they work.
As first example, lets get the minimum distance between two atoms selection:
min_pairs, min_distances = msm.structure.get_minimum_distances(molecular_system)
The result is offered as two numpy arrays: the list of atoms pairs minimizing the distance for each structure, and the corresponding value of the minimum distance (for each structure also).
min_pairs.shape
(3, 2)
min_pairs
array([[0, 0],
[0, 0],
[0, 0]])
min_distances.shape
(3,)
min_distances
Magnitude | [0.0 0.0 0.0] |
---|---|
Units | nanometer |
print('The minimum distance in structure 2-th is given by atom {}-th and atom {}-th: {}'.format(min_pairs[2,0], min_pairs[2,1], min_distances[2]))
The minimum distance in structure 2-th is given by atom 0-th and atom 0-th: 0.0 nanometer
All input arguments described in previous subsections can also be used with molsysmt.minimum_distance()
and molsysmt.maximum_distance()
. Lets see an example:
min_pairs, min_distances = msm.structure.get_minimum_distances(molecular_system, selection=[0,1,2], selection_2=[0,1,2],
structure_indices=[0,1], structure_indices_2=[1,2], pairs=True)
Remember that with pairs=True
, the output does not longer refer atoms indices but pairs indices. That is the reason why the shape of min_pairs is now:
min_pairs.shape
(2,)
While,
min_distances.shape
(2,)
Which means that the minimum displacement between consecutive structures was observed for:
for ii in range(2):
print('Atom {}-th had the minimum displacement of A between structures {}-th and {}-th: {}'.format(min_pairs[ii], ii, ii+1, min_distances[ii]))
Atom 0-th had the minimum displacement of A between structures 0-th and 1-th: 1.0 nanometer
Atom 0-th had the minimum displacement of A between structures 1-th and 2-th: 1.0 nanometer
There are situations in which we have a list of atoms in selection
and the minimum distance with a second set of atoms selection_2
needs to be known for every single atom of the first set. In this case the first set has to be considered not as entity (as set) in view of getting a single minimum distance. Lets illustrate this with an example:
min_pairs, min_distances = msm.structure.get_minimum_distances(molecular_system, selection=[1,2], structure_indices=[0,1,2],
selection_2=[0,1], as_entity=False, as_entity_2=True)
The output corresponds to the minimum distance of atom 1-th of A to any atom of B and the minimum distance of atom 2-th of A to any atom of B, at every structure:
min_pairs.shape
(3, 2)
min_distances.shape
(3, 2)
selection_2=[0,1]
print('Atom 1-th of A has the minimum distance to B with its atom {}-th in structure 1-th: {}'.format(selection_2[min_pairs[1,0]], min_distances[1,0]))
Atom 1-th of A has the minimum distance to B with its atom 1-th in structure 1-th: 0.0 nanometer
for ii in range(3):
print('The {}-th is the closest atom of B to atom 1-th of A at structure {}-th with {}'.format(selection_2[min_pairs[ii,0]],ii, min_distances[ii,0]))
The 1-th is the closest atom of B to atom 1-th of A at structure 0-th with 0.0 nanometer
The 1-th is the closest atom of B to atom 1-th of A at structure 1-th with 0.0 nanometer
The 1-th is the closest atom of B to atom 1-th of A at structure 2-th with 0.0 nanometer
Minimum and Maximum distance of atom groups#
Sometimes the pair of atom groups with the shortest distance between their geometric centers, or centers of mass, needs to be determined. Lets work to illustrate this case with a dimeric protein complex:
molecular_system = msm.convert('1TCD', 'molsysmt.MolSys')
msm.info(molecular_system, element='component', selection='molecule_type=="protein"')
index | n atoms | n groups | chain index | molecule index | molecule type | entity index | entity name |
---|---|---|---|---|---|---|---|
0 | 1906 | 248 | 0 | 0 | protein | 0 | TRIOSEPHOSPHATE ISOMERASE |
1 | 1912 | 249 | 1 | 1 | protein | 0 | TRIOSEPHOSPHATE ISOMERASE |
Lets find out the closest pairs of distance from different components:
atoms_groups_component_0 = msm.get(molecular_system, element='group',
selection='component_index==0', atom_index=True)
atoms_groups_component_1 = msm.get(molecular_system, element='group',
selection='component_index==1', atom_index=True)
min_pairs, min_distances = msm.structure.get_minimum_distances(molecular_system,
selection=atoms_groups_component_0,
center_of_atoms=True,
selection_2=atoms_groups_component_1,
center_of_atoms_2=True)
There is a single structure in our molecular system, thats why the shape of the numpy array with the pair of groups is the following:
min_pairs.shape
(1, 2)
Where the indices found in min_pairs correspond to the n-th and m-th atoms group of the first list and the second list respectively:
min_pairs[0]
array([69, 12])
group_index_in_component_0 = msm.get(molecular_system, element='group',
selection='component_index==0', index=True)[69]
group_index_in_component_1 = msm.get(molecular_system, element='group',
selection='component_index==1', index=True)[12]
msm.info(molecular_system, element='group', selection=[group_index_in_component_0,
group_index_in_component_1])
index | id | name | type | n atoms | component index | chain index | molecule index | molecule type | entity index | entity name |
---|---|---|---|---|---|---|---|---|---|---|
69 | 73 | GLY | amino acid | 4 | 0 | 0 | 0 | protein | 0 | TRIOSEPHOSPHATE ISOMERASE |
260 | 15 | CYS | amino acid | 6 | 1 | 1 | 1 | protein | 0 | TRIOSEPHOSPHATE ISOMERASE |
And the corresponding minimum distance between both residues from the two components is:
min_distances[0]
On the other hand, if the maximum distance needs to be obtained, the method to be used is molsysmt.maximum_distance()
. Lets show how this method works with a short trajectory of the pentalanine peptide.
molecular_system = msm.systems['pentalanine']['traj_pentalanine.h5']
molecular_system = msm.convert(molecular_system, to_form='molsysmt.MolSys')
msm.info(molecular_system)
form | n_atoms | n_groups | n_components | n_chains | n_molecules | n_entities | n_peptides | n_structures |
---|---|---|---|---|---|---|---|---|
molsysmt.MolSys | 62 | 7 | 1 | 1 | 1 | 1 | 1 | 5000 |
The trajectory has 5000 structures. Lets see, for each structure, whats the pair of residues with the longest distance between their geometric centers:
list_atom_groups = msm.get(molecular_system, element='group', selection='all', atom_index=True)
max_pairs, max_distances = msm.structure.get_maximum_distances(molecular_system, selection=list_atom_groups,
center_of_atoms=True)
This time we have 5000 pairs of group, one for each structure; and 5000 maximum distances:
max_pairs.shape
(5000, 2)
max_pairs[0]
array([0, 6])
max_distances
Magnitude | [1.4298022125781427 1.5060004722702334 1.6863101013212674 ... 1.8300611459902283 1.245815434905033 1.392606254657962] |
---|---|
Units | nanometer |
To give a last example on this methods, lets wonder: what is the residue of the peptide with the largest displacement between each structure and the next one? (at each structure, of course)
list_atom_groups = msm.get(molecular_system, element='group', selection='all', atom_index=True)
structures=np.arange(msm.get(molecular_system, n_structures=True))
max_group, max_distances = msm.structure.get_maximum_distances(molecular_system,
selection=list_atom_groups,
center_of_atoms=True,
selection_2=list_atom_groups,
center_of_atoms_2=True,
structure_indices=structures[:-1],
structure_indices_2=structures[1:],
pairs=True)
Since we are using the option pairs=True
, the output this time corresponds to the index of the pair made by the elements in both groups_of_atoms_1
and groups_of_atoms_2
with the maximum distance, or displacement in this case, for each i-th structure with the consecutive (i+1)-th structure:
max_group.shape
(4999,)
max_group
array([2, 0, 6, ..., 6, 6, 6])
And the value of this maximum displacements are:
This way:
print('The {}-th is the group with the maximum displacement between structures 200-th and 201-th: {}'.format(max_group[200], max_distances[200]))
The 3-th is the group with the maximum displacement between structures 200-th and 201-th: 0.636721010809142 nanometer