%load_ext autoreload
%autoreload 2
import molsysmt as msm
import numpy as np
import matplotlib.pyplot as plt

Get RMSD#

Lets load a small molecular system with a 5000 structures trajectory to show how MolSysMT gets the RMSD.

h5_file = msm.systems['pentalanine']['traj_pentalanine.h5']
molecular_system = msm.convert(h5_file)
msm.info(molecular_system)
form n_atoms n_groups n_components n_chains n_molecules n_entities n_peptides n_structures
molsysmt.MolSys 62 7 1 1 1 1 1 5000
time = msm.get(molecular_system, element='system', time=True)

The root mean squared deviation of a set of atom coordinates \(\vec{R}:(\vec{r_{1}}, \vec{r_{2}}, ..., \vec{r_{n}})\), with respect to a reference set of atom coordinates \(\vec{R'}:(\vec{r'_{1}}, \vec{r'_{2}}, ..., \vec{r'_{n}})\), is defined as:

(1)#\[\begin{equation} \text{RMSD} (\vec{R'},\vec{R}) = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \Vert \vec{r'_{i}} - \vec{r_{i}} \Vert ^{2}} \end{equation}\]
(2)#\[\begin{equation} \text{RMSD} (\vec{R'},\vec{R}) = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} [(x'-x)^2 + (z'-z)^2 + (z'-z)^2]} \end{equation}\]

These two sets of atom coordinates, with the same number of atoms, could belong to structures of two different molecular systems or could also be structures of the same system. Let see the case where the RMSD is calculated only with the backbone atoms of the structure of our molecular system at structure 100-th taking the structure 0-th as reference:

rmsd = msm.structure.get_rmsd(molecular_system, selection='backbone',
                              structure_indices=100, reference_structure_index=0)

There object returned is a numpy array with a single RMSD value:

print(rmsd)
[0.7381704258064243] nanometer

Lets get now the RMSD of every single structure in the trajectory with the same structure as reference, the one found in structure 0-th:

rmsd = msm.structure.get_rmsd(molecular_system, selection='backbone',
                              structure_indices='all', reference_structure_index=0)
plt.plot(time, rmsd, label='RMSD')
plt.xlabel('time [{}]'.format(msm.pyunitwizard.get_unit(time)))
plt.ylabel('RMSD [{}]'.format(msm.pyunitwizard.get_unit(rmsd)))
plt.title('RMSD with backbone atoms with respect structure 0-th')
plt.legend()
plt.show()
/home/diego/Myopt/miniconda3/envs/MolSysMT@uibcdf_3.12/lib/python3.12/site-packages/matplotlib/cbook.py:1345: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  return np.asarray(x, float)
../../../../_images/8db31925c8ffa0caeb2f48df1035900d7e4de53c1d1fc40ac4d540db62e9d1e4.png

In case of having two different molecular systems, molsysmt.rmsd() accepts all necessary input arguments to specify the set of atom coordinates to obtain the rmsd, for a single structure or a set of them: item, selection and structure_indices for the subject molecular system; and reference_item, reference_selection, and reference_structure_index for the reference molecular system.

Lets, for instance, create two different molecular systems (same topology in this case) to illustrate how these input arguments work:

molecular_system_1 = msm.extract(molecular_system, structure_indices=range(0,100))
molecular_system_2 = msm.extract(molecular_system, structure_indices=range(200,300))
rmsd = msm.structure.get_rmsd(molecular_system_1, selection='backbone', structure_indices=80, 
                reference_molecular_system=molecular_system_2, reference_selection='backbone',
                reference_structure_index=20)

And we got the rmsd between to structures coming from two different structures from two molecular systems:

print(rmsd)
[0.945975307377399] nanometer