%load_ext autoreload
%autoreload 2
import numpy as np
#np.warnings.filterwarnings('ignore')
import molsysmt as msm
from molsysmt import pyunitwizard as puw
import numpy as np
import matplotlib.pyplot as plt
Get neighbors#
With the method molsysmt.distance()
many questions about a molecular system can be answered. Two of the most common distance related questions are: what are the closest n atoms to a given one? or what are the atoms closest than a given distance threshold? MolSysMT includes a method to provide with this distances processing: molsysmt.neighbors()
.
First closest neighbor atoms or groups#
There are two ways to compute distance neighbors. The closest n atoms to a given one can be obtained with the option num_neighbors
or threshold
. Lets show with a simple example how this first option works:
molecular_system = msm.systems['pentalanine']['traj_pentalanine.h5']
molecular_system = msm.convert(molecular_system, to_form='molsysmt.MolSys')
msm.info(molecular_system)
form | n_atoms | n_groups | n_components | n_chains | n_molecules | n_entities | n_peptides | n_structures |
---|---|---|---|---|---|---|---|---|
molsysmt.MolSys | 62 | 7 | 1 | 1 | 1 | 1 | 1 | 5000 |
We can compute the closest 3 CA atoms to each CA atom of the molecular system:
CA_atoms_list = msm.select(molecular_system, selection='atom_name=="CA"')
neighbors, distances = msm.structure.get_neighbors(molecular_system, selection=CA_atoms_list, n_neighbors=3)
Two objects are returned. A numpy array with the list of 3 neighbor atom indices per atom in selection_1
, per structure:
neighbors.shape
(5000, 5, 3)
And the corresponding distances:
distances.shape
(5000, 5, 3)
This way, the closest 3 atoms of the first CA atom at structure 2000-th are:
print("3 first neighbor CAs of atom {}-th at structure 0-th".format(CA_atoms_list[0]))
print("------------------------------------------")
for ii in range(3):
print("{}° neighbor is atom {}-th with distance: {}".format(ii+1, CA_atoms_list[neighbors[2000,0,ii]], distances[2000,0,ii]))
3 first neighbor CAs of atom 8-th at structure 0-th
------------------------------------------
1° neighbor is atom 18-th with distance: 0.3874317590975941 nanometer
2° neighbor is atom 28-th with distance: 0.5346208297333755 nanometer
3° neighbor is atom 48-th with distance: 0.5453396095979958 nanometer
Lets see now the 4 closest atoms, any kind, to each CA atom of the molecular system:
neighbors, distances = msm.structure.get_neighbors(molecular_system, selection=CA_atoms_list, selection_2='all', n_neighbors=4)
print("4 first neighbors of atom {}-th at structure 2000-th".format(CA_atoms_list[0]))
print("------------------------------------------")
for ii in range(4):
print("{}° neighbor is atom {}-th with distance: {}".format(ii+1, neighbors[2000,0,ii], distances[2000,0,ii]))
4 first neighbors of atom 8-th at structure 2000-th
------------------------------------------
1° neighbor is atom 8-th with distance: 0.0 nanometer
2° neighbor is atom 9-th with distance: 0.10900002347459704 nanometer
3° neighbor is atom 6-th with distance: 0.14533472683839888 nanometer
4° neighbor is atom 10-th with distance: 0.1532800898161647 nanometer
Notice that, in this case, msm.neighbors_list
is built to assume that is working with two different set of atoms since selection
\(\neq\)selection_2
. Thats the reason why this time the first neighbor atom is the atom itself.
The method msm.neighbors_list()
was built on top of msm.distance()
, thus the input arguments are almost the same. If you already had a look to the section about atoms distance, you will be probably wonder if msm.neighbors_list
can also work with atoms groups. Lets illustrate this case with the following cells:
molecular_system = msm.convert('1TCD', 'molsysmt.MolSys')
atoms_in_residues_chain_0 = msm.get(molecular_system, element='group',
selection="molecule_type=='protein' and chain_index==0",
atom_index=True)
atoms_in_residues_chain_1 = msm.get(molecular_system, element='group',
selection="molecule_type=='protein' and chain_index==1",
atom_index=True)
print('Number of residues in chain 0:', len(atoms_in_residues_chain_0))
print('Number of residues in chain 1:', len(atoms_in_residues_chain_1))
Number of residues in chain 0: 248
Number of residues in chain 1: 249
neighbors, distances = msm.structure.get_neighbors(molecular_system, selection=atoms_in_residues_chain_0,
center_of_atoms=True, n_neighbors=8)
print(neighbors.shape)
(1, 248, 8)
print("8 first group neighbors of the geometric center of residue 0-th")
print("------------------------------------------")
for ii in range(8):
print("{}° neighbor is group {}-th with distance: {}".format(ii+1, neighbors[0,0,ii], distances[0,0,ii]))
8 first group neighbors of the geometric center of residue 0-th
------------------------------------------
1° neighbor is group 1-th with distance: 0.4370347284513117 nanometer
2° neighbor is group 226-th with distance: 0.5876214911977258 nanometer
3° neighbor is group 220-th with distance: 0.7005928267026148 nanometer
4° neighbor is group 3-th with distance: 0.7546595763407229 nanometer
5° neighbor is group 224-th with distance: 0.7788646403094595 nanometer
6° neighbor is group 225-th with distance: 0.7971327058289361 nanometer
7° neighbor is group 223-th with distance: 0.8302823100919277 nanometer
8° neighbor is group 2-th with distance: 0.8680783396121018 nanometer
The list groups neighbors can be computed also from two molecular systems or two list of groups:
neighbors, distances = msm.structure.get_neighbors(molecular_system,
selection=atoms_in_residues_chain_0,
center_of_atoms=True,
selection_2=atoms_in_residues_chain_1,
center_of_atoms_2=True,
n_neighbors=8)
print("8 first group neighbors from chain 1 of the geometric center of residue 0-th from chain 0")
print("------------------------------------------")
for ii in range(8):
print("{}° neighbor is group {}-th with distance: {}".format(ii+1, neighbors[0,0,ii], distances[0,0,ii]))
8 first group neighbors from chain 1 of the geometric center of residue 0-th from chain 0
------------------------------------------
1° neighbor is group 73-th with distance: 2.7733041141137114 nanometer
2° neighbor is group 71-th with distance: 3.004041044490357 nanometer
3° neighbor is group 74-th with distance: 3.0495973900039375 nanometer
4° neighbor is group 70-th with distance: 3.201184539105875 nanometer
5° neighbor is group 72-th with distance: 3.2536310735699128 nanometer
6° neighbor is group 75-th with distance: 3.3579026367512856 nanometer
7° neighbor is group 44-th with distance: 3.481955675588244 nanometer
8° neighbor is group 17-th with distance: 3.539767691521334 nanometer
The method molsysmt.neighbors_lists()
can also mix atoms and atoms groups. Lets, as last example, get the closest geometric centers of residues to a specific atom:
neighbors, distances = msm.structure.get_neighbors(molecular_system, selection=100,
selection_2=atoms_in_residues_chain_1,
center_of_atoms_2= True,
n_neighbors=4)
print("4 closest geometric centers of residues of chain 1 from atom 100-th")
print("-------------------------------------------------------------------")
for ii in range(4):
print("{}° closest neighbor is group {}-th with distance: {}".format(ii+1, neighbors[0,0,ii], distances[0,0,ii]))
4 closest geometric centers of residues of chain 1 from atom 100-th
-------------------------------------------------------------------
1° closest neighbor is group 80-th with distance: 0.60482728655584 nanometer
2° closest neighbor is group 69-th with distance: 0.7263555641389102 nanometer
3° closest neighbor is group 70-th with distance: 0.7811893184433587 nanometer
4° closest neighbor is group 77-th with distance: 0.8498448121405598 nanometer
Closest neighbor atoms or groups below a distance threshold#
In addition to the input argument num_neighbors
, molsysmt.neighbors()
includes the option of getting those neighbors with a distance below a given threshols: threshold
. Lets get for the following molecular system the list of CA atoms closest than 8 \(\unicode{xC5}\):
molecular_system = msm.convert('1TCD', 'molsysmt.MolSys')
CA_atoms = msm.select(molecular_system, selection='atom_name=="CA"')
neighbors, distances = msm.structure.get_neighbors(molecular_system, selection=CA_atoms, threshold='8 angstroms')
In this example, each CA atom has a different number of neighbors. This time the output is not a tensor ranked 3, but a matrix where the elements are not numbers but list of neighbors:
print(neighbors.shape)
(1, 497)
print(distances.shape)
(1, 497)
The molecular system had 1 single structure and 497 CA atoms, lets see now the number of CA neighbors of the first 10 CA atoms in our list:
for ii in range(10):
print("The {}° CA has {} CA neighbors.".format(ii+1,len(neighbors[0,ii])))
The 1° CA has 7 CA neighbors.
The 2° CA has 8 CA neighbors.
The 3° CA has 9 CA neighbors.
The 4° CA has 12 CA neighbors.
The 5° CA has 13 CA neighbors.
The 6° CA has 15 CA neighbors.
The 7° CA has 12 CA neighbors.
The 8° CA has 12 CA neighbors.
The 9° CA has 14 CA neighbors.
The 10° CA has 14 CA neighbors.
Lets print out the neighbors of the 20-th CA in the list:
for ii,dd in zip(neighbors[0,20], distances[0,20]):
print("The {}-th CA is {} away from the 20-th CA".format(ii,dd))
The 21-th CA is 0.3807746183768036 nanometer away from the 20-th CA
The 19-th CA is 0.3882762676239687 nanometer away from the 20-th CA
The 23-th CA is 0.5117970007727681 nanometer away from the 20-th CA
The 22-th CA is 0.5541036906572636 nanometer away from the 20-th CA
The 18-th CA is 0.5639790953572659 nanometer away from the 20-th CA
The 24-th CA is 0.6343354475354508 nanometer away from the 20-th CA
The 17-th CA is 0.6452477431188736 nanometer away from the 20-th CA
The 16-th CA is 0.7399875134081648 nanometer away from the 20-th CA
As well as for the input argument n_neighbors
-previous subsection-, the neighbors closest than a given threshold can also be computed between atoms groups or atoms and atoms groups. Lets show a example where the neighbors of the residues of chain 0 in our molecular system are defined as those residues of chain 1 closest the 1.2 nm:
atoms_in_residues_chain_0 = msm.get(molecular_system, element='group',
selection="molecule_type=='protein' and chain_index==0",
atom_index=True)
atoms_in_residues_chain_1 = msm.get(molecular_system, element='group',
selection="molecule_type=='protein' and chain_index==1",
atom_index=True)
neighbors, distances = msm.structure.get_neighbors(molecular_system,
selection= atoms_in_residues_chain_0,
center_of_atoms=True,
selection_2= atoms_in_residues_chain_1,
center_of_atoms_2=True,
threshold='1.2 nm')
Lets print out the number of contacts in chain 1 per residue of chain 0, if any:
for ii in range(len(atoms_in_residues_chain_0)):
n_contacts = len(neighbors[0,ii])
if n_contacts >0:
print('The {}-th residue of chain 0 has {} residue contacts in chain 1'.format(ii,n_contacts))
The 7-th residue of chain 0 has 1 residue contacts in chain 1
The 8-th residue of chain 0 has 6 residue contacts in chain 1
The 9-th residue of chain 0 has 5 residue contacts in chain 1
The 10-th residue of chain 0 has 10 residue contacts in chain 1
The 11-th residue of chain 0 has 18 residue contacts in chain 1
The 12-th residue of chain 0 has 14 residue contacts in chain 1
The 13-th residue of chain 0 has 11 residue contacts in chain 1
The 14-th residue of chain 0 has 9 residue contacts in chain 1
The 15-th residue of chain 0 has 9 residue contacts in chain 1
The 16-th residue of chain 0 has 3 residue contacts in chain 1
The 17-th residue of chain 0 has 2 residue contacts in chain 1
The 18-th residue of chain 0 has 3 residue contacts in chain 1
The 39-th residue of chain 0 has 2 residue contacts in chain 1
The 40-th residue of chain 0 has 7 residue contacts in chain 1
The 41-th residue of chain 0 has 17 residue contacts in chain 1
The 42-th residue of chain 0 has 18 residue contacts in chain 1
The 43-th residue of chain 0 has 26 residue contacts in chain 1
The 44-th residue of chain 0 has 11 residue contacts in chain 1
The 45-th residue of chain 0 has 9 residue contacts in chain 1
The 46-th residue of chain 0 has 11 residue contacts in chain 1
The 47-th residue of chain 0 has 6 residue contacts in chain 1
The 48-th residue of chain 0 has 2 residue contacts in chain 1
The 49-th residue of chain 0 has 1 residue contacts in chain 1
The 50-th residue of chain 0 has 1 residue contacts in chain 1
The 59-th residue of chain 0 has 1 residue contacts in chain 1
The 60-th residue of chain 0 has 3 residue contacts in chain 1
The 61-th residue of chain 0 has 4 residue contacts in chain 1
The 62-th residue of chain 0 has 8 residue contacts in chain 1
The 63-th residue of chain 0 has 7 residue contacts in chain 1
The 64-th residue of chain 0 has 6 residue contacts in chain 1
The 65-th residue of chain 0 has 7 residue contacts in chain 1
The 66-th residue of chain 0 has 1 residue contacts in chain 1
The 67-th residue of chain 0 has 2 residue contacts in chain 1
The 68-th residue of chain 0 has 6 residue contacts in chain 1
The 69-th residue of chain 0 has 14 residue contacts in chain 1
The 70-th residue of chain 0 has 16 residue contacts in chain 1
The 71-th residue of chain 0 has 13 residue contacts in chain 1
The 72-th residue of chain 0 has 32 residue contacts in chain 1
The 73-th residue of chain 0 has 32 residue contacts in chain 1
The 74-th residue of chain 0 has 21 residue contacts in chain 1
The 75-th residue of chain 0 has 15 residue contacts in chain 1
The 76-th residue of chain 0 has 9 residue contacts in chain 1
The 77-th residue of chain 0 has 3 residue contacts in chain 1
The 78-th residue of chain 0 has 5 residue contacts in chain 1
The 79-th residue of chain 0 has 14 residue contacts in chain 1
The 80-th residue of chain 0 has 7 residue contacts in chain 1
The 81-th residue of chain 0 has 2 residue contacts in chain 1
The 82-th residue of chain 0 has 9 residue contacts in chain 1
The 83-th residue of chain 0 has 11 residue contacts in chain 1
The 84-th residue of chain 0 has 2 residue contacts in chain 1
The 85-th residue of chain 0 has 1 residue contacts in chain 1
The 88-th residue of chain 0 has 2 residue contacts in chain 1
The 89-th residue of chain 0 has 2 residue contacts in chain 1
The 90-th residue of chain 0 has 2 residue contacts in chain 1
The 91-th residue of chain 0 has 2 residue contacts in chain 1
The 92-th residue of chain 0 has 6 residue contacts in chain 1
The 93-th residue of chain 0 has 4 residue contacts in chain 1
The 94-th residue of chain 0 has 7 residue contacts in chain 1
The 95-th residue of chain 0 has 7 residue contacts in chain 1
The 96-th residue of chain 0 has 1 residue contacts in chain 1
The 97-th residue of chain 0 has 3 residue contacts in chain 1
The 98-th residue of chain 0 has 11 residue contacts in chain 1
The 99-th residue of chain 0 has 14 residue contacts in chain 1
The 100-th residue of chain 0 has 4 residue contacts in chain 1
The 101-th residue of chain 0 has 5 residue contacts in chain 1
The 105-th residue of chain 0 has 2 residue contacts in chain 1
The 123-th residue of chain 0 has 1 residue contacts in chain 1
The 164-th residue of chain 0 has 1 residue contacts in chain 1
The 166-th residue of chain 0 has 1 residue contacts in chain 1
The 169-th residue of chain 0 has 5 residue contacts in chain 1
The 170-th residue of chain 0 has 6 residue contacts in chain 1
The 171-th residue of chain 0 has 4 residue contacts in chain 1
The 172-th residue of chain 0 has 1 residue contacts in chain 1
The 229-th residue of chain 0 has 1 residue contacts in chain 1
The 231-th residue of chain 0 has 3 residue contacts in chain 1
The 232-th residue of chain 0 has 2 residue contacts in chain 1
The 235-th residue of chain 0 has 1 residue contacts in chain 1
This information is usually represented as a contact map. If this is what you are looking for, you will probably find the next section more appropriate to your needs.