Quickstart guide#

Get ready#

MolSysMT is currently distributed as a conda package. As such, to follow this quickstart guide, we recommend the next steps.

Install Conda in your machine and create a new working environment:

conda create -n MolSysMT python=3.12

Activate the new environment and install MolSysMT and Jupyter Lab:

conda activate MolSysMT
conda install -c uibcdf molsysmt
conda install -c conda-forge jupyterlab

Finally, launch Jupyter Lab and in a new Python notebook import the library:

import molsysmt as msm

A molecular system#

Let’s define a first molecular system with a PDB id.

molecular_system = '181L'

In MolSysMT’s language, a molecular system can have different forms. For example, a PDB id, the corresponding PDB file or BinaryCIF file, a trajectory object of mdtraj or pytraj, an NGLWidget of nglview, or a MolSys native object of MolSysMT, can be all different forms of a molecular system. And not every form has the same attributes. Some forms contain a list of atom names, atom residues, chains… and other contain spatial coordinates of their atoms, velocities… but they can be all different representations of the same system.

form = msm.get_form(molecular_system)
print(f'The molecular system has the "{form}" form')
The molecular system has the "string:pdb_id" form

We can get a description any molecular system with the help of the molsysmt.info function:

msm.info(molecular_system)
form n_atoms n_groups n_components n_chains n_molecules n_entities n_waters n_ions n_small_molecules n_proteins n_structures
string:pdb_id 1441 302 141 6 141 5 136 2 2 1 1

To follow with this “Quick guide”, we will need an appropriate form of our system. Let’s try to work with the native form molsysmt.MolSys since working all the time with the PDB id is not efficient. Forms conversions can be done with one of the most relevant MolSysMT basic functions: molsysmt.convert().

molecular_system = msm.convert(molecular_system, to_form='molsysmt.MolSys')

old_form = form
form = msm.get_form(molecular_system)
print(f'The molecular system was converted from "{old_form}" to "{form}"')
The molecular system was converted from "string:pdb_id" to "molsysmt.MolSys"

Note

MolSysMT includes some native forms such as ‘molsysmt.MolSys’, ‘molsysmt.Topology’ or ‘molsysmt.Structures’.

Elements selection#

One of the most common steps in every workflow when working with a molecular system model is the elements selection.

A molecular system is composed of elements. The basic and fundamental element is the “atom”. But other levels of hyerarchical groups of atoms are defined by other elements such as “group”, “component”, “molecular”, “chain” or “entity”.

Some selection examples#

Let’s get a general information summary regarding “entity” elements:

msm.info(molecular_system, element='entity')
index name type n atoms n groups n components n chains n molecules
0 T4 LYSOZYME protein 1289 162 1 1 1
1 CHLORIDE ION ion 2 2 2 2 2
2 2-HYDROXYETHYL DISULFIDE small molecule 8 1 1 1 1
3 BENZENE small molecule 6 1 1 1 1
4 water water 136 136 136 1 136

To quickly introduce how the selection tool works in MolSysMT, let’s illustrate its use getting the atom indices from the entity type “ion”:

ions = msm.select(molecular_system, selection='entity_type=="ion"')
print(ions)
[1289, 1290]

To proof that these atom indices correspond in deed to ions, let’s use again the function molsysmt.info this time with the input argument element='atom':

msm.info(molecular_system, element='atom', selection=ions)
index id name type group index group id group name group type component index chain index molecule index molecule type entity index entity name
1289 1290 CL CL 162 173 CL ion 1 1 1 ion 1 CHLORIDE ION
1290 1291 CL CL 163 178 CL ion 2 2 2 ion 1 CHLORIDE ION

The selection tool works as expected. But the selection condition was very simple. Let’s show in the following lines some cases just a bit more complicated. For example, can spatial restrictions be included in our selections with MolSysMT? The answer is yes… check this out:

CAs_in_contact = msm.select(molecular_system, selection='atom_name=="CA" within 5.0 angstroms of @ions')
print(CAs_in_contact)
[385, 1115, 1122, 1129, 1137]

Ok, but what if we want the groups (residues in this case) and not the atoms fulfiling this former condition:

residues_in_contact = msm.select(molecular_system, element='group', selection='atom_name=="CA" within 5.0 angstroms of @ions')
residues_in_contact
[48, 141, 142, 143, 144]

These former group indices in the variable residues_in_contact must content those group indices of the atom indices in CAs_in_contact. This can be checked with the following line:

residues_from_CAs_in_contact = msm.get(molecular_system, element='group', selection='atom_index in @CAs_in_contact', group_index=True)
set(residues_from_CAs_in_contact).issubset(set(residues_in_contact))
True

Use your favorite selection syntax#

You don’t need to learn a new selection syntax if you don’t want to. We are sure you got used to the syntax of other very popular and useful tools such as MDTraj. If thats your case, use it:

msm.select(molecular_system, selection='name =~ "C[1-4]"', syntax='MDTraj')
array([1291, 1293, 1299, 1300, 1301, 1302])

Translate a selection into other syntax#

Maybe, you need to use a selection condition in another very popular and useful tool such as NGLView. But you don’t remember the proper syntax rules. MolSysMT can also help in this case:

msm.select(molecular_system, element='group', selection='molecule_type=="ion"', to_syntax='NGLView')
'173:A 178:A'

Getting attributes#

Molecular systems, and its elements, have attributes. Those attributes can be the number of atoms, the number of structures, the name of an atom or the id of chain. MolSysMT includes a function to obtain the attribute values of a molecular system or of a specific set of elements of it: molsysmt.get().

Let’s show how this function operates with some simple examples. First, wondering about some general attributes of the molecular system:

msm.get(molecular_system, n_atoms=True)
1441
msm.get(molecular_system, n_structures=True)
1
msm.get(molecular_system, box_volume=True)
Magnitude
[311.55659621309997]
Unitsnanometer3

Let’s get now some attributes of some different elements:

msm.get(molecular_system, element='atom', selection=[10, 11, 12], atom_name=True, group_name=True)
[['C', 'O', 'CB'], ['ASN', 'ASN', 'ASN']]
msm.get(molecular_system, element='atom', selection='molecule_type=="ion"', coordinates=True)
Magnitude
[[[4.3141 1.6446999999999998 0.17689999999999997]  [3.1832 1.567 2.3874999999999997]]]
Unitsnanometer
msm.get(molecular_system, element='chain', selection='molecule_type=="water"', id=True)
[5]

Tools#

MolSysMT have different categories of tools to work with molecular modules. They can be found in the modules: molsysmt.basic, molsysmt.build, molsysmt.topology, molsysmt.structure, molsysmt.pbc, etc. Let’s illustrate here how some of these tools work.

Note

MolSysMT is form agnostic. All tools work no matter the form of the input molecular system.

Basic#

“Basic” tools such as select, get, convert, add, or remove, can be found in the module molsysmt.basic. Let’s see some examples.

Let’s convert a pdb ID to the native Python object of PDBFixer.

molecular_system = msm.basic.convert('181L', to_form='pdbfixer.PDBFixer')

We can check the form of the new molecular_system object:

msm.basic.get_form(molecular_system)
'pdbfixer.PDBFixer'

Now, let’s check if the system contains waters, ions, and small_molecules:

msm.basic.contains(molecular_system, waters=True, ions=True, small_molecules=True)
True

These molecules can be easily removed from the system.

molecular_system = msm.basic.remove(molecular_system, selection='molecule_type==["water", "ion", "small molecule"]')

We can wonder now: how many water, ions, and small molecules remain in the molecular system?

msm.basic.get(molecular_system, n_waters=True, n_ions=True, n_small_molecules=True)
[0, 0, 0]

As last example, let’s show in this notebook the 3D representation of the molecule thanks to the NGLView library:

msm.basic.view(molecular_system, viewer='NGLView')

Note

All methods defined in the molsysmt.basic module can be invoked also from the main level of the library. As such, molsysmt.convert is the same method as molsysmt.basic.convert.

Notice that all these basic methods were run over a molecular system with a non native form for MolSysMT:

msm.basic.info(molecular_system)
form n_atoms n_groups n_components n_chains n_molecules n_entities n_proteins n_structures
pdbfixer.PDBFixer 1289 162 1 1 1 1 1 1

Build#

The module molsysmt.build offers tools such as solvate, add_missing_hydrogens, build_peptide, get_atoms_with_alternate_locations, or make_bioassembly. Let’s see some examples.

Let’s create a peptide:

molecular_system = msm.build.build_peptide('AceAlaAlaAlaNme')

The peptide has two terminal cappings and three amino acids (five groups in total).

msm.get(molecular_system, n_aminoacids=True, n_groups=True)
[3, 5]

To keep on building the molecular system, let’s now solvate a truncated octahedral box around the peptide:

molecular_system = msm.build.solvate(molecular_system, box_shape='truncated octahedral',
                                     clearance='14.0 angstroms')

We can check that the new system is indeed solvated:

msm.build.is_solvated(molecular_system)
True
msm.get(molecular_system, n_waters=True)
497

Finally, to show in the notebook our system, let’s wrap all atoms to the minimum image convention with the peptide centered in the origin of coordinates -with the help of the molsysmt.pbc module-.

molecular_system = msm.pbc.wrap_to_mic(molecular_system,
                                       center_of_selection='molecule_type=="peptide"')

But, was the peptide centered in the origin of coordinates?

msm.structure.get_center(molecular_system, selection='molecule_type=="peptide"')
Magnitude
[[[-3.700743415417188e-17 4.7580986769649563e-17 2.3129646346357427e-18]]]
Unitsnanometer

And now, the view of the molecular system in the system.

msm.view(molecular_system, standard=True, with_water_as='surface', viewer='NGLView')

:class: note In this section, functions from the modules molsysmt.structure and molsysmt.pbc were used. These modules will be introduced in the following sections.

Structure#

The module molsysmt.structure offers tools such as get_distances, get_center, get_contacts, translate, or fit. Let’s see some examples with a new molecular system:

molecular_system = msm.basic.convert('181L', selection='molecule_type=="protein"')

The function molsysmt.structure.get_distances allows us to get distances between atoms:

msm.structure.get_distances(molecular_system, selection='atom_index==10', selection_2='atom_index==100')
Magnitude
[[[1.5211279959293365]]]
Unitsnanometer

And the function molsysmt.structure.get_dihedral_angles assists us to get specific dihedral angles in a peptide or protein:

phi_angles, psi_angles = msm.structure.get_dihedral_angles(molecular_system, selection='group_index==[3,4]', phi=True, psi=True)
phi_angles
Magnitude
[[-65.97266126462469]]
Unitsdegree

To conclude this section, let’s get the contacts between all CA atoms in the system (defining contact with a 9 angstroms distance cut-off):

msm.structure.get_contacts(molecular_system, selection='atom_name=="CA"', threshold='9 angstroms')
array([[[ True,  True,  True, ..., False,  True,  True],
        [ True,  True,  True, ..., False, False, False],
        [ True,  True,  True, ..., False, False, False],
        ...,
        [False, False, False, ...,  True,  True,  True],
        [ True, False, False, ...,  True,  True,  True],
        [ True, False, False, ...,  True,  True,  True]]])

Topology#

The module molsysmt.structure offers tools such as get_covalent_blocks, get_covalent_chains, get_sequence_alignment, or get_sequence_identity. Let’s see some examples, this time with a string:amino_acid_1 form:

molecular_system_A = msm.convert(molecular_system, selection='group_index in [4,5,6,7,8,9]', to_form='string:amino_acids_1')
print(molecular_system_A)
EMLRID
molecular_system_B = 'MLWAD'
msm.topology.get_sequence_alignment(molecular_system_A, reference_molecular_system=molecular_system_B, prettyprint=True)
EMLRID

-MLWAD

The module molsysmt.topology has also, for instance, functions such as molsysmt.topology.get_covalent_chains to find out the sequences of specific atoms covalently bounded:

msm.topology.get_covalent_chains(molecular_system, selection='group_index in [3,4,5]', chain=['atom_name=="C"',
                                                                                              'atom_name=="N"',
                                                                                              'atom_name=="CA"',
                                                                                              'atom_name in ["C", "CB"]'])
array([[26, 35, 36, 37],
       [26, 35, 36, 39],
       [37, 44, 45, 46],
       [37, 44, 45, 48]])

Let’s proof these function returned a correct answer:

msm.get(molecular_system, element='atom', selection=[37, 44, 45, 48], atom_name=True)
['C', 'N', 'CA', 'CB']
msm.get(molecular_system, element='atom', selection=[37, 44, 45, 48], inner_bonded_atom_pairs=True)
[[37, 44], [44, 45], [45, 48]]

Physchem#

The module molsysmt.physchem offers tools such as get_charge, get_hydrophobicity, or get_sasa. Let’s see an example using the molecular system from the previous example:

charge_groups = msm.physchem.get_charge(molecular_system, element='group', definition='physical_pH7')
charge_groups
Magnitude
[0.0 0.0 0.0 0.0 -1.0 0.0 0.0 1.0 0.0 -1.0 -1.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 1.0 -1.0 0.0 -1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 -1.0 0.0 -1.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 -1.0 -1.0 0.0 -1.0 1.0 0.0 0.0 0.0 0.0 -1.0 0.0 -1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 -1.0 0.0 0.0 -1.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 -1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 -1.0 -1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 -1.0 0.0 0.0 1.0]
Unitselementary_charge

We can show the 3D graphical representation of our system as we did before, but this time colouring the residues according to its charge. Notice that we are here introducing a new module found in MolSysMT: molsysmt.thirds.

view = msm.view(molecular_system, viewer='NGLView')
msm.thirds.nglview.set_color_by_value(view, values=charge_groups, cmap='bwr_r')
view

PBC, HBonds, Molecular Mechanics…#

Other modules in MolSysMT helps the user working with the Periodic Boundary Conditions of the molecular system (such as molsysmt.pbc), the hydrogen bonds (such as molsysmt.hbonds), or the molecular mechanics (such as molsysmt.molecular_mechanics). Let’s illustrate the use of these last modules before finishing this Quickstart Guide.

The function molsysmt.pbc.has_pbc returns True if the molecular system has periodic boundary conditions.

msm.pbc.has_pbc('181L')
True

The system with PDB id 181L has a unit cell (box), tet’s get it:

msm.get('181L', box=True)
Magnitude
[[[6.089999999999999 0.0 0.0]  [-3.0449999999999995 5.274094699999999 0.0]  [0.0 0.0 9.7]]]
Unitsnanometer

Now, using the molecular system from the previous section, we can can compute the hydrogen bonds with molsysmt.hbonds.get_buch_hbonds. But before we need to be sure the system has hydrogens, and if that’s not the case, add them.

msm.build.has_hydrogens(molecular_system)
False
molecular_system = msm.build.add_missing_hydrogens(molecular_system, pH=7.4)

The same with the terminal cappings:

msm.build.get_missing_terminal_cappings(molecular_system)
{161: ['OXT']}
molecular_system = msm.build.add_missing_terminal_cappings(molecular_system)

The molecular system is now ready to play with it. Let’s check if there exists any hydrogen bond between ASP and TYR residues.

msm.hbonds.get_buch_hbonds(molecular_system, selection='group_name=="ASP"', selection_2='group_name=="TYR"')
(array([[[394, 395, 356]]]), <Quantity([[0.17385481]], 'nanometer')>)

Who are those atoms?

msm.info(molecular_system, element='atom', selection=[356, 394, 395])
index id name type group index group id group name group type component index chain index molecule index molecule type entity index entity name
356 357 OD1 O 19 20 ASP amino acid 0 0 0 protein 0 T4 LYSOZYME
394 395 N N 23 24 TYR amino acid 0 0 0 protein 0 T4 LYSOZYME
395 396 H H 23 24 TYR amino acid 0 0 0 protein 0 T4 LYSOZYME

Finnally, with the help of OpenMM, the non bonded term of the potential energy between atoms 394 and 395 with atom 356 can be computed:

msm.molecular_mechanics.get_non_bonded_potential_energy(molecular_system, selection=[394,  395], selection_2=[356], engine='OpenMM')
0.03475952148437467 kilojoule/mole

Do you want more?#

We hope you found MolSysMT useful. If that is the case, you can either keep on having a look to the “Showcase” section or maybe it is time for you to visit the “User Guide”. Enjoy the trip!