Get missing heavy atoms#

Getting the missing heavy atoms of a molecular system

Some molecular systems have missing heavy atoms that can be identified when the topological attributes included in the system allow it. For example, a structure solved by Xray diffraction and deposited in the Protein Data Bank can have some defects, as missing heavy atoms. Let’s see for instance a case where we can find missing heavy atoms in the system… and some other defects:

import molsysmt as msm

molecular_system = msm.convert('1BRS', selection='molecule_type=="protein"')

msm.info(molecular_system, element='molecule')

index	name	type	n atoms	n groups	n components	chain index	entity index	entity name
0	BARNASE	protein	864	108	1	0	0	BARNASE
1	BARNASE	protein	878	110	1	1	0	BARNASE
2	BARNASE	protein	839	108	1	2	0	BARNASE
3	BARSTAR	protein	693	87	2	3	1	BARSTAR
4	BARSTAR	protein	665	86	2	4	1	BARSTAR
5	BARSTAR	protein	699	89	1	5	1	BARSTAR

With a first visual inspection using the function molsysmt.basic.info() we can already detect a first defect: two molecules -index 3 and 4- are composed of two components. A complete protein should have a unique component, not two. Thereby, these two molecules have at least a missing residue each one. But is this only defect? Are there any other missing atoms? The function molsysmt.build.get_missing_heavy_atoms() assists us in this task:

missing_heavy_atoms = msm.build.get_missing_heavy_atoms(molecular_system)

missing_heavy_atoms

{234: ['CG', 'CD', 'CE', 'NZ'],
['CG', 'OD1', 'OD2'],
['CD', 'OE1', 'OE2'],
['CG', 'CD', 'OE1', 'NE2'],
['CD', 'CE', 'NZ'],
['CG1', 'CG2'],
['CG', 'CD', 'CE', 'NZ'],
['OG'],
['O'],
['OE1', 'NE2'],
['NZ'],
['CG', 'CD', 'OE1', 'NE2'],
['NE', 'CZ', 'NH1', 'NH2'],
['CG', 'CD', 'OE1', 'NE2'],
['NE', 'CZ', 'NH1', 'NH2'],
['CG', 'CD', 'OE1', 'NE2'],
['CG', 'CD', 'CE', 'NZ'],
['CG', 'CD', 'OE1', 'NE2'],
['CG', 'CD1', 'CD2'],
['CD', 'CE', 'NZ'],
['CG', 'CD', 'OE1', 'OE2'],
['CG', 'CD', 'OE1', 'OE2'],
['CG', 'CD', 'OE1', 'OE2'],
['CG', 'OD1', 'ND2'],
['O']}

And we can know what are the molecules with missing heavy atoms:

residue_indices = missing_heavy_atoms.keys()
molecule_indices = msm.get(molecular_system, element='molecule', selection='group_index in @residue_indices', molecule_index=True)

msm.info(molecular_system, element='molecule', selection='molecule_index in @molecule_indices')

index	name	type	n atoms	n groups	n components	chain index	entity index	entity name
2	BARNASE	protein	839	108	1	2	0	BARNASE
3	BARSTAR	protein	693	87	2	3	1	BARSTAR
4	BARSTAR	protein	665	86	2	4	1	BARSTAR
5	BARSTAR	protein	699	89	1	5	1	BARSTAR

What can we do when there are missing heavy atoms in the system? As you can see in the section XXX, the function molsysmt.build.add_missing_heavy_atoms() can help to solve this problem.

But are these flaws reported above the only ones in the system? Well, all structures solved by Xray diffraction have no hydrogen atoms; and some of them include atoms with alternate coordinates -and thereby appear multiple times in the system-:

msm.build.has_hydrogens(molecular_system)

False

msm.get(molecular_system, alternate_locations=True)

[{2686: {'location_id': array(['A', 'B'], dtype=object),
   'occupancy': array([0.5, 0.5]),
   'b_factor': <Quantity([0.2466 0.2467], 'nanometer ** 2')>,
   'atom_id': [2687, 2688],
   'coordinates': <Quantity([[3.2742 2.2579 0.1536]
    [3.2757 2.2571 0.1533]], 'nanometer')>},
  2687: {'location_id': array(['A', 'B'], dtype=object),
   'occupancy': array([0.5, 0.5]),
   'b_factor': <Quantity([0.2594 0.2596], 'nanometer ** 2')>,
   'atom_id': [2689, 2690],
   'coordinates': <Quantity([[3.1412 2.241  0.1076]
    [3.3396 2.192  0.2619]], 'nanometer')>}}]

If in addition to how to get the missing heavy atoms, you want to know a bit more on how to add the missing hydrogens or how to solve the problem of having atoms with alternate locations, check the following sections: XXX and YYY.