Get missing heavy atoms#

Getting the missing heavy atoms of a molecular system

Some molecular systems have missing heavy atoms that can be identified when the topological attributes included in the system allow it. For example, a structure solved by Xray diffraction and deposited in the Protein Data Bank can have some defects, as missing heavy atoms. Let’s see for instance a case where we can find missing heavy atoms in the system… and some other defects:

import molsysmt as msm
molecular_system = msm.convert('1BRS', selection='molecule_type=="protein"')
msm.info(molecular_system, element='molecule')
index name type n atoms n groups n components chain index entity index entity name
0 BARNASE protein 864 108 1 0 0 BARNASE
1 BARNASE protein 878 110 1 1 0 BARNASE
2 BARNASE protein 839 108 1 2 0 BARNASE
3 BARSTAR protein 693 87 2 3 1 BARSTAR
4 BARSTAR protein 665 86 2 4 1 BARSTAR
5 BARSTAR protein 699 89 1 5 1 BARSTAR

With a first visual inspection using the function molsysmt.basic.info() we can already detect a first defect: two molecules -index 3 and 4- are composed of two components. A complete protein should have a unique component, not two. Thereby, these two molecules have at least a missing residue each one. But is this only defect? Are there any other missing atoms? The function molsysmt.build.get_missing_heavy_atoms() assists us in this task:

missing_heavy_atoms = msm.build.get_missing_heavy_atoms(molecular_system)
missing_heavy_atoms
{234: ['CG', 'CD', 'CE', 'NZ'],
 237: ['CG', 'OD1', 'OD2'],
 244: ['CD', 'OE1', 'OE2'],
 246: ['CG', 'CD', 'OE1', 'NE2'],
 254: ['CD', 'CE', 'NZ'],
 260: ['CG1', 'CG2'],
 264: ['CG', 'CD', 'CE', 'NZ'],
 282: ['OG'],
 325: ['O'],
 383: ['OE1', 'NE2'],
 385: ['NZ'],
 386: ['CG', 'CD', 'OE1', 'NE2'],
 422: ['NE', 'CZ', 'NH1', 'NH2'],
 429: ['CG', 'CD', 'OE1', 'NE2'],
 465: ['NE', 'CZ', 'NH1', 'NH2'],
 469: ['CG', 'CD', 'OE1', 'NE2'],
 471: ['CG', 'CD', 'CE', 'NZ'],
 472: ['CG', 'CD', 'OE1', 'NE2'],
 473: ['CG', 'CD1', 'CD2'],
 520: ['CD', 'CE', 'NZ'],
 526: ['CG', 'CD', 'OE1', 'OE2'],
 544: ['CG', 'CD', 'OE1', 'OE2'],
 562: ['CG', 'CD', 'OE1', 'OE2'],
 563: ['CG', 'OD1', 'ND2'],
 587: ['O']}

And we can know what are the molecules with missing heavy atoms:

residue_indices = missing_heavy_atoms.keys()
molecule_indices = msm.get(molecular_system, element='molecule', selection='group_index in @residue_indices', molecule_index=True)
msm.info(molecular_system, element='molecule', selection='molecule_index in @molecule_indices')
index name type n atoms n groups n components chain index entity index entity name
2 BARNASE protein 839 108 1 2 0 BARNASE
3 BARSTAR protein 693 87 2 3 1 BARSTAR
4 BARSTAR protein 665 86 2 4 1 BARSTAR
5 BARSTAR protein 699 89 1 5 1 BARSTAR

What can we do when there are missing heavy atoms in the system? As you can see in the section XXX, the function molsysmt.build.add_missing_heavy_atoms() can help to solve this problem.

But are these flaws reported above the only ones in the system? Well, all structures solved by Xray diffraction have no hydrogen atoms; and some of them include atoms with alternate coordinates -and thereby appear multiple times in the system-:

msm.build.has_hydrogens(molecular_system)
False
msm.get(molecular_system, alternate_locations=True)
[{2686: {'location_id': array(['A', 'B'], dtype=object),
   'occupancy': array([0.5, 0.5]),
   'b_factor': <Quantity([0.2466 0.2467], 'nanometer ** 2')>,
   'atom_id': [2687, 2688],
   'coordinates': <Quantity([[3.2742 2.2579 0.1536]
    [3.2757 2.2571 0.1533]], 'nanometer')>},
  2687: {'location_id': array(['A', 'B'], dtype=object),
   'occupancy': array([0.5, 0.5]),
   'b_factor': <Quantity([0.2594 0.2596], 'nanometer ** 2')>,
   'atom_id': [2689, 2690],
   'coordinates': <Quantity([[3.1412 2.241  0.1076]
    [3.3396 2.192  0.2619]], 'nanometer')>}}]

If in addition to how to get the missing heavy atoms, you want to know a bit more on how to add the missing hydrogens or how to solve the problem of having atoms with alternate locations, check the following sections: XXX and YYY.