Get missing heavy atoms#
Getting the missing heavy atoms of a molecular system
Some molecular systems have missing heavy atoms that can be identified when the topological attributes included in the system allow it. For example, a structure solved by Xray diffraction and deposited in the Protein Data Bank can have some defects, as missing heavy atoms. Let’s see for instance a case where we can find missing heavy atoms in the system… and some other defects:
import molsysmt as msm
molecular_system = msm.convert('1BRS', selection='molecule_type=="protein"')
msm.info(molecular_system, element='molecule')
index | name | type | n atoms | n groups | n components | chain index | entity index | entity name |
---|---|---|---|---|---|---|---|---|
0 | BARNASE | protein | 864 | 108 | 1 | 0 | 0 | BARNASE |
1 | BARNASE | protein | 878 | 110 | 1 | 1 | 0 | BARNASE |
2 | BARNASE | protein | 839 | 108 | 1 | 2 | 0 | BARNASE |
3 | BARSTAR | protein | 693 | 87 | 2 | 3 | 1 | BARSTAR |
4 | BARSTAR | protein | 665 | 86 | 2 | 4 | 1 | BARSTAR |
5 | BARSTAR | protein | 699 | 89 | 1 | 5 | 1 | BARSTAR |
With a first visual inspection using the function molsysmt.basic.info()
we can already detect a first defect: two molecules -index 3 and 4- are composed of two components. A complete protein should have a unique component, not two. Thereby, these two molecules have at least a missing residue each one. But is this only defect? Are there any other missing atoms? The function molsysmt.build.get_missing_heavy_atoms()
assists us in this task:
missing_heavy_atoms = msm.build.get_missing_heavy_atoms(molecular_system)
missing_heavy_atoms
{234: ['CG', 'CD', 'CE', 'NZ'],
237: ['CG', 'OD1', 'OD2'],
244: ['CD', 'OE1', 'OE2'],
246: ['CG', 'CD', 'OE1', 'NE2'],
254: ['CD', 'CE', 'NZ'],
260: ['CG1', 'CG2'],
264: ['CG', 'CD', 'CE', 'NZ'],
282: ['OG'],
325: ['O'],
383: ['OE1', 'NE2'],
385: ['NZ'],
386: ['CG', 'CD', 'OE1', 'NE2'],
422: ['NE', 'CZ', 'NH1', 'NH2'],
429: ['CG', 'CD', 'OE1', 'NE2'],
465: ['NE', 'CZ', 'NH1', 'NH2'],
469: ['CG', 'CD', 'OE1', 'NE2'],
471: ['CG', 'CD', 'CE', 'NZ'],
472: ['CG', 'CD', 'OE1', 'NE2'],
473: ['CG', 'CD1', 'CD2'],
520: ['CD', 'CE', 'NZ'],
526: ['CG', 'CD', 'OE1', 'OE2'],
544: ['CG', 'CD', 'OE1', 'OE2'],
562: ['CG', 'CD', 'OE1', 'OE2'],
563: ['CG', 'OD1', 'ND2'],
587: ['O']}
And we can know what are the molecules with missing heavy atoms:
residue_indices = missing_heavy_atoms.keys()
molecule_indices = msm.get(molecular_system, element='molecule', selection='group_index in @residue_indices', molecule_index=True)
msm.info(molecular_system, element='molecule', selection='molecule_index in @molecule_indices')
index | name | type | n atoms | n groups | n components | chain index | entity index | entity name |
---|---|---|---|---|---|---|---|---|
2 | BARNASE | protein | 839 | 108 | 1 | 2 | 0 | BARNASE |
3 | BARSTAR | protein | 693 | 87 | 2 | 3 | 1 | BARSTAR |
4 | BARSTAR | protein | 665 | 86 | 2 | 4 | 1 | BARSTAR |
5 | BARSTAR | protein | 699 | 89 | 1 | 5 | 1 | BARSTAR |
What can we do when there are missing heavy atoms in the system? As you can see in the section XXX, the function molsysmt.build.add_missing_heavy_atoms()
can help to solve this problem.
But are these flaws reported above the only ones in the system? Well, all structures solved by Xray diffraction have no hydrogen atoms; and some of them include atoms with alternate coordinates -and thereby appear multiple times in the system-:
msm.build.has_hydrogens(molecular_system)
False
msm.get(molecular_system, alternate_locations=True)
[{2686: {'location_id': array(['A', 'B'], dtype=object),
'occupancy': array([0.5, 0.5]),
'b_factor': <Quantity([0.2466 0.2467], 'nanometer ** 2')>,
'atom_id': [2687, 2688],
'coordinates': <Quantity([[3.2742 2.2579 0.1536]
[3.2757 2.2571 0.1533]], 'nanometer')>},
2687: {'location_id': array(['A', 'B'], dtype=object),
'occupancy': array([0.5, 0.5]),
'b_factor': <Quantity([0.2594 0.2596], 'nanometer ** 2')>,
'atom_id': [2689, 2690],
'coordinates': <Quantity([[3.1412 2.241 0.1076]
[3.3396 2.192 0.2619]], 'nanometer')>}}]
If in addition to how to get the missing heavy atoms, you want to know a bit more on how to add the missing hydrogens or how to solve the problem of having atoms with alternate locations, check the following sections: XXX and YYY.
See also
molsysmt.build.get_missing_heavy_atoms()
, molsysmt.basic.convert()
, molsysmt.basic.info()
,molsysmt.basic.get()
, molsysmt.basic.contains()
, molsysmt.build.get_atoms_with_alternate_locations()