p3d.protein module


The protein module holds the Protein class, which provides a set of functions to load Protein Data Bank (PDB) files into the structure, initialise all necessary associated modules and perform queries using the p3d.parser module.

Protein class

p3d.protein.Protein(<filename>,chains=None, MaxAtomsPerLeaf=24,DunbrackNaming=False,BSPTree=True)

returns an object with the initialised protein.

Optional keys are:

chains
usage: chains=['A','B','C'], will only load the protein atoms that belong to the chains in the specified list.
default: None, the complete protein will be read in.
MaxAtomsPerLeaf
usage: MaxAtomsPerLeaf=12, specifies the maximum number of atoms per leaf of the BSP tree. With this value the user can balance tree building time against query time. More atoms per leaf reduce the size of the tree and thus its building time, at the cost of slower BSP tree queries. Fewer atoms per leaf increase the building time, but each query will be faster.
default: 24
DunbrackNaming
usage: DunbrackNaming=True, will only load the protein atoms that are part of the chain specified in the filename.
That is, 2AXTA will only read in chain A of 2AXT. This naming convention is taken from the PISCES web server developed by Guoli Wang et al. [1] in the Dunbrack laboratory [2].
default: False, p3d will not check for a 5th letter in the PDB file name.
BSPTree
usage: BSPTree=True, will create the BSP tree for faster 3D queries.
default: True.
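
The trade-off controlled by MaxAtomsPerLeaf can be illustrated with a toy tree that is independent of p3d. The sketch below is not p3d's BSP tree implementation; `build_tree` and `count_leaves` are hypothetical helpers that split a list of random coordinates at the median along alternating axes until each leaf holds at most `max_per_leaf` points.

```python
import random

def build_tree(points, max_per_leaf, depth=0):
    """Recursively split points at the median along alternating axes.

    Hypothetical helper for illustration only; not p3d's BSP tree.
    """
    if len(points) <= max_per_leaf:
        return {"leaf": points}
    axis = depth % 3  # cycle through x, y, z
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "axis": axis,
        "split": points[mid][axis],
        "left": build_tree(points[:mid], max_per_leaf, depth + 1),
        "right": build_tree(points[mid:], max_per_leaf, depth + 1),
    }

def count_leaves(node):
    """Count the leaves below a node of the toy tree."""
    if "leaf" in node:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])

random.seed(0)
# 1000 made-up "atom" positions in a 50 x 50 x 50 box.
coords = [tuple(random.uniform(0, 50) for _ in range(3)) for _ in range(1000)]

leaves_24 = count_leaves(build_tree(coords, 24))
leaves_12 = count_leaves(build_tree(coords, 12))
print(leaves_24, leaves_12)
```

In this toy model, halving the leaf capacity yields more leaves and a deeper tree, so more work is spent building it, while each leaf that a query has to scan holds fewer points.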


Example <source lang="python">

Python 3.0.1 (r301:69556, Mar  1 2009, 17:09:34) 
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from p3d.protein import Protein
>>> pdb = Protein('../pdbs/1MZ4.pdb')

</source>

Attributes

filename

returns '../pdbs/1MZ4.pdb'

fullname

returns '1MZ4.pdb'

header

returns a list of the lines of the original PDB file header, e.g. the JRNL records

<source lang="python">

Python 2.6.1 (r261:67515, Apr 3 2009, 17:01:23)
[GCC 4.0.1 (Apple Inc. build 5490)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from p3d.protein import Protein
>>> pdb = Protein('1AE9.pdb')
>>> for line in pdb.header:
...  print line.strip()
...
HEADER DNA RECOMBINATION 06-MAR-97 1AE9
TITLE STRUCTURE OF THE LAMBDA INTEGRASE CATALYTIC CORE
COMPND MOL_ID: 1;
COMPND 2 MOLECULE: LAMBDA INTEGRASE;
COMPND 3 CHAIN: A, B;
COMPND 4 FRAGMENT: CATALYTIC DOMAIN;
COMPND 5 ENGINEERED: YES;
COMPND 6 MUTATION: YES
SOURCE MOL_ID: 1;
SOURCE 2 ORGANISM_SCIENTIFIC: BACTERIOPHAGE LAMBDA;
SOURCE 3 EXPRESSION_SYSTEM: ESCHERICHIA COLI
KEYWDS DNA RECOMBINATION, INTEGRASE, SITE-SPECIFIC RECOMBINATION
EXPDTA X-RAY DIFFRACTION
AUTHOR H.J.KWON,R.TIRUMALAI,A.LANDY,T.ELLENBERGER
REVDAT 1 19-NOV-97 1AE9 0
JRNL AUTH H.J.KWON,R.TIRUMALAI,A.LANDY,T.ELLENBERGER
JRNL TITL FLEXIBILITY IN DNA RECOMBINATION: STRUCTURE OF THE
JRNL TITL 2 LAMBDA INTEGRASE CATALYTIC CORE.
JRNL REF SCIENCE V. 276 126 1997
JRNL REFN ASTM SCIEAS US ISSN 0036-8075
REMARK 1

</source>

resolution

resolution of the crystal structure; set to 0.01 if the structure was determined by NMR

atoms

list of all atom objects

dunbrackChain

if the filename contains an additional letter specifying a chain in the Dunbrack/PISCES convention, p3d will only read this chain.
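
The naming convention can be sketched without p3d. The helper `split_dunbrack_name` below is hypothetical and only illustrates the idea; it assumes a four-character PDB id that is optionally followed by a single chain letter, and it makes no claim about how p3d itself parses the file name.

```python
import os

def split_dunbrack_name(filename):
    """Split a name such as '2AXTA.pdb' into ('2AXT', 'A').

    Hypothetical helper; assumes a 4-character PDB id optionally
    followed by one chain letter. The chain is None for plain ids.
    """
    # Drop the directory part and the file extension.
    stem = os.path.splitext(os.path.basename(filename))[0]
    if len(stem) == 5:
        return stem[:4], stem[4]
    return stem, None

print(split_dunbrack_name("../pdbs/2AXTA.pdb"))  # ('2AXT', 'A')
print(split_dunbrack_name("../pdbs/1MZ4.pdb"))   # ('1MZ4', None)
```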

hash

holds a dictionary with all the hashed lookup tables. Each hash dictionary has a subdictionary that holds a set of pointers that point to the actual atom objects.
Current hashes are:
  • chain
  • atype
  • resid
  • resname
  • non-aa-resname
  • aa-resname
  • model
  • bkb
  • oxygen
  • nitrogen
  • non-protein
  • alpha
  • protein

The p3d.parser translates queries into set algebra operations on these hashes. The same operations can be done manually if needed, e.g. <source lang="python">

>>> for atom in pdb.hash["resid"][20] & pdb.hash["oxygen"][""]:
...  print(atom.info())
... 
O   HOH     20
O   THR A   20
OG1 THR A   20

</source>
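
The idea behind the hashed lookup tables can be reproduced with plain Python sets. The following is a minimal sketch under stated assumptions: the atom records are made-up tuples rather than p3d atom objects, `resid_hash` and `oxygen_hash` are hypothetical names, and treating every atom name that starts with "O" as an oxygen is a simplification.

```python
from collections import defaultdict

# Toy atom records: (atom name, residue name, chain, residue id).
atoms = [
    ("N", "THR", "A", 20),
    ("O", "THR", "A", 20),
    ("OG1", "THR", "A", 20),
    ("O", "GLY", "A", 21),
    ("O", "HOH", "", 20),
]

# Build the lookup tables once; each entry is a set of atom records.
resid_hash = defaultdict(set)
oxygen_hash = set()
for atom in atoms:
    resid_hash[atom[3]].add(atom)
    if atom[0].startswith("O"):  # simplification, see lead-in
        oxygen_hash.add(atom)

# "Oxygen atoms of residue 20" becomes a single set intersection.
oxygens_in_20 = resid_hash[20] & oxygen_hash
for atom in sorted(oxygens_in_20):
    print(atom)
```

Because every table maps to a set, combining criteria costs one intersection, union or difference instead of a scan over all atoms.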

Methods

info()

returns a list of strings with information about the protein, the hashes and the BSP Tree

<source lang="python">

>>> for line in pdb.info():
...  print(line)
... 

</source> A sample of the output can be found here.

query(query-string)

returns a set of atoms that matches the query string.
The syntax of the query string can be found under p3d.parser.

lookUpAtom(query-string)

returns a single atom object if the query string matches exactly one atom; otherwise an error message is printed.
The syntax of the query string can be found under p3d.parser.

collectSphereAtoms(centre=Atom|Vector, radius=float)

returns a set of atoms within a radius of a given vector or atom.
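
What a sphere query computes can be shown with a brute-force sketch that checks every point. The function `collect_sphere` is hypothetical and works on plain coordinate tuples; it does not use a BSP tree, and counting points that lie exactly on the sphere surface as inside is an assumption, not documented p3d behaviour.

```python
import math

def collect_sphere(points, centre, radius):
    """Return the set of points whose distance to centre is <= radius.

    Hypothetical brute-force helper; checks every point in turn.
    """
    return {p for p in points if math.dist(p, centre) <= radius}

# Made-up coordinates for illustration.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 3.0, 4.0), (10.0, 10.0, 10.0)]

nearby = collect_sphere(points, centre=(0.0, 0.0, 0.0), radius=4.0)
print(sorted(nearby))
```

A brute-force scan like this visits all points for every query; a spatial tree exists to avoid exactly that cost when many queries are run against the same structure.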

writeToFile(filename,includeOrgHeader=False)

writes the protein object in PDB format to a new file.
If includeOrgHeader is set to True, the original PDB header will be included.

firstResidueOfChain(chain,idOnly=False)

returns a set of atoms or the residue id of the first residue in the given chain.

lastResidueOfChain(chain,idOnly=False)

returns a set of atoms or the residue id of the last residue in the given chain.
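
The two return modes selected by idOnly can be sketched on toy data. The helper `first_residue` below is hypothetical and operates on made-up tuples; it assumes that the "first" residue is the one with the lowest residue id in the chain, which may differ from how p3d determines it.

```python
def first_residue(atoms, chain, id_only=False):
    """Return the first residue of a chain from toy atom records.

    Records are (atom name, residue name, chain, residue id).
    Hypothetical helper; "first" means lowest residue id here.
    """
    resids = [a[3] for a in atoms if a[2] == chain]
    if not resids:
        raise ValueError("no atoms in chain %r" % chain)
    first = min(resids)
    if id_only:
        return first
    # Otherwise return every atom that belongs to that residue.
    return {a for a in atoms if a[2] == chain and a[3] == first}

atoms = [
    ("N", "MET", "A", 1),
    ("CA", "MET", "A", 1),
    ("N", "THR", "A", 2),
    ("N", "GLY", "B", 5),
]

print(first_residue(atoms, "A", id_only=True))
print(sorted(first_residue(atoms, "A")))
```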