Discussion:
[Biopython-dev] [Bug 3096] New: PPBuilder build_peptides bugs
bugzilla-daemon
2010-06-08 22:52:28 UTC
Permalink
http://bugzilla.open-bio.org/show_bug.cgi?id=3096

Summary: PPBuilder build_peptides bugs
Product: Biopython
Version: Not Applicable
Platform: Other
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: skong at zymeworks.com


Given a chain of backbone-connected residues 'IXRGXTGL' that contains two
non-standard amino acids 'X', building peptides with a builder that accepts
only standard amino acids should return two peptides, 'RG' and 'TGL'. 'I'
should not be returned as a peptide since it is a single residue. Currently
Biopython returns 'IXGXGL', which shows two bugs:

1. A standard amino acid (R and T here) is skipped after each X, while X itself
is kept (X should be skipped instead, not R or T). Related to
http://bugzilla.open-bio.org/show_bug.cgi?id=2910 and
http://lists.open-bio.org/pipermail/biopython/2009-September/005532.html
2. A single peptide is returned even though, after filtering, the two X
residues that connect 'I', 'RG' and 'TGL' are no longer present; the fragment
'IRGTGL' cannot be considered a valid peptide without the two Xs connecting
its parts.

The sequence 'IXRGXTGL' above is taken from 1bfe and mutated. The 'mutation'
referred to here is simply renaming a residue to a name that is not standard,
so that it is represented as 'X'.
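
As a rough illustration of the renaming described above, the following Python 3 sketch rewrites the residue-name field of ATOM records. It assumes the standard fixed-column PDB layout (residue name in columns 18-20, chain ID in column 22, residue number in columns 23-26). The helpers `atom_line` and `rename_residue` are hypothetical names, not part of Biopython, and the report does not say how the files were actually edited.

```python
def atom_line(serial, name, resname, chain, resseq, x, y, z):
    """Format a minimal fixed-column PDB ATOM record (illustration only)."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, name, resname, chain, resseq, x, y, z)


def rename_residue(pdb_lines, resseq, new_name, chain="A"):
    """Return a copy of pdb_lines with one residue renamed (hypothetical helper)."""
    if len(new_name) > 3:
        raise ValueError("PDB residue names are at most 3 characters")
    out = []
    for line in pdb_lines:
        is_atom = line.startswith(("ATOM", "HETATM"))
        # Slice positions are 0-based: [17:20] is the residue name,
        # [21] the chain ID and [22:26] the residue sequence number.
        if is_atom and line[21] == chain and int(line[22:26]) == resseq:
            line = line[:17] + new_name.rjust(3) + line[20:]
        out.append(line)
    return out


original = [
    atom_line(93, " N  ", "HIS", "A", 317, 38.425, 72.679, 27.969),
    atom_line(103, " N  ", "ARG", "A", 318, 37.527, 72.109, 24.905),
]
mutated = rename_residue(original, 317, "HIE")
print(mutated[0][17:20])  # HIE
print(mutated[1][17:20])  # ARG
```

Only the name field changes; coordinates and backbone atoms are untouched, which is why the renamed residues are still backbone-connected.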

Each solution proposed below is meant to fix the respective bug above:
1. Extend the aa_only check at line 299 of Bio/PDB/Polypeptide.py to
(not accept(prev) or not accept(next)).
2. Set pp=None at line 300 of Bio/PDB/Polypeptide.py when either of the
residues being compared is filtered out.


Amino acid filtering bug in method build_peptides() of class _PPBuilder in
Bio/PDB/Polypeptide.py:

Original:
for chain in chain_list:
    chain_it=iter(chain)
    prev=chain_it.next()
    pp=None
    for next in chain_it:
        if aa_only and not accept(prev):
            prev=next
            continue
        if is_connected(prev, next):
            if pp is None:
                pp=Polypeptide()
                pp.append(prev)
                pp_list.append(pp)
            pp.append(next)
        else:
            pp=None
        prev=next
return pp_list


Fixed:

for chain in chain_list:
    chain_it=iter(chain)
    prev=chain_it.next()
    pp=None
    for next in chain_it:
        if aa_only and (not accept(prev) or not accept(next)):
            prev=next; pp=None
            continue
        if is_connected(prev, next):
            if pp is None:
                pp=Polypeptide()
                pp.append(prev)
                pp_list.append(pp)
            pp.append(next)
        else:
            pp=None
        prev=next
return pp_list
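
The difference between the two loops quoted above can be traced without Biopython. The following is a minimal Python 3 sketch, not the library code: residues are single characters, a peptide is a plain list, `accept` rejects 'X', every neighbouring pair counts as connected, and aa_only is taken to be on. The function names `build_original` and `build_fixed` are made up for this illustration.

```python
def build_original(chain, accept, is_connected):
    """Toy version of the loop quoted under 'Original'."""
    pp_list = []
    chain_it = iter(chain)
    prev = next(chain_it)
    pp = None
    for nxt in chain_it:
        if not accept(prev):
            # Only prev is checked: nxt is never appended on this pass,
            # and the open peptide is not closed.
            prev = nxt
            continue
        if is_connected(prev, nxt):
            if pp is None:
                pp = [prev]
                pp_list.append(pp)
            pp.append(nxt)
        else:
            pp = None
        prev = nxt
    return ["".join(pp) for pp in pp_list]


def build_fixed(chain, accept, is_connected):
    """Toy version of the loop quoted under 'Fixed'."""
    pp_list = []
    chain_it = iter(chain)
    prev = next(chain_it)
    pp = None
    for nxt in chain_it:
        if not accept(prev) or not accept(nxt):
            # Either residue being rejected ends the current peptide.
            prev = nxt
            pp = None
            continue
        if is_connected(prev, nxt):
            if pp is None:
                pp = [prev]
                pp_list.append(pp)
            pp.append(nxt)
        else:
            pp = None
        prev = nxt
    return ["".join(pp) for pp in pp_list]


def accept(res):
    return res != "X"          # assumption: 'X' marks a rejected residue


def connected(a, b):
    return True                # assumption: whole chain is backbone-connected


print(build_original("IXRGXTGL", accept, connected))  # ['IXGXGL']
print(build_fixed("IXRGXTGL", accept, connected))     # ['RG', 'TGL']
```

Under these assumptions the toy loops reproduce the reported and the expected output respectively.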

Attached here is the code used to test the above case, with and without
mutations, and with and without standard amino acid filtering. The case without
mutation is just to show that the backbone atoms of the mutated version are
connected:

from Bio.PDB.PDBParser import PDBParser
from Bio.PDB.Polypeptide import PPBuilder, is_aa

class StandardAABuilder(PPBuilder):
    """Polypeptide builder which accepts only standard amino acids."""
    def _accept(self, residue):
        return is_aa(residue, standard=True)

def extract_peptides(model):
    """Extracts the peptides from a model using the default PPBuilder.
    Returns a list of peptide sequences as strings."""
    output = []
    for peptide in PPBuilder().build_peptides(model):
        seq = str(peptide.get_sequence())
        output.append(seq)
    return output

def extract_peptides_saa(model):
    """Extracts the peptides from a model using StandardAABuilder.
    Returns a list of peptide sequences as strings."""
    output = []
    for peptide in StandardAABuilder().build_peptides(model):
        seq = str(peptide.get_sequence())
        output.append(seq)
    return output

if __name__ == '__main__':

    # Unmodified fragment, with and without standard amino acid filtering
    oripdb = open('chopped_pdb1bfe.ent')
    sto = PDBParser().get_structure('', oripdb)
    seqao = extract_peptides(sto)
    seqbo = extract_peptides_saa(sto)
    print 'ori seq all '
    print seqao
    print 'ori seq standard only'
    print seqbo

    # Mutated fragment, with and without standard amino acid filtering
    pdb = open('chopped_mutated_pdb1bfe.ent')
    st = PDBParser().get_structure('', pdb)
    seqa = extract_peptides(st)
    seqb = extract_peptides_saa(st)
    print 'mut seq all'
    print seqa
    print 'mut seq standard only '
    print seqb


Attached below are the two fragments of PDB files, pre and post mutated.

chopped_pdb1bfe.ent
ATOM 85 N ILE A 316 37.386 71.217 31.070 1.00 36.97 N
ATOM 86 CA ILE A 316 38.311 71.290 29.949 1.00 33.71 C
ATOM 87 C ILE A 316 37.634 72.103 28.862 1.00 33.93 C
ATOM 88 O ILE A 316 36.415 72.216 28.839 1.00 36.46 O
ATOM 89 CB ILE A 316 38.651 69.876 29.404 1.00 35.79 C
ATOM 90 CG1 ILE A 316 39.331 69.049 30.501 1.00 36.78 C
ATOM 91 CG2 ILE A 316 39.572 69.979 28.187 1.00 37.71 C
ATOM 92 CD1 ILE A 316 39.881 67.724 30.023 1.00 39.20 C
ATOM 93 N HIS A 317 38.425 72.679 27.969 1.00 35.61 N
ATOM 94 CA HIS A 317 37.880 73.473 26.881 1.00 37.92 C
ATOM 95 C HIS A 317 38.360 72.928 25.540 1.00 37.79 C
ATOM 96 O HIS A 317 39.463 73.240 25.094 1.00 37.44 O
ATOM 97 CB HIS A 317 38.303 74.930 27.052 1.00 35.19 C
ATOM 98 CG HIS A 317 37.888 75.519 28.363 1.00 35.76 C
ATOM 99 ND1 HIS A 317 36.611 75.981 28.602 1.00 37.74 N
ATOM 100 CD2 HIS A 317 38.575 75.701 29.516 1.00 37.59 C
ATOM 101 CE1 HIS A 317 36.529 76.420 29.844 1.00 38.74 C
ATOM 102 NE2 HIS A 317 37.706 76.262 30.421 1.00 36.76 N
ATOM 103 N ARG A 318 37.527 72.109 24.905 1.00 38.78 N
ATOM 104 CA ARG A 318 37.884 71.512 23.627 1.00 42.04 C
ATOM 105 C ARG A 318 38.469 72.559 22.699 1.00 45.14 C
ATOM 106 O ARG A 318 39.592 72.425 22.205 1.00 42.05 O
ATOM 107 CB ARG A 318 36.657 70.880 22.967 1.00 42.93 C
ATOM 108 CG ARG A 318 36.934 70.321 21.576 1.00 38.60 C
ATOM 109 CD ARG A 318 35.654 70.038 20.821 1.00 35.39 C
ATOM 110 NE ARG A 318 34.624 69.538 21.724 1.00 34.96 N
ATOM 111 CZ ARG A 318 34.539 68.278 22.141 1.00 31.51 C
ATOM 112 NH1 ARG A 318 35.419 67.373 21.736 1.00 25.19 N
ATOM 113 NH2 ARG A 318 33.579 67.929 22.983 1.00 29.10 N
ATOM 114 N GLY A 319 37.690 73.604 22.461 1.00 49.96 N
ATOM 115 CA GLY A 319 38.138 74.668 21.592 1.00 55.53 C
ATOM 116 C GLY A 319 38.459 74.219 20.180 1.00 58.85 C
ATOM 117 O GLY A 319 37.583 73.766 19.440 1.00 58.98 O
ATOM 118 N SER A 320 39.734 74.334 19.823 1.00 61.64 N
ATOM 119 CA SER A 320 40.219 73.992 18.493 1.00 63.16 C
ATOM 120 C SER A 320 40.212 72.517 18.110 1.00 65.27 C
ATOM 121 O SER A 320 39.558 72.127 17.145 1.00 65.12 O
ATOM 122 CB SER A 320 41.634 74.542 18.316 1.00 65.36 C
ATOM 123 OG SER A 320 42.124 74.255 17.019 1.00 72.05 O
ATOM 124 N THR A 321 40.955 71.702 18.853 1.00 67.43 N
ATOM 125 CA THR A 321 41.049 70.274 18.562 1.00 67.73 C
ATOM 126 C THR A 321 40.220 69.430 19.529 1.00 66.41 C
ATOM 127 O THR A 321 39.244 69.917 20.095 1.00 70.21 O
ATOM 128 CB THR A 321 42.517 69.810 18.620 1.00 70.22 C
ATOM 129 OG1 THR A 321 42.613 68.453 18.169 1.00 77.03 O
ATOM 130 CG2 THR A 321 43.049 69.915 20.045 1.00 72.07 C
ATOM 131 N GLY A 322 40.608 68.168 19.707 1.00 61.22 N
ATOM 132 CA GLY A 322 39.892 67.286 20.614 1.00 53.23 C
ATOM 133 C GLY A 322 40.037 67.705 22.065 1.00 48.00 C
ATOM 134 O GLY A 322 40.138 68.892 22.372 1.00 50.41 O
ATOM 135 N LEU A 323 40.044 66.734 22.968 1.00 41.92 N
ATOM 136 CA LEU A 323 40.190 67.033 24.385 1.00 35.58 C
ATOM 137 C LEU A 323 41.613 66.738 24.874 1.00 31.41 C
ATOM 138 O LEU A 323 41.932 66.921 26.046 1.00 30.47 O
ATOM 139 CB LEU A 323 39.160 66.240 25.191 1.00 35.76 C
ATOM 140 CG LEU A 323 37.716 66.576 24.802 1.00 39.50 C
ATOM 141 CD1 LEU A 323 36.733 65.796 25.670 1.00 38.15 C
ATOM 142 CD2 LEU A 323 37.493 68.074 24.955 1.00 38.58 C

PDB FILE: mutated_chopped_pdb1bfe.ent
ATOM 85 N ILE A 316 37.386 71.217 31.070 1.00 36.97 N
ATOM 86 CA ILE A 316 38.311 71.290 29.949 1.00 33.71 C
ATOM 87 C ILE A 316 37.634 72.103 28.862 1.00 33.93 C
ATOM 88 O ILE A 316 36.415 72.216 28.839 1.00 36.46 O
ATOM 89 CB ILE A 316 38.651 69.876 29.404 1.00 35.79 C
ATOM 90 CG1 ILE A 316 39.331 69.049 30.501 1.00 36.78 C
ATOM 91 CG2 ILE A 316 39.572 69.979 28.187 1.00 37.71 C
ATOM 92 CD1 ILE A 316 39.881 67.724 30.023 1.00 39.20 C
ATOM 93 N HIE A 317 38.425 72.679 27.969 1.00 35.61 N
ATOM 94 CA HIE A 317 37.880 73.473 26.881 1.00 37.92 C
ATOM 95 C HIE A 317 38.360 72.928 25.540 1.00 37.79 C
ATOM 96 O HIE A 317 39.463 73.240 25.094 1.00 37.44 O
ATOM 97 CB HIE A 317 38.303 74.930 27.052 1.00 35.19 C
ATOM 98 CG HIE A 317 37.888 75.519 28.363 1.00 35.76 C
ATOM 99 ND1 HIE A 317 36.611 75.981 28.602 1.00 37.74 N
ATOM 100 CD2 HIE A 317 38.575 75.701 29.516 1.00 37.59 C
ATOM 101 CE1 HIE A 317 36.529 76.420 29.844 1.00 38.74 C
ATOM 102 NE2 HIE A 317 37.706 76.262 30.421 1.00 36.76 N
ATOM 103 N ARG A 318 37.527 72.109 24.905 1.00 38.78 N
ATOM 104 CA ARG A 318 37.884 71.512 23.627 1.00 42.04 C
ATOM 105 C ARG A 318 38.469 72.559 22.699 1.00 45.14 C
ATOM 106 O ARG A 318 39.592 72.425 22.205 1.00 42.05 O
ATOM 107 CB ARG A 318 36.657 70.880 22.967 1.00 42.93 C
ATOM 108 CG ARG A 318 36.934 70.321 21.576 1.00 38.60 C
ATOM 109 CD ARG A 318 35.654 70.038 20.821 1.00 35.39 C
ATOM 110 NE ARG A 318 34.624 69.538 21.724 1.00 34.96 N
ATOM 111 CZ ARG A 318 34.539 68.278 22.141 1.00 31.51 C
ATOM 112 NH1 ARG A 318 35.419 67.373 21.736 1.00 25.19 N
ATOM 113 NH2 ARG A 318 33.579 67.929 22.983 1.00 29.10 N
ATOM 114 N GLY A 319 37.690 73.604 22.461 1.00 49.96 N
ATOM 115 CA GLY A 319 38.138 74.668 21.592 1.00 55.53 C
ATOM 116 C GLY A 319 38.459 74.219 20.180 1.00 58.85 C
ATOM 117 O GLY A 319 37.583 73.766 19.440 1.00 58.98 O
ATOM 118 N XQQ A 320 39.734 74.334 19.823 1.00 61.64 N
ATOM 119 CA XQQ A 320 40.219 73.992 18.493 1.00 63.16 C
ATOM 120 C XQQ A 320 40.212 72.517 18.110 1.00 65.27 C
ATOM 121 O XQQ A 320 39.558 72.127 17.145 1.00 65.12 O
ATOM 122 CB XQQ A 320 41.634 74.542 18.316 1.00 65.36 C
ATOM 123 OG XQQ A 320 42.124 74.255 17.019 1.00 72.05 O
ATOM 124 N THR A 321 40.955 71.702 18.853 1.00 67.43 N
ATOM 125 CA THR A 321 41.049 70.274 18.562 1.00 67.73 C
ATOM 126 C THR A 321 40.220 69.430 19.529 1.00 66.41 C
ATOM 127 O THR A 321 39.244 69.917 20.095 1.00 70.21 O
ATOM 128 CB THR A 321 42.517 69.810 18.620 1.00 70.22 C
ATOM 129 OG1 THR A 321 42.613 68.453 18.169 1.00 77.03 O
ATOM 130 CG2 THR A 321 43.049 69.915 20.045 1.00 72.07 C
ATOM 131 N GLY A 322 40.608 68.168 19.707 1.00 61.22 N
ATOM 132 CA GLY A 322 39.892 67.286 20.614 1.00 53.23 C
ATOM 133 C GLY A 322 40.037 67.705 22.065 1.00 48.00 C
ATOM 134 O GLY A 322 40.138 68.892 22.372 1.00 50.41 O
ATOM 135 N LEU A 323 40.044 66.734 22.968 1.00 41.92 N
ATOM 136 CA LEU A 323 40.190 67.033 24.385 1.00 35.58 C
ATOM 137 C LEU A 323 41.613 66.738 24.874 1.00 31.41 C
ATOM 138 O LEU A 323 41.932 66.921 26.046 1.00 30.47 O
ATOM 139 CB LEU A 323 39.160 66.240 25.191 1.00 35.76 C
ATOM 140 CG LEU A 323 37.716 66.576 24.802 1.00 39.50 C
ATOM 141 CD1 LEU A 323 36.733 65.796 25.670 1.00 38.15 C
ATOM 142 CD2 LEU A 323 37.493 68.074 24.955 1.00 38.58 C
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
bugzilla-daemon
2010-06-09 08:43:02 UTC
Permalink
http://bugzilla.open-bio.org/show_bug.cgi?id=3096





------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-06-09 04:43 EST -------
(In reply to comment #0)
> Given a chain of backbone connected residues 'IXRGXTGL' that contains two
> non-standard amino acids 'X' in between, building peptide with only standard
> amino acid builder should return two peptides 'RG' and 'TGL'. 'I' should not
> be returned as a peptide since it is just one residue. Currently biopython

What is wrong with returning 'IXGXGL'? The PDB contains a peptide of six
linked residues doesn't it? It looks like Bio.PDB is doing something sensible.

P.S. You didn't fill in which version of Biopython you are using.
bugzilla-daemon
2010-08-13 17:52:49 UTC
Permalink
http://bugzilla.open-bio.org/show_bug.cgi?id=3096





------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-13 13:52 EST -------
Hi Siong,

I've been going over your example again (and adding some doctests to
Bio/PDB/Polypeptide.py as well).

It seems to me that in order to show this "bug" you have had to override the
builder class' private _accept() method. If in doing so you break the default
build_peptides() method, then you should probably also override that too.

Can you show a problem without subclassing the builder object?

There may be scope for enhancement, but you haven't convinced me there is a
bug here.

Peter
bugzilla-daemon
2010-08-13 22:23:24 UTC
Permalink
http://bugzilla.open-bio.org/show_bug.cgi?id=3096


skong at zymeworks.com changed:

What |Removed |Added
----------------------------------------------------------------------------
Version|Not Applicable |1.53




------- Comment #3 from skong at zymeworks.com 2010-08-13 18:23 EST -------
Hi Peter,

I managed to reproduce the problem without modifying _accept().

DIAGNOSTIC SCRIPT:
from Bio.PDB.PDBParser import PDBParser
from Bio.PDB.Polypeptide import PPBuilder, is_aa

def extract_peptides(model):
    """Extracts the peptides from a model using the default PPBuilder.
    Returns a list of peptide sequences as strings."""
    output = []
    for peptide in PPBuilder().build_peptides(model):
        seq = str(peptide.get_sequence())
        output.append(seq)
    return output

if __name__ == '__main__':

    # Fragment in which residue 319 has no CA atom
    pdb = open('chopped_pdb1bfe_noca.ent')
    st = PDBParser().get_structure('', pdb)
    seqa = extract_peptides(st)
    print 'no ca seq all'
    print seqa


PDB FILE: chopped_pdb1bfe_noca.ent
ATOM 85 N ILE A 316 37.386 71.217 31.070 1.00 36.97 N
ATOM 86 CA ILE A 316 38.311 71.290 29.949 1.00 33.71 C
ATOM 87 C ILE A 316 37.634 72.103 28.862 1.00 33.93 C
ATOM 88 O ILE A 316 36.415 72.216 28.839 1.00 36.46 O
ATOM 89 CB ILE A 316 38.651 69.876 29.404 1.00 35.79 C
ATOM 90 CG1 ILE A 316 39.331 69.049 30.501 1.00 36.78 C
ATOM 91 CG2 ILE A 316 39.572 69.979 28.187 1.00 37.71 C
ATOM 92 CD1 ILE A 316 39.881 67.724 30.023 1.00 39.20 C
ATOM 93 N HIS A 317 38.425 72.679 27.969 1.00 35.61 N
ATOM 94 CA HIS A 317 37.880 73.473 26.881 1.00 37.92 C
ATOM 95 C HIS A 317 38.360 72.928 25.540 1.00 37.79 C
ATOM 96 O HIS A 317 39.463 73.240 25.094 1.00 37.44 O
ATOM 97 CB HIS A 317 38.303 74.930 27.052 1.00 35.19 C
ATOM 98 CG HIS A 317 37.888 75.519 28.363 1.00 35.76 C
ATOM 99 ND1 HIS A 317 36.611 75.981 28.602 1.00 37.74 N
ATOM 100 CD2 HIS A 317 38.575 75.701 29.516 1.00 37.59 C
ATOM 101 CE1 HIS A 317 36.529 76.420 29.844 1.00 38.74 C
ATOM 102 NE2 HIS A 317 37.706 76.262 30.421 1.00 36.76 N
ATOM 103 N ARG A 318 37.527 72.109 24.905 1.00 38.78 N
ATOM 104 CA ARG A 318 37.884 71.512 23.627 1.00 42.04 C
ATOM 105 C ARG A 318 38.469 72.559 22.699 1.00 45.14 C
ATOM 106 O ARG A 318 39.592 72.425 22.205 1.00 42.05 O
ATOM 107 CB ARG A 318 36.657 70.880 22.967 1.00 42.93 C
ATOM 108 CG ARG A 318 36.934 70.321 21.576 1.00 38.60 C
ATOM 109 CD ARG A 318 35.654 70.038 20.821 1.00 35.39 C
ATOM 110 NE ARG A 318 34.624 69.538 21.724 1.00 34.96 N
ATOM 111 CZ ARG A 318 34.539 68.278 22.141 1.00 31.51 C
ATOM 112 NH1 ARG A 318 35.419 67.373 21.736 1.00 25.19 N
ATOM 113 NH2 ARG A 318 33.579 67.929 22.983 1.00 29.10 N
ATOM 114 N XLY A 319 37.690 73.604 22.461 1.00 49.96 N
ATOM 115 CX XLY A 319 38.138 74.668 21.592 1.00 55.53 C
ATOM 116 C XLY A 319 38.459 74.219 20.180 1.00 58.85 C
ATOM 117 O XLY A 319 37.583 73.766 19.440 1.00 58.98 O
ATOM 118 N SER A 320 39.734 74.334 19.823 1.00 61.64 N
ATOM 119 CA SER A 320 40.219 73.992 18.493 1.00 63.16 C
ATOM 120 C SER A 320 40.212 72.517 18.110 1.00 65.27 C
ATOM 121 O SER A 320 39.558 72.127 17.145 1.00 65.12 O
ATOM 122 CB SER A 320 41.634 74.542 18.316 1.00 65.36 C
ATOM 123 OG SER A 320 42.124 74.255 17.019 1.00 72.05 O
ATOM 124 N THR A 321 40.955 71.702 18.853 1.00 67.43 N
ATOM 125 CA THR A 321 41.049 70.274 18.562 1.00 67.73 C
ATOM 126 C THR A 321 40.220 69.430 19.529 1.00 66.41 C
ATOM 127 O THR A 321 39.244 69.917 20.095 1.00 70.21 O
ATOM 128 CB THR A 321 42.517 69.810 18.620 1.00 70.22 C
ATOM 129 OG1 THR A 321 42.613 68.453 18.169 1.00 77.03 O
ATOM 130 CG2 THR A 321 43.049 69.915 20.045 1.00 72.07 C
ATOM 131 N GLY A 322 40.608 68.168 19.707 1.00 61.22 N
ATOM 132 CA GLY A 322 39.892 67.286 20.614 1.00 53.23 C
ATOM 133 C GLY A 322 40.037 67.705 22.065 1.00 48.00 C
ATOM 134 O GLY A 322 40.138 68.892 22.372 1.00 50.41 O
ATOM 135 N LEU A 323 40.044 66.734 22.968 1.00 41.92 N
ATOM 136 CA LEU A 323 40.190 67.033 24.385 1.00 35.58 C
ATOM 137 C LEU A 323 41.613 66.738 24.874 1.00 31.41 C
ATOM 138 O LEU A 323 41.932 66.921 26.046 1.00 30.47 O
ATOM 139 CB LEU A 323 39.160 66.240 25.191 1.00 35.76 C
ATOM 140 CG LEU A 323 37.716 66.576 24.802 1.00 39.50 C
ATOM 141 CD1 LEU A 323 36.733 65.796 25.670 1.00 38.15 C
ATOM 142 CD2 LEU A 323 37.493 68.074 24.955 1.00 38.58 C



The output peptides should be ['IHR', 'STGL'], not ['IHRXTGL'] as in the
current version. Residue XLY A 319 (the X in the fourth position) should not be
included since it doesn't have a CA atom. Instead, the current version includes
it and removes the 'S' next to it, due to the same bug. One can get the right
result using the patch provided before.

Whether or not _accept is modified, the bug remains. Also, the user should
not be expected to modify the build_peptides() method whenever PPBuilder's
_accept is modified, since the accept variable in build_peptides isn't really a
local (private) variable: in line 277 this variable is bound to
self._accept of PPBuilder.

http://www.biopython.org/DIST/docs/api/Bio.PDB.Polypeptide-pysrc.html
277 accept=self._accept
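
The point about `accept=self._accept` can be illustrated with a small Python 3 sketch. The classes `Builder` and `StandardOnly` below are toy stand-ins invented for this example, not Biopython's classes: the local name is looked up on the instance, so a subclass that overrides `_accept` changes what the inherited build method filters.

```python
class Builder(object):
    """Toy stand-in for a peptide builder (hypothetical, not Biopython)."""

    def _accept(self, residue):
        return True  # toy default: accept every residue

    def build(self, residues):
        # Local alias; self._accept is resolved on the instance, so an
        # override in a subclass is what actually gets called here.
        accept = self._accept
        return [r for r in residues if accept(r)]


class StandardOnly(Builder):
    """Subclass that overrides only the filter, not build()."""

    def _accept(self, residue):
        return residue != "X"


print(Builder().build("IXRG"))       # ['I', 'X', 'R', 'G']
print(StandardOnly().build("IXRG"))  # ['I', 'R', 'G']
```

In this sketch the alias is local to `build`, yet its behaviour still depends on the subclass, which appears to be the coupling the comment above describes.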


On a side note, the "aa_only" optional argument of build_peptides() and
its comment are very misleading (@param aa_only: if 1, the residue needs to be
a standard AA). "aa_only" is a flag that tells the peptide builder to filter
out residues that are not accepted. It is turned on by default, and without
modifying _accept of PPBuilder only residues with a "CA" atom are accepted
(lines 250-264), not only standard amino acids as the comment states. In other
words, without modifying _accept in PPBuilder, non-standard amino acids will
still be accepted and included in the peptides built. Only when overriding the
_accept method of PPBuilder (as I did before) would build_peptides() exclude
non-standard amino acids. I suggest renaming "aa_only" to something more
sensible like "filter_aa".

http://www.biopython.org/DIST/docs/api/Bio.PDB.Polypeptide-pysrc.html
266 - def build_peptides(self, entity, aa_only=1):
273 @param aa_only: if 1, the residue needs to be a standard AA
274 @type aa_only: int
bugzilla-daemon
2010-08-26 13:13:21 UTC
Permalink
http://bugzilla.open-bio.org/show_bug.cgi?id=3096





------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-26 09:13 EST -------
(In reply to comment #3)
> Hi Peter,
> I managed to reproduce the problem without modifying _accept().

Excellent - that should help.

> The output peptides should be: ['IHR', 'STGL'] not ['IHRXTGL'] in the current
> version...

I agree that ['IHRXTGL'] is definitely wrong (you have convinced me this
is a real bug).

Chain A has residues: ILE, HIS, ARG, XLY, SER, THR, GLY, LEU. Sensible
results are therefore ['IHRXSTGL'] if we include XLY as a modified amino
acid, or ['IHR', 'STGL'] if we exclude XLY (which we probably should).

Was XLY just an artificial example for this bug report? Looking at the
original PDB file for 1BFE, it is a modified GLY where you have switched
CA (alpha carbon) to the non-standard CX.

> Residue XLY A 319 or X in the fourth position should not be included
> since it doesn't have a CA atom. Instead the current version includes it and
> removes the 'S' next to it, due to the same bug. One can get the right version
> using the patch provided before.
> Whether the _accept is modified or not the bug remains. Also the user should
> not be expected to also modify build_peptides() method whenever PPBuilder
> _accept is modified since the accept variable in build_peptides isn't really a
> local (private) variable: In line 277 this variable accept is referenced from
> self.accept of PPBuilder.
> http://www.biopython.org/DIST/docs/api/Bio.PDB.Polypeptide-pysrc.html
> 277 accept=self._accept

I'm assuming you mean the line "accept=self._accept" in the build_peptides
method of the _PPBuilder class in Bio/PDB/Polypeptide.py (the line numbers
have changed). If so, all that does is define a local variable within the
scope of that method - it does not expose the method in any way. I don't
understand what you mean here.
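
A minimal sketch of the behaviour being discussed, using hypothetical classes
Builder and StrictBuilder (an assumption for illustration, not Biopython code):
the name "accept" is local to the method, but the bound method it refers to is
looked up on the instance, so a subclass override of _accept() is what gets
called. The sketch only shows method lookup; it drops rejected residues and
does not split peptides.

```python
class Builder(object):
    """Hypothetical stand-in for a builder class; not the Biopython code."""

    def _accept(self, residue):
        # Base behaviour: accept every residue.
        return True

    def build(self, residues):
        # 'accept' is a local name, but it is bound to whatever _accept
        # the instance's class provides, including a subclass override.
        accept = self._accept
        return [r for r in residues if accept(r)]


class StrictBuilder(Builder):
    """Hypothetical subclass that rejects the placeholder residue 'X'."""

    def _accept(self, residue):
        return residue != "X"


residues = list("IHRXSTGL")
base_result = "".join(Builder().build(residues))
strict_result = "".join(StrictBuilder().build(residues))
print(base_result)    # IHRXSTGL
print(strict_result)  # IHRSTGL
```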

Peter
bugzilla-daemon
2010-08-26 16:30:00 UTC
http://bugzilla.open-bio.org/show_bug.cgi?id=3096





------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-26 12:30 EST -------
Hi Siong,

Can you test this branch? I've made a change based on your suggestion:

http://github.com/peterjc/biopython/tree/bug3096

Currently there is just this one commit:

http://github.com/peterjc/biopython/commit/d65d2f4dfbedffa2847db0a37984c354586b4cb8

If you don't have git installed, or are not familiar with it, you can just
download the modified file Bio/PDB/Polypeptide.py from here:

http://github.com/peterjc/biopython/raw/d65d2f4dfbedffa2847db0a37984c354586b4cb8/Bio/PDB/Polypeptide.py

Thanks,

Peter
bugzilla-daemon
2010-08-13 17:52:49 UTC
http://bugzilla.open-bio.org/show_bug.cgi?id=3096





------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2010-08-13 13:52 EST -------
Hi Siong,

I've been going over your example again (and adding some doctests to
Bio/PDB/Polypeptide.py as well).

It seems to me that in order to show this "bug" you have had to override the
builder class' private _accept() method. If in doing so you break the default
build_peptides() method, then you should probably also override that too.

Can you show a problem without subclassing the builder object?

There may be scope for enhancement, but you haven't convinced me there is a
bug here.

Peter
bugzilla-daemon
2010-08-13 22:23:24 UTC
http://bugzilla.open-bio.org/show_bug.cgi?id=3096


skong at zymeworks.com changed:

What |Removed |Added
----------------------------------------------------------------------------
Version|Not Applicable |1.53




------- Comment #3 from skong at zymeworks.com 2010-08-13 18:23 EST -------
Hi Peter,

I managed to reproduce the problem without modifying _accept().

DIAGNOSTIC SCRIPT:
from Bio.PDB.PDBParser import PDBParser
from Bio.PDB.Polypeptide import PPBuilder


def extract_peptides(model):
    """Extract the peptides from a model.

    Returns a list of peptide sequences as strings.
    """
    output = []
    for peptide in PPBuilder().build_peptides(model):
        # Convert each Polypeptide's Seq object to a plain string.
        output.append(str(peptide.get_sequence()))
    return output


if __name__ == '__main__':
    # Parse the chopped, modified 1BFE fragment listed below.
    st = PDBParser().get_structure('', 'chopped_pdb1bfe_noca.ent')
    seqa = extract_peptides(st)
    print('no ca seq all')
    print(seqa)


PDB FILE: chopped_pdb1bfe_noca.ent
ATOM     85  N   ILE A 316      37.386  71.217  31.070  1.00 36.97           N
ATOM     86  CA  ILE A 316      38.311  71.290  29.949  1.00 33.71           C
ATOM     87  C   ILE A 316      37.634  72.103  28.862  1.00 33.93           C
ATOM     88  O   ILE A 316      36.415  72.216  28.839  1.00 36.46           O
ATOM     89  CB  ILE A 316      38.651  69.876  29.404  1.00 35.79           C
ATOM     90  CG1 ILE A 316      39.331  69.049  30.501  1.00 36.78           C
ATOM     91  CG2 ILE A 316      39.572  69.979  28.187  1.00 37.71           C
ATOM     92  CD1 ILE A 316      39.881  67.724  30.023  1.00 39.20           C
ATOM     93  N   HIS A 317      38.425  72.679  27.969  1.00 35.61           N
ATOM     94  CA  HIS A 317      37.880  73.473  26.881  1.00 37.92           C
ATOM     95  C   HIS A 317      38.360  72.928  25.540  1.00 37.79           C
ATOM     96  O   HIS A 317      39.463  73.240  25.094  1.00 37.44           O
ATOM     97  CB  HIS A 317      38.303  74.930  27.052  1.00 35.19           C
ATOM     98  CG  HIS A 317      37.888  75.519  28.363  1.00 35.76           C
ATOM     99  ND1 HIS A 317      36.611  75.981  28.602  1.00 37.74           N
ATOM    100  CD2 HIS A 317      38.575  75.701  29.516  1.00 37.59           C
ATOM    101  CE1 HIS A 317      36.529  76.420  29.844  1.00 38.74           C
ATOM    102  NE2 HIS A 317      37.706  76.262  30.421  1.00 36.76           N
ATOM    103  N   ARG A 318      37.527  72.109  24.905  1.00 38.78           N
ATOM    104  CA  ARG A 318      37.884  71.512  23.627  1.00 42.04           C
ATOM    105  C   ARG A 318      38.469  72.559  22.699  1.00 45.14           C
ATOM    106  O   ARG A 318      39.592  72.425  22.205  1.00 42.05           O
ATOM    107  CB  ARG A 318      36.657  70.880  22.967  1.00 42.93           C
ATOM    108  CG  ARG A 318      36.934  70.321  21.576  1.00 38.60           C
ATOM    109  CD  ARG A 318      35.654  70.038  20.821  1.00 35.39           C
ATOM    110  NE  ARG A 318      34.624  69.538  21.724  1.00 34.96           N
ATOM    111  CZ  ARG A 318      34.539  68.278  22.141  1.00 31.51           C
ATOM    112  NH1 ARG A 318      35.419  67.373  21.736  1.00 25.19           N
ATOM    113  NH2 ARG A 318      33.579  67.929  22.983  1.00 29.10           N
ATOM    114  N   XLY A 319      37.690  73.604  22.461  1.00 49.96           N
ATOM    115  CX  XLY A 319      38.138  74.668  21.592  1.00 55.53           C
ATOM    116  C   XLY A 319      38.459  74.219  20.180  1.00 58.85           C
ATOM    117  O   XLY A 319      37.583  73.766  19.440  1.00 58.98           O
ATOM    118  N   SER A 320      39.734  74.334  19.823  1.00 61.64           N
ATOM    119  CA  SER A 320      40.219  73.992  18.493  1.00 63.16           C
ATOM    120  C   SER A 320      40.212  72.517  18.110  1.00 65.27           C
ATOM    121  O   SER A 320      39.558  72.127  17.145  1.00 65.12           O
ATOM    122  CB  SER A 320      41.634  74.542  18.316  1.00 65.36           C
ATOM    123  OG  SER A 320      42.124  74.255  17.019  1.00 72.05           O
ATOM    124  N   THR A 321      40.955  71.702  18.853  1.00 67.43           N
ATOM    125  CA  THR A 321      41.049  70.274  18.562  1.00 67.73           C
ATOM    126  C   THR A 321      40.220  69.430  19.529  1.00 66.41           C
ATOM    127  O   THR A 321      39.244  69.917  20.095  1.00 70.21           O
ATOM    128  CB  THR A 321      42.517  69.810  18.620  1.00 70.22           C
ATOM    129  OG1 THR A 321      42.613  68.453  18.169  1.00 77.03           O
ATOM    130  CG2 THR A 321      43.049  69.915  20.045  1.00 72.07           C
ATOM    131  N   GLY A 322      40.608  68.168  19.707  1.00 61.22           N
ATOM    132  CA  GLY A 322      39.892  67.286  20.614  1.00 53.23           C
ATOM    133  C   GLY A 322      40.037  67.705  22.065  1.00 48.00           C
ATOM    134  O   GLY A 322      40.138  68.892  22.372  1.00 50.41           O
ATOM    135  N   LEU A 323      40.044  66.734  22.968  1.00 41.92           N
ATOM    136  CA  LEU A 323      40.190  67.033  24.385  1.00 35.58           C
ATOM    137  C   LEU A 323      41.613  66.738  24.874  1.00 31.41           C
ATOM    138  O   LEU A 323      41.932  66.921  26.046  1.00 30.47           O
ATOM    139  CB  LEU A 323      39.160  66.240  25.191  1.00 35.76           C
ATOM    140  CG  LEU A 323      37.716  66.576  24.802  1.00 39.50           C
ATOM    141  CD1 LEU A 323      36.733  65.796  25.670  1.00 38.15           C
ATOM    142  CD2 LEU A 323      37.493  68.074  24.955  1.00 38.58           C



The output peptides should be ['IHR', 'STGL'], not ['IHRXTGL'] as in the current
version. Residue XLY A 319 (the X in the fourth position) should not be included
since it doesn't have a CA atom. Instead, the current version includes it and
removes the 'S' next to it, due to the same bug. One can get the right result
using the patch provided before.
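
The expected behaviour can be sketched on one-letter codes as follows. This is
only an illustration under stated assumptions: split_on_rejected() and has_ca()
are hypothetical helpers, not part of Biopython, and the sketch ignores the
geometric connectivity test that the real builder also applies.

```python
def split_on_rejected(residues, accept):
    """Split a chain into fragments at every rejected residue.

    Hypothetical helper for illustration only. Fragments of a single
    residue are not reported as peptides.
    """
    fragments = []
    current = []
    for residue in residues:
        if accept(residue):
            current.append(residue)
        else:
            # A rejected residue ends the current fragment and is dropped.
            if len(current) > 1:
                fragments.append("".join(current))
            current = []
    if len(current) > 1:
        fragments.append("".join(current))
    return fragments


def has_ca(residue):
    # Assumption for this sketch: 'X' stands for a residue without a CA atom.
    return residue != "X"


peptides = split_on_rejected("IHRXSTGL", has_ca)
print(peptides)  # ['IHR', 'STGL']
```

Applied to the sequence from the original report, 'IXRGXTGL', the same helper
gives ['RG', 'TGL'], with the lone 'I' dropped.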

Whether or not _accept() is modified, the bug remains. Also, the user should
not be expected to modify the build_peptides() method whenever the PPBuilder
_accept() is modified, since the accept variable in build_peptides() isn't
really a local (private) variable: in line 277 it is assigned from
self._accept of PPBuilder.

http://www.biopython.org/DIST/docs/api/Bio.PDB.Polypeptide-pysrc.html
277 accept=self._accept


On a side note, the "aa_only" optional argument of build_peptides() and its
docstring (@param aa_only: if 1, the residue needs to be a standard AA) are
very misleading. "aa_only" is really a flag that tells the builder to filter
out residues that _accept() rejects. It is on by default, and unless _accept()
is overridden the only requirement is that a residue has a "CA" atom (lines
250-264); it does not have to be a standard amino acid, as the docstring
claims. In other words, with the default _accept() non-standard amino acids
are still accepted and included in the peptides built. Only when the _accept()
method of _PPBuilder is overridden (as I did before) does build_peptides()
exclude non-standard amino acids. I suggest renaming "aa_only" to something
more sensible, like "filter_aa".

http://www.biopython.org/DIST/docs/api/Bio.PDB.Polypeptide-pysrc.html
266 - def build_peptides(self, entity, aa_only=1):
273 @param aa_only: if 1, the residue needs to be a standard AA
274 @type aa_only: int
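
The distinction drawn above, between "has a CA atom" and "is a standard amino
acid", can be sketched as two predicates. The function names and the
(name, atom names) representation are assumptions made for this illustration;
they are not the Biopython implementation.

```python
# Three-letter codes of the 20 standard amino acids.
STANDARD_AA = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def accept_has_ca(resname, atom_names):
    # Filter as described in the comment above: only the presence of a
    # CA atom matters, not the residue name.
    return "CA" in atom_names


def accept_standard(resname, atom_names):
    # What the aa_only docstring promises: a standard amino acid
    # (in this sketch also required to have a CA atom).
    return resname in STANDARD_AA and "CA" in atom_names


# Hypothetical residues as (name, atom names) pairs.
gly = ("GLY", ["N", "CA", "C", "O"])
mse = ("MSE", ["N", "CA", "C", "O", "CB", "CG", "SE", "CE"])  # selenomethionine
xly = ("XLY", ["N", "CX", "C", "O"])  # the artificial residue from this report

for res in (gly, mse, xly):
    print(res[0], accept_has_ca(*res), accept_standard(*res))
```

A modified residue such as MSE passes the CA-based check but not the
standard-residue check, which is the gap between the docstring and the
behaviour described in this comment.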
bugzilla-daemon
2010-09-09 17:42:29 UTC
http://bugzilla.open-bio.org/show_bug.cgi?id=3096





------- Comment #6 from skong at zymeworks.com 2010-09-09 13:42 EST -------
Hi Peter,

I tested out the code (on the script directly, not using git) and it works
fine. I only have a minor concern that the additional input variable
"standard_aa_only" for the _accept() method in class _PPBuilder might break
other code that assumes it still has two instead of three input variables.

Also, within the same script there are three different names and default
values for the same flag (standard amino acid):

1. named "standard" with default False in the is_aa() function
2. named "aa_only" with default 1 in the build_peptides() method of class
_PPBuilder
3. named "standard_aa_only" with no default value in the _accept() method of
class _PPBuilder

Which is again minor.
bugzilla-daemon
2010-09-09 17:58:01 UTC
http://bugzilla.open-bio.org/show_bug.cgi?id=3096





------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2010-09-09 13:58 EST -------
(In reply to comment #6)
> Hi Peter,
> I tested out the code (on the script directly, not using git) and it works
> fine.

Excellent - thank you.

> I only have a minor concern that the additional input variable
> "standard_aa_only" for the _accept() method in class _PPBuilder might break
> other code that assumes it still has two instead of three input variables.

True, but I think that is a low risk and it is intended as a private API.
It could be made an optional argument I suppose.

> Also, within the same script there are three different names and default
> values for the same flag (standard amino acid):
> 1. named "standard" with default False in the is_aa() function
> 2. named "aa_only" with default 1 in the build_peptides() method of class
> _PPBuilder
> 3. named "standard_aa_only" with no default value in the _accept() method of
> class _PPBuilder
> Which is again minor.

We can change the new argument ("standard_aa_only") added to _accept() without
breaking backwards compatibility. I was trying to make it explicit - would you
prefer "standard" instead? We both agreed that "aa_only" is very misleading.
Peter
bugzilla-daemon
2010-09-10 09:40:40 UTC
http://bugzilla.open-bio.org/show_bug.cgi?id=3096


biopython-bugzilla at maubp.freeserve.co.uk changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED




------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2010-09-10 05:40 EST -------
Fix cherry-picked from that branch and committed:
http://github.com/biopython/biopython/commit/544e4855e219cfbce813a50fa183683a7b0e4b3e

I've also added you as a contributor (let me know if you want your email
address included in the CONTRIB file, or would prefer not to be named):
http://github.com/biopython/biopython/commit/993d58eb8e49a32d6821471421050720b88bfeeb

Marking bug as fixed.

Thank you :)