Discussion:
Python 2 and 3 migration thoughts
Peter Cock
2013-05-30 13:33:22 UTC
Splitting off from this thread:
http://lists.open-bio.org/pipermail/biopython/2013-May/008601.html
Thank you for all the comments so far, don't stop yet :)
On Thu, May 30, 2013 at 1:51 PM, Wibowo Arindrarto
Hi everyone,
I'm leaning towards insisting on Python >=3.3 support (I'm running
3.3.2). I suppose that even if Python3.3 is not available on a machine
or through the default package manager, it's always installable on its
own. If that's not the case, I imagine Python2.x is most likely
present in these machines (so Biopython can still be used).
True.
So far everyone who has replied (including some off list) has said
they are using Python 3.3 which is encouraging. Thank you for
the comments so far.
It looks like we can forget about Python 3.1, and just need to
decide if it is worth including Python 3.2.5 in the short term.
On a related note, do we have a defined timeline on when we
would drop support for Python2.x? Are there any plans to have
our codebase written in Python3.x instead of Python2.x?
Nothing concrete planned, no. I'll reply in more detail on the
biopython-dev list as I do have some thoughts about this.
Good question Bow,

I think people will still be using Python 2 a year or two from
now, so we must support both for some time.

Biopython 1.62 (next week perhaps?)
- Final release with Python 2.5 support
- Official support for Python 2.5, 2.6, 2.7 and 3.3
- Possibly official support for Python 3.2.5+ as well?

(Exactly which versions of Python 3 we'll include to be
decided, see the other thread for that discussion.)

Short term we will continue with developing using Python 2
syntax and running 2to3 for Python 3. As far as I know,
the reverse process with 3to2 is not well established. If
anyone wants to investigate that would be useful as
another option. However, dropping Python 2.5 support
makes things more flexible...

Medium term I believe it would be possible to have a single
code base which is both valid Python 2 and 3 at the same
time. This may require us to target 2.7 and 3.3+ only - we'll
have to try it and see if Python 2.6 will hold us back.

I've actually done this with backports.lzma, a small but
non-trivial module with Python and C code:

https://pypi.python.org/pypi/backports.lzma/
https://github.com/peterjc/backports.lzma

Python 3.3 reintroduces some features designed to make
this more straightforward, like unicode literals (missing in
the early versions of Python 3). This is why I'd like to drop
Python 3.2 as soon as possible.
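
As a rough illustration of what "valid Python 2 and 3 at the same
time" can look like, here is a minimal sketch (not code from
Biopython; the function name describe is made up for the example).
It should run unchanged on Python 2.7 and Python 3.3+, relying on
the u"..." literal that Python 3.3 accepts again:

```python
from __future__ import print_function  # harmless on Python 3

import sys


def describe(name):
    """Return a unicode greeting on both Python 2 and Python 3."""
    # The u prefix is a syntax error on Python 3.0 to 3.2, but is
    # accepted again from Python 3.3 onwards (PEP 414).
    return u"Hello from %s" % name


message = describe(u"Biopython")
print(message)
print("Running under Python %i" % sys.version_info[0])
```

On Python 3.2 the same file would fail at import time with a
SyntaxError, which is the practical reason for preferring 3.3+.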

What I was thinking is we can start migrating modules on a
case by case basis from "Python 2 syntax" to "Dual syntax"
one by one, with a white-list in the do2to3.py script. That
way over time fewer and fewer modules need to be converted
via 2to3, and "python3 setup.py install" will get faster,
until eventually we can stop using 2to3 at all.

This conversion could consider the code and doctests
separately. However, using print(example) we can
hopefully get most of the doctests and Tutorial examples
to work under both Python 2 and 3 at the same time.

That's my current thinking anyway - and I think the fact
that it would be a gradual migration from writing Python 2
specific code to writing dual 2/3 code makes it low risk
(as long as we're continuing to run regular testing).

Regards,

Peter
Peter Cock
2013-09-06 15:44:44 UTC
Post by Peter Cock
http://lists.open-bio.org/pipermail/biopython/2013-May/008601.html
Thank you for all the comments so far, don't stop yet :)
On Thu, May 30, 2013 at 1:51 PM, Wibowo Arindrarto
Hi everyone,
I'm leaning towards insisting on Python >=3.3 support (I'm running
3.3.2). I suppose that even if Python3.3 is not available on a machine
or through the default package manager, it's always installable on its
own. If that's not the case, I imagine Python2.x is most likely
present in these machines (so Biopython can still be used).
True.
So far everyone who has replied (including some off list) has said
they are using Python 3.3 which is encouraging. Thank you for
the comments so far.
It looks like we can forget about Python 3.1, and just need to
decide if it is worth including Python 3.2.5 in the short term.
On a related note, do we have a defined timeline on when we
would drop support for Python2.x? Are there any plans to have
our codebase written in Python3.x instead of Python2.x?
Nothing concrete planned, no. I'll reply in more detail on the
biopython-dev list as I do have some thoughts about this.
Good question Bow,
I think people will still be using Python 2 a year or two from
now, so we must support both for some time.
Biopython 1.62 (next week perhaps?)
- Final release with Python 2.5 support
- Official support for Python 2.5, 2.6, 2.7 and 3.3
- Possibly official support for Python 3.2.5+ as well?
(Exactly which versions of Python 3 we'll include to be
decided, see the other thread for that discussion.)
Short term we will continue with developing using Python 2
syntax and running 2to3 for Python 3. As far as I know,
the reverse process with 3to2 is not well established. If
anyone wants to investigate that would be useful as
another option. However, dropping Python 2.5 support
makes things more flexible...
Medium term I believe it would be possible to have a single
code base which is both valid Python 2 and 3 at the same
time. This may require us to target 2.7 and 3.3+ only - we'll
have to try it and see if Python 2.6 will hold us back.
I've actually done this with backports.lzma, a small but
non-trivial module with Python and C code:
https://pypi.python.org/pypi/backports.lzma/
https://github.com/peterjc/backports.lzma
Python 3.3 reintroduces some features designed to make
this more straightforward, like unicode literals (missing in
the early versions of Python 3). This is why I'd like to drop
Python 3.2 as soon as possible.
What I was thinking is we can start migrating modules on a
case by case basis from "Python 2 syntax" to "Dual syntax"
one by one, with a white-list in the do2to3.py script. That
way over time fewer and fewer modules need to be converted
via 2to3, and "python3 setup.py install" will get faster,
until eventually we can stop using 2to3 at all.
This conversion could consider the code and doctests
separately. However, using print(example) we can
hopefully get most of the doctests and Tutorial examples
to work under both Python 2 and 3 at the same time.
That's my current thinking anyway - and I think the fact
that it would be a gradual migration from writing Python 2
specific code to writing dual 2/3 code makes it low risk
(as long as we're continuing to run regular testing).
Regards,
Peter
This branch is trying out marking individual Python files
as dual coding (Python 2 and Python 3) or as Python 2
only requiring conversion via 2to3 for use on Python 3:

https://github.com/peterjc/biopython/tree/tag2to3

Currently the tags are two special hash comment lines
expected near the start of the file itself (rather than a
list within the do2to3.py script). The actual text of the
marker isn't critical - perhaps these need full stops?

# This file targets both Python 2 and Python 3 at the same time
# TODO - Targets Python 2 only (use 2to3 to run under Python 3)
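
To make the tagging idea concrete, here is a minimal sketch of how a
script might read these comment lines. It is not the code on the
branch: the function name classify, the 20 line limit and the
"untagged" fallback are all assumptions made for illustration.

```python
import io
import os
import shutil
import tempfile

DUAL_TAG = "# This file targets both Python 2 and Python 3 at the same time"
PY2_TAG = "# TODO - Targets Python 2 only (use 2to3 to run under Python 3)"


def classify(filename, max_lines=20):
    """Return 'dual', 'py2only' or 'untagged' for a Python source file.

    Only the first max_lines lines are inspected (an assumed limit).
    """
    with io.open(filename, encoding="utf-8") as handle:
        for index, line in enumerate(handle):
            if index >= max_lines:
                break
            text = line.strip()
            if text == DUAL_TAG:
                return "dual"
            if text == PY2_TAG:
                return "py2only"
    return "untagged"


# Small demonstration using a temporary file
temp_dir = tempfile.mkdtemp()
example = os.path.join(temp_dir, "example.py")
with io.open(example, "w", encoding="utf-8") as handle:
    handle.write(u"# Copyright notice here\n" + PY2_TAG + u"\nprint 'hello'\n")
result = classify(example)
print(result)
shutil.rmtree(temp_dir)
```

Because the tag lives in the file itself, a commit which makes a file
dual-compatible can flip the tag in the same change.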

The first main issue thus far has been print statements,
where we will either need to use the __future__ import or
restrict ourselves to simple single-argument calls - I have
been using the latter. This should not be a big problem in the
main code, and we ought to update the print-and-compare
unit tests anyway.
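
For anyone unsure why single argument calls are safe, a small sketch
(the variable names are invented for the example) which should behave
identically on Python 2 and 3 without the __future__ import:

```python
import sys

name = "Bio.SeqIO"
count = 3

# A single parenthesised argument behaves the same either way:
# Python 2 sees a print statement with a bracketed expression,
# Python 3 sees a call to the print function.
line = "Parsed %i records with %s" % (count, name)
print(line)

# With two arguments the results differ unless the __future__
# import is used: Python 2 would print the tuple ('a', 'b').
# print("a", "b")

# Writing to stderr without the "print >>" syntax works on both.
sys.stderr.write("Warning: example only\n")
```

Building the whole line with string formatting first is what keeps
the call down to one argument.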

The next common issue is import statements, for
example StringIO (another bytes versus unicode issue).
That can be handled via Bio._py3k in some cases.

A third major class of issues in the unit tests so
far is iterators versus lists, for example dictionary
methods and the map function's return value. These
can be tackled on a case by case basis I think - often
adding the occasional list(...), or using sorted(x)
instead of x.sort(), is enough.
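
A short sketch of the kind of change meant here, using made-up data
rather than anything from the test suite:

```python
counts = {"G": 2, "A": 5, "T": 1, "C": 4}

# keys() is a list on Python 2 but a view on Python 3, so calling
# .sort() on it fails there; sorted() works on both.
letters = sorted(counts)

# map() returns a list on Python 2 but an iterator on Python 3;
# wrap it in list() when indexing or len() is needed.
lengths = list(map(len, ["ACGT", "AC", "A"]))

print(letters)
print("%i %i" % (lengths[0], len(lengths)))
```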

There are also quite a few instances of 'basestring'
which might be handled via _py3k?
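
One common way to cope with basestring is sketched below. This is
only an illustration of the pattern, not necessarily what Bio._py3k
does; the names string_types and as_handle_or_name are hypothetical.

```python
try:
    string_types = basestring  # Python 2: covers str and unicode
except NameError:
    string_types = str  # Python 3: basestring no longer exists


def as_handle_or_name(source):
    """Say whether the argument looks like a filename or a handle."""
    if isinstance(source, string_types):
        return "filename"
    return "handle"


print(as_handle_or_name("example.fasta"))
print(as_handle_or_name(open(__file__) if "__file__" in globals() else object()))
```

Putting the try/except in one helper module would mean the rest of
the code only ever refers to the single alias.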

As of right now, on this branch there are only 8 files under
Tests which require conversion via 2to3:

Tests/common_BioSQL.py
Tests/seq_tests_common.py
Tests/test_NCBI_qblast.py
Tests/test_SCOP_Cla.py
Tests/test_seq.py
Tests/test_SeqIO.py
Tests/test_SeqIO_index.py
Tests/test_Uniprot.py

Having, I hope, demonstrated this will work, I'd like some
feedback before applying this (or a modified version of
it) to the master branch.

Any thoughts? Thanks,

Peter
Peter Cock
2013-09-07 11:30:50 UTC
Post by Peter Cock
Post by Peter Cock
Short term we will continue with developing using Python 2
syntax and running 2to3 for Python 3. As far as I know,
the reverse process with 3to2 is not well established. If
anyone wants to investigate that would be useful as
another option. However, dropping Python 2.5 support
makes things more flexible...
Medium term I believe it would be possible to have a single
code base which is both valid Python 2 and 3 at the same
time. This may require us to target 2.7 and 3.3+ only - we'll
have to try it and see if Python 2.6 will hold us back.
I've actually done this with backports.lzma, a small but
non-trivial module with Python and C code:
https://pypi.python.org/pypi/backports.lzma/
https://github.com/peterjc/backports.lzma
Python 3.3 reintroduces some features designed to make
this more straightforward, like unicode literals (missing in
the early versions of Python 3). This is why I'd like to drop
Python 3.2 as soon as possible.
What I was thinking is we can start migrating modules on a
case by case basis from "Python 2 syntax" to "Dual syntax"
one by one, with a white-list in the do2to3.py script. That
way over time fewer and fewer modules need to be converted
via 2to3, and "python3 setup.py install" will get faster,
until eventually we can stop using 2to3 at all.
This conversion could consider the code and doctests
separately. However, using print(example) we can
hopefully get most of the doctests and Tutorial examples
to work under both Python 2 and 3 at the same time.
That's my current thinking anyway - and I think the fact
that it would be a gradual migration from writing Python 2
specific code to writing dual 2/3 code makes it low risk
(as long as we're continuing to run regular testing).
Regards,
Peter
This branch is trying out marking individual Python files
as dual coding (Python 2 and Python 3) or as Python 2
https://github.com/peterjc/biopython/tree/tag2to3
Currently the tags are two special hash comment lines
expected near the start of the file itself (rather than a
list within the do2to3.py script). The actual text of the
marker isn't critical - perhaps these need full stops?
# This file targets both Python 2 and Python 3 at the same time
# TODO - Targets Python 2 only (use 2to3 to run under Python 3)
The first main issues thus far have been print statements,
where we will either need to use the __future__ import or
restrict ourselves to simple single argument calls - I have
been using the latter. This should not be a big problem on the
main code, and we ought to update the print-and-compare
unit tests anyway,
e.g.
https://github.com/biopython/biopython/commit/6fa766e2348eae4e083503885f4ea5b66f531d7a
Post by Peter Cock
The next common issue is import statements, for
example StringIO (another bytes versus unicode issue).
That can be handled via Bio._py3k in some cases.
For StringIO,
https://github.com/biopython/biopython/commit/b09ebbf6f8c4032f874d89a91d199d8697c2d381

For commands.getoutput used in many tests,
https://github.com/biopython/biopython/commit/11a1eca60e7a1491dbe54204ad3103e013bfebc5
Post by Peter Cock
A third major class of issues in the unit tests so
far is iterators versus lists, for example dictionary
methods and the map function's return value. These
can be tackled on a case by case basis I think - often
adding the occasional list(...), or using sorted(x)
instead of x.sort(), is enough.
e.g. for sorting dictionary keys,
https://github.com/biopython/biopython/commit/b27f30012af6e66f6f143ecde719bf72609af8f2

e.g. for avoiding iterators from map function,
https://github.com/biopython/biopython/commit/730850e3f4e88a70860e56abafbb579b25414f06
Post by Peter Cock
There are also quite a few instances of 'basestring'
which might be handled via _py3k?
As of right now, on this branch there are only 8 files under
Down to six files under Tests now if I rebase the branch
to include the recent fixes on the master.
Post by Peter Cock
Having I hope demonstrated this will work, I'd like some
feedback before applying this (or a modified version of
it) to the master branch.
I've started applying individual code fixes to the master
to improve Python 2 and 3 compatibility already.

I'm specifically looking for thoughts on how to handle
the transition period when some of our code will still
need 2to3, while other code will not.

Does the special comment line seem like a good solution?
On the plus side, it tracks any changes with the file being
updated (which wouldn't happen with a list in the do2to3.py
file).

Peter
Eric Talevich
2013-09-07 19:17:08 UTC
Permalink
Post by Peter Cock
Post by Peter Cock
This branch is trying out marking individual Python files
as dual coding (Python 2 and Python 3) or as Python 2
https://github.com/peterjc/biopython/tree/tag2to3
Currently the tags are two special hash comment lines
expected near the start of the file itself (rather than a
list within the do2to3.py script). The actual text of the
marker isn't critical - perhaps these need full stops?
# This file targets both Python 2 and Python 3 at the same time
# TODO - Targets Python 2 only (use 2to3 to run under Python 3)
[...]
Post by Peter Cock
Post by Peter Cock
As of right now, on this branch there are only 8 files under
Down to six files under Tests now if I rebase the branch
to include the recent fixes on the master.
Post by Peter Cock
Having I hope demonstrated this will work, I'd like some
feedback before applying this (or a modified version of
it) to the master branch.
I've started applying individual code fixes to the master
to improve Python 2 and 3 compatibility already.
I'm specifically looking for thoughts on how to handle
the transition period when some of our code will still
need 2to3, while other code will not.
Does the special comment line seem like a good solution?
On the plus side, it tracks any changes with the file being
updated (which wouldn't happen with a list in the do2to3.py
file).
Peter
Hi Peter,

This looks like a good way to move forward overall. Regarding the special
comment lines -- since these are only used in do2to3.py, would it be
cleaner to keep a hard-coded list of filenames in do2to3.py and leave the
modules and scripts alone? Are there any characteristics that would make it
difficult to determine whether a given module or script is Py3-compliant?

-Eric
Peter Cock
2013-09-08 20:52:40 UTC
Post by Eric Talevich
Post by Peter Cock
Post by Peter Cock
# This file targets both Python 2 and Python 3 at the same time
# TODO - Targets Python 2 only (use 2to3 to run under Python 3)
Does the special comment line seem like a good solution?
On the plus side, it tracks any changes with the file being
updated (which wouldn't happen with a list in the do2to3.py
file).
Hi Peter,
This looks like a good way to move forward overall. Regarding the special
comment lines -- since these are only used in do2to3.py, would it be
cleaner to keep a hard-coded list of filenames in do2to3.py and leave the
modules and scripts alone? Are there any characteristics that would make it
difficult to determine whether a given module or script is Py3-compliant?
Hi Eric,

There are import time problems which are easy to spot - in particular
SyntaxError is a good clue. However, many of the issues are only
really found at run time (e.g. different method names). This means
that the tests (which I started with) are actually the easiest to check.

Right now I don't have a feel for what fraction of the main Bio/* and
BioSQL/* files can be made dual-coding, and that would have an
influence on how best to tag things needing 2to3 or not. I'm happy
to continue this on branches for a while longer and find out.

I do like the idea of a special #TODO comment line where 2to3
is still needed - it is symbolic of where I want the code base to go ;)

Regards,

Peter
Peter Cock
2013-09-29 23:22:52 UTC
Post by Peter Cock
Post by Eric Talevich
Post by Peter Cock
Post by Peter Cock
# This file targets both Python 2 and Python 3 at the same time
# TODO - Targets Python 2 only (use 2to3 to run under Python 3)
Does the special comment line seem like a good solution?
On the plus side, it tracks any changes with the file being
updated (which wouldn't happen with a list in the do2to3.py
file).
Hi Peter,
This looks like a good way to move forward overall. Regarding the special
comment lines -- since these are only used in do2to3.py, would it be
cleaner to keep a hard-coded list of filenames in do2to3.py and leave the
modules and scripts alone? Are there any characteristics that would make it
difficult to determine whether a given module or script is Py3-compliant?
Hi Eric,
There are import time problems which are easy to spot - in particular
SyntaxError is a good clue. However, many of the issues are only
really found at run time (e.g. different method names). This means
that the tests (which I started with) are actually the easiest to check.
Right now I don't have a feel for what fraction of the main Bio/* and
BioSQL/* files can be made dual-coding, and that would have an
influence on how best to tag things needing 2to3 or not. I'm happy
to continue this on branches for a while longer and find out.
Assuming my methodology isn't flawed, we're about half way
in terms of getting every file in Biopython to be dual Python 2
and Python 3 code:

262 no change, 290 need fixers
Troublesome ones at 52.5%

This is based on there being a difference between the pre-
and post-2to3 conversion (discounting removing future imports).
This is an overestimate as often the 2to3 script makes
unnecessary changes.

This is after applying a *lot* of little changes to our codebase,
things like removing unneeded use of my_dict.keys() which
the 2to3 fixers are overcautious in wrapping as list(my_dict.keys())
- I would like to do a beta before the next release.
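
To show the kind of little change involved, a sketch with invented
data where .keys() is simply not needed, so 2to3 has nothing to wrap:

```python
lengths = {"seq1": 120, "seq2": 98}

# Iterating over the dictionary directly yields its keys on both
# Python 2 and 3, so there is nothing for 2to3 to wrap in list().
for name in lengths:
    print("%s has length %i" % (name, lengths[name]))

# Membership tests likewise do not need .keys()
found = "seq1" in lengths

# A real list is only needed when the dictionary is modified while
# looping; list(lengths) copies the keys on both versions.
for name in list(lengths):
    if lengths[name] < 100:
        del lengths[name]
```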
Post by Peter Cock
I do like the idea of a special #TODO comment line where 2to3
is still needed - it is symbolic of where I want the code base to go ;)
That's what is going on in this revised branch - if the special
#TODO comment is there, then 2to3 is used, otherwise we
assume the file is already OK to use under Python 3:

https://github.com/peterjc/biopython/tree/mark2to3

This is now quicker to install under Python 3, but there is
plenty of scope for speed optimisation (e.g. requiring that the
magic marker be in the first (say) 20 lines of the file, and
expanding the magic marker to list the specific 2to3 fixers
required and running just those).
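
A sketch of how an expanded marker might be parsed. The marker syntax
below is purely hypothetical (the branch only checks for the plain
#TODO line), as are the names MARKER and fixers_needed:

```python
import re

# Hypothetical marker syntax, NOT the one used on the branch.
MARKER = re.compile(r"^#\s*TODO - 2to3 fixers:\s*(.*)$")


def fixers_needed(lines, max_lines=20):
    """Return fixer names from a marker in the first few lines, or []."""
    for line in lines[:max_lines]:
        match = MARKER.match(line.strip())
        if match:
            names = match.group(1).split(",")
            return [name.strip() for name in names if name.strip()]
    return []


source = [
    "# Copyright notice here",
    "# TODO - 2to3 fixers: print, dict, urllib",
    "import urllib",
]
print(fixers_needed(source))
```

An empty list would mean no marker was found in the lines checked,
i.e. the file is assumed to be fine under Python 3 already.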

Regards,

Peter
Peter Cock
2013-09-30 16:18:21 UTC
Post by Peter Cock
Assuming my methodology isn't flawed, we're about half way
in terms of getting every file in Biopython to be dual Python 2
262 no change, 290 need fixers
Troublesome ones at 52.5%
New numbers with Bio._py3k.urllib changes which should
have dropped the number of troublesome files by at most
13 files:

374 no change, 177 need fixers
Troublesome ones 32.1%

I think my markup script is a bit fragile in terms of the exact
sequence of steps with do2to3.py etc. But much better
numbers than Sunday night :)

Revised branch here:
https://github.com/peterjc/biopython/tree/mark2to3a
https://github.com/peterjc/biopython/commit/14f9ff121532ff92ec7bacc1867bdd058a6e8f74

Build and test times on the master vs this branch are
looking a lot better for Python 3 (although the numbers
for different TravisCI runs are not directly comparable),
and there is still a lot of room for improvement:

master:
https://travis-ci.org/biopython/biopython/builds/11965000

branch:
https://travis-ci.org/peterjc/biopython/builds/11968132

So that's good. However, are these urllib import fixes
an acceptable way forwards? Included in the above
branch and here:

https://github.com/peterjc/biopython/tree/urllib
https://github.com/peterjc/biopython/commit/1305387a5d98a5f3c7b83ca3db580b9e63dba851

Thanks,

Peter
Peter Cock
2013-10-05 19:02:47 UTC
Post by Peter Cock
Post by Peter Cock
Assuming my methodology isn't flawed, we're about half way
in terms of getting every file in Biopython to be dual Python 2
262 no change, 290 need fixers
Troublesome ones at 52.5%
New numbers with Bio._py3k.urllib changes which should
have dropped the number of troublesome files by at most
374 no change, 177 need fixers
Troublesome ones 32.1%
I think my markup script is a bit fragile in terms of the exact
sequence of steps with do2to3.py etc. But much better
numbers than Sunday night :)
I wasn't using the -B switch in diff until now, that makes
things easier:

383 no change, 171 need fixers
Troublesome ones 30.9%
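
For reference, the tally can be thought of along these lines. This is
a simplified sketch in Python rather than the actual script (which
uses diff); the function names are invented, and it only mimics the
effect of ignoring blank lines and future imports:

```python
def significant_lines(text):
    """Drop blank lines and __future__ imports before comparing."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("from __future__ import"):
            continue
        kept.append(line.rstrip())
    return kept


def needs_fixers(before, after):
    """True if 2to3 changed more than blank lines or future imports."""
    return significant_lines(before) != significant_lines(after)


def percent_troublesome(unchanged, changed):
    """Percentage of files which 2to3 still alters."""
    return 100.0 * changed / (unchanged + changed)


before = "from __future__ import print_function\n\nprint('x')\n"
after = "print('x')\n"
print(needs_fixers(before, after))
print("%0.1f%%" % percent_troublesome(383, 171))
```

The last line reproduces the 30.9% figure from the counts above.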

Revised branch here:

https://github.com/peterjc/biopython/tree/mark2to3b
https://travis-ci.org/peterjc/biopython/builds/12175589

This is rebased on the master where I've also cut down the
number of fixers in use, so together we get a good speed
up for the Python 3 install time.

I've rebased the urllib changes (included in the above
test branch) and made a pull request for comment:
https://github.com/biopython/biopython/pull/245

Peter
Peter Cock
2013-10-05 21:36:25 UTC
Post by Peter Cock
Post by Peter Cock
Post by Peter Cock
Assuming my methodology isn't flawed, we're about half way
in terms of getting every file in Biopython to be dual Python 2
262 no change, 290 need fixers
Troublesome ones at 52.5%
New numbers with Bio._py3k.urllib changes which should
have dropped the number of troublesome files by at most
374 no change, 177 need fixers
Troublesome ones 32.1%
I think my markup script is a bit fragile in terms of the exact
sequence of steps with do2to3.py etc. But much better
numbers than Sunday night :)
I wasn't using the -B switch in diff until now, that makes
383 no change, 171 need fixers
Troublesome ones 30.9%
https://github.com/peterjc/biopython/tree/mark2to3b
https://travis-ci.org/peterjc/biopython/builds/12175589
This is rebased on the master where I've also cut down the
number of fixers in use, so together we get a good speed
up for the Python 3 install time.
I've rebased the urllib changes (included in the above
https://github.com/biopython/biopython/pull/245
Peter
Incorporating another new feature branch gives:

387 no change, 161 need fixers
Troublesome ones 29.4%

The new batch of 2to3 issues solved is changes to
built-in functions like range, zip, map, filter. Branch:
https://github.com/peterjc/biopython/tree/builtins
https://github.com/biopython/biopython/pull/246
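
As an illustration of the built-in changes (example data only, not
taken from the branch), wrapping in list() where a real list is
needed gives the same answer on Python 2 and 3:

```python
names = ["alpha", "beta", "gamma"]
scores = [0.9, 0.4, 0.7]

# zip() and filter() return lists on Python 2 and iterators on
# Python 3; list() gives the same result on both.
pairs = list(zip(names, scores))
good = list(filter(lambda pair: pair[1] > 0.5, pairs))

# range() is a list on Python 2 and a lazy sequence on Python 3;
# both can be looped over and indexed, but only a real list can
# be concatenated with another list.
indices = list(range(3)) + [10]

print(good)
print(indices)
```

Where the result is only looped over once, the list() call can be
left out entirely.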

Peter
Peter Cock
2013-10-06 14:03:00 UTC
Post by Peter Cock
387 no change, 161 need fixers
Troublesome ones 29.4%
The new batch of 2to3 issues solved is changes to
https://github.com/peterjc/biopython/tree/builtins
https://github.com/biopython/biopython/pull/246
I've added basestring and input to the builtins branch
(pull request updated), which helps even more.

However, I realised I am effectively reimplementing the
MIT licensed 'six' library with 'Bio._py3k' and it would
be simpler to just use that instead (and that would make
life easier for contributors already using 'six' on other
projects):

https://pypi.python.org/pypi/six/
https://bitbucket.org/gutworth/six
http://pythonhosted.org/six/

Expect a slight reworking of these branches to appear
later, bundling a copy of 'six' as Bio/_py3k/__init__.py

Peter
Peter Cock
2013-10-06 21:50:18 UTC
Post by Peter Cock
Post by Peter Cock
387 no change, 161 need fixers
Troublesome ones 29.4%
The new batch of 2to3 issues solved is changes to
https://github.com/peterjc/biopython/tree/builtins
https://github.com/biopython/biopython/pull/246
I've added basestring and input to the builtins branch
(pull request updated), helps even more.
However, I realised I am effectively reimplementing the
MIT licensed 'six' library with 'Bio._py3k' and it would
be simpler to just use that instead (and that would make
life easier for contributors already using 'six' on other
https://pypi.python.org/pypi/six/
https://bitbucket.org/gutworth/six
http://pythonhosted.org/six/
Expect a slight reworking of these branches to appear
later, bundling a copy of 'six' ...
New branch is https://github.com/peterjc/biopython/tree/six
with 'six' bundled and using this for more import fixes.

Using that work, we're now at under a quarter of the files
needing 2to3 changes using the modified do2to3.py,
https://github.com/peterjc/biopython/tree/mark2to3c
https://travis-ci.org/peterjc/biopython/builds/12208302

416 no change, 132 need fixers
Troublesome ones 24.1%

Progress :)

Peter
Peter Cock
2013-10-14 15:00:46 UTC
Hello all,

Despite a nasty cold, I've made further progress over the
weekend. Switching to assuming Python 3 style dictionaries
is the single biggest step forward - and as long as we have
good test coverage I think this is low risk. I think a dual
code base without needing 2to3 may be attainable for
the next Biopython release.

However, before that, I'd like to take a moment to discuss
changing imports, e.g. Doc/examples/getgene.py

Do people prefer something explicit like this,

try:
    import gdbm # Python 2
except ImportError:
    from dbm import gnu as gdbm # Python 3

Or something via a helper library (e.g. our Bio._py3k or
a bundled copy of the six library):

from six.moves import dbm_gnu as gdbm

That's a rare example, something far more common is
StringIO, which also crops up in our doctests. e.g.
>>> from StringIO import StringIO

>>> try:
...     from StringIO import StringIO # Python 2
... except ImportError:
...     from io import StringIO # Python 3
...

Both via Bio._py3k, not ideal for a doctest as it is a
>>> from Bio._py3k import StringIO

Both via six, not ideal if we're bundling it as Bio._six
>>> from six import StringIO
Or, for a more common and more complex example, have a
look at how urllib has changed under Python 3. See some
of the commits here:

https://github.com/peterjc/biopython/tree/six

For docstrings, I actually prefer the explicit commented
version with the try/except. For the main code, using a
central helper like Bio._py3k or a bundled copy of six
makes sense from a code management perspective -
it would ensure consistency (and be easy to remove
once we drop Python 2 support).
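
To make the "central helper" option concrete, here is a minimal
sketch of what such a module could contain. It is hypothetical and
not the actual contents of Bio._py3k or six; only standard library
names are used.

```python
import sys

# Hypothetical contents of a small compatibility module.
if sys.version_info[0] >= 3:
    from io import StringIO
    from urllib.request import urlopen
    from urllib.parse import urlencode
else:
    from StringIO import StringIO
    from urllib2 import urlopen
    from urllib import urlencode

# Calling code would import these names from the helper module, and
# then use them without caring which Python version is running.
handle = StringIO(">seq1\nACGT\n")
query = urlencode([("db", "nucleotide"), ("id", "12345")])
print(handle.readline().strip())
print(query)
```

The version check lives in one place, so dropping Python 2 later
means deleting one branch of the if statement.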

Any thoughts?

Thanks,

Peter
Eric Talevich
2013-10-14 20:34:00 UTC
Post by Peter Cock
Hello all,
Despite a nasty cold, I've made further progress over the
weekend. Switching to assuming Python 3 style dictionaries
is the single biggest step forward - and as long as we have
good test coverage I think this is low risk. I think a dual
code base without needing 2to3 may be attainable for
the next Biopython release.
Nice :)
Post by Peter Cock
[...] something far more common is
StringIO, which also crops up in our doctests. e.g.
from StringIO import StringIO
... from StringIO import StringIO # Python 2
... from io import StringIO # Python 3
...
For the case of StringIO and BytesIO, the top-level io module was added in
Python 2.6:
http://docs.python.org/2/library/io.html

In Py2.6, the implementation of io.StringIO is slow, and io.BytesIO not
meaningfully different from io.StringIO, but both should be fine for
doctests.

-Eric
Peter Cock
2013-10-14 23:36:21 UTC
Post by Eric Talevich
Post by Peter Cock
[...] something far more common is
StringIO, which also crops up in our doctests. e.g.
from StringIO import StringIO
... from StringIO import StringIO # Python 2
... from io import StringIO # Python 3
...
For the case of StringIO and BytesIO, the top-level
http://docs.python.org/2/library/io.html
Yes, but under Python 2 io.StringIO is unicode
based, while StringIO.StringIO & cStringIO.StringIO
are (bytes) string based. It isn't a drop in replacement
for our text based parsers.

Where we do want byte strings (e.g. binary formats
like SFF) then we can and now do use io.BytesIO
in place of the old StringIO usage - and that then
works nicely without change under Python3 as
well.
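
A small sketch of the distinction, with made-up data (the bytes
start with the ".sff" magic number, but this is not a valid SFF
file):

```python
import io

# io.BytesIO holds bytes on both Python 2 and 3, which suits
# binary formats.
binary_handle = io.BytesIO(b".sff\x00\x00\x00\x01")
magic = binary_handle.read(4)

# io.StringIO holds unicode text on both versions; on Python 2 it
# rejects a plain (byte) str, hence the explicit u prefix here.
text_handle = io.StringIO(u">seq1\nACGT\n")
first_line = text_handle.readline()

print(magic == b".sff")
print(first_line.strip())
```

On Python 2, passing a plain str to io.StringIO raises a TypeError,
which is why it cannot simply replace StringIO.StringIO in parsers
which work with native strings.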
Post by Eric Talevich
In Py2.6, the implementation of io.StringIO is
slow, and io.BytesIO not meaningfully different
from io.StringIO, but both should be fine for
doctests.
Yeah - for doctests the speed of the different
StringIO options is immaterial.

Peter
Peter Cock
2013-10-20 14:32:16 UTC
Permalink
Hi all,

I've just made a pull request on dictionary method handling:
https://github.com/biopython/biopython/pull/248

Some comments over on GitHub (or here) would be great.

Thanks,

Peter
Peter Cock
2013-10-22 13:59:58 UTC
Permalink
Post by Peter Cock
Hi all,
I've just made a pull request on dictionary method handling:
https://github.com/biopython/biopython/pull/248
Some comments over on GitHub (or here) would be great.
Thanks,
Peter
Thanks for looking over that, Eric. If there are no objections,
I intend to rebase and apply the dictionary changes later this
week: https://github.com/biopython/biopython/pull/248

Separately, regarding the imports issue - do people have
a preference on the try/except as demonstrated here
https://github.com/biopython/biopython/pull/249 versus
a compatibility layer in Bio._py3k, or a bundled copy of
'six'?

e.g. https://github.com/peterjc/biopython/tree/builtins
e.g. https://github.com/peterjc/biopython/tree/six

Thanks,

Peter
Eric Talevich
2013-10-22 16:35:25 UTC
Permalink
Post by Peter Cock
Post by Peter Cock
Hi all,
I've just made a pull request on dictionary method handling:
https://github.com/biopython/biopython/pull/248
Some comments over on GitHub (or here) would be great.
Thanks,
Peter
Thanks for looking over that Eric, if there are no objections
I intend to rebase and apply the dictionary changes later this
week: https://github.com/biopython/biopython/pull/248
Separately, regarding the imports issue - do people have
a preference on the try/except as demonstrated here
https://github.com/biopython/biopython/pull/249 versus
a compatibility layer in Bio._py3k, or a bundled copy of
'six'?
e.g. https://github.com/peterjc/biopython/tree/builtins
e.g. https://github.com/peterjc/biopython/tree/six
Thanks,
Peter
I just looked at the source code for six:
https://bitbucket.org/gutworth/six/src/db5564076aa8/six.py?at=default

It's very compact, much shorter than I expected but also quite dense. I get
the sense they've had enough eyes on the codebase to sort out performance
issues and edge cases, e.g. MAXSIZE (sys.maxsize) on Jython.

For docstrings, I agree that directly showing the try/except block is more
informative for users on either genus of Python. For the rest of the
codebase, I would favor using a bundled copy of six (e.g. Bio._six). The
benefits are (a) not having to discover and fix all the subtle bugs
ourselves, (b) to be explicit about where we've done something for Py2/3
compatibility and not as an essential part of the way the code is supposed
to work, and (c) six has its own documentation.

I also see some virtue in not relying on six/Bio._py3k where it's not
necessary, since six is compatible back to Python 2.4 and we only go back
to Python 2.6 now. Halfway approach: just look at six and copy only the
bits we need into _py3k?

-Eric
Peter Cock
2013-10-22 16:42:49 UTC
Permalink
Post by Eric Talevich
Post by Peter Cock
Separately, regarding the imports issue - do people have
a preference on the try/except as demonstrated here
https://github.com/biopython/biopython/pull/249 versus
a compatibility layer in Bio._py3k, or a bundled copy of
'six'?
e.g. https://github.com/peterjc/biopython/tree/builtins
e.g. https://github.com/peterjc/biopython/tree/six
Thanks,
Peter
I just looked at the source code for six:
https://bitbucket.org/gutworth/six/src/db5564076aa8/six.py?at=default
It's very compact, much shorter than I expected but also quite dense. I get
the sense they've had enough eyes on the codebase to sort out performance
issues and edge cases, e.g. sys.MAXSIZE on Jython.
They've fixed two little bugs I reported, but this remains open:
https://bitbucket.org/gutworth/six/issue/41/from-sixmovestkinter-import-and-similar

I'm avoiding it though:
https://github.com/biopython/biopython/commit/c36fdbaad432d477c64ad5768df7062340530176
Post by Eric Talevich
For docstrings, I agree that directly showing the try/except block is more
informative for users on either genus of Python.
Agreed.
Post by Eric Talevich
For the rest of the
codebase, I would favor using a bundled copy of six (e.g. Bio._six). The
benefits are (a) not having to discover and fix all the subtle bugs
ourselves, (b) to be explicit about where we've done something for Py2/3
compatibility and not as an essential part of the way the code is supposed
to work, and (c) six has its own documentation.
I also see some virtue in not relying on six/Bio._py3k where it's not
necessary, since six is compatible back to Python 2.4 and we only go back to
Python 2.6 now. Halfway approach: just look at six and copy only the bits we
need into _py3k?
OK, I'll focus in that direction then. Six is MIT licensed so we should
be fine bundling it or extracting snippets.

Thanks,

Peter
Peter Cock
2013-10-26 18:44:59 UTC
Permalink
Post by Peter Cock
Post by Eric Talevich
For docstrings, I agree that directly showing the try/except block is more
informative for users on either genus of Python.
Agreed.
Post by Eric Talevich
For the rest of the
codebase, I would favor using a bundled copy of six (e.g. Bio._six). The
benefits are (a) not having to discover and fix all the subtle bugs
ourselves, (b) to be explicit about where we've done something for Py2/3
compatibility and not as an essential part of the way the code is supposed
to work, and (c) six has its own documentation.
I also see some virtue in not relying on six/Bio._py3k where it's not
necessary, since six is compatible back to Python 2.4 and we only go back to
Python 2.6 now. Halfway approach: just look at six and copy only the bits we
need into _py3k?
OK, I'll focus in that direction then. Six is MIT licensed so we should
be fine bundling it or extracting snippets.
A new pull request for people to comment on, which eliminates
all but two important fixers. As a bonus this makes installation
under Python 3 much much quicker:

https://github.com/biopython/biopython/pull/250

I've not (yet) needed anything from the 'six' library.

Peter
Peter Cock
2013-11-02 14:39:20 UTC
Permalink
Post by Peter Cock
A new pull request for people to comment on, which eliminates
all but two important fixers. As a bonus this makes installation
under Python 3 much much quicker:
https://github.com/biopython/biopython/pull/250
I take it there are no objections to me merging this work?
If not, I will try to do it tomorrow (Sunday) or early next week,
and move on to the final issues before we can drop 2to3
(which I do not anticipate being problematic).

Once we've dropped 2to3, I would like us to do a beta
release (by mid November?), and then, barring any problems,
ship that as Biopython 1.63 (hopefully by late November).

(There are plenty of things pending which would be great
to have in the next release, but these changes are so wide
ranging that I would prefer we focus primarily on Python 3
for the next release.)

Regards,

Peter
Peter Cock
2013-11-02 17:51:10 UTC
Permalink
Post by Peter Cock
Post by Peter Cock
A new pull request for people to comment on, which eliminates
all but two important fixers. As a bonus this makes installation
under Python 3 much much quicker:
https://github.com/biopython/biopython/pull/250
https://github.com/peterjc/biopython/tree/py3fixes
Post by Peter Cock
I take it there are no objections to me merging this work?
If not, I will try to do it tomorrow (Sunday) or early next week,
and move on to the final issues before we can drop 2to3
(which I do not anticipate being problematic).
They are not troublesome, perhaps worth applying soon too?
(Can anyone propose a more elegant solution to __nonzero__
versus __bool__ than to just define both?)

This allows us to stop using 2to3 (something NumPy have
also managed in their recent NumPy 1.8 release), making
installation of Biopython from source under Python 3.3
faster and much simpler:

https://github.com/peterjc/biopython/tree/py3more

Regards,

Peter
Eric Talevich
2013-11-02 18:23:31 UTC
Permalink
Post by Peter Cock
Post by Peter Cock
Post by Peter Cock
A new pull request for people to comment on, which eliminates
all but two important fixers. As a bonus this makes installation
under Python 3 much much quicker:
https://github.com/biopython/biopython/pull/250
https://github.com/peterjc/biopython/tree/py3fixes
Post by Peter Cock
I take it there are no objections to me merging this work?
If not, I will try to do it tomorrow (Sunday) or early next week,
and move on to the final issues before we can drop 2to3
(which I do not anticipate being problematic).
They are not troublesome, perhaps worth applying soon too?
(Can anyone propose a more elegant solution to __nonzero__
versus __bool__ than to just define both?)
This allows us to stop using 2to3 (something NumPy have
also managed in their recent NumPy 1.8 release), making
installation of Biopython from source under Python 3.3
faster and much simpler:
https://github.com/peterjc/biopython/tree/py3more
Regards,
Peter
Cool, no objections here. Maybe after the next stable release we can look
into finalizing and merging the new GSoC work?

As far as I can tell the best way to handle __nonzero__ vs. __bool__ is
what you indicated:

class Foo(object):
    def __bool__(self): return True
    __nonzero__ = __bool__

And presumably the 2to3 converter would need to be disabled at the same
time, or it might try to rename __nonzero__ to __bool__ and get confused.
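
As a slightly fuller sketch of the alias pattern, here is a hypothetical container class (not taken from Biopython) whose truth value depends on its contents. Python 3 calls __bool__ and Python 2 calls __nonzero__, so both names point at one method:

```python
class Foo(object):
    """Hypothetical container used only to illustrate the alias pattern."""

    def __init__(self, items):
        self.items = list(items)

    def __bool__(self):
        # Python 3 calls __bool__ for truth testing.
        return len(self.items) > 0

    # Python 2 looks for __nonzero__ instead, so point it at the same method.
    __nonzero__ = __bool__


empty = Foo([])
full = Foo(["ACGT"])
empty_is_true = bool(empty)
full_is_true = bool(full)
```

Defining the logic once and aliasing it avoids the two copies drifting apart, which is the risk with writing the method body twice.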

Cheers,
Eric
Peter Cock
2013-11-03 14:03:51 UTC
Permalink
Post by Eric Talevich
Post by Peter Cock
Post by Peter Cock
I take it there are no objections to me merging this work?
If not, I will try to do it tomorrow (Sunday) or early next week,
Applied https://github.com/biopython/biopython/pull/250
plus a follow up fix for Python 3 on Windows,
https://github.com/biopython/biopython/commit/9a265285e587f243814ffc05772b639051b31be1

(The buildbot server is earning its keep)
Post by Eric Talevich
Post by Peter Cock
Post by Peter Cock
and move on to the final issues before we can drop 2to3
(which I do not anticipate being problematic).
They are not troublesome, perhaps worth applying soon too?
(Can anyone propose a more elegant solution to __nonzero__
versus __bool__ than to just define both?)
This allows us to stop using 2to3 (something NumPy have
also managed in their recent NumPy 1.8 release), making
installation of Biopython from source under Python 3.3
faster and much simpler:
https://github.com/peterjc/biopython/tree/py3more
Cool, no objections here. Maybe after the next stable release
we can look into finalizing and merging the new GSoC work?
Yes please.
Post by Eric Talevich
As far as I can tell the best way to handle __nonzero__ vs. __bool__ is
what you indicated:
class Foo(object):
    def __bool__(self): return True
    __nonzero__ = __bool__
OK, I've updated my change to use that style rather than
my initial version which literally defined it twice:

class Foo(object):
    def __bool__(self): return True
    def __nonzero__(self): return True
Post by Eric Talevich
And presumably the 2to3 converter would need to be disabled at the same
time, or it might try to rename __nonzero__ to __bool__ and get confused.
Yes indeed. This is all done as one commit:

https://github.com/biopython/biopython/commit/d6aa77c43bf9bab6302d880ad458b46f80e21c5e

That leaves one remaining fixer, unicode, which is addressed
here https://github.com/peterjc/biopython/tree/py3more in part
by dropping Python 3.2 so that we can use unicode literals.
We deliberately didn't claim to support Python 3.2 in order to
allow us this option.
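
For context, the u"..." prefix was reinstated in Python 3.3 (PEP 414), while Python 3.0 to 3.2 reject it as a syntax error. A minimal sketch of a literal that is text on both Python 2 and Python 3.3+ follows. The sample string and the text_type name are illustrative only and not taken from the Biopython code:

```python
import sys

# On Python 2 this is a unicode object; on Python 3.3+ it is a plain str.
# Python 3.0 to 3.2 would reject this line with a SyntaxError.
greek = u"\u03b1-helix"

if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)

is_text = isinstance(greek, text_type)
```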

Tiago or I will need to update the buildbot server to drop
Python 3.2 before applying that change...

Regards,

Peter
Tiago Antao
2013-11-03 16:30:15 UTC
Permalink
At Sun, 3 Nov 2013 14:03:51 +0000,
Post by Peter Cock
Tiago or I will need to update the buildbot server to drop
Python 3.2 before applying that change...
Ready to roll
Peter Cock
2013-11-03 19:28:05 UTC
Permalink
Post by Tiago Antao
At Sun, 3 Nov 2013 14:03:51 +0000,
Post by Peter Cock
Tiago or I will need to update the buildbot server to drop
Python 3.2 before applying that change...
Ready to roll
Lovely, thank you :)

Unicode changes pushed to master:
https://github.com/biopython/biopython/commit/0f85dd5f5a0ffd2e8c1f0c45a890e43c4d689f49

At this point we're not using any 2to3 fixers, so we could
remove the do2to3.py script etc - which is what the remaining
fixes on this branch do:

https://github.com/peterjc/biopython/tree/py3more

However, before I do that, are there any pending commits
which would still need 2to3 in the short term?

Thanks,

Peter
Peter Cock
2013-11-05 20:58:12 UTC
Permalink
Post by Peter Cock
At this point we're not using any 2to3 fixers, so we could
remove the do2to3.py script etc - which is what the remaining
fixes on this branch do:
https://github.com/peterjc/biopython/tree/py3more
However, before I do that, are there any pending commits
which would still need 2to3 in the short term?
Pushed to master, Biopython now runs on Python 2 and
Python 3 natively without using the 2to3 converter :)

https://github.com/biopython/biopython/commit/6b24a509b13e9df3c41fc211646d24382685e050
https://github.com/biopython/biopython/commit/a9cb8c10f68fe6d8bdccf34d8217838cc9d4db7f

Now let's focus on Biopython 1.63 (beta), which Tiago has
kindly offered to manage :)

http://lists.open-bio.org/pipermail/biopython-dev/2013-November/010955.html

Peter
Peter Cock
2013-10-20 19:16:32 UTC
Permalink
Hi all,

I've made a pull request which solves all the remaining
Python 2 vs 3 imports using try/except:

https://github.com/biopython/biopython/pull/249

Some comments over on GitHub (or here) would be great.

Thanks,

Peter

[Yes, I've also got a 'six' branch which did this using
a bundled copy of the six library, but I'm not sure we
really need to bother with that. This seems lighter.]
Peter Cock
2013-10-14 15:00:46 UTC
Permalink
Hello all,

Despite a nasty cold, I've made further progress over the
weekend. Switching to assuming Python 3 style dictionaries
is the single biggest step forward - and as long as we have
good test coverage I think this is low risk. I think a dual
code base without needing 2to3 may be attainable for
the next Biopython release.
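
To illustrate what "Python 3 style dictionaries" can mean in practice, here is a minimal sketch of dictionary idioms that behave the same on Python 2 and Python 3. The data is made up, and this shows the general approach rather than the exact changes made in the Biopython branches:

```python
# Made-up example data: sequence identifier -> length.
lengths = {"seq1": 120, "seq2": 85, "seq3": 240}

# Membership: use "in" rather than the Python 2 only has_key() method.
has_seq2 = "seq2" in lengths

# Iteration: items() works on both versions, unlike iterkeys() and
# iteritems(), which do not exist on Python 3.
total = 0
for name, length in lengths.items():
    total += length

# When a real list is needed, ask for one explicitly, because keys()
# returns a view rather than a list on Python 3.
names = sorted(lengths)
first_name = list(lengths.keys())[0]
```

On Python 2 the items() call builds a temporary list, which costs some memory on very large dictionaries but keeps the code identical on both versions.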

However, before that, I'd like to take a moment to discuss
changing imports, e.g. Doc/examples/getgene.py

Do people prefer something explicit like this,

try:
    import gdbm # Python 2
except ImportError:
    from dbm import gnu as gdbm # Python 3

Or something via a helper library (e.g. our Bio._py3k or
a bundled copy of the six library):

from six.moves import dbm_gnu as gdbm

That's a rare example, something far more common is
StringIO, which also crops up in our doctests. e.g.
>>> from StringIO import StringIO

>>> try:
...     from StringIO import StringIO # Python 2
... except ImportError:
...     from io import StringIO # Python 3
...

Both via Bio._py3k, not ideal for a doctest as it is a
>>> from Bio._py3k import StringIO

Both via six, not ideal if we're bundling it as Bio._six
>>> from six import StringIO

Or, for a more common and more complex example, have a
look at how urllib has changed under Python 3. See some
of the commits here:

https://github.com/peterjc/biopython/tree/six

For docstrings, I actually prefer the explicit commented
version with the try/except. For the main code, using a
central helper like Bio._py3k or a bundled copy of six
makes sense from a code management perspective -
it would ensure consistency (and be easy to remove
once we drop Python 2 support).
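
A minimal sketch of what such a central helper could contain, assuming a single private module that gathers the try/except imports in one place. The selection of names is illustrative and is not the actual content of Bio._py3k:

```python
# Hypothetical contents of a central compatibility module: every Python 2
# versus Python 3 import difference is resolved here, once.
import sys

PY3 = sys.version_info[0] >= 3

try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3

try:
    from urllib2 import urlopen  # Python 2
    from urllib import urlencode  # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3
    from urllib.parse import urlencode  # Python 3

# The rest of the code base would import these names from the helper rather
# than repeating the try/except blocks in each file.
query = urlencode({"db": "protein"})
handle = StringIO(u"ACGT\n")
line = handle.readline()
```

When Python 2 support is eventually dropped, only this one module needs to change, or each import of it can be replaced by the plain Python 3 import.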

Any thoughts?

Thanks,

Peter
Peter Cock
2013-10-06 21:50:18 UTC
Permalink
Post by Peter Cock
Post by Peter Cock
387 no change, 161 need fixers
Troublesome ones 29.4%
The new batch of 2to3 issues solved is changes to
built in functions like range, zip, map, filter. Branch:
https://github.com/peterjc/biopython/tree/builtins
https://github.com/biopython/biopython/pull/246
I've added basestring and input to the builtins branch
(pull request updated), helps even more.
However, I realised I am effectively reimplementing the
MIT licensed 'six' library with 'Bio._py3k' and it would
be simpler to just use that instead (and that would make
life easier for contributors already using 'six' on other
projects):
https://pypi.python.org/pypi/six/
https://bitbucket.org/gutworth/six
http://pythonhosted.org/six/
Expect a slight reworking of these branches to appear
later, bundling a copy of 'six' ...
New branch is https://github.com/peterjc/biopython/tree/six
with 'six' bundled and using this for more import fixes.

Using that work, we're now at under a quarter of the files
needing 2to3 changes using the modified do2to3.py,
https://github.com/peterjc/biopython/tree/mark2to3c
https://travis-ci.org/peterjc/biopython/builds/12208302

416 no change, 132 need fixers
Troublesome ones 24.1%
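
The percentages can be checked with a few lines, assuming "troublesome" is simply the number of files needing fixers divided by the total file count (an assumption, but one that reproduces the figures quoted here):

```python
def troublesome_percent(no_change, need_fixers):
    """Return the share of files still needing 2to3 fixers, as a percentage."""
    total = no_change + need_fixers
    return 100.0 * need_fixers / total


# Counts taken from the reports in this thread.
latest = round(troublesome_percent(416, 132), 1)
previous = round(troublesome_percent(387, 161), 1)
```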

Progress :)

Peter
Peter Cock
2013-10-06 14:03:00 UTC
Permalink
Post by Peter Cock
387 no change, 161 need fixers
Troublesome ones 29.4%
The new batch of 2to3 issues solved is changes to
built in functions like range, zip, map, filter. Branch:
https://github.com/peterjc/biopython/tree/builtins
https://github.com/biopython/biopython/pull/246
I've added basestring and input to the builtins branch
(pull request updated), helps even more.
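
One common way to provide these two names on both versions is sketched below. This is an illustration of the general technique and not necessarily what the builtins branch does. The is_text helper is hypothetical:

```python
try:
    basestring = basestring  # Python 2: common base class of str and unicode
except NameError:
    basestring = str  # Python 3: there is only one text type

try:
    input = raw_input  # Python 2: raw_input returns the typed text unchanged
except NameError:
    pass  # Python 3: the built in input already behaves that way


def is_text(value):
    """Hypothetical helper: True for any text string on either version."""
    return isinstance(value, basestring)
```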

However, I realised I am effectively reimplementing the
MIT licensed 'six' library with 'Bio._py3k' and it would
be simpler to just use that instead (and that would make
life easier for contributors already using 'six' on other
projects):

https://pypi.python.org/pypi/six/
https://bitbucket.org/gutworth/six
http://pythonhosted.org/six/

Expect a slight reworking of these branches to appear
later, bundling a copy of 'six' as Bio/_py3k/__init__.py

Peter
Peter Cock
2013-10-05 21:36:25 UTC
Permalink
Post by Peter Cock
Post by Peter Cock
Post by Peter Cock
Assuming my methodology isn't flawed, we're about half way
in terms of getting every file in Biopython to be dual Python 2
and Python 3 code:
262 no change, 290 need fixers
Troublesome ones at 52.5%
New numbers with Bio._py3k.urllib changes which should
have dropped the number of troublesome files by at most
13 files:
374 no change, 177 need fixers
Troublesome ones 32.1%
I think my markup script is a bit fragile in terms of the exact
sequence of steps with do2to3.py etc. But much better
numbers than Sunday night :)
I wasn't using the -B switch in diff until now, that makes
things easier:
383 no change, 171 need fixers
Troublesome ones 30.9%
https://github.com/peterjc/biopython/tree/mark2to3b
https://travis-ci.org/peterjc/biopython/builds/12175589
This is rebased on the master where I've also cut down the
number of fixers in use, so together we get a good speed
up for the Python 3 install time.
I've rebased the urllib changes (included in the above
test branch) and made a pull request for comment:
https://github.com/biopython/biopython/pull/245
Peter
Incorporating another new feature branch gives:

387 no change, 161 need fixers
Troublesome ones 29.4%

The new batch of 2to3 issues solved is changes to
built-in functions like range, zip, map, filter. Branch:
https://github.com/peterjc/biopython/tree/builtins
https://github.com/biopython/biopython/pull/246
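
As a rough illustration of this class of change (the data here is made up, and this is not taken from the branch itself), wrapping the built-in calls in list(...) gives the same result on Python 2 and Python 3, so 2to3 has nothing left to rewrite:

```python
names = ["alpha", "beta", "gamma"]
lengths = [5, 4, 5]

# On Python 3 these built-ins return lazy objects rather than lists.
# Wrapping them in list(...) gives a real list on both versions.
pairs = list(zip(names, lengths))
indices = list(range(len(names)))
upper = list(map(str.upper, names))
long_names = list(filter(lambda name: len(name) > 4, names))

# A list comprehension is often clearer than map or filter.
upper_again = [name.upper() for name in names]

print(pairs)
```

Where the result is only looped over once, the list(...) wrapper is not needed at all.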

Peter
Peter Cock
2013-10-05 19:02:47 UTC
Permalink
Post by Peter Cock
Post by Peter Cock
Assuming my methodology isn't flawed, we're about half way
in terms of getting every file in Biopython to be dual Python 2
and Python 3 code:
262 no change, 290 need fixers
Troublesome ones at 52.5%
New numbers with Bio._py3k.urllib changes which should
have dropped the number of troublesome files by at most
13 files:
374 no change, 177 need fixers
Troublesome ones 32.1%
I think my markup script is a bit fragile in terms of the exact
sequence of steps with do2to3.py etc. But much better
numbers than Sunday night :)
I wasn't using the -B switch in diff until now, that makes
things easier:

383 no change, 171 need fixers
Troublesome ones 30.9%

Revised branch here:

https://github.com/peterjc/biopython/tree/mark2to3b
https://travis-ci.org/peterjc/biopython/builds/12175589

This is rebased on the master where I've also cut down the
number of fixers in use, so together we get a good speed
up for the Python 3 install time.

I've rebased the urllib changes (included in the above
test branch) and made a pull request for comment:
https://github.com/biopython/biopython/pull/245

Peter
Peter Cock
2013-09-30 16:18:21 UTC
Permalink
Post by Peter Cock
Assuming my methodology isn't flawed, we're about half way
in terms of getting every file in Biopython to be dual Python 2
and Python 3 code:
262 no change, 290 need fixers
Troublesome ones at 52.5%
New numbers with Bio._py3k.urllib changes which should
have dropped the number of troublesome files by at most
13 files:

374 no change, 177 need fixers
Troublesome ones 32.1%

I think my markup script is a bit fragile in terms of the exact
sequence of steps with do2to3.py etc. But much better
numbers than Sunday night :)

Revised branch here:
https://github.com/peterjc/biopython/tree/mark2to3a
https://github.com/peterjc/biopython/commit/14f9ff121532ff92ec7bacc1867bdd058a6e8f74

Build and test times on the master vs this branch are
looking a lot better for Python 3 (although the numbers
for different TravisCI runs are not directly comparable),
and there is still a lot of room for improvement:

master:
https://travis-ci.org/biopython/biopython/builds/11965000

branch:
https://travis-ci.org/peterjc/biopython/builds/11968132

So that's good. However, are these urllib import fixes
an acceptable way forwards? Included in the above
branch and here:

https://github.com/peterjc/biopython/tree/urllib
https://github.com/peterjc/biopython/commit/1305387a5d98a5f3c7b83ca3db580b9e63dba851

Thanks,

Peter
Peter Cock
2013-09-29 23:22:52 UTC
Permalink
Post by Peter Cock
Post by Eric Talevich
Post by Peter Cock
Post by Peter Cock
# This file targets both Python 2 and Python 3 at the same time
# TODO - Targets Python 2 only (use 2to3 to run under Python 3)
Does the special comment line seem like a good solution?
On the plus side, it tracks any changes with the file being
updated (which wouldn't happen with a list in the do2to3.py
file).
Hi Peter,
This looks like a good way to move forward overall. Regarding the special
comment lines -- since these are only used in do2to3.py, would it be
cleaner to keep a hard-coded list of filenames in do2to3.py and leave the
modules and scripts alone? Are there any characteristics that would make it
difficult to determine whether a given module or script is Py3-compliant?
Hi Eric,
There are import time problems which are easy to spot - in particular
SyntaxError is a good clue. However, many of the issues are only
really found at run time (e.g. different method names). This means
that the tests (which I started with) are actually the easiest to check.
Right now I don't have a feel for what fraction of the main Bio/* and
BioSQL/* files can be made dual-coding, and that would have an
influence on how best to tag things needing 2to3 or not. I'm happy
to continue this on branches for a while longer and find out.
Assuming my methodology isn't flawed, we're about half way
in terms of getting every file in Biopython to be dual Python 2
and Python 3 code:

262 no change, 290 need fixers
Troublesome ones at 52.5%

This is based on there being a difference between the pre-
and post-2to3 conversion (discounting the removal of future
imports). This is an overestimate, as the 2to3 script often
makes unnecessary changes.
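
The counting approach can be sketched roughly as follows. The function names and sample text are hypothetical, and ignoring blank lines is an assumption of this sketch; the real markup script works on actual 2to3 output rather than hand-written strings:

```python
def normalise(source):
    """Drop blank lines and __future__ imports before comparing."""
    kept = []
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # assumption: blank line changes are not counted
        if stripped.startswith("from __future__ import"):
            continue  # 2to3 removes these, so discount them
        kept.append(line.rstrip())
    return kept


def needs_fixers(before, after):
    """Return True if the 2to3 output differs from the original."""
    return normalise(before) != normalise(after)


def percent_troublesome(no_change, need_fixers):
    """Percentage of files still changed by 2to3."""
    return 100.0 * need_fixers / (no_change + need_fixers)


# Hypothetical before/after text for a single file
before = "from __future__ import print_function\n\nprint('hello')\n"
after = "print('hello')\n"
unchanged = not needs_fixers(before, after)

# Using the counts quoted above
print("%0.1f%%" % percent_troublesome(262, 290))
```

With the counts above (262 unchanged, 290 changed) this gives the 52.5% figure.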

This is after applying a *lot* of little changes to our codebase,
things like removing unneeded use of my_dict.keys(), which
the 2to3 fixers are overcautious in wrapping as list(my_dict.keys()).
I would like to do a beta before the next release.
Post by Peter Cock
I do like the idea of a special #TODO comment line where 2to3
is still needed - it is symbolic of where I want the code base to go ;)
That's what is going on in this revised branch - if the special
#TODO comment is there, then 2to3 is used, otherwise we
assume the file is already OK to use under Python 3:

https://github.com/peterjc/biopython/tree/mark2to3

This is now quicker to install under Python 3, but there is
plenty of scope for speed optimisation (e.g. requiring that the
magic marker is in the first (say) 20 lines of the file, and
expanding the magic marker to list the specific 2to3 fixers
required and running just those).
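
A minimal sketch of such a marker check, limited to the first 20 lines, might look like this. The marker text is the one quoted in this thread, but the needs_2to3 helper is hypothetical and is not the code in do2to3.py:

```python
import os
import tempfile

# Marker text as quoted in this thread; the helper is hypothetical.
MARKER = "# TODO - Targets Python 2 only"


def needs_2to3(filename, max_lines=20):
    """Return True if the marker is in the first max_lines lines."""
    with open(filename) as handle:
        for index, line in enumerate(handle):
            if index >= max_lines:
                break  # stop early, no need to read the whole file
            if line.startswith(MARKER):
                return True
    return False


# Small demonstration using a temporary file
fd, path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "w") as handle:
    handle.write("# Example module\n")
    handle.write(MARKER + " (use 2to3 to run under Python 3)\n")
tagged = needs_2to3(path)
os.remove(path)
print(tagged)
```

Stopping after a fixed number of lines keeps the check cheap even for large modules.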

Regards,

Peter
Peter Cock
2013-09-08 20:52:40 UTC
Permalink
Post by Eric Talevich
Post by Peter Cock
Post by Peter Cock
# This file targets both Python 2 and Python 3 at the same time
# TODO - Targets Python 2 only (use 2to3 to run under Python 3)
Does the special comment line seem like a good solution?
On the plus side, it tracks any changes with the file being
updated (which wouldn't happen with a list in the do2to3.py
file).
Hi Peter,
This looks like a good way to move forward overall. Regarding the special
comment lines -- since these are only used in do2to3.py, would it be
cleaner to keep a hard-coded list of filenames in do2to3.py and leave the
modules and scripts alone? Are there any characteristics that would make it
difficult to determine whether a given module or script is Py3-compliant?
Hi Eric,

There are import time problems which are easy to spot - in particular
SyntaxError is a good clue. However, many of the issues are only
really found at run time (e.g. different method names). This means
that the tests (which I started with) are actually the easiest to check.
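
For the import time class of problem, a quick check is to see whether the source compiles at all. This sketch uses the built-in compile function; the helper name and sample strings are made up for illustration, and the second result assumes the check is run under Python 3:

```python
def compiles_ok(source, filename="<example>"):
    """Return True if the source compiles under the running Python."""
    try:
        compile(source, filename, "exec")
    except SyntaxError:
        return False
    return True


dual = "print('hello')\n"
python2_only = "print 'hello'\n"

print(compiles_ok(dual))
print(compiles_ok(python2_only))
```

This only catches syntax problems. Run time differences such as renamed methods still need the unit tests.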

Right now I don't have a feel for what fraction of the main Bio/* and
BioSQL/* files can be made dual-coding, and that would have an
influence on how best to tag things needing 2to3 or not. I'm happy
to continue this on branches for a while longer and find out.

I do like the idea of a special #TODO comment line where 2to3
is still needed - it is symbolic of where I want the code base to go ;)

Regards,

Peter
Eric Talevich
2013-09-07 19:17:08 UTC
Permalink
Post by Peter Cock
Post by Peter Cock
This branch is trying out marking individual Python files
as dual coding (Python 2 and Python 3) or as Python 2
only requiring conversion via 2to3 for use on Python 3:
https://github.com/peterjc/biopython/tree/tag2to3
Currently the tags are two special hash comment lines
expected near the start of the file itself (rather than a
list within the do2to3.py script). The actual text of the
marker isn't critical - perhaps these need full stops?
# This file targets both Python 2 and Python 3 at the same time
# TODO - Targets Python 2 only (use 2to3 to run under Python 3)
[...]
Post by Peter Cock
Post by Peter Cock
As of right now, on this branch there are only 8 files under
Tests which require conversion via 2to3.
Down to six files under Tests now if I rebase the branch
to include the recent fixes on the master.
Post by Peter Cock
Having I hope demonstrated this will work, I'd like some
feedback before applying this (or a modified version of
it) to the master branch.
I've started applying individual code fixes to the master
to improve Python 2 and 3 compatibility already.
I'm specifically looking for thoughts on how to handle
the transition period when some of our code will still
need 2to3, while other code will not.
Does the special comment line seem like a good solution?
On the plus side, it tracks any changes with the file being
updated (which wouldn't happen with a list in the do2to3.py
file).
Peter
Hi Peter,

This looks like a good way to move forward overall. Regarding the special
comment lines -- since these are only used in do2to3.py, would it be
cleaner to keep a hard-coded list of filenames in do2to3.py and leave the
modules and scripts alone? Are there any characteristics that would make it
difficult to determine whether a given module or script is Py3-compliant?

-Eric
Peter Cock
2013-09-07 11:30:50 UTC
Permalink
Post by Peter Cock
Post by Peter Cock
Short term we will continue with developing using Python 2
syntax and running 2to3 for Python 3. As far as I know,
the reverse process with 3to2 is not well established. If
anyone wants to investigate that would be useful as
another option. However, dropping Python 2.5 support
makes things more flexible...
Medium term I believe it would be possible to have a single
code base which is both valid Python 2 and 3 at the same
time. This may require us to target 2.7 and 3.3+ only - we'll
have to try it and see if Python 2.6 will hold us back.
I've actually done this with lzma.backports, a small but
https://pypi.python.org/pypi/backports.lzma/
https://github.com/peterjc/backports.lzma
Python 3.3 reintroduces some features designed to make
this more straightforward, like unicode literals (missing in
the early versions of Python 3). This is why I'd like to drop
Python 3.2 as soon as possible.
What I was thinking is we can start migrating modules on a
case by case basis from "Python 2 syntax" to "Dual syntax"
one by one, with a white-list in the do2to3.py script. That
way over time less and less modules need to be converted
via 2to3, and "python3 setup.py install" will get faster,
until eventually we can stop using 2to3 at all.
This conversion could consider the code and doctests
separately. However, using print(example) we can
hopefully get most of the doctests and Tutorial examples
to work under both Python 2 and 3 at the same time.
That's my current thinking anyway - and I think the fact
that it would be a gradual migration from writing Python 2
specific code to writing dual 2/3 code makes it low risk
(as long as we're continuing to run regular testing).
Regards,
Peter
This branch is trying out marking individual Python files
as dual coding (Python 2 and Python 3) or as Python 2
only requiring conversion via 2to3 for use on Python 3:
https://github.com/peterjc/biopython/tree/tag2to3
Currently the tags are two special hash comment lines
expected near the start of the file itself (rather than a
list within the do2to3.py script). The actual text of the
marker isn't critical - perhaps these need full stops?
# This file targets both Python 2 and Python 3 at the same time
# TODO - Targets Python 2 only (use 2to3 to run under Python 3)
The first main issues thus far have been print statements,
where we will either need to use the __future__ import or
restrict ourselves to simple single argument calls - I have
been using the later. This should not be a big problem on the
main code, and we ought to update the print-and-compare
unit tests anyway,
e.g.
https://github.com/biopython/biopython/commit/6fa766e2348eae4e083503885f4ea5b66f531d7a
Post by Peter Cock
The next common issue is import statements, for
example StringIO (another bytes versus unicode issue).
That can be handled via Bio._py3k in some cases.
For StringIO,
https://github.com/biopython/biopython/commit/b09ebbf6f8c4032f874d89a91d199d8697c2d381

For commands.getoutput used in many tests,
https://github.com/biopython/biopython/commit/11a1eca60e7a1491dbe54204ad3103e013bfebc5
Post by Peter Cock
A third major class of issues in the unit tests so
far is iterators versus lists, for example dictionary
methods and the map function's return value. These
can be tackled on a case by case basis I think - often
by adding the occasional list(...) or sorted(x) instead
of trying x.sorted() is enough.
e.g. for sorting dictionary keys,
https://github.com/biopython/biopython/commit/b27f30012af6e66f6f143ecde719bf72609af8f2

e.g. for avoiding iterators from map function,
https://github.com/biopython/biopython/commit/730850e3f4e88a70860e56abafbb579b25414f06
Post by Peter Cock
There are also quite a few instances of 'basestring'
which might be handled via _py3k?
As of right now, on this branch there are only 8 files under
Tests which require conversion via 2to3.
Down to six files under Tests now if I rebase the branch
to include the recent fixes on the master.
Post by Peter Cock
Having I hope demonstrated this will work, I'd like some
feedback before applying this (or a modified version of
it) to the master branch.
I've started applying individual code fixes to the master
to improve Python 2 and 3 compatibility already.

I'm specifically looking for thoughts on how to handle
the transition period when some of our code will still
need 2to3, while other code will not.

Does the special comment line seem like a good solution?
On the plus side, it tracks any changes with the file being
updated (which wouldn't happen with a list in the do2to3.py
file).

Peter
Peter Cock
2013-09-06 15:44:44 UTC
Permalink
Post by Peter Cock
http://lists.open-bio.org/pipermail/biopython/2013-May/008601.html
Thank you for all the comments so far, don't stop yet :)
On Thu, May 30, 2013 at 1:51 PM, Wibowo Arindrarto
Hi everyone,
I'm leaning towards insisting on Python >=3.3 support (I'm running
3.3.2). I suppose that even if Python3.3 is not available on a machine
or through the default package manager, it's always installable on its
own. If that's not the case, I imagine Python2.x is most likely
present in these machines (so Biopython can still be used).
True.
So far everyone who has replied (including some off list) have said
they are using Python 3.3 which is encouraging. Thank you for
the comments so far.
It looks like we can forget about Python 3.1, and just need to
decide if it is worth including Python 3.2.5 in the short term.
On a related note, do we have a defined timeline on when we
would drop support for Python2.x? Are there any plans to have
our codebase written in Python3.x instead of Python2.x?
Nothing concrete planned, no. I'll reply in more detail on the
biopython-dev list as I do have some thoughts about this.
Good question Bow,
I think people will still be using Python 2 a year or two from
now, so we must support both for some time.
Biopython 1.62 (next week perhaps?)
- Final release with Python 2.5 support
- Official support for Python 2.5, 2.6, 2.7 and 3.3
- Possibly official support for Python 3.2.5+ as well?
(Exactly which versions of Python 3 we'll include to be
decided, see the other thread for that discussion.)
Short term we will continue with developing using Python 2
syntax and running 2to3 for Python 3. As far as I know,
the reverse process with 3to2 is not well established. If
anyone wants to investigate that would be useful as
another option. However, dropping Python 2.5 support
makes things more flexible...
Medium term I believe it would be possible to have a single
code base which is both valid Python 2 and 3 at the same
time. This may require us to target 2.7 and 3.3+ only - we'll
have to try it and see if Python 2.6 will hold us back.
I've actually done this with lzma.backports, a small but
https://pypi.python.org/pypi/backports.lzma/
https://github.com/peterjc/backports.lzma
Python 3.3 reintroduces some features designed to make
this more straightforward, like unicode literals (missing in
the early versions of Python 3). This is why I'd like to drop
Python 3.2 as soon as possible.
What I was thinking is we can start migrating modules on a
case by case basis from "Python 2 syntax" to "Dual syntax"
one by one, with a white-list in the do2to3.py script. That
way over time less and less modules need to be converted
via 2to3, and "python3 setup.py install" will get faster,
until eventually we can stop using 2to3 at all.
This conversion could consider the code and doctests
separately. However, using print(example) we can
hopefully get most of the doctests and Tutorial examples
to work under both Python 2 and 3 at the same time.
That's my current thinking anyway - and I think the fact
that it would be a gradual migration from writing Python 2
specific code to writing dual 2/3 code makes it low risk
(as long as we're continuing to run regular testing).
Regards,
Peter
This branch is trying out marking individual Python files
as dual coding (Python 2 and Python 3) or as Python 2
only requiring conversion via 2to3 for use on Python 3:

https://github.com/peterjc/biopython/tree/tag2to3

Currently the tags are two special hash comment lines
expected near the start of the file itself (rather than a
list within the do2to3.py script). The actual text of the
marker isn't critical - perhaps these need full stops?

# This file targets both Python 2 and Python 3 at the same time
# TODO - Targets Python 2 only (use 2to3 to run under Python 3)

The first main issue thus far has been print statements,
where we will either need to use the __future__ import or
restrict ourselves to simple single argument calls - I have
been using the latter. This should not be a big problem in the
main code, and we ought to update the print-and-compare
unit tests anyway.
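
As an illustration of the single argument style (the variable names are made up), the text is formatted first and then passed to print as one value, which behaves the same on Python 2 and Python 3:

```python
name = "example.fasta"
count = 3

# Python 2 only:
#     print "Found", count, "records in", name
# Dual syntax: build the text first, then make a single argument call.
message = "Found %i records in %s" % (count, name)
print(message)
```

The alternative is to put "from __future__ import print_function" at the top of the file (available from Python 2.6), after which multiple argument print calls also work on both.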

The next common issue is import statements, for
example StringIO (another bytes versus unicode issue).
That can be handled via Bio._py3k in some cases.

A third major class of issues in the unit tests so
far is iterators versus lists, for example dictionary
methods and the map function's return value. These
can be tackled on a case by case basis I think - often
adding the occasional list(...), or using sorted(x) instead
of x.sort(), is enough.
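
For the dictionary case, a small sketch with made-up data shows the kind of change involved:

```python
counts = {"G": 2, "A": 5, "T": 1, "C": 3}

# Python 2 only:
#     keys = counts.keys()
#     keys.sort()
# On Python 3, keys() returns a view object which has no sort method.

# Works on both, and always returns a new sorted list:
keys = sorted(counts)
values = [counts[key] for key in keys]

print(keys)
```

Since sorted accepts any iterable, it does not matter whether keys() gives back a list or a view.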

There are also quite a few instances of 'basestring'
which might be handled via _py3k?
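
One way a helper module could provide this is sketched below. It is an assumption about how it might be done, not the actual contents of Bio._py3k, and the is_text function is only there to show usage:

```python
try:
    basestring  # Python 2 has this as a built-in
except NameError:
    basestring = str  # Python 3: all text strings are str


def is_text(value):
    """Return True for string-like values on Python 2 or 3."""
    return isinstance(value, basestring)


print(is_text("ACGT"))
```

Code elsewhere could then import basestring from the helper rather than relying on the built-in name.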

As of right now, on this branch there are only 8 files under
Tests which require conversion via 2to3:

Tests/common_BioSQL.py
Tests/seq_tests_common.py
Tests/test_NCBI_qblast.py
Tests/test_SCOP_Cla.py
Tests/test_seq.py
Tests/test_SeqIO.py
Tests/test_SeqIO_index.py
Tests/test_Uniprot.py

Having, I hope, demonstrated that this will work, I'd like some
feedback before applying this (or a modified version of
it) to the master branch.

Any thoughts? Thanks,

Peter