Re: Obsolescence Python 3.5 support plan
From: Eric L. Zolf
Subject: Re: Obsolescence Python 3.5 support plan
Date: Wed, 10 Jun 2020 07:31:46 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.8.0
Hi Derek,
On 09/06/2020 16:24, Derek Atkins wrote:
> EricZolf <ewl+rdiffbackup@lavar.de> writes:
>
>> Actually, while writing this e-mail, I checked the code and noticed that
>> it's enforcing pickle version 1 so the issue isn't the pickle protocol,
>> the issue is solely the bytes vs. str vs. unicode change between python
>> 2 and 3. There is not much we could have done about it IMHO.
>
> I am not a python guru, but I am a long-time programmer in multiple
> languages and lived through the ISO-8859 -> UTF-8 changes a couple
> decades ago and remember all the hoops we had to jump through to keep
> compatibility. Could you explain what the "bytes vs str vs unicode"
> issue is/was and why there is (was?) no way to perform a conversion?
Nothing is impossible; the question is rather how much effort you are
willing to put into it.
You can find zillions of explanations on the web about what happened
between Python 2 and Python 3 (py2 and py3 for short), but here is my
view in short:
- despite the same name, str in py2 is not str in py3 but rather bytes
(but not quite, sometimes they're the same conceptually, but still
different objects)
- unicode in py2 disappeared in py3 and became a sub-case of str in py3
- so unicode + str in py2 became str + bytes in py3 with an imperfect
overlap, and no way to absolutely decide how to convert.
- i.e. the context needs to be considered to decide whether a py2-str
needs to become a py3-str or a py3-bytes. This would mean analyzing the
code line by line and deciding each case based on its context. The
effort goes through the roof!
- to make things even more complicated, as they are not the same objects
even if they are called the same, pickle interprets them differently:
* Python 3:
>>> pickle.dumps('xxx',1)
b'X\x03\x00\x00\x00xxxq\x00.'
>>> pickle.dumps(b'xxx',1)
b'c_codecs\nencode\nq\x00(X\x03\x00\x00\x00xxxq\x01X\x06\x00\x00\x00latin1q\x02tq\x03Rq\x04.'
>>> pickle.dumps(u'xxx',1)
b'X\x03\x00\x00\x00xxxq\x00.'
* Python 2:
>>> pickle.dumps('xxx',1)
'U\x03xxxq\x00.'
>>> pickle.dumps(b'xxx',1)
'U\x03xxxq\x00.'
>>> pickle.dumps(u'xxx',1)
'X\x03\x00\x00\x00xxxq\x00.'
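To illustrate the points above, here is a minimal sketch (Python 3) of what happens when py3 reads protocol-1 pickles written by py2. The first two byte strings are copied from the py2 session above; the non-ASCII sample and the helper name load_all_ways are made up for illustration. The encoding argument of pickle.loads is how the standard library lets you choose how py2 str objects are interpreted, and the point is that no single choice is right for every value:

```python
import pickle

# Protocol-1 pickles as written by Python 2 (first two taken from the session above).
PY2_STR = b'U\x03xxxq\x00.'                  # py2 str 'xxx'
PY2_UNICODE = b'X\x03\x00\x00\x00xxxq\x00.'  # py2 unicode u'xxx'
PY2_STR_NON_ASCII = b'U\x01\xe9q\x00.'       # hypothetical py2 str '\xe9'


def load_all_ways(data):
    """Unpickle `data` under each encoding choice; None marks a decode failure."""
    results = {}
    for enc in ('ASCII', 'latin1', 'bytes'):
        try:
            results[enc] = pickle.loads(data, encoding=enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


as_str = load_all_ways(PY2_STR)
as_unicode = load_all_ways(PY2_UNICODE)
non_ascii = load_all_ways(PY2_STR_NON_ASCII)

for label, res in (('py2 str', as_str),
                   ('py2 unicode', as_unicode),
                   ('py2 str, non-ASCII', non_ascii)):
    print(label, res)
```

A py2 unicode always comes back as a py3 str, but a py2 str comes back as str or bytes depending on the encoding you pick, and with the default (ASCII) a non-ASCII py2 str cannot be loaded at all. Which result is "correct" depends on what the value meant in the original program.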
And then there was the requirement to also support files with broken
encoding, which came up during the migration, and the misery was
complete: this forced the use of bytes, a type with no real counterpart
in py2, where bytes is merely an alias for str...
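As a rough illustration of why broken encodings push you towards bytes, here is a small Python 3 sketch. It assumes the problem shows up in a file name (the name below is hypothetical) and that UTF-8 is the expected encoding; it is not taken from the rdiff-backup code:

```python
# Hypothetical file name encoded in Latin-1, which is not valid UTF-8.
raw_name = b'caf\xe9.txt'


def try_strict_utf8(name):
    """Return the decoded name, or None if it is not valid UTF-8."""
    try:
        return name.decode('utf-8')
    except UnicodeDecodeError:
        return None


# Strict decoding fails, so the name cannot be held as an ordinary str.
strict = try_strict_utf8(raw_name)

# The surrogateescape error handler smuggles the bad byte through as a
# lone surrogate, so the original bytes can be recovered later.
escaped = raw_name.decode('utf-8', 'surrogateescape')
round_trip = escaped.encode('utf-8', 'surrogateescape')

print(strict)                   # None
print(ascii(escaped))           # 'caf\udce9.txt'
print(round_trip == raw_name)   # True
```

The escaped str still cannot be encoded with the strict handler (for printing or logging, say), which is one reason keeping such values as bytes throughout can be the simpler choice.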
Hope this helps understand the complexity of what you are/were asking.
KR, Eric
>
> -derek
>