espressomd-users

Re: [ESPResSo-users] Checkpoint Read-Fail in Python Version


From: Georg Rempfer
Subject: Re: [ESPResSo-users] Checkpoint Read-Fail in Python Version
Date: Wed, 17 May 2017 15:42:33 +0200

If every checkpoint automatically contained the git commit id, the user would at least have a way of running their old simulations.
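
As a rough sketch of that idea (not how ESPResSo's checkpointing module actually works), the commit id could be written as a separate small pickle ahead of the state, so it stays readable even when the state itself can no longer be unpickled. The function names and the way the commit id is obtained are assumptions made for illustration; the id used below is just an example value.

```python
import io
import pickle


def write_checkpoint(stream, state, commit_id):
    """Write a header pickle followed by the state pickle (hypothetical helper)."""
    # The header holds only built-in types, so any later version can read it,
    # even if the classes stored in `state` have changed in the meantime.
    pickle.dump({"commit_id": commit_id}, stream)
    pickle.dump(state, stream)


def read_commit_id(stream):
    """Read only the header; the state pickle that follows is left untouched."""
    return pickle.load(stream)["commit_id"]


# Usage: an in-memory buffer stands in for the checkpoint file.
buf = io.BytesIO()
write_checkpoint(buf, {"time": 12.5}, "a08b681")

buf.seek(0)
commit = read_commit_id(buf)   # tells the user which commit to check out
state = pickle.load(buf)       # second pickle in the same stream
```

If loading the state fails, the user can still read the header and check out the matching commit to continue the old simulation.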

On Wed, May 17, 2017 at 3:20 PM, Michael Kuron <address@hidden> wrote:
Dear Joost,

this appears to be related to my recent change, https://github.com/espressomd/espresso/pull/1176, which was necessary to fix some issues with the documentation. I am able to reproduce the issue in isolation with the code below. You seem to have run into https://stackoverflow.com/questions/10036565/possible-to-unpickle-class-instances-after-being-converted-from-old-to-new-style, for which there is no usable workaround.

###################################################
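import pickle  # Python 2: pickle.dumps() returns a str, so .replace() works

class Newstyle(object):
    def __init__(self, num):
        self.num = num

class Oldstyle:
    def __init__(self, num):
        self.num = num

o = Oldstyle(1)
n = Newstyle(1)

# Pickle an old-style instance, then pretend its class has since become
# new-style. Loading then fails because __init__ is called without arguments.
c = pickle.loads(pickle.dumps(o).replace("Oldstyle", "Newstyle"))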
###################################################

In general, checkpoints obtained via pickle should currently be considered incompatible between different ESPResSo versions. This isn't limited to the specific problem you ran into (where you got an error message), but can also occur when the data contents of a class change between versions: if a new member variable is added to a class, and you load a checkpoint that was created without that variable present, the variable will be missing (if it is created from inside a member function) or set to its default value (if it is created from the class body). The former would lead to an AttributeError as soon as the variable is accessed, while the latter could silently lead to behavior different from what you expect.
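
The two cases can be sketched in plain Python 3 as follows. The Particle class and its members are hypothetical and only stand in for a class whose definition changed between versions; redefining the class under the same name imitates loading an old checkpoint with newer code.

```python
import pickle


class Particle:                 # "old version" of a hypothetical class
    def __init__(self):
        self.pos = 0.0


old_checkpoint = pickle.dumps(Particle())


class Particle:                 # "new version": same name, two new members
    charge = 0.0                # created in the class body

    def __init__(self):
        self.pos = 0.0
        self.mass = 1.0         # created inside a member function


# pickle restores the saved instance dictionary without calling __init__,
# so `mass` is never set on the restored object.
restored = pickle.loads(old_checkpoint)

print(restored.charge)          # falls back to the class-body default
try:
    restored.mass
except AttributeError as err:
    print("missing member:", err)
```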

The short-term solution for you would be to run "git revert a53109b" on your local repository. This fixes the specific error you are seeing, but there is a small chance that you will still run into the (hypothetical) issue I described in the previous paragraph.

There are three solutions to fix the issue going forward that I can think of:
1. We can accept that checkpoints are version-specific. This means that checkpoint.load should display a warning if a checkpoint from a different version is loaded.
2. We can switch to a different storage mechanism not based on pickle.
3. We can make a policy that requires everyone who wants to add a new member variable to a class to ensure that the behavior of the class is unchanged if the variable is either missing or set to its default value.
#1 violates common expectations. #2 would reinstate something like the old blockfile format and be a lot less convenient than pickle. #3 puts a big burden on pull request reviewers and it may not always be obvious when something breaks compatibility.
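
For #3, one way such a policy could look in pure Python is a __setstate__ method that fills in defaults for members an older checkpoint does not contain. This is only a sketch under assumptions: the Thermostat class, its members, and the _DEFAULTS table are hypothetical, and ESPResSo's Cython classes may need a different mechanism.

```python
import pickle


class Thermostat:               # "old version" of a hypothetical class
    def __init__(self, kT):
        self.kT = kT


old_checkpoint = pickle.dumps(Thermostat(1.0))


class Thermostat:               # "new version" with an added member
    # Defaults for every member added after the first release (assumed table).
    _DEFAULTS = {"gamma": 1.0}

    def __init__(self, kT, gamma=1.0):
        self.kT = kT
        self.gamma = gamma

    def __setstate__(self, state):
        # Start from the defaults, then overlay what the checkpoint holds,
        # so members missing from old checkpoints still exist after loading.
        self.__dict__.update(self._DEFAULTS)
        self.__dict__.update(state)


restored = pickle.loads(old_checkpoint)
print(restored.kT, restored.gamma)
```

This keeps old checkpoints loadable, but whether the default value reproduces the old behavior still has to be checked for each new member, which is exactly the review burden mentioned above.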

Michael


On 17.05.2017 12:50, Joost de Graaf wrote:
Dear All,

After the latest update of the OS, I had to recompile ESPResSo and deal with issues in the older version that I was running (sorry, I don't recall the commit, but it is from last year), and I then updated to the latest version of ESPResSo (commit a08b6817a9ca54bb1a88516642b251f191400a17). Using this new pypresso version, I tried to restart my runs. However, I receive the following error message:

=======================================

Traceback (most recent call last):
   File "gel_coll_long_bulk.py", line 470, in <module>
     checkpoint.load(checkpoint_index=1)
   File "/home/jgraaf/PYTHON/gitgel/src/python/espressomd/checkpointing.py", line 159, in load
     checkpoint_data = pickle.load(checkpoint_file)
   File "espressomd/analyze.pyx", line 44, in espressomd.analyze.Analysis.__init__ (/home/jgraaf/PYTHON/gitgel/src/python/espressomd/analyze.cpp:2402)
TypeError: ('__init__() takes exactly 2 positional arguments (1 given)', <class 'espressomd.analyze.Analysis'>, ())

=======================================

Checkpoints that are created using the current version of ESPResSo work just fine, i.e., I can stop and restart my simulation using them. So my guess is that something must have changed in the way checkpoints are written out or read in between my old version and the latest one. I checked the version history on GitHub and saw that Flo made changes to the checkpointing file about 3 months ago, but did not get much further than that.

This is, of course, a bit annoying, as it would lead to a significant loss of data and compute time. Since the checkpoints are binary, it would also be difficult to edit them manually, even if I knew what to change. Besides, there are a huge number of them, so manual editing is not really an option. Perhaps one of you has already had to deal with checkpoint conversion? Alternatively, I can go back through the commits until the checkpoints work, but that would require fixing compilation issues that crop up along the way, so it is not the most desirable choice. Do you have any suggestions?

I hope to hear from you soon.

Best Wishes,

Joost


