espressomd-users

Re: Discussion: Switching Espresso to shared memory parallelization


From: Rudolf Weeber
Subject: Re: Discussion: Switching Espresso to shared memory parallelization
Date: Tue, 13 Jul 2021 08:23:24 +0200

Hi Steffen,

thank you for the detailed feedback and the points you raise.

> 1) You asked about scenarios that need MPI parallelism. In an IPVS+ITV (U.
> Stuttgart) collaboration, we perform large simulations that subject particles
> to a background flow field. Within that flow field, we want to include
> multiple scales of turbulence. These simulations need MPI parallelism as they
> have millions of particles and millions of bonds. This data simply does not
> fit into the RAM of a single node.
To my knowledge, this project involved by a very wide margin the biggest system 
ever run with Espresso. The systems for which Espresso is typically used at the 
ICP are, to my understanding, all below 100k particles.

Those of you running big systems, please speak up, so we are aware of it.
Also, if you would like to run bigger systems but don't due to performance 
issues, that would be of interest.


> 2) You talk about "HPC nodes" having about 20-64 cores. This is certainly
> true. I just want to make the remark that with a shared-memory parallelization
> there will be no more HPC nodes for ESPResSo users. When applying for runtime
> at an HPC center, you have to give details about the parallelization and the
> scalability of your code. If you run on one node only, they will most likely
> turn you down and you are left with your local workstation.
I should have written "node on a cluster". Espresso simulations would typically 
go to tier 3 systems (i.e., university or regional clusters). But you are right 
that removing the possibility to run bigger systems, and therefore no longer 
being seen as part of the HPC community, may well become an issue.

> 3) While I see your point that the current MPI parallelization might not be
> the easiest to understand and roll out, I want to make it clear that devising
> a well-performing shared-memory parallelization is not a trivial matter
> either. "Sprinkling in" a couple of "#pragma omp parallel for" will certainly
> not be enough. As with the distributed-memory parallelization, you will have
> to devise a spatial domain decomposition and come up with a workload
> distribution between the threads. You will have to know which threads import
> data from others and devise locking mechanisms to guard these accesses.
> Reasoning about this code and debugging it might turn out to be as hard as
> for the MPI-based code. If you want to go down this path, I strongly suggest
> not reinventing the wheel and taking a look at, e.g., the AutoPAS [1] project.
This is a valid point. Before we make any decision, we will definitely have 
some sort of technical preview/prototype to see what can be achieved at 
acceptable levels of complexity and performance.
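
Just to illustrate the point about "sprinkling in" OpenMP pragmas with a minimal 
sketch (a hypothetical pair loop, not ESPResSo code): with a plain 
"#pragma omp parallel for" over a Newton-3 pair loop, two threads can update the 
same particle's force, so one either pays for atomic updates as below, or needs 
a spatial decomposition/coloring of the cells so that threads never write to the 
same particle concurrently.

    // Sketch only: hypothetical Particle type and pair list, not ESPResSo's
    // data layout. Compile with OpenMP enabled (e.g. -fopenmp).
    #include <array>
    #include <cstddef>
    #include <utility>
    #include <vector>

    struct Particle {
      std::array<double, 3> pos{};
      std::array<double, 3> force{};
    };

    // Pair loop with Newton's third law: both particles of a pair are updated.
    void pair_forces(std::vector<Particle> &parts,
                     std::vector<std::pair<int, int>> const &pairs) {
    #pragma omp parallel for
      for (std::size_t k = 0; k < pairs.size(); ++k) {
        auto &p1 = parts[pairs[k].first];
        auto &p2 = parts[pairs[k].second];
        std::array<double, 3> d{};
        double r2 = 0.0;
        for (int i = 0; i < 3; ++i) {
          d[i] = p1.pos[i] - p2.pos[i];
          r2 += d[i] * d[i];
        }
        double const pref = 1.0 / (r2 * r2); // toy repulsive force as stand-in
        for (int i = 0; i < 3; ++i) {
          // Iterations handled by different threads may involve the same
          // particle, so these writes race unless they are atomic (or the
          // cell traversal is decomposed/colored so they cannot collide).
    #pragma omp atomic update
          p1.force[i] += pref * d[i];
    #pragma omp atomic update
          p2.force[i] -= pref * d[i];
        }
      }
    }

The atomics make the loop correct, but they typically eat into the speedup, 
which is exactly why the traversal scheme has to be designed carefully.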

In my personal opinion, the aim with Espresso is ease of extensibility rather 
than maximum performance.
How do other people in the community see this?


Some areas where I would hope for simplifications in a purely shared memory 
code:

* Relation between Python and C++ objects: In a shared-memory code, the Python 
object could directly own the core object (see the ownership sketch after this 
list). In an MPI simulation, an intermediate layer creates and manages mirror 
objects on the remote processes.
* Due to the intermediate layer and the mirror objects, checkpointing and 
restoring a simulation is very difficult. We have currently disabled it for 
certain features.
* We may not need the custom 3D FFT in the electrostatic and dipolar P3M 
(currently some 1000 lines of code), or could replace it with PFFT. To my 
understanding, FFTW offers a thread-parallel mode as a drop-in replacement (see 
the FFTW sketch after this list).
* Bonds and virtual sites with a range much larger than the Lennard-Jones 
cutoff would no longer force a larger cell size (which slows down the 
short-range calculation)
* We would probably not need most of the ghost communication code (about 600 
lines) as cells across boundaries can be linked directly.
* We would probably not need most of the parallel callback and particle setup 
code (about 1500 lines of code, plus some 90 callbacks scattered throughout the 
codebase)
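
To illustrate the first bullet, here is a generic sketch of direct ownership 
using pybind11. This is not ESPResSo's actual binding layer (which is the 
intermediate layer mentioned above), and the class and parameters are made up 
for illustration; the point is only that the Python object holds the C++ object 
itself, with no mirror objects on other ranks to create and keep in sync.

    // Hypothetical example, not ESPResSo's binding layer: with a shared-memory
    // core, the Python object can simply own the C++ instance directly.
    #include <pybind11/pybind11.h>
    namespace py = pybind11;

    struct LangevinThermostat {        // made-up core object for illustration
      double gamma;
      double kT;
      LangevinThermostat(double gamma, double kT) : gamma(gamma), kT(kT) {}
    };

    PYBIND11_MODULE(core, m) {
      // The Python "LangevinThermostat" directly holds this C++ instance.
      py::class_<LangevinThermostat>(m, "LangevinThermostat")
          .def(py::init<double, double>(), py::arg("gamma"), py::arg("kT"))
          .def_readwrite("gamma", &LangevinThermostat::gamma)
          .def_readwrite("kT", &LangevinThermostat::kT);
    }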
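
And for the FFT bullet: a minimal sketch of FFTW's own multi-threaded 
transforms, assuming FFTW 3 with its OpenMP threads library is available. The 
planner calls are the same as in the serial case, which is what makes it close 
to a drop-in replacement.

    // Sketch of FFTW's multi-threaded mode (link with -lfftw3_omp -lfftw3).
    #include <fftw3.h>
    #include <omp.h>

    int main() {
      int const nx = 64, ny = 64, nz = 64;

      fftw_init_threads();                            // once, before any plan
      fftw_plan_with_nthreads(omp_get_max_threads()); // used by later plans

      double *in = fftw_alloc_real(nx * ny * nz);
      fftw_complex *out = fftw_alloc_complex(nx * ny * (nz / 2 + 1));
      for (int i = 0; i < nx * ny * nz; ++i)
        in[i] = 0.0;                                  // dummy input data

      // Same planner call as in the serial case; the transform itself then
      // runs on all requested threads.
      fftw_plan plan = fftw_plan_dft_r2c_3d(nx, ny, nz, in, out, FFTW_ESTIMATE);
      fftw_execute(plan);

      fftw_destroy_plan(plan);
      fftw_free(in);
      fftw_free(out);
      fftw_cleanup_threads();
      return 0;
    }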

Of course, all of this would need to be investigated in more detail before a 
decision is made.

 
> One particular problem that I encountered in the past and that I want to 
> briefly mention here is bonds: They are only stored on one of the two (or 
> more) involved particles. This is one of the reasons why ESPResSo currently 
> needs to communicate the forces back after calculating them and you will 
> certainly need measures that deal with this circumstance in a shared-memory 
> parallel code. Such details will increase the complexity of a shared-memory 
> parallel code and it might end up not being easy to understand for newcomers 
> or make it hard to implement new features, too.
I agree. Although, in a shared-memory code, there is the option to run some 
things serially and still get the benefit of the parallelized short-range loop, 
bond loop, and electrostatics.

You are right, the bond storage will almost certainly have to be changed. 
Otherwise, the bond loop cannot be executed in parallel without requiring all 
accesses to the particle forces to be atomic.
If we stay with MPI, as you point out, this would eliminate one ghost 
communication per time step.
By now, the bond storage has been abstracted somewhat, so this change should be 
doable.
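
To sketch what such a change could look like (an assumed data layout, not 
ESPResSo's current one): if every bond is stored on both partners and each 
particle accumulates only its own contribution, the bond loop parallelizes 
without atomic force updates, at the price of evaluating each bonded 
interaction twice.

    // Sketch only: hypothetical layout in which each bond is stored on both
    // partners. Each thread writes only to the force of "its" particle, so no
    // atomics are needed; positions are only read, which is race-free here.
    #include <cstddef>
    #include <vector>

    struct Bond {
      int partner;   // index of the bonded partner
      double k;      // harmonic spring constant
    };

    struct Particle {
      double pos = 0.0;          // 1D for brevity
      double force = 0.0;
      std::vector<Bond> bonds;   // duplicated: bond appears on both partners
    };

    void bond_loop(std::vector<Particle> &parts) {
    #pragma omp parallel for
      for (std::size_t i = 0; i < parts.size(); ++i) {
        for (auto const &b : parts[i].bonds) {
          double const d = parts[i].pos - parts[b.partner].pos;
          // Harmonic bond with zero rest length: only particle i's force is
          // written; the partner adds its own share when its turn comes.
          parts[i].force += -b.k * d;
        }
      }
    }

The trade-off is that each bonded interaction is computed twice instead of 
being computed once and mirrored back via a ghost force communication.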

Thank you for pointing out AutoPas. Changing particle storage to 
struct-of-arrays would be extremely beneficial for performance. 


Thank you again for sharing your thoughts! Hearing different points of view is 
very important for us to make good decisions on Espresso development.

There is certainly the need for further discussion and experimentation before 
we decide on the future parallelization paradigm.
The purpose of my post was to get an idea of how many use cases there actually 
are for very big systems, in the hope that it might help us to direct our 
(limited) resources to where they are most needed.


Regards, Rudolf



