Re: Mpi and cuda
From: Martin Kaiser
Subject: Re: Mpi and cuda
Date: Wed, 23 Sep 2020 15:02:04 +0200
Dear Rudolf,
Do I understand correctly that I should check whether it is worthwhile to use
MPI to speed up my integration? If so: on 4 cores I get a speedup of roughly a
factor of 2, based on a simple timing measurement with LB_GPU and MPI running
at the same time, so it would be worth it for me to make that work.
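
A minimal sketch of how such a comparison can be made, wrapping the timing from the quoted mail below in a helper. The names `time_per_step` and `speedup` are illustrative and not part of ESPResSo; `run` stands for any callable that takes a step count, such as `system.integrator.run`.

```python
import time


def time_per_step(run, steps):
    """Return the wall-clock seconds per step for one call of run(steps)."""
    tick = time.perf_counter()
    run(steps)
    tock = time.perf_counter()
    return (tock - tick) / steps


def speedup(serial_time, parallel_time):
    """Ratio of serial to parallel time per step; above 1 means MPI helps."""
    return serial_time / parallel_time
```

Running the same script once with `mpirun -n 1` and once with `mpirun -n 4`, and passing the two printed values of `time_per_step(system.integrator.run, steps)` to `speedup`, gives the factor directly.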
I will look for a fairly simple way to free the GPU memory when the job
finishes or crashes. That is my only concern for now, as the simulation
results seem fine otherwise.
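
One simple option might be to list the processes that still hold GPU memory after a job has ended, so that leftover ranks can be terminated by PID. This is only a sketch: it assumes `nvidia-smi` is on the PATH and supports the `--query-compute-apps` query with CSV output, and the helper names `parse_compute_apps` and `gpu_compute_apps` are hypothetical.

```python
import subprocess


def parse_compute_apps(text):
    """Parse 'pid, used_memory' CSV lines into a list of (pid, MiB) tuples.

    Assumes the memory column is numeric, optionally followed by a unit.
    """
    apps = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        pid, mem = (field.strip() for field in line.split(","))
        apps.append((int(pid), int(mem.split()[0])))
    return apps


def gpu_compute_apps():
    """Ask nvidia-smi which processes currently hold GPU memory."""
    result = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_compute_apps(result.stdout)
```

Calling `gpu_compute_apps()` after a killed job should show any ranks that survived. Terminating them is deliberately left to the caller, since other jobs may be running on the same card.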
Thanks for the answer!
Best, Martin
> On 22.09.2020, at 16:05, Rudolf Weeber <weeber@icp.uni-stuttgart.de> wrote:
>
> Hi Martin,
>
> all the GPU stuff runs from the head node. The other nodes probably still
> load the driver, so that's why you see them in the profile.
> The GPU work overlaps with the CPU work in time, but some extra communication
> is needed to gather the full system on the head node and send it to the GPU.
> Before using LB GPU with an MPI-parallel simulation, it might be worthwhile
> to put timings around the integration:
>
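> ```python
> import time
>
> ...
> # Measure the wall-clock time of the integration loop only
> tick = time.time()
> system.integrator.run(steps)
> tock = time.time()
> # Average cost of a single integration step
> print("Time per step (s):", (tock - tick) / steps)
> ```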
>
>
> Regards, Rudolf
>
> On Tue, Sep 22, 2020 at 03:31:33PM +0200, Martin Kaiser wrote:
>> Hello everybody,
>>
>> I have a technical question about using the Open MPI and CUDA
>> implementations at the same time.
>> If I start my GPU-accelerated ESPResSo script under MPI with the standard
>> command, like this:
>>
>> mpirun -n 4 espresso script.py;
>>
>> then 4 instances of the same job are started on my GPU, of which only one
>> is actually doing any work on the GPU. If I monitor the usage with
>> "nvidia-smi", I get something like this:
>>
>> GPU   GI    CI    PID     Type   Process name       GPU Memory
>> 1     N/A   N/A   26365   C      /usr/bin/python3   207MiB
>> 1     N/A   N/A   26366   C      /usr/bin/python3   129MiB
>> 1     N/A   N/A   26367   C      /usr/bin/python3   129MiB
>> 1     N/A   N/A   26368   C      /usr/bin/python3   129MiB
>>
>> Additionally, if I kill this job, not all of the instances on the GPU are
>> aborted, so the memory on the card is not freed.
>> Is there something I am doing wrong in how I compile or call ESPResSo? Or
>> is it that the MPI implementation is not "CUDA-aware" and is launching
>> copies of the same job on the GPU?
>>
>> Thanks for the help,
>> Martin
>
> --
> Dr. Rudolf Weeber
> Institute for Computational Physics
> Universität Stuttgart
> Allmandring 3
> 70569 Stuttgart
> Germany
> Phone: +49(0)711/685-67717
> Email: weeber@icp.uni-stuttgart.de
> http://www.icp.uni-stuttgart.de/~icp/Rudolf_Weeber
>