
GNU Parallel Bug Reports memfree option and memory starvation


From: Olivier Bilodeau
Subject: GNU Parallel Bug Reports memfree option and memory starvation
Date: Fri, 24 Feb 2017 22:23:09 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.7.1

Hi,

The --memfree documentation states:

> If the jobs take up very different amount of RAM, GNU parallel will
> only start as many as there is memory for. If less than size bytes
> are free, no more jobs will be started. If less than 50% size bytes
> are free, the youngest job will be killed, and put back on the queue
> to be run later.
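
My reading of that paragraph, as a small sketch (the thresholds come from the
quoted text; the function and variable names are mine, not GNU Parallel
internals):

    # Sketch of the documented --memfree policy as I understand it.
    # "size" is the value given to --memfree (e.g. 2 GB) and "free" is the
    # currently free memory, both in bytes. Names are illustrative only.
    def memfree_action(free, size):
        if free < size / 2:
            return "kill the youngest job and put it back on the queue"
        if free < size:
            return "do not start new jobs"
        return "ok to start another job"

    print(memfree_action(free=900 * 1024**2, size=2 * 1024**3))
    # -> kill the youngest job and put it back on the queue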

However, without the --retries option, this does not seem to be the case.

I used a small Python script to consume memory for testing (see attached use-mem.py).
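
The attachment is not reproduced inline; each job prints its sequence number
and PID (see the output below) and then eats memory for a few minutes. A
minimal sketch of that kind of job, assuming it simply allocates a block of
RAM and holds it (the chunk size and timings here are illustrative, not taken
from the attachment):

    #!/usr/bin/env python
    # Rough stand-in for the attached use-mem.py, not the actual script.
    import os
    import sys
    import time

    seq = sys.argv[1]
    print("seq %s / pid %d" % (seq, os.getpid()))
    sys.stdout.flush()

    chunks = []
    for _ in range(16):
        # Allocate ~100 MB per step so memory pressure builds up gradually.
        chunks.append(bytearray(100 * 1024 * 1024))
        time.sleep(5)
    # Hold the allocated memory so that concurrent jobs overlap.
    time.sleep(120)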

Parallel command:
parallel -n1 -j 80% --memfree 2G --joblog joblog.log ./use-mem.py {} ::: {1..15}

joblog
Seq  Host  Starttime       JobRuntime  Send  Receive  Exitval  Signal  Command
10   :     1487989705.822      99.963     0        0       -1      15  ./use-mem.py 10
5    :     1487989705.781     214.866     0       18        0       9  ./use-mem.py 5
8    :     1487989705.806     272.305     0       18        0       9  ./use-mem.py 8
9    :     1487989705.814     283.846     0       18       -1      15  ./use-mem.py 9
1    :     1487989705.744     293.559     0       18        0       0  ./use-mem.py 1
2    :     1487989705.755     293.921     0       18        0       0  ./use-mem.py 2
3    :     1487989705.764     293.913     0       18        0       0  ./use-mem.py 3
4    :     1487989705.773     294.098     0       18        0       0  ./use-mem.py 4
6    :     1487989705.789     294.106     0       18        0       0  ./use-mem.py 6
7    :     1487989705.798     294.892     0       18        0       0  ./use-mem.py 7
15   :     1487989999.631      17.871     0        0       -1      15  ./use-mem.py 15
14   :     1487989999.304      43.278     0       19        0       0  ./use-mem.py 14
13   :     1487989999.182      43.855     0       19        0       0  ./use-mem.py 13
12   :     1487989999.171      47.854     0       19        0       0  ./use-mem.py 12
11   :     1487989998.949      53.799     0       19        0       0  ./use-mem.py 11

output
seq 5 / pid 15772
seq 8 / pid 15787
seq 9 / pid 15792
seq 1 / pid 15752
seq 2 / pid 15757
seq 3 / pid 15762
seq 4 / pid 15767
seq 6 / pid 15777
seq 7 / pid 15782
seq 14 / pid 16645
seq 13 / pid 16640
seq 12 / pid 16635
seq 11 / pid 16630

Jobs 9, 10, and 15 were killed (signal 15) and not requeued.

When I added --retries, there were no more failed jobs in the joblog, but I am
fairly sure some jobs were still killed and requeued.

Parallel command:
parallel -n1 -j 80% --memfree 2G --joblog joblog.log --retries 3 ./use-mem.py {} ::: {1..15}

joblog
Seq  Host  Starttime       JobRuntime  Send  Receive  Exitval  Signal  Command
1    :     1487991228.329     134.463     0       17        0       0  ./use-mem.py 1
2    :     1487991228.339     136.082     0       17        0       0  ./use-mem.py 2
5    :     1487991228.368     136.065     0       17        0       0  ./use-mem.py 5
3    :     1487991228.349     138.911     0       17        0       0  ./use-mem.py 3
4    :     1487991228.358     145.251     0       17        0       0  ./use-mem.py 4
9    :     1487991362.468     256.723     0       17        0       0  ./use-mem.py 9
13   :     1487991364.479     255.554     0       18        0       0  ./use-mem.py 13
12   :     1487991364.433     255.689     0       18        0       0  ./use-mem.py 12
10   :     1487991362.489     259.118     0       18        0       0  ./use-mem.py 10
8    :     1487991362.314     263.891     0       17        0       0  ./use-mem.py 8
7    :     1487991362.383     299.328     0       17        0       0  ./use-mem.py 7
11   :     1487991618.786      43.341     0       18        0       0  ./use-mem.py 11
14   :     1487991620.233      42.884     0       18        0       0  ./use-mem.py 14
6    :     1487991618.888      52.900     0       17        0       0  ./use-mem.py 6
15   :     1487991658.350      25.601     0       18        0       0  ./use-mem.py 15

output
seq 1 / pid 3125
seq 2 / pid 3130
seq 5 / pid 3145
seq 3 / pid 3135
seq 4 / pid 3140
seq 9 / pid 4278
seq 13 / pid 4310
seq 12 / pid 4305
seq 10 / pid 4283
seq 8 / pid 4268
seq 7 / pid 4273
seq 11 / pid 4633
seq 14 / pid 4648
seq 6 / pid 4638
seq 15 / pid 4931

I would suggest either clarifying the documentation or making the requeue
behaviour work without requiring --retries n.

Also, the --retries documentation is a bit confusing: it focuses on the
remote-execution use case, even though the option clearly helps on a single
machine under memory starvation, or in any situation where a job can be
killed.

As a side note, it would be nice if the joblog gave information about
retried jobs.

By the way, I love parallel! I've been telling all my friends about it. Great
flexibility and interface!

--
Olivier Bilodeau

Attachment: use-mem.py
Description: Text Data


