Multithreaded sort hangs on Solaris


From: McFarland, Jeffrey
Subject: Multithreaded sort hangs on Solaris
Date: Mon, 11 Mar 2013 15:47:58 +0000

I have come across some odd behavior in the sort utility in coreutils version 8.20. I’ve looked through the archives and don’t see any similar reports, so it may be something specific to our systems.

System: SunOS 5.10 Generic_147440-26 sun4u sparc SUNW,Sun-Fire-V890

Issue: When running sort on a 22.5 GB file, the process appears to hang in about 1 out of 10 runs (across 100+ tests). The process is still running, but the temp files are no longer changing and the output file either has not been created or is a 0-byte file. The state of the temp files at the point of the hang differs from one hung run to the next. On this machine a complete sort normally takes about 20 minutes; on one occasion the process hung for over 48 hours before I killed it. Running top shows no significant load on the system.

Command run:

./sort -t\n -S 256M --batch-size=100 -T /disk/craiwk01/prod/SORTWK -T /disk/craiwk02/prod/SORTWK -T /disk/craiwk03/prod/SORTWK -T /disk/craiwk04/prod/SORTWK -T /disk/craiwk06/prod/SORTWK -k1.1,1.10 infile -o infile.sorted

>: ps
   PID TTY         TIME CMD
 16328 pts/3       5:06 sort
 12697 pts/3       0:00 ps

>: sudo truss -rall -wall -f -p 16328
16328:  lwp_park(0x00000000, 0)         (sleeping...)

>: sudo pstack 16328
16328:  /usr/local/abacus/etsort/sort -tn -S 295063 --batch-size=100 -T /disk/
-----------------  lwp# 1 / thread# 1  --------------------
ffffffff7d4d8818 lwp_park (0, 0, 0)
0000000100009c74 sortlines (111b56580, 111c56080, ffffffff7fffeab0, 10012a321, ffffffff7fffead0, 10012a328) + 514
000000010000a5cc sortlines (111558380, 2, ffffffff7fffeab0, 1121765e0, 0, ffffffff7fffeab0) + e6c
000000010000a5cc sortlines (111956f80, 4, ffffffff7fffeab0, 112176420, 0, ffffffff7fffeab0) + e6c
000000010000a5cc sortlines (112154760, 8, ffffffff7fffeab0, 1121760a0, 1, ffffffff7fffeab0) + e6c
000000010000c070 sort (10012a740, 0, ffffffff7fffead0, 23, 10012cddd, 112154760) + 350
000000010000e6e8 main (13, ffffffff7ffff148, 0, 10012c220, fffd, 10012b1e0) + 1ee8
00000001000041bc _start (0, 0, 0, 0, 0, 0) + 7c
-----------------  lwp# 240 / thread# 240  --------------------
000000010000a600 sortlines_thread(), exit value = 0x0000000000000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 241 / thread# 241  --------------------
000000010000a600 sortlines_thread(), exit value = 0x0000000000000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 242 / thread# 242  --------------------
000000010000a600 sortlines_thread(), exit value = 0x0000000000000000
        ** zombie (exited, not detached, not yet joined) **
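
To compare stacks across several hung runs, the pstack output could be reduced to one line per thread. The following is only a rough sketch, assuming Python 3 is available and that the output has the same layout as the capture above; summarize_pstack and the regular expressions are hypothetical helpers written for this report, not part of pstack or coreutils.

```python
import re

# Thread separator lines look like:
#   "-----------------  lwp# 1 / thread# 1  --------------------"
LWP_HEADER = re.compile(r"lwp#\s*(\d+)\s*/\s*thread#\s*(\d+)")
# Frame lines start with a hex address followed by the function name.
FRAME = re.compile(r"^[0-9a-fA-F]+\s+([A-Za-z_][\w.$]*)")


def summarize_pstack(text):
    """Map each lwp id to its innermost frame and whether it is a zombie."""
    threads = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = LWP_HEADER.search(line)
        if header:
            current = int(header.group(1))
            threads[current] = {"top_frame": None, "zombie": False}
            continue
        if current is None:
            continue  # process header line before the first thread
        if "zombie" in line:
            threads[current]["zombie"] = True
            continue
        frame = FRAME.match(line)
        # Only the first frame seen for a thread is its innermost one.
        if frame and threads[current]["top_frame"] is None:
            threads[current]["top_frame"] = frame.group(1)
    return threads
```

Saving the output of each hung run to a file and passing its contents to summarize_pstack would show whether the main thread is always parked inside sortlines and how many worker threads have exited without being joined.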

 

If I change the sort to run as a single-threaded process (adding --parallel=1 to the above command), it doesn’t hang. This makes me think that it’s most likely a threading issue. I ran the same tests on a Linux machine and it did not hang there, so it’s most likely an issue only on Solaris.
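
The comparison between the default and --parallel=1 runs could be automated so that the hang rate is counted rather than estimated. This is a minimal sketch, assuming Python 3 is installed on the test machine; run_once, hang_rate, and the timeout values are hypothetical and would need adjusting. Note that subprocess.run kills the child when the timeout expires, so this only counts hangs and does not leave a hung process behind for truss or pstack.

```python
import subprocess


def run_once(cmd, timeout_s):
    """Run cmd once and classify the result as 'ok', 'failed', or 'hung'."""
    try:
        proc = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # The child was killed after exceeding the timeout; treat as a hang.
        return "hung"
    return "ok" if proc.returncode == 0 else "failed"


def hang_rate(cmd, runs, timeout_s):
    """Run cmd `runs` times and count each outcome."""
    counts = {"ok": 0, "failed": 0, "hung": 0}
    for _ in range(runs):
        counts[run_once(cmd, timeout_s)] += 1
    return counts


# Example use (file names and timeout are placeholders; a normal run
# takes about 20 minutes here, so one hour is a generous limit):
#   hang_rate(["./sort", "-S", "256M", "infile", "-o", "infile.sorted"],
#             runs=20, timeout_s=3600)
#   hang_rate(["./sort", "--parallel=1", "-S", "256M", "infile",
#              "-o", "infile.sorted"], runs=20, timeout_s=3600)
```

Running both variants the same number of times on the same input gives two sets of counts that can be compared directly.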

 

I initially found this issue using coreutils 8.9 and upgraded to 8.20 to see whether a fix had been made, but the hang still occurs.

Is this a known issue? Are there any additional tests I should run to narrow it down further?

Thanks,
Jeff



