Re: [PATCH] libports: implement lockless management of threads
From: Justus Winter
Subject: Re: [PATCH] libports: implement lockless management of threads
Date: Tue, 12 Nov 2013 16:53:13 +0100
User-agent: alot/0.3.4
Quoting Neal H. Walfield (2013-11-11 22:02:46)
> Yes, this is what I was thinking of.
Awesome :)
> I recall there being type defs for appropriate atomic types. If that
> is still the recommended approach, please update your patch
> appropriately.
Right. I knew next to nothing about the gcc atomic builtins, so I read
up on them in the gcc docs and wiki. I'll briefly document my findings.
According to [0], the __atomic* functions should be preferred over the
__sync* ones.
0: http://gcc.gnu.org/wiki/Atomic/GCCMM
According to [1], "GCC allows any integral scalar or pointer type that
is 1, 2, 4, or 8 bytes in length."
1: http://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
> The most important thing, however, is ensuring that the semantics are
> preserved.
The __atomic* functions allow one to specify a memory model [2]. I chose
__ATOMIC_RELAXED. This model does not impose any happens-before
relation on the events; it only guarantees atomic access to the variable.
2: http://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync
> That is, was the use of the values also protected by the lock?
It was. I believe that is okay though, as the atomic operations will
ensure the consistency of the values retrieved by the different
threads. E.g., only one thread will see NREQTHREADS (tc) decremented to
zero and stick around, while the others may exit.
> Does moving to atomic updates introduce a possible inconsistency?
>
> I haven't looked at the code. Before this is checked in, however,
> someone should.
Yes please :)
I'm cautiously optimistic about this approach and the patch. I built a
Hurd package with this patch and I am currently using it on almost all
of my Hurd machines. They do seem to perform as expected.
The patch actually improves the performance of one micro benchmark I
run. The benchmark pipes(2) data from /dev/zero to /dev/null and
measures the throughput. If the data is piped byte-wise, then this is
a measurement of the RPC overhead. The general form of the test is
dd if=/dev/zero bs=$bs count=$count 2>/dev/null \
| pipebench -q >/dev/null
(pipebench is from the pipebench Debian package.)
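For reference, a driver for the test could look roughly like the sketch below. This is hypothetical: my actual script differs in its details, and the process counts here are just the ones from the runs shown later. The stub fallback for pipebench (using cat) is only there so the sketch runs on systems without the package; throughput numbers from the stub are of course meaningless.

```shell
#!/bin/sh
# Hypothetical benchmark driver: run the byte-wise pipe test with an
# increasing number of parallel readers.
command -v pipebench >/dev/null 2>&1 || pipebench () { cat >/dev/null; }

bs=1
count=64k
rounds=0
for n in 1 2 8 32; do
  echo "Testing pipe throughput, blocksize $bs count $count, $n processes..."
  i=0
  while [ "$i" -lt "$n" ]; do
    # Each reader runs in the background so the rounds overlap in time.
    dd if=/dev/zero bs=$bs count=$count 2>/dev/null \
      | pipebench -q >/dev/null &
    i=$((i + 1))
  done
  echo "Waiting for children..."
  wait
  rounds=$((rounds + 1))
done
```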
Here are the results on my VIA Epia (1.3GHz iirc) box I mentioned
earlier [3]:
3: http://lists.debian.org/debian-hurd/2013/10/msg00069.html
--- without-lockless-benchmark.1384259790.log 2013-11-12 13:43:36.000000000 +0100
+++ with-lockless-benchmark.1384264893.log 2013-11-12 15:09:39.000000000 +0100
@@ -2,31 +2,31 @@
Testing pipe throughput, blocksize 1 count 64k...
Summary:
-Piped 64.00 kB in 00h00m10.63s: 6.02 kB/second
+Piped 64.00 kB in 00h00m06.76s: 9.46 kB/second
These are the results of the test on my Hurd installation running on
kvm on an 'Intel(R) Core(TM)2 Duo CPU L7500 @ 1.60GHz'. I extended the
test so that it creates more processes piping the data in parallel:
--- without-lockless-benchmark.1384264129.log 2013-11-12 14:57:13.000000000 +0100
+++ with-lockless-benchmark.1384266387.log 2013-11-12 15:31:17.000000000 +0100
@@ -3,16 +3,16 @@
Stopping MTA: exim4_listener.
Testing pipe throughput, blocksize 1 count 64k...
Summary:
-Piped 64.00 kB in 00h00m30.81s: 2.07 kB/second
+Piped 64.00 kB in 00h00m26.26s: 2.43 kB/second
Testing pipe throughput, blocksize 1 count 64k, 2 processes...
Summary:
-Piped 64.00 kB in 00h00m45.11s: 1.41 kB/second
+Piped 64.00 kB in 00h00m41.79s: 1.53 kB/second
Waiting for children...
So the situation seems to improve here as well. It strikes me as
strange, though, that the VIA embedded system running on bare metal
would outperform the kvm installation on the faster Intel CPU.
Testing pipe throughput, blocksize 1 count 64k, 8 processes...
Summary:
-Piped 64.00 kB in 00h01m57.18s: 559.00 B/second
+Piped 64.00 kB in 00h02m12.56s: 494.00 B/second
Waiting for children...
Testing pipe throughput, blocksize 1 count 64k, 32 processes...
Summary:
-Piped 64.00 kB in 00h04m27.28s: 245.00 B/second
+Piped 64.00 kB in 00h01m00.39s: 1.05 kB/second
Waiting for children...
I believe this to be an artifact of my testing methodology: I start
several dd processes, and some finish much earlier than others. I
guess that is to be expected in the absence of any fairness
constraints in msg/task scheduling decisions, but I don't know, tbh.
Cheers,
Justus