Message-ID: <20120403075245.GD26826@gmail.com>
Date: Tue, 3 Apr 2012 09:52:45 +0200
From: Ingo Molnar <mingo@...nel.org>
To: "Chen, Dennis (SRDC SW)" <Dennis1.Chen@....com>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"mingo@...hat.com" <mingo@...hat.com>
Subject: Re: semaphore and mutex in current Linux kernel (3.2.2)
* Chen, Dennis (SRDC SW) <Dennis1.Chen@....com> wrote:
> > Well, a way to reproduce that would be to find a
> > mutex_lock()-intense workload ('perf top -g', etc.), then
> > change the underlying mutex back to a semaphore, and measure
> > the performance of the two primitives.
>
> Why not use the 'test-mutex' tool from the document?
>
> I guess it's a private tool of yours - if you can share it
> with me, I can help run a new round of performance checks
> of the two primitives on the latest kernel... make a
> deal?
I think I posted it back then - but IIRC it was really just an
open-coded mutex fastpath executed in user-space by a couple of
threads. To do that today you'd have to create it anew: just
copy the current mutex fastpath to user-space and measure it.
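For reference, something like the sketch below - typed from
memory, not the original tool, so all the names are made up and
the sched_yield() 'slowpath' just stands in for a real
futex-based one:

/* Open-coded mutex fastpath hammered in user-space by a couple
 * of threads - just a sketch. The fastpath is a single cmpxchg,
 * in the spirit of the kernel's __mutex_fastpath_lock();
 * contention simply yields instead of sleeping, which is good
 * enough for measuring the fastpath cost. */
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

#define NITER 10000000

static volatile int lock = 1;           /* 1: unlocked, 0: locked */

static void mutex_fastpath_lock(void)
{
        while (!__sync_bool_compare_and_swap(&lock, 1, 0))
                sched_yield();          /* stand-in "slowpath" */
}

static void mutex_fastpath_unlock(void)
{
        __sync_bool_compare_and_swap(&lock, 0, 1);
}

static void *worker(void *arg)
{
        long i;

        (void)arg;
        for (i = 0; i < NITER; i++) {
                mutex_fastpath_lock();
                mutex_fastpath_unlock();
        }
        return NULL;
}

int main(int argc, char **argv)
{
        int nthreads = argc > 1 ? atoi(argv[1]) : 2;
        pthread_t tid[nthreads];
        int i;

        for (i = 0; i < nthreads; i++)
                pthread_create(&tid[i], NULL, worker, NULL);
        for (i = 0; i < nthreads; i++)
                pthread_join(tid[i], NULL);

        printf("%d threads x %d lock+unlock iterations\n",
               nthreads, NITER);
        return 0;
}

Comparing 'time ./mutex-bench 1' with 'time ./mutex-bench 4'
already separates the uncontended fastpath cost from the
contended case.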
I'm not sure what the point of comparative measurements with
semaphores would be: for example we don't have per-architecture
optimized semaphores anymore - we switched the legacy semaphores
to a generic version and are phasing them out.
Mutexes have various advantages (such as lockdep coverage and,
in general, tighter semantics that make their usage more robust)
and we aren't going back to semaphores.
What would make a ton of sense would be to create a 'perf bench'
module that would use the kernel's mutex code and would measure
it in user-space. 'perf bench mem' already does a simplified
form of that: it measures the kernel's memcpy and memset
routines:
$ perf bench mem memcpy -r help
# Running mem/memcpy benchmark...
Unknown routine:help
Available routines...
default ... Default memcpy() provided by glibc
x86-64-unrolled ... unrolled memcpy() in arch/x86/lib/memcpy_64.S
x86-64-movsq ... movsq-based memcpy() in arch/x86/lib/memcpy_64.S
x86-64-movsb ... movsb-based memcpy() in arch/x86/lib/memcpy_64.S
$ perf bench mem memcpy -r x86-64-movsq
# Running mem/memcpy benchmark...
# Copying 1MB Bytes ...
2.229595 GB/Sec
10.850694 GB/Sec (with prefault)
$ perf bench mem memcpy -r x86-64-movsb
# Running mem/memcpy benchmark...
# Copying 1MB Bytes ...
2.055921 GB/Sec
2.447525 GB/Sec (with prefault)
So what could be done is to add something like:
perf bench locking mutex
which would, similarly to tools/perf/bench/mem-memcpy-x86-64-asm.S
et al, inline the mutex routines, build user-space glue around
them and allow them to be measured.
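The glue could look roughly like the sketch below. I'm typing
this from memory, so the bench_*() entry point convention and
the wiring in tools/perf/builtin-bench.c may differ - and the
fastpath here is a plain cmpxchg stand-in so the sketch compiles
standalone, where the real thing would inline the kernel asm:

/* Sketch of what a tools/perf/bench/locking-mutex.c could look
 * like. All names are illustrative, not actual perf source. A
 * real version would inline the kernel's mutex fastpath the way
 * mem-memcpy-x86-64-asm.S pulls in arch/x86/lib/memcpy_64.S. */
#include <stdio.h>
#include <sys/time.h>

static volatile int count = 1;  /* kernel convention: 1 == unlocked */

static void mutex_lock_fastpath(void)
{
        __sync_bool_compare_and_swap(&count, 1, 0);
}

static void mutex_unlock_fastpath(void)
{
        __sync_bool_compare_and_swap(&count, 0, 1);
}

/* the entry point 'perf bench locking mutex' would dispatch to: */
int bench_locking_mutex(int argc, const char **argv)
{
        struct timeval start, end, diff;
        long i, loops = 10000000;

        (void)argc; (void)argv;

        gettimeofday(&start, NULL);
        for (i = 0; i < loops; i++) {
                mutex_lock_fastpath();
                mutex_unlock_fastpath();
        }
        gettimeofday(&end, NULL);
        timersub(&end, &start, &diff);

        printf("# %ld lock+unlock pairs in %ld.%06ld sec\n",
               loops, (long)diff.tv_sec, (long)diff.tv_usec);
        return 0;
}

int main(void)
{
        return bench_locking_mutex(0, NULL);    /* standalone driver */
}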
Such a new feature could then be used to improve mutex
performance in the future. Likewise:
perf bench locking spinlock
could be used to do the same for spinlocks - measuring them
contended and uncontended, cache-cold and cache-hot, etc.
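A first cut could be as simple as timing the same test-and-set
loop once with a single thread and once with several - again
just a sketch with made-up names; proper cache-cold numbers
would also need the lock line flushed between runs:

/* Sketch: time the same test-and-set loop with one thread
 * (uncontended) and with several (contended). */
#include <pthread.h>
#include <stdio.h>
#include <sys/time.h>

#define NITER 1000000

static volatile int slock;

static void *spin_worker(void *arg)
{
        int i;

        (void)arg;
        for (i = 0; i < NITER; i++) {
                while (__sync_lock_test_and_set(&slock, 1))
                        ;                       /* contended spin */
                __sync_lock_release(&slock);
        }
        return NULL;
}

static double run_usecs(int nthreads)
{
        pthread_t tid[nthreads];
        struct timeval start, end;
        int i;

        gettimeofday(&start, NULL);
        for (i = 0; i < nthreads; i++)
                pthread_create(&tid[i], NULL, spin_worker, NULL);
        for (i = 0; i < nthreads; i++)
                pthread_join(tid[i], NULL);
        gettimeofday(&end, NULL);

        return (end.tv_sec - start.tv_sec) * 1e6 +
               (end.tv_usec - start.tv_usec);
}

int main(void)
{
        printf("uncontended (1 thread):  %.0f usecs\n", run_usecs(1));
        printf("contended   (4 threads): %.0f usecs\n", run_usecs(4));
        return 0;
}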
This would then be used by all future kernel developer
generations to improve these locking primitives - avoiding the
test-mutex kind of obsolescence and bitrot :-)
So if you'd be interested in writing that brand new benchmarking
feature and need help, let the perf people know.
Thanks,
Ingo