linux-kernel - Re: lock's trace events can improve mutex's performance in userspace?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <n2k8d20b11a1004050214m529fa5e7ke31a2213728ea511@mail.gmail.com>
Date:	Mon, 5 Apr 2010 02:14:06 -0700
From:	Michel Lespinasse <walken@...gle.com>
To:	Hitoshi Mitake <mitake@....info.waseda.ac.jp>
Cc:	Xiao Guangrong <xiaoguangrong@...fujitsu.com>,
	Ingo Molnar <mingo@...e.hu>,
	Steven Rostedt <rostedt@...dmis.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Peter Zijlstra <peterz@...radead.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Ming Lei <tom.leiming@...il.com>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: lock's trace events can improve mutex's performance in userspace?

Sorry for the late reply...

One thing to consider in locking micro-benchmarks is that often, code
changes that slow down parts of the contended code path where the lock
is not held, will result in an increase of the reported
micro-benchmark metric. This effect is particularly marked for
micro-benchmarks that consist of multiple threads doing empty
acquire/release loops.

As a thought experiment, imagine what would happen if you added a
one-millisecond sleep in the contended code path for mutex
acquisition. Soon all but one of your benchmark threads would be
sleeping, and the only non-sleeping thread would be able to spin on
that lock/unlock loop with no contention, resulting in very nice
results for the micro-benchmark. Remove the sleep and the lock/unlock
threads will have to contend, resulting in lower reported performance
metrics.

I think what you're seeing with lockdep events is a smaller case of
this - enabling lockdep slows down the contended code path, but also
reduces the number of times that it is taken...

On Fri, Mar 12, 2010 at 8:27 PM, Hitoshi Mitake
<mitake@....info.waseda.ac.jp> wrote:
> On 03/11/10 18:43, Xiao Guangrong wrote:
>> We found that if enable lock's trace events, the 'sysbench mutex'
>> benchmark program can run quicker.
>>
>> The simple program that is attached can reproduce it, the system info,
>> kernel config, and the script are also attached.
>>
>> The test step is below:
>>
>> # tar -zxvf test-mutex.tar.bz
>> # cd test-mutex
>> # make
>> # ./tscript.sh>&  log
>> # cat log | grep "real"
>> real    0m46.765s<  all trace events are disabled>
>> real    0m47.073s
>> real    0m47.402s
>> real    0m46.458s
>> real    0m47.433s
>> real    0m47.395s
>> real    0m47.010s
>> real    0m47.454s
>> real    0m47.044s
>> real    0m47.464s
>> real    0m39.245s<  enable lock's trace events>
>> real    0m40.822s
>> real    0m40.779s
>> real    0m40.549s
>> real    0m40.605s
>> real    0m40.923s
>> real    0m40.560s
>> real    0m41.050s
>> real    0m40.757s
>> real    0m40.715s
>>
>> [ "<  ...>" is my comments ]
>>
>>  From the result, we can see the program's runtime is less if enable
> lock's
>> trace events.
>>
>> The conclusion is weird but i don't know why.
>
> Hi Xiao,
>
> It's hard to believe, but...
>
> % sudo ./tscript.sh &> log
> % grep real log
> real    0m24.132s
> real    0m23.535s
> real    0m20.064s
> real    0m16.636s <- enabled from here
> real    0m16.435s
> real    0m17.339s
>
> I could reproduce your surprising result.
> (I only execed your benchmark 3 times.)
>
> I rewrote your mainc.c and checked contended count of each test like
> this way,
>                if (pthread_mutex_trylock(&mutex) == EBUSY) {
>                        pthread_mutex_lock(&mutex);
>                        atomic_inc(&contended);
>                }
> # I'll attach my new mainc.c
>
> % cat log
> Run mutex with trace events disabled...
> contended:25191221
>
> real    0m24.132s
> user    0m17.149s
> sys     1m18.933s
> contended:25360563
>
> real    0m23.535s
> user    0m17.233s
> sys     1m16.213s
> contended:23813911
>
> real    0m20.064s
> user    0m15.561s
> sys     1m4.332s
> Run mutex with lockdep events enabled...
> contended:11458318
>
> real    0m16.636s
> user    0m10.173s
> sys     0m55.595s
> contended:11881095
>
> real    0m16.435s
> user    0m10.273s
> sys     0m54.911s
> contended:11261650
>
> real    0m17.339s
> user    0m10.225s
> sys     0m58.556s
>
> It seems that num of contention decreased to about half.
> I don't know why this happened and effect to performance of it,
> but this result is worth to consider.
>
> Thanks,
>        Hitoshi
>



-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/