Message-ID: <87zj7l6sdo.fsf@rasmusvillemoes.dk>
Date: Tue, 10 Mar 2015 11:47:47 +0100
From: Rasmus Villemoes <linux@...musvillemoes.dk>
To: Tejun Heo <tj@...nel.org>
Cc: Joe Perches <joe@...ches.com>, linux-kernel@...r.kernel.org,
"Peter Zijlstra \(Intel\)" <peterz@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [RFC] lib/vsprintf.c: Even faster decimal conversion
On Thu, Mar 05 2015, Rasmus Villemoes <linux@...musvillemoes.dk> wrote:
> On Thu, Mar 05 2015, Tejun Heo <tj@...nel.org> wrote:
>
>> I'd like to see how this actually affects larger operations - sth
>> along the line of top consumes D% less CPU cycles w/ N processes - if
>> for nothing else, just to get the sense of scale,
>
> That makes sense. I'll see if I can get some reproducible numbers, but
> I'm afraid the effect drowns in all the syscall overhead. Which would be
> a valid argument against touching the code.
I wasn't able to come up with a way to measure the absolute %cpu
reliably enough (either from top's own output or using something like
watch -n1 ps -p $toppid -o %cpu) - it fluctuates too much to see any
difference. But using perf I was able to get somewhat stable numbers,
which suggest an improvement in the 0.5-1.0% range [1]. This was measured
with 10000 [2] sleeping processes in an idle virtual machine (on a mostly
idle host), with the patch on top of 3.19.0. Extracting the functions
involved in the decimal conversion, I get
new1.txt: 2.35% top [kernel.kallsyms] [k] num_to_str
new2.txt: 2.70% top [kernel.kallsyms] [k] num_to_str
old1.txt: 2.25% top [kernel.kallsyms] [k] num_to_str
old2.txt: 2.18% top [kernel.kallsyms] [k] num_to_str
new1.txt: 0.63% top [kernel.kallsyms] [k] put_dec
new2.txt: 0.71% top [kernel.kallsyms] [k] put_dec
old1.txt: 0.67% top [kernel.kallsyms] [k] put_dec
old2.txt: 0.59% top [kernel.kallsyms] [k] put_dec
new1.txt: 0.53% top [kernel.kallsyms] [k] put_dec_full8
new2.txt: 0.55% top [kernel.kallsyms] [k] put_dec_full8
old1.txt: 1.09% top [kernel.kallsyms] [k] put_dec_full9
old2.txt: 1.15% top [kernel.kallsyms] [k] put_dec_full9
new1.txt: 1.12% top [kernel.kallsyms] [k] put_dec_trunc8
new2.txt: 1.22% top [kernel.kallsyms] [k] put_dec_trunc8
old1.txt: 1.64% top [kernel.kallsyms] [k] put_dec_trunc8
old2.txt: 1.65% top [kernel.kallsyms] [k] put_dec_trunc8
I can't explain why num_to_str apparently becomes slightly slower (the
patch essentially didn't touch it), but in any case the savings in the
put_dec_ helpers more than make up for that.
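As a rough cross-check of the 0.5-1.0% estimate, the per-symbol shares in the perf listing above can simply be summed per run. This is only a sketch: the names `samples` and `totals` are made up for illustration, and each percentage is relative to its own run's total cycle count, so the sums are only approximately comparable between runs.

```python
# Per-symbol share of cycles (percent), copied from the perf listing above.
# The dictionary layout and names are illustrative, not from any tool output.
samples = {
    "new1": {"num_to_str": 2.35, "put_dec": 0.63, "put_dec_full8": 0.53, "put_dec_trunc8": 1.12},
    "new2": {"num_to_str": 2.70, "put_dec": 0.71, "put_dec_full8": 0.55, "put_dec_trunc8": 1.22},
    "old1": {"num_to_str": 2.25, "put_dec": 0.67, "put_dec_full9": 1.09, "put_dec_trunc8": 1.64},
    "old2": {"num_to_str": 2.18, "put_dec": 0.59, "put_dec_full9": 1.15, "put_dec_trunc8": 1.65},
}

# Total share spent in the decimal-conversion functions, per run.
totals = {run: round(sum(shares.values()), 2) for run, shares in samples.items()}

# Average the two runs of each kernel and take the difference.
old_mean = (totals["old1"] + totals["old2"]) / 2
new_mean = (totals["new1"] + totals["new2"]) / 2
saving = old_mean - new_mean

for run in sorted(totals):
    print(f"{run}: {totals[run]:.2f}%")
print(f"old mean {old_mean:.2f}%, new mean {new_mean:.2f}%, difference {saving:.2f} points")
```

The per-run sums come out at about 4.6% and 5.2% for the patched kernel against about 5.7% and 5.6% for the unpatched one, a difference of roughly 0.7 percentage points on average. With only two runs per kernel, this says nothing about the run-to-run noise.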
If someone has a suggestion for a better way of measuring this I'm all
ears.
Thanks,
Rasmus
[1] in terms of #cycles
[2] numbers for 2000 and 5000 processes are quite similar.