linux-kernel - Re: [RFC] lib/vsprintf.c: Even faster decimal conversion

Message-ID: <87zj7l6sdo.fsf@rasmusvillemoes.dk>
Date:	Tue, 10 Mar 2015 11:47:47 +0100
From:	Rasmus Villemoes <linux@...musvillemoes.dk>
To:	Tejun Heo <tj@...nel.org>
Cc:	Joe Perches <joe@...ches.com>, linux-kernel@...r.kernel.org,
	"Peter Zijlstra \(Intel\)" <peterz@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [RFC] lib/vsprintf.c: Even faster decimal conversion

On Thu, Mar 05 2015, Rasmus Villemoes <linux@...musvillemoes.dk> wrote:

> On Thu, Mar 05 2015, Tejun Heo <tj@...nel.org> wrote:
>
>> I'd like to see how this actually affects larger operations - sth
>> along the line of top consumes D% less CPU cycles w/ N processes - if
>> for nothing else, just to get the sense of scale,
>
> That makes sense. I'll see if I can get some reproducible numbers, but
> I'm afraid the effect drowns in all the syscall overhead. Which would be
> a valid argument against touching the code.

I wasn't able to come up with a way to measure the absolute %cpu
reliably enough (neither from top's own output nor using something like
watch -n1 ps -p $toppid -o %cpu) - it fluctuates too much to see any
difference. But using perf I was able to get somewhat stable numbers,
which suggest an improvement in the 0.5-1.0% range [1]. Measured with
10000 [2] sleeping processes in an idle virtual machine (on a mostly
idle host), with the patch on top of 3.19.0. Extracting the functions
involved in the decimal conversion, I get:

new1.txt:     2.35%  top      [kernel.kallsyms]   [k] num_to_str                 
new2.txt:     2.70%  top      [kernel.kallsyms]   [k] num_to_str                 
old1.txt:     2.25%  top      [kernel.kallsyms]   [k] num_to_str                 
old2.txt:     2.18%  top      [kernel.kallsyms]   [k] num_to_str                 

new1.txt:     0.63%  top      [kernel.kallsyms]   [k] put_dec                    
new2.txt:     0.71%  top      [kernel.kallsyms]   [k] put_dec                    
old1.txt:     0.67%  top      [kernel.kallsyms]   [k] put_dec                    
old2.txt:     0.59%  top      [kernel.kallsyms]   [k] put_dec                    

new1.txt:     0.53%  top      [kernel.kallsyms]   [k] put_dec_full8              
new2.txt:     0.55%  top      [kernel.kallsyms]   [k] put_dec_full8              
old1.txt:     1.09%  top      [kernel.kallsyms]   [k] put_dec_full9              
old2.txt:     1.15%  top      [kernel.kallsyms]   [k] put_dec_full9              

new1.txt:     1.12%  top      [kernel.kallsyms]   [k] put_dec_trunc8             
new2.txt:     1.22%  top      [kernel.kallsyms]   [k] put_dec_trunc8             
old1.txt:     1.64%  top      [kernel.kallsyms]   [k] put_dec_trunc8             
old2.txt:     1.65%  top      [kernel.kallsyms]   [k] put_dec_trunc8             

I can't explain why num_to_str apparently becomes slightly slower (the
patch essentially didn't touch it), but the put_dec_ helpers in any case
make up for that.
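
As a rough sanity check on the claimed range, the per-function figures
above can be summed per run. This is only a sketch: the percentages are
copied by hand from the perf output quoted above, the names (samples,
totals, saving) are made up for illustration, and averaging two runs per
kernel says nothing about statistical significance.

```python
# Percent-of-cycles figures copied from the perf output quoted above.
# Note: the old kernel reports put_dec_full9 where the patched one
# reports put_dec_full8.
samples = {
    "new1": {"num_to_str": 2.35, "put_dec": 0.63,
             "put_dec_full8": 0.53, "put_dec_trunc8": 1.12},
    "new2": {"num_to_str": 2.70, "put_dec": 0.71,
             "put_dec_full8": 0.55, "put_dec_trunc8": 1.22},
    "old1": {"num_to_str": 2.25, "put_dec": 0.67,
             "put_dec_full9": 1.09, "put_dec_trunc8": 1.64},
    "old2": {"num_to_str": 2.18, "put_dec": 0.59,
             "put_dec_full9": 1.15, "put_dec_trunc8": 1.65},
}

# Total share of cycles spent in the decimal-conversion functions, per run.
totals = {run: sum(funcs.values()) for run, funcs in samples.items()}

# Average the two runs of each kernel and take the difference.
new_avg = (totals["new1"] + totals["new2"]) / 2
old_avg = (totals["old1"] + totals["old2"]) / 2
saving = old_avg - new_avg

for run in sorted(totals):
    print(f"{run}: {totals[run]:.2f}%")
print(f"average saving: {saving:.2f} percentage points")
```

The per-run totals come out between roughly 4.6% and 5.2% for the
patched kernel and between roughly 5.6% and 5.7% for the old one, so the
difference between the averages is about 0.7 percentage points, which is
in line with the 0.5-1.0% range mentioned above.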

If someone has a suggestion for a better way of measuring this I'm all
ears.

Thanks,
Rasmus

[1] in terms of #cycles

[2] numbers for 2000 and 5000 processes are quite similar.