Message-ID: <CA+55aFxpWpAMWC218n8czLCo=+V=ykmKWtpaBzLRJijET31QPA@mail.gmail.com>
Date:	Sat, 21 Dec 2013 16:11:32 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	John Stoffel <john@...ffel.org>
Cc:	Peter Anvin <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Al Viro <viro@...iv.linux.org.uk>,
	"the arch/x86 maintainers" <x86@...nel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [RFC] speeding up the stat() family of system calls...

On Sat, Dec 21, 2013 at 2:54 PM, John Stoffel <john@...ffel.org> wrote:
>
> Any numbers of how much better this is?  I'm travelling
> tomorrow, so I won't have time to spin up a VM and play, though it's
> tempting to do so.

On most _real_ loads, the kernel footprint is pretty small, and
obviously if you do IO or other stuff, the cost of some wasted CPU
cycles in stat() is negligible. And even if you loop doing normal
'stat()' calls over a filesystem with everything cached, the actual
path walk itself is still dominant, although at that point the CPU
overhead of the stat data copy at least starts to show up in profiles.
That's what got me looking at this in the first place.
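
Just to be concrete about the shape of that cached path-based loop
(only a sketch, not any real test program - the path and the iteration
count here are placeholders):

#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
	struct stat st;
	int i;

	/* stat() the same path over and over: the dentry/inode caches
	 * stay hot, and the path walk is the bulk of the kernel work. */
	for (i = 0; i < 10000000; i++) {
		if (stat("/usr/bin/cc", &st) < 0) {
			perror("stat");
			return 1;
		}
	}
	printf("size=%lld\n", (long long)st.st_size);
	return 0;
}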

And then, for the extreme case, my test program went from 0.92s to
0.80s. But that stupid test program just does ten million fstat(0,&st)
calls, in order to show the highest cp_new_stat() CPU overhead (with
minimal cache footprint). So at that point it's about a 13%
difference, when the only other overhead really is just the system
call itself. The profile went from

  10.42%  a.out  [k] copy_user_enhanced_fast_string
   5.91%  a.out  [k] cp_new_stat

(the "copy_user_enhanced_fast_string" is the "rep movsb" that copies
things to user space) to

   6.69%  a.out  [k] cp_new_stat

(and here there is no separate user-copy, since it's all done directly
inside the optimized cp_new_stat).

It's worth pointing out that the 5.91% -> 6.69% profile change of
cp_new_stat() is *not* because cp_new_stat() got slower: it's simply a
direct result of the 13% performance improvement - 13% of the cycles
went away, so what's left is measured against a smaller total, and
5.91% becomes 6.69%.
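
The general shape of that kind of fstat() microbenchmark is roughly
this (a sketch, not the exact program - the printf at the end just
prints something from the last result):

#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
	struct stat st;
	int i;

	/* Ten million fstat() calls on stdin: almost all the kernel
	 * time is system call entry/exit plus cp_new_stat() and the
	 * copy of the stat data back to user space. */
	for (i = 0; i < 10000000; i++)
		fstat(0, &st);

	printf("dev=%llu ino=%llu\n",
	       (unsigned long long)st.st_dev,
	       (unsigned long long)st.st_ino);
	return 0;
}

Time it with "time ./a.out", or run it under "perf record" and look at
"perf report" to get the kind of profile shown above.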

But the above 13% is really the extreme case with hot caches etc.
Normally it's not really noticeable in any bigger-picture load. For
example, on a big "git diff", the cp_new_stat() function shows up at
0.63% of the load, so making it faster isn't going to make any really
noticeable difference.

The reason I really care is that the stat overhead always does show up
on the kinds of profiles I care about, even if it's never all that
high. In other words, this is more about fixing an annoyance (how
stupid that code was) than about a huge performance win.

                Linus
