Message-ID: <CA+55aFxZ2u+M72u3HSD7TVY2+WRRi27pYC=_4Wawr5y1m8DfnQ@mail.gmail.com>
Date:	Sat, 21 Dec 2013 12:27:57 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Peter Anvin <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Al Viro <viro@...iv.linux.org.uk>
Cc:	"the arch/x86 maintainers" <x86@...nel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: [RFC] speeding up the stat() family of system calls...

I've included both x86 people and filesystem people here, because this
hacky RFC patch touches both.

NOTE NOTE NOTE! I've modified "cp_new_stat()" in place, in a way that
is x86-64 specific. So the attached patch *only* works on x86-64, and
will very actively break on anything else. That's intentional, because
that way it's more obvious how the code changes, but a real patch
would create a *new* cp_new_stat() for x86-64, and conditionalize the
existing generic "cp_new_stat()" on not already having an
architecture-optimized one.

Basically, under some filesystem loads, "stat()" and friends are the
most common ops (think tree traversal, but also things like git
verifying the index). And our "cp_new_stat()" function (which is the
common interface, ignoring 32-bit legacy stuff) is generic, but
actually pretty disgusting. It copies things to a temporary 'struct
stat' buffer on the kernel stack, and then uses copy_to_user() to copy
it to user space. The double copy is quite noticeable in profiles, and
it generates a big stack frame too.

By doing an architecture-specific cp_new_stat() function, we can
improve on that.

HOWEVER. On x86, doing an efficient field-at-a-time copy also requires
us to use put_user_try() and put_user_catch() in order to not have
tons of clac/stac instructions for the extended permission testing
(SMAP). The implementation of that was fairly suboptimal, so to get
the code I wanted, I had to change how that all worked too, using
"asm_volatile_goto()".
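
As a rough illustration of why the try/catch form matters, the
user-space sketch below simply counts simulated stac/clac toggles. It is
only a model under stated assumptions: every name in it is hypothetical,
the counter stands in for the real instructions, and a plain C "goto"
stands in for the direct jump to a fault label that the asm-goto
approach is meant to give.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int access_toggles;	/* counts simulated stac/clac instructions */

static void stac_demo(void) { access_toggles++; }
static void clac_demo(void) { access_toggles++; }

/* Per-field store: every call opens and closes the access window. */
static int put_user_demo(uint64_t val, uint64_t *uptr)
{
	if (uptr == NULL)
		return -1;
	stac_demo();
	*uptr = val;
	clac_demo();
	return 0;
}

static int store_fields_individually(uint64_t *ubuf, const uint64_t *vals,
				     size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (put_user_demo(vals[i], ubuf ? &ubuf[i] : NULL))
			return -1;
	return 0;
}

/* Batched store: one open, many writes, one close, one fault label. */
static int store_fields_batched(uint64_t *ubuf, const uint64_t *vals,
				size_t n)
{
	stac_demo();
	for (size_t i = 0; i < n; i++) {
		if (ubuf == NULL)
			goto fault;	/* models the jump taken on a fault */
		ubuf[i] = vals[i];
	}
	clac_demo();
	return 0;
fault:
	clac_demo();
	return -1;
}
```

In this model, storing four fields individually costs eight toggles,
while the batched version costs two regardless of the field count.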

Thus both x86 and FS people on the list.

Comments? This would obviously be a 3.14 issue, I'm not suggesting
we'd do this now. I just want to lay the groundwork.

It's tested in the sense that "it works for me", and profiles look nice, but..

               Linus

Download attachment "vfs-stat-improvement" of type "application/octet-stream" (6155 bytes)
