Message-Id: <1376089460-5459-1-git-send-email-andi@firstfloor.org>
Date:	Fri,  9 Aug 2013 16:04:07 -0700
From:	Andi Kleen <andi@...stfloor.org>
To:	linux-kernel@...r.kernel.org
Cc:	x86@...nel.org, mingo@...nel.org, torvalds@...ux-foundation.org
Subject: Re-tune x86 uaccess code for PREEMPT_VOLUNTARY

The x86 user access functions (*_user) were originally very well tuned,
with partial inline code and other optimizations.

Then over time various new checks -- particularly the sleep checks for
a voluntary preempt kernel -- destroyed a lot of that tuning.

A typical user access operation now does multiple useless function
calls. Also, without forced inlining, gcc's inlining policy makes it
even worse, adding more unnecessary calls.

Here's a typical example from ftrace:

     10)               |    might_fault() {
     10)               |      _cond_resched() {
     10)               |        should_resched() {
     10)               |          need_resched() {
     10)   0.063 us    |            test_ti_thread_flag();
     10)   0.643 us    |          }
     10)   1.238 us    |        }
     10)   1.845 us    |      }
     10)   2.438 us    |    }

So we spend about 2.5us doing nothing (OK, it's a bit less without
the ftrace overhead, but still pretty bad).

Then in other cases we would have an out-of-line function, but would
actually do the might_sleep() checks in the inlined caller. This
doesn't make any sense at all.

There were also a few other problems. For example, the x86-64 uaccess
code regularly falls back to string functions, even though a simple
mov would be enough: every futex access to the lock variable actually
used string instructions, even though it's just 4 bytes.

This patch kit is an attempt to get us back to sane code, mostly by
doing proper inlining and doing the sleep checks in the right place.
Unfortunately I had to add one tree sweep to avoid a nasty include
loop.

It costs a bit of text space, but I think it's worth it
(if only to keep my blood pressure down while reading ftrace logs...)

I haven't done any particular benchmarks, but important low level
functions just ought to be fast.

64bit:
    text    data     bss      dec    hex filename
13249492 1881328 1159168 16289988 f890c4 vmlinux-before-uaccess
13260877 1877232 1159168 16297277 f8ad3d vmlinux-uaccess
+11k text, +0.08%

32bit:
    text   data     bss      dec    hex filename
11223248 899512 1916928 14039688 d63a88 vmlinux-before-uaccess
11230358 895416 1916928 14042702 d6464e vmlinux-uaccess
+7k text, +0.06%
