Message-ID: <CA+55aFzhv1v9KatxQ3GUGBZf+zTb_hdx2fJZKT+m5tEcAkWsSQ@mail.gmail.com>
Date:	Sat, 10 Aug 2013 11:51:40 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	"H. Peter Anvin" <hpa@...or.com>
Cc:	Mike Galbraith <bitbucket@...ine.de>,
	Andi Kleen <andi@...stfloor.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	"the arch/x86 maintainers" <x86@...nel.org>,
	Ingo Molnar <mingo@...nel.org>
Subject: Re: Re-tune x86 uaccess code for PREEMPT_VOLUNTARY

On Sat, Aug 10, 2013 at 10:18 AM, H. Peter Anvin <hpa@...or.com> wrote:
>
> We could then play a really ugly stunt by marking NEED_RESCHED by adding
> 0x7fffffff to the counter.  Then the whole sequence becomes something like:
>
>         subl $1,%fs:preempt_count
>         jno 1f
>         call __naked_preempt_schedule   /* Or a trap */

This is indeed one of the few cases where we probably *could* use
trapv or something like that in theory, but those instructions tend to
be slow enough that even if you don't take the trap, you'd be better
off just testing by hand.
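
For illustration, the arithmetic behind that trick can be written out as a
small standalone C program (not kernel code; a 32-bit signed counter is
assumed, and the overflow check stands in for the jno):

/*
 * Standalone sketch of the quoted "add 0x7fffffff on NEED_RESCHED" idea.
 *
 * Normal case: the count is a small non-negative nesting depth, so
 * "count - 1" never overflows and the jno branch is taken.
 * Resched case: 0x7fffffff has been added, so when the nesting depth
 * would drop to zero the raw counter sits at 0x80000000 (INT_MIN), and
 * decrementing it overflows to INT_MAX -- exactly what the
 * "jno 1f / call ..." sequence above is meant to catch.
 */
#include <stdint.h>
#include <stdio.h>

#define NEED_RESCHED_BIAS 0x7fffffffu

/* Returns 1 when "subl $1" would set OF, i.e. when the slow path
 * (__naked_preempt_schedule in the quoted sketch) would be taken. */
static int dec_overflows(uint32_t count)
{
	int32_t before = (int32_t)count;
	int32_t after  = (int32_t)(count - 1);

	/* signed overflow on x-1 happens only when x == INT_MIN */
	return before < 0 && after >= 0;
}

int main(void)
{
	printf("depth 2, no resched: %d\n", dec_overflows(2));                     /* 0 */
	printf("depth 1, no resched: %d\n", dec_overflows(1));                     /* 0 */
	printf("depth 1, resched:    %d\n", dec_overflows(1 + NEED_RESCHED_BIAS)); /* 1 */
	printf("depth 2, resched:    %d\n", dec_overflows(2 + NEED_RESCHED_BIAS)); /* 0 */
	return 0;
}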

However, it's worse than you think. Preempt count is per-thread, not
per-cpu. So to access preempt-count, we currently have to look up
thread_info (which is per-cpu or stack-based).

I'd *like* to make preempt-count be per-cpu, and then copy it at
thread switch time, and it's been discussed. But as things are now,
preemption enable is quite expensive, and looks something like

        movq %gs:kernel_stack,%rdx      #, pfo_ret__
        subl    $1, -8124(%rdx) #, ti_22->preempt_count
        movq %gs:kernel_stack,%rdx      #, pfo_ret__
        movq    -8136(%rdx), %rdx       # MEM[(const long unsigned int *)ti_27 + 16B], D.
        andl    $8, %edx        #, D.34545
        jne     .L139   #,

and that's actually the *good* case (ie not counting any extra costs
of turning leaf functions into non-leaf ones).
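
In rough C terms (the struct layout, the TIF mask value and the helper names
here are illustrative stand-ins, not the real <asm/thread_info.h>), that asm
corresponds to something like:

/*
 * Boiled-down shape of the current path: find thread_info through the
 * per-cpu kernel_stack pointer, decrement the per-thread preempt_count,
 * then re-read thread_info to test the NEED_RESCHED flag.
 */
struct thread_info {
	unsigned long flags;		/* TIF_* bits, TIF_NEED_RESCHED among them */
	int preempt_count;
};

#define TIF_NEED_RESCHED_MASK	8	/* matches the "andl $8" above */

extern struct thread_info *current_thread_info(void);	/* reads %gs:kernel_stack */
extern void preempt_schedule(void);

static inline void preempt_enable_sketch(void)
{
	current_thread_info()->preempt_count--;
	/* gcc can't prove the store above left the per-cpu pointer alone,
	 * so it reloads %gs:kernel_stack for this second access */
	if (current_thread_info()->flags & TIF_NEED_RESCHED_MASK)
		preempt_schedule();
}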

That "kernel_stack" thing is actually getting the thread_info pointer,
and it doesn't get cached because gcc thinks the preempt_count value
might alias. Sad, sad, sad. We actually used to do better back when we
used actual tricks with the stack registers and used a const inline
asm to let gcc know it could re-use the value etc.
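
Something in the spirit of that old trick (illustrative only, not the actual
old code) would be an asm without "volatile" and without a "memory" clobber,
which gcc is then free to treat as a pure value and CSE:

/*
 * Because this asm has no "volatile" and no "memory" clobber, gcc may
 * compute it once and reuse the result for both the preempt_count
 * update and the flags test, instead of reloading %gs:kernel_stack
 * around every store it thinks might alias.
 */
static inline struct thread_info *cached_thread_info(void)
{
	struct thread_info *ti;

	asm ("movq %%gs:kernel_stack, %0" : "=r" (ti));
	return ti;
}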

It would be *lovely* if we
 (a) made preempt-count per-cpu and just copied it at thread-switch
 (b) made the NEED_RESCHED bit be part of preempt-count (rather than
thread flags) and just made it the high bit

and then maybe we could just do

        subl    $1, %fs:preempt_count
        js .L139

with the actual schedule call being done as an

        asm volatile("call user_schedule": : :"memory");

that Andi introduced, which doesn't pollute the register space. Note
that you still want the *test* to be done in C code, because together
with "unlikely()" you'd get pretty close to optimal code generation,
whereas hiding the decrement, test and conditional jump inside the asm
means you wouldn't get the proper instruction scheduling and branch
following that gcc does.
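
As a rough sketch of what (a)+(b) would buy (this_cpu_dec_return() and
user_schedule are stand-in names here, and the out-of-line helper still has
to re-check that the count proper really did hit zero):

/*
 * Sketch of preempt_enable() with a per-cpu __preempt_count whose high
 * bit doubles as NEED_RESCHED: a single decrement-and-sign-test replaces
 * the load/dec/reload/test sequence above.  The slow path is reached
 * through a register-preserving thunk, so the fast path clobbers nothing
 * extra; kernel-style helpers (unlikely(), this_cpu_dec_return()) are
 * assumed.
 */
static __always_inline void preempt_enable_sketch(void)
{
	if (unlikely(this_cpu_dec_return(__preempt_count) < 0))
		asm volatile("call user_schedule" : : : "memory");
}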

I dunno. It looks like a fair amount of effort. But as things are now,
the code generation difference between PREEMPT_NONE and PREEMPT is
actually fairly noticeable. And PREEMPT_VOLUNTARY - which is supposed
to be almost as cheap as PREEMPT_NONE - has lots of bad cases too, as
Andi noticed.

                   Linus
