linux-kernel - Re: [RFC][PATCH] spin loop arch primitives for busy waiting

Date:   Mon, 3 Apr 2017 17:43:05 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Nicholas Piggin <npiggin@...il.com>
Cc:     "linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Anton Blanchard <anton@...ba.org>,
        linuxppc-dev <linuxppc-dev@...abs.org>
Subject: Re: [RFC][PATCH] spin loop arch primitives for busy waiting

On Mon, Apr 3, 2017 at 4:50 PM, Nicholas Piggin <npiggin@...il.com> wrote:
>
> POWER does not have an instruction like pause. We can only set current
> thread priority, and current implementations do something like allocate
> issue cycles to threads based on relative priorities. So there should
> be at least one or two issue cycles at low priority, but ideally we
> would not be changing priority in the busy-wait loop because it can
> impact other threads in the core.
>
> I couldn't think of a good way to improve cpu_relax. Our (open source)
> firmware has a cpu_relax, and it puts a bunch of nops between low and
> normal priority instructions so we get some fetch cycles at low prio.
> That isn't ideal though.
>
> If you have any ideas, I'd be open to them.

So the idea would be that maybe we can just make those things
explicit. IOW, instead of having that magical looping construct that
does other magical hidden things as part of the loop, maybe we can
just have a

   begin_cpu_relax();
   while (!cond)
       cpu_relax();
   end_cpu_relax();

and then architectures can decide how they implement it. So for x86,
the begin/end macros would be empty. For ppc, maybe begin/end would be
the "lower and raise priority", while cpu_relax() itself is an empty
thing.
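
To make that split concrete, here is a minimal userspace sketch, not kernel code. The begin_cpu_relax()/cpu_relax()/end_cpu_relax() names come from the proposal above; everything prefixed sim_ or SIM_, plus spin_until_zero(), is hypothetical and only simulates a thread priority with a plain variable, since the real priority-setting instructions are architecture specific.

```c
#include <assert.h>

/* Hypothetical stand-in for a hardware thread priority; not a kernel API. */
enum sim_prio { SIM_PRIO_LOW, SIM_PRIO_NORMAL };
static enum sim_prio sim_priority = SIM_PRIO_NORMAL;
static unsigned long sim_relax_calls;   /* counts in-loop calls for the demo */

#ifdef SIM_ARCH_X86_LIKE
/* x86-like: begin/end are empty, the in-loop hook does the work
 * (a real implementation would issue something like "pause" here). */
static inline void begin_cpu_relax(void) { }
static inline void cpu_relax(void) { sim_relax_calls++; }
static inline void end_cpu_relax(void) { }
#else
/* ppc-like: begin/end lower and raise priority, and the in-loop hook
 * is empty apart from the demo counter. */
static inline void begin_cpu_relax(void) { sim_priority = SIM_PRIO_LOW; }
static inline void cpu_relax(void) { sim_relax_calls++; }
static inline void end_cpu_relax(void) { sim_priority = SIM_PRIO_NORMAL; }
#endif

/* Spin until *remaining reaches zero; returns the number of spins.
 * The decrement stands in for another CPU making progress. */
static unsigned long spin_until_zero(volatile int *remaining)
{
    unsigned long spins = 0;

    begin_cpu_relax();
    while (*remaining > 0) {
        cpu_relax();
        (*remaining)--;
        spins++;
    }
    end_cpu_relax();

    /* Whatever begin did, end must have undone it. */
    assert(sim_priority == SIM_PRIO_NORMAL);
    return spins;
}
```

The caller's loop stays an ordinary while loop; only the three hooks change per architecture.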

Or maybe "begin" just clears a counter, while "cpu_relax()" does some
"increase iterations, and lower priority after X iterations", and then
"end" raises the priority again.
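
A sketch of that counter variant, again as a userspace simulation under stated assumptions: LOWER_PRIO_AFTER plays the role of "X" and its value is arbitrary, the sim_ names and spin_with_limit() are hypothetical, and the iteration counter is a plain static where real code would presumably want per-CPU or on-stack state. It also shows an early "return" out of the loop, which has to call end_cpu_relax() itself.

```c
#include <assert.h>
#include <stdbool.h>

#define LOWER_PRIO_AFTER 3   /* the "X" above; arbitrary value for the demo */

/* Hypothetical stand-in for a hardware thread priority; not a kernel API. */
enum sim_prio { SIM_PRIO_LOW, SIM_PRIO_NORMAL };
static enum sim_prio sim_priority = SIM_PRIO_NORMAL;
static unsigned int spin_iterations;
static bool sim_was_lowered;   /* records whether priority was ever dropped */

/* "begin" just clears the counter. */
static inline void begin_cpu_relax(void) { spin_iterations = 0; }

/* In-loop hook: count iterations, drop priority once X is reached. */
static inline void cpu_relax(void)
{
    if (++spin_iterations == LOWER_PRIO_AFTER) {
        sim_priority = SIM_PRIO_LOW;
        sim_was_lowered = true;
    }
}

/* "end" raises the priority again. */
static inline void end_cpu_relax(void) { sim_priority = SIM_PRIO_NORMAL; }

/* Spin until *remaining reaches zero, giving up after max_spins.
 * The decrement stands in for another CPU making progress. */
static bool spin_with_limit(int *remaining, unsigned int max_spins)
{
    begin_cpu_relax();
    while (*remaining > 0) {
        if (spin_iterations >= max_spins) {
            end_cpu_relax();   /* the early exit must restore priority too */
            return false;
        }
        cpu_relax();
        (*remaining)--;
    }
    end_cpu_relax();
    assert(sim_priority == SIM_PRIO_NORMAL);
    return true;
}
```

Short waits never touch the priority at all, which is the point of deferring the change until after X iterations.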

The "do magic having a special loop" approach disturbs me. I'd much
rather have more explicit hooks that allow people to do their own loop
semantics (including having a "return" to exit early).

But that depends on architectures having some pattern that we *can*
abstract. Would some "begin/in-loop/end" pattern like the above be
sufficient? The pure "in-loop" case we have now (ie "cpu_relax()")
clearly isn't sufficient.

I think s390 might have issues too, since they tried to have that
"cpu_relax_yield" thing (which is only used by stop_machine), and
they've tried cpu_relax_lowlatency() and other games.

                    Linus