Date:   Wed, 2 Feb 2022 15:29:32 +0000
From:   Mark Rutland <mark.rutland@....com>
To:     Frederic Weisbecker <frederic@...nel.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        linux-arm-kernel@...ts.infradead.org, ardb@...nel.org,
        catalin.marinas@....com, juri.lelli@...hat.com,
        linux-kernel@...r.kernel.org, mingo@...hat.com, will@...nel.org
Subject: Re: [PATCH 5/6] sched/preempt: add PREEMPT_DYNAMIC using static keys

On Mon, Dec 13, 2021 at 11:05:01PM +0100, Frederic Weisbecker wrote:
> On Tue, Nov 09, 2021 at 05:24:07PM +0000, Mark Rutland wrote:
> > Where an architecture selects HAVE_STATIC_CALL but not
> > HAVE_STATIC_CALL_INLINE, each static call has an out-of-line trampoline
> > which will either branch to a callee or return to the caller.
> > 
> > On such architectures, a number of constraints can conspire to make
> > those trampolines more complicated and potentially less useful than we'd
> > like. For example:
> > 
> > * Hardware and software control flow integrity schemes can require the
> >   addition of "landing pad" instructions (e.g. `BTI` for arm64), which
> >   will also be present at the "real" callee.
> > 
> > * Limited branch ranges can require that trampolines generate or load an
> >   address into a register and perform an indirect branch (or at least
> >   have a slow path that does so). This loses some of the benefits of
> >   having a direct branch.
> > 
> > * Interaction with SW CFI schemes can be complicated and fragile, e.g.
> >   requiring that we can recognise idiomatic codegen and remove
> >   indirections, at least until clang provides more helpful
> >   mechanisms for dealing with this.
> > 
> > For PREEMPT_DYNAMIC, we don't need the full power of static calls, as we
> > really only need to enable/disable specific preemption functions. We can
> > achieve the same effect without a number of the pain points above by
> > using static keys to fold early return cases into the preemption
> > functions themselves rather than in an out-of-line trampoline,
> > effectively inlining the trampoline into the start of the function.
> > 
> > For arm64, this results in good code generation, e.g. the
> > dynamic_cond_resched() wrapper looks as follows (with the first `B` being
> > replaced with a `NOP` when the function is disabled):
> > 
> > | <dynamic_cond_resched>:
> > |        bti     c
> > |        b       <dynamic_cond_resched+0x10>
> > |        mov     w0, #0x0                        // #0
> > |        ret
> > |        mrs     x0, sp_el0
> > |        ldr     x0, [x0, #8]
> > |        cbnz    x0, <dynamic_cond_resched+0x8>
> > |        paciasp
> > |        stp     x29, x30, [sp, #-16]!
> > |        mov     x29, sp
> > |        bl      <preempt_schedule_common>
> > |        mov     w0, #0x1                        // #1
> > |        ldp     x29, x30, [sp], #16
> > |        autiasp
> > |        ret
> > 
> > ... compared to the regular form of the function:
> > 
> > | <__cond_resched>:
> > |        bti     c
> > |        mrs     x0, sp_el0
> > |        ldr     x1, [x0, #8]
> > |        cbz     x1, <__cond_resched+0x18>
> > |        mov     w0, #0x0                        // #0
> > |        ret
> > |        paciasp
> > |        stp     x29, x30, [sp, #-16]!
> > |        mov     x29, sp
> > |        bl      <preempt_schedule_common>
> > |        mov     w0, #0x1                        // #1
> > |        ldp     x29, x30, [sp], #16
> > |        autiasp
> > |        ret
> > 
> > Any architecture which implements static keys should be able to use this
> > to implement PREEMPT_DYNAMIC with similar cost to non-inlined static
> > calls.
> > 
> > Signed-off-by: Mark Rutland <mark.rutland@....com>
> > Cc: Ard Biesheuvel <ardb@...nel.org>
> > Cc: Frederic Weisbecker <frederic@...nel.org>
> > Cc: Ingo Molnar <mingo@...hat.com>
> > Cc: Juri Lelli <juri.lelli@...hat.com>
> > Cc: Peter Zijlstra <peterz@...radead.org>
> 
> Does anyone have an opinion on that? Can we do better on the arm64 static call
> side or should we resign ourselves to the static keys direction?

From speaking with other arm64 folk, I think we're agreed that this is
preferable to implementing static calls (especially given the pain points with
interaction with CFI).

I don't think it's fair to say we're "resigning ourselves" to using static keys
-- this is vastly simpler to implement and maintain than the static call
approach, should perform no worse than the form of trampolines that we'd have
to implement for static calls, and makes it easier for architectures to enable
PREEMPT_DYNAMIC, so it seems like an all-round win.
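
As a rough illustration of the shape of this, here is a userspace sketch. It is
not the patch itself: a plain bool stands in for the static key, the scheduler
is faked with a counter, and every name below is made up for the example. In
the kernel the key test would be a patched branch or NOP rather than a load and
compare.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names come from the patch. */
bool sk_dynamic_cond_resched = true;	/* plays the role of the static key */
bool resched_pending;			/* plays the role of the need-resched test */
int schedule_calls;			/* counts calls to the fake scheduler */

void fake_schedule(void)
{
	schedule_calls++;
	resched_pending = false;
}

/* Regular form: check, then maybe reschedule. */
int cond_resched_sketch(void)
{
	if (!resched_pending)
		return 0;
	fake_schedule();
	return 1;
}

/*
 * Key-gated form: the early return that a trampoline would otherwise
 * provide sits at the top of the function itself.
 */
int dynamic_cond_resched_sketch(void)
{
	if (!sk_dynamic_cond_resched)
		return 0;
	return cond_resched_sketch();
}

/* Usage: flipping the key turns the function into an early return. */
void usage_example(void)
{
	schedule_calls = 0;
	resched_pending = true;

	sk_dynamic_cond_resched = false;
	assert(dynamic_cond_resched_sketch() == 0 && schedule_calls == 0);

	sk_dynamic_cond_resched = true;
	assert(dynamic_cond_resched_sketch() == 1 && schedule_calls == 1);
}
```

The point of the sketch is only that the enable/disable decision lives inside
the callee, so no out-of-line trampoline and no arbitrary branch target are
needed.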

> Also I assume that, sooner or later, arm64 will need a static call
> implementation....

I really hope not, because the current design of static calls (with arbitrary
targets) is not a great fit for arm64.

The only other major use for static calls on the arm64 side is for tracing
hooks, and that's *purely* to avoid the overhead that the current clang CFI
scheme imposes for modules. For that I'd rather fix the CFI scheme, because
that also interacts poorly with static calls to begin with...

Thanks,
Mark.
