Message-ID: <920d4364-52e3-4aaa-9026-0755b65dbf11@paulmck-laptop>
Date: Tue, 19 Mar 2024 16:33:18 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Mark Rutland <mark.rutland@....com>
Cc: Steven Rostedt <rostedt@...dmis.org>,
Ankur Arora <ankur.a.arora@...cle.com>,
linux-kernel@...r.kernel.org, tglx@...utronix.de,
peterz@...radead.org, torvalds@...ux-foundation.org,
akpm@...ux-foundation.org, luto@...nel.org, bp@...en8.de,
dave.hansen@...ux.intel.com, hpa@...or.com, mingo@...hat.com,
juri.lelli@...hat.com, vincent.guittot@...aro.org,
willy@...radead.org, mgorman@...e.de, jpoimboe@...nel.org,
jgross@...e.com, andrew.cooper3@...rix.com, bristot@...nel.org,
mathieu.desnoyers@...icios.com, glaubitz@...sik.fu-berlin.de,
anton.ivanov@...bridgegreys.com, mattst88@...il.com,
krypton@...ich-teichert.org, David.Laight@...lab.com,
richard@....at, jon.grimm@....com, bharata@....com,
boris.ostrovsky@...cle.com, konrad.wilk@...cle.com
Subject: Re: Tasks RCU, ftrace, and trampolines (was: Re: [PATCH 00/30]
PREEMPT_AUTO: support lazy rescheduling)

On Tue, Mar 19, 2024 at 11:45:15AM +0000, Mark Rutland wrote:
> Hi Paul,
>
> On Fri, Mar 01, 2024 at 05:16:33PM -0800, Paul E. McKenney wrote:
> > The networking NAPI code ends up needing special help to avoid starving
> > Tasks RCU grace periods [1]. I am therefore revisiting trying to make
> > Tasks RCU directly detect trampoline usage, but without quite as much
> > need to identify specific trampolines...
> >
> > I am putting this information in a Google document for future
> > reference [2].
> >
> > Thoughts?
>
> Sorry for the long delay! I've been looking into this general area over the
> last couple of weeks due to the latent bugs I mentioned in:
>
> https://lore.kernel.org/lkml/Zenx_Q0UiwMbSAdP@FVFF77S0Q05N/
>
> I was somewhat hoping that staring at the code for long enough would result in
> an epiphany (and a nice simple-to-backport solution for the latent issues), but
> so far that has eluded me.
>
> I believe some of those cases will need to use synchronize_rcu_tasks() and we
> might be able to make some structural changes to minimize the number of times
> we'd need to synchronize (e.g. having static ftrace call ops->func from the ops
> pointer, so we can switch ops+func atomically), but those look pretty invasive
> so far.
>
> I haven't been able to come up with "a precise and completely reliable way to
> determine whether the current preemption occurred within a trampoline". Since
> preemption might occur within a trampoline's callee that eventually returns
> back to the trampoline, I believe that'll either depend on having a reliable
> stacktrace or requiring the trampoline to dynamically register/unregister
> somewhere around calling other functions. That, and we do also care about those
> callees themselves, and it's not just about the trampolines...
>
> On arm64, we kinda have "permanent trampolines", as our
> DYNAMIC_FTRACE_WILL_CALL_OPS implementation uses a common trampoline. However,
> that will tail-call direct functions (and those could also be directly called
> from ftrace callsites), so we don't have a good way of handling those without a
> change to the direct func calling convention.
>
> I assume that permanent trampolines wouldn't be an option on architectures
> where trampolines are a spectre mitigation.

Thank you for checking! I placed a pointer to this email in the document
and updated the relevant sections accordingly.

> Mark.
>
> > Thanx, Paul
> >
> > [1] https://lore.kernel.org/all/Zd4DXTyCf17lcTfq@debian.debian/
> > [2] https://docs.google.com/document/d/1kZY6AX-AHRIyYQsvUX6WJxS1LsDK4JA2CHuBnpkrR_U/edit?usp=sharing