[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20231109234736.4kik62ys47ey23ju@treble>
Date: Thu, 9 Nov 2023 15:47:36 -0800
From: Josh Poimboeuf <jpoimboe@...nel.org>
To: Ankur Arora <ankur.a.arora@...cle.com>
Cc: Steven Rostedt <rostedt@...dmis.org>, linux-kernel@...r.kernel.org,
tglx@...utronix.de, peterz@...radead.org,
torvalds@...ux-foundation.org, paulmck@...nel.org,
linux-mm@...ck.org, x86@...nel.org, akpm@...ux-foundation.org,
luto@...nel.org, bp@...en8.de, dave.hansen@...ux.intel.com,
hpa@...or.com, mingo@...hat.com, juri.lelli@...hat.com,
vincent.guittot@...aro.org, willy@...radead.org, mgorman@...e.de,
jon.grimm@....com, bharata@....com, raghavendra.kt@....com,
boris.ostrovsky@...cle.com, konrad.wilk@...cle.com,
jgross@...e.com, andrew.cooper3@...rix.com, mingo@...nel.org,
bristot@...nel.org, mathieu.desnoyers@...icios.com,
geert@...ux-m68k.org, glaubitz@...sik.fu-berlin.de,
anton.ivanov@...bridgegreys.com, mattst88@...il.com,
krypton@...ich-teichert.org, David.Laight@...LAB.COM,
richard@....at, mjguzik@...il.com, Jiri Kosina <jikos@...nel.org>,
Miroslav Benes <mbenes@...e.cz>,
Petr Mladek <pmladek@...e.com>,
Joe Lawrence <joe.lawrence@...hat.com>,
live-patching@...r.kernel.org
Subject: Re: [RFC PATCH 07/86] Revert "livepatch,sched: Add livepatch task
switching to cond_resched()"
On Thu, Nov 09, 2023 at 02:50:48PM -0800, Ankur Arora wrote:
> >> I guess I'm not fully understanding what the cond rescheds are for. But
> >> would an IPI to all CPUs setting NEED_RESCHED, fix it?
>
> Yeah. We could just temporarily toggle to full preemption, when
> NEED_RESCHED_LAZY is always upgraded to NEED_RESCHED which will
> then send IPIs.
>
> > If all livepatch arches had the ORC unwinder, yes.
> >
> > The problem is that frame pointer (and similar) unwinders can't reliably
> > unwind past an interrupt frame.
>
> Ah, I wonder if we could just disable the preempt_schedule_irq() path
> temporarily? Hooking into schedule() alongside something like this:
>
> @@ -379,7 +379,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
>
> void irqentry_exit_cond_resched(void)
> {
> - if (!preempt_count()) {
> + if (klp_cond_resched_disable() && !preempt_count()) {
>
> The problem would be tasks that don't go through any preemptible
> sections.
Let me back up a bit and explain what klp is trying to do.
When a livepatch is applied, klp needs to unwind all the tasks,
preferably within a reasonable amount of time.
We can't unwind task A from task B while task A is running, since task A
could be changing the stack during the unwind. So task A needs to be
blocked or asleep. The only exception to that is if the unwind happens
in the context of task A itself.
The problem we were seeing was CPU-bound kthreads (e.g., vhost_worker)
not getting patched within a reasonable amount of time. We fixed it by
hooking the klp unwind into cond_resched() so it can unwind from the
task itself.
It only worked because we had a non-preempted hook (because non-ORC
unwinders can't unwind reliably through preemption) which called klp to
unwind from the context of the task.
Without something to hook into, we have a problem. We could of course
hook into schedule(), but if the kthread never calls schedule() from a
non-preempted context then it still doesn't help.
--
Josh
Powered by blists - more mailing lists