[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220511003331.clfvwfgpmbr5yx6n@treble>
Date: Tue, 10 May 2022 17:33:31 -0700
From: Josh Poimboeuf <jpoimboe@...nel.org>
To: Song Liu <songliubraving@...com>
Cc: Rik van Riel <riel@...com>, "song@...nel.org" <song@...nel.org>,
"joe.lawrence@...hat.com" <joe.lawrence@...hat.com>,
"peterz@...radead.org" <peterz@...radead.org>,
"mingo@...hat.com" <mingo@...hat.com>,
"vincent.guittot@...aro.org" <vincent.guittot@...aro.org>,
"pmladek@...e.com" <pmladek@...e.com>,
"live-patching@...r.kernel.org" <live-patching@...r.kernel.org>,
Kernel Team <Kernel-team@...com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"jpoimboe@...hat.com" <jpoimboe@...hat.com>
Subject: Re: [RFC] sched,livepatch: call klp_try_switch_task in __cond_resched
On Tue, May 10, 2022 at 11:57:04PM +0000, Song Liu wrote:
> > If it's a real bug, we should fix it everywhere, not just for Facebook.
> > Otherwise CONFIG_PREEMPT and/or non-x86 arches become second-class
> > citizens.
>
> I think "is it a real bug?" is the top question for me. So maybe we
> should take a step back.
>
> The behavior we see is: A busy kernel thread blocks klp transition
> for more than a minute. But the transition eventually succeeded after
> < 10 retries on most systems. The kernel thread is well-behaved, as
> it calls cond_resched() at a reasonable frequency, so this is not a
> deadlock.
>
> If I understand Petr correctly, this behavior is expected, and thus
> is not a bug or issue for the livepatch subsystem. This is different
> to our original expectation, but if this is what we agree on, we
> will look into ways to incorporate long wait time for patch
> transition in our automations.
That's how we've traditionally looked at it, though apparently Red Hat
and SUSE have implemented different ideas of what a long wait time is.
In practice, one minute has always been enough for all of kpatch's users
-- AFAIK, everybody except SUSE -- up until now.
That timeout is not set in stone, and can be extended if needed, either
with "kpatch load" waiting longer, or with it returning immediately and
instead reporting livepatch stalls via WARN().
The latter option is a bigger change in behavior, since any errors will
be delayed and reported in a different way, but if other users are ok
with that, it's fine with me.
Though, these options might be considered workarounds, as it's
theoretically possible for a kthread to be CPU-bound indefinitely,
beyond any arbitrarily chosen timeout. But maybe that's not realistic
beyond a certain timeout value of X and we don't care? I dunno.
> OTOH, if we think this is a suboptimal behavior (not necessarily a
> bug, IIUC), we should consider improve it. I personally don’t think
> shipping an improvement to one configuration makes other configurations
> second-class citizens. And, it is possible PREEMPT kernel does NOT
> even have such suboptimal behavior, right? (I really don't know.)
No, PREEMPT will have the same problem. Preempting a kthread doesn't
patch it unless the task happens to still be blocked during the next
invocation of klp_transition_work_fn().
> So, if we come back to the same question: is this a bug (or a suboptimal
> behavior that worth fixing)? If so, we are open to any solution that
> would also help PREEMPT and/or non-x86 arches.
You're obviously trying to fix a real world problem. Whether you call
that issue a bug, or "suboptimal behavior", it's still a problem.
As you said, the kthread is acting within the allowed parameters of how
a kthread should behave. But "kpatch load" will fail. That sounds
broken to me. We need to figure out a way to fix that for all
configs/arches.
> Lastly, maybe a really naive question: does the following also helps
> PREEMPT=y configurations?
As I have been trying to say, that won't work for PREEMPT+!ORC, because,
when the kthread gets preempted, the stack trace will be attempted from
an IRQ and will be reported as unreliable.
Ideally we'd have the ORC unwinder for all arches, that would make this
much easier. But we're not there yet.
--
Josh
Powered by blists - more mailing lists