Message-ID: <20220511092433.GA26047@pathway.suse.cz>
Date: Wed, 11 May 2022 11:24:33 +0200
From: Petr Mladek <pmladek@...e.com>
To: Josh Poimboeuf <jpoimboe@...nel.org>
Cc: Song Liu <songliubraving@...com>, Rik van Riel <riel@...com>,
"song@...nel.org" <song@...nel.org>,
"joe.lawrence@...hat.com" <joe.lawrence@...hat.com>,
"peterz@...radead.org" <peterz@...radead.org>,
"mingo@...hat.com" <mingo@...hat.com>,
"vincent.guittot@...aro.org" <vincent.guittot@...aro.org>,
"live-patching@...r.kernel.org" <live-patching@...r.kernel.org>,
Kernel Team <Kernel-team@...com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"jpoimboe@...hat.com" <jpoimboe@...hat.com>
Subject: Re: [RFC] sched,livepatch: call klp_try_switch_task in __cond_resched
On Tue 2022-05-10 17:33:31, Josh Poimboeuf wrote:
> On Tue, May 10, 2022 at 11:57:04PM +0000, Song Liu wrote:
> > > If it's a real bug, we should fix it everywhere, not just for Facebook.
> > > Otherwise CONFIG_PREEMPT and/or non-x86 arches become second-class
> > > citizens.
> >
> > I think "is it a real bug?" is the top question for me. So maybe we
> > should take a step back.
> >
> > The behavior we see is: A busy kernel thread blocks klp transition
> > for more than a minute. But the transition eventually succeeded after
> > < 10 retries on most systems. The kernel thread is well-behaved, as
> > it calls cond_resched() at a reasonable frequency, so this is not a
> > deadlock.
> >
> > If I understand Petr correctly, this behavior is expected, and thus
> > is not a bug or issue for the livepatch subsystem. This is different
> > to our original expectation, but if this is what we agree on, we
> > will look into ways to incorporate long wait time for patch
> > transition in our automations.
>
> That's how we've traditionally looked at it, though apparently Red Hat
> and SUSE have implemented different ideas of what a long wait time is.
>
> In practice, one minute has always been enough for all of kpatch's users
> -- AFAIK, everybody except SUSE -- up until now.
I am actually surprised that nobody has hit the problem yet. There are
"only" 60 attempts to transition the pending tasks.
Well, the problem is mainly with kthreads. User space processes are
also migrated at the kernel boundary, and the fake signal is likely
pretty effective there. And it is probably not that common for
a kthread to occupy a single CPU all the time.
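To make the attempt limit above concrete, here is a minimal userspace
sketch in C. It is not kernel code. It assumes one attempt per retry
interval and models a pending task only by how many attempts it stays
busy on a CPU. The names pending_task, try_switch_task() and
run_transition() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a pending task; not a kernel structure. */
struct pending_task {
	int busy_until;	/* task cannot be switched while attempt < busy_until */
	bool patched;	/* true once the task has been transitioned */
};

/* One attempt for one task: it succeeds only when the task is not busy. */
static bool try_switch_task(struct pending_task *t, int attempt)
{
	if (t->patched)
		return true;
	if (attempt < t->busy_until)
		return false;	/* still running; stack cannot be checked */
	t->patched = true;
	return true;
}

/*
 * Retry until every task is switched or the attempt limit is reached.
 * Returns the number of attempts used, or -1 when the limit is hit.
 */
int run_transition(struct pending_task *tasks, size_t n, int max_attempts)
{
	assert(max_attempts > 0);

	for (int attempt = 0; attempt < max_attempts; attempt++) {
		bool complete = true;

		for (size_t i = 0; i < n; i++) {
			if (!try_switch_task(&tasks[i], attempt))
				complete = false;
		}
		if (complete)
			return attempt + 1;
	}
	return -1;
}
```

In this model a single task that stays busy for longer than the limit
is enough to make the whole transition time out, which matches the
behavior described for a CPU-bound kthread.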
> Though, these options might be considered workarounds, as it's
> theoretically possible for a kthread to be CPU-bound indefinitely,
> beyond any arbitrarily chosen timeout. But maybe that's not realistic
> beyond a certain timeout value of X and we don't care? I dunno.
I agree that it might happen theoretically. And it would be great
to be prepared for this. My only concern is the complexity and risk.
We should know that it is worth it.
> As I have been trying to say, that won't work for PREEMPT+!ORC, because,
> when the kthread gets preempted, the stack trace will be attempted from
> an IRQ and will be reported as unreliable.
This limits the range of possible solutions quite a lot. But it is
what it is.
> Ideally we'd have the ORC unwinder for all arches, that would make this
> much easier. But we're not there yet.
The alternative solution is that the process has to migrate itself
at some safe location.
One crazy idea: it still might be possible to find the called
functions on the stack even when it is not reliable. Then it
might be possible to add another ftrace handler on
these functions. This other ftrace handler might migrate
the task when it calls one of these functions again.
It assumes that the task will call the same functions again
and again. Also it might require that the task checks its
own stack from the ftrace handler. I am not sure if this
is possible.
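The idea above can be sketched as a small userspace model in C. It is
not kernel code and does not use the real ftrace API. Functions are
represented by small integer ids, and hook_model, hooked_task,
hook_functions_on_stack() and enter_function() are hypothetical names
made up for this illustration. The model assumes that the task can
inspect its own stack when the hook fires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_FUNCS 8	/* hypothetical number of traceable functions */

/* Hypothetical global state; not a kernel structure. */
struct hook_model {
	bool hooked[NR_FUNCS];		/* migration hook installed */
	bool to_be_patched[NR_FUNCS];	/* function changed by the patch */
};

struct hooked_task {
	bool patched;
};

/*
 * Step 1: scan a possibly unreliable stack and hook every entry that
 * looks like a known function. Garbage entries are skipped.
 * Returns the number of newly installed hooks.
 */
size_t hook_functions_on_stack(struct hook_model *m, const int *stack,
			       size_t depth)
{
	size_t installed = 0;

	for (size_t i = 0; i < depth; i++) {
		int f = stack[i];

		if (f < 0 || f >= NR_FUNCS)
			continue;
		if (!m->hooked[f]) {
			m->hooked[f] = true;
			installed++;
		}
	}
	return installed;
}

/*
 * Step 2: called on entry to function f. When f is hooked, the task
 * checks its own stack and migrates itself only if no to-be-patched
 * function is on it. Returns true when the task was migrated.
 */
bool enter_function(struct hook_model *m, struct hooked_task *t,
		    const int *own_stack, size_t depth, int f)
{
	assert(f >= 0 && f < NR_FUNCS);

	if (!m->hooked[f] || t->patched)
		return false;

	for (size_t i = 0; i < depth; i++) {
		int g = own_stack[i];

		if (g >= 0 && g < NR_FUNCS && m->to_be_patched[g])
			return false;	/* not a safe location */
	}
	t->patched = true;
	return true;
}
```

The model only works when the task really does call one of the hooked
functions again, which is the assumption stated above.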
There might be other variants of this approach.
Best Regards,
Petr