[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <22bad750-ef5d-82a5-527c-5213346dd280@jv-coder.de>
Date: Mon, 14 Sep 2020 08:03:27 +0200
From: Joerg Vehlow <lkml@...coder.de>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: peterz@...radead.org, Steven Rostedt <rostedt@...dmis.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Huang Ying <ying.huang@...el.com>,
linux-kernel@...r.kernel.org,
Joerg Vehlow <joerg.vehlow@...-tech.de>
Subject: Re: [BUG RT] dump-capture kernel not executed for panic in interrupt
context
Hi Eric,
> What is this patch supposed to be doing?
>
> What bug is it fixing?
This information is part in the first message of this mail thread.
The patch was intendedfor the active discussion in this thread,
not for a broad review.
A short summary: In the rt kernel, a panic in an interrupt context does
not start the dump-capture kernel, because there is a mutex_trylock in
__crash_kexe. If this is called in interrupt context, it always fails.
In the non-rt kernel calling mutex_trylock is not allowed according to
the comment of the function, but it still works.
> A BUG_ON that triggers inside of BUG_ONs seems not just suspect but
> outright impossible to make use of.
I am not entirely sure what would happen here. But even if it gets in
some kind ofendless loop, I guess this is ok, because it allows finding
the problem. A piece of code in the function, that ensures the precondition
is a lot better than relying on only a comment.
If this was in mtex_trylock, the bug described above wouldn't have sneaked
in 12 years ago...
> I get the feeling skimming this that it is time to sort out and simplify
> the locking here, rather than make it more complex, and more likely to
> fail.
I would very much like that, but sadly it looks like it is not possible.
Either it wouldrequire blocking locks, that may fail, or not locking at
all, that may also fail.Using a different kind of lock (like spinlock)
is also not possible, becausespinlock_trylock again uses mutex_trylock
in the rt kernel.
>
> I get the feeling that over the years somehow the assumption that the
> rest of the kernel is broken and that we need to get out of the broken
> kernel as fast and as simply as possible has been lost.
Yes I also have the feeling, that the mutexes need fixing, but I wouldn't
to post any patch for that. At the moment, given the interface of the mutex,
this is clearly a bug in kexec, even if it works in the non-rt kernel.
Jörg
Powered by blists - more mailing lists