Date:   Sat, 22 Aug 2020 19:49:28 -0400
From:   Steven Rostedt <rostedt@...dmis.org>
To:     peterz@...radead.org
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Joerg Vehlow <lkml@...coder.de>,
        Thomas Gleixner <tglx@...utronix.de>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Huang Ying <ying.huang@...el.com>,
        linux-kernel@...r.kernel.org,
        Joerg Vehlow <joerg.vehlow@...-tech.de>, dave@...olabs.net
Subject: Re: [BUG RT] dump-capture kernel not executed for panic in
 interrupt context

On Sat, 22 Aug 2020 14:32:52 +0200
peterz@...radead.org wrote:

> On Fri, Aug 21, 2020 at 05:03:34PM -0400, Steven Rostedt wrote:
> 
> > > Sigh.  Is it too hard to make mutex_trylock() usable from interrupt
> > > context?  
> > 
> > 
> > That's a question for Thomas and Peter Z.  
> 
> You should really know that too, the TL;DR answer is it's fundamentally
> buggered, can't work.

I knew there was an issue but I couldn't remember the reasoning, and
figured you could easily answer it without having to look back at the
code.

> 
> The problem is that RT relies on being able to PI boost the mutex owner.
> 
> ISTR we had a thread about all this last year or so, let me see if I can
> find that.
> 
> Here goes:
> 
>   https://lkml.kernel.org/r/20191218135047.GS2844@hirez.programming.kicks-ass.net

From this email:

> The problem happens when that owner is the idle task, this can happen
> when the irq/softirq hits the idle task, in that case the contending
> mutex_lock() will try and PI boost the idle task, and that is a big
> no-no.

What's wrong with priority boosting the idle task? It's not obvious,
and I can't find comments in the code saying it would be bad.

I looked around the code to see if I could find "why this is bad" but
couldn't find it. There's lots of places that say "Do not use
mutex_trylock in interrupt context, the implementation is not safe to
do so" but I can't find where it says "why" it is not safe to do so.

The idle task is not mentioned at all in rtmutex.c and not mentioned in
kernel/locking except for some comments about RCU in lockdep.

I see that in the idle code the prio_change method does a BUG(), but
there's no comment to say why it does so.

The commit that added that BUG() doesn't explain why it can't happen:

a8941d7ec8167 ("sched: Simplify the idle scheduling class")

I may have once known the rationale behind all this, but it's been a
long time since I worked on the PI code, and it's out of my cache.


-- Steve
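
A toy model of the sequence discussed in the quoted email, as a rough sketch rather than kernel code. Every name in it (Task, ToyPIMutex, set_prio, lock_contended, demo) is hypothetical. The only parts taken from the thread are that a trylock from irq/softirq context can leave the idle task as the mutex owner, that a contending mutex_lock() then tries to PI boost that owner, and that the idle class's prio_change method does a BUG(). It shows where the BUG() would trigger; it does not say why boosting the idle task is bad.

```python
class Task:
    """Hypothetical stand-in for a task; lower prio number means higher priority."""

    def __init__(self, name, prio, is_idle=False):
        self.name = name
        self.prio = prio
        self.is_idle = is_idle

    def set_prio(self, new_prio):
        if self.is_idle:
            # Assumption: models the BUG() in the idle class's prio_change
            # method mentioned in the thread.
            raise RuntimeError("BUG: priority change on the idle task")
        self.prio = new_prio


class ToyPIMutex:
    """Hypothetical mutex that records its owner and boosts it on contention."""

    def __init__(self):
        self.owner = None

    def trylock(self, current):
        # The owner is whatever task is "current", even if the caller is
        # really an interrupt that happened to hit that task.
        if self.owner is not None:
            return False
        self.owner = current
        return True

    def lock_contended(self, waiter):
        # Models only the PI boost step of a blocking lock on a held mutex.
        owner = self.owner
        if owner is not None and waiter.prio < owner.prio:
            owner.set_prio(waiter.prio)


def demo():
    idle = Task("idle", prio=140, is_idle=True)
    rt_task = Task("rt", prio=10)
    mutex = ToyPIMutex()

    # An interrupt hits while the idle task is running and takes the mutex
    # with trylock, so the recorded owner is the idle task.
    assert mutex.trylock(current=idle)

    # A higher-priority task now contends and tries to boost the owner.
    try:
        mutex.lock_contended(rt_task)
    except RuntimeError as err:
        return str(err)
    return "no bug"
```

With an ordinary task as owner, lock_contended() simply raises the owner's priority to the waiter's; only the idle-owner case reaches the modeled BUG().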
