linux-kernel - Re: [PATCH,RFC] smp,csd: throw an error if a CSD lock is stuck for too long

Message-ID: <925761974f452b3f7afa98f96cf6762dc8d89dba.camel@surriel.com>
Date:   Wed, 13 Sep 2023 16:17:34 -0400
From:   Rik van Riel <riel@...riel.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org, kernel-team@...a.com,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Valentin Schneider <vschneid@...hat.com>,
        Juergen Gross <jgross@...e.com>
Subject: Re: [PATCH,RFC] smp,csd: throw an error if a CSD lock is stuck for
 too long

On Wed, 2023-09-13 at 18:17 +0200, Peter Zijlstra wrote:
> On Wed, Sep 13, 2023 at 10:33:51AM -0400, Rik van Riel wrote:
> > > 
> > It's more fun than that. We're seeing this on bare metal.
> 
> Oh, 'fun' indeed, *groan*.
> 
> > Unfortunately, when a system gets wedged that way currently,
> > it ends up being power cycled automatically, and we aren't
> > getting crash dumps with clues on what causes the issue.
> > 
> > Doing a BUG_ON() + panic, followed by a kexec into the kdump
> > kernel will hopefully give us some clues on what might be
> > causing the issue.
> 
> I'm conflicted on the need to push such a debug patch upstream; otoh,
> given the amount of debug code already in csd, why not.
> 
> But yeah, curious to hear what comes out of this.
> 
Oh, there's more to it than just debugging the issue.

This will also help recover systems faster, since they
will end up panicking, kdumping, and rebooting faster
than the "hey, that system looks like it's stuck"
power cycling scripts can get to them.
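
For illustration only, here is a rough userspace sketch of the
"give up after a deadline" idea. This is not the patch itself:
struct demo_csd, now_ns() and demo_csd_wait_stuck() are hypothetical
names made up for this example, and the timeout is whatever the
caller passes in. In the kernel, the "stuck" result is where the
BUG_ON() + panic would go, so that kdump can take over.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() */
#endif

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for a CSD lock flag; not the kernel's data structure. */
struct demo_csd {
	atomic_bool locked;
};

/* Monotonic clock in nanoseconds, so wall-clock jumps don't affect the deadline. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Spin until the lock is released or timeout_ns has elapsed.
 * Returns true if the lock was still held at the deadline ("stuck"),
 * false if it was released in time.
 */
static bool demo_csd_wait_stuck(struct demo_csd *csd, uint64_t timeout_ns)
{
	uint64_t deadline = now_ns() + timeout_ns;

	while (atomic_load_explicit(&csd->locked, memory_order_acquire)) {
		if (now_ns() >= deadline)
			return true;	/* kernel analogue: BUG_ON() + panic here */
	}
	return false;
}
```

A caller would treat a true return as fatal instead of spinning
forever, which is what turns a silent hang into a crash dump plus
a prompt reboot.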

-- 
All Rights Reversed.