Message-ID: <20210917125525.GF108031@montezuma.acc.umu.se>
Date:   Fri, 17 Sep 2021 14:55:25 +0200
From:   Anton Lundin <glance@....umu.se>
To:     Corey Minyard <minyard@....org>
Cc:     openipmi-developer@...ts.sourceforge.net,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: Issue with panic handling and ipmi

On 17 September, 2021 - Corey Minyard wrote:

> On Fri, Sep 17, 2021 at 12:14:19PM +0200, Anton Lundin wrote:
> > On 16 September, 2021 - Corey Minyard wrote:
> > 
> > > On Thu, Sep 16, 2021 at 04:53:00PM +0200, Anton Lundin wrote:
> > > > Hi.
> > > > 
> > > > I've just upgraded the kernel we're using in a product from
> > > > 4.19 to 5.10 and I noticed an issue.
> > > > 
> > > > It started with us not getting panic and oops dumps in our
> > > > ERST-backed pstore, and when debugging that I noticed that the
> > > > reboot-on-panic timer didn't work either.
> > > > 
> > > > I've bisected it down to 2033f6858970 ("ipmi: Free receive messages when
> > > > in an oops").
> > > 
> > > Hmm.  Unfortunately removing that will break other things.  Can you try
> > > the following patch?  It's a good idea, in general, to do as little as
> > > possible in the panic path, this should cover a multitude of issues.
> > > 
> > > Thanks for the report.
> > > 
> > 
> > I'm sorry to report that the patch didn't solve the issue, and the
> > machine locked up in the panic path as before.
> 
> I missed something.  Can you try the following?  If this doesn't work,
> I'm going to have to figure out how to reproduce this.
> 

Sorry, still no joy.

My guess is that something is locking up because these Supermicro
machines have their ERST memory backed by the BMC, and the same BMC is
at the other end of all the ipmi communications.

I've reproduced this on Server/X11SCZ-F and Server/H11SSL-i but I'm
guessing it can be reproduced on most, if not all, of their hardware
with the same setup.

We're using the ERST backend for pstore because we're still
BIOS-booting them and don't have EFI services available to use as a
pstore backend.


I've tested just yanking the ipmi modules out of the kernel, and that
fixes the panic timer and we get crash dumps in pstore.

//Anton
