Message-ID: <20210917132648.GG108031@montezuma.acc.umu.se>
Date:   Fri, 17 Sep 2021 15:26:48 +0200
From:   Anton Lundin <glance@....umu.se>
To:     Corey Minyard <minyard@....org>
Cc:     openipmi-developer@...ts.sourceforge.net,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [Openipmi-developer] Issue with panic handling and ipmi

On 17 September, 2021 - Corey Minyard wrote:

> On Fri, Sep 17, 2021 at 02:55:25PM +0200, Anton Lundin wrote:
> > On 17 September, 2021 - Corey Minyard wrote:
> > 
> > > On Fri, Sep 17, 2021 at 12:14:19PM +0200, Anton Lundin wrote:
> > > > On 16 September, 2021 - Corey Minyard wrote:
> > > > 
> > > > > On Thu, Sep 16, 2021 at 04:53:00PM +0200, Anton Lundin wrote:
> > > > > > Hi.
> > > > > > 
> > > > > > I've just done an upgrade of the kernel we're using in a product from
> > > > > > 4.19 to 5.10, and I noticed an issue.
> > > > > > 
> > > > > > It started with us not getting panic and oops dumps in our
> > > > > > ERST-backed pstore, and while debugging that I noticed that the
> > > > > > reboot-on-panic timer didn't work either.
> > > > > > 
> > > > > > I've bisected it down to 2033f6858970 ("ipmi: Free receive messages when
> > > > > > in an oops").
> > > > > 
> > > > > Hmm.  Unfortunately removing that will break other things.  Can you try
> > > > > the following patch?  It's a good idea, in general, to do as little as
> > > > > possible in the panic path, this should cover a multitude of issues.
> > > > > 
> > > > > Thanks for the report.
> > > > > 
> > > > 
> > > > I'm sorry to report that the patch didn't solve the issue, and the
> > > > machine locked up in the panic path as before.
> > > 
> > > I missed something.  Can you try the following?  If this doesn't work,
> > > I'm going to have to figure out how to reproduce this.
> > > 
> > 
> > Sorry, still no joy.
> > 
> > My guess is that something is locking up because these Supermicro
> > machines have their ERST memory backed by the BMC, and the same BMC
> > is the other end of all the IPMI communications.
> > 
> > I've reproduced this on Server/X11SCZ-F and Server/H11SSL-i but I'm
> > guessing it can be reproduced on most, if not all, of their hardware
> > with the same setup.
> > 
> > We're using the ERST backend for pstore because we're still
> > BIOS-booting them and don't have EFI services available to use as a
> > pstore backend.
> > 
> > 
> > I've tested simply removing the IPMI modules from the kernel, and that
> > fixes the panic timer and we get crash dumps in pstore.
> 
> Dang.  I'm going to have to look deeper at what that could change to
> cause an issue like this.  Are you using the IPMI watchdog?  Do you have
> CONFIG_IPMI_PANIC_EVENT=y set in the config?

# CONFIG_IPMI_PANIC_EVENT is not set

We're using the IPMI watchdog.

//Anton
