Message-ID: <aKNDVFTI-UZeNq0Y@CMGLRV3>
Date: Mon, 18 Aug 2025 10:14:28 -0500
From: Frederick Lawler <fred@...udflare.com>
To: Corey Minyard <corey@...yard.net>
Cc: openipmi-developer@...ts.sourceforge.net, linux-kernel@...r.kernel.org,
kernel-team@...udflare.com
Subject: Re: [RFC] Patches to disable messages during BMC reset
On Fri, Aug 15, 2025 at 08:56:33PM -0500, Corey Minyard wrote:
> On Fri, Aug 15, 2025 at 04:23:08PM -0500, Frederick Lawler wrote:
> > Hi Corey,
> >
> > On Thu, Aug 07, 2025 at 06:02:31PM -0500, Corey Minyard wrote:
> > > I went ahead and did some patches for this, since it was on my mind.
> > >
> > > With these, if a reset is sent to the BMC, the driver will disable
> > > messages to the BMC for a time, defaulting to 30 seconds. Don't
> > > modify message timing, since no messages are allowed, anyway.
> > >
> > > If a firmware update command is sent to the BMC, then just reject
> > > sysfs commands that query the BMC. Modify message timing and
> > > allow direct messages through the driver interface.
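The gating rules described above can be modeled in a few lines of userspace C. This is a minimal sketch under stated assumptions, not the actual ipmi_msghandler code. The names (struct bmc_state, bmc_msg_allowed, the enums) are hypothetical, time is passed in as plain seconds, and the message-timing changes during a firmware update are not modeled. The thread also does not say how firmware-update mode ends, so the sketch never clears that flag. The model only shows which messages get through: after a reset, nothing is allowed for 30 seconds, and during a firmware update only sysfs queries are rejected.

```c
#include <stdbool.h>
#include <stdint.h>

/* Default reset blackout from the thread; names below are hypothetical. */
#define RESET_BLOCK_SECS 30u

enum bmc_cmd_kind { CMD_NORMAL, CMD_BMC_RESET, CMD_FW_UPDATE };
enum msg_origin { FROM_SYSFS, FROM_USER_IFACE };

struct bmc_state {
	uint64_t blocked_until;	/* seconds; messages rejected while now < this */
	bool in_fw_update;	/* set by a firmware update command, never cleared here */
};

/* Returns true if a message of this kind/origin may go to the BMC at `now`. */
static bool bmc_msg_allowed(struct bmc_state *s, enum msg_origin origin,
			    enum bmc_cmd_kind kind, uint64_t now)
{
	if (now < s->blocked_until)
		return false;	/* reset in progress: no messages at all */
	if (s->in_fw_update && origin == FROM_SYSFS)
		return false;	/* firmware update: only sysfs queries rejected */

	if (kind == CMD_BMC_RESET)
		s->blocked_until = now + RESET_BLOCK_SECS;
	else if (kind == CMD_FW_UPDATE)
		s->in_fw_update = true;
	return true;
}
```

Note that the reset command itself is let through; only the messages after it are blocked until the window expires.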
> > >
> > > Hopefully this will work around the problem, and it's a good idea,
> > > anyway.
> > >
> > > -corey
> > >
> >
> > Thanks for the patches, and sorry for the delay in response.
> > It's one of _those weeks_. Anyway, I backported the patch series
> > to 6.12, and the changes seem reasonable to me overall. Ran it
> > through our infra on a single node, and nothing seemed to break.
> >
> > I did observe in testing that resetting the BMC via ipmitool on the host
> > did kick out sysfs reads, as expected.
>
> Ok, I took the liberty of adding a "Tested-by" line with your name. If
> that's not ok, I can pull it out.
>
Not a problem.
> >
> > Resetting the BMC remotely was not handled (this seems obvious given the state
> > changes are handled via the ipmi_msg handler). Would the BMC send an event
> > to the kernel letting it know it's resetting, so that case could be
> > handled?
>
> Unfortunately not. It's one of the many things that would be nice to
> have...
>
> In general, dealing with a BMC being reset is a real pain. They tend to
> do all kinds of different things. The worst is when they sort of act
> like they are operational, but then do strange things.
>
> I haven't thought of a good general purpose way to handle this. I'm
> toying with the idea of making it so if the BMC gets an error, just shut
> things down for a second or so and then test it to see if it's working.
> During this time just return errors, like the new patches do during
> reset.
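The error-triggered back-off idea above could look roughly like the following C sketch. It models a proposal from this thread, not anything the posted patches implement. The one-second value, all names, and the use of a probe such as a Get Device ID request to re-test the BMC are assumptions. On any error, normal requests are failed immediately until the back-off expires; then a single probe decides whether service resumes or the back-off starts again.

```c
#include <stdbool.h>
#include <stdint.h>

/* "A second or so" from the thread; the exact value is an assumption. */
#define BMC_BACKOFF_MS 1000u

enum bmc_health { BMC_HEALTHY, BMC_BACKOFF };

struct bmc_guard {	/* hypothetical per-BMC state */
	enum bmc_health state;
	uint64_t probe_at_ms;	/* earliest time to re-test the BMC */
};

/* Any BMC error: stop normal traffic and schedule a re-test. */
static void bmc_on_error(struct bmc_guard *g, uint64_t now_ms)
{
	g->state = BMC_BACKOFF;
	g->probe_at_ms = now_ms + BMC_BACKOFF_MS;
}

/* Normal requests go out only while the BMC is believed healthy;
 * otherwise the caller fails them immediately with an error. */
static bool bmc_can_send(const struct bmc_guard *g)
{
	return g->state == BMC_HEALTHY;
}

/* True once the back-off has expired and a probe should be sent. */
static bool bmc_probe_due(const struct bmc_guard *g, uint64_t now_ms)
{
	return g->state == BMC_BACKOFF && now_ms >= g->probe_at_ms;
}

/* A good probe response restores service; a failed probe backs off again. */
static void bmc_on_probe_result(struct bmc_guard *g, bool ok, uint64_t now_ms)
{
	if (ok)
		g->state = BMC_HEALTHY;
	else
		bmc_on_error(g, now_ms);
}
```

Keeping the probe separate from user requests means a half-working BMC is tested with one known-simple command rather than with whatever request happens to arrive next.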
>
> Thanks for testing these.
>
> -corey
>
Thanks for working with me on this.
> >
> > Best,
> > Fred