Message-ID: <aJ_lUYTlrzYnRD-5@mail.minyard.net>
Date: Fri, 15 Aug 2025 20:56:33 -0500
From: Corey Minyard <corey@...yard.net>
To: Frederick Lawler <fred@...udflare.com>
Cc: openipmi-developer@...ts.sourceforge.net, linux-kernel@...r.kernel.org,
kernel-team@...udflare.com
Subject: Re: [RFC] Patches to disable messages during BMC reset
On Fri, Aug 15, 2025 at 04:23:08PM -0500, Frederick Lawler wrote:
> Hi Corey,
>
> On Thu, Aug 07, 2025 at 06:02:31PM -0500, Corey Minyard wrote:
> > I went ahead and did some patches for this, since it was on my mind.
> >
> > With these, if a reset is sent to the BMC, the driver will disable
> > messages to the BMC for a time, defaulting to 30 seconds. Don't
> > modify message timing, since no messages are allowed, anyway.
> >
> > If a firmware update command is sent to the BMC, then just reject
> > sysfs commands that query the BMC. Modify message timing and
> > allow direct messages through the driver interface.
> >
> > Hopefully this will work around the problem, and it's a good idea,
> > anyway.
> >
> > -corey
> >
>
> Thanks for the patches, and sorry for the delay in response.
> It's one of _those weeks_. Anyway, I backported the patch series
> to 6.12, and the changes seem reasonable to me overall. Ran it
> through our infra on a single node, and nothing seemed to break.
>
> I did observe with testing that resetting BMC via ipmitool on the host
> did kick out sysfs reads as expected.
Ok, I took the liberty of adding a "Tested-by" line with your name. If
that's not ok, I can pull it out.
>
> Resetting the BMC remotely was not handled (this seems obvious given the state
> changes are handled via ipmi_msg handler). Would the BMC send an event
> to the kernel letting it know it's resetting so that case could be
> handled?
Unfortunately not. It's one of the many things that would be nice to
have...

In general, dealing with a BMC being reset is a real pain. They tend to
do all kinds of different things. The worst is when they sort of act
like they are operational, but then do strange things.

I haven't thought of a good general-purpose way to handle this. I'm
toying with the idea that if the driver gets an error from the BMC, it
just shuts things down for a second or so and then tests the BMC to see
if it's working. During this time it would just return errors, like the
new patches do during reset.
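
To make the idea concrete, here is a rough userspace sketch of that
hold-off logic. It is not the driver code and not what the patches do
internally. Every name in it (bmc_gate, bmc_gate_hold, bmc_gate_check,
probe_from_flag) is made up for illustration, -EAGAIN is only an assumed
error code, and the probe stands in for whatever test command would be
used. The 30 second and 1 second values come from the discussion above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch only; none of these names exist in the IPMI driver. */
enum bmc_state { BMC_OK, BMC_HOLDOFF };

struct bmc_gate {
	enum bmc_state state;
	uint64_t holdoff_until_ms;	/* meaningful only in BMC_HOLDOFF */
};

#define RESET_HOLDOFF_MS 30000	/* default mentioned for a BMC reset */
#define ERROR_HOLDOFF_MS 1000	/* "a second or so" after an error */

/* Stop passing messages to the BMC until now_ms + ms. */
static void bmc_gate_hold(struct bmc_gate *g, uint64_t now_ms, uint64_t ms)
{
	g->state = BMC_HOLDOFF;
	g->holdoff_until_ms = now_ms + ms;
}

/* Returns true if the BMC answered a test command. */
typedef bool (*bmc_probe_fn)(void *ctx);

/*
 * Call before sending a message.  Returns 0 if the message may go out,
 * or -EAGAIN (an assumed error code) while the BMC is held off.  Once
 * the hold-off expires the BMC is probed; a failed probe re-arms the
 * short hold-off instead of letting traffic through.
 */
static int bmc_gate_check(struct bmc_gate *g, uint64_t now_ms,
			  bmc_probe_fn probe, void *ctx)
{
	if (g->state == BMC_OK)
		return 0;
	if (now_ms < g->holdoff_until_ms)
		return -EAGAIN;
	if (probe(ctx)) {
		g->state = BMC_OK;
		return 0;
	}
	bmc_gate_hold(g, now_ms, ERROR_HOLDOFF_MS);
	return -EAGAIN;
}

/* Stand-in probe for experiments: ctx points to a bool "BMC is alive". */
static bool probe_from_flag(void *ctx)
{
	return *(bool *)ctx;
}
```

A caller would use bmc_gate_hold(&g, now, RESET_HOLDOFF_MS) when it sees
a reset command go by and bmc_gate_hold(&g, now, ERROR_HOLDOFF_MS) when a
message fails, then check bmc_gate_check() before each send.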
Thanks for testing these.
-corey
>
> Best,
> Fred