[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aL711299g2yKstat@mail.minyard.net>
Date: Mon, 8 Sep 2025 10:27:19 -0500
From: Corey Minyard <corey@...yard.net>
To: Gilles BULOZ <gilles.buloz@...tron.com>
Cc: Linux Kernel <linux-kernel@...r.kernel.org>,
OpenIPMI Developers <openipmi-developer@...ts.sourceforge.net>
Subject: Re: user->nr_msgs going negative in ipmi_msghandler.c
On Mon, Sep 08, 2025 at 04:54:50PM +0200, Gilles BULOZ wrote:
> Le 05/09/2025 à 20:50, Corey Minyard a écrit :
> > I'm adding the OpenIPMI mailing list and the LKML.
> >
> > On Fri, Sep 05, 2025 at 05:04:28PM +0200, Gilles BULOZ wrote:
> >> Le 05/09/2025 à 15:15, Gilles BULOZ a écrit :
> >>> Le 05/09/2025 à 14:03, Corey Minyard a écrit :
> >>>> On Fri, Sep 05, 2025 at 06:54:23AM -0500, Corey Minyard wrote:
> >>>>> On Fri, Sep 05, 2025 at 10:16:19AM +0200, Gilles BULOZ wrote:
> >>>>>> Hi Corey,
> >>>>>>
> >>>>>> I'm HW/SW developer at Kontron (computer manufacturer) and don't know what to
> >>>>>> think about an issue with user->nr_msgs going negative in ipmi_msghandler.c.
> >>>>>> Not sure if it's a weakness in ipmi_msghandler.c or a bug in the IPMC software
> >>>>>> of our computer board (satellite) with event messages.
> >>>>> I worked with people from Kontron a long time ago. Those were good
> >>>>> times :).
> >>>>>
> >>>>>> I see this when I run ipmitool on this board while some event messages already
> >>>>>> available. In this case deliver_response() is processing the messages and is
> >>>>>> decreasing user->nr_msgs below 0. Then when ipmitool calls
> >>>>>> ioctl(IPMICTL_SEND_COMMAND) it gets an error with errno=EBUSY because in
> >>>>>> i_ipmi_request() user->nr_msgs is incremented but still negative, casted to
> >>>>>> unsigned int so becomes huge, and found greater than max_msgs_per_user (100).
> >>>>> Thanks for the detailed description. The fix for the bug is:
> >>>>>
> >>>>> diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
> >>>>> index e12b531f5c2f..ba33070622b1 100644
> >>>>> --- a/drivers/char/ipmi/ipmi_msghandler.c
> >>>>> +++ b/drivers/char/ipmi/ipmi_msghandler.c
> >>>>> @@ -1634,6 +1634,7 @@ int ipmi_set_gets_events(struct ipmi_user *user, bool val)
> >>>>>
> >>>>> list_for_each_entry_safe(msg, msg2, &msgs, link) {
> >>>>> msg->user = user;
> >>>>> + atomic_add(1, &usr->nr_msgs);
> >>>> Sorry, that should obviously be user->nr_msgs
> >>> Thanks very much !
> >>> I've tried it with kernel 6.11.8 and it's better but still not enough.
> >>> Running "ipmitool sensor" with some debug in ipmi_msghandler.c shows that I always have nr_msgs=1 right after the atomic_add (called
> >>> 9 times), then when in i_ipmi_request() I reach line "rv = -EBUSY;" with nr_msgs=-2 (twice).
> >> My understanding is that ipmitool calls ioctl(IPMICTL_SET_GETS_EVENTS_CMD) calling ipmi_set_gets_events() and thanks to your patch
> >> for each event available the nr_msgs is incremented here and decremented in deliver_response(). So your patch is OK for that.
> >> But after that if other events are coming, deliver_response() is called and nr_msgs gets decremented. So when ipmitool calls
> >> ioctl(IPMICTL_SEND_COMMAND), this calls i_ipmi_request() with nr_msgs < 0 and that returns -EBUSY
> >
> > Yeah, after I sent my email I started looking through the driver for
> > other issues around this, and there were several. That change wasn't
> > well thought through.
> >
> > So, I've added some tests around this in my test suite, and I've
> > reworked this to be a much better implementation that's harder to get
> > wrong.
> >
> > I'm going to send the fix in a separate email.
> >
> > Thanks,
> >
> > -corey
> >
> Thanks Corey
> I confirm your fix (separate email) is working for me. Applied on kernel 6.17-rc5 sources, with few changes under drivers/char/ipmi/
> to build the kernel modules there for kernel 6.11.8, and then using these modules.
Thanks a bunch for testing and reporting. At this point it's destined
for 6.18, marked for backport to 4.19. It's a little late to push this
to Linus now.
> Another simple fix that worked for me on 6.11.8 was to replace atomic_dec() with atomic_dec_if_positive() in deliver_response), in
> addition to your suggested change to add an atomic_add() to ipmi_set_gets_events().
Yeah, but that's just covering over the problem. It's ok for a quick
hack, but it's not really fixing the issue. The refcounts need to be
right.
Thanks again.
-corey
> >>>>> kref_get(&user->refcount);
> >>>>> deliver_local_response(intf, msg);
> >>>>> }
> >>>>>
> >>>>>
> >>>>> Can you try that out?
> >>>>>
> >>>>> I'll forward port this to current kernel (if appropriate, this has all
> >>>>> been rewritten) and required a backport.
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> -corey
> >>>>>
> >>>>>> As workaround I set module parameter max_msgs_per_user to 0xffffffff so that
> >>>>>> user->nr_msgs is never greater than this value.
> >>>>>> I was using kernel 6.11.8 but updated to 6.16.3 and get the same issue.
> >>>>>> I'm also not sure if our board is supposed to receive such event messages as
> >>>>>> it is not Shelf Manager, even if these events come from the local sensors of
> >>>>>> the board.
> >>>>>>
> >>>>>> Best regards
> >>>>>>
> >>>>>> Gilles Buloz
> >>>>>> Kontron Modular Computers France
> >>>>>> R&D SW/HW senior developer
> >>>>>>
> >>>> .
> > .
>
Powered by blists - more mailing lists