[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200908003412.GD15602@minyard.net>
Date: Mon, 7 Sep 2020 19:34:12 -0500
From: Corey Minyard <minyard@....org>
To: Markus Boehme <markubo@...zon.com>
Cc: openipmi-developer@...ts.sourceforge.net,
Arnd Bergmann <arnd@...db.de>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
linux-kernel@...r.kernel.org, Stefan Nuernberger <snu@...zon.com>,
SeongJae Park <sjpark@...zon.com>, Amit Shah <aams@...zon.com>
Subject: Re: [PATCH 3/3] ipmi: Add timeout waiting for channel information
On Mon, Sep 07, 2020 at 06:25:37PM +0200, Markus Boehme wrote:
> We have observed hosts with misbehaving BMCs that receive a Get Channel
> Info command but don't respond. This leads to an indefinite wait in the
> ipmi_msghandler's __scan_channels function, showing up as hung task
> messages for modprobe.
>
> Add a timeout waiting for the channel scan to complete. If the scan
> fails to complete within that time, treat that like IPMI 1.0 and only
> assume the presence of the primary IPMB channel at channel number 0.
This patch is a significant rewrite of the function. This makes me a
little uncomfortable. It's generally better to fix the bug in a minimal
patch. It would be easier to read if you had two patches, one to
restructure the code and one to fix the bug.
One comment inline, but it doesn't matter, because...
While thinking about this, I realized an issue with these patches.
There should be timers in the lower layers that detect that the BMC does
not respond and should return an error response. This is supposed to be
guaranteed by the lower layer, you shouldn't need timers here. In fact,
if you abort with a timer here, you should get a lower layer reponds
later, causing other issues.
So, this is wrong. If you are never getting a response, there is a bug
in the lower layer. If you are not getting the error response as
quickly as you would like, I'm not sure what to do about that.
The first patch, of course, is an obvious bug fix. I'll include that.
-corey
>
> Signed-off-by: Stefan Nuernberger <snu@...zon.com>
> Signed-off-by: Markus Boehme <markubo@...zon.com>
> ---
> drivers/char/ipmi/ipmi_msghandler.c | 72 ++++++++++++++++++++-----------------
> 1 file changed, 39 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
> index 2a2e8b2..9de9ba6 100644
> --- a/drivers/char/ipmi/ipmi_msghandler.c
> +++ b/drivers/char/ipmi/ipmi_msghandler.c
> @@ -3315,46 +3315,52 @@ channel_handler(struct ipmi_smi *intf, struct ipmi_recv_msg *msg)
> */
> static int __scan_channels(struct ipmi_smi *intf, struct ipmi_device_id *id)
> {
> - int rv;
> + long rv;
> + unsigned int set;
>
> - if (ipmi_version_major(id) > 1
> - || (ipmi_version_major(id) == 1
> - && ipmi_version_minor(id) >= 5)) {
> - unsigned int set;
> + if (ipmi_version_major(id) == 1 && ipmi_version_minor(id) < 5) {
This is incorrect, it will not correctly handle IPMI 0.x BMCs. Yes,
they exist.
> + set = intf->curr_working_cset;
> + goto single_ipmb_channel;
> + }
>
> - /*
> - * Start scanning the channels to see what is
> - * available.
> - */
> - set = !intf->curr_working_cset;
> - intf->curr_working_cset = set;
> - memset(&intf->wchannels[set], 0,
> - sizeof(struct ipmi_channel_set));
> -
> - intf->null_user_handler = channel_handler;
> - intf->curr_channel = 0;
> - rv = send_channel_info_cmd(intf, 0);
> - if (rv) {
> - dev_warn(intf->si_dev,
> - "Error sending channel information for channel 0, %d\n",
> - rv);
> - intf->null_user_handler = NULL;
> - return -EIO;
> - }
> + /*
> + * Start scanning the channels to see what is
> + * available.
> + */
> + set = !intf->curr_working_cset;
> + intf->curr_working_cset = set;
> + memset(&intf->wchannels[set], 0, sizeof(struct ipmi_channel_set));
>
> - /* Wait for the channel info to be read. */
> - wait_event(intf->waitq, intf->channels_ready);
> + intf->null_user_handler = channel_handler;
> + intf->curr_channel = 0;
> + rv = send_channel_info_cmd(intf, 0);
> + if (rv) {
> + dev_warn(intf->si_dev,
> + "Error sending channel information for channel 0, %ld\n",
> + rv);
> intf->null_user_handler = NULL;
> - } else {
> - unsigned int set = intf->curr_working_cset;
> + return -EIO;
> + }
>
> - /* Assume a single IPMB channel at zero. */
> - intf->wchannels[set].c[0].medium = IPMI_CHANNEL_MEDIUM_IPMB;
> - intf->wchannels[set].c[0].protocol = IPMI_CHANNEL_PROTOCOL_IPMB;
> - intf->channel_list = intf->wchannels + set;
> - intf->channels_ready = true;
> + /* Wait for the channel info to be read. */
> + rv = wait_event_timeout(intf->waitq, intf->channels_ready, 5 * HZ);
> + if (rv == 0) {
> + dev_warn(intf->si_dev,
> + "Timed out waiting for channel information. Assuming a single IPMB channel at 0.\n");
> + goto single_ipmb_channel;
> }
>
> + goto out;
> +
> +single_ipmb_channel:
> + /* Assume a single IPMB channel at zero. */
> + intf->wchannels[set].c[0].medium = IPMI_CHANNEL_MEDIUM_IPMB;
> + intf->wchannels[set].c[0].protocol = IPMI_CHANNEL_PROTOCOL_IPMB;
> + intf->channel_list = intf->wchannels + set;
> + intf->channels_ready = true;
> +
> +out:
> + intf->null_user_handler = NULL;
> return 0;
> }
>
> --
> 2.7.4
>
Powered by blists - more mailing lists