[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200913123930.GH15602@minyard.net>
Date: Sun, 13 Sep 2020 07:39:30 -0500
From: Corey Minyard <minyard@....org>
To: Xianting Tian <tian.xianting@....com>
Cc: arnd@...db.de, gregkh@...uxfoundation.org,
openipmi-developer@...ts.sourceforge.net,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] ipmi: retry to get device id when error
On Sun, Sep 13, 2020 at 08:02:03PM +0800, Xianting Tian wrote:
> We can't get bmc's device id with low probability when loading ipmi driver,
> it caused bmc device register failed. This issue may caused by bad lpc
> signal quality. When this issue happened, we got below kernel printks:
> [Wed Sep 9 19:52:03 2020] ipmi_si IPI0001:00: IPMI message handler: device id demangle failed: -22
> [Wed Sep 9 19:52:03 2020] IPMI BT: using default values
> [Wed Sep 9 19:52:03 2020] IPMI BT: req2rsp=5 secs retries=2
> [Wed Sep 9 19:52:03 2020] ipmi_si IPI0001:00: Unable to get the device id: -5
> [Wed Sep 9 19:52:04 2020] ipmi_si IPI0001:00: Unable to register device: error -5
>
> When this issue happened, we want to manually unload the driver and try to
> load it again, but it can't be unloaded by 'rmmod' as it is already 'in use'.
I'm not sure this patch is a good idea; it would cause a long boot delay
in situations where there really isn't a BMC out there. Yes, it
happens.
You don't have to reload the driver to add a device, though. You can
hot-add devices using /sys/modules/ipmi_si/parameters/hotmod. Look in
Documentation/driver-api/ipmi.rst for details.
Does that work for you?
-corey
>
> We add below 'printk' in handle_one_recv_msg(), when this issue happened,
> the msg we received is "Recv: 1c 01 d5", which means the data_len is 1,
> data[0] is 0xd5.
> Debug code:
> static int handle_one_recv_msg(struct ipmi_smi *intf,
> struct ipmi_smi_msg *msg) {
> printk("Recv: %*ph\n", msg->rsp_size, msg->rsp);
> ... ...
> }
> Then in ipmi_demangle_device_id(), it returned '-EINVAL' as 'data_len < 7'
> and 'data[0] != 0'.
>
> We used this patch to retry to get device id when error happen, we
> reproduced this issue again and the retry succeed on the first retry, we
> finally got the correct msg and then all is ok:
> Recv: 1c 01 00 01 81 05 84 02 af db 07 00 01 00 b9 00 10 00
>
> So use retry machanism in this patch to give bmc more opportunity to
> correctly response kernel.
>
> Signed-off-by: Xianting Tian <tian.xianting@....com>
> ---
> drivers/char/ipmi/ipmi_msghandler.c | 17 ++++++++++++++---
> 1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
> index 737c0b6b2..bfb2de77a 100644
> --- a/drivers/char/ipmi/ipmi_msghandler.c
> +++ b/drivers/char/ipmi/ipmi_msghandler.c
> @@ -34,6 +34,7 @@
> #include <linux/uuid.h>
> #include <linux/nospec.h>
> #include <linux/vmalloc.h>
> +#include <linux/delay.h>
>
> #define IPMI_DRIVER_VERSION "39.2"
>
> @@ -60,6 +61,9 @@ enum ipmi_panic_event_op {
> #else
> #define IPMI_PANIC_DEFAULT IPMI_SEND_PANIC_EVENT_NONE
> #endif
> +
> +#define GET_DEVICE_ID_MAX_RETRY 5
> +
> static enum ipmi_panic_event_op ipmi_send_panic_event = IPMI_PANIC_DEFAULT;
>
> static int panic_op_write_handler(const char *val,
> @@ -2426,19 +2430,26 @@ send_get_device_id_cmd(struct ipmi_smi *intf)
> static int __get_device_id(struct ipmi_smi *intf, struct bmc_device *bmc)
> {
> int rv;
> -
> - bmc->dyn_id_set = 2;
> + unsigned int retry_count = 0;
>
> intf->null_user_handler = bmc_device_id_handler;
>
> +retry:
> + bmc->dyn_id_set = 2;
> +
> rv = send_get_device_id_cmd(intf);
> if (rv)
> return rv;
>
> wait_event(intf->waitq, bmc->dyn_id_set != 2);
>
> - if (!bmc->dyn_id_set)
> + if (!bmc->dyn_id_set) {
> + msleep(1000);
> + if (++retry_count <= GET_DEVICE_ID_MAX_RETRY)
> + goto retry;
> +
> rv = -EIO; /* Something went wrong in the fetch. */
> + }
>
> /* dyn_id_set makes the id data available. */
> smp_rmb();
> --
> 2.17.1
>
Powered by blists - more mailing lists