Message-ID: <1b1be267b80404dc8ca5a14b3e26710c53f50fb4.camel@redhat.com>
Date: Thu, 09 Apr 2020 13:07:05 -0400
From: "Ewan D. Milne" <emilne@...hat.com>
To: Joe Perches <joe@...ches.com>, Daniel Wagner <dwagner@...e.de>,
linux-scsi@...r.kernel.org
Cc: linux-kernel@...r.kernel.org,
"James E.J. Bottomley" <jejb@...ux.ibm.com>,
"Martin K. Petersen" <martin.petersen@...cle.com>
Subject: Re: [PATCH] scsi: core: Rate limit "rejecting I/O" messages
On Wed, 2020-04-08 at 12:49 -0700, Joe Perches wrote:
>
> Could add a ratelimit_state to struct scsi_device.
>
> Something like:
> ---
> drivers/scsi/scsi_scan.c | 2 ++
> include/scsi/scsi_device.h | 2 ++
> 2 files changed, 4 insertions(+)
>
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index f2437a..938c83f 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -279,6 +279,8 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
>  	scsi_change_queue_depth(sdev, sdev->host->cmd_per_lun ?
>  					sdev->host->cmd_per_lun : 1);
>
> +	ratelimit_state_init(&sdev->rs, DEFAULT_RATELIMIT_INTERVAL,
> +			     DEFAULT_RATELIMIT_BURST);
>  	scsi_sysfs_device_initialize(sdev);
>
>  	if (shost->hostt->slave_alloc) {
> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
> index c3cba2..2600de7 100644
> --- a/include/scsi/scsi_device.h
> +++ b/include/scsi/scsi_device.h
> @@ -8,6 +8,7 @@
> #include <linux/blkdev.h>
> #include <scsi/scsi.h>
> #include <linux/atomic.h>
> +#include <linux/ratelimit.h>
>
> struct device;
> struct request_queue;
> @@ -233,6 +234,7 @@ struct scsi_device {
>  	struct mutex state_mutex;
>  	enum scsi_device_state sdev_state;
>  	struct task_struct *quiesced_by;
> +	struct ratelimit_state rs;
>  	unsigned long sdev_data[];
>  } __attribute__((aligned(sizeof(unsigned long))));
>
>
We could, but in our experience this may not work well enough. We do
want to see the message when the device goes offline, so we can look
at logs from SAN failures to see when that happened, but logging more
than one message per device is worthless. And there can be *LOTS*
of LUNs behind targets that go away. Hundreds. Thousands, even.
I keep getting crash dumps with nothing useful in the dmesg buffer.
And we see a lot of serial console lockups.
-Ewan
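
To make the scaling concern above concrete, here is a rough userspace sketch (not kernel code) that counts how many messages reach the log when every device has its own limiter, compared with logging once per device. Everything prefixed `sim_` is hypothetical and exists only for this illustration. The 5 second window and burst of 10 are assumed to mirror the kernel's DEFAULT_RATELIMIT_INTERVAL and DEFAULT_RATELIMIT_BURST, and the model ignores details of the real `___ratelimit()` implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed to mirror DEFAULT_RATELIMIT_INTERVAL (5 s) and
 * DEFAULT_RATELIMIT_BURST (10); treat both as assumptions. */
#define SIM_RATELIMIT_INTERVAL_SEC 5
#define SIM_RATELIMIT_BURST 10

/* Hypothetical userspace stand-in for struct ratelimit_state. */
struct sim_ratelimit {
	long window_start;	/* second at which the current window began */
	int printed;		/* messages let through in this window */
};

void sim_ratelimit_init(struct sim_ratelimit *rs)
{
	rs->window_start = 0;
	rs->printed = 0;
}

/* Return true if a message issued at time 'now' (seconds) may be printed. */
bool sim_ratelimit_ok(struct sim_ratelimit *rs, long now)
{
	if (now - rs->window_start >= SIM_RATELIMIT_INTERVAL_SEC) {
		/* A new window begins: reset the burst budget. */
		rs->window_start = now;
		rs->printed = 0;
	}
	if (rs->printed < SIM_RATELIMIT_BURST) {
		rs->printed++;
		return true;
	}
	return false;
}

/* Messages logged when 'ndev' devices each reject 'rejects_per_dev' I/Os
 * within the same window, with one limiter per device. */
long sim_per_device_messages(size_t ndev, int rejects_per_dev)
{
	long total = 0;

	for (size_t d = 0; d < ndev; d++) {
		struct sim_ratelimit rs;

		sim_ratelimit_init(&rs);
		for (int i = 0; i < rejects_per_dev; i++)
			if (sim_ratelimit_ok(&rs, 0))
				total++;
	}
	return total;
}

/* Same workload, but only the first rejection on each device is logged. */
long sim_once_per_device_messages(size_t ndev, int rejects_per_dev)
{
	long total = 0;

	for (size_t d = 0; d < ndev; d++) {
		bool logged = false;

		for (int i = 0; i < rejects_per_dev; i++) {
			if (!logged) {
				logged = true;
				total++;
			}
		}
	}
	return total;
}
```

Under these assumptions, 1000 LUNs that each reject 100 I/Os in one window would still produce 10 messages per LUN with a per-device limiter, while a once-per-device policy would produce one per LUN. The per-device burst is multiplied by the number of devices, which is the scaling problem described in the reply.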