Message-ID: <120ce7f4cd1fd070e1f7c353223c21b8e4f29337.camel@redhat.com>
Date: Wed, 08 Apr 2020 15:16:27 -0400
From: "Ewan D. Milne" <emilne@...hat.com>
To: Daniel Wagner <dwagner@...e.de>, linux-scsi@...r.kernel.org
Cc: linux-kernel@...r.kernel.org,
"James E.J. Bottomley" <jejb@...ux.ibm.com>,
"Martin K. Petersen" <martin.petersen@...cle.com>
Subject: Re: [PATCH] scsi: core: Rate limit "rejecting I/O" messages
On Wed, 2020-04-08 at 19:10 +0200, Daniel Wagner wrote:
> Prevent excessive logging by rate limiting the "rejecting I/O"
> messages. For example in setups where remote syslog is used the link
> is saturated by those messages when a storage controller/disk
> misbehaves.
>
> Cc: "James E.J. Bottomley" <jejb@...ux.ibm.com>
> Cc: "Martin K. Petersen" <martin.petersen@...cle.com>
> Signed-off-by: Daniel Wagner <dwagner@...e.de>
> ---
> drivers/scsi/scsi_lib.c | 4 ++--
> include/scsi/scsi_device.h | 10 ++++++++++
> 2 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 47835c4b4ee0..01c35c58c6f3 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1217,7 +1217,7 @@ scsi_prep_state_check(struct scsi_device *sdev, struct request *req)
> */
> if (!sdev->offline_already) {
> sdev->offline_already = true;
> - sdev_printk(KERN_ERR, sdev,
> + sdev_printk_ratelimited(KERN_ERR, sdev,
> "rejecting I/O to offline device\n");

I would really prefer we not do it this way if at all possible.
It loses information we may need to debug SAN outage problems.

The reason I didn't use ratelimit is that the ratelimit state is
per call site (one static struct inside the macro), not per device.
So this doesn't work right -- once one device uses up the burst,
messages for other devices get dropped too.

> }
> return BLK_STS_IOERR;
> @@ -1226,7 +1226,7 @@ scsi_prep_state_check(struct scsi_device *sdev, struct request *req)
> * If the device is fully deleted, we refuse to
> * process any commands as well.
> */
> - sdev_printk(KERN_ERR, sdev,
> + sdev_printk_ratelimited(KERN_ERR, sdev,
> "rejecting I/O to dead device\n");

In practice I hardly ever see this message. Do you actually have a case
where this happens? If so, perhaps add another flag similar to
offline_already?

The offline message happens a *lot*: we get a ton of them for each
active device once the queues are unblocked after a target goes away.
-Ewan
> return BLK_STS_IOERR;
> case SDEV_BLOCK:
> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
> index c3cba2aaf934..8be40b0e1b8f 100644
> --- a/include/scsi/scsi_device.h
> +++ b/include/scsi/scsi_device.h
> @@ -257,6 +257,16 @@ sdev_prefix_printk(const char *, const struct scsi_device *, const char *,
> #define sdev_printk(l, sdev, fmt, a...)				\
> 	sdev_prefix_printk(l, sdev, NULL, fmt, ##a)
>
> +#define sdev_printk_ratelimited(l, sdev, fmt, a...)			\
> +({									\
> +	static DEFINE_RATELIMIT_STATE(_rs,				\
> +				      DEFAULT_RATELIMIT_INTERVAL,	\
> +				      DEFAULT_RATELIMIT_BURST);		\
> +									\
> +	if (__ratelimit(&_rs))						\
> +		sdev_prefix_printk(l, sdev, NULL, fmt, ##a);		\
> +})
> +
> __printf(3, 4) void
> scmd_printk(const char *, const struct scsi_cmnd *, const char *,
> ...);
>