linux-kernel - Re: [PATCH] scsi: core: Rate limit "rejecting I/O" messages

Message-ID: <1b1be267b80404dc8ca5a14b3e26710c53f50fb4.camel@redhat.com>
Date:   Thu, 09 Apr 2020 13:07:05 -0400
From:   "Ewan D. Milne" <emilne@...hat.com>
To:     Joe Perches <joe@...ches.com>, Daniel Wagner <dwagner@...e.de>,
        linux-scsi@...r.kernel.org
Cc:     linux-kernel@...r.kernel.org,
        "James E.J. Bottomley" <jejb@...ux.ibm.com>,
        "Martin K. Petersen" <martin.petersen@...cle.com>
Subject: Re: [PATCH] scsi: core: Rate limit "rejecting I/O" messages

On Wed, 2020-04-08 at 12:49 -0700, Joe Perches wrote:
> 
> Could add a ratelimit_state to struct scsi_device.
> 
> Something like:
> ---
>  drivers/scsi/scsi_scan.c   | 2 ++
>  include/scsi/scsi_device.h | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index f2437a..938c83f 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -279,6 +279,8 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
>  	scsi_change_queue_depth(sdev, sdev->host->cmd_per_lun ?
>  					sdev->host->cmd_per_lun : 1);
>  
> +	ratelimit_state_init(&sdev->rs, DEFAULT_RATELIMIT_INTERVAL,
> +			     DEFAULT_RATELIMIT_BURST);
>  	scsi_sysfs_device_initialize(sdev);
>  
>  	if (shost->hostt->slave_alloc) {
> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
> index c3cba2..2600de7 100644
> --- a/include/scsi/scsi_device.h
> +++ b/include/scsi/scsi_device.h
> @@ -8,6 +8,7 @@
>  #include <linux/blkdev.h>
>  #include <scsi/scsi.h>
>  #include <linux/atomic.h>
> +#include <linux/ratelimit.h>
>  
>  struct device;
>  struct request_queue;
> @@ -233,6 +234,7 @@ struct scsi_device {
>  	struct mutex		state_mutex;
>  	enum scsi_device_state sdev_state;
>  	struct task_struct	*quiesced_by;
> +	struct ratelimit_state	rs;
>  	unsigned long		sdev_data[];
>  } __attribute__((aligned(sizeof(unsigned long))));
>  

We could, but in our experience this may not work well enough.  We do
want to see the message when the device goes offline, so that we can
look at logs from SAN failures to see when that happened, but logging
more than one message per device is worthless.  And there can be *LOTS*
of LUNs behind targets that go away: hundreds, even thousands.

I keep getting crash dumps with nothing useful in the dmesg buffer.
And we see a lot of serial console lockups.

-Ewan