lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Thu, 09 Apr 2020 13:07:05 -0400
From:   "Ewan D. Milne" <emilne@...hat.com>
To:     Joe Perches <joe@...ches.com>, Daniel Wagner <dwagner@...e.de>,
        linux-scsi@...r.kernel.org
Cc:     linux-kernel@...r.kernel.org,
        "James E.J. Bottomley" <jejb@...ux.ibm.com>,
        "Martin K. Petersen" <martin.petersen@...cle.com>
Subject: Re: [PATCH] scsi: core: Rate limit "rejecting I/O" messages

On Wed, 2020-04-08 at 12:49 -0700, Joe Perches wrote:
> 
> Could add a ratelimit_state to struct scsi_device.
> 
> Something like:
> ---
>  drivers/scsi/scsi_scan.c   | 2 ++
>  include/scsi/scsi_device.h | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index f2437a..938c83f 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -279,6 +279,8 @@ static struct scsi_device *scsi_alloc_sdev(struct
> scsi_target *starget,
>  	scsi_change_queue_depth(sdev, sdev->host->cmd_per_lun ?
>  					sdev->host->cmd_per_lun : 1);
>  
> +	ratelimit_state_init(&sdev->rs, DEFAULT_RATELIMIT_INTERVAL,
> +			     DEFAULT_RATELIMIT_BURST);
>  	scsi_sysfs_device_initialize(sdev);
>  
>  	if (shost->hostt->slave_alloc) {
> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
> index c3cba2..2600de7 100644
> --- a/include/scsi/scsi_device.h
> +++ b/include/scsi/scsi_device.h
> @@ -8,6 +8,7 @@
>  #include <linux/blkdev.h>
>  #include <scsi/scsi.h>
>  #include <linux/atomic.h>
> +#include <linux/ratelimit.h>
>  
>  struct device;
>  struct request_queue;
> @@ -233,6 +234,7 @@ struct scsi_device {
>  	struct mutex		state_mutex;
>  	enum scsi_device_state sdev_state;
>  	struct task_struct	*quiesced_by;
> +	struct ratelimit_state	rs;
>  	unsigned long		sdev_data[];
>  } __attribute__((aligned(sizeof(unsigned long))));
>  

We could but in our experience this may not work well enough.  We do
wants to see the message when the device goes offline, so we can look
at logs from SAN failures to see when that happened, but logging more
than one message per device is worthless.  And there can be *LOTS*
of LUNs behind targets that go away.  Hundreds.  Thousands, even.

I keep getting crash dumps with nothing useful in the dmesg buffer.
And we see a lot of serial console lockups.

-Ewan

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ