linux-kernel - Re: [PATCH] scsi: core: Rate limit "rejecting I/O" messages

Message-ID: <120ce7f4cd1fd070e1f7c353223c21b8e4f29337.camel@redhat.com>
Date:   Wed, 08 Apr 2020 15:16:27 -0400
From:   "Ewan D. Milne" <emilne@...hat.com>
To:     Daniel Wagner <dwagner@...e.de>, linux-scsi@...r.kernel.org
Cc:     linux-kernel@...r.kernel.org,
        "James E.J. Bottomley" <jejb@...ux.ibm.com>,
        "Martin K. Petersen" <martin.petersen@...cle.com>
Subject: Re: [PATCH] scsi: core: Rate limit "rejecting I/O" messages

On Wed, 2020-04-08 at 19:10 +0200, Daniel Wagner wrote:
> Prevent excessive logging by rate limiting the "rejecting I/O"
> messages. For example in setups where remote syslog is used the link
> is saturated by those messages when a storage controller/disk
> misbehaves.
> 
> Cc: "James E.J. Bottomley" <jejb@...ux.ibm.com>
> Cc: "Martin K. Petersen" <martin.petersen@...cle.com>
> Signed-off-by: Daniel Wagner <dwagner@...e.de>
> ---
>  drivers/scsi/scsi_lib.c    |  4 ++--
>  include/scsi/scsi_device.h | 10 ++++++++++
>  2 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 47835c4b4ee0..01c35c58c6f3 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1217,7 +1217,7 @@ scsi_prep_state_check(struct scsi_device *sdev, struct request *req)
>  		 */
>  		if (!sdev->offline_already) {
>  			sdev->offline_already = true;
> -			sdev_printk(KERN_ERR, sdev,
> +			sdev_printk_ratelimited(KERN_ERR, sdev,
>  				    "rejecting I/O to offline device\n");

I would really prefer we not do it this way if at all possible.
It loses information we may need to debug SAN outage problems.

The reason I didn't use ratelimit is that the ratelimit state is
per call site (one static structure for this ratelimited printk), not
per-device.  So this doesn't work right -- once one device exhausts
the burst, messages for other devices are dropped as well.
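
To illustrate the concern, here is a minimal userspace sketch, not kernel
code.  struct demo_ratelimit and demo_ratelimit_ok() are simplified
stand-ins for the kernel's ratelimit state and __ratelimit(); the
per-device "rs" field and all demo_* names are hypothetical and exist
only for this example.

```c
#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-in for the kernel's ratelimit state (assumption). */
struct demo_ratelimit {
	long interval;	/* window length, in abstract ticks */
	int burst;	/* messages allowed per window */
	long begin;	/* tick at which the current window started */
	int printed;	/* messages emitted in the current window */
};

/* Return true if one more message may be emitted at time 'now'. */
static bool demo_ratelimit_ok(struct demo_ratelimit *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	return false;
}

struct demo_sdev {
	const char *name;
	struct demo_ratelimit rs;	/* hypothetical per-device state */
};

/* One static state for the call site, shared by every device. */
static bool demo_reject_shared(const struct demo_sdev *sdev, long now)
{
	static struct demo_ratelimit rs = { .interval = 5, .burst = 2 };

	if (!demo_ratelimit_ok(&rs, now))
		return false;
	printf("%s: rejecting I/O to offline device\n", sdev->name);
	return true;
}

/* Each device consumes only its own budget. */
static bool demo_reject_per_device(struct demo_sdev *sdev, long now)
{
	if (!demo_ratelimit_ok(&sdev->rs, now))
		return false;
	printf("%s: rejecting I/O to offline device\n", sdev->name);
	return true;
}
```

With a burst of 2, two calls to demo_reject_shared() for one device use
up the shared budget, and a first call for a second device in the same
window prints nothing.  demo_reject_per_device() still prints for the
second device because its budget is untouched.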

>  		}
>  		return BLK_STS_IOERR;
> @@ -1226,7 +1226,7 @@ scsi_prep_state_check(struct scsi_device *sdev, struct request *req)
>  		 * If the device is fully deleted, we refuse to
>  		 * process any commands as well.
>  		 */
> -		sdev_printk(KERN_ERR, sdev,
> +		sdev_printk_ratelimited(KERN_ERR, sdev,
>  			    "rejecting I/O to dead device\n");

In practice I hardly see this message.  Do you actually have a case
where this happens?  If so, perhaps add another flag similar to
offline_already?

The offline message happens a *lot*: we get a ton of them for each
active device when the queues are unblocked after a target goes away.
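
A rough userspace sketch of the flag approach follows.  It is not the
actual scsi_lib.c code: the demo_* names and the dead_already flag are
hypothetical, and clearing the flags on every state change is an
assumption made for this example.

```c
#include <stdbool.h>
#include <stdio.h>

enum demo_state { DEMO_RUNNING, DEMO_OFFLINE, DEMO_DEL };

struct demo_dev {
	const char *name;
	enum demo_state state;
	bool offline_already;	/* mirrors the flag in the quoted context */
	bool dead_already;	/* hypothetical second flag */
};

/* Assumption: any state transition re-arms both messages. */
static void demo_set_state(struct demo_dev *dev, enum demo_state state)
{
	dev->state = state;
	dev->offline_already = false;
	dev->dead_already = false;
}

/* Return true if a message was logged for this rejected request. */
static bool demo_state_check(struct demo_dev *dev)
{
	switch (dev->state) {
	case DEMO_OFFLINE:
		if (dev->offline_already)
			return false;
		dev->offline_already = true;
		printf("%s: rejecting I/O to offline device\n", dev->name);
		return true;
	case DEMO_DEL:
		if (dev->dead_already)
			return false;
		dev->dead_already = true;
		printf("%s: rejecting I/O to dead device\n", dev->name);
		return true;
	default:
		return false;
	}
}
```

Each device logs once per transition no matter how many requests are
rejected, and no device can suppress another device's message.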

-Ewan

>  		return BLK_STS_IOERR;
>  	case SDEV_BLOCK:
> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
> index c3cba2aaf934..8be40b0e1b8f 100644
> --- a/include/scsi/scsi_device.h
> +++ b/include/scsi/scsi_device.h
> @@ -257,6 +257,16 @@ sdev_prefix_printk(const char *, const struct scsi_device *, const char *,
>  #define sdev_printk(l, sdev, fmt, a...)				\
>  	sdev_prefix_printk(l, sdev, NULL, fmt, ##a)
>  
> +#define sdev_printk_ratelimited(l, sdev, fmt, a...)			\
> +({									\
> +	static DEFINE_RATELIMIT_STATE(_rs,				\
> +				      DEFAULT_RATELIMIT_INTERVAL,	\
> +				      DEFAULT_RATELIMIT_BURST);		\
> +									\
> +	if (__ratelimit(&_rs))						\
> +		sdev_prefix_printk(l, sdev, NULL, fmt, ##a);		\
> +})
> +
>  __printf(3, 4) void
>  scmd_printk(const char *, const struct scsi_cmnd *, const char *, ...);
>