lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200409073634.t2xgdhoyb4jmidnp@beryllium.lan>
Date:   Thu, 9 Apr 2020 09:36:34 +0200
From:   Daniel Wagner <dwagner@...e.de>
To:     "Ewan D. Milne" <emilne@...hat.com>
Cc:     linux-scsi@...r.kernel.org, linux-kernel@...r.kernel.org,
        "James E.J. Bottomley" <jejb@...ux.ibm.com>,
        "Martin K. Petersen" <martin.petersen@...cle.com>
Subject: Re: [PATCH] scsi: core: Rate limit "rejecting I/O" messages

Hi Ewan,

On Wed, Apr 08, 2020 at 03:16:27PM -0400, Ewan D. Milne wrote:
> On Wed, 2020-04-08 at 19:10 +0200, Daniel Wagner wrote:
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -1217,7 +1217,7 @@ scsi_prep_state_check(struct scsi_device *sdev,
> > struct request *req)
> >  		 */
> >  		if (!sdev->offline_already) {
> >  			sdev->offline_already = true;
> > -			sdev_printk(KERN_ERR, sdev,
> > +			sdev_printk_ratelimited(KERN_ERR, sdev,
> >  				    "rejecting I/O to offline
> > device\n");
> 
> I would really prefer we not do it this way if at all possible.
> It loses information we may need to debug SAN outage problems.

Understood.

> The reason I didn't use ratelimit is that the ratelimit structure is
> per-instance of the ratelimit call here, not per-device.  So this
> doesn't work right -- it will drop messages for other devices.

I didn't really think this through. Sorry.

> > -		sdev_printk(KERN_ERR, sdev,
> > +		sdev_printk_ratelimited(KERN_ERR, sdev,
> >  			    "rejecting I/O to dead device\n");
> 
> I practice I hardly see this message, do you actually have a case
> where this happens?  If so perhaps add another flag similar to
> offline_already?
> 
> The offline message happens a *lot*, we get a ton of them for each
> active device when the queues are unblocked when a target goes away.

I've missed commit b0962c53bde9 ("scsi: core: avoid repetitive logging
of device offline messages") which should address the report I got in
our enterprise kernel. I was over eager to rate limit the 'dead
device' as well. It seem no need for this patch. Let me backport the
commit and see what our customer has to say.

Thanks for the help!
Daniel

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ