linux-kernel - scsi logging future directions, was Re: [RFC PATCH -logging 00/10] scsi/constants: Output continuous error messages on trace

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20140824204454.GA11423@lst.de>
Date:	Sun, 24 Aug 2014 22:44:54 +0200
From:	Christoph Hellwig <hch@....de>
To:	"Elliott, Robert (Server Storage)" <Elliott@...com>
Cc:	Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@...achi.com>,
	Hannes Reinecke <hare@...e.de>,
	"Martin K. Petersen" <martin.petersen@...cle.com>,
	"linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>,
	"yrl.pp-manager.tt@...achi.com" <yrl.pp-manager.tt@...achi.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"James E.J. Bottomley" <JBottomley@...allels.com>,
	Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
	Doug Gilbert <dgilbert@...erlog.com>,
	Hidehiro Kawai <hidehiro.kawai.ez@...achi.com>,
	Christoph Hellwig <hch@....de>
Subject: scsi logging future directions, was Re: [RFC PATCH -logging 00/10]
	scsi/constants: Output continuous error messages on trace

On Fri, Aug 22, 2014 at 12:39:59AM +0000, Elliott, Robert (Server Storage) wrote:
> If you trigger hundreds of errors (e.g., hot remove a device
> during heavy IO), then all the prints to the linux serial console
> bog down the system, causing timeouts in commands to other
> devices and soft lockups for applications.
> 
> Some changes that would help are:
> 1. Put them under SCSI logging level control
> 2. Use printk_ratelimited so an excessive number are trimmed
> 
> Would you like to include something like this in your
> patch set?

I think we should come to an agreement where we want to go with scsi
logging first before doing various smaller adjustments.  (Although your
example is one that's urgent enough that I'd like to put it in ASAP,
I had issues with it a few times).

I had a chat with Martin at Linuxcon about these issues, and we were
both in favor of getting rid of the old scsi logging mechansisms and
instead replace it by an extended version of the scsi tracepoints that
cover all places, and dump all data from the old logging mechanism
that people find useful.

In a few places we'd still want to log normal dev_printk style errors,
and the I/O completion is one of them, even if they really need to be
ratelimited and condensed.

If someone has arguments in favour of keeping the old logging code
I'd love to hear them, but in practive the traceevent code has huge
benefits:

 - almost zero overhead if disabled
 - can easily be used without any tools through configs, but can be used
   even better with tools like trace-cmd or perf
 - allows both fine and coarse grained selections of events to trace
 - allows to capture statistics on each trace point without event enabling the
   output
 - doesn't have any of the console lockup problems.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/