Date:	Thu, 22 Aug 2013 23:57:42 +0200
From:	Jan Kara <jack@...e.cz>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Jan Kara <jack@...e.cz>, LKML <linux-kernel@...r.kernel.org>,
	mhocko@...e.cz, hare@...e.de, Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [PATCH 0/4 v6] Avoid softlockups in console_unlock()

On Thu 22-08-13 12:49:13, Andrew Morton wrote:
> On Thu, 22 Aug 2013 00:59:15 +0200 Jan Kara <jack@...e.cz> wrote:
> 
> > On Wed 21-08-13 14:27:23, Andrew Morton wrote:
> > > On Wed, 21 Aug 2013 10:08:28 +0200 Jan Kara <jack@...e.cz> wrote:
> > > 
> > > > These patches avoid softlockups when a CPU gets caught in console_unlock() for
> > > > a long time during heavy printing from another CPU. As is discussed in patch 3/4,
> > > > it isn't enough to just silence the watchdog, because if a CPU spends too long in
> > > > console_unlock(), RCU will complain as well, other CPUs can be blocked waiting for
> > > > the printing CPU to process an IPI, and even a disk can be offlined because
> > > > commands couldn't be delivered to it for too long.
> > > > 
> > > > This patch series solves the problem by stopping printing in console_unlock()
> > > > after 1000 characters and postponing the rest of the printing to irq work. To
> > > > avoid hogging a single CPU (irq work gets processed on the same CPU where it was
> > > > queued, so by itself it doesn't reduce the printing load on that CPU), we
> > > > introduce a new type of lazy irq work - IRQ_WORK_UNBOUND - which can be
> > > > processed by any CPU.
> > > 
> > > I still hate the patchset :(
> > > 
> > > Remind us why we need this?  Whose kernel is spewing so much logging and
> > > why?
> >   We have customers (quite a few of them, actually) who have machines with
> > lots of SCSI disks attached (due to multipath etc.), and during boot, when
> > these disks are discovered and their partitions set up, quite a lot of
> > printing happens. Multiplied by the number of devices (1000+), it is too
> > much for a serial console to handle quickly enough, so these machines
> > aren't able to boot with the serial console enabled.
> 
> It sounds like rather a corner case, not worth mucking up the critical
> core logging code.
> 
> Desperately seeking alternatives...
> 
> I suppose there's some reason why we can't just make those drivers shut
> up?  If the messages are in the log buffer but aren't displayed,
> they're still accessible after boot?
> 
> Or how about passing those messages over to a kernel thread, to be
> printed out at a lower rate?  A linked list and schedule_work() would
> suffice.
  Andrew, you seem really desperate ;-) I don't really like modifying
individual drivers, the partitioning code, or the SCSI core to be less
verbose - IMHO that's tilting at windmills, and it's not like any of those
parts is excessively verbose. Every part prints its bits and it all
accumulates. I cannot really imagine that approach working long term.
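A rough estimate shows how modest per-device output adds up on a serial console. Assuming 8N1 framing (10 bits on the wire per byte), a 115200-baud console, the 1000 devices mentioned above, and a purely illustrative 500 bytes of boot messages per device (an assumed figure, not one reported in this thread):

$$
t \approx \frac{N_{\text{dev}} \cdot B_{\text{dev}} \cdot 10}{\text{baud}}
  = \frac{1000 \cdot 500 \cdot 10}{115200}\ \text{s}
  \approx 43.4\ \text{s}
$$

That is tens of seconds of output which, without a hand-off, the CPU sitting in console_unlock() may end up printing on behalf of all the others.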

Handing printing over to someone else is exactly what I'm doing when there
is so much traffic that one CPU is forced to print a lot of output on
behalf of other CPUs. The only difference from what you suggest seems to be
that you would like to explicitly mark the printks that can be passed to
someone else. We could technically do that, but I have trouble identifying
which printks to mark - I could find them experimentally for some machine,
but finding them for all machines is difficult. And then there are cases
like 'echo t >/proc/sysrq-trigger', which currently kills a machine with a
serial console and lots of processes (I've tried that, although no customer
has complained about it yet). And marking 'less important' printks wouldn't
simplify the code anyway, since you would still have to handle the offload
for the marked printks.

So although I understand that the uncertainty of giving up printing on the
current CPU and relying on a timer tick on some CPU to pick it up disturbs
you, it still seems to me the most maintainable solution... Or do you have
other concerns?

								Honza
-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR