Message-ID: <20130206142346.GF6330@quack.suse.cz>
Date:	Wed, 6 Feb 2013 15:23:46 +0100
From:	Jan Kara <jack@...e.cz>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Jan Kara <jack@...e.cz>, LKML <linux-kernel@...r.kernel.org>,
	jslaby@...e.cz, Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [PATCH v2] printk: Avoid softlockups in console_unlock()

On Tue 05-02-13 12:38:38, Andrew Morton wrote:
> On Mon,  4 Feb 2013 23:17:10 +0100
> Jan Kara <jack@...e.cz> wrote:
> 
> > A CPU can be caught in console_unlock() for a long time (tens of seconds are
> > reported by our customers) when other CPUs are using printk heavily and serial
> > console makes printing slow. Although serial console drivers call
> > touch_nmi_watchdog(), this triggers softlockup warnings because
> > interrupts are disabled for the whole time console_unlock() runs (e.g.
> > vprintk() calls console_unlock() with interrupts disabled). Thus IPIs
> > cannot be processed and other CPUs get stuck spinning in calls like
> > smp_call_function_many(). Also RCU eventually starts reporting lockups.
> > 
> > In my artificial testing I also managed to trigger a situation where a disk
> > disappeared from the system, apparently because commands to / from it
> > could not be delivered for long enough. This is why just silencing
> > watchdogs isn't a reliable solution to the problem and we simply have to
> > avoid spending too long in console_unlock().
> > 
> > We fix the issue by limiting the time we spend in console_unlock() to
> > watchdog_thresh() / 4 (unless we are in an early boot stage or oops is
> > happening). The rest of the buffer will be printed either by further
> > callers to printk() or by a queued work.
> 
> I still hate the patch :(
> 
> > ...
> >
> > +void console_unlock(void)
> > +{
> > +	if (__console_unlock()) {
> > +		/* Let worker do the rest of printing */
> > +		schedule_work(&printk_work);
> > +	}
> >  }
> 
> This creates another place from where we cannot call printk(): anywhere
> where worker_pool.lock is held.
> 
> And as schedule_work() can do a wakeup it creates a third reason why
> the sched code cannot call printk (along with rq->lock taken by
> wake_up(klogd) and rq->lock taken by up(&console_sem)).  Hence
> printk_sched().  See the lkml thread "[GIT PULL] printk: Support for
> full dynticks mode".
> 
> We already have machinery for doing async tickling in printk: the
> printk_pending stuff.  Did you consider adding another
> PRINTK_PENDING_foo in some fashion?
  Yes, I noticed that thread just yesterday and also thought that using a
similar trick might be viable. I'll experiment to see whether we can use the
same method to handle the lockup problems I hit. Steven seems to have already
tweaked the PRINTK_PENDING stuff to make it easier to use...
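
  To make the idea concrete, here is a rough user-space sketch of what a
pending-flag approach could look like. It is not the kernel code: the flag
names, pending_flags, backlog and the helper functions are all made up for
illustration, and the "budget" is a record count standing in for the time
limit from the patch. The point is only that the caller sets a bit instead
of calling schedule_work(), and a later safe context picks the bit up and
continues printing.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits; the names are illustrative, not the kernel's. */
#define PENDING_WAKEUP 0x01u	/* wake up log readers */
#define PENDING_FLUSH  0x02u	/* finish printing the console backlog */

/* Stand-in for a per-CPU pending word (assumption: a single CPU here). */
static atomic_uint pending_flags;

/* Number of records still waiting to reach the console. */
static size_t backlog;

/* Print at most 'budget' records; return true if a backlog remains. */
static bool flush_with_budget(size_t budget)
{
	size_t n = backlog < budget ? backlog : budget;

	backlog -= n;
	return backlog > 0;
}

/* Caller side: no work item and no wakeup, only an atomic bit set. */
static void unlock_console(size_t budget)
{
	if (flush_with_budget(budget))
		atomic_fetch_or(&pending_flags, PENDING_FLUSH);
}

/*
 * Deferred side: meant to run later from a context where printing is
 * safe (for example a periodic tick). Returns the flags it consumed.
 */
static unsigned int pending_tick(size_t budget)
{
	unsigned int flags = atomic_exchange(&pending_flags, 0);

	if (flags & PENDING_FLUSH)
		unlock_console(budget);	/* may set the flag again */
	return flags;
}
```

  With a backlog of 10 records and a budget of 4, unlock_console() prints 4
and sets the flush bit; each later pending_tick() prints up to 4 more until
the backlog is empty and the bit stays clear.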

								Honza
-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR