linux-kernel - Re: [PATCH v2] printk: Avoid softlockups in console

lists.openwall.net
Open Source and information security mailing list archives

Message-ID: <CAFTL4hxGwUXmHs0venWANAkYs1CSwjDR1KEFF+waxqzSwup5TQ@mail.gmail.com>
Date:	Tue, 5 Feb 2013 23:56:09 +0100
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Jan Kara <jack@...e.cz>, LKML <linux-kernel@...r.kernel.org>,
	jslaby@...e.cz, Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [PATCH v2] printk: Avoid softlockups in console_unlock()

2013/2/5 Andrew Morton <akpm@...ux-foundation.org>:
> On Mon,  4 Feb 2013 23:17:10 +0100
> Jan Kara <jack@...e.cz> wrote:
>
>> A CPU can be caught in console_unlock() for a long time (tens of seconds are
>> reported by our customers) when other CPUs are using printk heavily and serial
>> console makes printing slow. Although serial console drivers call
>> touch_nmi_watchdog(), this triggers softlockup warnings because
>> interrupts are disabled for the whole time console_unlock() runs (e.g.
>> vprintk() calls console_unlock() with interrupts disabled). Thus IPIs
>> cannot be processed and other CPUs get stuck spinning in calls like
>> smp_call_function_many(). Also RCU eventually starts reporting lockups.
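For a sense of scale: because interrupts stay disabled for the whole flush, the stall is roughly the backlog divided by the serial line rate. The sketch below is a back-of-the-envelope model, not kernel code. The 8N1 framing, the 128 KiB backlog and the baud rates used in the usage note are assumptions, as is the softlockup threshold of 2 * watchdog_thresh seconds.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative model only: milliseconds one CPU would spend pushing
 * `bytes` of log backlog through a UART at `baud`. Assumes 8N1 framing
 * (10 bits on the wire per byte) and ignores driver overhead.
 */
static unsigned long drain_ms(size_t bytes, unsigned long baud)
{
	assert(baud > 0);
	/* bits = bytes * 10; ms = bits * 1000 / baud (integer, rounded down) */
	return (unsigned long)((unsigned long long)bytes * 10ULL * 1000ULL / baud);
}

/*
 * Assumption: the softlockup detector complains once a CPU has not
 * scheduled for 2 * watchdog_thresh seconds.
 */
static int exceeds_softlockup(unsigned long ms, unsigned int watchdog_thresh_s)
{
	return ms > 2UL * watchdog_thresh_s * 1000UL;
}
```

With a 131072-byte backlog, drain_ms(131072, 115200) gives 11377 ms and drain_ms(131072, 9600) gives 136533 ms. With watchdog_thresh = 10, only the 9600-baud case crosses the assumed 20 s threshold, which is in line with the "tens of seconds" reported above.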
>>
>> In my artificial testing I also managed to trigger a situation where a disk
>> disappeared from the system, apparently because commands to / from it
>> could not be delivered for long enough. This is why just silencing
>> watchdogs isn't a reliable solution to the problem and we simply have to
>> avoid spending too long in console_unlock().
>>
>> We fix the issue by limiting the time we spend in console_unlock() to
>> watchdog_thresh / 4 (unless we are in an early boot stage or an oops is
>> in progress). The rest of the buffer will be printed either by further
>> callers of printk() or by queued work.
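The time budget can be modelled in user space as below. This is a hypothetical sketch, not the patch's code: console_flush_budgeted(), struct log_model and the fixed per-record cost are inventions for illustration. Time is simulated rather than read from a clock so the result is deterministic, and the `unlimited` flag stands in for the early-boot and oops exceptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the log backlog. */
struct log_model {
	size_t pending;   /* records not yet written to the console */
	size_t printed;   /* records written so far */
};

/*
 * Print records until the budget (watchdog_thresh / 4, converted to ms)
 * is used up. Returns true if records remain, i.e. the caller must make
 * sure someone else prints the rest later.
 */
static bool console_flush_budgeted(struct log_model *log,
				   unsigned int ms_per_record,
				   unsigned int watchdog_thresh_s,
				   bool unlimited)
{
	unsigned long budget_ms = watchdog_thresh_s * 1000UL / 4;
	unsigned long spent_ms = 0;

	assert(log != NULL);
	while (log->pending > 0) {
		if (!unlimited && spent_ms >= budget_ms)
			return true;            /* over budget: leave the rest */
		log->pending--;                 /* stands in for a console write */
		log->printed++;
		spent_ms += ms_per_record;      /* simulated cost of that write */
	}
	return false;
}
```

With 1000 pending records at 10 ms each and watchdog_thresh = 10, the budget is 2500 ms, so 250 records are printed and the function returns true, leaving 750 for a later flush. With `unlimited` set, all 1000 are printed in one go.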
>
> I still hate the patch :(
>
>> ...
>>
>> +void console_unlock(void)
>> +{
>> +     if (__console_unlock()) {
>> +             /* Let worker do the rest of printing */
>> +             schedule_work(&printk_work);
>> +     }
>>  }
>
> This creates another place from where we cannot call printk(): anywhere
> where worker_pool.lock is held.
>
> And as schedule_work() can do a wakeup it creates a third reason why
> the sched code cannot call printk (along with rq->lock taken by
> wake_up(klogd) and rq->lock taken by up(&console_sem)).  Hence
> printk_sched().  See the lkml thread "[GIT PULL] printk: Support for
> full dynticks mode".
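The lock dependency described here can be shown with a toy model: if console_unlock() ends by queuing work, a caller that already holds the pool lock (or a runqueue lock) and calls printk() tries to take a lock it already holds. The sketch below is hypothetical; toy_lock, pool_lock, rq_lock and toy_schedule_work() are stand-ins, and the acquire function returns false where a real spinlock would spin forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: re-acquiring it on the same CPU would deadlock. */
struct toy_lock {
	bool held;
};

/* Returns false instead of spinning, so the self-deadlock is observable. */
static bool toy_lock_acquire(struct toy_lock *l)
{
	if (l->held)
		return false;   /* a real spinlock would spin here forever */
	l->held = true;
	return true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Hypothetical stand-ins for the locks named in the thread. */
static struct toy_lock pool_lock;   /* models worker_pool.lock */
static struct toy_lock rq_lock;     /* models rq->lock taken by a wakeup */

/*
 * Model of the proposed console_unlock() tail: queuing work takes the pool
 * lock and may wake a worker, which takes the runqueue lock.
 * Returns false if the real call would have deadlocked.
 */
static bool toy_schedule_work(void)
{
	if (!toy_lock_acquire(&pool_lock))
		return false;
	bool ok = toy_lock_acquire(&rq_lock);   /* wakeup path */
	if (ok)
		toy_lock_release(&rq_lock);
	toy_lock_release(&pool_lock);
	return ok;
}
```

toy_schedule_work() succeeds when no lock is held but returns false if the caller already holds pool_lock or rq_lock, which models printk() being called from inside the workqueue or scheduler code.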

Agreed, I really wish we could avoid that workqueue solution.

>
> We already have machinery for doing async tickling in printk: the
> printk_pending stuff.  Did you consider adding another
> PRINTK_PENDING_foo in some fashion?
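The printk_pending approach sidesteps those locks because printk() only sets a bit and the actual work happens later, from the timer tick. A minimal user-space model with C11 atomics follows. PRINTK_PENDING_OUTPUT, defer_console_output() and printk_tick_consume() are hypothetical names, and a single global word replaces what would be per-CPU state in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>

/* Flag bits in the spirit of printk_pending; the OUTPUT bit is hypothetical. */
#define PRINTK_PENDING_WAKEUP 0x01
#define PRINTK_PENDING_OUTPUT 0x02

/* Single-CPU model of the pending word. */
static atomic_int printk_pending = 0;

/* Called from printk(): sets a bit only; no locks, no wakeups. */
static void defer_console_output(void)
{
	atomic_fetch_or(&printk_pending, PRINTK_PENDING_OUTPUT);
}

/* Called later from a safe context (the tick): fetch and clear all bits. */
static int printk_tick_consume(void)
{
	return atomic_exchange(&printk_pending, 0);
}
```

Setting the bit several times before a tick still results in a single pass by the consumer, and a tick with nothing pending sees 0.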

Yeah, that would delay output until the next timer tick (with a small
exception after my patchset: if the tick is stopped, it triggers through
a self-IPI as soon as irqs are re-enabled), but we can probably improve
that behaviour. And it won't mess with the locking scenarios. The
printk tick (or irq work after my patchset) can also re-trigger itself
for the next tick if the batch to send to the console driver is too
big.
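The self-re-triggering tick can be sketched as a handler that prints at most one batch and leaves the pending flag set while backlog remains. MAX_RECORDS_PER_TICK, the globals and ticks_to_drain() are hypothetical; the thread does not specify a batch size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical batch limit per tick. */
#define MAX_RECORDS_PER_TICK 64

static size_t backlog;        /* records waiting for the console */
static bool output_pending;   /* models a PRINTK_PENDING_* bit */

/*
 * Model of a tick (or irq_work) handler: print at most one batch, and if
 * backlog remains, leave the pending flag set so the next tick continues.
 */
static void printk_tick_model(void)
{
	if (!output_pending)
		return;
	output_pending = false;

	size_t batch = backlog < MAX_RECORDS_PER_TICK ? backlog
						      : MAX_RECORDS_PER_TICK;
	backlog -= batch;               /* stands in for console writes */

	if (backlog > 0)
		output_pending = true;  /* re-trigger on the next tick */
}

/* Run ticks until nothing is pending; return how many ticks it took. */
static unsigned int ticks_to_drain(size_t records)
{
	unsigned int ticks = 0;

	backlog = records;
	output_pending = records > 0;
	while (output_pending) {
		printk_tick_model();
		ticks++;
	}
	assert(backlog == 0);
	return ticks;
}
```

With a batch of 64, a 200-record backlog drains in four ticks (64 + 64 + 64 + 8), and each tick does a bounded amount of console work.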