linux-kernel - Re: [PATCH] printk: avoid livelock if another CPU printks continuously

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20160208161123.2ceb601d@gandalf.local.home>
Date:	Mon, 8 Feb 2016 16:11:23 -0500
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Denys Vlasenko <dvlasenk@...hat.com>
Cc:	linux-kernel@...r.kernel.org, srostedt@...hat.com,
	Tejun Heo <tj@...nel.org>,
	Peter Hurley <peter@...leysoftware.com>
Subject: Re: [PATCH] printk: avoid livelock if another CPU printks
 continuously

On Mon,  8 Feb 2016 21:35:03 +0100
Denys Vlasenko <dvlasenk@...hat.com> wrote:


> This patch is reported to make affected user's machine survive.

Would be nice to have a test case for this. Make a test module to
reproduce the issue?

> 
> Signed-off-by: Denys Vlasenko <dvlasenk@...hat.com>
> CC: linux-kernel@...r.kernel.org
> CC: srostedt@...hat.com
> CC: Steven Rostedt <rostedt@...dmis.org>
> CC: Tejun Heo <tj@...nel.org>
> CC: Peter Hurley <peter@...leysoftware.com>
> ---
>  kernel/printk/printk.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index c963ba5..ca4f9d55 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2235,6 +2235,7 @@ void console_unlock(void)
>  	unsigned long flags;
>  	bool wake_klogd = false;
>  	bool do_cond_resched, retry;
> +	unsigned cnt;
>  
>  	if (console_suspended) {
>  		up_console_sem();
> @@ -2257,6 +2258,7 @@ void console_unlock(void)
>  	/* flush buffered message fragment immediately to console */
>  	console_cont_flush(text, sizeof(text));
>  again:
> +	cnt = 5;
>  	for (;;) {
>  		struct printk_log *msg;
>  		size_t ext_len = 0;
> @@ -2284,6 +2286,9 @@ skip:
>  		if (console_seq == log_next_seq)
>  			break;
>  
> +		if (--cnt == 0)
> +			break;	/* Someone else printk's like crazy */
> +
>  		msg = log_from_idx(console_idx);
>  		if (msg->flags & LOG_NOCONS) {
>  			/*
> @@ -2350,6 +2355,26 @@ skip:
>  	if (retry && console_trylock())
>  		goto again;
>  
> +	if (cnt == 0) {
> +		/*
> +		 * Other CPU(s) printk like crazy, filling log_buf[].
> +		 * Try to get rid of the "honor" of servicing their data:
> +		 * give _them_ time to grab console_sem and start working.
> +		 */
> +		cnt = 9999;

I'll ignore that this looks very hacky.

> +		while (--cnt != 0) {
> +			cpu_relax();
> +			if (console_seq == log_next_seq) {

First, console_seq needs logbuf_lock protection. On some archs, this may
hit 9999 every time as the console_seq is most likely in cache and isn't
updating. Not to mention the race of another task moving log_next_seq
too and this could have been on another CPU changing both console_seq
and log_next_seq.

Perhaps just save off console_seq and see if it changes at all.


> +				/* Good, other CPU entered "for(;;)" loop */
> +				goto out;
> +			}
> +		}
> +		/* No one seems to be willing to take it... */
> +		if (console_trylock())
> +			goto again; /* we took it */

Perhaps add a few loops to the taking of the console sem. But again,
this just sounds like playing with heuristics, and I hate heuristics.

There's gotta be a better solution.

-- Steve

> +		/* Nope, someone else holds console_sem! Good */
> +	}
> +out:
>  	if (wake_klogd)
>  		wake_up_klogd();
>  }