linux-kernel - Re: [PATCH] printk: avoid livelock if another CPU printks continuously

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20160209101707.540641fb@gandalf.local.home>
Date:	Tue, 9 Feb 2016 10:17:07 -0500
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Denys Vlasenko <dvlasenk@...hat.com>
Cc:	linux-kernel@...r.kernel.org, srostedt@...hat.com,
	Tejun Heo <tj@...nel.org>,
	Peter Hurley <peter@...leysoftware.com>
Subject: Re: [PATCH] printk: avoid livelock if another CPU printks
 continuously

On Tue, 09 Feb 2016 15:59:48 +0100
Denys Vlasenko <dvlasenk@...hat.com> wrote:

> > First, console_seq needs logbuf_lock protection. On some archs, this may
> > hit 9999 every time as the console_seq is most likely in cache and isn't
> > updating.  
> 
> We end up here only in one case:
> 
>                if (console_seq == log_next_seq)
>                        break;
> 
>                if (--cnt == 0)
>                        break;  /* Someone else printk's like crazy */
> 
> - if this second "break" triggers.
> In this case, "console_seq == log_next_seq" can't be true.

We released all locks, why can't it be true? What prevents another task
on another CPU from coming into this section and updating everything?

> 
> Therefore, if we later see that console_seq become
> equal to log_next_seq (even if momentarily), then other CPU definitely
> entered printk servicing loop. Which means that we are off the hook
> - that other CPU is responsible now. We can bail out.
> 
> I saw this as a cheap (no locking ops, no writes) way to check whether
> we can exit cpu_relax() loop early. It is not reliable, yes.
> The definitive check for the case where no one took the job is later,
> "if (console_trylock())". If that trylock succeeds, we must iterate
> over the buffer once again - there might be unserviced printk messages.
> 
> (Why early exit check it is effective?
> IIRC cpu_relax() is a memory barrier. Compiler is not allowed to cache
> memory variables across it, thus "if (console_seq == log_next_seq)..."
> check always re-reads these variables from memory.

cpu_relax() is a compiler barrier, not a memory barrier. Yes, the
compiler wont cache the variables, but nothing stops the CPU from doing
so. All archs use this code.

> )
> 
> Not that I am particularly attached to this particular form of the check.
> Propose a better method. Do you prefer
> "if (console_seq != saved_console_seq)"? Do you want READ_ONCE() there?

Again, READ_ONCE() is only good for the compiler, not the hardware.

spin_locks() are memory barriers, and will sync everything.

> 
> >> +				/* Good, other CPU entered "for(;;)" loop */
> >> +				goto out;
> >> +			}
> >> +		}
> >> +		/* No one seems to be willing to take it... */
> >> +		if (console_trylock())
> >> +			goto again; /* we took it */  
> > 
> > Perhaps add a few loops to the taking of the console sem.  
> 
> Why?
> 
> If we fail to take the lock, another CPU took it.
> There is no need to try harder, as soon as we know that any
> other CPU took that lock, we can safely exit this function.

Because if this CPU is the one spamming the other CPU, it will widen
the window to be the one that takes the lock.

I still think this is very hacky.

-- Steve