linux-kernel - Re: [RFC][PATCH -next 1/2] printk: move can_use_console out of console_trylock_for

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160119150040.GA558@swordfish>
Date:	Wed, 20 Jan 2016 00:00:40 +0900
From:	Sergey Senozhatsky <sergey.senozhatsky@...il.com>
To:	Petr Mladek <pmladek@...e.com>
Cc:	Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Tejun Heo <tj@...nel.org>, Jan Kara <jack@...e.com>,
	Kyle McMartin <kyle@...nel.org>,
	Dave Jones <davej@...emonkey.org.uk>,
	Calvin Owens <calvinowens@...com>,
	linux-kernel@...r.kernel.org,
	Sergey Senozhatsky <sergey.senozhatsky@...il.com>
Subject: Re: [RFC][PATCH -next 1/2] printk: move can_use_console out of
 console_trylock_for_printk

Hello,

On (01/19/16 14:31), Petr Mladek wrote:
> > On (01/18/16 16:42), Petr Mladek wrote:
> > > On Thu 2016-01-14 13:57:22, Sergey Senozhatsky wrote:
> > > > vprintk_emit() disables preemption around console_trylock_for_printk()
> > > > and console_unlock() calls for a strong reason -- can_use_console()
> > > > check. The thing is that vprintl_emit() can be called on a CPU that
> > > > is not fully brought up yet (!cpu_online()), which potentially can
> > > > cause problems if console driver accesses per-cpu data. A console
> > > > driver can explicitly state that it's safe to call it from !online
> > > > cpu by setting CON_ANYTIME bit in console ->flags. That's why for
> > > > !cpu_online() can_use_console() iterates all the console to find out
> > > > if there is a CON_ANYTIME console, otherwise console_unlock() must be
> > > > avoided.
> > > > 
> > > > call_console_drivers(), called from console_cont_flush() and
> > > > console_unlock(), does the same test during for_each_console() loop.
> > > > However, we can have the following corner case. Assume that we have 2
> > > > cpus -- CPU0 is online, CPU1 is !online; and no CON_ANYTIME consoles
> > > > available.
> > > > 
> > > > CPU0 online                        CPU1 !online
> > > >                                  console_trylock()
> > > >                                  ...
> > > >                                  console_unlock()
> > > 
> > > Please, where this console_unlock() comes from?
> > 
> > from UP* or DOWN* (_PREPARE f.e.) notifiers on this CPU, for example, we don't
> > know what's going on there. what prevents it from calling console_trylock(),
> > grabbing the console_sem and eventually doing console_unlock()? there is
> > a can_use_console() check, but it handles only one case -- printk().
> 
> So, is it a theoretical problem or do you know about any
> particular path where this happens?

a theoretical one at this point.

> Well, it might make sense to get rid of console_trylock_for_printk()
> and do the can_use_console() check at the beginning of
> unlock_console(). I mean to do
> 
> -	if (console_trylock_for_printk())
> +	if (console_trylock())
> 		unlock_console();

agree, I almost removed it, but decided to keep for a while,
just in case if we would want to add any additional code there.


> But do we really need to repeat the check in every cycle?

well, on every iteration in the best case we check cpu_online()
only. which is what we would have done anyway in vprintk_emit(),
so no additional checks added. at the same time call_console_drivers
does not do '!cpu_online && !CON_ANYTIME' for each console now, so
in some sense there are less checks now (this is far even from a
micro-optimization, just noted).

> can_use_console() checks available consoles and if the CPU
> is online. Consoles could not get added or removed when
> we own console_sem. It seems that CPUs get disabled
> in a process context. Therefore it seems that it might happen
> only when unlock_console() gets rescheduled. I guess that
> it could not get scheduled back on an offline CPU. So, it
> seems that it is enough to check can_use_console() only once at
> the beginning.
> 
> Or did I miss anything?

It does sound interesting, thanks. I was trying to keep the existing
behaviour.

It almost works, there is one case.

console_unlock()    /* w/o can_use_console() in logbuf_lock section */
....
again:
	for (;;) {
		raw_spin_lock_irqsave logbuf_lock
		msg_print_text
		raw_spin_unlock logbuf_lock
		call_console_drivers
		local_irq_restore
	}

	up_console_sem

	raw_spin_lock logbuf_lock
	retry = console_seq != log_next_seq
	raw_spin_unlock_irqrestore logbuf_lock
	
	if retry && console_trylock()
		goto again;


when we up_console_sem(), consoles may appear and disappear, since we
don't keep the semaphore. Suppose that we have OFFLINE cpu and we had a
CON_ANYTIME console, but in between up_console_sem--console_trylock
that single CON_ANYTIME console was removed. So now we have !cpu_online
and !CON_ANYTIME.

So this is why I re-do the can_use_console() check. I can add a bool flag
and do the can_use_console() only once -- when the flag is false, set it
to true on successful can_use_console() and avoid can_use_console() during
this loop; and set the flag to false right after the 'again:' label, which
will imply that console semaphore has been re-acquired and we need to
can_use_console() again.

something like this, perhaps (to be folded.. or should it be a separate
commit -- optimization). will give a test tomorrow.

I reused `bool retry' flag (which is probably a bit hacky, it'll be
better to have a separate byte for this thing). And I moved
can_use_console() from console_cont_flush() to the top -- so if we
are executing the `for (;;)' loop, then can_use_console() was
successful in console_cont_flush() and we have to re-do
can_use_console() only if we released console_sem and jumped to
`again' label.

---
 kernel/printk/printk.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 28b2dec..c7a5ce8 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2170,6 +2170,11 @@ static bool console_cont_flush(char *text, size_t size)
 
 	raw_spin_lock_irqsave(&logbuf_lock, flags);
 
+	if (!can_use_console()) {
+		ret = false;
+		goto out;
+	}
+
 	if (!cont.len)
 		goto out;
 
@@ -2181,11 +2186,6 @@ static bool console_cont_flush(char *text, size_t size)
 	if (console_seq < log_next_seq && !cont.cons)
 		goto out;
 
-	if (!can_use_console()) {
-		ret = false;
-		goto out;
-	}
-
 	len = cont_print_text(text, size);
 	raw_spin_unlock(&logbuf_lock);
 	stop_critical_timings();
@@ -2219,7 +2219,7 @@ void console_unlock(void)
 	static u64 seen_seq;
 	unsigned long flags;
 	bool wake_klogd = false;
-	bool do_cond_resched, retry, aborted = false;
+	bool do_cond_resched, retry = false, aborted = false;
 
 	if (console_suspended) {
 		up_console_sem();
@@ -2255,9 +2255,20 @@ again:
 
 		raw_spin_lock_irqsave(&logbuf_lock, flags);
 
-		if (!can_use_console()) {
-			aborted = true;
-			break;
+		/*
+		 * @retry == true implies that we have released console_sem
+		 * and, thus, someone could have added/removed console(-s).
+		 * we need to can_use_console() again. @retry intially set
+		 * to false, because console_cont_flush() does the
+		 * can_use_console() check and we can't execute this loop
+		 * in can_use_console() returned `false` there.
+		 */
+		if (retry) {
+			retry = false;
+			if (!can_use_console()) {
+				aborted = true;
+				break;
+			}
 		}
 
 		if (seen_seq != log_next_seq) {
-- 
2.7.0