lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20171215050607.GC11199@jagdpanzerIV>
Date:   Fri, 15 Dec 2017 14:06:07 +0900
From:   Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        Tejun Heo <tj@...nel.org>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Petr Mladek <pmladek@...e.com>, Jan Kara <jack@...e.cz>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Rafael Wysocki <rjw@...ysocki.net>,
        Pavel Machek <pavel@....cz>,
        Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

Hello,

On (12/14/17 22:18), Steven Rostedt wrote:
> > Steven, your approach works ONLY when we have the following preconditions:
> > 
> >  a) there is a CPU that is calling printk() from the 'safe' (non-atomic,
> >     etc) context
> > 
> >         what does guarantee that? what happens if there is NO non-atomic
> >         CPU or that non-atomic simplky missses the console_owner != false
> >         point? we are going to conclude
> > 
> >         "if printk() doesn't work for you, it's because you are holding it wrong"?
> > 
> > 
> >         what if that non-atomic CPU does not call printk(), but instead
> >         it does console_lock()/console_unlock()? why there is no handoff?
> > 
> >         CPU0				CPU1 ~ CPU10
> > 					in atomic contexts [!]. ping-ponging console_sem
> > 					ownership to each other. while what they really
> > 					need to do is to simply up() and let CPU0 to
> > 					handle it.
> > 					printk
> > 	console_lock()
> > 	 schedule()
> > 					...
> > 					printk
> > 					printk
> > 					...
> > 					printk
> > 					printk
> > 
> > 					up()
> > 
> > 	// woken up
> > 	console_unlock()
> > 
> >         why do we make an emphasis on fixing vprintk_printk()?
> 
> Where do we do the above? And has this been proven to be an issue?

um... hundreds of cases.

deep-stack spin_lock_irqsave() lockup reports from multiple CPUs (3 cpus)
happening at the same moment + NMI backtraces from all the CPUs (more
than 3 cpus) that follows the lockups, over not-so-fast serial console.
exactly the bug report I received two days ago. so which one of the CPUs
here is a good candidate to successfully emit all of the pending logbuf
entries? none. all of them either have local IRQs disabled, or dump_stack()
from either backtrace IPI or backtrace NMI (depending on the configuration).


do we periodically do console_lock() on a running system? yes, we do.
add to console_unlock()

---

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b9006617710f..1c811f6d94bf 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2143,6 +2143,10 @@ void console_unlock(void)
        bool wake_klogd = false;
        bool do_cond_resched, retry;
 
+       if (!(current->flags & PF_KTHREAD))
+               dump_stack();
+
+
        if (console_suspended) {
                up_console_sem();
                return;

---

and just boot the system.


I work for a company that has several thousand engineers spread
across the globe. and people do use printk(), and issues do happen.

the scenarios that Tejun and I talk about are not theoretical. if those
scenarios are completely theoretical, as you suggest, - then, OK, what
exactly guarantees that

	whenever atomic CPUs printk there is always a non-atomic
	CPU to take over the printing?



> >  b) non-atomic CPU sees console_owner set (which is set for a very short
> >     period of time)
> > 
> >         again. what if that non-atomic CPU does not see console_owner?
> >         "don't use printk()"?
> 
> May I ask, why are we doing the printk in the first place?

this argument is really may be applied against your patch as well. I
really don't want us to have this type of "technical" discussion.

printk() is a tool for developers. but developers can't use.


> >  c) the task that is looping in console_unlock() sees non-atomic CPU when
> >     console_owner is set.
> 
> I haven't looked at the latest code, but my last patch didn't care
> about "atomic" and "non-atomic"

I know. and I think it is sort of a problem.

lots of printk-s are happening from IRQs / softirqs and so on.
take a look at CONFIG_IP_ROUTE_VERBOSE, for example.

do_softirq() -> ip_handle_martian_source() and a bunch of other
places. 
these irq->printk-s can "steal" the console_sem and go to
console_unlock().

"don't use printk() then" type of argument does not really help
to a guy who reports the lockup.


> > Steven, I thought we reached the agreement [**] that the solution we should
> > be working on is a combination of prinkt_kthread and console_sem hand
> > off. Simply because it adds the missing "there is a non-atomic CPU wishing
> > to console_unlock()" thing.
> > 
> > 	lkml.kernel.org/r/20171108162813.GA983427@...big577.frc2.facebook.com
> > 
> > 	https://marc.info/?l=linux-kernel&m=151011840830776&w=2
> > 	https://marc.info/?l=linux-kernel&m=151015141407368&w=2
> > 	https://marc.info/?l=linux-kernel&m=151018900919386&w=2
> > 	https://marc.info/?l=linux-kernel&m=151019815721161&w=2
> > 	https://marc.info/?l=linux-kernel&m=151020275921953&w=2
> > **	https://marc.info/?l=linux-kernel&m=151020404622181&w=2
> > **	https://marc.info/?l=linux-kernel&m=151020565222469&w=2
> 
> I'm still fine with the hybrid approach, but I want to see a problem
> first before we fix it.
> 
> > 
> > 
> > what am I missing?
> 
> The reproducer.

will that printk_test module

  lkml.kernel.org/r/20171204135314.9122-2-sergey.senozhatsky@...il.com

suffice?

	-ss

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ