linux-kernel - Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20171215090801.eulx4pg54p667ya5@pathway.suse.cz>
Date:   Fri, 15 Dec 2017 10:08:01 +0100
From:   Petr Mladek <pmladek@...e.com>
To:     Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
Cc:     Steven Rostedt <rostedt@...dmis.org>, Tejun Heo <tj@...nel.org>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Jan Kara <jack@...e.cz>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Rafael Wysocki <rjw@...ysocki.net>,
        Pavel Machek <pavel@....cz>,
        Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

On Fri 2017-12-15 17:42:36, Sergey Senozhatsky wrote:
> On (12/15/17 09:31), Petr Mladek wrote:
> > > On (12/14/17 22:18), Steven Rostedt wrote:
> > > > > Steven, your approach works ONLY when we have the following preconditions:
> > > > > 
> > > > >  a) there is a CPU that is calling printk() from the 'safe' (non-atomic,
> > > > >     etc) context
> > > > > 
> > > > >         what does guarantee that? what happens if there is NO non-atomic
> > > > >         CPU or that non-atomic simplky missses the console_owner != false
> > > > >         point? we are going to conclude
> > > > > 
> > > > >         "if printk() doesn't work for you, it's because you are holding it wrong"?
> > > > > 
> > > > > 
> > > > >         what if that non-atomic CPU does not call printk(), but instead
> > > > >         it does console_lock()/console_unlock()? why there is no handoff?
> > > > > 
> > > > >         CPU0				CPU1 ~ CPU10
> > > > > 					in atomic contexts [!]. ping-ponging console_sem
> > > > > 					ownership to each other. while what they really
> > > > > 					need to do is to simply up() and let CPU0 to
> > > > > 					handle it.
> > > > > 					printk
> > > > > 	console_lock()
> > > > > 	 schedule()
> > > > > 					...
> > > > > 					printk
> > > > > 					printk
> > > > > 					...
> > > > > 					printk
> > > > > 					printk
> > > > > 
> > > > > 					up()
> > > > > 
> > > > > 	// woken up
> > > > > 	console_unlock()
> > > > > 
> > > > >         why do we make an emphasis on fixing vprintk_printk()?
> > 
> > Is the above scenario really dangerous? console_lock() owner is
> > able to sleep. Therefore there is no risk of a softlockup.
> > 
> > Sure, many messages will get stacked in the meantime and the console
> > owner my get then passed to another owner in atomic context. But
> > do you really see this in the real life?
> 
> console_sem is locked by atomic printk CPU1~CPU10. non-atomic CPU is just
> sleeping waiting for the console_sem. while atomic printk CPUs just hand
> off console_sem ownership to each other without ever up()-ing the console_sem.
> what's the point of hand off here? how is that going to work?
> 
> what we need to do is to offload printing from atomic contexts to a
> non-atomic one - which is CPU0. and that non-atomic CPU is sleeping
> on the console_sem, ready to continue printing. but it never gets its
> chance to do so, because CPU0 ~ CPU10 just passing console_sem ownership
> around, resulting in the same "we print from atomic context" thing.

Yes, I understand the scenario. The question is how much it is
realistic. And if it is realistic, the question is if the Steven's
patch helps to avoid the softlockup or not.

IMHO, what we need is to push the patch into wild and wait for
real life reports.


> > Of course, there is a chance that it will pass the work from
> > a safe context to atomic one. But there was the same chance that
> > the work already started in the atomic context. Therefore statistically
> > this should not make things worse.
> 
> which is not a justification. we are not looking for a solution that
> does not make the things worse. we are looking for a solution that
> does improve the things.

But it does improve things! The question is if it is enough or not
in the real life.

Do you see a scenario where it makes things statistically or
even deterministically worse?

Why did you completely ignored the paragraph about step by step
approach? Is there anything wrong about it?


You are looking for a perfect solution. But there is no perfect
solution. There still will be conflict between the user requirements:
"get messages out" vs. "do not lockup the machine".

The nice thing about Steven's solution is that it slightly improves
one side and does not make worse the other side. Or am I wrong?
Sure, it is possible that it will not be enough. But why not try
this small step first?

Best Regards,
Petr