linux-kernel - Re: [RFC][PATCH 0/4] printk: introduce printing kernel thread

Message-ID: <20170324144308.GA12055@pathway.suse.cz>
Date:   Fri, 24 Mar 2017 15:43:08 +0100
From:   Petr Mladek <pmladek@...e.com>
To:     Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        "Rafael J . Wysocki" <rjw@...ysocki.net>,
        linux-kernel@...r.kernel.org,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>
Subject: Re: [RFC][PATCH 0/4] printk: introduce printing kernel thread

On Fri 2017-03-24 10:59:36, Sergey Senozhatsky wrote:
> On (03/23/17 09:51), Peter Zijlstra wrote:
> [..]
> > > > sysrq runs from interrupt context, right? Should be able to do wakeups.
> > > 
> > > what I thought about was -
> > > 	what if there are 'misbehaving' higher prio tasks all the time?
> > > 	the existing sysrq would attempt to do printing from irq context
> > > 	so it doesn't care about run queues.
> > > 
> > > does it make sense to you?
> > 
> > Ah, that's what you meant. Yeah, dunno, I'm still unconvinced about the
> > whole printk thread thing.
> 
> I see your point.
> but I can't think of alternatives that would fix all those lockups and
> stalls and at the same time have better guarantees than printk_kthread.
> 
> 
> > Also those function names are horrifically long.
> 
> right. not happy with the naming either.
> 
> so what I'm thinking about right now is:
> 
> we have that thing which we call "old printk" mode, which is not
> really informative. and my proposal is to rename the "old" mode to
> "printk rescue" mode instead, because we switch to that mode when
> we are trying to "rescue" kernel logs. so the API can be something
> like
> 		printk_rescue_on()
> 		printk_rescue_off()

Sounds good to me. A slight problem is that off() would not actually
stop the mode when the calls are nested.

One more naming attempt, inspired by this:

		printk_emergency_begin()
		printk_emergency_end()

Note that we also enter this mode automatically when a pr_emerg()
message is printed.
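The nesting problem with begin()/end() can be handled with a counter instead of an on/off flag. What follows is a minimal user-space sketch under that assumption; it is not code from the patchset. The counter `printk_emergency_nesting` and the helper `printk_should_flush_directly()` are hypothetical names. `LOGLEVEL_EMERG` is redefined locally as a stand-in for the kernel constant (level 0).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's LOGLEVEL_EMERG (KERN_EMERG, level 0). */
#define LOGLEVEL_EMERG 0

/*
 * Hypothetical nesting counter: emergency mode stays active while it is
 * non-zero, so an inner end() cannot switch off a mode that an outer
 * begin() still needs.
 */
static atomic_int printk_emergency_nesting;

void printk_emergency_begin(void)
{
	atomic_fetch_add(&printk_emergency_nesting, 1);
}

void printk_emergency_end(void)
{
	int prev = atomic_fetch_sub(&printk_emergency_nesting, 1);

	/* An end() without a matching begin() is a caller bug. */
	assert(prev > 0);
	(void)prev;	/* silence the unused warning when NDEBUG is defined */
}

/*
 * Hypothetical helper: true when vprintk_emit() should print directly
 * instead of waking printk_kthread. pr_emerg() messages always qualify.
 */
bool printk_should_flush_directly(int level)
{
	return level == LOGLEVEL_EMERG ||
	       atomic_load(&printk_emergency_nesting) > 0;
}
```

With a plain boolean, the inner end() of two nested sections would turn the mode off while the outer section is still running. The counter only drops to zero after the outermost end().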

But I am fine with any of the generic names mentioned so far.

> 
> --- random thoughts ---
> 
> another thing that bothers me a bit is that we need to place those
> printk_rescue_on/printk_rescue_off switches all over the kernel.
> sort of a root cause [in some of the cases] here is the fact that
> we don't have any feedback from printk_kthread in vprintk_emit():
> 	does printk_kthread make any progress?
> 	do we flush messages to the serial console?
> 	etc.
> 
> and we've got everything we need to have such a feedback in
> vprintk_emit():
> 
> 	a) console is not suspended so console_unlock() can call console drivers
> 	b) printk_kthread != NULL
> 	c) we are not in enforced rescue/emergency mode
> 	d) `log_next_seq' moves forward (always `true', we are in vprintk_emit())
> 	e) `console_seq' stands still
> 
> so we can have an automatic rescue mode fallback in vprintk_emit().
> if (a)-(e) are true then we give up on waking up printk_kthread,
> switch to rescue mode and attempt to console_trylock() directly from
> vprintk_emit(). the part that sucks here is that we need to give
> printk_kthread some time to catch up. for instance, if (e) is true
> for the past 50 invocations of vprintk_emit(), IOW:
> 
> 	- we added 50 lines to printk
> 	- none have been printed on the serial console
>
> then we
> 	- declare rescue
> 	- do console_trylock() instead of wake_up() //unless in deferred vprintk_emit()
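The fallback heuristic described above can be modelled in user space. This is a sketch under several assumptions. The `struct printk_state` fields, `printk_pick_action()` and the action enum are hypothetical. The threshold of 50 is taken from the mail as an example value. The deferred vprintk_emit() case and all locking are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* The "50 invocations" example threshold from the mail. */
#define RESCUE_STALL_THRESHOLD 50

/* Hypothetical, simplified view of what vprintk_emit() can observe. */
struct printk_state {
	bool console_suspended;		/* (a) consoles cannot be called */
	bool kthread_running;		/* (b) printk_kthread != NULL */
	bool forced_rescue;		/* (c) enforced rescue/emergency mode */
	unsigned long long console_seq;	/* console progress marker */
	unsigned long long last_seen_console_seq;
	unsigned int stalled_calls;	/* calls in a row without console progress */
};

enum printk_action { WAKE_KTHREAD, FLUSH_DIRECTLY };

/* Called once per vprintk_emit(), after log_next_seq has advanced (d). */
enum printk_action printk_pick_action(struct printk_state *s)
{
	/* No kthread to offload to, or rescue mode already enforced. */
	if (!s->kthread_running || s->forced_rescue)
		return FLUSH_DIRECTLY;

	/* Suspended consoles cannot make progress; do not count a stall. */
	if (s->console_suspended) {
		s->stalled_calls = 0;
		return WAKE_KTHREAD;
	}

	/* console_seq only moves forward. */
	assert(s->console_seq >= s->last_seen_console_seq);

	/* (e) does not hold: something was printed since the last call. */
	if (s->console_seq > s->last_seen_console_seq) {
		s->last_seen_console_seq = s->console_seq;
		s->stalled_calls = 0;
		return WAKE_KTHREAD;
	}

	/* (a)-(e) all hold: give the kthread some time before taking over. */
	if (++s->stalled_calls >= RESCUE_STALL_THRESHOLD)
		return FLUSH_DIRECTLY;	/* "declare rescue": console_trylock() */

	return WAKE_KTHREAD;
}
```

The counter cannot tell a stuck kthread from one that simply has not been scheduled yet. A burst of 50 messages arriving before printk_kthread first runs trips it just the same.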

I am not sure that we are able to distinguish a flood of messages
from a real emergency situation.

If we start flushing messages directly whenever there is a flood
of messages, we will bring back the original problem with soft
lockups.

Well, there is only a handful of annotated locations at the moment.
I would start thinking about automatic detection once we have
more of them and more data for a good heuristic.

I would still like to see a kernel parameter/sysfs knob
that would allow forcing the rescue/emergency mode all
the time ;-)

Best Regards,
Petr