Message-ID: <20170324144308.GA12055@pathway.suse.cz>
Date: Fri, 24 Mar 2017 15:43:08 +0100
From: Petr Mladek <pmladek@...e.com>
To: Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Steven Rostedt <rostedt@...dmis.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
"Rafael J . Wysocki" <rjw@...ysocki.net>,
linux-kernel@...r.kernel.org,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>
Subject: Re: [RFC][PATCH 0/4] printk: introduce printing kernel thread

On Fri 2017-03-24 10:59:36, Sergey Senozhatsky wrote:
> On (03/23/17 09:51), Peter Zijlstra wrote:
> [..]
> > > > sysrq runs from interrupt context, right? Should be able to do wakeups.
> > >
> > > what I thought about was -
> > > what if there are 'misbehaving' higher prio tasks all the time?
> > > the existing sysrq would attempt to do printing from irq context
> > > so it doesn't care about run queues.
> > >
> > > does it make sense to you?
> >
> > Ah, that's what you meant. Yeah, dunno, I'm still unconvinced about the
> > whole printk thread thing.
>
> I see your point.
> but I can't think of alternatives that would fix all those lockups and
> stalls and at the same time have better guarantees than printk_kthread.
>
>
> > Also those function names are horrifically long.
>
> right. not happy with the naming either.
>
> so what I'm thinking about right now is:
>
> we have that thing which we call "old printk" mode, which is not
> really informative. and my proposal is to rename "old" mode and use
> "printk rescue" mode instead. because we switch to that mode when
> we are trying to "rescue" kernel logs. so the API can be something
> like
> printk_rescue_on()
> printk_rescue_off()

Sounds good to me. A slight problem is that off() does not actually
stop the mode when the calls are nested.

Just one more attempt inspired by this:

	printk_emergency_begin()
	printk_emergency_end()

Note that we also enter this mode automatically on
a pr_emerg() message.

But I am fine with any of the generic names mentioned.
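
To illustrate the nesting point, here is a minimal userspace sketch,
assuming the mode is tracked by a simple depth counter. Only the two
function names come from the proposal above; the counter and
printk_in_emergency() are hypothetical and this is not the code from
the patch set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: nesting depth of open emergency sections. */
static atomic_int printk_emergency_depth;

/* Enter emergency mode; calls may nest. */
static void printk_emergency_begin(void)
{
	atomic_fetch_add(&printk_emergency_depth, 1);
}

/* Leave one nesting level; the mode ends only at the outermost end(). */
static void printk_emergency_end(void)
{
	int prev = atomic_fetch_sub(&printk_emergency_depth, 1);

	/* An end() without a matching begin() is a caller bug. */
	assert(prev > 0);
	(void)prev;
}

/* True while at least one begin() has not been matched by an end(). */
static bool printk_in_emergency(void)
{
	return atomic_load(&printk_emergency_depth) > 0;
}
```

With a counter like this, an inner end() leaves the mode active, which
is why begin()/end() describes the behavior better than on()/off().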
>
> --- random thoughts ---
>
> another thing that bothers me a bit is that we need to place those
> printk_rescue_on/printk_rescue_off switches all over the kernel.
> sort of a root cause [in some of the cases] here is the fact that
> we don't have any feedback from printk_kthread in vprintk_emit():
> does printk_kthread make any progress?
> do we flush messages to the serial console?
> etc.
>
> and we've got everything we need to have such a feedback in
> vprintk_emit():
>
> a) console is not suspended so console_unlock() can call console drivers
> b) printk_kthread != NULL
> c) we are not in enforced rescue/emergency mode
> d) `log_next_seq' moves forward (always `true', we are in vprintk_emit())
> e) `console_seq' stands still
>
> so we can have an automatic rescue mode fallback in vprintk_emit().
> if (a)-(e) are true then we give up on waking up printk_kthread,
> switch to rescue mode and attempt to console_trylock() directly from
> vprintk_emit(). the part that sucks here is that we need to give
> printk_kthread some time to catch up. for instance, if (e) is true
> for the past 50 invocations of vprintk_emit(), IOW:
>
> - we added 50 lines to printk
> - none have been printed on the serial console
>
> then we
> - declare rescue
> - do console_trylock() instead of wake_up() //unless in deferred vprintk_emit()

I am not sure if we are able to distinguish a flood of messages
from a real emergency situation.

If we start flushing messages directly when there is a flood
of messages, we will bring back the original problem with soft
lockups.

Well, there is a handful of annotated locations at the moment.
I would start thinking of an automatic detection once we have
more of them and have more data for a good heuristic.

I still would like to see the kernel parameter/sysfs knob
that would allow forcing the rescue/emergency mode all
the time ;-)
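
For reference, the quoted (a)-(e) check could be modeled roughly as
below. This is only a userspace sketch under my assumptions: the
struct, the function name and its parameters are hypothetical, the
limit of 50 is the example number from the quoted text, and condition
(d) is implied by calling the function once per emitted message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STALL_LIMIT 50	/* example threshold from the quoted proposal */

/* Hypothetical per-console bookkeeping for the fallback heuristic. */
struct rescue_state {
	uint64_t last_console_seq;	/* console_seq seen on the previous call */
	unsigned int stalled;		/* consecutive calls without progress */
};

/*
 * Call once per emitted message. Returns true when the caller should
 * give up on the wakeup and try to flush the console directly.
 */
static bool printk_should_rescue(struct rescue_state *st, uint64_t console_seq,
				 bool console_suspended, bool have_kthread,
				 bool forced_emergency)
{
	assert(st);

	/* (a), (b), (c): the heuristic applies only in normal offloading. */
	if (console_suspended || !have_kthread || forced_emergency) {
		st->stalled = 0;
		return false;
	}

	/* (e) does not hold: the console made progress, start over. */
	if (console_seq != st->last_console_seq) {
		st->last_console_seq = console_seq;
		st->stalled = 0;
		return false;
	}

	/* (e) holds: one more message was added and nothing was printed. */
	st->stalled++;
	return st->stalled >= STALL_LIMIT;
}
```

Note that the counter trips the same way for a plain flood of messages
as for a stuck printing thread, which is exactly the ambiguity
described above.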

Best Regards,
Petr