linux-kernel - Re: [RFC PATCH] printk: Introduce "store now but print later" prefix.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190306100404.jzfk7oqsh3turqth@pathway.suse.cz>
Date:   Wed, 6 Mar 2019 11:04:04 +0100
From:   Petr Mladek <pmladek@...e.com>
To:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Cc:     Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        John Ogness <john.ogness@...utronix.de>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] printk: Introduce "store now but print later" prefix.

On Tue 2019-03-05 10:23:03, Tetsuo Handa wrote:
> Petr Mladek wrote:
> > On Mon 2019-03-04 20:40:37, Tetsuo Handa wrote:
> > > On 2019/03/04 12:22, Sergey Senozhatsky wrote:
> > > > On (02/23/19 13:42), Tetsuo Handa wrote:
> > > > [..]
> > > >> This patch tries to address "don't lockup the system" with minimal risk of
> > > >> failing to "print out printk() messages", by allowing printk() callers to
> > > >> tell printk() "store $body_text_lines lines into logbuf but start actual
> > > >> printing after $trailer_text_line line is stored into logbuf". This patch
> > > >> is different from existing printk_deferred(), for printk_deferred() is
> > > >> intended for scheduler/timekeeping use only. Moreover, what this patch
> > > >> wants to do is "do not try to print out printk() messages as soon as
> > > >> possible", for accumulated stalling period cannot be decreased if
> > > >> printk_deferred() from e.g. dump_tasks() from out_of_memory() immediately
> > > >> prints out the messages. The point of this patch is to defer the stalling
> > > >> duration to after leaving the critical section.
> > > > 
> > > > We can export printk deferred, I guess; but I'm not sure if it's going
> > > > to be easy to switch OOM to printk_deferred - there are lots of direct
> > > > printk callers: warn-s, dump_stacks, etc; it might even be simpler to
> > > > start re-directing OOM printouts to printk_safe buffer.
> > 
> > Exactly. OOM calls many functions that are called also in other situations.
> > The async messages are not pushed to the console unless someone calls
> > a non-async printk.
> > 
> > How do you want to guarantee that the non-async printk will get called
> > in all situations? The async printk() API is too error-prone.
> 
> I'm suggesting to use a non-async printk() for $trailer_text_line line. I think
> that changing all printk() called from out_of_memory() to async is doable, if
> the caller of out_of_memory() wakes up a dedicated kthread described below.

This is error prone. The trailing printk is not guaranteed:

    + It might get lost by code refactoring.

    + People would miss that it is needed when async printk is used
      in a shared function, e.g. dump_stack().

    + People will not know that it is needed because they
      are not familiar with the API. The would just use it
      as part of a cargo cult programming or so.

For this reason, this API is a "no no no" from my side.


I am open to bring back discussions about a forced offload when
a single task is blocked in console_unlock() for too long.

It might be configurable depending on the usecase. For example,
real time kernel would be more aggressive with offloading.
It might need to be completely disabled to debug certain
type of problems, e.g. sudden death.


But I would really like to avoid even more printk variants.
They have many problems:

    + Only a handful of people would know how to use them right.

    + They are hard to maintain. Any incompatible printk() in the call
      chain might break the intention.

    + They are problematic in a shared code. Incompatible printk()
      variants might be required/preferred according to the use case.

    + They might improve situation in one workload and make it
      worse in another one.


> > For example, I would stop printing allocation failures until
> > the currently running OOM killer succeeds in freeing some memory.
> 
> We could serialize out_of_memory() and warn_alloc(), but since warn_alloc()
> can be called from atomic context (e.g. GFP_ATOMIC from interrupt handler)
> while out_of_memory() is always called from schedulable context (e.g.
> GFP_KERNEL), we can't use oom_lock mutex.

I did not mean serializing. I meant to avoid printing the warnings
at all until OOM killer finishes its job.


> Also, both nopage_rs in warn_alloc() and oom_rs in oom_kill_process() are not
> working well. This is because ___ratelimit() function assumes that operations
> to be ratelimited complete fast enough to be able to repeat many times within
> a second. If one operation to be ratelimited takes many seconds (or even
> minutes), ___ratelimit() becomes always true and can not ratelimit at all.

The current ratelimiting is time driven. We might need an event
driven variant. It might even be done with the current
implementation if we add something like:

void ratelimit_reset(struct ratelimit_state *rs)
{
	unsigned long flags;

	raw_spin_irqsave(&rs->lock, flags);

	rs->begin   = jiffies;
	rs->printed = 0;

	raw_spin_unlock_irqrestore(&rs->lock, flags);
}

We could call this when some event "solved" the problem.
It means that it makes sense to enable the related
ratelimited messages again because they would describe
another problem.

Best Regards,
Petr