Message-Id: <201709021512.DJG00503.JHOSOFFMFtVLOQ@I-love.SAKURA.ne.jp>
Date:   Sat, 2 Sep 2017 15:12:01 +0900
From:   Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To:     torvalds@...ux-foundation.org
Cc:     sergey.senozhatsky.work@...il.com, pavel@....cz,
        sergey.senozhatsky@...il.com, pmladek@...e.com,
        rostedt@...dmis.org, jack@...e.cz, akpm@...ux-foundation.org,
        jslaby@...e.com, andi@...as.de, linux-kernel@...r.kernel.org
Subject: Re: printk: what is going on with additional newlines?

Linus Torvalds wrote:
> On Tue, Aug 29, 2017 at 1:41 PM, Tetsuo Handa
> <penguin-kernel@...ove.sakura.ne.jp> wrote:
> >>
> >> A private buffer has none of those issues.
> >
> > Yes, I posted "[PATCH] printk: Add best-effort printk() buffering." at
> > http://lkml.kernel.org/r/1493560477-3016-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp .
> 
> No, this is exactly what I *don't* want, because it takes over printk() itself.
> 
> And that's problematic, because nesting happens for various reasons.
> 
> For example, you try to handle that nesting with printk_context(), and
> nothing when an interrupt happens.
> 
> But that is fundamentally broken.
> 
> Just to give an example: what if an interrupt happens, it does this
> buffering thing, then it gets interrupted by *another* interrupt, and
> now the printk from that other interrupt gets incorrectly nested
> together with the first one, because your "printk_context()" gives
> them the same context?

My assumption was that

  (1) task context can be preempted by soft IRQ context, hard IRQ context and NMI context.
  (2) soft IRQ context can be preempted by hard IRQ context and NMI context.
  (3) hard IRQ context can be preempted by NMI context.
  (4) A kernel-oops event can interrupt task context, soft IRQ context, hard IRQ context
      and NMI context, but the interrupted context cannot continue execution of
      vprintk_default() after returning from the kernel-oops event, even if the
      kernel-oops event occurred in a schedulable context and panic_on_oops == 0.

and thus my "printk_context()" gives them different contexts.

But was my assumption wrong, in that

  soft IRQ context can be preempted by a different soft IRQ context
  (e.g. SoftIRQ1 can be preempted by SoftIRQ2 while the handler for
  SoftIRQ1 is running, and SoftIRQ2 can be preempted by SoftIRQ3
  while the handler for SoftIRQ2 is running, and so on)

  hard IRQ context can be preempted by a different hard IRQ context
  (e.g. HardIRQ1 can be preempted by HardIRQ2 while the handler for
  HardIRQ1 is running, and HardIRQ2 can be preempted by HardIRQ3
  while the handler for HardIRQ2 is running, and so on)

? If so, we need to recognize how many IRQs are nested.

I tried to distinguish contexts using a single "unsigned long" value by
embedding the IRQ status into the lower bits of a "struct task_struct *"
pointer. I can change this to distinguish contexts using multiple
"unsigned long" values.
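
A minimal sketch of that encoding (simplified from my patch; the low
two bits of the pointer are free because "struct task_struct" is at
least 4-byte aligned):

  #include <linux/preempt.h>
  #include <linux/sched.h>

  static unsigned long printk_context(void)
  {
          unsigned long context = (unsigned long) current;

          if (in_nmi())
                  context |= 3;   /* NMI context */
          else if (in_irq())
                  context |= 2;   /* hard IRQ context */
          else if (in_serving_softirq())
                  context |= 1;   /* soft IRQ context */
          return context;         /* low bits == 0 means task context */
  }

As you point out, two nested hard IRQs (or two nested soft IRQs, if
that can happen) would get the same value from this function.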

> 
> And it really doesn't have to even be interrupts. Look at what happens
> if you take a page fault in kernel space. Same exact deal. Both are
> sleeping contexts.

Is merging messages from outside a page fault with messages from inside
a page fault so serious? That would happen only if a memory access which
might cause a page fault occurs between get_printk_buffer() and
put_printk_buffer(), and I think such users are rare.

> 
> So I really think that the only thing that knows what the "context" is
> is the person who does the printing. So if you want to create a
> printing buffer, it should be explicit. You allocate the buffer ahead
> of time (perhaps on the stack, possibly using actual allocations), and
> you use that explicit context.

If my assumption was wrong, isn't it dangerous from a stack-usage point
of view to call kmalloc() (or allocate from stack memory) in prbuf_init()
for each nesting level, because it is theoretically possible that a
different IRQ jumps in while kmalloc() is in progress (or while the
stack memory is in use)?

> 
> Yes, it means that you don't do "printk()". You do an actual
> "buf_print()" or similar.
> 
>              Linus
> 

My worry is that so many functions would need to receive a
"struct seq_buf *" argument (from the tail of __dump_stack() to the head
of out_of_memory(), including e.g. cpuset_print_current_mems_allowed())
that the patches for passing that argument become too large and too
difficult to synchronize. I tried passing such an argument to the
relevant functions before I proposed the "[PATCH] printk: Add
best-effort printk() buffering." patch, but I came to the conclusion
that passing it is too complicated and too much bloat compared to the
merit.
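
For example, with the explicit-argument style, even a short helper in
the OOM path grows a parameter, and so does every caller above it (the
function names here are hypothetical, just to illustrate the churn):

  #include <linux/sched.h>
  #include <linux/seq_buf.h>

  /* Every function between out_of_memory() and __dump_stack() would
   * need this extra "struct seq_buf *" parameter. */
  static void dump_oom_summary(struct seq_buf *s, struct task_struct *victim)
  {
          seq_buf_printf(s, "Killed process %d (%s)\n",
                         victim->pid, victim->comm);
  }

  static void report_oom(struct task_struct *victim)
  {
          char buf[256];
          struct seq_buf s;

          seq_buf_init(&s, buf, sizeof(buf));     /* caller owns the buffer */
          dump_oom_summary(&s, victim);
          printk("%.*s", (int) seq_buf_used(&s), buf); /* flush as one unit */
  }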

If we teach the printk subsystem that "I want to use buffering" via
get_printk_buffer(), we don't need to scatter a "struct seq_buf *"
argument throughout the kernel.
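
With my proposal the callers stay unchanged (simplified;
get_printk_buffer() and put_printk_buffer() are the names from my
patch, report_something() is made up):

  static void report_something(void)
  {
          get_printk_buffer();     /* printk() output is buffered from here */
          pr_err("First line\n");  /* existing printk() calls, unchanged */
          pr_err("Second line\n");
          put_printk_buffer();     /* flush the buffered lines as one unit */
  }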

Using kmalloc() for prbuf_init() also introduces problems such as

  (a) we need to choose safe GFP flags (i.e. GFP_ATOMIC or GFP_KERNEL
      or something else which cannot be managed by
      current_gfp_context()) based on the locking context

  (b) allocations can fail, and printing allocation-failure messages
      while printing the original messages is disturbing

  (c) allocation stall/failure messages are printed under memory
      pressure, but stack memory is not large enough to store the
      messages related to allocation stalls/failures

and thus I want to use a statically allocated buffer which, like
workqueue's rescuer kernel thread, can be used under memory pressure.
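
Something like the following shape (sizes and count are illustrative,
simplified from my patch):

  #define NUM_PRINTK_BUFFERS 16

  static struct printk_buffer {
          unsigned long context;  /* owner, as returned by printk_context() */
          unsigned int used;      /* bytes used in buf[] */
          char buf[1024];
  } printk_buffers[NUM_PRINTK_BUFFERS];

Since the pool is static, it keeps working when kmalloc() would stall
or fail.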

Linus Torvalds wrote at http://lkml.kernel.org/r/CA+55aFxmL4ybpz19OPn97VYqAk2ZS-tf=0W2Ff1K=-UUB6mYyg@mail.gmail.com :
> On Fri, Sep 1, 2017 at 10:32 AM, Joe Perches <joe@...ches.com> wrote:
> >
> > Yes, it's a poor name.  At least keep using a pr_ prefix.
> 
> I'd suggest perhaps just "pr_line()".
> 
> And instead of having those "err/info/cont" variations, the severity
> level should just be set at initialization time.  Not different
> versions of "pr_line()".
> 
> There's no point to having different severity variations, since the
> *only* reason for this would be for buffering. So "pr_cont()" is kind
> of assumed for everything but the first.
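
If I understand the suggestion correctly, the API would look something
like this (hypothetical names, just my reading of it):

  #include <linux/printk.h>
  #include <linux/seq_buf.h>

  struct pr_line {
          const char *level;      /* severity fixed at init time */
          struct seq_buf sb;
          char buf[80];
  };

  static void pr_line_init(struct pr_line *pl, const char *level)
  {
          pl->level = level;
          seq_buf_init(&pl->sb, pl->buf, sizeof(pl->buf));
  }

  /* every print after the first behaves like pr_cont() */
  #define pr_line_printf(pl, fmt, ...) \
          seq_buf_printf(&(pl)->sb, fmt, ##__VA_ARGS__)

  static void pr_line_flush(struct pr_line *pl)
  {
          printk("%s%.*s", pl->level,
                 (int) seq_buf_used(&pl->sb), pl->buf);
  }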

But it is annoying for me that

  Line1-for-event1-with-loglevel-foo
  Line2-for-event1-with-loglevel-bar
  Line3-for-event1-with-loglevel-baz

(like OOM killer messages) all being treated as loglevel foo breaks
console_loglevel filtering, and

> 
> And even if you end up doing multiple lines, if you actually do
> different severities, you damn well shouldn't buffer them together.
> They are clearly different things!

two series of messages

  Line1-for-event1-with-loglevel-foo
  Line2-for-event1-with-loglevel-bar
  Line3-for-event1-with-loglevel-bar
  Line4-for-event1-with-loglevel-bar
  Line5-for-event1-with-loglevel-baz

by task/1000 and

  Line1-for-event2-with-loglevel-foo
  Line2-for-event2-with-loglevel-bar
  Line3-for-event2-with-loglevel-bar
  Line4-for-event2-with-loglevel-bar
  Line5-for-event2-with-loglevel-baz

by task/1001 getting mixed, because lines with different loglevels are
not buffered together, causes unreadable logs (unless printk()
automatically inserts a context identifier into each line, like

  foo task/1000 Line1-for-event1-with-loglevel-foo
  foo task/1001 Line1-for-event2-with-loglevel-foo
  bar task/1001 Line2-for-event2-with-loglevel-bar
  bar task/1001 Line3-for-event2-with-loglevel-bar
  bar task/1001 Line4-for-event2-with-loglevel-bar
  bar task/1000 Line2-for-event1-with-loglevel-bar
  bar task/1000 Line3-for-event1-with-loglevel-bar
  bar task/1000 Line4-for-event1-with-loglevel-bar
  baz task/1000 Line5-for-event1-with-loglevel-baz
  baz task/1001 Line5-for-event2-with-loglevel-baz

).
