linux-kernel - Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20171218200351.09164cdd@gandalf.local.home>
Date:   Mon, 18 Dec 2017 20:03:51 -0500
From:   Steven Rostedt <rostedt@...dmis.org>
To:     Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
Cc:     Tejun Heo <tj@...nel.org>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Petr Mladek <pmladek@...e.com>, Jan Kara <jack@...e.cz>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Rafael Wysocki <rjw@...ysocki.net>,
        Pavel Machek <pavel@....cz>,
        Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

On Tue, 19 Dec 2017 09:52:48 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@...il.com> wrote:

> > The case here, you are talking about a CPU doing console_lock() from a
> > non printk() case. Which is what I was asking about how often this
> > happens.  
> 
> I'd say often enough. but the point I was trying to make is that we can
> have non-atomic CPUs which can do the print out, instead of "sharing the
> load" between atomic CPUs.

We don't even know if sharing between "atomic" and "non-atomic" is an
issue. Anything that does a printk() in an atomic location, is going to
have latency to begin with.

> 
> > As for why there's no handoff. Does the non printk()
> > console_lock/unlock ever happen from a critical location? I don't think
> > it does (but I haven't checked). Then it is the perfect candidate to do
> > all the printing.  
> 
> that's right. that is the point I was trying making. we can have better
> candidates to do all the printing.

Sure, but we don't even know if we have to. A problem scenario hasn't
come up that wasn't due to the current implementation (which my patch
changes).


> I did tests yesterday, traces are available. I can't conclude that
> the patch fixes the unfairness of printk().

It doesn't fix the "unfairness" it fixes the unboundedness of printk.
That is what has been triggering all the issues from before.



> consider the following case
> 
> we have console_lock() from non-atomic context. console_sem owner is
> getting preempted, under console_sem. which is totally possible and
> happens a lot. in the mean time we have OOM, which can print a lot of
> info. by the time console_sem returns back to TASK_RUNNING logbuf
> contains some number of pending messages [lets say 10 seconds worth
> of printing]. console owner goes to console_unlock(). accidentally
> we have printk from IRQ on CPUz. console_owner hands over printing
> duty to CPUz. so now we have to print 10 seconds worth of OOM messages
> from irq.

Yes that can happen. But printk's from irq context is not nice to have
either, and should only happen when things are going wrong to begin
with.

> 
> 
> 
> CPU0                        CPU1 ~ CPUx                     CPUz
> 
> console_lock
> 
>  << preempted >>
> 
> 
>    OOM                    OOM printouts, lots
>                           of OOM traces, etc.
> 
>                           OOM end [progress done].
> 
>  << back to RUNNING >>
> 
>   console_unlock()
>     
>     for (;;)
>       sets console_owner
>       call_console_drivers()				  IRQ
>                                                          printk
> 							  sees console_owner
> 							  sets console_waiter
> 
>       clears console_owner
>       sees console_waier
>       handoff
>                                                           for (;;) {
> 							     call_console_drivers()
> 							     ??? lockup
> 							  }
> 							  up()
> 
> this is how we down() from non-atomic and up() from atomic [if we make
> it to up(). we might end up in NMI panic]. this scenario is totally possible,

The printk buffer needs to be very big, and bad things have to happen
first. This is a theoretical scenario, and I'd like to see it happen in
the real world before we try to fix it. My patch should make printk
behave *MUCH BETTER* than it currently does.

If you are worried about NMI panics, then we could add a touch nmi
within the printk loop.


> isn't it? the optimistic expectation here is that some other printk() from
> non-atomic CPU will jump in and take over printing from atomic CPUz. but I
> don't see why we are counting on it.

I don't see why we even care. Placing a printk in an atomic context is
a problem to begin with, and should only happen if there's issues in
the system.

-- Steve