Date:   Thu, 9 Nov 2017 00:06:58 -0500
From:   Steven Rostedt <rostedt@...dmis.org>
To:     Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
Cc:     Tejun Heo <tj@...nel.org>,
        Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
        linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
        linux-mm@...ck.org, xiyou.wangcong@...il.com,
        dave.hansen@...el.com, hannes@...xchg.org, mgorman@...e.de,
        mhocko@...nel.org, pmladek@...e.com, sergey.senozhatsky@...il.com,
        vbabka@...e.cz
Subject: Re: [PATCH v3] printk: Add console owner and waiter logic to load
 balance console writes

On Thu, 9 Nov 2017 13:45:48 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@...il.com> wrote:

>
> so what we are looking at
> 
>    a) we take over printing. can be from safe context to unsafe context
>       [well, bad karma]. can be from unsafe context to a safe one. or from
>       safe context to another safe context... or from one unsafe context to
>       another unsafe context [bad karma again]. we really never know, no
>       one does.
> 
>       lots of uncertainties - "may be X, may be Y, may be Z". a bigger
>       picture: we still can have the same lockup scenarios as we do
>       have today.
> 
>       and we also bring busy loop with us, so the new console_sem
>       owner [regardless its current context] CPU must wait until the
>       current console_sem finishes its call_console_drivers(). I
>       mentioned it in my another email, you seemed to jump over that
>       part. was it irrelevant or wrong?
> 
> vs.
> 
>    b) we offload to printk_kthread [safe context].
> 
> 
> why (a) is better than (b)?
> 


What does safe context mean? Do we really want to allow the printk
thread to sleep when there's more to print? What happens if there's a
crash at that moment? How do we safely flush out all the data when the
printk thread is sleeping?

Now we could have something that uses both nicely. When the
printk_thread wakes up (we need to figure out when to do that), then it
could constantly take over.


	CPU1				CPU2
	----				----
   console_unlock()
     start printing a lot
     (more than one, wake up printk_thread)

					printk thread wakes up

					becomes the waiter

   sees waiter hands off

					starts printing

   printk()
     becomes waiter

					sees waiter hands off
					then becomes new waiter! <-- key

    starts printing
    sees waiter hands off
					continues printing


That is, we keep the waiter logic, and if anyone starts printing too
much, it wakes up the printk thread (hopefully on another CPU, or the
printk thread should migrate). When the printk thread starts running, it
becomes the new waiter if the console lock is still held (just like in
printk). Then the printing gets handed off to it. We could just have the
printk thread keep going, though I'm not sure I would want to let it
schedule while printing. It could also hand off to printk callers (like
above), but then take the printing back immediately. This would mean
that a printk caller from a "critical" path will only get to do one
message before the printk thread asks for it again.
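
As a rough illustration of the handoff order in the diagram above, here
is a small userspace model in Python. It is only a sketch, not kernel
code: the names simulate_handoff, PRINTK_THREAD, CALLER and
caller_arrivals are made up for this example, and it assumes the owner
checks for a waiter after every single message.

```python
PRINTK_THREAD = "printk_thread"   # hypothetical label for the printk kthread
CALLER = "printk_caller"          # hypothetical label for a printk() caller


def simulate_handoff(messages, caller_arrivals):
    """Model who prints each message under the owner/waiter handoff.

    messages: pending log lines, printed in order.
    caller_arrivals: indexes of messages during which a printk() caller on
        another CPU shows up and starts waiting for the console lock.
    Returns a list of (printer, message) pairs.
    """
    owner = PRINTK_THREAD   # assume the printk thread has already taken over
    waiter = None           # at most one waiter at a time
    log = []
    for i, msg in enumerate(messages):
        if i in caller_arrivals and owner == PRINTK_THREAD and waiter is None:
            waiter = CALLER           # caller becomes the waiter
        log.append((owner, msg))      # the owner prints one message
        if waiter is not None:        # owner sees the waiter and hands off
            previous, owner, waiter = owner, waiter, None
            if previous == PRINTK_THREAD:
                # The key step: the printk thread immediately waits again.
                waiter = PRINTK_THREAD
    return log


if __name__ == "__main__":
    for printer, msg in simulate_handoff(["m0", "m1", "m2", "m3", "m4"], {1}):
        print(f"{printer:>14}: {msg}")
```

In this model a caller that arrives while a message is being printed
gets exactly one message of its own before the printk thread takes the
console back.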

Perhaps we could have more than one printk thread that migrates around,
and they each hand off the printing. This makes sure the printing
always happens, that it never stops due to the console_lock holder
sleeping, and that we never lock up one CPU that does printing. This
would work with just two printk threads. When one starts a printk loop,
another one wakes up on another CPU and becomes the waiter to get the
handoff of the console_lock. Then the first could schedule out (migrate
if the current CPU is busy), and later take over again. In fact, this
would basically have two CPUs bouncing back and forth to do the printing.
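
The two-thread bounce could be modelled roughly like this, again as a
userspace Python sketch rather than kernel code. The function name
bounce_between_threads and the batch parameter (how many messages a
thread prints before handing off) are assumptions made for this example.

```python
def bounce_between_threads(messages, batch=2):
    """Model two printk threads passing the console lock back and forth.

    Each thread prints at most `batch` messages (an assumed tunable), hands
    the lock to the other thread, and then waits for it to come back.
    Returns a list of (thread_id, message) pairs.
    """
    if batch < 1:
        raise ValueError("batch must be at least 1")
    owner, waiter = 0, 1          # thread 0 starts as owner, thread 1 waits
    printed_in_turn = 0
    log = []
    for msg in messages:
        log.append((owner, msg))  # the current owner prints one message
        printed_in_turn += 1
        if printed_in_turn == batch:
            # Hand off: the waiter becomes the owner, and the old owner
            # (after it would have rescheduled) becomes the new waiter.
            owner, waiter = waiter, owner
            printed_in_turn = 0
    return log


if __name__ == "__main__":
    for thread_id, msg in bounce_between_threads([f"m{i}" for i in range(7)]):
        print(f"printk_thread/{thread_id}: {msg}")
```

Printing never pauses in this model, since a waiter is always ready, and
neither thread prints more than batch messages in a row.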

This gives us our cake and we get to eat it too.

One, printing never stops (no scheduling out), as there are two threads
to share the load (obviously only on SMP machines).

Two, there's no lockup. There are two threads that each print a little,
pass off the console lock, do a cond_resched(), then take over again.

Basically, what I'm saying is that these are not two different
solutions. They are two algorithms that can work together to give us
reliable output and not lock up the system in doing so.

-- Steve
