linux-kernel - Re: 4.14 backport request for dbdda842fe96f: "printk: Add console owner and waiter logic to load balance console writes"

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Wed, 12 Dec 2018 17:10:34 +0900
From:   Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To:     Sasha Levin <sashal@...nel.org>
Cc:     Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        Daniel Wang <wonderfly@...gle.com>,
        Petr Mladek <pmladek@...e.com>,
        Steven Rostedt <rostedt@...dmis.org>, stable@...r.kernel.org,
        Alexander.Levin@...rosoft.com,
        Andrew Morton <akpm@...ux-foundation.org>,
        byungchul.park@....com, dave.hansen@...el.com, hannes@...xchg.org,
        jack@...e.cz, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Mel Gorman <mgorman@...e.de>, mhocko@...nel.org, pavel@....cz,
        penguin-kernel@...ove.sakura.ne.jp,
        Peter Zijlstra <peterz@...radead.org>, tj@...nel.org,
        Linus Torvalds <torvalds@...ux-foundation.org>, vbabka@...e.cz,
        Cong Wang <xiyou.wangcong@...il.com>,
        Peter Feiner <pfeiner@...gle.com>
Subject: Re: 4.14 backport request for dbdda842fe96f: "printk: Add console
 owner and waiter logic to load balance console writes"

On (12/12/18 01:48), Sasha Levin wrote:
> > > > I guess we still don't have a really clear understanding of what exactly
> > > is going in your system
> > > 
> > > I would also like to get to the bottom of it. Unfortunately I haven't
> > > got the expertise in this area nor the time to do it yet. Hence the
> > > intent to take a step back and backport Steven's patch to fix the
> > > issue that has resurfaced in our production recently.
> > 
> > No problem.
> > I just meant that -stable people can be a bit "unconvinced".
> 
> The -stable people tried adding this patch back in April, but ended up
> getting complaints up the wazoo (https://lkml.org/lkml/2018/4/9/154)
> about how this is not -stable material.

OK, really didn't know that! I wasn't Cc-ed on that AUTOSEL email,
and I wasn't Cc-ed on this whole discussion and found it purely
accidentally while browsing linux-mm list.

I understand what Petr meant by his email. Not arguing; below are just
my 5 cents.

> So yes, testing/acks welcome :)

OK. The way I see it (and I can be utterly wrong here):

The patch set in question, most likely and probably (*and those are
theories*), makes panic() deadlock less likely because panic_cpu waits
for console_sem owner to release uart_port/console_owner locks before
panic_cpu pr_emerg("Kernel panic - not syncing"), dump_stack()-s and
brings other CPUs down via stop IPI or NMI.
So a precondition is
		panic CPU != uart_port->lock owner CPU

If the panic happens on the same CPU which holds the uart_port spin_lock,
then the deadlock is still there, just like before; we have another patch
which attempts to fix this (it makes console drivers re-entrant from
panic()).

So if you are willing to backport this set to -stable, then I wouldn't
mind, probably would be more correct if we don't advertise this as a
"panic() deadlock fix" tho; we know that deadlock is still possible.
And there will be another -stable backport request in a week or so.


In the meantime, I can add my Acked-by to this backport if it helps.

/* Assuming that my theories explain what's happening with
   Daniel's systems. */

	-ss