linux-kernel - Re: 4.14 backport request for dbdda842fe96f: "printk: Add console owner and waiter logic to load balance console writes"

Message-ID: <CAJmjG2-e6f6p=pE5uDECMc=W=81SYyGCmoabrC1ePXwL5DFdSw@mail.gmail.com>
Date:   Sun, 21 Oct 2018 11:09:22 -0700
From:   Daniel Wang <wonderfly@...gle.com>
To:     sergey.senozhatsky.work@...il.com
Cc:     Petr Mladek <pmladek@...e.com>, rostedt@...dmis.org,
        stable@...r.kernel.org, Alexander.Levin@...rosoft.com,
        akpm@...ux-foundation.org, byungchul.park@....com,
        dave.hansen@...el.com, hannes@...xchg.org, jack@...e.cz,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Mel Gorman <mgorman@...e.de>, mhocko@...nel.org, pavel@....cz,
        penguin-kernel@...ove.sakura.ne.jp, peterz@...radead.org,
        tj@...nel.org, torvalds@...ux-foundation.org, vbabka@...e.cz,
        Cong Wang <xiyou.wangcong@...il.com>,
        Peter Feiner <pfeiner@...gle.com>
Subject: Re: 4.14 backport request for dbdda842fe96f: "printk: Add console
 owner and waiter logic to load balance console writes"

Just got back from vacation. Thanks for the continued discussion. Just so
I understand the current state: it looks like we've got a pretty good
explanation of what's going on (though we're not completely sure), and
backporting Steven's patches is still the way to go? I see that Sergey has
sent an RFC series for similar things. Is that series trying to solve the
deadlock problem in a different way?

On Thu, Oct 4, 2018 at 1:55 AM Sergey Senozhatsky
<sergey.senozhatsky.work@...il.com> wrote:
>
> On (10/04/18 10:36), Petr Mladek wrote:
> >
> > This looks like a reasonable explanation of what is happening here.
> > It also explains why the console owner logic helped.
>
> Well, I'm still a bit puzzled, frankly speaking. I've two theories.
>
> Theory #1 [most likely]
>
>   Steven is a wizard and his code cures whatever problem we throw at it.
>
> Theory #2
>
>   console_sem handover actually spreads the printout, so we don't have one CPU
> doing all the printing. Instead every CPU prints its backtrace, while the
> CPU which issued all_cpus_backtrace() waits for them. So all_cpus_backtrace()
> still has to wait for NR_CPUS * strlen(backtrace), which probably still
> triggers an NMI panic on it at some point. The panic CPU sends out a stop IPI,
> then waits for the foreign CPUs to ACK the stop IPI request - for 10 seconds.
> So each CPU prints its backtrace, then ACKs the stop IPI. So when the panic CPU
> proceeds with flush_on_panic() and emergency_reboot(), uart_port->lock is
> unlocked. Without the patch we probably declare NMI panic on the CPU which does
> all the printing work, and panic sometimes jumps in when that CPU is busy in
> serial8250_console_write(), holding the uart_port->lock. So we can't re-enter
> the 8250 driver from the panic CPU and we can't reboot the system. In other
> words... Steven is a wizard.
>
> > > serial8250_console_write()
> > > {
> > >     if (port->sysrq)
> > >             locked = 0;
> > >     else if (oops_in_progress)
> > >             locked = spin_trylock_irqsave(&port->lock, flags);
> > >     else
> > >             spin_lock_irqsave(&port->lock, flags);
> > >
> > >     ...
> > >     uart_console_write(port, s, count, serial8250_console_putchar);
> > >     ...
> > >
> > >     if (locked)
> > >             spin_unlock_irqrestore(&port->lock, flags);
> > > }
> > >
> > > Now... the problem. A theory, in fact.
> > > panic() sets oops_in_progress back to zero - bust_spinlocks(0) - too soon.
> >
> > I see your point. I am just a bit scared of this way. Ignoring locks
> > is a dangerous and painful approach in general.
>
> Well, I agree. But 8250 is not the only console which does ignore
> uart_port lock state sometimes. Otherwise sysrq would be totally unreliable,
> including emergency reboot. So it's sort of how it has been for quite some
> time, I guess. We are in panic(), it's over, so we probably can ignore
> uart_port->lock at this point.
>
>         -ss
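
To make the lock-selection logic quoted above easier to follow, here is a
rough user-space model of it. This is only a sketch: model_port,
model_console_write() and the write_outcome values are hypothetical names
made up for illustration, not kernel APIs, and "lock_held" simply stands in
for "a stopped CPU still owns port->lock". It assumes, as in the theory
above, that the owner never releases the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
enum write_outcome {
    WRITE_WITHOUT_LOCK,  /* lock skipped, or trylock failed: output still goes out */
    WRITE_WITH_LOCK,     /* lock acquired normally */
    WOULD_SPIN_FOREVER   /* unconditional lock on a lock whose owner is stopped */
};

struct model_port {
    bool sysrq;      /* models port->sysrq */
    bool lock_held;  /* assumption: true if a stopped CPU still owns port->lock */
};

/* Mirrors the three branches of the quoted serial8250_console_write(). */
enum write_outcome model_console_write(const struct model_port *port,
                                       bool oops_in_progress)
{
    assert(port != NULL);

    if (port->sysrq)
        return WRITE_WITHOUT_LOCK;  /* locked = 0 */

    if (oops_in_progress)           /* spin_trylock_irqsave() branch */
        return port->lock_held ? WRITE_WITHOUT_LOCK : WRITE_WITH_LOCK;

    /* spin_lock_irqsave() branch: waits until the owner releases the lock */
    return port->lock_held ? WOULD_SPIN_FOREVER : WRITE_WITH_LOCK;
}
```

With lock_held set, the model returns WRITE_WITHOUT_LOCK while
oops_in_progress is non-zero, and WOULD_SPIN_FOREVER once it has been reset
to zero. That second case is the one the theory worries about when
bust_spinlocks(0) runs too soon.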



--
Best,
Daniel
