linux-kernel - Re: [Regression 5.6-rc1][Bisected b6231ea2b3c6] Powerpc 8xx doesn't boot anymore

Message-ID: <f67f7566-24f2-9c71-36be-2e55ec436097@rasmusvillemoes.dk>
Date:   Thu, 13 Feb 2020 08:45:28 +0100
From:   Rasmus Villemoes <linux@...musvillemoes.dk>
To:     Christophe Leroy <christophe.leroy@....fr>,
        Li Yang <leoyang.li@....com>, Qiang Zhao <qiang.zhao@....com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc:     "linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>,
        Scott Wood <oss@...error.net>,
        linux-arm-kernel@...ts.infradead.org,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [Regression 5.6-rc1][Bisected b6231ea2b3c6] Powerpc 8xx doesn't
 boot anymore

On 12/02/2020 15.24, Christophe Leroy wrote:
> Hi Rasmus,
> 
> Kernel 5.6-rc1 silently fails on boot.
> 
> I bisected the problem to commit b6231ea2b3c6 ("soc: fsl: qe: drop
> broken lazy call of cpm_muram_init()")
> 
> I get a bad_page_fault() for an access at address 8 in
> cpm_muram_alloc_common(), called from cpm_uart_console_setup() via
> cpm_uart_allocbuf()

Sorry about that. But I'm afraid I don't see what I could have done
differently - the patch series, including b6231ea2b3c6, has been in
-next since 20191210, both you and ppc-dev were cc'ed on the entire
series (last revision sent November 28). And I've been dogfooding the
patches on both arm- and ppc-derived boards ever since (but obviously
only for a few cpus).

> Reverting the guilty commit on top of 5.6-rc1 is not trivial.
> 
> In your commit text you explain that cpm_muram_init() is called via
> subsys_initcall. But console init is done before that, so it cannot work.

No, but neither did the code I removed seem to work - how does doing
spin_lock_init on a held spinlock, and then unlocking it, work? Is
everything-spinlock always a no-op in your configuration? And even so,
I'd think a GFP_KERNEL allocation under spin_lock_irqsave() would
trigger some splat somewhere?
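
To illustrate the first concern, here is a userspace toy model of the
pattern. It is not the removed kernel code: toy_lock, pool_lock,
lazy_pool_init() and alloc_with_lazy_init() are names made up for this
sketch, and the "lock" only records whether it is held. It shows why
initialising a lock from inside its own critical section leaves the
following unlock unbalanced.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a spinlock: it only records whether it is held. */
struct toy_lock {
	bool held;
};

static void toy_lock_init(struct toy_lock *l)
{
	l->held = false;
}

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);	/* single-threaded model: no recursion */
	l->held = true;
}

/* Returns false if the lock was not held, i.e. an unbalanced unlock. */
static bool toy_lock_release(struct toy_lock *l)
{
	bool was_held = l->held;

	l->held = false;
	return was_held;
}

/* Hypothetical pool state; statics start out zeroed (unlocked, not ready). */
static struct toy_lock pool_lock;
static bool pool_ready;

/* Lazy initialisation that also (re)initialises the lock. */
static void lazy_pool_init(void)
{
	toy_lock_init(&pool_lock);
	pool_ready = true;
}

/* Allocation path: take the lock, lazily initialise, then unlock. */
static bool alloc_with_lazy_init(void)
{
	toy_lock_acquire(&pool_lock);
	if (!pool_ready)
		lazy_pool_init();	/* wipes the state of the lock we hold */
	return toy_lock_release(&pool_lock);
}
```

In this model the first call to alloc_with_lazy_init() returns false,
because the unlock finds the lock no longer held; later calls return
true. Whether a real spinlock notices depends on the configuration, for
example on whether lock debugging is enabled.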

Please note I'm not claiming my patch is not at fault, it clearly is, I
just want to try to understand how I could have been wrong about the
"nobody can have been relying on it" part.

> Do you have a fix for that ?

Not right now, but I'll have a look. It's true that the patch probably
doesn't revert cleanly, but it shouldn't be hard to add back those few
lines in the appropriate spot, with a big fat comment that this does
something very fishy (at least as a temporary measure if we don't find a
proper solution soonish).

Rasmus