linux-kernel - Re: [Regression 5.6-rc1][Bisected b6231ea2b3c6] Powerpc 8xx doesn't boot anymore


Message-ID: <f68d7a21-63b6-07a1-09de-5e66f422dcae@c-s.fr>
Date:   Thu, 13 Feb 2020 10:40:21 +0000
From:   Christophe Leroy <christophe.leroy@....fr>
To:     Rasmus Villemoes <linux@...musvillemoes.dk>,
        Li Yang <leoyang.li@....com>, Qiang Zhao <qiang.zhao@....com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc:     "linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>,
        Scott Wood <oss@...error.net>,
        linux-arm-kernel@...ts.infradead.org,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [Regression 5.6-rc1][Bisected b6231ea2b3c6] Powerpc 8xx doesn't
 boot anymore



On 02/13/2020 07:45 AM, Rasmus Villemoes wrote:
> On 12/02/2020 15.24, Christophe Leroy wrote:
>> Hi Rasmus,
>>
>> Kernel 5.6-rc1 silently fails on boot.
>>
>> I bisected the problem to commit b6231ea2b3c6 ("soc: fsl: qe: drop
>> broken lazy call of cpm_muram_init()")
>>
>> I get a bad_page_fault() for an access at address 8 in
>> cpm_muram_alloc_common(), called from cpm_uart_console_setup() via
>> cpm_uart_allocbuf()
> 
> Sorry about that. But I'm afraid I don't see what I could have done
> differently - the patch series, including b6231ea2b3c6, has been in
> -next since 20191210, both you and ppc-dev were cc'ed on the entire
> series (last revision sent November 28). And I've been dogfooding the
> patches on both arm- and ppc-derived boards ever since (but obviously
> only for a few cpus).

Yes, this patch series should have rung a bell in my head. It looks like 
I'm the one who introduced this 4 years ago through commit 4d486e008379 
("soc/fsl/qe: fix Oops on CPM1 (and likely CPM2)").

But I had completely forgotten that patch until I did some git blame 
this morning on this lazy call.


> 
>> Reverting the guilty commit on top of 5.6-rc1 is not trivial.
>>
>> In your commit text you explain that cpm_muram_init() is called via
>> subsys_initcall. But console init is done before that, so it cannot work.
> 
> No, but neither did the code I removed seem to work - how does doing
> spin_lock_init on a held spinlock, and then unlocking it, work? Is
> everything-spinlock always a no-op in your configuration? And even so,
> I'd think a GFP_KERNEL allocation under spin_lock_irqsave() would
> trigger some splat somewhere?
> 
> Please note I'm not claiming my patch is not at fault, it clearly is, I
> just want to try to understand how I could have been wrong about the
> "nobody can have been relying on it" part.
> 

It seems spin_lock_init() does nothing at all.
spin_lock_irqsave() just disables IRQs and increases preempt_count.
spin_unlock_irqrestore() restores the IRQ state, decreases preempt_count and 
calls preempt_schedule() if preempt_count reaches 0.

Maybe some debugging option like DEBUG_ATOMIC_SLEEP could detect it?
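
To illustrate why the removed sequence never blew up here, below is a small
user-space model of the behaviour described above. It is only a sketch, not
kernel code: it assumes a uniprocessor build without spinlock debugging, every
model_* name is a hypothetical stand-in, and the flags are returned by value
instead of being filled in by a macro as the real spin_lock_irqsave() does.

```c
/* User-space model, NOT kernel code. Assumption: uniprocessor build with
 * no spinlock debugging, so the lock itself carries no state. All model_*
 * names are hypothetical stand-ins for the kernel primitives. */
#include <assert.h>
#include <stdbool.h>

typedef struct { int unused; } model_spinlock_t;

static bool model_irqs_enabled = true;
static int model_preempt_count = 0;

/* Nothing to initialise: the lock has no state in this model. */
static void model_spin_lock_init(model_spinlock_t *lock)
{
	(void)lock;
}

/* Save the IRQ state, disable IRQs, bump the preempt count. */
static unsigned long model_spin_lock_irqsave(model_spinlock_t *lock)
{
	unsigned long flags = model_irqs_enabled ? 1UL : 0UL;

	(void)lock;
	model_irqs_enabled = false;
	model_preempt_count++;
	return flags;
}

/* Restore the saved IRQ state and drop the preempt count.
 * The real kernel may also reschedule when the count reaches 0. */
static void model_spin_unlock_irqrestore(model_spinlock_t *lock,
					 unsigned long flags)
{
	(void)lock;
	model_irqs_enabled = (flags != 0);
	model_preempt_count--;
}

/* Replays the fishy pattern: re-initialise a lock that is currently
 * held, then unlock it. Returns the preempt count afterwards. */
int model_lazy_init_sequence(void)
{
	model_spinlock_t lock;
	unsigned long flags;

	model_spin_lock_init(&lock);
	flags = model_spin_lock_irqsave(&lock);
	assert(!model_irqs_enabled && model_preempt_count == 1);

	model_spin_lock_init(&lock);	/* re-init while "held": no effect */

	model_spin_unlock_irqrestore(&lock, flags);
	return model_preempt_count;
}

/* Accessor so a caller can check the IRQ state after the sequence. */
bool model_irqs_are_enabled(void)
{
	return model_irqs_enabled;
}
```

In this model the re-init in the middle changes nothing, so the unlock still
balances the lock: the preempt count is back to 0 and IRQs are enabled again.
With a lock that really holds state (SMP or lock debugging) the same sequence
would not be harmless.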

>> Do you have a fix for that ?
> 
> Not right now, but I'll have a look. It's true that the patch probably
> doesn't revert cleanly, but it shouldn't be hard to add back those few
> lines in the appropriate spot, with a big fat comment that this does
> something very fishy (at least as a temporary measure if we don't find a
> proper solution soonish).
> 

Thanks
Christophe