linux-kernel - Re: PPC476 hangs during tlb flush after calling /init in crash kernel with linux 5.4+

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <711a9a60-264b-9b86-6772-6585622a5bd4@csgroup.eu>
Date:   Wed, 28 Apr 2021 08:08:17 +0200
From:   Christophe Leroy <christophe.leroy@...roup.eu>
To:     Eddie James <eajames@...ux.ibm.com>, linuxppc-dev@...ts.ozlabs.org
Cc:     linux-kernel@...r.kernel.org, benh@...nel.crashing.org,
        paulus@...ba.org, mpe@...erman.id.au, npiggin@...il.com,
        miltonm@...ibm.com
Subject: Re: PPC476 hangs during tlb flush after calling /init in crash kernel
 with linux 5.4+



Le 28/04/2021 à 00:42, Eddie James a écrit :
> On Tue, 2021-04-27 at 19:26 +0200, Christophe Leroy wrote:
>> Hi Eddies,
>>
>> Le 27/04/2021 à 19:03, Eddie James a écrit :
>>> Hi all,
>>>
>>> I'm having a problem in simulation and hardware where my PPC476
>>> processor stops executing instructions after callling /init. In my
>>> case
>>> this is a bash script. The code descends to flush the TLB, and
>>> somewhere in the loop in _tlbil_pid, the PC goes to
>>> InstructionTLBError47x but does not go any further. This only
>>> occurs in
>>> the crash kernel environment, which is using the same kernel,
>>> initramfs, and init script as the main kernel, which executed fine.
>>> I
>>> do not see this problem with linux 4.19 or 3.10. I do see it with
>>> 5.4
>>> and 5.10. I see a fair amount of refactoring in the PPC memory
>>> management area between 4.19 and 5.4. Can anyone point me in a
>>> direction to debug this further? My stack trace is below as I can
>>> run
>>> gdb in simulation.
>>
>> Can you bisect to pin point the culprit commit ?
> 
> Hi, thanks for your prompt reply.
> 
> Good idea! I have bisected to:
> 
> commit 9e849f231c3c72d4c3c1b07c9cd19ae789da0420 (b8-bad,
> refs/bisect/bad)
> Author: Christophe Leroy <christophe.leroy@....fr>
> Date:   Thu Feb 21 19:08:40 2019 +0000
> 
>      powerpc/mm/32s: use generic mmu_mapin_ram() for all blocks.
>      
>      Now that mmu_mapin_ram() is able to handle other blocks
>      than the one starting at 0, the WII can use it for all
>      its blocks.
>      
>      Signed-off-by: Christophe Leroy <christophe.leroy@....fr>
>      Signed-off-by: Michael Ellerman <mpe@...erman.id.au>
> 
> I also confirmed that reverting this commit resolves the issue in 5.4+.
> 
> Now, I don't understand why this is problematic or what is really
> happening... Reverting is probably not the desired solution.
> 

Can you provide the 'dmesg' or a dump of the logs printed by the kernel at boottime ?

The difference with this commit is that if there are several memblocks, all get mapped. Maybe your 
target doesn't like it.

You are talking about simulation, are you using QEMU ? If yes can you provide details so that I can 
try and reproduce the issue ?

Thanks
Christophe