Message-ID: <711a9a60-264b-9b86-6772-6585622a5bd4@csgroup.eu>
Date: Wed, 28 Apr 2021 08:08:17 +0200
From: Christophe Leroy <christophe.leroy@...roup.eu>
To: Eddie James <eajames@...ux.ibm.com>, linuxppc-dev@...ts.ozlabs.org
Cc: linux-kernel@...r.kernel.org, benh@...nel.crashing.org,
paulus@...ba.org, mpe@...erman.id.au, npiggin@...il.com,
miltonm@...ibm.com
Subject: Re: PPC476 hangs during tlb flush after calling /init in crash kernel
with linux 5.4+
On 28/04/2021 at 00:42, Eddie James wrote:
> On Tue, 2021-04-27 at 19:26 +0200, Christophe Leroy wrote:
>> Hi Eddie,
>>
>> On 27/04/2021 at 19:03, Eddie James wrote:
>>> Hi all,
>>>
>>> I'm having a problem in simulation and on hardware where my PPC476
>>> processor stops executing instructions after calling /init. In my case
>>> this is a bash script. The code descends to flush the TLB, and somewhere
>>> in the loop in _tlbil_pid, the PC goes to InstructionTLBError47x but
>>> does not go any further. This only occurs in the crash kernel
>>> environment, which is using the same kernel, initramfs, and init script
>>> as the main kernel, which executes fine. I do not see this problem with
>>> Linux 4.19 or 3.10. I do see it with 5.4 and 5.10. I see a fair amount
>>> of refactoring in the PPC memory management area between 4.19 and 5.4.
>>> Can anyone point me in a direction to debug this further? My stack
>>> trace is below, as I can run gdb in simulation.
>>
>> Can you bisect to pinpoint the culprit commit?
>
> Hi, thanks for your prompt reply.
>
> Good idea! I have bisected to:
>
> commit 9e849f231c3c72d4c3c1b07c9cd19ae789da0420 (b8-bad, refs/bisect/bad)
> Author: Christophe Leroy <christophe.leroy@....fr>
> Date: Thu Feb 21 19:08:40 2019 +0000
>
> powerpc/mm/32s: use generic mmu_mapin_ram() for all blocks.
>
> Now that mmu_mapin_ram() is able to handle other blocks
> than the one starting at 0, the WII can use it for all
> its blocks.
>
> Signed-off-by: Christophe Leroy <christophe.leroy@....fr>
> Signed-off-by: Michael Ellerman <mpe@...erman.id.au>
>
> I also confirmed that reverting this commit resolves the issue in 5.4+.
>
> Now, I don't understand why this is problematic or what is really
> happening... Reverting is probably not the desired solution.
>
Can you provide the 'dmesg' output or a dump of the logs printed by the kernel at boot time?
The difference with this commit is that if there are several memblocks, they all get mapped. Maybe
your target doesn't like that.
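To make that difference concrete, here is a toy model of the two behaviours. It is only a sketch, not kernel code: the function names, the (base, size) tuples and the addresses are all made up for illustration, and the "before" case is simplified from the commit message (only the block starting at 0 is handled).

```python
# Toy model only: hypothetical names and addresses, not kernel code.
from typing import List, Tuple

Block = Tuple[int, int]   # (base, size) in bytes
Range = Tuple[int, int]   # (start, end) of a mapped range, end exclusive


def map_block_at_zero(blocks: List[Block], lowmem_top: int) -> List[Range]:
    """Assumed 'before' behaviour: map only the block that starts at 0."""
    for base, size in blocks:
        if base == 0:
            top = min(base + size, lowmem_top)
            return [(base, top)] if base < top else []
    return []


def map_all_blocks(blocks: List[Block], lowmem_top: int) -> List[Range]:
    """Assumed 'after' behaviour: map every block, capped at lowmem_top."""
    mapped = []
    for base, size in blocks:
        top = min(base + size, lowmem_top)
        if base >= top:
            continue  # block lies entirely above the lowmem limit
        mapped.append((base, top))
    return mapped


if __name__ == "__main__":
    # Two made-up memory blocks with a hole between them.
    blocks = [(0x0000_0000, 0x1000_0000), (0x2000_0000, 0x1000_0000)]
    for name, fn in (("before", map_block_at_zero), ("after", map_all_blocks)):
        ranges = fn(blocks, lowmem_top=0x3000_0000)
        print(name, [(hex(s), hex(e)) for s, e in ranges])
```

Comparing the memory blocks reported in your boot log with what such a loop would map may show whether the crash kernel ends up with an extra or unexpected mapping.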
You mention simulation: are you using QEMU? If so, can you provide details so that I can
try to reproduce the issue?
Thanks
Christophe