lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 27 Apr 2021 17:42:32 -0500
From:   Eddie James <eajames@...ux.ibm.com>
To:     Christophe Leroy <christophe.leroy@...roup.eu>,
        linuxppc-dev@...ts.ozlabs.org
Cc:     linux-kernel@...r.kernel.org, benh@...nel.crashing.org,
        paulus@...ba.org, mpe@...erman.id.au, npiggin@...il.com,
        miltonm@...ibm.com
Subject: Re: PPC476 hangs during tlb flush after calling /init in crash
 kernel with linux 5.4+

On Tue, 2021-04-27 at 19:26 +0200, Christophe Leroy wrote:
> Hi Eddies,
> 
> Le 27/04/2021 à 19:03, Eddie James a écrit :
> > Hi all,
> > 
> > I'm having a problem in simulation and hardware where my PPC476
> > processor stops executing instructions after callling /init. In my
> > case
> > this is a bash script. The code descends to flush the TLB, and
> > somewhere in the loop in _tlbil_pid, the PC goes to
> > InstructionTLBError47x but does not go any further. This only
> > occurs in
> > the crash kernel environment, which is using the same kernel,
> > initramfs, and init script as the main kernel, which executed fine.
> > I
> > do not see this problem with linux 4.19 or 3.10. I do see it with
> > 5.4
> > and 5.10. I see a fair amount of refactoring in the PPC memory
> > management area between 4.19 and 5.4. Can anyone point me in a
> > direction to debug this further? My stack trace is below as I can
> > run
> > gdb in simulation.
> 
> Can you bisect to pin point the culprit commit ?

Hi, thanks for your prompt reply.

Good idea! I have bisected to:

commit 9e849f231c3c72d4c3c1b07c9cd19ae789da0420 (b8-bad,
refs/bisect/bad)
Author: Christophe Leroy <christophe.leroy@....fr>
Date:   Thu Feb 21 19:08:40 2019 +0000

    powerpc/mm/32s: use generic mmu_mapin_ram() for all blocks.
    
    Now that mmu_mapin_ram() is able to handle other blocks
    than the one starting at 0, the WII can use it for all
    its blocks.
    
    Signed-off-by: Christophe Leroy <christophe.leroy@....fr>
    Signed-off-by: Michael Ellerman <mpe@...erman.id.au>

I also confirmed that reverting this commit resolves the issue in 5.4+.

Now, I don't understand why this is problematic or what is really
happening... Reverting is probably not the desired solution.

Thanks
Eddie


> 
> Assuming the problem is in arch/powerpc/ , you should get the result
> in approx 10 steps:
> 
> [root@...5610vm linux-powerpc]# git bisect start -- arch/powerpc/
> [root@...5610vm linux-powerpc]# git bisect bad v5.4
> [root@...5610vm linux-powerpc]# git bisect good v4.19
> Bisecting: 964 revisions left to test after this (roughly 10 steps)
> 
> 
> Christophe

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ