Date:   Wed, 7 Aug 2019 01:24:57 +0000
From:   Chris Packham <Chris.Packham@...iedtelesis.co.nz>
To:     "linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>,
        "christophe.leroy@....fr" <christophe.leroy@....fr>,
        "mpe@...erman.id.au" <mpe@...erman.id.au>,
        "npiggin@...il.com" <npiggin@...il.com>
CC:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "Grant McEwan" <grant.mcewan@...iedtelesis.co.nz>
Subject: Re: SMP lockup at boot on Freescale/NXP T2080 (powerpc 64)

On Wed, 2019-08-07 at 11:13 +1000, Michael Ellerman wrote:
> Chris Packham <Chris.Packham@...iedtelesis.co.nz> writes:
> > 
> > On Tue, 2019-08-06 at 21:32 +1000, Michael Ellerman wrote:
> > > 
> > > Chris Packham <Chris.Packham@...iedtelesis.co.nz> writes:
> > > > 
> > > > On Mon, 2019-08-05 at 14:06 +1200, Chris Packham wrote:
> > > > > 
> > > > > 
> > > > > Hi All,
> > > > > 
> > > > > I have a custom board that uses the Freescale/NXP T2080 SoC.
> > > > > 
> > > > > > The board boots fine using v4.19.60 but when I use v5.1.21 it
> > > > > > locks up waiting for the other CPUs to come online (earlyprintk
> > > > > > output below). If I set maxcpus=0 then the system boots all the
> > > > > > way through to userland. The same thing happens with 5.3-rc2.
> > > > > 
> > > > > > The defconfig I'm using is
> > > > > > https://gist.github.com/cpackham/f24d0b426f3de0eaaba17b82c3528a9d
> > > > > > It was updated from the working v4.19.60 defconfig using make
> > > > > > olddefconfig.
> > > > > 
> > > > > Does this ring any bells for anyone?
> > > > > 
> > > > > > I haven't dug into the differences between the working and
> > > > > > non-working versions yet. I'll start looking now.
> > > > I've bisected this to the following commit
> > > Thanks that's super helpful.
> > > 
> > > > 
> > > > 
> > > > commit ed1cd6deb013a11959d17a94e35ce159197632da
> > > > Author: Christophe Leroy <christophe.leroy@....fr>
> > > > Date:   Thu Jan 31 10:08:58 2019 +0000
> > > > 
> > > >     powerpc: Activate CONFIG_THREAD_INFO_IN_TASK
> > > >     
> > > >     This patch activates CONFIG_THREAD_INFO_IN_TASK which
> > > >     moves the thread_info into task_struct.
> > > > 
> > > > > I'll be the first to admit this is well beyond my area of
> > > > > knowledge so I'm unsure what about this patch is problematic but
> > > > > I can be fairly sure that a build immediately before this patch
> > > > > works while a build with this patch hangs.
> > > > It makes a pretty fundamental change to the way the kernel stores
> > > > some information about each task, moving it off the stack and into
> > > > the task struct.
> > > 
> > > > It definitely has the potential to break things, but I thought we
> > > > had reasonable test coverage of the Book3E platforms, I have a
> > > > p5020ds (e5500) that I boot as part of my CI.
> > > 
> > > > Aha. If I take your config and try to boot it on my p5020ds I get
> > > > the same behaviour, stuck at SMP bringup. So it seems it's
> > > > something in your config vs corenet64_smp_defconfig that is
> > > > triggering the bug.
> > > 
> > > Can you try bisecting what in the config triggers it?
> > > 
> > > > To do that you checkout ed1cd6deb013a11959d17a94e35ce159197632da,
> > > > then you build/boot with corenet64_smp_defconfig to confirm it
> > > > works. Then you use tools/testing/ktest/config-bisect.pl to bisect
> > > > the changes in the .config.
> > > 
> > The difference between a working and non-working defconfig is
> > CONFIG_PREEMPT; specifically, CONFIG_PREEMPT=y makes my system hang
> > at boot.
> > 
> > Is that now intentionally prohibited on 64-bit powerpc?
> It's not prohibited, but it probably should be because no one really
> tests it properly. I have a handful of IBM machines where I boot a
> PREEMPT kernel but that's about it.
> 
> The corenet configs don't have PREEMPT enabled, which suggests it was
> never really supported on those machines.
> 
> But maybe someone from NXP can tell me otherwise.
> 
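
For anyone following along, the config comparison discussed above can be
roughly sketched as a diff of two .config files. This is only an
illustration of what config bisection starts from, not the
tools/testing/ktest/config-bisect.pl script itself. The helper names
(parse_config, diff_configs) and the sample fragments are hypothetical,
and treating a missing option as "n" is a simplifying assumption that is
only approximately right for string and integer options.

```python
import re

# Matches "CONFIG_FOO=value" and "# CONFIG_FOO is not set" lines.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Return {option: value} for .config text; unset options map to 'n'."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            opts[m.group(1)] = "n"
    return opts


def diff_configs(good, bad):
    """Return {option: (good_value, bad_value)} for options that differ.

    Assumption: an option absent from one file is treated as 'n'.
    """
    g, b = parse_config(good), parse_config(bad)
    return {
        k: (g.get(k, "n"), b.get(k, "n"))
        for k in sorted(set(g) | set(b))
        if g.get(k, "n") != b.get(k, "n")
    }


if __name__ == "__main__":
    # Hypothetical fragments, not the real configs from this thread.
    good = "CONFIG_SMP=y\n# CONFIG_PREEMPT is not set\n"
    bad = "CONFIG_SMP=y\nCONFIG_PREEMPT=y\n"
    for opt, (gv, bv) in diff_configs(good, bad).items():
        print(f"{opt}: good={gv} bad={bv}")
```

Each differing option is a candidate; bisection then flips half of them
at a time and re-tests until a single option remains.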

I think our workloads need CONFIG_PREEMPT=y because our systems have
switch ASIC drivers implemented in userland and we need to be able to
react quickly to network events in order to prevent loops. We have seen
instances of this not happening simply because some other process is in
the middle of a syscall.

One thing I am working on here is a setup with a few vendor boards and
some of our own kit that we can test the upstream kernels on. Hopefully
that'd make these kinds of reports more timely rather than just
whenever we decide to move to a new kernel version.

