Date:   Wed, 6 Mar 2019 10:14:47 +0000
From:   Guillaume Tucker <>
To:     Dan Williams <>
Cc:     Andrew Morton <>,
        Michal Hocko <>,
        Mark Brown <>,
        Tomeu Vizoso <>,
        Matt Hart <>,
        Stephen Rothwell <>,,, Nicholas Piggin <>,
        Dominik Brodowski <>,
        Masahiro Yamada <>,
        Kees Cook <>,
        Adrian Reber <>,
        Linux Kernel Mailing List <>,
        Johannes Weiner <>,
        Linux MM <>,
        Mathieu Desnoyers <>,
        Richard Guy Briggs <>,
        "Peter Zijlstra (Intel)" <>,
Subject: Re: next/master boot bisection: next-20190215 on beaglebone-black

On 01/03/2019 23:23, Dan Williams wrote:
> On Fri, Mar 1, 2019 at 1:05 PM Guillaume Tucker
> <> wrote:
>> On 01/03/2019 20:41, Andrew Morton wrote:
>>> On Fri, 1 Mar 2019 09:25:24 +0100 Guillaume Tucker <> wrote:
>>>>>>> Michal had asked if the free space accounting fix-up addressed this
>>>>>>> boot regression. I was awaiting word on that.
>>>>>> hm, does actually read emails?  Let's try info@ as well..
>>>> is not a person, it's a send-only account for
>>>> automated reports.  So no, it doesn't read emails.
>>>> I guess the tricky point here is that the authors of the commits
>>>> found by bisections may not always have the hardware needed to
>>>> reproduce the problem.  So it needs to be dealt with on a
>>>> case-by-case basis: sometimes they do have the hardware,
>>>> sometimes someone else on the list or on CC does, and sometimes
>>>> it's better for the people who have access to the test lab which
>>>> ran the KernelCI test to deal with it.
>>>> This case seems to fall into the last category.  As I have access
>>>> to the Collabora lab, I can do some quick checks to confirm
>>>> whether the proposed patch does fix the issue.  I hadn't realised
>>>> that someone was waiting for this to happen, especially as the
>>>> BeagleBone Black is a very common platform.  Sorry about that,
>>>> I'll take a look today.
>>>> It may be a nice feature to be able to give access to the
>>>> KernelCI test infrastructure to anyone who wants to debug an
>>>> issue reported by KernelCI or verify a fix, so they won't need to
>>>> have the hardware locally.  Something to think about for the
>>>> future.
>>> Thanks, that all sounds good.
>>>>>> Is it possible to determine whether this regression is still present in
>>>>>> current linux-next?
>>>> I'll try to re-apply the patch that caused the issue, then see if
>>>> the suggested change fixes it.  As far as the current linux-next
>>>> master branch is concerned, KernelCI boot tests are passing fine
>>>> on that platform.
>>> They would, because I dropped
>>> mm-shuffle-default-enable-all-shuffling.patch, so your tests presumably
>>> now have shuffling disabled.
>>> Is it possible to add the below to linux-next and try again?
>> I've actually already done that, and essentially the issue can
>> still be reproduced by applying that patch.  See this branch:
>> next-20190301 boots fine but the head fails, using
>> multi_v7_defconfig + SMP=n in both cases and
>> SHUFFLE_PAGE_ALLOCATOR=y enabled in the 2nd case as a result
>> of the change in the default value.
>> The change suggested by Michal Hocko on Feb 15th has now been
>> applied in linux-next, it's part of this commit but as
>> explained above it does not actually resolve the boot failure:
>>   98cf198ee8ce mm: move buddy list manipulations into helpers
>> I can send more details on Monday and do a bit of debugging to
>> help narrow down the problem.  Please let me know if
>> there's anything in particular that would seem to be worth
>> trying.
> Thanks for taking a look!
> Some questions when you get a chance:
> Is there an early-printk facility that can be turned on to see how far
> we get in the boot?

Yes, I've done that now by enabling CONFIG_DEBUG_AM33XXUART1 and
earlyprintk on the command line.  Here's the result, with the
commit cherry-picked on top of next-20190304:

[    1.379522] ti-sysc sysc_flags 00000222 != 00000022
[    1.396718] Unable to handle kernel paging request at virtual address 77bb4003
[    1.404203] pgd = (ptrval)
[    1.406971] [77bb4003] *pgd=00000000
[    1.410650] Internal error: Oops: 5 [#1] ARM
[    1.672310] [<c07051a0>] (clk_hw_create_clk.part.21) from [<c06fea34>] (devm_clk_get+0x4c/0x80)
[    1.681232] [<c06fea34>] (devm_clk_get) from [<c064253c>] (sysc_probe+0x28c/0xde4)

It's always failing at that point in the code.  Also, when
enabling "debug" on the kernel command line, the issue goes
away (exact same binaries, etc.):

For the record, here's the branch I've been using:

The board otherwise boots fine with next-20190304 (SMP=n), and
also with the patch applied but the shuffle configs set to n.
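For anyone trying to reproduce this, the build variants above can be described as Kconfig fragments layered on top of multi_v7_defconfig.  This is only a minimal sketch: the helper and the variant names are hypothetical, and it assumes that CONFIG_DEBUG_LL and CONFIG_EARLY_PRINTK are needed alongside CONFIG_DEBUG_AM33XXUART1 for early output on ARM.

```python
# Hypothetical helper: print Kconfig fragments for the BeagleBone Black
# variants discussed in this thread, meant to be layered on multi_v7_defconfig.

# Options shared by every variant: uniprocessor build (SMP=n) plus early
# console output on AM33XX UART1.  DEBUG_LL and EARLY_PRINTK are assumed
# prerequisites of DEBUG_AM33XXUART1 / earlyprintk on ARM.
COMMON = {
    "CONFIG_SMP": "n",
    "CONFIG_DEBUG_LL": "y",
    "CONFIG_DEBUG_AM33XXUART1": "y",
    "CONFIG_EARLY_PRINTK": "y",
}

VARIANTS = {
    # Boots fine: page allocator shuffling disabled.
    "shuffle-off": {"CONFIG_SHUFFLE_PAGE_ALLOCATOR": "n"},
    # Fails to boot: shuffling enabled, as with the changed default.
    "shuffle-on": {"CONFIG_SHUFFLE_PAGE_ALLOCATOR": "y"},
}


def render_fragment(options):
    """Render options in .config syntax; 'n' becomes '# CONFIG_X is not set'."""
    lines = []
    for name, value in sorted(options.items()):
        if value == "n":
            lines.append(f"# {name} is not set")
        else:
            lines.append(f"{name}={value}")
    return "\n".join(lines) + "\n"


def build_fragments():
    """Return {variant_name: fragment_text}, merging COMMON with each variant."""
    return {name: render_fragment({**COMMON, **extra})
            for name, extra in VARIANTS.items()}


if __name__ == "__main__":
    for name, text in build_fragments().items():
        print(f"### {name}")
        print(text)
```

The fragments could then be merged onto multi_v7_defconfig, for example with the kernel's scripts/kconfig/merge_config.sh; "earlyprintk" still has to be added to the kernel command line separately.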

> Do any of the QEMU machine types [1] approximate this board? I.e. so I
> might be able to independently debug.

Unfortunately there doesn't appear to be any QEMU machine
emulating the TI AM335x SoC or the BeagleBone Black board.

> Were there any boot *successes* on ARM with shuffling enabled? I.e.
> clues about what's different about the specific memory setup for
> beagle-bone-black.

Looking at the KernelCI results from next-20190215, it looks like
only the BeagleBone Black with SMP=n failed to boot:

Of course that's not all the ARM boards that exist out there, but
it's a fairly large coverage already.

As the kernel panic always seems to originate in ti-sysc.c,
there's a chance it's only visible on that platform...  I'm doing
a KernelCI run now with my test branch to double check that,
it'll take a few hours so I'll send an update later if I get
anything useful out of it.

In the meantime, I'm happy to try out other things with more
debug configs turned on or any potential fixes someone might
have.

> Thanks for the help!
> [1]:
