linux-kernel - Re: next/master boot bisection: next-20190215 on beaglebone-black

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAPcyv4hMNiiM11ULjbOnOf=9N=yCABCRsAYLpjXs+98bRoRpCA@mail.gmail.com>
Date:   Fri, 1 Mar 2019 15:23:58 -0800
From:   Dan Williams <dan.j.williams@...el.com>
To:     Guillaume Tucker <guillaume.tucker@...labora.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Michal Hocko <mhocko@...e.com>,
        Mark Brown <broonie@...nel.org>,
        Tomeu Vizoso <tomeu.vizoso@...labora.com>,
        Matt Hart <matthew.hart@...aro.org>,
        Stephen Rothwell <sfr@...b.auug.org.au>, khilman@...libre.com,
        enric.balletbo@...labora.com, Nicholas Piggin <npiggin@...il.com>,
        Dominik Brodowski <linux@...inikbrodowski.net>,
        Masahiro Yamada <yamada.masahiro@...ionext.com>,
        Kees Cook <keescook@...omium.org>,
        Adrian Reber <adrian@...as.de>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Linux MM <linux-mm@...ck.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Richard Guy Briggs <rgb@...hat.com>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>, info@...nelci.org
Subject: Re: next/master boot bisection: next-20190215 on beaglebone-black

On Fri, Mar 1, 2019 at 1:05 PM Guillaume Tucker
<guillaume.tucker@...labora.com> wrote:
>
> On 01/03/2019 20:41, Andrew Morton wrote:
> > On Fri, 1 Mar 2019 09:25:24 +0100 Guillaume Tucker <guillaume.tucker@...labora.com> wrote:
> >
> >>>>> Michal had asked if the free space accounting fix up addressed this
> >>>>> boot regression? I was awaiting word on that.
> >>>>
> >>>> hm, does bot@...nelci.org actually read emails?  Let's try info@ as well..
> >>
> >> bot@...nelci.org is not person, it's a send-only account for
> >> automated reports.  So no, it doesn't read emails.
> >>
> >> I guess the tricky point here is that the authors of the commits
> >> found by bisections may not always have the hardware needed to
> >> reproduce the problem.  So it needs to be dealt with on a
> >> case-by-case basis: sometimes they do have the hardware,
> >> sometimes someone else on the list or on CC does, and sometimes
> >> it's better for the people who have access to the test lab which
> >> ran the KernelCI test to deal with it.
> >>
> >> This case seems to fall into the last category.  As I have access
> >> to the Collabora lab, I can do some quick checks to confirm
> >> whether the proposed patch does fix the issue.  I hadn't realised
> >> that someone was waiting for this to happen, especially as the
> >> BeagleBone Black is a very common platform.  Sorry about that,
> >> I'll take a look today.
> >>
> >> It may be a nice feature to be able to give access to the
> >> KernelCI test infrastructure to anyone who wants to debug an
> >> issue reported by KernelCI or verify a fix, so they won't need to
> >> have the hardware locally.  Something to think about for the
> >> future.
> >
> > Thanks, that all sounds good.
> >
> >>>> Is it possible to determine whether this regression is still present in
> >>>> current linux-next?
> >>
> >> I'll try to re-apply the patch that caused the issue, then see if
> >> the suggested change fixes it.  As far as the current linux-next
> >> master branch is concerned, KernelCI boot tests are passing fine
> >> on that platform.
> >
> > They would, because I dropped
> > mm-shuffle-default-enable-all-shuffling.patch, so your tests presumably
> > now have shuffling disabled.
> >
> > Is it possible to add the below to linux-next and try again?
>
> I've actually already done that, and essentially the issue can
> still be reproduced by applying that patch.  See this branch:
>
>   https://gitlab.collabora.com/gtucker/linux/commits/next-20190301-beaglebone-black-debug
>
> next-20190301 boots fine but the head fails, using
> multi_v7_defconfig + SMP=n in both cases and
> SHUFFLE_PAGE_ALLOCATOR=y enabled in the 2nd case as a result
> of the change in the default value.
>
> The change suggested by Michal Hocko on Feb 15th has now been
> applied in linux-next, it's part of this commit but as
> explained above it does not actually resolve the boot failure:
>
>   98cf198ee8ce mm: move buddy list manipulations into helpers
>
> I can send more details on Monday and do a bit of debugging to
> help narrowing down the problem.  Please let me know if
> there's anything in particular that would seem be worth
> trying.
>

Thanks for taking a look!

Some questions when you get a chance:

Is there an early-printk facility that can be turned on to see how far
we get in the boot?

Do any of the QEMU machine types [1] approximate this board? I.e. so I
might be able to independently debug.

Were there any boot *successes* on ARM with shuffling enabled? I.e.
clues about what's different about the specific memory setup for
beagle-bone-black.

Thanks for the help!

[1]: https://wiki.qemu.org/Documentation/Platforms/ARM