Message-ID: <CAPcyv4hMNiiM11ULjbOnOf=9N=yCABCRsAYLpjXs+98bRoRpCA@mail.gmail.com>
Date: Fri, 1 Mar 2019 15:23:58 -0800
From: Dan Williams <dan.j.williams@...el.com>
To: Guillaume Tucker <guillaume.tucker@...labora.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Michal Hocko <mhocko@...e.com>,
Mark Brown <broonie@...nel.org>,
Tomeu Vizoso <tomeu.vizoso@...labora.com>,
Matt Hart <matthew.hart@...aro.org>,
Stephen Rothwell <sfr@...b.auug.org.au>, khilman@...libre.com,
enric.balletbo@...labora.com, Nicholas Piggin <npiggin@...il.com>,
Dominik Brodowski <linux@...inikbrodowski.net>,
Masahiro Yamada <yamada.masahiro@...ionext.com>,
Kees Cook <keescook@...omium.org>,
Adrian Reber <adrian@...as.de>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Johannes Weiner <hannes@...xchg.org>,
Linux MM <linux-mm@...ck.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Richard Guy Briggs <rgb@...hat.com>,
"Peter Zijlstra (Intel)" <peterz@...radead.org>, info@...nelci.org
Subject: Re: next/master boot bisection: next-20190215 on beaglebone-black
On Fri, Mar 1, 2019 at 1:05 PM Guillaume Tucker
<guillaume.tucker@...labora.com> wrote:
>
> On 01/03/2019 20:41, Andrew Morton wrote:
> > On Fri, 1 Mar 2019 09:25:24 +0100 Guillaume Tucker <guillaume.tucker@...labora.com> wrote:
> >
> >>>>> Michal had asked if the free space accounting fix up addressed this
> >>>>> boot regression? I was awaiting word on that.
> >>>>
> >>>> hm, does bot@...nelci.org actually read emails? Let's try info@ as well..
> >>
> >> bot@...nelci.org is not a person, it's a send-only account for
> >> automated reports. So no, it doesn't read emails.
> >>
> >> I guess the tricky point here is that the authors of the commits
> >> found by bisections may not always have the hardware needed to
> >> reproduce the problem. So it needs to be dealt with on a
> >> case-by-case basis: sometimes they do have the hardware,
> >> sometimes someone else on the list or on CC does, and sometimes
> >> it's better for the people who have access to the test lab which
> >> ran the KernelCI test to deal with it.
> >>
> >> This case seems to fall into the last category. As I have access
> >> to the Collabora lab, I can do some quick checks to confirm
> >> whether the proposed patch does fix the issue. I hadn't realised
> >> that someone was waiting for this to happen, especially as the
> >> BeagleBone Black is a very common platform. Sorry about that,
> >> I'll take a look today.
> >>
> >> It may be a nice feature to be able to give access to the
> >> KernelCI test infrastructure to anyone who wants to debug an
> >> issue reported by KernelCI or verify a fix, so they won't need to
> >> have the hardware locally. Something to think about for the
> >> future.
> >
> > Thanks, that all sounds good.
> >
> >>>> Is it possible to determine whether this regression is still present in
> >>>> current linux-next?
> >>
> >> I'll try to re-apply the patch that caused the issue, then see if
> >> the suggested change fixes it. As far as the current linux-next
> >> master branch is concerned, KernelCI boot tests are passing fine
> >> on that platform.
> >
> > They would, because I dropped
> > mm-shuffle-default-enable-all-shuffling.patch, so your tests presumably
> > now have shuffling disabled.
> >
> > Is it possible to add the below to linux-next and try again?
>
> I've actually already done that, and essentially the issue can
> still be reproduced by applying that patch. See this branch:
>
> https://gitlab.collabora.com/gtucker/linux/commits/next-20190301-beaglebone-black-debug
>
> next-20190301 boots fine but the head fails, using
> multi_v7_defconfig + SMP=n in both cases and
> SHUFFLE_PAGE_ALLOCATOR=y enabled in the 2nd case as a result
> of the change in the default value.
>
> The change suggested by Michal Hocko on Feb 15th has now been
> applied in linux-next. It's part of this commit but, as
> explained above, it does not actually resolve the boot failure:
>
> 98cf198ee8ce mm: move buddy list manipulations into helpers
>
> I can send more details on Monday and do a bit of debugging to
> help narrow down the problem. Please let me know if
> there's anything in particular that would seem to be worth
> trying.
>
Thanks for taking a look!

Some questions when you get a chance:

Is there an early-printk facility that can be turned on to see how far
we get in the boot?

Do any of the QEMU machine types [1] approximate this board? I.e. so I
might be able to independently debug.

Were there any boot *successes* on ARM with shuffling enabled? I.e.
clues about what's different about the specific memory setup for
beagle-bone-black.

Thanks for the help!

[1]: https://wiki.qemu.org/Documentation/Platforms/ARM