[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20190306140529.GG3549@rapoport-lnx>
Date:   Wed, 6 Mar 2019 16:05:40 +0200
From:   Mike Rapoport <rppt@...ux.ibm.com>
To:     Guillaume Tucker <guillaume.tucker@...labora.com>
Cc:     Dan Williams <dan.j.williams@...el.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Michal Hocko <mhocko@...e.com>,
        Mark Brown <broonie@...nel.org>,
        Tomeu Vizoso <tomeu.vizoso@...labora.com>,
        Matt Hart <matthew.hart@...aro.org>,
        Stephen Rothwell <sfr@...b.auug.org.au>, khilman@...libre.com,
        enric.balletbo@...labora.com, Nicholas Piggin <npiggin@...il.com>,
        Dominik Brodowski <linux@...inikbrodowski.net>,
        Masahiro Yamada <yamada.masahiro@...ionext.com>,
        Kees Cook <keescook@...omium.org>,
        Adrian Reber <adrian@...as.de>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Linux MM <linux-mm@...ck.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Richard Guy Briggs <rgb@...hat.com>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>, info@...nelci.org
Subject: Re: next/master boot bisection: next-20190215 on beaglebone-black
On Wed, Mar 06, 2019 at 10:14:47AM +0000, Guillaume Tucker wrote:
> On 01/03/2019 23:23, Dan Williams wrote:
> > On Fri, Mar 1, 2019 at 1:05 PM Guillaume Tucker
> > <guillaume.tucker@...labora.com> wrote:
> > 
> > Is there an early-printk facility that can be turned on to see how far
> > we get in the boot?
> 
> Yes, I've done that now by enabling CONFIG_DEBUG_AM33XXUART1 and
> earlyprintk in the command line.  Here's the result, with the
> commit cherry picked on top of next-20190304:
> 
>   https://lava.collabora.co.uk/scheduler/job/1526326
> 
> [    1.379522] ti-sysc 4804a000.target-module: sysc_flags 00000222 != 00000022
> [    1.396718] Unable to handle kernel paging request at virtual address 77bb4003
> [    1.404203] pgd = (ptrval)
> [    1.406971] [77bb4003] *pgd=00000000
> [    1.410650] Internal error: Oops: 5 [#1] ARM
> [...]
> [    1.672310] [<c07051a0>] (clk_hw_create_clk.part.21) from [<c06fea34>] (devm_clk_get+0x4c/0x80)
> [    1.681232] [<c06fea34>] (devm_clk_get) from [<c064253c>] (sysc_probe+0x28c/0xde4)
> 
> It's always failing at that point in the code.  Also when
> enabling "debug" on the kernel command line, the issue goes
> away (exact same binaries etc..):
> 
>   https://lava.collabora.co.uk/scheduler/job/1526327
> 
> For the record, here's the branch I've been using:
> 
>   https://gitlab.collabora.com/gtucker/linux/tree/beaglebone-black-next-20190304-debug
> 
> The board otherwise boots fine with next-20190304 (SMP=n), and
> also with the patch applied but the shuffle configs set to n.
> 
> > Were there any boot *successes* on ARM with shuffling enabled? I.e.
> > clues about what's different about the specific memory setup for
> > beagle-bone-black.
> 
> Looking at the KernelCI results from next-20190215, it looks like
> only the BeagleBone Black with SMP=n failed to boot:
> 
>   https://kernelci.org/boot/all/job/next/branch/master/kernel/next-20190215/
> 
> Of course that's not all the ARM boards that exist out there, but
> it's a fairly large coverage already.
> 
> As the kernel panic always seems to originate in ti-sysc.c,
> there's a chance it's only visible on that platform...  I'm doing
> a KernelCI run now with my test branch to double check that,
> it'll take a few hours so I'll send an update later if I get
> anything useful out of it.
> 
> In the meantime, I'm happy to try out other things with more
> debug configs turned on or any potential fixes someone might
> have.
ARM is the only arch that sets ARCH_HAS_HOLES_MEMORYMODEL to 'y'. Maybe the
failure has something to do with it...
Guillaume, can you try this patch:
diff --git a/mm/shuffle.c b/mm/shuffle.c
index 3ce1248..4a04aac 100644
--- a/mm/shuffle.c
+++ b/mm/shuffle.c
@@ -58,7 +58,8 @@ module_param_call(shuffle, shuffle_store, shuffle_show, &shuffle_param, 0400);
  * For two pages to be swapped in the shuffle, they must be free (on a
  * 'free_area' lru), have the same order, and have the same migratetype.
  */
-static struct page * __meminit shuffle_valid_page(unsigned long pfn, int order)
+static struct page * __meminit shuffle_valid_page(unsigned long pfn, int order,
+						  struct zone *z)
 {
 	struct page *page;
 
@@ -80,6 +81,9 @@ static struct page * __meminit shuffle_valid_page(unsigned long pfn, int order)
 	if (!PageBuddy(page))
 		return NULL;
 
+	if (!memmap_valid_within(pfn, page, z))
+		return NULL;
+
 	/*
 	 * ...is the page on the same list as the page we will
 	 * shuffle it with?
@@ -123,7 +127,7 @@ void __meminit __shuffle_zone(struct zone *z)
 		 * page_j randomly selected in the span @zone_start_pfn to
 		 * @spanned_pages.
 		 */
-		page_i = shuffle_valid_page(i, order);
+		page_i = shuffle_valid_page(i, order, z);
 		if (!page_i)
 			continue;
 
@@ -137,7 +141,7 @@ void __meminit __shuffle_zone(struct zone *z)
 			j = z->zone_start_pfn +
 				ALIGN_DOWN(get_random_long() % z->spanned_pages,
 						order_pages);
-			page_j = shuffle_valid_page(j, order);
+			page_j = shuffle_valid_page(j, order, z);
 			if (page_j && page_j != page_i)
 				break;
 		}
 
-- 
Sincerely yours,
Mike.
Powered by blists - more mailing lists
 
