Message-ID: <20230915141610.GA104956@cmpxchg.org>
Date:   Fri, 15 Sep 2023 10:16:10 -0400
From:   Johannes Weiner <hannes@...xchg.org>
To:     Mike Kravetz <mike.kravetz@...cle.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Kefeng Wang <wangkefeng.wang@...wei.com>,
        Zi Yan <ziy@...dia.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH V2 0/6] mm: page_alloc: freelist migratetype hygiene

On Thu, Sep 14, 2023 at 04:52:38PM -0700, Mike Kravetz wrote:
> In next-20230913, I started hitting the following BUG.  Seems related
> to this series.  And, if series is reverted I do not see the BUG.
> 
> I can easily reproduce on a small 16G VM.  kernel command line contains
> "hugetlb_free_vmemmap=on hugetlb_cma=4G".  Then run the script,
> while true; do
>  echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>  echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/demote
>  echo 0 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> done
> 
> For the BUG below I believe it was the first (or second) 1G page creation from
> CMA that triggered:  cma_alloc of 1G.
> 
> Sorry, have not looked deeper into the issue.

Thanks for the report, and sorry about the breakage!

I was scratching my head at this:

                        /* MIGRATE_ISOLATE page should not go to pcplists */
                        VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);

because there is nothing in page isolation that prevents setting
MIGRATE_ISOLATE on something that's on the pcplist already. So why
didn't this trigger before?

Then it clicked: it used to only check the *pcpmigratetype* determined
by free_unref_page(), which of course mustn't be MIGRATE_ISOLATE.

Pages that get isolated while *already* on the pcplist are fine, and
are handled properly:

                        mt = get_pcppage_migratetype(page);

                        /* MIGRATE_ISOLATE page should not go to pcplists */
                        VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);

                        /* Pageblock could have been isolated meanwhile */
                        if (unlikely(isolated_pageblocks))
                                mt = get_pageblock_migratetype(page);

So this was purely a sanity check against the pcpmigratetype cache
operations. With that gone, we can remove it.
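
To illustrate the difference between the two checks, here is a minimal
userspace sketch. It is not kernel code: every toy_* name is a
hypothetical stand-in, and the model only assumes what is described
above, namely that the free path never queues a page whose pageblock is
already isolated, and that isolation can happen while the page sits on
the pcplist.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for kernel concepts; NOT the real kernel definitions. */
enum toy_migratetype { TOY_MOVABLE, TOY_ISOLATE };

struct toy_pageblock {
	enum toy_migratetype mt;	/* current pageblock migratetype */
};

struct toy_page {
	struct toy_pageblock *block;
	enum toy_migratetype cached_mt;	/* models the old pcppage cache */
	bool on_pcplist;
};

/* Models the free path: isolated pages are never queued on the pcplist. */
static bool toy_free_to_pcplist(struct toy_page *page)
{
	enum toy_migratetype mt = page->block->mt;

	if (mt == TOY_ISOLATE)
		return false;

	page->cached_mt = mt;	/* type cached at free time */
	page->on_pcplist = true;
	return true;
}

/* Old assert: checks the type cached at free time. */
static bool toy_old_check_passes(const struct toy_page *page)
{
	return page->cached_mt != TOY_ISOLATE;
}

/* Erroneous assert: checks the pageblock's type at drain time. */
static bool toy_new_check_passes(const struct toy_page *page)
{
	return page->block->mt != TOY_ISOLATE;
}

/* Returns true when the old check passes but the erroneous one fires. */
static bool toy_race_demo(void)
{
	struct toy_pageblock block = { .mt = TOY_MOVABLE };
	struct toy_page page = { .block = &block, .cached_mt = TOY_MOVABLE };

	assert(toy_free_to_pcplist(&page));
	block.mt = TOY_ISOLATE;	/* isolation happens while page is queued */

	return toy_old_check_passes(&page) && !toy_new_check_passes(&page);
}
```

In this model toy_race_demo() returns true: the cached-type check still
holds after the pageblock is isolated, while the pageblock-type check
trips on a situation that the drain path handles anyway.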

---

From b0cb92ed10b40fab0921002effa8b726df245790 Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@...xchg.org>
Date: Fri, 15 Sep 2023 09:59:52 -0400
Subject: [PATCH] mm: page_alloc: remove pcppage migratetype caching fix

Mike reports the following crash in -next:

[   28.643019] page:ffffea0004fb4280 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x13ed0a
[   28.645455] flags: 0x200000000000000(node=0|zone=2)
[   28.646835] page_type: 0xffffffff()
[   28.647886] raw: 0200000000000000 dead000000000100 dead000000000122 0000000000000000
[   28.651170] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[   28.653124] page dumped because: VM_BUG_ON_PAGE(is_migrate_isolate(mt))
[   28.654769] ------------[ cut here ]------------
[   28.655972] kernel BUG at mm/page_alloc.c:1231!

This VM_BUG_ON() used to check that the cached pcppage_migratetype set
by free_unref_page() wasn't MIGRATE_ISOLATE.

When I removed the caching, I erroneously changed the assert to check
that no isolated pages are on the pcplist. This is quite different,
because pages can be isolated *after* they have been put on the
pcplist already (which is handled just fine).

IOW, this was purely a sanity check on the migratetype caching. With
that gone, the check should have been removed as well. Do that now.

Reported-by: Mike Kravetz <mike.kravetz@...cle.com>
Signed-off-by: Johannes Weiner <hannes@...xchg.org>
---
 mm/page_alloc.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e3f1c777feed..9469e4660b53 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1207,9 +1207,6 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			count -= nr_pages;
 			pcp->count -= nr_pages;
 
-			/* MIGRATE_ISOLATE page should not go to pcplists */
-			VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
-
 			__free_one_page(page, pfn, zone, order, mt, FPI_NONE);
 			trace_mm_page_pcpu_drain(page, order, mt);
 		} while (count > 0 && !list_empty(list));
-- 
2.42.0
