Message-ID:
<VI2PR04MB11147BF60BD14C843A1371611E8B72@VI2PR04MB11147.eurprd04.prod.outlook.com>
Date: Thu, 10 Apr 2025 07:03:02 +0000
From: Carlos Song <carlos.song@....com>
To: Johannes Weiner <hannes@...xchg.org>, Andrew Morton
<akpm@...ux-foundation.org>
CC: Vlastimil Babka <vbabka@...e.cz>, Brendan Jackman <jackmanb@...gle.com>,
Mel Gorman <mgorman@...hsingularity.net>, "linux-mm@...ck.org"
<linux-mm@...ck.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, kernel test robot <oliver.sang@...el.com>,
"stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: RE: [EXT] [PATCH 1/2] mm: page_alloc: speed up fallbacks in
rmqueue_bulk()
Hi,

That is a nice fix! I tested it on imx7d.

Tested-by: Carlos Song <carlos.song@....com>
> -----Original Message-----
> From: Johannes Weiner <hannes@...xchg.org>
> Sent: Tuesday, April 8, 2025 2:02 AM
> To: Andrew Morton <akpm@...ux-foundation.org>
> Cc: Vlastimil Babka <vbabka@...e.cz>; Brendan Jackman
> <jackmanb@...gle.com>; Mel Gorman <mgorman@...hsingularity.net>;
> Carlos Song <carlos.song@....com>; linux-mm@...ck.org;
> linux-kernel@...r.kernel.org; kernel test robot <oliver.sang@...el.com>;
> stable@...r.kernel.org
> Subject: [EXT] [PATCH 1/2] mm: page_alloc: speed up fallbacks in
> rmqueue_bulk()
>
> The test robot identified c2f6ea38fc1b ("mm: page_alloc: don't steal single
> pages from biggest buddy") as the root cause of a 56.4% regression in
> vm-scalability::lru-file-mmap-read.
>
> Carlos reports an earlier patch, c0cd6f557b90 ("mm: page_alloc: fix freelist
> movement during block conversion"), as the root cause for a regression in
> worst-case zone->lock+irqoff hold times.
>
> Both of these patches modify the page allocator's fallback path to be less greedy
> in an effort to stave off fragmentation. The flip side of this is that fallbacks are
> also less productive each time around, which means the fallback search can run
> much more frequently.
>
> Carlos' traces point to rmqueue_bulk() specifically, which tries to refill the
> percpu cache by allocating a large batch of pages in a loop. It highlights how
> once the native freelists are exhausted, the fallback code first scans orders
> top-down for whole blocks to claim, then falls back to a bottom-up search for the
> smallest buddy to steal.
> For the next batch page, it goes through the same thing again.
>
> This can be made more efficient. Since rmqueue_bulk() holds the
> zone->lock over the entire batch, the freelists are not subject to
> outside changes; when the search for a block to claim has already failed, there is
> no point in trying again for the next page.
>
> Modify __rmqueue() to remember the last successful fallback mode, and restart
> directly from there on the next rmqueue_bulk() iteration.
>
> Oliver confirms that this improves beyond the regression that the test robot
> reported against c2f6ea38fc1b:
>
> commit:
>   f3b92176f4 ("tools/selftests: add guard region test for /proc/$pid/pagemap")
>   c2f6ea38fc ("mm: page_alloc: don't steal single pages from biggest buddy")
>   acc4d5ff0b ("Merge tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
>   2c847f27c3 ("mm: page_alloc: speed up fallbacks in rmqueue_bulk()")   <--- your patch
>
> f3b92176f4f7100f c2f6ea38fc1b640aa7a2e155cc1 acc4d5ff0b61eb1715c498b6536 2c847f27c37da65a93d23c237c5
> ---------------- --------------------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \          |                \
>   25525364 ±  3%     -56.4%   11135467           -57.8%   10779336           +31.6%   33581409        vm-scalability.throughput
>
> Carlos confirms that worst-case times are almost fully recovered compared to
> before the earlier culprit patch:
>
>   2dd482ba627d (before freelist hygiene):    1ms
>   c0cd6f557b90 (after freelist hygiene):    90ms
>   next-20250319 (steal smallest buddy):    280ms
>   this patch                           :     8ms
>
> Reported-by: kernel test robot <oliver.sang@...el.com>
> Reported-by: Carlos Song <carlos.song@....com>
> Tested-by: kernel test robot <oliver.sang@...el.com>
> Fixes: c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion")
> Fixes: c2f6ea38fc1b ("mm: page_alloc: don't steal single pages from biggest buddy")
> Closes: https://lore.kernel.org/oe-lkp/202503271547.fc08b188-lkp@intel.com
> Cc: stable@...r.kernel.org # 6.10+
> Signed-off-by: Johannes Weiner <hannes@...xchg.org>
> ---
> mm/page_alloc.c | 100 +++++++++++++++++++++++++++++++++++-------------
> 1 file changed, 74 insertions(+), 26 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index f51aa6051a99..03b0d45ed45a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2194,11 +2194,11 @@ try_to_claim_block(struct zone *zone, struct page *page,
>   * The use of signed ints for order and current_order is a deliberate
>   * deviation from the rest of this file, to make the for loop
>   * condition simpler.
> - *
> - * Return the stolen page, or NULL if none can be found.
>   */
> +
> +/* Try to claim a whole foreign block, take a page, expand the remainder */
>  static __always_inline struct page *
> -__rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
> +__rmqueue_claim(struct zone *zone, int order, int start_migratetype,
>  						unsigned int alloc_flags)
>  {
>  	struct free_area *area;
> @@ -2236,14 +2236,26 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
>  		page = try_to_claim_block(zone, page, current_order, order,
>  					  start_migratetype, fallback_mt,
>  					  alloc_flags);
> -		if (page)
> -			goto got_one;
> +		if (page) {
> +			trace_mm_page_alloc_extfrag(page, order, current_order,
> +						    start_migratetype, fallback_mt);
> +			return page;
> +		}
>  	}
>
> -	if (alloc_flags & ALLOC_NOFRAGMENT)
> -		return NULL;
> +	return NULL;
> +}
> +
> +/* Try to steal a single page from a foreign block */
> +static __always_inline struct page *
> +__rmqueue_steal(struct zone *zone, int order, int start_migratetype)
> +{
> +	struct free_area *area;
> +	int current_order;
> +	struct page *page;
> +	int fallback_mt;
> +	bool claim_block;
>
> -	/* No luck claiming pageblock. Find the smallest fallback page */
>  	for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) {
>  		area = &(zone->free_area[current_order]);
>  		fallback_mt = find_suitable_fallback(area, current_order,
> @@ -2253,25 +2265,28 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
>
>  		page = get_page_from_free_area(area, fallback_mt);
>  		page_del_and_expand(zone, page, order, current_order, fallback_mt);
> -		goto got_one;
> +		trace_mm_page_alloc_extfrag(page, order, current_order,
> +					    start_migratetype, fallback_mt);
> +		return page;
>  	}
>
>  	return NULL;
> -
> -got_one:
> -	trace_mm_page_alloc_extfrag(page, order, current_order,
> -				    start_migratetype, fallback_mt);
> -
> -	return page;
>  }
>
> +enum rmqueue_mode {
> +	RMQUEUE_NORMAL,
> +	RMQUEUE_CMA,
> +	RMQUEUE_CLAIM,
> +	RMQUEUE_STEAL,
> +};
> +
>  /*
>   * Do the hard work of removing an element from the buddy allocator.
>   * Call me with the zone->lock already held.
>   */
>  static __always_inline struct page *
>  __rmqueue(struct zone *zone, unsigned int order, int migratetype,
> -						unsigned int alloc_flags)
> +	  unsigned int alloc_flags, enum rmqueue_mode *mode)
>  {
>  	struct page *page;
>
> @@ -2290,16 +2305,47 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
>  		}
>  	}
>
> -	page = __rmqueue_smallest(zone, order, migratetype);
> -	if (unlikely(!page)) {
> -		if (alloc_flags & ALLOC_CMA)
> +	/*
> +	 * Try the different freelists, native then foreign.
> +	 *
> +	 * The fallback logic is expensive and rmqueue_bulk() calls in
> +	 * a loop with the zone->lock held, meaning the freelists are
> +	 * not subject to any outside changes. Remember in *mode where
> +	 * we found pay dirt, to save us the search on the next call.
> +	 */
> +	switch (*mode) {
> +	case RMQUEUE_NORMAL:
> +		page = __rmqueue_smallest(zone, order, migratetype);
> +		if (page)
> +			return page;
> +		fallthrough;
> +	case RMQUEUE_CMA:
> +		if (alloc_flags & ALLOC_CMA) {
>  			page = __rmqueue_cma_fallback(zone, order);
> -
> -		if (!page)
> -			page = __rmqueue_fallback(zone, order, migratetype,
> -						  alloc_flags);
> +			if (page) {
> +				*mode = RMQUEUE_CMA;
> +				return page;
> +			}
> +		}
> +		fallthrough;
> +	case RMQUEUE_CLAIM:
> +		page = __rmqueue_claim(zone, order, migratetype, alloc_flags);
> +		if (page) {
> +			/* Replenished native freelist, back to normal mode */
> +			*mode = RMQUEUE_NORMAL;
> +			return page;
> +		}
> +		fallthrough;
> +	case RMQUEUE_STEAL:
> +		if (!(alloc_flags & ALLOC_NOFRAGMENT)) {
> +			page = __rmqueue_steal(zone, order, migratetype);
> +			if (page) {
> +				*mode = RMQUEUE_STEAL;
> +				return page;
> +			}
> +		}
>  	}
> -	return page;
> +	return NULL;
>  }
>
>  /*
> @@ -2311,6 +2357,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>  			unsigned long count, struct list_head *list,
>  			int migratetype, unsigned int alloc_flags)
>  {
> +	enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
>  	unsigned long flags;
>  	int i;
>
> @@ -2321,7 +2368,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>  	}
>  	for (i = 0; i < count; ++i) {
>  		struct page *page = __rmqueue(zone, order, migratetype,
> -					      alloc_flags);
> +					      alloc_flags, &rmqm);
>  		if (unlikely(page == NULL))
>  			break;
>
> @@ -2934,6 +2981,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
>  {
>  	struct page *page;
>  	unsigned long flags;
> +	enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
>
>  	do {
>  		page = NULL;
> @@ -2945,7 +2993,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
>  		if (alloc_flags & ALLOC_HIGHATOMIC)
>  			page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
>  		if (!page) {
> -			page = __rmqueue(zone, order, migratetype, alloc_flags);
> +			page = __rmqueue(zone, order, migratetype, alloc_flags, &rmqm);
>
>  			/*
>  			 * If the allocation fails, allow OOM handling and
> --
> 2.49.0
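
To make the new fallback state machine easier to follow, below is a small
standalone C sketch of the mode-caching idea described in the commit message
above. It is not kernel code and is not part of the patch: the names
(toy_mode, toy_take, toy_alloc, the freelist[] array) are invented for
illustration, and the "claim" case is simplified as noted in the comments.
It only models how remembering the last successful mode lets a batch loop
skip freelists that an earlier iteration already found empty.

/*
 * Toy model of __rmqueue()/rmqueue_bulk() mode caching (illustration only).
 * Four "freelists" stand in for normal, CMA, claimable and stealable pages.
 */
#include <stdio.h>

enum toy_mode {			/* mirrors RMQUEUE_NORMAL .. RMQUEUE_STEAL */
	TOY_NORMAL,
	TOY_CMA,
	TOY_CLAIM,
	TOY_STEAL,
};

/* pages remaining in each freelist, cheapest source first */
static int freelist[4] = { 2, 0, 3, 100 };

static int toy_take(enum toy_mode m)
{
	if (freelist[m] > 0) {
		freelist[m]--;
		return 1;
	}
	return 0;
}

/* one allocation; *mode remembers where the last attempt found pay dirt */
static int toy_alloc(enum toy_mode *mode)
{
	switch (*mode) {
	case TOY_NORMAL:
		if (toy_take(TOY_NORMAL))
			return 1;
		/* fall through */
	case TOY_CMA:
		if (toy_take(TOY_CMA)) {
			*mode = TOY_CMA;
			return 1;
		}
		/* fall through */
	case TOY_CLAIM:
		if (toy_take(TOY_CLAIM)) {
			/*
			 * Simplification: the real patch resets to
			 * RMQUEUE_NORMAL here because claiming a block
			 * refills the native freelist; this toy has no
			 * such refill, so it just remembers the mode.
			 */
			*mode = TOY_CLAIM;
			return 1;
		}
		/* fall through */
	case TOY_STEAL:
		if (toy_take(TOY_STEAL)) {
			*mode = TOY_STEAL;
			return 1;
		}
	}
	return 0;
}

int main(void)
{
	enum toy_mode mode = TOY_NORMAL;
	int i, got = 0;

	/* "rmqueue_bulk": one batch, mode carried across iterations */
	for (i = 0; i < 16; i++)
		got += toy_alloc(&mode);

	printf("allocated %d pages, finished in mode %d\n", got, mode);
	return 0;
}

Built with plain "gcc -Wall", the batch loop never re-scans a list that an
earlier iteration already found empty, which mirrors what carrying the enum
rmqueue_mode across rmqueue_bulk() iterations buys while zone->lock is held.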