Message-ID:
 <VI2PR04MB11147BF60BD14C843A1371611E8B72@VI2PR04MB11147.eurprd04.prod.outlook.com>
Date: Thu, 10 Apr 2025 07:03:02 +0000
From: Carlos Song <carlos.song@....com>
To: Johannes Weiner <hannes@...xchg.org>, Andrew Morton
	<akpm@...ux-foundation.org>
CC: Vlastimil Babka <vbabka@...e.cz>, Brendan Jackman <jackmanb@...gle.com>,
	Mel Gorman <mgorman@...hsingularity.net>, "linux-mm@...ck.org"
	<linux-mm@...ck.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, kernel test robot <oliver.sang@...el.com>,
	"stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: RE: [EXT] [PATCH 1/2] mm: page_alloc: speed up fallbacks in
 rmqueue_bulk()

Hi,

That is a nice fix! I tested it on imx7d.

Tested-by: Carlos Song <carlos.song@....com>


> -----Original Message-----
> From: Johannes Weiner <hannes@...xchg.org>
> Sent: Tuesday, April 8, 2025 2:02 AM
> To: Andrew Morton <akpm@...ux-foundation.org>
> Cc: Vlastimil Babka <vbabka@...e.cz>; Brendan Jackman
> <jackmanb@...gle.com>; Mel Gorman <mgorman@...hsingularity.net>;
> Carlos Song <carlos.song@....com>; linux-mm@...ck.org;
> linux-kernel@...r.kernel.org; kernel test robot <oliver.sang@...el.com>;
> stable@...r.kernel.org
> Subject: [EXT] [PATCH 1/2] mm: page_alloc: speed up fallbacks in
> rmqueue_bulk()
>
> The test robot identified c2f6ea38fc1b ("mm: page_alloc: don't steal single
> pages from biggest buddy") as the root cause of a 56.4% regression in
> vm-scalability::lru-file-mmap-read.
>
> Carlos reports an earlier patch, c0cd6f557b90 ("mm: page_alloc: fix freelist
> movement during block conversion"), as the root cause for a regression in
> worst-case zone->lock+irqoff hold times.
>
> Both of these patches modify the page allocator's fallback path to be less greedy
> in an effort to stave off fragmentation. The flip side of this is that fallbacks are
> also less productive each time around, which means the fallback search can run
> much more frequently.
>
> Carlos' traces point to rmqueue_bulk() specifically, which tries to refill the
> percpu cache by allocating a large batch of pages in a loop. It highlights how
> once the native freelists are exhausted, the fallback code first scans orders
> top-down for whole blocks to claim, then falls back to a bottom-up search for the
> smallest buddy to steal.
> For the next batch page, it goes through the same thing again.
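>
> For illustration, this is roughly the shape of that refill loop (a
> simplified sketch of rmqueue_bulk(), not the verbatim mm/page_alloc.c
> code; error handling and vmstat accounting are omitted):
>
>         spin_lock_irqsave(&zone->lock, flags);
>         for (i = 0; i < count; ++i) {
>                 /*
>                  * Every miss on the native freelists re-runs the full
>                  * fallback search: top-down block claim first, then
>                  * the bottom-up single-page steal.
>                  */
>                 struct page *page = __rmqueue(zone, order, migratetype,
>                                               alloc_flags);
>
>                 if (unlikely(page == NULL))
>                         break;
>                 list_add_tail(&page->pcp_list, list);
>         }
>         spin_unlock_irqrestore(&zone->lock, flags);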
>
> This can be made more efficient. Since rmqueue_bulk() holds the
> zone->lock over the entire batch, the freelists are not subject to
> outside changes; when the search for a block to claim has already failed, there is
> no point in trying again for the next page.
>
> Modify __rmqueue() to remember the last successful fallback mode, and restart
> directly from there on the next rmqueue_bulk() iteration.
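>
> As a standalone illustration of that pattern (userspace toy code with
> hypothetical names, not kernel code): the caller keeps the mode across
> loop iterations, and each call resumes the search at the stage that last
> produced a page instead of re-walking stages already known to be empty:
>
>         #include <stdio.h>
>         #include <stdbool.h>
>
>         enum stage { STAGE_NATIVE, STAGE_CLAIM, STAGE_STEAL };
>
>         /* Toy stand-ins for the zone freelists. */
>         static int native_left, claim_blocks = 2, steal_left = 5;
>
>         static bool get_one(enum stage *mode)
>         {
>                 switch (*mode) {
>                 case STAGE_NATIVE:
>                         if (native_left) {
>                                 native_left--;
>                                 return true;
>                         }
>                         /* fall through */
>                 case STAGE_CLAIM:
>                         if (claim_blocks) {
>                                 claim_blocks--;
>                                 native_left += 3;     /* a claimed block refills the native list */
>                                 *mode = STAGE_NATIVE; /* back to normal mode */
>                                 return true;
>                         }
>                         /* fall through */
>                 case STAGE_STEAL:
>                         if (steal_left) {
>                                 steal_left--;
>                                 *mode = STAGE_STEAL;  /* skip the exhausted stages next call */
>                                 return true;
>                         }
>                 }
>                 return false;
>         }
>
>         int main(void)
>         {
>                 enum stage mode = STAGE_NATIVE;  /* like rmqm in rmqueue_bulk() */
>                 int i, got = 0;
>
>                 for (i = 0; i < 16; i++) {       /* like the batch loop */
>                         if (!get_one(&mode))
>                                 break;
>                         got++;
>                 }
>                 printf("allocated %d pages, final mode %d\n", got, mode);
>                 return 0;
>         }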
>
> Oliver confirms that this improves beyond the regression that the test robot
> reported against c2f6ea38fc1b:
>
> commit:
>   f3b92176f4 ("tools/selftests: add guard region test for /proc/$pid/pagemap")
>   c2f6ea38fc ("mm: page_alloc: don't steal single pages from biggest buddy")
>   acc4d5ff0b ("Merge tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
>   2c847f27c3 ("mm: page_alloc: speed up fallbacks in rmqueue_bulk()")   <--- your patch
>
> f3b92176f4f7100f c2f6ea38fc1b640aa7a2e155cc1 acc4d5ff0b61eb1715c498b6536 2c847f27c37da65a93d23c237c5
> ---------------- --------------------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \          |                \
>   25525364 ±  3%     -56.4%   11135467           -57.8%   10779336           +31.6%   33581409        vm-scalability.throughput
>
> Carlos confirms that worst-case times are almost fully recovered compared to
> before the earlier culprit patch:
>
>   2dd482ba627d (before freelist hygiene):    1ms
>   c0cd6f557b90  (after freelist hygiene):   90ms
>  next-20250319    (steal smallest buddy):  280ms
>     this patch                          :    8ms
>
> Reported-by: kernel test robot <oliver.sang@...el.com>
> Reported-by: Carlos Song <carlos.song@....com>
> Tested-by: kernel test robot <oliver.sang@...el.com>
> Fixes: c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion")
> Fixes: c2f6ea38fc1b ("mm: page_alloc: don't steal single pages from biggest buddy")
> Closes: https://lore.kernel.org/oe-lkp/202503271547.fc08b188-lkp@intel.com
> Cc: stable@...r.kernel.org      # 6.10+
> Signed-off-by: Johannes Weiner <hannes@...xchg.org>
> ---
>  mm/page_alloc.c | 100 +++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 74 insertions(+), 26 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index f51aa6051a99..03b0d45ed45a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2194,11 +2194,11 @@ try_to_claim_block(struct zone *zone, struct page *page,
>   * The use of signed ints for order and current_order is a deliberate
>   * deviation from the rest of this file, to make the for loop
>   * condition simpler.
> - *
> - * Return the stolen page, or NULL if none can be found.
>   */
> +
> +/* Try to claim a whole foreign block, take a page, expand the remainder */
>  static __always_inline struct page *
> -__rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
> +__rmqueue_claim(struct zone *zone, int order, int start_migratetype,
>                                                 unsigned int alloc_flags)
>  {
>         struct free_area *area;
> @@ -2236,14 +2236,26 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
>                 page = try_to_claim_block(zone, page, current_order, order,
>                                           start_migratetype, fallback_mt,
>                                           alloc_flags);
> -               if (page)
> -                       goto got_one;
> +               if (page) {
> +                       trace_mm_page_alloc_extfrag(page, order, current_order,
> +                                                   start_migratetype, fallback_mt);
> +                       return page;
> +               }
>         }
> 
> -       if (alloc_flags & ALLOC_NOFRAGMENT)
> -               return NULL;
> +       return NULL;
> +}
> +
> +/* Try to steal a single page from a foreign block */
> +static __always_inline struct page *
> +__rmqueue_steal(struct zone *zone, int order, int start_migratetype)
> +{
> +       struct free_area *area;
> +       int current_order;
> +       struct page *page;
> +       int fallback_mt;
> +       bool claim_block;
> 
> -       /* No luck claiming pageblock. Find the smallest fallback page */
>         for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) {
>                 area = &(zone->free_area[current_order]);
>                 fallback_mt = find_suitable_fallback(area, current_order,
> @@ -2253,25 +2265,28 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
> 
>                 page = get_page_from_free_area(area, fallback_mt);
>                 page_del_and_expand(zone, page, order, current_order, fallback_mt);
> -               goto got_one;
> +               trace_mm_page_alloc_extfrag(page, order, current_order,
> +                                           start_migratetype, fallback_mt);
> +               return page;
>         }
> 
>         return NULL;
> -
> -got_one:
> -       trace_mm_page_alloc_extfrag(page, order, current_order,
> -               start_migratetype, fallback_mt);
> -
> -       return page;
>  }
> 
> +enum rmqueue_mode {
> +       RMQUEUE_NORMAL,
> +       RMQUEUE_CMA,
> +       RMQUEUE_CLAIM,
> +       RMQUEUE_STEAL,
> +};
> +
>  /*
>   * Do the hard work of removing an element from the buddy allocator.
>   * Call me with the zone->lock already held.
>   */
>  static __always_inline struct page *
>  __rmqueue(struct zone *zone, unsigned int order, int migratetype,
> -                                               unsigned int alloc_flags)
> +         unsigned int alloc_flags, enum rmqueue_mode *mode)
>  {
>         struct page *page;
> 
> @@ -2290,16 +2305,47 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
>                 }
>         }
> 
> -       page = __rmqueue_smallest(zone, order, migratetype);
> -       if (unlikely(!page)) {
> -               if (alloc_flags & ALLOC_CMA)
> +       /*
> +        * Try the different freelists, native then foreign.
> +        *
> +        * The fallback logic is expensive and rmqueue_bulk() calls in
> +        * a loop with the zone->lock held, meaning the freelists are
> +        * not subject to any outside changes. Remember in *mode where
> +        * we found pay dirt, to save us the search on the next call.
> +        */
> +       switch (*mode) {
> +       case RMQUEUE_NORMAL:
> +               page = __rmqueue_smallest(zone, order, migratetype);
> +               if (page)
> +                       return page;
> +               fallthrough;
> +       case RMQUEUE_CMA:
> +               if (alloc_flags & ALLOC_CMA) {
>                         page = __rmqueue_cma_fallback(zone, order);
> -
> -               if (!page)
> -                       page = __rmqueue_fallback(zone, order, migratetype,
> -                                                 alloc_flags);
> +                       if (page) {
> +                               *mode = RMQUEUE_CMA;
> +                               return page;
> +                       }
> +               }
> +               fallthrough;
> +       case RMQUEUE_CLAIM:
> +               page = __rmqueue_claim(zone, order, migratetype, alloc_flags);
> +               if (page) {
> +                       /* Replenished native freelist, back to normal mode */
> +                       *mode = RMQUEUE_NORMAL;
> +                       return page;
> +               }
> +               fallthrough;
> +       case RMQUEUE_STEAL:
> +               if (!(alloc_flags & ALLOC_NOFRAGMENT)) {
> +                       page = __rmqueue_steal(zone, order, migratetype);
> +                       if (page) {
> +                               *mode = RMQUEUE_STEAL;
> +                               return page;
> +                       }
> +               }
>         }
> -       return page;
> +       return NULL;
>  }
> 
>  /*
> @@ -2311,6 +2357,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>                         unsigned long count, struct list_head *list,
>                         int migratetype, unsigned int alloc_flags)
>  {
> +       enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
>         unsigned long flags;
>         int i;
> 
> @@ -2321,7 +2368,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>         }
>         for (i = 0; i < count; ++i) {
>                 struct page *page = __rmqueue(zone, order, migratetype,
> -                                                             alloc_flags);
> +                                             alloc_flags, &rmqm);
>                 if (unlikely(page == NULL))
>                         break;
> 
> @@ -2934,6 +2981,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
>  {
>         struct page *page;
>         unsigned long flags;
> +       enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
> 
>         do {
>                 page = NULL;
> @@ -2945,7 +2993,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
>                 if (alloc_flags & ALLOC_HIGHATOMIC)
>                         page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
>                 if (!page) {
> -                       page = __rmqueue(zone, order, migratetype, alloc_flags);
> +                       page = __rmqueue(zone, order, migratetype, alloc_flags, &rmqm);
> 
>                         /*
>                          * If the allocation fails, allow OOM handling and
> --
> 2.49.0

