Message-ID:
 <VI2PR04MB11147FA8008C118F375650623E8AE2@VI2PR04MB11147.eurprd04.prod.outlook.com>
Date: Thu, 3 Apr 2025 09:23:55 +0000
From: Carlos Song <carlos.song@....com>
To: Johannes Weiner <hannes@...xchg.org>
CC: "baolin.wang@...ux.alibaba.com" <baolin.wang@...ux.alibaba.com>,
	"ying.huang@...el.com" <ying.huang@...el.com>, "vbabka@...e.cz"
	<vbabka@...e.cz>, "david@...hat.com" <david@...hat.com>,
	"mgorman@...hsingularity.net" <mgorman@...hsingularity.net>, "ziy@...dia.com"
	<ziy@...dia.com>, "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, Brendan Jackman <jackmanb@...gle.com>
Subject: Re: Ask help about this patch c0cd6f557b90 "mm: page_alloc: fix
 freelist movement during block conversion"



> -----Original Message-----
> From: Johannes Weiner <hannes@...xchg.org>
> Sent: Thursday, April 3, 2025 3:44 AM
> To: Carlos Song <carlos.song@....com>
> Cc: baolin.wang@...ux.alibaba.com; ying.huang@...el.com; vbabka@...e.cz;
> david@...hat.com; mgorman@...hsingularity.net; ziy@...dia.com;
> akpm@...ux-foundation.org; linux-mm@...ck.org;
> linux-kernel@...r.kernel.org; Brendan Jackman <jackmanb@...gle.com>
> Subject: [EXT] Re: Ask help about this patch c0cd6f557b90 "mm: page_alloc: fix
> freelist movement during block conversion"
>
> Hi Carlos,
>
> On Wed, Apr 02, 2025 at 11:31:58AM +0000, Carlos Song wrote:
> > Hi, all
> >
> > I found a 300ms~600ms IRQ-off period when writing 1Gb of data to a storage
> > device on an i.MX7D SDB board running Linux kernel v6.14.
> > From this discussion I found the regression root cause:
> > https://lore.kernel.org/linux-mm/CAJuCfpGajtAP8-kw5B5mKmhfyq6Pn67+PJgMjBeozW-qzjQMkw@mail.gmail.com/T/
>
> Thanks for the report!
>
> > 2. After adding this patch: c0cd6f557b90 "mm: page_alloc: fix freelist movement during block conversion"
> > # tracer: irqsoff
> > #
> > # irqsoff latency trace v1.1.5 on 6.9.0-rc4-00116-gc0cd6f557b90
> > # --------------------------------------------------------------------
> > # latency: 93635 us, #13758/13758, CPU#0 | (M:server VP:0, KP:0, SP:0 HP:0 #P:2)
> > #    -----------------
> > #    | task: dd-764 (uid:0 nice:0 policy:0 rt_prio:0)
> > #    -----------------
> > #  => started at: _raw_spin_lock_irqsave
> > #  => ended at:   _raw_spin_unlock_irqrestore
> > #
> > #
> > #                    _------=> CPU#
> > #                   / _-----=> irqs-off/BH-disabled
> > #                  | / _----=> need-resched
> > #                  || / _---=> hardirq/softirq
> > #                  ||| / _--=> preempt-depth
> > #                  |||| / _-=> migrate-disable
> > #                  ||||| /     delay
> > #  cmd     pid     |||||| time  |   caller
> > #     \   /        ||||||  \    |    /
> >       dd-764       0d....    1us!: _raw_spin_lock_irqsave
> >       dd-764       0d....  206us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-764       0d....  209us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-764       0d....  210us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-764       0d....  213us+: steal_suitable_fallback <-__rmqueue_pcplist
> >       dd-764       0d....  281us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-764       0d....  282us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-764       0d....  284us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-764       0d....  286us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-764       0d....  288us+: steal_suitable_fallback <-__rmqueue_pcplist
>
> This is the freelists being replenished with a loop over __rmqueue(). Two things
> stand out:
>
> 1. steal_suitable_fallback() is the expensive part. The patch in
>    question made this slightly worse because stealability is checked
>    up-front instead of just stealing optimistically like before. So
>    the pages in the block are iterated twice. This can explain some of
>    the issue, but not a 100x increase in lock hold time.
>
> 2. We're doing it *a lot*. And this is the likely culprit. Whereas
>    before the patch, we'd steal whole buddies and their remainders,
>    afterwards there is a lot more single page stealing when grabbing
>    the whole block fails. This means __rmqueue_smallest() fails more
>    often and we end up doing a lot more topdown fallback scans:
>
> >       dd-767       0d.... 2043us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-767       0d.... 2045us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-767       0d.... 2047us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-767       0d.... 2049us+: try_to_claim_block <-__rmqueue_pcplist
> >       dd-767       0d.... 2101us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-767       0d.... 2103us+: try_to_claim_block <-__rmqueue_pcplist
> >       dd-767       0d.... 2181us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-767       0d.... 2184us+: try_to_claim_block <-__rmqueue_pcplist
> >       dd-767       0d.... 2220us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-767       0d.... 2222us+: try_to_claim_block <-__rmqueue_pcplist
> >       dd-767       0d.... 2304us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-767       0d.... 2306us+: try_to_claim_block <-__rmqueue_pcplist
> >       dd-767       0d.... 2365us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-767       0d.... 2367us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-767       0d.... 2368us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-767       0d.... 2370us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-767       0d.... 2372us+: try_to_claim_block <-__rmqueue_pcplist
> >       dd-767       0d.... 2434us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-767       0d.... 2436us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-767       0d.... 2438us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-767       0d.... 2442us : __mod_zone_page_state <-__rmqueue_pcplist
>
> The __mod_zone_page_state() is the successful allocation after attempting to
> steal a few different blocks. If this had succeeded, it would have replenished the
> native freelist and we'd see another
> __mod_zone_page_state() quickly. Alas it failed:
>
> >       dd-767       0d.... 2445us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-767       0d.... 2446us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-767       0d.... 2448us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-767       0d.... 2450us+: try_to_claim_block <-__rmqueue_pcplist
> >       dd-767       0d.... 2490us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-767       0d.... 2492us+: try_to_claim_block <-__rmqueue_pcplist
> >       dd-767       0d.... 2548us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-767       0d.... 2550us+: try_to_claim_block <-__rmqueue_pcplist
> >       dd-767       0d.... 2586us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-767       0d.... 2588us+: try_to_claim_block <-__rmqueue_pcplist
> >       dd-767       0d.... 2652us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-767       0d.... 2654us+: try_to_claim_block <-__rmqueue_pcplist
> >       dd-767       0d.... 2712us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-767       0d.... 2714us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-767       0d.... 2715us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-767       0d.... 2717us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-767       0d.... 2719us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-767       0d.... 2720us+: try_to_claim_block <-__rmqueue_pcplist
> >       dd-767       0d.... 2778us : find_suitable_fallback <-__rmqueue_pcplist
> >       dd-767       0d.... 2780us : __mod_zone_page_state <-__rmqueue_pcplist
>
> ... and we go through the whole fallback spiel for the next page.
>
> We can definitely do better. rmqueue_bulk() holds the zone->lock the entire
> time, which means nobody else can modify the freelists underneath us. Once
> block claiming has failed, there is no point in trying it again for the next page.
>
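> For illustration, here is a deliberately simplified, hypothetical sketch of
> that refill loop as it behaves before the attached patch (not the literal
> mm/page_alloc.c code; statistics updates and the trylock fast path are
> omitted):
>
>     /* Refill a batch of pages for the pcplist; IRQs stay off for the whole loop. */
>     static int rmqueue_bulk_sketch(struct zone *zone, unsigned int order,
>                                    unsigned long count, struct list_head *list,
>                                    int migratetype, unsigned int alloc_flags)
>     {
>             unsigned long flags;
>             int i;
>
>             spin_lock_irqsave(&zone->lock, flags);
>             for (i = 0; i < count; ++i) {
>                     /*
>                      * Every miss on the native freelist restarts the full
>                      * fallback search (CMA, block claim, single-page steal)
>                      * from scratch, even though the freelists cannot change
>                      * while we hold zone->lock.
>                      */
>                     struct page *page = __rmqueue(zone, order, migratetype,
>                                                   alloc_flags);
>                     if (unlikely(!page))
>                             break;
>                     list_add_tail(&page->pcp_list, list);
>             }
>             spin_unlock_irqrestore(&zone->lock, flags);
>             return i;
>     }
>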
> In fact, the recent kernel test bot report [1] appears to be related to this. It
> points to c2f6ea38fc1b640aa7a2e155cc1c0410ff91afa2 ("mm:
> page_alloc: don't steal single pages from biggest buddy"), a patch that further
> forces bottom-up freelist scans if block stealing fails.
>
> Attached is a patch that has __rmqueue() remember which fallback level it had
> to stoop to in order to succeed; for the next page, it restarts the search from
> there.
>
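> Condensed, the idea (the full diff is attached below) is to thread a cursor
> through the locked batch so each __rmqueue() call resumes at the fallback
> level where the previous call succeeded:
>
>     enum rmqueue_mode {
>             RMQUEUE_NORMAL, /* native freelist */
>             RMQUEUE_CMA,    /* CMA fallback */
>             RMQUEUE_CLAIM,  /* claim a whole foreign block */
>             RMQUEUE_STEAL,  /* steal single pages from foreign blocks */
>     };
>
>     /* rmqueue_bulk() keeps one cursor for the whole batch under zone->lock: */
>     enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
>
>     for (i = 0; i < count; ++i) {
>             struct page *page = __rmqueue(zone, order, migratetype,
>                                           alloc_flags, &rmqm);
>             ...
>     }
>
> Inside __rmqueue(), a switch over *mode falls through NORMAL -> CMA ->
> CLAIM -> STEAL and records where it found pay dirt; a successful block
> claim resets the cursor to RMQUEUE_NORMAL, since the native freelist has
> just been replenished.
>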
> I cannot reproduce Carlos' setup, but testing with lru-file-mmap-read from the
> kernel test bot, it shows a stark difference:
>
>          upstream          patched
> real     0m8.939s         0m5.546s
> user     0m2.617s         0m2.528s
> sys     0m52.885s        0m30.183s
>
> Trace points confirm that try_to_claim_block() is called about two orders of
> magnitude less than before.
>
> [1]
> https://lore.kernel.org/all/202503271547.fc08b188-lkp@intel.com/
>

Hi,

Thank you for your quick ack and help! After applying this patch, things improved considerably.
I applied it on this HEAD:
f0a16f536332 (tag: next-20250403, origin/master, origin/HEAD) Add linux-next specific files for 20250403

and ran the same test 10 times as I did before on the i.MX7D.
The irqsoff tracer now shows an IRQ-off time of 7~10ms. Is this what you expected?

#
# irqsoff latency trace v1.1.5 on 6.14.0-next-20250403-00003-gf9e8473ee91a
# --------------------------------------------------------------------
# latency: 8111 us, #4323/4323, CPU#0 | (M:NONE VP:0, KP:0, SP:0 HP:0 #P:2)
#    -----------------
#    | task: dd-820 (uid:0 nice:0 policy:0 rt_prio:0)
#    -----------------
#  => started at: __rmqueue_pcplist
#  => ended at:   _raw_spin_unlock_irqrestore
#
#
#                    _------=> CPU#
#                   / _-----=> irqs-off/BH-disabled
#                  | / _----=> need-resched
#                  || / _---=> hardirq/softirq
#                  ||| / _--=> preempt-depth
#                  |||| / _-=> migrate-disable
#                  ||||| /     delay
#  cmd     pid     |||||| time  |   caller
#     \   /        ||||||  \    |    /
      dd-820       0d....    1us : __rmqueue_pcplist
      dd-820       0d....    3us : _raw_spin_trylock <-__rmqueue_pcplist
      dd-820       0d....    7us : __mod_zone_page_state <-__rmqueue_pcplist
      dd-820       0d....   11us : __mod_zone_page_state <-__rmqueue_pcplist
      dd-820       0d....   13us : __mod_zone_page_state <-__rmqueue_pcplist
      dd-820       0d....   15us : __mod_zone_page_state <-__rmqueue_pcplist
      dd-820       0d....   17us : __mod_zone_page_state <-__rmqueue_pcplist
      dd-820       0d....   19us : __mod_zone_page_state <-__rmqueue_pcplist
      dd-820       0d....   21us : __mod_zone_page_state <-__rmqueue_pcplist
      dd-820       0d....   23us : __mod_zone_page_state <-__rmqueue_pcplist
      dd-820       0d....   25us : __mod_zone_page_state <-__rmqueue_pcplist
...
      dd-820       0d.... 1326us : __mod_zone_page_state <-__rmqueue_pcplist
      dd-820       0d.... 1328us : __mod_zone_page_state <-__rmqueue_pcplist
      dd-820       0d.... 1330us : __mod_zone_page_state <-__rmqueue_pcplist
      dd-820       0d.... 1332us : __mod_zone_page_state <-__rmqueue_pcplist
      dd-820       0d.... 1334us : __mod_zone_page_state <-__rmqueue_pcplist
      dd-820       0d.... 1336us : __mod_zone_page_state <-__rmqueue_pcplist
      dd-820       0d.... 1337us : __mod_zone_page_state <-__rmqueue_pcplist
      dd-820       0d.... 1339us : __mod_zone_page_state <-__rmqueue_pcplist
      dd-820       0d.... 1341us : __mod_zone_page_state <-__rmqueue_pcplist
      dd-820       0d.... 1343us : __mod_zone_page_state <-__rmqueue_pcplist
      dd-820       0d.... 1345us : __mod_zone_page_state <-__rmqueue_pcplist
      dd-820       0d.... 1347us : __mod_zone_page_state <-__rmqueue_pcplist
      dd-820       0d.... 1349us : __mod_zone_page_state <-__rmqueue_pcplist
      dd-820       0d.... 1351us : __mod_zone_page_state <-__rmqueue_pcplist
...
      dd-820       0d.... 1660us : __mod_zone_page_state <-__rmqueue_pcplist
      dd-820       0d.... 1662us : __mod_zone_page_state <-__rmqueue_pcplist
      dd-820       0d.... 1664us : __mod_zone_page_state <-__rmqueue_pcplist
      dd-820       0d.... 1666us : find_suitable_fallback <-__rmqueue_pcplist
      dd-820       0d.... 1668us : find_suitable_fallback <-__rmqueue_pcplist
      dd-820       0d.... 1670us : find_suitable_fallback <-__rmqueue_pcplist
      dd-820       0d.... 1672us+: try_to_claim_block <-__rmqueue_pcplist
      dd-820       0d.... 1727us : find_suitable_fallback <-__rmqueue_pcplist
      dd-820       0d.... 1729us+: try_to_claim_block <-__rmqueue_pcplist
      dd-820       0d.... 1806us : find_suitable_fallback <-__rmqueue_pcplist
      dd-820       0d.... 1807us : find_suitable_fallback <-__rmqueue_pcplist
      dd-820       0d.... 1809us+: try_to_claim_block <-__rmqueue_pcplist
      dd-820       0d.... 1854us : find_suitable_fallback <-__rmqueue_pcplist
      dd-820       0d.... 1856us+: try_to_claim_block <-__rmqueue_pcplist
      dd-820       0d.... 1893us : find_suitable_fallback <-__rmqueue_pcplist
      dd-820       0d.... 1895us : find_suitable_fallback <-__rmqueue_pcplist
      dd-820       0d.... 1896us : find_suitable_fallback <-__rmqueue_pcplist
      dd-820       0d.... 1898us+: try_to_claim_block <-__rmqueue_pcplist
      dd-820       0d.... 1954us : find_suitable_fallback <-__rmqueue_pcplist
      dd-820       0d.... 1956us+: try_to_claim_block <-__rmqueue_pcplist
      dd-820       0d.... 2012us : find_suitable_fallback <-__rmqueue_pcplist
...
      dd-820       0d.... 8077us : find_suitable_fallback <-__rmqueue_pcplist
      dd-820       0d.... 8079us : __mod_zone_page_state <-__rmqueue_pcplist
      dd-820       0d.... 8081us : find_suitable_fallback <-__rmqueue_pcplist
      dd-820       0d.... 8083us : __mod_zone_page_state <-__rmqueue_pcplist
      dd-820       0d.... 8084us : find_suitable_fallback <-__rmqueue_pcplist
      dd-820       0d.... 8086us : find_suitable_fallback <-__rmqueue_pcplist
      dd-820       0d.... 8088us : find_suitable_fallback <-__rmqueue_pcplist
      dd-820       0d.... 8089us : find_suitable_fallback <-__rmqueue_pcplist
      dd-820       0d.... 8091us : find_suitable_fallback <-__rmqueue_pcplist
      dd-820       0d.... 8093us : __mod_zone_page_state <-__rmqueue_pcplist
      dd-820       0d.... 8095us : find_suitable_fallback <-__rmqueue_pcplist
      dd-820       0d.... 8097us : __mod_zone_page_state <-__rmqueue_pcplist
      dd-820       0d.... 8098us : find_suitable_fallback <-__rmqueue_pcplist
      dd-820       0d.... 8100us : find_suitable_fallback <-__rmqueue_pcplist
      dd-820       0d.... 8102us : __mod_zone_page_state <-__rmqueue_pcplist
      dd-820       0d.... 8104us : find_suitable_fallback <-__rmqueue_pcplist
      dd-820       0d.... 8105us : __mod_zone_page_state <-__rmqueue_pcplist
      dd-820       0d.... 8107us : _raw_spin_unlock_irqrestore <-__rmqueue_pcplist
      dd-820       0d.... 8110us : _raw_spin_unlock_irqrestore
      dd-820       0d.... 8113us+: trace_hardirqs_on <-_raw_spin_unlock_irqrestore
      dd-820       0d.... 8156us : <stack trace>
 => get_page_from_freelist
 => __alloc_frozen_pages_noprof
 => __folio_alloc_noprof
 => __filemap_get_folio
 => iomap_write_begin
 => iomap_file_buffered_write
 => blkdev_write_iter
 => vfs_write
 => ksys_write
 => ret_fast_syscall


Carlos Song
> ---
>
> From 13433454403e0c6f99ccc3b76c609034fe47e41c Mon Sep 17 00:00:00 2001
> From: Johannes Weiner <hannes@...xchg.org>
> Date: Wed, 2 Apr 2025 14:23:53 -0400
> Subject: [PATCH] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
>
> Not-yet-signed-off-by: Johannes Weiner <hannes@...xchg.org>
> ---
>  mm/page_alloc.c | 100 +++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 74 insertions(+), 26 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index f51aa6051a99..03b0d45ed45a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2194,11 +2194,11 @@ try_to_claim_block(struct zone *zone, struct page *page,
>   * The use of signed ints for order and current_order is a deliberate
>   * deviation from the rest of this file, to make the for loop
>   * condition simpler.
> - *
> - * Return the stolen page, or NULL if none can be found.
>   */
> +
> +/* Try to claim a whole foreign block, take a page, expand the remainder */
>  static __always_inline struct page *
> -__rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
> +__rmqueue_claim(struct zone *zone, int order, int start_migratetype,
>                                                 unsigned int alloc_flags)
>  {
>         struct free_area *area;
> @@ -2236,14 +2236,26 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
>                 page = try_to_claim_block(zone, page, current_order, order,
>                                           start_migratetype, fallback_mt,
>                                           alloc_flags);
> -               if (page)
> -                       goto got_one;
> +               if (page) {
> +                       trace_mm_page_alloc_extfrag(page, order, current_order,
> +                                                   start_migratetype, fallback_mt);
> +                       return page;
> +               }
>         }
>
> -       if (alloc_flags & ALLOC_NOFRAGMENT)
> -               return NULL;
> +       return NULL;
> +}
> +
> +/* Try to steal a single page from a foreign block */
> +static __always_inline struct page *
> +__rmqueue_steal(struct zone *zone, int order, int start_migratetype)
> +{
> +       struct free_area *area;
> +       int current_order;
> +       struct page *page;
> +       int fallback_mt;
> +       bool claim_block;
>
> -       /* No luck claiming pageblock. Find the smallest fallback page */
>         for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) {
>                 area = &(zone->free_area[current_order]);
>                 fallback_mt = find_suitable_fallback(area, current_order,
> @@ -2253,25 +2265,28 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
>
>                 page = get_page_from_free_area(area, fallback_mt);
>                 page_del_and_expand(zone, page, order, current_order, fallback_mt);
> -               goto got_one;
> +               trace_mm_page_alloc_extfrag(page, order, current_order,
> +                                           start_migratetype, fallback_mt);
> +               return page;
>         }
>
>         return NULL;
> -
> -got_one:
> -       trace_mm_page_alloc_extfrag(page, order, current_order,
> -               start_migratetype, fallback_mt);
> -
> -       return page;
>  }
>
> +enum rmqueue_mode {
> +       RMQUEUE_NORMAL,
> +       RMQUEUE_CMA,
> +       RMQUEUE_CLAIM,
> +       RMQUEUE_STEAL,
> +};
> +
>  /*
>   * Do the hard work of removing an element from the buddy allocator.
>   * Call me with the zone->lock already held.
>   */
>  static __always_inline struct page *
>  __rmqueue(struct zone *zone, unsigned int order, int migratetype,
> -                                               unsigned int alloc_flags)
> +         unsigned int alloc_flags, enum rmqueue_mode *mode)
>  {
>         struct page *page;
>
> @@ -2290,16 +2305,47 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
>                 }
>         }
>
> -       page = __rmqueue_smallest(zone, order, migratetype);
> -       if (unlikely(!page)) {
> -               if (alloc_flags & ALLOC_CMA)
> +       /*
> +        * Try the different freelists, native then foreign.
> +        *
> +        * The fallback logic is expensive and rmqueue_bulk() calls in
> +        * a loop with the zone->lock held, meaning the freelists are
> +        * not subject to any outside changes. Remember in *mode where
> +        * we found pay dirt, to save us the search on the next call.
> +        */
> +       switch (*mode) {
> +       case RMQUEUE_NORMAL:
> +               page = __rmqueue_smallest(zone, order, migratetype);
> +               if (page)
> +                       return page;
> +               fallthrough;
> +       case RMQUEUE_CMA:
> +               if (alloc_flags & ALLOC_CMA) {
>                         page = __rmqueue_cma_fallback(zone, order);
> -
> -               if (!page)
> -                       page = __rmqueue_fallback(zone, order, migratetype,
> -                                                 alloc_flags);
> +                       if (page) {
> +                               *mode = RMQUEUE_CMA;
> +                               return page;
> +                       }
> +               }
> +               fallthrough;
> +       case RMQUEUE_CLAIM:
> +               page = __rmqueue_claim(zone, order, migratetype, alloc_flags);
> +               if (page) {
> +                       /* Replenished native freelist, back to normal mode */
> +                       *mode = RMQUEUE_NORMAL;
> +                       return page;
> +               }
> +               fallthrough;
> +       case RMQUEUE_STEAL:
> +               if (!(alloc_flags & ALLOC_NOFRAGMENT)) {
> +                       page = __rmqueue_steal(zone, order, migratetype);
> +                       if (page) {
> +                               *mode = RMQUEUE_STEAL;
> +                               return page;
> +                       }
> +               }
>         }
> -       return page;
> +       return NULL;
>  }
>
>  /*
> @@ -2311,6 +2357,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>                         unsigned long count, struct list_head *list,
>                         int migratetype, unsigned int alloc_flags)
>  {
> +       enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
>         unsigned long flags;
>         int i;
>
> @@ -2321,7 +2368,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>         }
>         for (i = 0; i < count; ++i) {
>                 struct page *page = __rmqueue(zone, order, migratetype,
> -                                                             alloc_flags);
> +                                             alloc_flags, &rmqm);
>                 if (unlikely(page == NULL))
>                         break;
>
> @@ -2934,6 +2981,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
>  {
>         struct page *page;
>         unsigned long flags;
> +       enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
>
>         do {
>                 page = NULL;
> @@ -2945,7 +2993,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
>                 if (alloc_flags & ALLOC_HIGHATOMIC)
>                         page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
>                 if (!page) {
> -                       page = __rmqueue(zone, order, migratetype, alloc_flags);
> +                       page = __rmqueue(zone, order, migratetype,
> +                                        alloc_flags, &rmqm);
>
>                         /*
>                          * If the allocation fails, allow OOM handling and
> --
> 2.49.0

