Message-ID: <2b6c77bd-bead-7bfb-bf07-63e9ca837c58@suse.cz>
Date: Thu, 12 Jan 2023 12:59:06 +0100
From: Vlastimil Babka <vbabka@...e.cz>
To: "Kirill A. Shutemov" <kirill@...temov.name>
Cc: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Borislav Petkov <bp@...en8.de>,
Andy Lutomirski <luto@...nel.org>,
Sean Christopherson <seanjc@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Joerg Roedel <jroedel@...e.de>,
Ard Biesheuvel <ardb@...nel.org>,
Andi Kleen <ak@...ux.intel.com>,
Kuppuswamy Sathyanarayanan
<sathyanarayanan.kuppuswamy@...ux.intel.com>,
David Rientjes <rientjes@...gle.com>,
Tom Lendacky <thomas.lendacky@....com>,
Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
Paolo Bonzini <pbonzini@...hat.com>,
Ingo Molnar <mingo@...hat.com>,
Dario Faggioli <dfaggioli@...e.com>,
Dave Hansen <dave.hansen@...el.com>,
Mike Rapoport <rppt@...nel.org>,
David Hildenbrand <david@...hat.com>,
Mel Gorman <mgorman@...hsingularity.net>,
marcelo.cerri@...onical.com, tim.gardner@...onical.com,
khalid.elmously@...onical.com, philip.cox@...onical.com,
aarcange@...hat.com, peterx@...hat.com, x86@...nel.org,
linux-mm@...ck.org, linux-coco@...ts.linux.dev,
linux-efi@...r.kernel.org, linux-kernel@...r.kernel.org,
Mike Rapoport <rppt@...ux.ibm.com>
Subject: Re: [PATCHv8 02/14] mm: Add support for unaccepted memory
On 12/24/22 17:46, Kirill A. Shutemov wrote:
> On Fri, Dec 09, 2022 at 11:23:50PM +0100, Vlastimil Babka wrote:
>> On 12/9/22 20:26, Kirill A. Shutemov wrote:
>> >> > #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
>> >> > /*
>> >> > * Watermark failed for this zone, but see if we can
>> >> > @@ -4299,6 +4411,9 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
>> >> >
>> >> > return page;
>> >> > } else {
>> >> > + if (try_to_accept_memory(zone))
>> >> > + goto try_this_zone;
>> >>
>> >> On the other hand, here we failed the full rmqueue(), including the
>> >> potentially fragmenting fallbacks, so I'm worried that before we finally
>> >> fail all of that and resort to accepting more memory, we already fragmented
>> >> the already accepted memory, more than necessary.
>> >
>> > I'm not sure I follow. We accept memory in pageblock chunks. Do we want to
>> > allocate from a free pageblock if we have other memory to tap from? It
>> > doesn't make sense to me.
>>
>> The fragmentation avoidance based on migratetype does work with pageblock
>> granularity, so yeah, if you accept a single pageblock worth of memory and
>> then (through __rmqueue_fallback()) end up serving both movable and
>> unmovable allocations from it, the whole fragmentation avoidance mechanism
>> is defeated and you end up with unmovable allocations (e.g. page tables)
>> scattered over many pageblocks and inability to allocate any huge pages.
>>
>> >> So one way to prevent would be to move the acceptance into rmqueue() to
>> >> happen before __rmqueue_fallback(), which I originally had in mind and maybe
>> >> suggested that previously.
>> >
>> > I guess it should be pretty straight forward to fail __rmqueue_fallback()
>> > if there's non-empty unaccepted_pages list and steer to
>> > try_to_accept_memory() this way.
>>
>> That could be a way indeed. We do have ALLOC_NOFRAGMENT which could be
>> possible to employ here.
>> But maybe the zone_watermark_fast() modification would be simpler yet
>> sufficient. It makes sense to me that we'd try to keep a high watermark
>> worth of pre-accepted memory. zone_watermark_fast() would fail at low
>> watermark, so we could try accepting (high-low) at a time instead of single
>> pageblock.
>
> Looks like we already have __zone_watermark_unusable_free(), which seems
> to match the use-case rather closely. We only need to switch unaccepted
> memory to per-zone accounting.
Could work. I'd still suggest also making try_to_accept_memory() accept
up to the high watermark, not just a single pageblock.
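To illustrate the suggestion, here is a minimal userspace sketch, not kernel code. The names struct zone_model, usable_free(), low_watermark_ok() and accept_to_high_watermark() are hypothetical, and the 512-page pageblock is an assumption (2 MiB pageblocks of 4 KiB pages). As in the patch, unaccepted pages are counted as free, so accepting memory only shrinks the unusable part.

```c
#include <stdbool.h>

/* Hypothetical model of a zone; this is not the kernel's struct zone. */
#define MODEL_PAGEBLOCK_PAGES 512L /* assumption: 2 MiB pageblock, 4 KiB pages */

struct zone_model {
	long free;       /* free pages, including unaccepted ones */
	long unaccepted; /* pages still waiting to be accepted */
	long wmark_low;  /* low watermark, in pages */
	long wmark_high; /* high watermark, in pages */
};

/* Free pages the allocator can use right now: unaccepted ones are unusable. */
static long usable_free(const struct zone_model *z)
{
	return z->free - z->unaccepted;
}

/* Models the fast watermark check failing at the low watermark. */
static bool low_watermark_ok(const struct zone_model *z)
{
	return usable_free(z) > z->wmark_low;
}

/*
 * Accept whole pageblocks until usable free memory reaches the high
 * watermark or nothing is left to accept. Returns the pages accepted.
 */
static long accept_to_high_watermark(struct zone_model *z)
{
	long accepted = 0;

	while (z->unaccepted > 0 && usable_free(z) < z->wmark_high) {
		long chunk = z->unaccepted < MODEL_PAGEBLOCK_PAGES ?
			     z->unaccepted : MODEL_PAGEBLOCK_PAGES;

		z->unaccepted -= chunk; /* the pages become usable free memory */
		accepted += chunk;
	}
	return accepted;
}
```

In this model, a zone with free = 10000, unaccepted = 9000, low = 1500 and high = 3000 fails the low watermark check (1000 usable pages), and one call then accepts four pageblocks (2048 pages) instead of one.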
> The fixup below is supposed to do the trick, but I'm not sure how to test
> fragmentation avoidance properly.
>
> Any suggestions?
I haven't done that for years, so maybe Mel knows better. But from what I
remember, I'd compare /proc/pagetypeinfo with and without memory accepting,
and collect the mm_page_alloc_extfrag tracepoint. If more of these events
happen with memory accepting, it's bad. Ideally this is done with a workload
that stresses both userspace (movable) allocations and kernel allocations.
Again, Mel might have suggestions for an mmtest?
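For the tracepoint comparison, a small helper along these lines could count the events in a saved text dump of the trace buffer from each run. This is only a sketch: count_trace_events() is a hypothetical name, and it assumes one event per line with the event name appearing in the line.

```c
#include <stdio.h>
#include <string.h>

/*
 * Count the lines of a text trace dump that mention the given event name,
 * e.g. "mm_page_alloc_extfrag". Assumes one event per line; lines longer
 * than the buffer are read in pieces, which is fine for typical trace
 * output but could double-count a pathological line.
 */
static long count_trace_events(FILE *trace, const char *event)
{
	char line[4096];
	long count = 0;

	while (fgets(line, sizeof(line), trace) != NULL) {
		if (strstr(line, event) != NULL)
			count++;
	}
	return count;
}
```

Running it over the dumps from a run with and a run without memory accepting, under the same workload, gives the two counts to compare.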
>
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index ca6f0590be21..1bd2d245edee 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -483,7 +483,7 @@ static ssize_t node_read_meminfo(struct device *dev,
> #endif
> #ifdef CONFIG_UNACCEPTED_MEMORY
> ,
> - nid, K(node_page_state(pgdat, NR_UNACCEPTED))
> + nid, K(sum_zone_node_page_state(nid, NR_UNACCEPTED))
> #endif
> );
> len += hugetlb_report_node_meminfo(buf, len, nid);
> diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
> index 789b77c7b6df..e9c05b4c457c 100644
> --- a/fs/proc/meminfo.c
> +++ b/fs/proc/meminfo.c
> @@ -157,7 +157,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
>
> #ifdef CONFIG_UNACCEPTED_MEMORY
> show_val_kb(m, "Unaccepted: ",
> - global_node_page_state(NR_UNACCEPTED));
> + global_zone_page_state(NR_UNACCEPTED));
> #endif
>
> hugetlb_report_meminfo(m);
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 9c762e8175fc..8b5800cd4424 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -152,6 +152,9 @@ enum zone_stat_item {
> NR_ZSPAGES, /* allocated in zsmalloc */
> #endif
> NR_FREE_CMA_PAGES,
> +#ifdef CONFIG_UNACCEPTED_MEMORY
> + NR_UNACCEPTED,
> +#endif
> NR_VM_ZONE_STAT_ITEMS };
>
> enum node_stat_item {
> @@ -198,9 +201,6 @@ enum node_stat_item {
> NR_FOLL_PIN_ACQUIRED, /* via: pin_user_page(), gup flag: FOLL_PIN */
> NR_FOLL_PIN_RELEASED, /* pages returned via unpin_user_page() */
> NR_KERNEL_STACK_KB, /* measured in KiB */
> -#ifdef CONFIG_UNACCEPTED_MEMORY
> - NR_UNACCEPTED,
> -#endif
> #if IS_ENABLED(CONFIG_SHADOW_CALL_STACK)
> NR_KERNEL_SCS_KB, /* measured in KiB */
> #endif
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e80e8d398863..404b267332a9 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1779,7 +1779,7 @@ static bool try_to_accept_memory(struct zone *zone)
>
> migratetype = get_pfnblock_migratetype(page, page_to_pfn(page));
> __mod_zone_freepage_state(zone, -1 << order, migratetype);
> - __mod_node_page_state(page_pgdat(page), NR_UNACCEPTED, -1 << order);
> + __mod_zone_page_state(zone, NR_UNACCEPTED, -1 << order);
> spin_unlock_irqrestore(&zone->lock, flags);
>
> if (last)
> @@ -1808,7 +1808,7 @@ static void __free_unaccepted(struct page *page, unsigned int order)
> migratetype = get_pfnblock_migratetype(page, page_to_pfn(page));
> list_add_tail(&page->lru, &zone->unaccepted_pages);
> __mod_zone_freepage_state(zone, 1 << order, migratetype);
> - __mod_node_page_state(page_pgdat(page), NR_UNACCEPTED, 1 << order);
> + __mod_zone_page_state(zone, NR_UNACCEPTED, 1 << order);
> spin_unlock_irqrestore(&zone->lock, flags);
>
> if (first)
> @@ -4074,6 +4074,9 @@ static inline long __zone_watermark_unusable_free(struct zone *z,
> if (!(alloc_flags & ALLOC_CMA))
> unusable_free += zone_page_state(z, NR_FREE_CMA_PAGES);
> #endif
> +#ifdef CONFIG_UNACCEPTED_MEMORY
> + unusable_free += zone_page_state(z, NR_UNACCEPTED);
> +#endif
>
> return unusable_free;
> }