Message-ID: <2e3a3a3f-737c-ed01-f820-87efee0adc93@126.com>
Date: Mon, 17 Jun 2024 20:47:09 +0800
From: yangge1116 <yangge1116@....com>
To: Barry Song <21cnbao@...il.com>
Cc: akpm@...ux-foundation.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, baolin.wang@...ux.alibaba.com,
liuzixing@...on.cn
Subject: Re: [PATCH] mm/page_alloc: skip THP-sized PCP list when allocating
non-CMA THP-sized page
On 2024/6/17 6:26 PM, Barry Song wrote:
> On Tue, Jun 4, 2024 at 9:15 PM <yangge1116@....com> wrote:
>>
>> From: yangge <yangge1116@....com>
>>
>> Since commit 5d0a661d808f ("mm/page_alloc: use only one PCP list for
>> THP-sized allocations") no longer differentiates the migration type
>> of pages in THP-sized PCP list, it's possible to get a CMA page from
>> the list, in some cases, it's not acceptable, for example, allocating
>> a non-CMA page with PF_MEMALLOC_PIN flag returns a CMA page.
>>
>> The patch forbids allocating non-CMA THP-sized page from THP-sized
>> PCP list to avoid the issue above.
>
> Could you please describe the impact on users in the commit log?
If a large amount of CMA memory is configured in the system (for
example, CMA memory accounts for 50% of system memory), starting a
virtual machine with device passthrough will get stuck.
While starting the virtual machine, pin_user_pages_remote(...,
FOLL_LONGTERM, ...) is called to pin memory. If a page is in a CMA area,
pin_user_pages_remote() migrates it from the CMA area to a non-CMA
area because of the FOLL_LONGTERM flag. If non-movable allocation
requests return CMA memory, pin_user_pages_remote() loops endlessly.
backtrace:
pin_user_pages_remote
----__gup_longterm_locked // loops endlessly in this function
--------__get_user_pages_locked
--------check_and_migrate_movable_pages // check always fails, so migration is retried
------------migrate_longterm_unpinnable_pages
----------------alloc_migration_target // non-movable allocation
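
To make the retry behaviour concrete, here is a minimal user-space sketch of the loop described above. It is only a model: struct model_page, model_alloc_target() and model_pin_longterm() are hypothetical names, not kernel APIs, and the max_retries cap exists only so the model terminates (the real loop has no such cap).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_page {
	bool in_cma;	/* true if the page sits in a CMA area */
};

/*
 * Stand-in for the migration-target allocation. With buggy_pcp set it
 * models the reported bug: a non-movable request is served a CMA page.
 */
static struct model_page model_alloc_target(bool buggy_pcp)
{
	struct model_page page = { .in_cma = buggy_pcp };

	return page;
}

/*
 * Stand-in for the retry loop in __gup_longterm_locked(): keep migrating
 * until the page is outside CMA. max_retries is an assumption of this
 * model so that the buggy case terminates.
 * Returns the number of migrations done, or -1 if the cap was reached.
 */
static int model_pin_longterm(struct model_page page, bool buggy_pcp,
			      int max_retries)
{
	int migrations = 0;

	assert(max_retries > 0);
	while (page.in_cma) {
		if (migrations == max_retries)
			return -1;	/* the kernel would keep looping */
		page = model_alloc_target(buggy_pcp);
		migrations++;
	}
	return migrations;
}
```

With buggy_pcp false, a CMA page is moved out after one migration. With buggy_pcp true, every migration target is again a CMA page, so the model only stops because of the artificial cap.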
> Is it possible that some CMA memory might be used by non-movable
> allocation requests?
Yes.
> If so, will CMA somehow become unable to migrate, causing cma_alloc() to fail?
No, it will cause an endless loop in __gup_longterm_locked(). If
non-movable allocation requests return CMA memory,
migrate_longterm_unpinnable_pages() migrates a CMA page to another
CMA page, which is useless and makes __gup_longterm_locked() retry
forever.
backtrace:
pin_user_pages_remote
----__gup_longterm_locked // loops endlessly in this function
--------__get_user_pages_locked
--------check_and_migrate_movable_pages // check always fails, so migration is retried
------------migrate_longterm_unpinnable_pages
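
For reference, the condition the patch adds to rmqueue() can be read as a simple predicate. The sketch below is a user-space restatement, not kernel code: model_may_use_pcplist() is a hypothetical name, cma_enabled stands in for IS_ENABLED(CONFIG_CMA), and the two MODEL_ constants are illustrative values assumed for this example (order 9 corresponds to a 2 MiB THP with 4 KiB base pages).

```c
#include <stdbool.h>

/* Illustrative values assumed for this model, not taken from the patch. */
#define MODEL_ALLOC_CMA		0x80u
#define MODEL_HPAGE_PMD_ORDER	9u

/*
 * Returns true when the request may be served from the PCP list.
 * Mirrors the patch's condition:
 *   !IS_ENABLED(CONFIG_CMA) || alloc_flags & ALLOC_CMA ||
 *   order != HPAGE_PMD_ORDER
 */
static bool model_may_use_pcplist(bool cma_enabled, unsigned int alloc_flags,
				  unsigned int order)
{
	if (!cma_enabled)
		return true;	/* no CMA pages can be on the list */
	if (alloc_flags & MODEL_ALLOC_CMA)
		return true;	/* caller accepts CMA pages */
	/* Only the THP-sized list mixes migration types. */
	return order != MODEL_HPAGE_PMD_ORDER;
}
```

The only case that skips the PCP list is a THP-sized request that must not receive CMA memory on a kernel built with CMA; every other request takes the PCP path as before.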
>>
>> Fixes: 5d0a661d808f ("mm/page_alloc: use only one PCP list for THP-sized allocations")
>> Signed-off-by: yangge <yangge1116@....com>
>> ---
>> mm/page_alloc.c | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 2e22ce5..0bdf471 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -2987,10 +2987,20 @@ struct page *rmqueue(struct zone *preferred_zone,
>> WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
>>
>> if (likely(pcp_allowed_order(order))) {
>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> + if (!IS_ENABLED(CONFIG_CMA) || alloc_flags & ALLOC_CMA ||
>> + order != HPAGE_PMD_ORDER) {
>> + page = rmqueue_pcplist(preferred_zone, zone, order,
>> + migratetype, alloc_flags);
>> + if (likely(page))
>> + goto out;
>> + }
>
> This seems not ideal, because non-CMA THP gets no chance to use PCP. But it
> still seems better than causing the failure of CMA allocation.
>
> Is there a possible approach to avoiding adding CMA THP into pcp from the first
> beginning? Otherwise, we might need a separate PCP for CMA.
>
>> +#else
>> page = rmqueue_pcplist(preferred_zone, zone, order,
>> migratetype, alloc_flags);
>> if (likely(page))
>> goto out;
>> +#endif
>> }
>>
>> page = rmqueue_buddy(preferred_zone, zone, order, alloc_flags,
>> --
>> 2.7.4
>
> Thanks
> Barry
>