linux-kernel - Re: [PATCH] mm/page_alloc: skip THP-sized PCP list when allocating non-CMA THP-sized page


Message-ID: <9b227c9d-f59b-a8b0-b353-7876a56c0bde@126.com>
Date: Tue, 18 Jun 2024 09:34:06 +0800
From: yangge1116 <yangge1116@....com>
To: Barry Song <21cnbao@...il.com>
Cc: akpm@...ux-foundation.org, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org, baolin.wang@...ux.alibaba.com,
 liuzixing@...on.cn
Subject: Re: [PATCH] mm/page_alloc: skip THP-sized PCP list when allocating
 non-CMA THP-sized page



On 2024/6/17 at 8:47 PM, yangge1116 wrote:
> 
> 
> On 2024/6/17 at 6:26 PM, Barry Song wrote:
>> On Tue, Jun 4, 2024 at 9:15 PM <yangge1116@....com> wrote:
>>>
>>> From: yangge <yangge1116@....com>
>>>
>>> Since commit 5d0a661d808f ("mm/page_alloc: use only one PCP list for
>>> THP-sized allocations"), the THP-sized PCP list no longer differentiates
>>> pages by migration type, so it is possible to get a CMA page from the
>>> list. In some cases this is not acceptable: for example, allocating
>>> a non-CMA page with the PF_MEMALLOC_PIN flag set can return a CMA page.
>>>
>>> The patch forbids allocating a non-CMA THP-sized page from the THP-sized
>>> PCP list to avoid the issue above.
>>
>> Could you please describe the impact on users in the commit log?
> 
> If a large amount of CMA memory is configured in the system (for
> example, CMA memory accounts for 50% of the system memory), starting
> a virtual machine with device passthrough will get stuck.
> 
> While the virtual machine is starting, pin_user_pages_remote(...,
> FOLL_LONGTERM, ...) is called to pin memory. If a page is in a CMA area,
> pin_user_pages_remote() migrates the page from the CMA area to a non-CMA
> area because of the FOLL_LONGTERM flag. If non-movable allocation requests
> return CMA memory, pin_user_pages_remote() will loop endlessly.
> 
> backtrace:
> pin_user_pages_remote
> ----__gup_longterm_locked //cause endless loops in this function
> --------__get_user_pages_locked
> --------check_and_migrate_movable_pages //check always fails, so migration is retried
> ------------migrate_longterm_unpinnable_pages
> ----------------alloc_migration_target // non-movable allocation
> 
>> Is it possible that some CMA memory might be used by non-movable
>> allocation requests? 
> 
> Yes.
> 
> 
>> If so, will CMA somehow become unable to migrate, causing cma_alloc() 
>> to fail?
> 
> 
> No, it will cause an endless loop in __gup_longterm_locked(). If
> non-movable allocation requests return CMA memory,
> migrate_longterm_unpinnable_pages() will migrate a CMA page to another
> CMA page, which is useless and causes an endless loop in
> __gup_longterm_locked().
> 
> backtrace:
> pin_user_pages_remote
> ----__gup_longterm_locked //cause endless loops in this function
> --------__get_user_pages_locked
> --------check_and_migrate_movable_pages //check always fails, so migration is retried
> ------------migrate_longterm_unpinnable_pages
> 
> 
> 
> 
> 
>>>
>>> Fixes: 5d0a661d808f ("mm/page_alloc: use only one PCP list for THP-sized allocations")
>>> Signed-off-by: yangge <yangge1116@....com>
>>> ---
>>>   mm/page_alloc.c | 10 ++++++++++
>>>   1 file changed, 10 insertions(+)
>>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index 2e22ce5..0bdf471 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -2987,10 +2987,20 @@ struct page *rmqueue(struct zone *preferred_zone,
>>>          WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
>>>
>>>          if (likely(pcp_allowed_order(order))) {
>>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>> +               if (!IS_ENABLED(CONFIG_CMA) || alloc_flags & ALLOC_CMA ||
>>> +                                               order != HPAGE_PMD_ORDER) {
>>> +                       page = rmqueue_pcplist(preferred_zone, zone, order,
>>> +                                               migratetype, alloc_flags);
>>> +                       if (likely(page))
>>> +                               goto out;
>>> +               }
>>
>> This seems not ideal, because non-CMA THP gets no chance to use PCP.
>> But it still seems better than causing the failure of CMA allocation.
>>
>> Is there a possible approach to avoid adding CMA THP into pcp from
>> the very beginning? Otherwise, we might need a separate PCP for CMA.
>>

The vast majority of THP-sized allocations are GFP_MOVABLE, so avoiding
adding CMA THP into pcp may incur a slight performance penalty.

Commit 1d91df85f399 takes a similar approach to filtering, and I mainly
referred to it.
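
The condition the patch adds in rmqueue() can be read as a standalone
predicate. Below is a minimal userspace sketch of it, not kernel code.
thp_pcp_allowed() is a hypothetical name, the cma_enabled parameter stands in
for IS_ENABLED(CONFIG_CMA), and TOY_ALLOC_CMA and TOY_HPAGE_PMD_ORDER are
illustrative values assumed for this example (the real ALLOC_CMA and
HPAGE_PMD_ORDER come from kernel headers and depend on the architecture and
configuration).

```c
#include <stdbool.h>

/* Illustrative stand-ins, assumed for this sketch only. */
#define TOY_ALLOC_CMA        0x80u  /* models the ALLOC_CMA alloc flag */
#define TOY_HPAGE_PMD_ORDER  9u     /* models HPAGE_PMD_ORDER */

/*
 * Hypothetical helper mirroring the condition in the patch: the PCP list
 * may be used unless CMA is enabled, the request may not use CMA pages,
 * and the request is THP-sized.
 */
static bool thp_pcp_allowed(bool cma_enabled, unsigned int alloc_flags,
                            unsigned int order)
{
    return !cma_enabled ||
           (alloc_flags & TOY_ALLOC_CMA) ||
           order != TOY_HPAGE_PMD_ORDER;
}
```

Under these assumptions, only one combination skips the PCP list: CMA enabled,
ALLOC_CMA not set, and a THP-sized order. Every other request still goes
through rmqueue_pcplist() as before.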


>>> +#else
>>>                  page = rmqueue_pcplist(preferred_zone, zone, order,
>>>                                         migratetype, alloc_flags);
>>>                  if (likely(page))
>>>                          goto out;
>>> +#endif
>>>          }
>>>
>>>          page = rmqueue_buddy(preferred_zone, zone, order, alloc_flags,
>>> -- 
>>> 2.7.4
>>
>> Thanks
>> Barry
>>
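
The endless loop described in the quoted reply can be illustrated with a toy
model. This is a sketch under stated assumptions, not the kernel's
implementation: struct toy_page, toy_alloc_target() and toy_pin_longterm()
are hypothetical names, a page is reduced to a single "in CMA" flag, and a
pass limit is added so that the model terminates where the real code keeps
retrying.

```c
#include <stdbool.h>

/* Toy model: a page is just a flag saying whether it sits in a CMA area. */
struct toy_page {
    bool in_cma;
};

/*
 * Hypothetical stand-in for the migration target allocation. The flag
 * models whether a non-movable request can be handed a CMA page.
 */
static struct toy_page toy_alloc_target(bool allocator_returns_cma)
{
    struct toy_page target = { .in_cma = allocator_returns_cma };
    return target;
}

/*
 * Models the retry loop: while the page is in CMA it cannot be pinned
 * long-term, so it is migrated to a freshly allocated target.
 * Returns the number of migration passes needed, or -1 if the page is
 * still in CMA after max_passes (the modeled "endless loop").
 */
static int toy_pin_longterm(struct toy_page page, bool allocator_returns_cma,
                            int max_passes)
{
    int passes = 0;

    while (page.in_cma) {
        if (passes == max_passes)
            return -1;
        page = toy_alloc_target(allocator_returns_cma);
        passes++;
    }
    return passes;
}
```

In this model a CMA page becomes pinnable after one pass when the allocator
returns non-CMA memory, and never becomes pinnable when the allocator keeps
returning CMA memory, which corresponds to migrating a CMA page to another
CMA page.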