Message-ID: <6A1CCFDF-CC38-4B29-A514-3D500EA04B16@nvidia.com>
Date: Tue, 26 Mar 2024 10:53:56 -0400
From: Zi Yan <ziy@...dia.com>
To: Baolin Wang <baolin.wang@...ux.alibaba.com>,
 Andrew Morton <akpm@...ux-foundation.org>
Cc: linux-mm@...ck.org, Matthew Wilcox <willy@...radead.org>,
 Yang Shi <shy828301@...il.com>, Huang Ying <ying.huang@...el.com>,
 "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
 Ryan Roberts <ryan.roberts@....com>, "Yin, Fengwei" <fengwei.yin@...el.com>,
 SeongJae Park <sj@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5] mm/migrate: split source folio if it is on deferred
 split list

On 26 Mar 2024, at 10:42, Baolin Wang wrote:

> On 2024/3/26 21:26, Zi Yan wrote:
>> On 26 Mar 2024, at 2:19, Baolin Wang wrote:
>>
>>> On 2024/3/23 03:33, Zi Yan wrote:
>>>> From: Zi Yan <ziy@...dia.com>
>>>>
>>>> If the source folio is on the deferred split list, some of its subpages are
>>>> likely unused. Split it before migration to avoid migrating unused subpages.
>>>>
>>>> Commit 616b8371539a6 ("mm: thp: enable thp migration in generic path")
>>>> did not check whether a THP is on the deferred split list before migration,
>>>> so the destination THP is never put on the deferred split list even if the
>>>> source THP was on it. The opportunity to reclaim free pages in a partially
>>>> mapped THP during deferred list scanning is lost, but there is no other
>>>> harmful consequence[1].
>>>>
>>>>   From v4:
>>>> 1. Simplify _deferred_list check without locking and do not count as
>>>>      migration failures. (per Matthew Wilcox)
>>>>
>>>>   From v3:
>>>> 1. Guarded deferred list code behind CONFIG_TRANSPARENT_HUGEPAGE to avoid
>>>>      compilation error (per SeongJae Park).
>>>>
>>>>   From v2:
>>>> 1. Split the source folio instead of migrating it (per Matthew Wilcox)[2].
>>>>
>>>>   From v1:
>>>> 1. Used dst to get correct deferred split list after migration
>>>>      (per Ryan Roberts).
>>>>
>>>> [1]: https://lore.kernel.org/linux-mm/03CE3A00-917C-48CC-8E1C-6A98713C817C@nvidia.com/
>>>> [2]: https://lore.kernel.org/linux-mm/Ze_P6xagdTbcu1Kz@casper.infradead.org/
>>>>
>>>> Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path")
>>>> Signed-off-by: Zi Yan <ziy@...dia.com>
>>>> ---
>>>>    mm/migrate.c | 23 +++++++++++++++++++++++
>>>>    1 file changed, 23 insertions(+)
>>>>
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index ab9856f5931b..6bd9319624a3 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -1652,6 +1652,29 @@ static int migrate_pages_batch(struct list_head *from,
>>>>     			cond_resched();
>>>>   +			/*
>>>> +			 * The rare folio on the deferred split list should
>>>> +			 * be split now. It should not count as a failure.
>>>> +			 * Only check it without removing it from the list.
>>>> +			 * Since the folio can be on deferred_split_scan()
>>>> +			 * local list and removing it can cause the local list
>>>> +			 * corruption. Folio split process below can handle it
>>>> +			 * with the help of folio_ref_freeze().
>>>> +			 *
>>>> +			 * nr_pages > 2 is needed to avoid checking order-1
>>>> +			 * page cache folios. They exist, in contrast to
>>>> +			 * non-existent order-1 anonymous folios, and do not
>>>> +			 * use _deferred_list.
>>>> +			 */
>>>> +			if (nr_pages > 2 &&
>>>> +			   !list_empty(&folio->_deferred_list)) {
>>>> +				if (try_split_folio(folio, from) == 0) {
>>>
>>> IMO, we should move the split folios into the 'split_folios' list instead of the 'from' list; otherwise, unhandled folios might remain in the 'from' list.
>>
>> Can you elaborate on the actual situation you are thinking about? Thanks.
>
> Sure.
>
> Suppose the 'from' list contains only one large folio to be migrated, and this large folio is on the deferred split list (its _deferred_list is not empty), which means it needs to be split. Your patch will add the split base pages back to the 'from' list. However, please see the list_for_each_entry_safe macro:
>
> #define list_for_each_entry_safe(pos, n, head, member)			\
> 	for (pos = list_first_entry(head, typeof(*pos), member),	\
> 		n = list_next_entry(pos, member);			\
> 	     !list_entry_is_head(pos, head, member); 			\
> 	     pos = n, n = list_next_entry(n, member))
>
> The loop terminates early because the next entry 'n', which is fetched in advance, is already the list head, so the split base pages remain in the 'from' list without being processed. This caused the following crash during my migration testing:
>
> [  412.576943] ------------[ cut here ]------------
> [  412.576947] kernel BUG at mm/migrate.c:2634!
> [  412.577132] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> [  412.577201] CPU: 59 PID: 9581 Comm: numa01 Kdump: loaded Tainted: G          E      6.9.0-rc1+ #69
> ........
> [  412.578651] Call Trace:
> [  412.578692]  <TASK>
> [  412.578730]  ? die+0x33/0x90
> [  412.578770]  ? do_trap+0xdf/0x110
> [  412.578815]  ? migrate_misplaced_folio+0x1f2/0x2b0
> [  412.578875]  ? do_error_trap+0x65/0x80
> [  412.578922]  ? migrate_misplaced_folio+0x1f2/0x2b0
> [  412.578977]  ? exc_invalid_op+0x4e/0x70
> [  412.579048]  ? migrate_misplaced_folio+0x1f2/0x2b0
> [  412.579131]  ? asm_exc_invalid_op+0x16/0x20
> [  412.579182]  ? migrate_misplaced_folio+0x1f2/0x2b0
> [  412.579255]  do_numa_page+0x205/0x5b0
> [  412.579305]  __handle_mm_fault+0x2b0/0x6c0
> [  412.579354]  handle_mm_fault+0x105/0x270
> [  412.579404]  do_user_addr_fault+0x214/0x6b0
> [  412.579453]  exc_page_fault+0x64/0x140
> [  412.579509]  asm_exc_page_fault+0x22/0x30
>
> 2583 int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
> 2584                             int node)
> 2585 {
> 		......
>
> 2628         if (nr_succeeded) {
> 2629                 count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
> 2630                 if (!node_is_toptier(folio_nid(folio)) && node_is_toptier(node))
> 2631                         mod_node_page_state(pgdat, PGPROMOTE_SUCCESS,
> 2632                                             nr_succeeded);
> 2633         }
> 2634         BUG_ON(!list_empty(&migratepages));
> 2635         return isolated;
> 2636
> 2637 out:

Got it. Thanks.

>
> With the change below, the crash no longer occurs.
>
> +++ b/mm/migrate.c
> @@ -1668,7 +1668,7 @@ static int migrate_pages_batch(struct list_head *from,
>                          */
>                         if (nr_pages > 2 &&
>                            !list_empty(&folio->_deferred_list)) {
> -                               if (try_split_folio(folio, from) == 0) {
> +                               if (try_split_folio(folio, split_folios) == 0) {
>                                         stats->nr_thp_split += is_thp;
>                                         stats->nr_split++;
>                                         continue;

Let me resend with this fix.
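
For reference, the early termination described above can be modelled in userspace. This is only a rough sketch under stated assumptions, not kernel code: struct item, walk() and remaining_after_walk() are hypothetical stand-ins, "splitting" is reduced to unlinking the large item and queueing two base items, and the loop merely mirrors the shape of list_for_each_entry_safe() (the next entry is fetched before the body runs).

```c
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void init_list(struct list_head *h) { h->next = h->prev = h; }

static void add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static void del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry->prev = entry;
}

/* Hypothetical stand-in for a folio: "large" means it has to be split. */
struct item { struct list_head lru; int large; };

#define item_of(ptr) \
	((struct item *)((char *)(ptr) - offsetof(struct item, lru)))

/*
 * Walk 'from' with the next entry 'n' fetched in advance, as
 * list_for_each_entry_safe() does. A large item is unlinked and its two
 * base items are queued on 'split_to'; a small item is simply unlinked
 * ("migrated"). Returns how many items are still on 'from' afterwards.
 */
static int walk(struct list_head *from, struct list_head *split_to,
		struct item *parts)
{
	struct list_head *pos, *n;
	int left = 0;

	for (pos = from->next, n = pos->next; pos != from;
	     pos = n, n = n->next) {
		struct item *it = item_of(pos);

		del(pos);
		if (it->large) {
			add_tail(&parts[0].lru, split_to);
			add_tail(&parts[1].lru, split_to);
		}
	}

	for (pos = from->next; pos != from; pos = pos->next)
		left++;
	return left;
}

/* One large item on 'from'; split onto 'from' (0) or a separate list (1). */
int remaining_after_walk(int use_separate_list)
{
	struct list_head from, split;
	struct item big = { .large = 1 };
	struct item parts[2] = { { .large = 0 }, { .large = 0 } };

	init_list(&from);
	init_list(&split);
	add_tail(&big.lru, &from);

	return walk(&from, use_separate_list ? &split : &from, parts);
}
```

In this model, remaining_after_walk(0) returns 2: 'n' was already the list head when the base items were appended, so the walk ends and they stay on 'from'. remaining_after_walk(1) returns 0, because the base items go to a separate list that the caller can process afterwards.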

--
Best Regards,
Yan, Zi
