Message-ID: <906590D4-04E2-40CB-A853-25FE6212700C@nvidia.com>
Date: Sat, 05 Jul 2025 21:34:20 -0400
From: Zi Yan <ziy@...dia.com>
To: Balbir Singh <balbirs@...dia.com>
Cc: linux-mm@...ck.org, akpm@...ux-foundation.org,
 linux-kernel@...r.kernel.org, Karol Herbst <kherbst@...hat.com>,
 Lyude Paul <lyude@...hat.com>, Danilo Krummrich <dakr@...nel.org>,
 David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
 Jérôme Glisse <jglisse@...hat.com>,
 Shuah Khan <shuah@...nel.org>, David Hildenbrand <david@...hat.com>,
 Barry Song <baohua@...nel.org>, Baolin Wang <baolin.wang@...ux.alibaba.com>,
 Ryan Roberts <ryan.roberts@....com>, Matthew Wilcox <willy@...radead.org>,
 Peter Xu <peterx@...hat.com>, Kefeng Wang <wangkefeng.wang@...wei.com>,
 Jane Chu <jane.chu@...cle.com>, Alistair Popple <apopple@...dia.com>,
 Donet Tom <donettom@...ux.ibm.com>
Subject: Re: [v1 resend 08/12] mm/thp: add split during migration support

On 5 Jul 2025, at 21:15, Balbir Singh wrote:

> On 7/5/25 11:55, Zi Yan wrote:
>> On 4 Jul 2025, at 20:58, Balbir Singh wrote:
>>
>>> On 7/4/25 21:24, Zi Yan wrote:
>>>>
>>>> s/pages/folio
>>>>
>>>
>>> Thanks, will make the changes
>>>
>>>> Why name it isolated if the folio is unmapped? Isolated folios often mean
>>>> they are removed from LRU lists. isolated here causes confusion.
>>>>
>>>
>>> Ack, will change the name
>>>
>>>
>>>>>   *
>>>>>   * It calls __split_unmapped_folio() to perform uniform and non-uniform split.
>>>>>   * It is in charge of checking whether the split is supported or not and
>>>>> @@ -3800,7 +3799,7 @@ bool uniform_split_supported(struct folio *folio, unsigned int new_order,
>>>>>   */
>>>>>  static int __folio_split(struct folio *folio, unsigned int new_order,
>>>>>  		struct page *split_at, struct page *lock_at,
>>>>> -		struct list_head *list, bool uniform_split)
>>>>> +		struct list_head *list, bool uniform_split, bool isolated)
>>>>>  {
>>>>>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>>>>>  	XA_STATE(xas, &folio->mapping->i_pages, folio->index);
>>>>> @@ -3846,14 +3845,16 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>>>>  		 * is taken to serialise against parallel split or collapse
>>>>>  		 * operations.
>>>>>  		 */
>>>>> -		anon_vma = folio_get_anon_vma(folio);
>>>>> -		if (!anon_vma) {
>>>>> -			ret = -EBUSY;
>>>>> -			goto out;
>>>>> +		if (!isolated) {
>>>>> +			anon_vma = folio_get_anon_vma(folio);
>>>>> +			if (!anon_vma) {
>>>>> +				ret = -EBUSY;
>>>>> +				goto out;
>>>>> +			}
>>>>> +			anon_vma_lock_write(anon_vma);
>>>>>  		}
>>>>>  		end = -1;
>>>>>  		mapping = NULL;
>>>>> -		anon_vma_lock_write(anon_vma);
>>>>>  	} else {
>>>>>  		unsigned int min_order;
>>>>>  		gfp_t gfp;
>>>>> @@ -3920,7 +3921,8 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>>>>  		goto out_unlock;
>>>>>  	}
>>>>>
>>>>> -	unmap_folio(folio);
>>>>> +	if (!isolated)
>>>>> +		unmap_folio(folio);
>>>>>
>>>>>  	/* block interrupt reentry in xa_lock and spinlock */
>>>>>  	local_irq_disable();
>>>>> @@ -3973,14 +3975,15 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>>>>
>>>>>  		ret = __split_unmapped_folio(folio, new_order,
>>>>>  				split_at, lock_at, list, end, &xas, mapping,
>>>>> -				uniform_split);
>>>>> +				uniform_split, isolated);
>>>>>  	} else {
>>>>>  		spin_unlock(&ds_queue->split_queue_lock);
>>>>>  fail:
>>>>>  		if (mapping)
>>>>>  			xas_unlock(&xas);
>>>>>  		local_irq_enable();
>>>>> -		remap_page(folio, folio_nr_pages(folio), 0);
>>>>> +		if (!isolated)
>>>>> +			remap_page(folio, folio_nr_pages(folio), 0);
>>>>>  		ret = -EAGAIN;
>>>>>  	}
>>>>
>>>> These "isolated" special cases do not look good. I wonder if there
>>>> is a way of letting the split code handle device private folios more gracefully.
>>>> They also cause confusion: why do "isolated/unmapped" folios
>>>> not need unmap_page(), remap_page(), or unlock?
>>>>
>>>>
>>>
>>> There are two reasons for going down the current code path
>>
>> After thinking more, I think adding isolated/unmapped is not the right
>> way, since unmapped folio is a very generic concept. If you add it,
>> one can easily misuse the folio split code by first unmapping a folio
>> and trying to split it with unmapped = true. I do not think that is
>> supported and your patch does not prevent that from happening in the future.
>>
>
> I don't understand the misuse case you mention, I assume you mean someone can
> get the usage wrong? The responsibility is on the caller to do the right thing
> if calling the API with unmapped

Before your patch, there was no use case for splitting unmapped folios.
Your patch only adds support for splitting device private pages, not for
splitting arbitrary unmapped folios. So a generic isolated/unmapped
parameter is not OK.
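
To illustrate the concern, here is a minimal userspace sketch. It is not
kernel code: struct mock_folio, the MOCK_FOLIO_* kinds and both functions
are hypothetical stand-ins invented for this example, and the -EBUSY return
is only a placeholder. It contrasts a caller-supplied flag with a check
derived from the folio itself.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a folio; not the kernel's struct folio. */
enum mock_folio_kind {
	MOCK_FOLIO_ANON,
	MOCK_FOLIO_FILE,
	MOCK_FOLIO_DEVICE_PRIVATE,
};

struct mock_folio {
	enum mock_folio_kind kind;
	bool mapped;
};

/*
 * Flag-based variant: the split trusts the caller. Nothing ties
 * "unmapped" to the kind of folio, so any caller can unmap any folio
 * and pass unmapped = true.
 */
static int mock_split_with_flag(const struct mock_folio *folio, bool unmapped)
{
	if (!unmapped && !folio->mapped)
		return -EBUSY;
	return 0;
}

/*
 * Kind-based variant: the special case is derived from the folio, so
 * only device private folios may arrive already unmapped.
 */
static int mock_split_by_kind(const struct mock_folio *folio)
{
	if (!folio->mapped && folio->kind != MOCK_FOLIO_DEVICE_PRIVATE)
		return -EBUSY;
	return 0;
}
```

In this model the flag-based variant accepts an unmapped anonymous folio,
while the kind-based variant rejects it and still accepts the device
private case.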

>
>> You should teach different parts of folio split code path to handle
>> device private folios properly. Details are below.
>>
>>>
>>> 1. if the isolated check is not present, folio_get_anon_vma will fail and cause
>>>    the split routine to return with -EBUSY
>>
>> You do something below instead.
>>
>> if (!anon_vma && !folio_is_device_private(folio)) {
>> 	ret = -EBUSY;
>> 	goto out;
>> } else if (anon_vma) {
>> 	anon_vma_lock_write(anon_vma);
>> }
>>
>
> folio_get_anon_vma() cannot be called for unmapped folios. In our case the page has
> already been unmapped. Is there a reason why you mix anon_vma_lock_write with
> the check for device private folios?

Oh, I did not notice that anon_vma = folio_get_anon_vma(folio) is also
inside the if (!isolated) branch. In that case, just do

if (folio_is_device_private(folio)) {
...
} else if (is_anon) {
...
} else {
...
}
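
The ordering of the branches matters because device private folios are
also anonymous, so the device private check has to come first. A minimal
userspace sketch of just that dispatch follows. It is not kernel code:
struct mock_folio, enum mock_split_path and mock_pick_split_path() are
hypothetical names made up for illustration, and what each branch would
actually do in __folio_split() is deliberately left out.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for struct folio; only the two properties
 * discussed in this thread are modelled.
 */
struct mock_folio {
	bool device_private;
	bool anon;
};

enum mock_split_path {
	MOCK_SPLIT_DEVICE_PRIVATE,	/* special handling, made explicit */
	MOCK_SPLIT_ANON,		/* regular anonymous folio path */
	MOCK_SPLIT_FILE,		/* page cache folio path */
};

/*
 * Check device private before anon: a device private folio would also
 * match the anon test, so the order of the branches is significant.
 */
static enum mock_split_path mock_pick_split_path(const struct mock_folio *folio)
{
	if (folio->device_private)
		return MOCK_SPLIT_DEVICE_PRIVATE;
	else if (folio->anon)
		return MOCK_SPLIT_ANON;
	else
		return MOCK_SPLIT_FILE;
}
```

With this shape, a reader of the split code sees the device private case
as its own branch instead of inferring it from a boolean parameter.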

>
>> People can know device private folio split needs a special handling.
>>
>> BTW, why a device private folio can also be anonymous? Does it mean
>> if a page cache folio is migrated to device private, kernel also
>> sees it as both device private and file-backed?
>>
>
> FYI: device private folios only work with anonymous private pages, hence
> the name device private.

OK.

>
>>
>>> 2. Going through unmap_page(), remap_page() causes a full page table walk, which
>>>    the migrate_device API has already just done as a part of the migration. The
>>>    entries under consideration are already migration entries in this case.
>>>    This is wasteful and in some case unexpected.
>>
>> unmap_folio() already adds TTU_SPLIT_HUGE_PMD to try to split
>> PMD mapping, which you did in migrate_vma_split_pages(). You probably
>> can teach either try_to_migrate() or try_to_unmap() to just split
>> device private PMD mapping. Or if that is not preferred,
>> you can simply call split_huge_pmd_address() when unmap_folio()
>> sees a device private folio.
>>
>> For remap_page(), you can simply return for device private folios
>> like it is currently doing for non anonymous folios.
>>
>
> Doing a full rmap walk does not make sense with unmap_folio() and
> remap_folio(), because
>
> 1. We need to do a page table walk/rmap walk again
> 2. We'll need special handling of migration <-> migration entries
>    in the rmap handling (set/remove migration ptes)
> 3. In this context, the code is already in the middle of migration,
>    so trying to do that again does not make sense.

Why do the split in the middle of migration? The existing split code
assumes that to-be-split folios are mapped.

What prevents doing the split before migration?

>
>
>>
>> For lru_add_split_folio(), you can skip it if a device private
>> folio is seen.
>>
>> Last, for unlock part, why do you need to keep all after-split folios
>> locked? It should be possible to just keep the to-be-migrated folio
>> locked and unlock the rest for a later retry. But I could miss something
>> since I am not familiar with device private migration code.
>>
>
> Not sure I follow this comment

Because the patch does the split in the middle of migration, which the
existing split code does not support. My comment assumed that the split
is done while the folio is still mapped.

--
Best Regards,
Yan, Zi
