Message-ID: <2e4b9927-562d-4cfa-9362-f23e3bcfc454@linux.alibaba.com>
Date: Thu, 20 Feb 2025 17:27:19 +0800
From: Baolin Wang <baolin.wang@...ux.alibaba.com>
To: Zi Yan <ziy@...dia.com>
Cc: Matthew Wilcox <willy@...radead.org>, linux-mm@...ck.org,
linux-fsdevel@...r.kernel.org, Andrew Morton <akpm@...ux-foundation.org>,
Hugh Dickins <hughd@...gle.com>, Kairui Song <kasong@...cent.com>,
Miaohe Lin <linmiaohe@...wei.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 2/2] mm/shmem: use xas_try_split() in
shmem_split_large_entry()
On 2025/2/20 17:07, Baolin Wang wrote:
>
>
> On 2025/2/20 00:10, Zi Yan wrote:
>> On 19 Feb 2025, at 5:04, Baolin Wang wrote:
>>
>>> Hi Zi,
>>>
>>> Sorry for the late reply due to being busy with other things:)
>>
>> Thank you for taking a look at the patches. :)
>>
>>>
>>> On 2025/2/19 07:54, Zi Yan wrote:
>>>> During shmem_split_large_entry(), a large swap entry covers n slots
>>>> and an order-0 folio needs to be inserted.
>>>>
>>>> Instead of splitting all n slots, only the 1 slot covered by the folio
>>>> needs to be split, and the remaining n-1 shadow entries can be
>>>> retained with orders ranging from 0 to n-1. This method only requires
>>>> (n / XA_CHUNK_SHIFT) new xa_nodes instead of (n % XA_CHUNK_SHIFT) *
>>>> (n / XA_CHUNK_SHIFT) new xa_nodes, compared to the original
>>>> xas_split_alloc() + xas_split() approach.
>>>>
>>>> For example, to split an order-9 large swap entry (assuming
>>>> XA_CHUNK_SHIFT
>>>> is 6), 1 xa_node is needed instead of 8.
>>>>
>>>> xas_try_split_min_order() is used to reduce the number of calls to
>>>> xas_try_split() during split.
>>>
>>> For shmem swapin, if we cannot swap in the whole large folio by
>>> skipping the swap cache, we will split the large swap entry stored in
>>> the shmem mapping into order-0 swap entries, rather than splitting it
>>> into other orders of swap entries. This is because the next time we
>>> swap in a shmem folio through shmem_swapin_cluster(), it will still
>>> be an order-0 folio.
>>
>> Right. But the swapin is one folio at a time, right?
>> shmem_split_large_entry()
>
> Yes, currently we always swap in one order-0 folio at a time from an
> async swap device. However, for a sync swap device, we skip the
> swapcache and swap in the whole large folio (see commit 1dd44c0af4fa),
> so shmem_split_large_entry() will not be called in that case.
>
>> should split the large swap entry and give you a slot to store the
>> order-0 folio.
>> For example, with an order-9 large swap entry, to swap in first
>> order-0 folio,
>> the large swap entry will become order-0, order-0, order-1, order-2,…
>> order-8,
>> after the split. Then the first order-0 swap entry can be used.
>> Then, when a second order-0 is swapped in, the second order-0 can be
>> used.
>> When the last order-0 is swapped in, the order-8 would be split to
>> order-7,order-6,…,order-1,order-0, order-0, and the last order-0 will
>> be used.
>
> Yes, understood. However, in sequential swapin scenarios, originally
> only one split operation was needed, whereas your approach increases
> the number of split operations. Of course, I understand that in
> non-sequential swapin scenarios, your patch will save some xarray
> memory. It might be necessary to evaluate whether the increased number
> of split operations has a significant impact on the performance of
> sequential swapin.
>
>> Maybe the swapin code assumes that after shmem_split_large_entry()
>> all swap entries are order-0, which can lead to issues. There should
>> be a check like: if the swap entry order > folio order,
>> shmem_split_large_entry() should be used.
>>>
>>> Moreover, I did a quick test with swapping in order-6 shmem folios;
>>> however, my test hung, and the console was continuously filled with
>>> the following information. It seems there are some issues with shmem
>>> swapin handling. Anyway, I need more time to debug and test.
>> To swap in order-6 folios, shmem_split_large_entry() does not allocate
>> any new xa_node, since XA_CHUNK_SHIFT is 6. It is weird to see the OOM
>> error below. Let me know if there is anything I can help with.
>
> I encountered some issues while testing order-4 and order-6 swapin
> with your patches. I roughly reviewed the patch, and it seems that the
> new swap entry stored in the shmem mapping is not correctly updated
> after the split.
>
> The following logic re-sets the swap entry after splitting, and it
> assumes that the large swap entry was always split to order 0 before.
> As your patch suggests, if a non-uniform split is used, then the logic
> for resetting the swap entry needs to be changed? Please correct me if
> I missed something.
>
> /*
>  * Re-set the swap entry after splitting, and the swap
>  * offset of the original large entry must be continuous.
>  */
> for (i = 0; i < 1 << order; i++) {
> 	pgoff_t aligned_index = round_down(index, 1 << order);
> 	swp_entry_t tmp;
>
> 	tmp = swp_entry(swp_type(swap), swp_offset(swap) + i);
> 	__xa_store(&mapping->i_pages, aligned_index + i,
> 		   swp_to_radix_entry(tmp), 0);
> }
In addition, after your patch, shmem_split_large_entry() seems to
always return 0 even though it splits a large swap entry, but we still
need to re-calculate the swap entry value after splitting; otherwise it
may return errors due to shmem_confirm_swap() validation failure.
/*
 * If the large swap entry has already been split, it is
 * necessary to recalculate the new swap entry based on
 * the old order alignment.
 */
if (split_order > 0) {
	pgoff_t offset = index - round_down(index, 1 << split_order);

	swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
}