Message-ID: <5db265de-b069-41c0-b8c2-a119dfb83485@linux.alibaba.com>
Date: Thu, 29 Jan 2026 10:27:09 +0800
From: Baolin Wang <baolin.wang@...ux.alibaba.com>
To: Kairui Song <ryncsn@...il.com>, Chris Mason <clm@...a.com>,
linux-mm@...ck.org
Cc: Hugh Dickins <hughd@...gle.com>, Andrew Morton
<akpm@...ux-foundation.org>, Kemeng Shi <shikemeng@...weicloud.com>,
Nhat Pham <nphamcs@...il.com>, Chris Li <chrisl@...nel.org>,
Baoquan He <bhe@...hat.com>, Barry Song <baohua@...nel.org>,
linux-kernel@...r.kernel.org, Kairui Song <kasong@...cent.com>,
stable@...r.kernel.org
Subject: Re: [PATCH v3] mm/shmem, swap: fix race of truncate and swap entry
split
On 1/29/26 12:52 AM, Kairui Song wrote:
> On Wed, Jan 28, 2026 at 05:02:34AM +0800, Chris Mason wrote:
>>
>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>> --- a/mm/shmem.c
>>> +++ b/mm/shmem.c
>>
>> [ ... ]
>>
>>> @@ -1191,12 +1203,23 @@
>>> folio = fbatch.folios[i];
>>>
>>> if (xa_is_value(folio)) {
>>> + int order;
>>> long swaps_freed;
>>>
>>> if (unfalloc)
>>> continue;
>>> - swaps_freed = shmem_free_swap(mapping, indices[i], folio);
>>> + swaps_freed = shmem_free_swap(mapping, indices[i],
>>> + end - 1, folio);
>>> if (!swaps_freed) {
>>> + /*
>>> + * If found a large swap entry cross the end border,
>>> + * skip it as the truncate_inode_partial_folio above
>>> + * should have at least zerod its content once.
>>> + */
>>> + order = shmem_confirm_swap(mapping, indices[i],
>>> + radix_to_swp_entry(folio));
>>> + if (order > 0 && indices[i] + (1 << order) > end)
>>> + continue;
>>> /* Swap was replaced by page: retry */
>>> index = indices[i];
>>> break;
>>
>> Could this code infinite loop if a large swap entry crosses the START
>> boundary instead of the END boundary?
>
> Thanks for the review! It's really helpful :)
>
>> When shmem_free_swap() returns 0, it could be either because:
>> (a) The entry was replaced by a page, or
>> (b) The entry failed the boundary check (base < index ||
>> base + nr_pages - 1 > end)
>>
>> The boundary check here only handles case (b) for entries crossing the
>> END boundary:
>>
>> if (order > 0 && indices[i] + (1 << order) > end)
>> continue;
>>
>> But what happens if the entry crosses the START boundary? If
>> find_get_entries() returns a large swap entry at indices[i] where
>> the entry's base (calculated as indices[i] & ~((1 << order) - 1)) is
>> less than the truncation start point, then shmem_free_swap() will
>> return 0 due to the "base < index" check. The code will then call
>> shmem_confirm_swap(), get the order, check if it crosses the END
>> boundary (which it doesn't), and retry with the same index:
>>
>> index = indices[i];
>> break;
>>
>> The next iteration will find the same entry again at the same index,
>> leading to an infinite loop. For example:
>>
>> - Truncating range [18, 30]
>> - Large swap entry at [16, 23] (order 3, 8 pages)
>> - indices[i] = 18
>> - shmem_free_swap() sees base=16 < index=18, returns 0
>> - Check: 18 + 8 > 30 is false (26 <= 30)
>> - Retries with index=18
>> - Loop repeats indefinitely
>
> I think this is a valid issue. And it's worse than that: during the `while (index < end)` loop, a new large entry can land anywhere in the range. If one iteration's starting `index` points into the middle of any large entry, an infinite loop will occur: indices[0] is always equal to the `index` value at that moment, shmem_free_swap() will fail because the swap entry's base index doesn't match indices[0], and so `index = indices[i]; break;` keeps it looping forever.
>
> The chance seems very low though.
>
>> Should the boundary check also handle the START case, perhaps:
>>
>> if (order > 0) {
>> pgoff_t base = indices[i] & ~((1UL << order) - 1);
>> if (base + (1 << order) - 1 > end || base < start)
>> continue;
>> }
>
> This still doesn't cover the case when a new large entry somehow lands in the range during the loop.
>
>> where 'start' is preserved from before the loop?
>
> How about the following patch:
>
> From 863f38c757ee0898b6b7f0f8c695f551a1380ce8 Mon Sep 17 00:00:00 2001
> From: Kairui Song <kasong@...cent.com>
> Date: Thu, 29 Jan 2026 00:19:23 +0800
> Subject: [PATCH] mm, shmem: prevent infinite loop on truncate race
>
> When truncating a large swap entry, shmem_free_swap() returns 0 when the
> entry's index doesn't match the given index due to lookup alignment. The
> failure fallback path checks if the entry crosses the end border and
> aborts when it happens, so truncate won't erase an unexpected entry or
> range. But one scenario was ignored.
>
> When `index` points to the middle of a large swap entry, and the large
> swap entry doesn't go across the end border, find_get_entries() will
> return that large swap entry as the first item in the batch with
> `indices[0]` equal to `index`. The entry's base index will be smaller
> than `indices[0]`, so shmem_free_swap() will fail and return 0 due to
> the "base < index" check. The code will then call shmem_confirm_swap(),
> get the order, check if it crosses the END boundary (which it doesn't),
> and retry with the same index.
>
> The next iteration will then find the same entry again at the same
> index, leading to an infinite loop.
>
> Fix this by retrying with a rounded-down index, and aborting if that
> index falls below the truncate range.
>
> Reported-by: Chris Mason <clm@...a.com>
> Closes: https://lore.kernel.org/linux-mm/20260128130336.727049-1-clm@meta.com/
> Fixes: 809bc86517cc ("mm: shmem: support large folio swap out")
> Fixes: 8a1968bd997f ("mm/shmem, swap: fix race of truncate and swap entry split")
> Signed-off-by: Kairui Song <kasong@...cent.com>
> ---
Thanks. The fix looks good to me.
Reviewed-by: Baolin Wang <baolin.wang@...ux.alibaba.com>
(BTW, I think we can simplify the logic by moving the boundary
validation into shmem_free_swap() in the future).
> mm/shmem.c | 23 ++++++++++++++---------
> 1 file changed, 14 insertions(+), 9 deletions(-)
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index b9ddd38621a0..fe3719eb5a3c 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1211,17 +1211,22 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
> swaps_freed = shmem_free_swap(mapping, indices[i],
> end - 1, folio);
> if (!swaps_freed) {
> - /*
> - * If found a large swap entry cross the end border,
> - * skip it as the truncate_inode_partial_folio above
> - * should have at least zerod its content once.
> - */
> + pgoff_t base = indices[i];
> +
> order = shmem_confirm_swap(mapping, indices[i],
> radix_to_swp_entry(folio));
> - if (order > 0 && indices[i] + (1 << order) > end)
> - continue;
> - /* Swap was replaced by page: retry */
> - index = indices[i];
> + /*
> + * If we find a large swap entry crossing the end or start
> + * border, skip it, as the truncate_inode_partial_folio
> + * above should have zeroed its content at least once.
> + */
> + if (order > 0) {
> + base = round_down(base, 1 << order);
> + if (base < start || base + (1 << order) > end)
> + continue;
> + }
> + /* Swap was replaced by page or extended, retry */
> + index = base;
> break;
> }
> nr_swaps_freed += swaps_freed;