lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <5db265de-b069-41c0-b8c2-a119dfb83485@linux.alibaba.com>
Date: Thu, 29 Jan 2026 10:27:09 +0800
From: Baolin Wang <baolin.wang@...ux.alibaba.com>
To: Kairui Song <ryncsn@...il.com>, Chris Mason <clm@...a.com>,
 linux-mm@...ck.org
Cc: Hugh Dickins <hughd@...gle.com>, Andrew Morton
 <akpm@...ux-foundation.org>, Kemeng Shi <shikemeng@...weicloud.com>,
 Nhat Pham <nphamcs@...il.com>, Chris Li <chrisl@...nel.org>,
 Baoquan He <bhe@...hat.com>, Barry Song <baohua@...nel.org>,
 linux-kernel@...r.kernel.org, Kairui Song <kasong@...cent.com>,
 stable@...r.kernel.org
Subject: Re: [PATCH v3] mm/shmem, swap: fix race of truncate and swap entry
 split



On 1/29/26 12:52 AM, Kairui Song wrote:
> On Wed, Jan 28, 2026 at 05:02:34AM +0800, Chris Mason wrote:
>>
>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>> --- a/mm/shmem.c
>>> +++ b/mm/shmem.c
>>
>> [ ... ]
>>
>>> @@ -1191,12 +1203,23 @@
>>>   		folio = fbatch.folios[i];
>>>
>>>   		if (xa_is_value(folio)) {
>>> +			int order;
>>>   			long swaps_freed;
>>>
>>>   			if (unfalloc)
>>>   				continue;
>>> -			swaps_freed = shmem_free_swap(mapping, indices[i], folio);
>>> +			swaps_freed = shmem_free_swap(mapping, indices[i],
>>> +						      end - 1, folio);
>>>   			if (!swaps_freed) {
>>> +				/*
>>> +				 * If found a large swap entry cross the end border,
>>> +				 * skip it as the truncate_inode_partial_folio above
>>> +				 * should have at least zerod its content once.
>>> +				 */
>>> +				order = shmem_confirm_swap(mapping, indices[i],
>>> +							   radix_to_swp_entry(folio));
>>> +				if (order > 0 && indices[i] + (1 << order) > end)
>>> +					continue;
>>>   				/* Swap was replaced by page: retry */
>>>   				index = indices[i];
>>>   				break;
>>
>> Could this code infinite loop if a large swap entry crosses the START
>> boundary instead of the END boundary?
> 
> Thanks for the review! It's really helpful :)
> 
>> When shmem_free_swap() returns 0, it could be either because:
>> (a) The entry was replaced by a page, or
>> (b) The entry failed the boundary check (base < index ||
>>      base + nr_pages - 1 > end)
>>
>> The boundary check here only handles case (b) for entries crossing the
>> END boundary:
>>
>>      if (order > 0 && indices[i] + (1 << order) > end)
>>          continue;
>>
>> But what happens if the entry crosses the START boundary? If
>> find_get_entries() returns a large swap entry at indices[i] where
>> the entry's base (calculated as indices[i] & ~((1 << order) - 1)) is
>> less than the truncation start point, then shmem_free_swap() will
>> return 0 due to the "base < index" check. The code will then call
>> shmem_confirm_swap(), get the order, check if it crosses the END
>> boundary (which it doesn't), and retry with the same index:
>>
>>      index = indices[i];
>>      break;
>>
>> The next iteration will find the same entry again at the same index,
>> leading to an infinite loop. For example:
>>
>> - Truncating range [18, 30]
>> - Large swap entry at [16, 23] (order 3, 8 pages)
>> - indices[i] = 18
>> - shmem_free_swap() sees base=16 < index=18, returns 0
>> - Check: 18 + 8 > 30 is false (26 <= 30)
>> - Retries with index=18
>> - Loop repeats indefinitely
> 
> I think this is a valid issue. And it's worse than that, during the `while (index < end)` loop a new large entry can land anywhere in the range, if one interaction's starting `index` points to the middle of any large entry, an infinite loop will occur: indices[0] are always equal to the `index` iteration value of that moments, shmem_free_swap will fail because the swap entry's index doesn't match indices[0], and so the `index = indices[i]; break;` keep it loop forever.
> 
> The chance seems very low though.
> 
>> Should the boundary check also handle the START case, perhaps:
>>
>>      if (order > 0) {
>>          pgoff_t base = indices[i] & ~((1UL << order) - 1);
>>          if (base + (1 << order) - 1 > end || base < start)
>>              continue;
>>      }
> 
> This still doesn't cover the case when a new large entry somehow lands in the range during the loop.
> 
>> where 'start' is preserved from before the loop?
> 
> How about following patch:
> 
>  From 863f38c757ee0898b6b7f0f8c695f551a1380ce8 Mon Sep 17 00:00:00 2001
> From: Kairui Song <kasong@...cent.com>
> Date: Thu, 29 Jan 2026 00:19:23 +0800
> Subject: [PATCH] mm, shmem: prevent infinite loop on truncate race
> 
> When truncating a large swap entry, shmem_free_swap() returns 0 when the
> entry's index doesn't match the given index due to lookup alignment. The
> failure fallback path checks if the entry crosses the end border and
> aborts when it happens, so truncate won't erase an unexpected entry or
> range. But one scenario was ignored.
> 
> When `index` points to the middle of a large swap entry, and the large
> swap entry doesn't go across the end border, find_get_entries() will
> return that large swap entry as the first item in the batch with
> `indices[0]` equal to `index`. The entry's base index will be smaller
> than `indices[0]`, so shmem_free_swap() will fail and return 0 due to
> the "base < index" check. The code will then call shmem_confirm_swap(),
> get the order, check if it crosses the END boundary (which it doesn't),
> and retry with the same index.
> 
> The next iteration will find the same entry again at the same index with
> same indices, leading to an infinite loop.
> 
> Fix this by retrying with a round-down index, and abort if the index is
> smaller than the truncate range.
> 
> Reported-by: Chris Mason <clm@...a.com>
> Closes: https://lore.kernel.org/linux-mm/20260128130336.727049-1-clm@meta.com/
> Fixes: 809bc86517cc ("mm: shmem: support large folio swap out")
> Fixes: 8a1968bd997f ("mm/shmem, swap: fix race of truncate and swap entry split")
> Signed-off-by: Kairui Song <kasong@...cent.com>
> ---

Thanks. The fix looks good to me.
Reviewed-by: Baolin Wang <baolin.wang@...ux.alibaba.com>

(BTW, I think we can simplify the logic by moving the boundary 
validation into shmem_free_swap() in the future).

>   mm/shmem.c | 23 ++++++++++++++---------
>   1 file changed, 14 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index b9ddd38621a0..fe3719eb5a3c 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1211,17 +1211,22 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, uoff_t lend,
>   				swaps_freed = shmem_free_swap(mapping, indices[i],
>   							      end - 1, folio);
>   				if (!swaps_freed) {
> -					/*
> -					 * If found a large swap entry cross the end border,
> -					 * skip it as the truncate_inode_partial_folio above
> -					 * should have at least zerod its content once.
> -					 */
> +					pgoff_t base = indices[i];
> +
>   					order = shmem_confirm_swap(mapping, indices[i],
>   								   radix_to_swp_entry(folio));
> -					if (order > 0 && indices[i] + (1 << order) > end)
> -						continue;
> -					/* Swap was replaced by page: retry */
> -					index = indices[i];
> +					/*
> +					 * If found a large swap entry cross the end or start
> +					 * border, skip it as the truncate_inode_partial_folio
> +					 * above should have at least zerod its content once.
> +					 */
> +					if (order > 0) {
> +						base = round_down(base, 1 << order);
> +						if (base < start || base + (1 << order) > end)
> +							continue;
> +					}
> +					/* Swap was replaced by page or extended, retry */
> +					index = base;
>   					break;
>   				}
>   				nr_swaps_freed += swaps_freed;


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ