Message-ID: <c50aeb15-d0a1-4eaf-9d14-05c4f2a9f2aa@linux.alibaba.com>
Date: Tue, 25 Feb 2025 09:07:57 +0800
From: Baolin Wang <baolin.wang@...ux.alibaba.com>
To: Kairui Song <ryncsn@...il.com>
Cc: akpm@...ux-foundation.org, alex_y_xu@...oo.ca, baohua@...nel.org,
da.gomez@...sung.com, david@...hat.com, hughd@...gle.com,
ioworker0@...il.com, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
ryan.roberts@....com, wangkefeng.wang@...wei.com, willy@...radead.org,
ziy@...dia.com
Subject: Re: [PATCH] mm: shmem: fix potential data corruption during shmem swapin
On 2025/2/25 01:50, Kairui Song wrote:
> On Mon, Feb 24, 2025 at 4:47 PM Baolin Wang
> <baolin.wang@...ux.alibaba.com> wrote:
>>
>> Alex and Kairui reported some issues (system hang or data corruption) when
>> swapping out or swapping in large shmem folios. This is especially easy to
>> reproduce when the tmpfs is mounted with the 'huge=within_size' parameter.
>> Thanks to Kairui's reproducer, the issue can be easily replicated.
>>
>> The root cause of the problem is that swap readahead may asynchronously
>> swap in order 0 folios into the swap cache, while the shmem mapping can
>> still store large swap entries. Then an order 0 folio is inserted into
>> the shmem mapping without splitting the large swap entry, which overwrites
>> the original large swap entry, leading to data corruption.
>>
>> When getting a folio from the swap cache, we should split the large swap
>> entry stored in the shmem mapping if the orders do not match, to fix this
>> issue.
>>
>> Fixes: 809bc86517cc ("mm: shmem: support large folio swap out")
>> Reported-by: Alex Xu (Hello71) <alex_y_xu@...oo.ca>
>> Reported-by: Kairui Song <ryncsn@...il.com>
>
> Maybe you can add a Closes:?
Yes. Hope Andrew can help add this:
Closes: https://lore.kernel.org/all/1738717785.im3r5g2vxc.none@localhost/
>> Signed-off-by: Baolin Wang <baolin.wang@...ux.alibaba.com>
>> ---
>> mm/shmem.c | 31 +++++++++++++++++++++++++++----
>> 1 file changed, 27 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 4ea6109a8043..cebbac97a221 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -2253,7 +2253,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
>> struct folio *folio = NULL;
>> bool skip_swapcache = false;
>> swp_entry_t swap;
>> - int error, nr_pages;
>> + int error, nr_pages, order, split_order;
>>
>> VM_BUG_ON(!*foliop || !xa_is_value(*foliop));
>> swap = radix_to_swp_entry(*foliop);
>> @@ -2272,10 +2272,9 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
>>
>> /* Look it up and read it in.. */
>> folio = swap_cache_get_folio(swap, NULL, 0);
>> + order = xa_get_order(&mapping->i_pages, index);
>> if (!folio) {
>> - int order = xa_get_order(&mapping->i_pages, index);
>> bool fallback_order0 = false;
>> - int split_order;
>>
>> /* Or update major stats only when swapin succeeds?? */
>> if (fault_type) {
>> @@ -2339,6 +2338,29 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
>> error = -ENOMEM;
>> goto failed;
>> }
>> + } else if (order != folio_order(folio)) {
>> + /*
>> + * Swap readahead may swap in order 0 folios into swapcache
>> + * asynchronously, while the shmem mapping can still store
>> + * large swap entries. In such cases, we should split the
>> + * large swap entry to prevent possible data corruption.
>> + */
>> + split_order = shmem_split_large_entry(inode, index, swap, gfp);
>> + if (split_order < 0) {
>> + error = split_order;
>> + goto failed;
>> + }
>> +
>> + /*
>> + * If the large swap entry has already been split, it is
>> + * necessary to recalculate the new swap entry based on
>> + * the old order alignment.
>> + */
>> + if (split_order > 0) {
>> + pgoff_t offset = index - round_down(index, 1 << split_order);
>> +
>> + swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
>> + }
>> }
>>
>> alloced:
>> @@ -2346,7 +2368,8 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
>> folio_lock(folio);
>> if ((!skip_swapcache && !folio_test_swapcache(folio)) ||
>> folio->swap.val != swap.val ||
>> - !shmem_confirm_swap(mapping, index, swap)) {
>> + !shmem_confirm_swap(mapping, index, swap) ||
>> + xa_get_order(&mapping->i_pages, index) != folio_order(folio)) {
>> error = -EEXIST;
>> goto unlock;
>> }
>> --
>> 2.43.5
>>
>
> Thanks for the fix, it works for me.
>
> Tested-by: Kairui Song <kasong@...cent.com>
Thanks for testing :)
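
For anyone following the thread, the swap entry recalculation done after the split can be illustrated outside the kernel. The following is only a user-space sketch of the arithmetic in the patch: struct demo_swp_entry, demo_round_down() and demo_sub_entry() are hypothetical stand-ins for swp_entry_t, round_down() and the swp_entry()/swp_type()/swp_offset() calls, not the kernel definitions.

```c
#include <stdio.h>

/* Hypothetical user-space stand-in for the kernel's swp_entry_t. */
struct demo_swp_entry {
	unsigned int type;	/* which swap device */
	unsigned long offset;	/* slot within that device */
};

/* Round x down to a multiple of align; align must be a power of two. */
static unsigned long demo_round_down(unsigned long x, unsigned long align)
{
	return x & ~(align - 1);
}

/*
 * 'base' is the swap entry that covered the whole large entry of order
 * 'split_order'. Return the entry for the single page at 'index', which
 * mirrors the "swp_offset(swap) + offset" step in the patch.
 */
static struct demo_swp_entry demo_sub_entry(struct demo_swp_entry base,
					    unsigned long index,
					    int split_order)
{
	unsigned long offset = index - demo_round_down(index, 1UL << split_order);

	base.offset += offset;
	return base;
}
```

For example, with a base entry at offset 1024, index 37 and split_order 4, the page sits 5 slots into its 16-page aligned range (37 - 32), so demo_sub_entry() returns offset 1029 with the type unchanged.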