Message-ID: <e836546e-5146-40fe-5515-0a185b72bdb2@huaweicloud.com>
Date: Wed, 11 Jun 2025 15:54:21 +0800
From: Kemeng Shi <shikemeng@...weicloud.com>
To: Kairui Song <ryncsn@...il.com>
Cc: akpm@...ux-foundation.org, bhe@...hat.com, hannes@...xchg.org,
 linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/4] mm: swap: correctly use maxpages in swapon syscall to
 avoid potensial deadloop



on 5/26/2025 1:08 AM, Kairui Song wrote:
> On Thu, May 22, 2025 at 11:32 AM Kemeng Shi <shikemeng@...weicloud.com> wrote:
>>
>> We use maxpages from read_swap_header() to initialize swap_info_struct,
>> however the maxpages might be reduced in setup_swap_extents() and the
>> si->max is assigned with the reduced maxpages from the
>> setup_swap_extents().
>>
>> Obviously, this could lead to memory waste as we allocated memory based on
>> larger maxpages, besides, this could lead to a potensial deadloop as
>> following:
>> 1) When calling setup_clusters() with larger maxpages, unavailable pages
>> within range [si->max, larger maxpages) are not accounted with
>> inc_cluster_info_page(). As a result, these pages are assumed available
>> but can not be allocated. The cluster contains these pages can be moved
>> to frag_clusters list after it's all available pages were allocated.
>> 2) When the cluster mentioned in 1) is the only cluster in frag_clusters
>> list, cluster_alloc_swap_entry() assume order 0 allocation will never
>> failed and will enter a deadloop by keep trying to allocate page from the
>> only cluster in frag_clusters which contains no actually available page.
>>
>> Call setup_swap_extents() to get the final maxpages before swap_info_struct
>> initialization to fix the issue.
>>
>> Fixes: 661383c6111a3 ("mm: swap: relaim the cached parts that got scanned")
>>
>> Signed-off-by: Kemeng Shi <shikemeng@...weicloud.com>
>> ---
>>  mm/swapfile.c | 47 ++++++++++++++++++++---------------------------
>>  1 file changed, 20 insertions(+), 27 deletions(-)
>>
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index 75b69213c2e7..a82f4ebefca3 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -3141,43 +3141,30 @@ static unsigned long read_swap_header(struct swap_info_struct *si,
>>         return maxpages;
>>  }
>>
>> -static int setup_swap_map_and_extents(struct swap_info_struct *si,
>> -                                       union swap_header *swap_header,
>> -                                       unsigned char *swap_map,
>> -                                       unsigned long maxpages,
>> -                                       sector_t *span)
>> +static int setup_swap_map(struct swap_info_struct *si,
>> +                         union swap_header *swap_header,
>> +                         unsigned char *swap_map,
>> +                         unsigned long maxpages)
>>  {
>> -       unsigned int nr_good_pages;
>>         unsigned long i;
>> -       int nr_extents;
>> -
>> -       nr_good_pages = maxpages - 1;   /* omit header page */
>>
>> +       swap_map[0] = SWAP_MAP_BAD; /* omit header page */
>>         for (i = 0; i < swap_header->info.nr_badpages; i++) {
>>                 unsigned int page_nr = swap_header->info.badpages[i];
>>                 if (page_nr == 0 || page_nr > swap_header->info.last_page)
>>                         return -EINVAL;
>>                 if (page_nr < maxpages) {
>>                         swap_map[page_nr] = SWAP_MAP_BAD;
>> -                       nr_good_pages--;
>> +                       si->pages--;
>>                 }
>>         }
>>
>> -       if (nr_good_pages) {
>> -               swap_map[0] = SWAP_MAP_BAD;
>> -               si->max = maxpages;
>> -               si->pages = nr_good_pages;
>> -               nr_extents = setup_swap_extents(si, span);
>> -               if (nr_extents < 0)
>> -                       return nr_extents;
>> -               nr_good_pages = si->pages;
>> -       }
>> -       if (!nr_good_pages) {
>> +       if (!si->pages) {
>>                 pr_warn("Empty swap-file\n");
>>                 return -EINVAL;
>>         }
>>
>>
>> -       return nr_extents;
>> +       return 0;
>>  }
>>
>>  #define SWAP_CLUSTER_INFO_COLS                                         \
>> @@ -3217,7 +3204,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si,
>>          * Mark unusable pages as unavailable. The clusters aren't
>>          * marked free yet, so no list operations are involved yet.
>>          *
>> -        * See setup_swap_map_and_extents(): header page, bad pages,
>> +        * See setup_swap_map(): header page, bad pages,
>>          * and the EOF part of the last cluster.
>>          */
>>         inc_cluster_info_page(si, cluster_info, 0);
>> @@ -3354,6 +3341,15 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
>>                 goto bad_swap_unlock_inode;
>>         }
>>
>> +       si->max = maxpages;
>> +       si->pages = maxpages - 1;
>> +       nr_extents = setup_swap_extents(si, &span);
>> +       if (nr_extents < 0) {
>> +               error = nr_extents;
>> +               goto bad_swap_unlock_inode;
>> +       }
>> +       maxpages = si->max;
> 
Hello,
> There seems to be a trivial problem here, previously the si->pages
> will be seen by swap_activate after bad blocks have been counted and
> si->pages means the actual available slots. But now si->pages will be
> seen by swap_active as `maxpages - 1`.
> 
> One current side effect now is the span value will not be updated
> properly so the pr_info in swap on may print a larger value, if the
> swap header contains badblocks and swapfile is on nfs/cifs.
Thanks for pointing this out. But I think the larger value is actually
the correct result.
In summary, there are two kinds of swapfile_activate operations.
1. Filesystem style: Treat all blocks as logically contiguous and find
usable physical extents within the logical range. In this case,
si->pages will be the number of actually usable physical blocks and
span will be "1 + highest_block - lowest_block".
2. Block device style: Treat all blocks as physically contiguous and
add only one single extent. In this case, si->pages will be
si->max and span will be "si->pages - 1".
Actually, si->pages and si->max are only used in block device style,
and the span value is set from si->pages. As a result, the span value
in block device style will become larger, as you mentioned.
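
To make the two cases concrete, here is a rough userspace sketch (not
kernel code). The struct and function names are hypothetical and exist
only for illustration. It models the filesystem-style span formula
quoted above and the value of si->pages that the activate step sees
before and after this patch, as read from the quoted diff. Extents are
assumed to hold at least one block each.

```c
#include <stddef.h>

/* Hypothetical model of one physical extent; not a kernel structure. */
struct extent_model {
	unsigned long start_block;	/* first physical block of the extent */
	unsigned long nr_blocks;	/* number of blocks, assumed >= 1 */
};

/* Filesystem style: span = 1 + highest_block - lowest_block. */
unsigned long fs_style_span(const struct extent_model *ext, size_t n)
{
	unsigned long lowest, highest;
	size_t i;

	if (n == 0)
		return 0;

	lowest = ext[0].start_block;
	highest = ext[0].start_block + ext[0].nr_blocks - 1;
	for (i = 1; i < n; i++) {
		unsigned long first = ext[i].start_block;
		unsigned long last = first + ext[i].nr_blocks - 1;

		if (first < lowest)
			lowest = first;
		if (last > highest)
			highest = last;
	}
	return 1 + highest - lowest;
}

/*
 * si->pages as seen by the activate step before the patch: the header
 * page and every bad page below maxpages were already subtracted.
 */
unsigned long pages_seen_before_patch(unsigned long maxpages,
				      unsigned long nr_bad)
{
	return maxpages - 1 - nr_bad;
}

/*
 * si->pages as seen by the activate step after the patch: only the
 * header page is subtracted, bad pages are handled later.
 */
unsigned long pages_seen_after_patch(unsigned long maxpages)
{
	return maxpages - 1;
}
```

With maxpages = 1024 and 3 bad pages, the value seen at activation time
grows from 1020 to 1023, which is the larger span discussed here.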

I think the larger value is correct because:
1. The span value in filesystem style is "1 + highest_block -
lowest_block", which is the range covering all possible physical
blocks, including the bad blocks.
2. For block device style, si->pages is the actual number of usable
blocks and is already printed in pr_info. The original span value
before this patch also refers to the number of usable blocks, which
is redundant in pr_info.
> 
> This should not be a problem but it's better to mention or add
> comments about it
I'd like to mention this change as a fix in the changelog in the next version.
> 
> And I think it's better to add a sanity check here to check if
> si->pages still equal to si->max - 1,  setup_swap_map_and_extents /
> setup_swap_map assumes the header section was already counted. This
> also helps indicate the setup_swap_extents may shrink and modify these
> two values.
Sure, will add this in the next version.
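
As a rough idea of the invariant being asked for, here is a userspace
sketch (not the actual patch). The struct and helper names are
hypothetical, and how the kernel would report a violation is not
decided in this thread. It only checks that, after
setup_swap_extents() returns, exactly the header page has been
subtracted from si->pages.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two swap_info_struct fields involved. */
struct swap_info_model {
	unsigned long max;	/* total page slots, including the header */
	unsigned long pages;	/* usable pages counted so far */
};

/*
 * True when only the header page has been accounted for, i.e.
 * pages == max - 1. This is what setup_swap_map() assumes before it
 * starts subtracting bad pages.
 */
bool header_only_accounted(const struct swap_info_model *si)
{
	return si->max > 0 && si->pages == si->max - 1;
}
```

If setup_swap_extents() shrinks both values together (for example
from 1024/1023 to 512/511), the check still holds. It fails only when
the two values go out of step.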
> 
> BTW, I was thinking that we should get rid of the whole extents design
> after the swap table series is ready, so mTHP allocation will be
> usable for swap over fs too.
I also noticed this limitation but have not taken a deep look. Looking
forward to your solution in the future.
> 
>>         /* OK, set up the swap map and apply the bad block list */
>>         swap_map = vzalloc(maxpages);
>>         if (!swap_map) {
>> @@ -3365,12 +3361,9 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
>>         if (error)
>>                 goto bad_swap_unlock_inode;
>>
>> -       nr_extents = setup_swap_map_and_extents(si, swap_header, swap_map,
>> -                                               maxpages, &span);
>> -       if (unlikely(nr_extents < 0)) {
>> -               error = nr_extents;
>> +       error = setup_swap_map(si, swap_header, swap_map, maxpages);
>> +       if (error)
>>                 goto bad_swap_unlock_inode;
>> -       }
>>
>>         /*
>>          * Use kvmalloc_array instead of bitmap_zalloc as the allocation order might
>> --
>> 2.30.0
>>
> 
> Other than that:
> 
> Reviewed-by: Kairui Song <kasong@...cent.com>
> 

