Message-ID: <85804484-9973-41a1-a05d-000833285f39@gmail.com>
Date: Thu, 13 Jun 2024 12:37:44 +0100
From: Usama Arif <usamaarif642@...il.com>
To: Yosry Ahmed <yosryahmed@...gle.com>
Cc: akpm@...ux-foundation.org, hannes@...xchg.org, shakeel.butt@...ux.dev,
 david@...hat.com, ying.huang@...el.com, hughd@...gle.com,
 willy@...radead.org, nphamcs@...il.com, chengming.zhou@...ux.dev,
 linux-mm@...ck.org, linux-kernel@...r.kernel.org, kernel-team@...a.com
Subject: Re: [PATCH v4 1/2] mm: store zero pages to be swapped out in a bitmap


On 12/06/2024 21:13, Yosry Ahmed wrote:
> On Wed, Jun 12, 2024 at 01:43:35PM +0100, Usama Arif wrote:
> [..]
>
> Hi Usama,
>
> A few more comments/questions, sorry for not looking closely earlier.
No worries, thanks for the reviews!
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index f1e559e216bd..48d8dca0b94b 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -453,6 +453,8 @@ static unsigned int cluster_list_del_first(struct swap_cluster_list *list,
>>   static void swap_cluster_schedule_discard(struct swap_info_struct *si,
>>   		unsigned int idx)
>>   {
>> +	unsigned int i;
>> +
>>   	/*
>>   	 * If scan_swap_map_slots() can't find a free cluster, it will check
>>   	 * si->swap_map directly. To make sure the discarding cluster isn't
>> @@ -461,6 +463,13 @@ static void swap_cluster_schedule_discard(struct swap_info_struct *si,
>>   	 */
>>   	memset(si->swap_map + idx * SWAPFILE_CLUSTER,
>>   			SWAP_MAP_BAD, SWAPFILE_CLUSTER);
>> +	/*
>> +	 * zeromap can see updates from concurrent swap_writepage() and swap_read_folio()
>> +	 * call on other slots, hence use atomic clear_bit for zeromap instead of the
>> +	 * non-atomic bitmap_clear.
>> +	 */
> I don't think this is accurate. swap_read_folio() does not update the
> zeromap. I think the need for an atomic operation here is because we may
> be updating adjacent bits simultaneously, so we may cause lost updates
> otherwise (i.e. corrupting adjacent bits).


Thanks, I will change the comment to "Use atomic clear_bit instead of
non-atomic bitmap_clear to prevent adjacent bits corruption due to
simultaneous writes." in the next revision.
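
To illustrate the lost-update concern, here is a minimal userspace sketch
using C11 atomics. It is not kernel code: the model_* helpers are
hypothetical stand-ins for the kernel's bit operations, and the non-atomic
variant is only an assumption about how a plain load/modify/store behaves.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical userspace stand-in for an atomic clear_bit(). */
static void model_clear_bit(size_t nr, _Atomic unsigned long *map)
{
	unsigned long mask = 1UL << (nr % BITS_PER_WORD);

	assert(map != NULL);
	/*
	 * One atomic read-modify-write on the containing word, so a
	 * concurrent update to a neighbouring bit is never overwritten.
	 */
	atomic_fetch_and(&map[nr / BITS_PER_WORD], ~mask);
}

/* Hypothetical userspace stand-in for an atomic set_bit(). */
static void model_set_bit(size_t nr, _Atomic unsigned long *map)
{
	assert(map != NULL);
	atomic_fetch_or(&map[nr / BITS_PER_WORD], 1UL << (nr % BITS_PER_WORD));
}

static int model_test_bit(size_t nr, _Atomic unsigned long *map)
{
	assert(map != NULL);
	return (atomic_load(&map[nr / BITS_PER_WORD]) >> (nr % BITS_PER_WORD)) & 1UL;
}

/*
 * Non-atomic variant, shown for contrast only. Between the load and the
 * store another thread may change a different bit of the same word, and
 * the store below would then silently undo that change.
 */
static void model_clear_bit_nonatomic(size_t nr, unsigned long *map)
{
	unsigned long word;

	assert(map != NULL);
	word = map[nr / BITS_PER_WORD];			/* load */
	word &= ~(1UL << (nr % BITS_PER_WORD));		/* modify */
	map[nr / BITS_PER_WORD] = word;			/* store */
}
```

Single-threaded, both variants give the same result. The difference only
shows up when two CPUs touch bits that share a word, which is the case
described above.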

>
>> +	for (i = 0; i < SWAPFILE_CLUSTER; i++)
>> +		clear_bit(idx * SWAPFILE_CLUSTER + i, si->zeromap);
> Could you explain why we need to clear the zeromap here?
>
> swap_cluster_schedule_discard() is called from:
> - swap_free_cluster() -> free_cluster()
>
> This is already covered below.
>
> - swap_entry_free() -> dec_cluster_info_page() -> free_cluster()
>
> Each entry in the cluster should have its zeromap bit cleared in
> swap_entry_free() before the entire cluster is free and we call
> free_cluster().
>
> Am I missing something?

You're right, it looks like this one is not needed, as swap_entry_free()
and swap_free_cluster() would already have cleared the bit. Will remove it.

I had initially started checking which code paths would need to clear
zeromap, but then thought I could do it wherever si->swap_map is cleared
or set to SWAP_MAP_BAD, which is why I added it here.

>>   
>>   	cluster_list_add_tail(&si->discard_clusters, si->cluster_info, idx);
>>   
>> @@ -482,7 +491,7 @@ static void __free_cluster(struct swap_info_struct *si, unsigned long idx)
>>   static void swap_do_scheduled_discard(struct swap_info_struct *si)
>>   {
>>   	struct swap_cluster_info *info, *ci;
>> -	unsigned int idx;
>> +	unsigned int idx, i;
>>   
>>   	info = si->cluster_info;
>>   
>> @@ -498,6 +507,8 @@ static void swap_do_scheduled_discard(struct swap_info_struct *si)
>>   		__free_cluster(si, idx);
>>   		memset(si->swap_map + idx * SWAPFILE_CLUSTER,
>>   				0, SWAPFILE_CLUSTER);
>> +		for (i = 0; i < SWAPFILE_CLUSTER; i++)
>> +			clear_bit(idx * SWAPFILE_CLUSTER + i, si->zeromap);
> Same here. I didn't look into the specific code paths, but shouldn't the
> cluster be unused (and hence its zeromap bits already cleared?).
>
I think this one is needed (or at least very good to have). There are 2
paths:

1) swap_cluster_schedule_discard (clears zeromap) -> swap_discard_work
-> swap_do_scheduled_discard (clears zeromap)

Path 1 doesn't need it, as swap_cluster_schedule_discard already clears it.

2) scan_swap_map_slots -> scan_swap_map_try_ssd_cluster ->
swap_do_scheduled_discard (clears zeromap)

Path 2 might need it, as I believe zeromap isn't cleared earlier
(even though I think it might already be 0).

Even if it's already cleared in path 2, I think it's good to keep this
one. The function is swap_do_scheduled_discard, so in case it gets called
directly or si->discard_work gets scheduled anywhere else in the future,
it should do what the function name suggests, i.e. a swap discard
(including clearing zeromap).

>>   		unlock_cluster(ci);
>>   	}
>>   }
>> @@ -1059,9 +1070,12 @@ static void swap_free_cluster(struct swap_info_struct *si, unsigned long idx)
>>   {
>>   	unsigned long offset = idx * SWAPFILE_CLUSTER;
>>   	struct swap_cluster_info *ci;
>> +	unsigned int i;
>>   
>>   	ci = lock_cluster(si, offset);
>>   	memset(si->swap_map + offset, 0, SWAPFILE_CLUSTER);
>> +	for (i = 0; i < SWAPFILE_CLUSTER; i++)
>> +		clear_bit(offset + i, si->zeromap);
>>   	cluster_set_count_flag(ci, 0, 0);
>>   	free_cluster(si, idx);
>>   	unlock_cluster(ci);
>> @@ -1336,6 +1350,7 @@ static void swap_entry_free(struct swap_info_struct *p, swp_entry_t entry)
>>   	count = p->swap_map[offset];
>>   	VM_BUG_ON(count != SWAP_HAS_CACHE);
>>   	p->swap_map[offset] = 0;
>> +	clear_bit(offset, p->zeromap);
> I think instead of clearing the zeromap in swap_free_cluster() and here
> separately, we can just do it in swap_range_free(). I suspect this may
> be the only place we really need to clear the zero in the swapfile code.

Sure, we could move it to swap_range_free, but then we should also move
the clearing of swap_map there.

When it comes to clearing zeromap, I think it's generally a good idea to
clear it wherever swap_map is cleared.
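
As a rough illustration of keeping the two in sync, here is a userspace
model with a single helper that clears both structures for a range of
slots. It is a sketch only: struct model_swap_info, MODEL_NR_SLOTS and
model_swap_range_free() are hypothetical names, and C11 atomics stand in
for the kernel's atomic bit operations.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define MODEL_NR_SLOTS		64
#define MODEL_BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define MODEL_NR_WORDS \
	((MODEL_NR_SLOTS + MODEL_BITS_PER_WORD - 1) / MODEL_BITS_PER_WORD)

/* Hypothetical, heavily reduced model of the per-device swap state. */
struct model_swap_info {
	unsigned char swap_map[MODEL_NR_SLOTS];		/* one count per slot */
	_Atomic unsigned long zeromap[MODEL_NR_WORDS];	/* one bit per slot */
};

/*
 * Free a range of slots: reset the per-slot counts and clear the matching
 * zeromap bits in the same place, so the two cannot drift apart.
 */
static void model_swap_range_free(struct model_swap_info *si,
				  size_t offset, size_t nr_entries)
{
	size_t i;

	assert(offset + nr_entries <= MODEL_NR_SLOTS);

	memset(si->swap_map + offset, 0, nr_entries);

	/* Clear bit by bit with an atomic AND on the containing word. */
	for (i = 0; i < nr_entries; i++) {
		size_t nr = offset + i;
		unsigned long mask = 1UL << (nr % MODEL_BITS_PER_WORD);

		atomic_fetch_and(&si->zeromap[nr / MODEL_BITS_PER_WORD], ~mask);
	}
}
```

Slots outside the freed range keep both their count and their zeromap bit.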

So the diff over v4 looks like below. It should address all comments,
except that it keeps the clearing in swap_do_scheduled_discard, and it
moves the si->swap_map/zeromap clearing from
swap_free_cluster/swap_entry_free to swap_range_free:



diff --git a/mm/swapfile.c b/mm/swapfile.c
index 48d8dca0b94b..39cad0d09525 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -463,13 +463,6 @@ static void swap_cluster_schedule_discard(struct swap_info_struct *si,
          */
         memset(si->swap_map + idx * SWAPFILE_CLUSTER,
                         SWAP_MAP_BAD, SWAPFILE_CLUSTER);
-       /*
-        * zeromap can see updates from concurrent swap_writepage() and swap_read_folio()
-        * call on other slots, hence use atomic clear_bit for zeromap instead of the
-        * non-atomic bitmap_clear.
-        */
-       for (i = 0; i < SWAPFILE_CLUSTER; i++)
-               clear_bit(idx * SWAPFILE_CLUSTER + i, si->zeromap);

         cluster_list_add_tail(&si->discard_clusters, si->cluster_info, idx);

@@ -758,6 +751,15 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
         unsigned long begin = offset;
         unsigned long end = offset + nr_entries - 1;
         void (*swap_slot_free_notify)(struct block_device *, unsigned long);
+       unsigned int i;
+
+       memset(si->swap_map + offset, 0, nr_entries);
+       /*
+        * Use atomic clear_bit operations only on zeromap instead of non-atomic
+        * bitmap_clear to prevent adjacent bits corruption due to simultaneous writes.
+        */
+       for (i = 0; i < nr_entries; i++)
+               clear_bit(offset + i, si->zeromap);

         if (offset < si->lowest_bit)
                 si->lowest_bit = offset;
@@ -1070,12 +1072,8 @@ static void swap_free_cluster(struct swap_info_struct *si, unsigned long idx)
  {
         unsigned long offset = idx * SWAPFILE_CLUSTER;
         struct swap_cluster_info *ci;
-       unsigned int i;

         ci = lock_cluster(si, offset);
-       memset(si->swap_map + offset, 0, SWAPFILE_CLUSTER);
-       for (i = 0; i < SWAPFILE_CLUSTER; i++)
-               clear_bit(offset + i, si->zeromap);
         cluster_set_count_flag(ci, 0, 0);
         free_cluster(si, idx);
         unlock_cluster(ci);
@@ -1349,8 +1347,6 @@ static void swap_entry_free(struct swap_info_struct *p, swp_entry_t entry)
         ci = lock_cluster(p, offset);
         count = p->swap_map[offset];
         VM_BUG_ON(count != SWAP_HAS_CACHE);
-       p->swap_map[offset] = 0;
-       clear_bit(offset, p->zeromap);
         dec_cluster_info_page(p, p->cluster_info, offset);
         unlock_cluster(ci);



