[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <85804484-9973-41a1-a05d-000833285f39@gmail.com>
Date: Thu, 13 Jun 2024 12:37:44 +0100
From: Usama Arif <usamaarif642@...il.com>
To: Yosry Ahmed <yosryahmed@...gle.com>
Cc: akpm@...ux-foundation.org, hannes@...xchg.org, shakeel.butt@...ux.dev,
david@...hat.com, ying.huang@...el.com, hughd@...gle.com,
willy@...radead.org, nphamcs@...il.com, chengming.zhou@...ux.dev,
linux-mm@...ck.org, linux-kernel@...r.kernel.org, kernel-team@...a.com
Subject: Re: [PATCH v4 1/2] mm: store zero pages to be swapped out in a bitmap
On 12/06/2024 21:13, Yosry Ahmed wrote:
> On Wed, Jun 12, 2024 at 01:43:35PM +0100, Usama Arif wrote:
> [..]
>
> Hi Usama,
>
> A few more comments/questions, sorry for not looking closely earlier.
No worries, Thanks for the reviews!
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index f1e559e216bd..48d8dca0b94b 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -453,6 +453,8 @@ static unsigned int cluster_list_del_first(struct swap_cluster_list *list,
>> static void swap_cluster_schedule_discard(struct swap_info_struct *si,
>> unsigned int idx)
>> {
>> + unsigned int i;
>> +
>> /*
>> * If scan_swap_map_slots() can't find a free cluster, it will check
>> * si->swap_map directly. To make sure the discarding cluster isn't
>> @@ -461,6 +463,13 @@ static void swap_cluster_schedule_discard(struct swap_info_struct *si,
>> */
>> memset(si->swap_map + idx * SWAPFILE_CLUSTER,
>> SWAP_MAP_BAD, SWAPFILE_CLUSTER);
>> + /*
>> + * zeromap can see updates from concurrent swap_writepage() and swap_read_folio()
>> + * call on other slots, hence use atomic clear_bit for zeromap instead of the
>> + * non-atomic bitmap_clear.
>> + */
> I don't think this is accurate. swap_read_folio() does not update the
> zeromap. I think the need for an atomic operation here is because we may
> be updating adjacent bits simulatenously, so we may cause lost updates
> otherwise (i.e. corrupting adjacent bits).
Thanks, will change to "Use atomic clear_bit instead of non-atomic
bitmap_clear to prevent adjacent bits corruption due to simultaneous
writes." in the next revision
>
>> + for (i = 0; i < SWAPFILE_CLUSTER; i++)
>> + clear_bit(idx * SWAPFILE_CLUSTER + i, si->zeromap);
> Could you explain why we need to clear the zeromap here?
>
> swap_cluster_schedule_discard() is called from:
> - swap_free_cluster() -> free_cluster()
>
> This is already covered below.
>
> - swap_entry_free() -> dec_cluster_info_page() -> free_cluster()
>
> Each entry in the cluster should have its zeromap bit cleared in
> swap_entry_free() before the entire cluster is free and we call
> free_cluster().
>
> Am I missing something?
Yes, it looks like this one is not needed as swap_entry_free and
swap_free_cluster would already have cleared the bit. Will remove it.
I had initially started checking what codepaths zeromap would need to be
cleared. But then thought I could do it wherever si->swap_map is cleared
or set to SWAP_MAP_BAD, which is why I added it here.
>>
>> cluster_list_add_tail(&si->discard_clusters, si->cluster_info, idx);
>>
>> @@ -482,7 +491,7 @@ static void __free_cluster(struct swap_info_struct *si, unsigned long idx)
>> static void swap_do_scheduled_discard(struct swap_info_struct *si)
>> {
>> struct swap_cluster_info *info, *ci;
>> - unsigned int idx;
>> + unsigned int idx, i;
>>
>> info = si->cluster_info;
>>
>> @@ -498,6 +507,8 @@ static void swap_do_scheduled_discard(struct swap_info_struct *si)
>> __free_cluster(si, idx);
>> memset(si->swap_map + idx * SWAPFILE_CLUSTER,
>> 0, SWAPFILE_CLUSTER);
>> + for (i = 0; i < SWAPFILE_CLUSTER; i++)
>> + clear_bit(idx * SWAPFILE_CLUSTER + i, si->zeromap);
> Same here. I didn't look into the specific code paths, but shouldn't the
> cluster be unused (and hence its zeromap bits already cleared?).
>
I think this one is needed (or atleast very good to have). There are 2
paths:
1) swap_cluster_schedule_discard (clears zeromap) -> swap_discard_work
-> swap_do_scheduled_discard (clears zeromap)
Path 1 doesnt need it as swap_cluster_schedule_discard already clears it.
2) scan_swap_map_slots -> scan_swap_map_try_ssd_cluster ->
swap_do_scheduled_discard (clears zeromap)
Path 2 might need it as zeromap isnt cleared earlier I believe
(eventhough I think it might already be 0).
Even if its cleared in path 2, I think its good to keep this one, as the
function is swap_do_scheduled_discard, i.e. incase it gets directly
called or si->discard_work gets scheduled anywhere else in the future,
it should do as the function name suggests, i.e. swap discard(clear
zeromap).
>> unlock_cluster(ci);
>> }
>> }
>> @@ -1059,9 +1070,12 @@ static void swap_free_cluster(struct swap_info_struct *si, unsigned long idx)
>> {
>> unsigned long offset = idx * SWAPFILE_CLUSTER;
>> struct swap_cluster_info *ci;
>> + unsigned int i;
>>
>> ci = lock_cluster(si, offset);
>> memset(si->swap_map + offset, 0, SWAPFILE_CLUSTER);
>> + for (i = 0; i < SWAPFILE_CLUSTER; i++)
>> + clear_bit(offset + i, si->zeromap);
>> cluster_set_count_flag(ci, 0, 0);
>> free_cluster(si, idx);
>> unlock_cluster(ci);
>> @@ -1336,6 +1350,7 @@ static void swap_entry_free(struct swap_info_struct *p, swp_entry_t entry)
>> count = p->swap_map[offset];
>> VM_BUG_ON(count != SWAP_HAS_CACHE);
>> p->swap_map[offset] = 0;
>> + clear_bit(offset, p->zeromap);
> I think instead of clearing the zeromap in swap_free_cluster() and here
> separately, we can just do it in swap_range_free(). I suspect this may
> be the only place we really need to clear the zero in the swapfile code.
Sure, we could move it to swap_range_free, but then also move the
clearing of swap_map.
When it comes to clearing zeromap, I think its just generally a good
idea to clear it wherever swap_map is cleared.
So the diff over v4 looks like below (should address all comments but
not remove it from swap_do_scheduled_discard, and
move si->swap_map/zeromap clearing from
swap_free_cluster/swap_entry_free to swap_range_free):
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 48d8dca0b94b..39cad0d09525 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -463,13 +463,6 @@ static void swap_cluster_schedule_discard(struct
swap_info_struct *si,
*/
memset(si->swap_map + idx * SWAPFILE_CLUSTER,
SWAP_MAP_BAD, SWAPFILE_CLUSTER);
- /*
- * zeromap can see updates from concurrent swap_writepage() and
swap_read_folio()
- * call on other slots, hence use atomic clear_bit for zeromap
instead of the
- * non-atomic bitmap_clear.
- */
- for (i = 0; i < SWAPFILE_CLUSTER; i++)
- clear_bit(idx * SWAPFILE_CLUSTER + i, si->zeromap);
cluster_list_add_tail(&si->discard_clusters, si->cluster_info,
idx);
@@ -758,6 +751,15 @@ static void swap_range_free(struct swap_info_struct
*si, unsigned long offset,
unsigned long begin = offset;
unsigned long end = offset + nr_entries - 1;
void (*swap_slot_free_notify)(struct block_device *, unsigned
long);
+ unsigned int i;
+
+ memset(si->swap_map + offset, 0, nr_entries);
+ /*
+ * Use atomic clear_bit operations only on zeromap instead of
non-atomic
+ * bitmap_clear to prevent adjacent bits corruption due to
simultaneous writes.
+ */
+ for (i = 0; i < nr_entries; i++)
+ clear_bit(offset + i, si->zeromap);
if (offset < si->lowest_bit)
si->lowest_bit = offset;
@@ -1070,12 +1072,8 @@ static void swap_free_cluster(struct
swap_info_struct *si, unsigned long idx)
{
unsigned long offset = idx * SWAPFILE_CLUSTER;
struct swap_cluster_info *ci;
- unsigned int i;
ci = lock_cluster(si, offset);
- memset(si->swap_map + offset, 0, SWAPFILE_CLUSTER);
- for (i = 0; i < SWAPFILE_CLUSTER; i++)
- clear_bit(offset + i, si->zeromap);
cluster_set_count_flag(ci, 0, 0);
free_cluster(si, idx);
unlock_cluster(ci);
@@ -1349,8 +1347,6 @@ static void swap_entry_free(struct
swap_info_struct *p, swp_entry_t entry)
ci = lock_cluster(p, offset);
count = p->swap_map[offset];
VM_BUG_ON(count != SWAP_HAS_CACHE);
- p->swap_map[offset] = 0;
- clear_bit(offset, p->zeromap);
dec_cluster_info_page(p, p->cluster_info, offset);
unlock_cluster(ci);
Powered by blists - more mailing lists