
Message-ID: <878r4ftodl.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Wed, 24 Jan 2024 11:13:26 +0800
From: "Huang, Ying" <ying.huang@...el.com>
To: Yosry Ahmed <yosryahmed@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,  Johannes Weiner
 <hannes@...xchg.org>,  Nhat Pham <nphamcs@...il.com>,  Chris Li
 <chrisl@...nel.org>,  Chengming Zhou <zhouchengming@...edance.com>,
  linux-mm@...ck.org,  linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] mm: swap: update inuse_pages after all cleanups are
 done

Yosry Ahmed <yosryahmed@...gle.com> writes:

> On Tue, Jan 23, 2024 at 1:01 AM Huang, Ying <ying.huang@...el.com> wrote:
>>
>> Yosry Ahmed <yosryahmed@...gle.com> writes:
>>
>> > In swap_range_free(), we update inuse_pages then do some cleanups (arch
>> > invalidation, zswap invalidation, swap cache cleanups, etc). During
>> > swapoff, try_to_unuse() uses inuse_pages to make sure all swap entries
>> > are freed. Make sure we only update inuse_pages after we are done with
>> > the cleanups.
>> >
>> > In practice, this shouldn't matter, because swap_range_free() is called
>> > with the swap info lock held, and the swapoff code will spin for that
>> > lock after try_to_unuse() anyway.
>> >
>> > The goal is to make it obvious and more future proof that once
>> > try_to_unuse() returns, all cleanups are done.
>>
>> Please define "all cleanups".  Apparently, some other operations are
>> still to be done after try_to_unuse() in swapoff().
>
> I am referring to the cleanups in swap_range_free() that I mentioned above.
>
> How about s/all the cleanups/all the cleanups in swap_range_free()?

Sounds good to me.

>>
>> > This also facilitates a
>> > following zswap cleanup patch which uses this fact to simplify
>> > zswap_swapoff().
>> >
>> > Signed-off-by: Yosry Ahmed <yosryahmed@...gle.com>
>> > ---
>> >  mm/swapfile.c | 4 ++--
>> >  1 file changed, 2 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/mm/swapfile.c b/mm/swapfile.c
>> > index 556ff7347d5f0..2fedb148b9404 100644
>> > --- a/mm/swapfile.c
>> > +++ b/mm/swapfile.c
>> > @@ -737,8 +737,6 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
>> >               if (was_full && (si->flags & SWP_WRITEOK))
>> >                       add_to_avail_list(si);
>> >       }
>> > -     atomic_long_add(nr_entries, &nr_swap_pages);
>> > -     WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);
>> >       if (si->flags & SWP_BLKDEV)
>> >               swap_slot_free_notify =
>> >                       si->bdev->bd_disk->fops->swap_slot_free_notify;
>> > @@ -752,6 +750,8 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
>> >               offset++;
>> >       }
>> >       clear_shadow_from_swap_cache(si->type, begin, end);
>> > +     atomic_long_add(nr_entries, &nr_swap_pages);
>> > +     WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);
>>
>> This isn't enough.  You need to use smp_wmb() here and smp_rmb()
>> wherever si->inuse_pages is read.
>
> Hmm, good point. Although as I mentioned in the commit message, this
> shouldn't matter today as swap_range_free() executes with the lock
> held, and we spin on the lock after try_to_unuse() returns.

Yes.  IIUC, this patch isn't needed either, because we already have the
spinlock.

> It may still be more future-proof to add the memory barriers.

Yes.  Without memory barriers, moving the code alone doesn't guarantee
memory ordering: the compiler or the CPU may still reorder the stores.
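
To illustrate the pairing, here is a minimal userspace sketch using C11
fences as rough analogues of smp_wmb()/smp_rmb().  It is not kernel code:
struct fake_swap_info, fake_swap_range_free() and fake_check_unused() are
hypothetical names made up for this example, and a C11 release/acquire
fence is somewhat stronger than the kernel's write/read barriers.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the relevant state; not the real swap_info_struct. */
struct fake_swap_info {
	int cleanup_done;        /* plain data written by the "cleanups" */
	atomic_long inuse_pages; /* counter polled by the swapoff side */
};

/* Writer side, modelled on swap_range_free(): clean up first, then publish. */
static void fake_swap_range_free(struct fake_swap_info *si, long nr_entries)
{
	si->cleanup_done = 1; /* stands in for zswap_invalidate() and friends */

	/* Rough analogue of smp_wmb(): order the cleanup before the counter update. */
	atomic_thread_fence(memory_order_release);
	atomic_fetch_sub_explicit(&si->inuse_pages, nr_entries,
				  memory_order_relaxed);
}

/*
 * Reader side, modelled on the check in try_to_unuse().
 * Returns -1 while entries are still in use, otherwise the value of
 * cleanup_done that the reader observes.
 */
static int fake_check_unused(struct fake_swap_info *si)
{
	if (atomic_load_explicit(&si->inuse_pages, memory_order_relaxed) != 0)
		return -1;

	/* Rough analogue of smp_rmb(): later reads are ordered after the counter read. */
	atomic_thread_fence(memory_order_acquire);
	return si->cleanup_done;
}
```

With both fences in place, a reader on another thread that observes
inuse_pages == 0 also observes cleanup_done == 1.  If either fence is
dropped, that is no longer guaranteed, no matter where the counter update
sits in the source.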

> In swap_range_free, we want to make sure that the write to
> si->inuse_pages in swap_range_free() happens *after* the cleanups
> (specifically zswap_invalidate() in this case).
> In swapoff(), we want to make sure that the cleanups following
> try_to_unuse() (e.g. zswap_swapoff) happen *after* reading
> si->inuse_pages == 0 in try_to_unuse().
>
> So I think we want smp_wmb() in swap_range_free() and smp_mb() in
> try_to_unuse(). Does the below look correct to you?
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 2fedb148b9404..a2fa2f65a8ddd 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -750,6 +750,12 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
>                 offset++;
>         }
>         clear_shadow_from_swap_cache(si->type, begin, end);
> +
> +       /*
> +        * Make sure that try_to_unuse() observes si->inuse_pages reaching 0
> +        * only after the above cleanups are done.
> +        */
> +       smp_wmb();
>         atomic_long_add(nr_entries, &nr_swap_pages);
>         WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);
>  }
> @@ -2130,6 +2136,11 @@ static int try_to_unuse(unsigned int type)
>                 return -EINTR;
>         }
>
> +       /*
> +        * Make sure that further cleanups after try_to_unuse() returns happen
> +        * after swap_range_free() reduces si->inuse_pages to 0.
> +        */
> +       smp_mb();
>         return 0;
>  }

We need to take care of the si->inuse_pages check at the beginning of
try_to_unuse() too: that early return must go through the same barrier.
Otherwise, it looks good to me.
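
One possible shape for this, as a userspace sketch only: route both the
early check and the final check through a single exit label that carries
the barrier.  fake_try_to_unuse() and its max_retries parameter are
hypothetical, and atomic_thread_fence(memory_order_seq_cst) is used here
as a rough stand-in for smp_mb().

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical model of try_to_unuse(): returns 0 once the counter reads
 * zero, or -1 if it is still non-zero after max_retries polls.
 */
static int fake_try_to_unuse(atomic_long *inuse_pages, int max_retries)
{
	/* Early check: nothing in use, but still leave through the barrier. */
	if (atomic_load_explicit(inuse_pages, memory_order_relaxed) == 0)
		goto success;

	for (int i = 0; i < max_retries; i++) {
		/* The real function would unuse entries here; this model only polls. */
		if (atomic_load_explicit(inuse_pages, memory_order_relaxed) == 0)
			goto success;
	}
	return -1;

success:
	/* Rough analogue of smp_mb(): the caller's later cleanups are ordered after the read of 0. */
	atomic_thread_fence(memory_order_seq_cst);
	return 0;
}
```

Both paths that return 0 pass the fence, so the caller does not need to
know which check saw the zero.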

> Alternatively, we may just hold the spinlock in try_to_unuse() when we
> check si->inuse_pages at the end. This will also ensure that any calls
> to swap_range_free() have completed. Let me know what you prefer.

Personally, I prefer memory barriers here.

--
Best Regards,
Huang, Ying