[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGsJ_4yMxNsmPJn0W9puKWcQD3T7RDyQ=QmPhAtoq=3_u=m+TQ@mail.gmail.com>
Date: Wed, 25 Sep 2024 14:26:52 +0800
From: Barry Song <21cnbao@...il.com>
To: Yosry Ahmed <yosryahmed@...gle.com>
Cc: Nhat Pham <nphamcs@...il.com>, Baolin Wang <baolin.wang@...ux.alibaba.com>,
akpm@...ux-foundation.org, hannes@...xchg.org, hughd@...gle.com,
shakeel.butt@...ux.dev, ryan.roberts@....com, ying.huang@...el.com,
chrisl@...nel.org, david@...hat.com, kasong@...cent.com, willy@...radead.org,
viro@...iv.linux.org.uk, chengming.zhou@...ux.dev, linux-mm@...ck.org,
kernel-team@...a.com, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 0/2] remove SWAP_MAP_SHMEM
On Wed, Sep 25, 2024 at 2:12 AM Yosry Ahmed <yosryahmed@...gle.com> wrote:
>
> [..]
> > > > > > Apparently __swap_duplicate() does not currently handle increasing the
> > > > > > swap count for multiple swap entries by 1 (i.e. usage == 1) because it
> > > > > > does not handle rolling back count increases when
> > > > > > swap_count_continued() fails.
> > > > > >
> > > > > > I guess this voids my Reviewed-by until we sort this out. Technically
> > > > > > swap_count_continued() won't ever be called for shmem because we only
> > > > > > ever increment the count by 1, but there is no way to know this in
> > > > > > __swap_duplicate() without SWAP_HAS_SHMEM.
> > > >
> > > > Ah this is my bad. I compiled with CONFIG_THP_SWAP, but forgot to
> > > > remove the swapfile check (that's another can of worms, but I need
> > > > data before submitting the patch to remove it...)
> > > >
> > > > One thing we can do is instead of warning here, we can handle it in
> > > > the for loop check, where we have access to count - that's the point
> > > > of having that for-loop check anyway? :)
> > > >
> > > > There's a couple of ways to go about it:
> > > >
> > > > 1. VM_WARN_ON(usage == 1 && nr > 1 && count != 0 );
> > >
> > > Hmm that should work, although it's a bit complicated tbh.
> > >
> > > > (or more accurately, (count & ~COUNT_CONTINUED) >= SWAP_MAP_MAX))
> > >
> > > I think this will make the warning very hard to hit if there's a
> > > misuse of __swap_duplicate(). It will only be hit when an entry needs
> > > count continuation, which I am not sure is very common. If there's a
> > > bug, the warning will potentially catch it too late, if ever.
> > >
> > > The side effect here is failing to decrement the swap count of some
> > > swap entries which will lead to them never being freed, essentially
> > > leaking swap capacity slowly over time. I am not sure if there are
> > > more detrimental effects.
> > >
> > > >
> > > > 2. Alternatively, instead of warning here, we can simply return
> > > > -ENOMEM. Then, at shmem callsite, have a VM_WARN_ON/VM_BUG_ON(), since
> > > > this MUST succeed.
> > >
> > > We still fail to rollback incremented counts though when we return
> > > -ENOMEM, right? Maybe I didn't get what you mean.
> >
> > My understanding now is that there are two for loops. One for loop
> > that checks the entry's states, and one for loop that does the actual
> > incrementing work (or state modification).
> >
> > We can check in the first for loop, if it is safe to proceed:
> >
> > if (!count && !has_cache) {
> > err = -ENOENT;
> > } else if (usage == SWAP_HAS_CACHE) {
> > if (has_cache)
> > err = -EEXIST;
> > } else if ((count & ~COUNT_CONTINUED) > SWAP_MAP_MAX) {
> > err = -EINVAL;
> > } else if (usage == 1 && nr > 1 && (count & ~COUNT_CONTINUED) >=
> > SWAP_MAP_MAX)) {
> > /* the batched variants currently do not support rollback */
> > err = -ENOMEM;
> > }
>
> Hmm yeah I think something like this should work and is arguably
> better than just warning, although this needs cleaning up:
> - We already know usage != SWAP_HAS_CACHE, so no need to check if usage == 1.
> - We already know (count & ~COUNT_CONTINUED) is larger than
> SWAP_MAP_MAX, so we should check if it's equal to SWAP_MAP_MAX.
> - We should probably just calculate count & ~COUNT_CONTINUED above the
> if conditions at this point.
>
> I would also like to hear what Barry thinks since he added this (and I
> just realized he is not CC'd).
I am perfectly fine with the approach, in the first loop, if we find all entries
don't need CONTINUED, we can run the 2nd loop even for usage==1
and nr > 1. this is almost always true for a real product where anon folios
are unlikely to be fork-shared by so many processes.
but we need fall back to iterating nr times if this really happens:
int swap_duplicate_nr(swp_entry_t entry, int nr)
{
....
if (nr > 1 and ENOMEM) {
for(nr entries) {
__swap_duplicate(entry, 1, 1);
entry = next_entry;
}
}
seems a bit ugly?
maybe we can keep the swap_duplicate(swp_entry_t entry)
there? then avoid __swap_duplicate(entry, 1, 1);?
Thanks
Barry
Powered by blists - more mailing lists