Message-ID: <11226930.BYJfa7kJGD@nvdebian>
Date: Fri, 3 Dec 2021 16:33:43 +1100
From: Alistair Popple <apopple@...dia.com>
To: Peter Xu <peterx@...hat.com>
CC: <linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>,
David Hildenbrand <david@...hat.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Yang Shi <shy828301@...il.com>,
Vlastimil Babka <vbabka@...e.cz>,
Hugh Dickins <hughd@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
"Kirill A . Shutemov" <kirill@...temov.name>
Subject: Re: [PATCH RFC v2 1/2] mm: Don't skip swap entry even if zap_details specified
On Friday, 3 December 2021 2:21:41 PM AEDT Peter Xu wrote:
> On Thu, Dec 02, 2021 at 10:06:46PM +1100, Alistair Popple wrote:
> > On Tuesday, 16 November 2021 12:49:50 AM AEDT Peter Xu wrote:
> > > This check has existed since the first git commit of the Linux repository, but at
> > > that time there was no page migration yet, so I think it was okay.
> > >
> > > With page migration enabled, it should logically be possible that we zap some
> > > shmem pages during migration. When that happens, IIUC the old code could get
> > > the MM_SHMEMPAGES RSS counter wrong, because we will zap the ptes without
> > > decrementing the counters for the migrating entries. I have no unit test to
> > > prove it, though, as I don't know an easy way to trigger this condition.
> > >
> > > Besides, IMHO the optimization itself is already confusing in a few ways:
> >
> > I've spent a bit of time looking at this and think it would be good to get it
> > cleaned up, as I've found it hard to follow in the past. What I haven't been
> > able to confirm is whether anything relies on skipping swap entries or not. From
> > your description it sounds like skipping swap entries was done as an
> > optimisation rather than for some functional reason; is that correct?
>
> Thanks again for looking into this patch, Alistair. I appreciate it a lot.
>
> I should say that this is just my understanding and I could be wrong; that's the
That makes two of us!
> major reason why I marked this patch as RFC.
>
> As I mentioned, this behavior has existed since the first commit in Linux's git
> history, when there were no special swap entries at all: all swap entries were
> "real" swap entries for anonymous memory.
>
> That's why I think it was meant as an optimization: previously, when zap_details
> (along with zap_details->mapping in the old code) was non-NULL, the page was
> definitely not anonymous, so skipping swap entries for file-backed memory
> sounded like a good optimization.
Thanks. That was the detail I was trying to figure out, i.e. why something might
want to skip swap entries. I will spend some more time looking to be sure
though.
> However, since then all kinds of swap entries have been introduced, and as you
> spotted, at least migration entries can exist for some file-backed memory
> types (shmem).
>
> >
> > > - The wording "skip swap entries" is confusing, because we're not skipping all
> > > swap entries - we handle device private/exclusive pages before that.
> > >
> > > - The skip behavior is enabled as long as a zap_details pointer is passed in.
> > >   It's very hard for a new zap caller to figure that out, because it's unclear
> > >   why we should skip swap entries when we have zap_details specified.
> > >
> > > - On modern systems, especially in performance-critical use cases, swap
> > >   entries should be rare, so I doubt the usefulness of this optimization,
> > >   since it should be on a slow path anyway.
> > >
> > > - It is not aligned with what we do with huge pmd swap entries, where in
> > > zap_huge_pmd() we'll do the accounting unconditionally.
> > >
> > > This patch drops that trick, so we handle swap ptes coherently. Meanwhile, we
> > > should apply the same mapping check to migration entries too.
> >
> > I agree, and I'm not convinced the current handling is very good - if we
> > skip zapping a migration entry then the page mapping might get restored when
> > the migration entry is removed.
> >
> > In practice I don't think that is a problem as the migration entry target page
> > will be locked, and if I'm understanding things correctly callers of
> > unmap_mapping_*() need to have the page(s) locked anyway if they want to be
> > sure the page is unmapped. But it seems removing the migration entries better
> > matches the intent and I can't think of a reason why they should be skipped.
>
> Exactly, that's how I see it too.
>
> I used to think there was a bug in shmem migration (if you remember, I
> mentioned it in some of my previous patchset cover letters), but then I found
> that migration requires the page lock, so it's probably not a real bug at all.
> However, that was never a convincing reason to ignore swap entries.
Right, it also took me a while to convince myself there wasn't a bug there. So
if for some reason this patch doesn't end up going in, I think we should still
treat migration entries the same way as device-private entries.
> I wanted to "ignore" this problem with the "add a flag to skip swap entries"
> patch, but as you saw it was not welcomed at all, so I had no choice but to
> try to find the fundamental reason for skipping swap entries. I couldn't really
> find any good reason, and skipping even seems to be buggy, hence this patch.
> If this is the right way, the zap pte path can be simplified quite a lot with
> patch 2 of this series.
Yep, I think it's definitely worth trying to figure out. And if it turns out
there is some good reason for skipping, we'd better make sure to document it in
a comment somewhere so none of this good research is lost. However, I haven't
yet come up with a reason why they need to be skipped either.
- Alistair