Message-ID: <11226930.BYJfa7kJGD@nvdebian>
Date:   Fri, 3 Dec 2021 16:33:43 +1100
From:   Alistair Popple <apopple@...dia.com>
To:     Peter Xu <peterx@...hat.com>
CC:     <linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>,
        David Hildenbrand <david@...hat.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Yang Shi <shy828301@...il.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Hugh Dickins <hughd@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Kirill A . Shutemov" <kirill@...temov.name>
Subject: Re: [PATCH RFC v2 1/2] mm: Don't skip swap entry even if zap_details specified

On Friday, 3 December 2021 2:21:41 PM AEDT Peter Xu wrote:
> On Thu, Dec 02, 2021 at 10:06:46PM +1100, Alistair Popple wrote:
> > On Tuesday, 16 November 2021 12:49:50 AM AEDT Peter Xu wrote:
> > > This check has existed since the first git commit of the Linux repository, but
> > > at that time there was no page migration yet, so I think it was okay.
> > > 
> > > With page migration enabled, it should logically be possible that we zap some
> > > shmem pages during migration.  When that happens, IIUC the old code could have
> > > the RSS counter accounted wrong on MM_SHMEMPAGES because we will zap the ptes
> > > without decreasing the counters for the migrating entries.  I have no unit test
> > > to prove it as I don't know an easy way to trigger this condition, though.
> > > 
> > > Besides, the optimization itself is already confusing IMHO in a few points:
> > 
> > I've spent a bit of time looking at this and think it would be good to get
> > cleaned up as I've found it hard to follow in the past. What I haven't been
> > able to confirm is if anything relies on skipping swap entries or not. From
> > your description it sounds like skipping swap entries was done as an
> > optimisation rather than for some functional reason. Is that correct?
> 
> Thanks again for looking into this patch, Alistair.  I appreciate it a lot.
> 
> I should say that it's how I understand this, and I could be wrong, that's the

That makes two of us!

> major reason why I marked this patch as RFC.
>
> As I mentioned, this behavior existed in the first commit of Linux's git
> history. At that time there were no special swap entries at all; all swap
> entries were "real" swap entries for anonymous memory.
> 
> That's why I think it should be an optimization: previously, when zap_details
> (along with zap_details->mapping in the old code) was non-null, the page was
> definitely not anonymous.  Skipping swap entries for file-backed memory then
> sounds like a good optimization.

Thanks. That was the detail I was trying to figure out, i.e. why something might
want to skip swap entries. I will spend some more time looking to be sure
though.

> However, after that we've had all kinds of swap entries introduced, and as you
> spotted, at least the migration entry should be able to exist for some
> file-backed memory type (shmem).
> 
> > 
> > >   - The wording "skip swap entries" is confusing, because we're not skipping all
> > >     swap entries - we handle device private/exclusive pages before that.
> > > 
> > >   - The skip behavior is enabled whenever a zap_details pointer is passed in.
> > >     It's very hard to figure that out for a new zap caller because it's unclear
> > >     why we should skip swap entries when we have zap_details specified.
> > > 
> > >   - With modern systems, especially performance critical use cases, swap
> > >     entries should be rare, so I doubt the usefulness of this optimization
> > >     since it should be on a slow path anyway.
> > > 
> > >   - It is not aligned with what we do with huge pmd swap entries, where in
> > >     zap_huge_pmd() we'll do the accounting unconditionally.
> > > 
> > > This patch drops that trick, so we handle swap ptes coherently.  Meanwhile we
> > > should do the same mapping check upon migration entries too.
> > 
> > I agree, and I'm not convinced the current handling is very good - if we
> > skip zapping a migration entry then the page mapping might get restored when
> > the migration entry is removed.
> > 
> > In practice I don't think that is a problem as the migration entry target page
> > will be locked, and if I'm understanding things correctly callers of
> > unmap_mapping_*() need to have the page(s) locked anyway if they want to be
> > sure the page is unmapped. But it seems removing the migration entries better
> > matches the intent and I can't think of a reason why they should be skipped.
> 
> Exactly, that's how I see this too.
> 
> I used to think there was a bug in shmem migration (if you still remember, I
> mentioned it in some of my previous patchset cover letters), but then I found
> that migration requires the page lock, so it's probably not a real bug at all.
> However, that's never a convincing reason to ignore swap entries.

Right, it also took me a while to convince myself there wasn't a bug there so
if for some reason this patch doesn't end up going in I think we should still
treat migration entries the same way as device-private entries.
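
To illustrate what I mean by that, here is a toy user-space model of the two
skip policies. It is only a sketch under my own assumptions, not the code in
zap_pte_range(): every name in it (toy_pte, toy_should_skip_current,
toy_should_skip_fallback, toy_zap_range) is hypothetical, and it leaves out
the mapping check, RSS accounting and locking entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not kernel types. */
enum toy_entry_kind {
    TOY_SWAP,            /* "real" swap entry */
    TOY_DEVICE_PRIVATE,  /* device-private entry */
    TOY_MIGRATION,       /* migration entry */
};

struct toy_pte {
    enum toy_entry_kind kind;
    bool present_in_table;   /* false once the entry has been zapped */
};

/*
 * Current behaviour as described in this thread: when zap details are
 * supplied, everything except device-private entries is skipped.
 */
static bool toy_should_skip_current(const struct toy_pte *pte, bool have_details)
{
    assert(pte != NULL);
    if (pte->kind == TOY_DEVICE_PRIVATE)
        return false;
    return have_details;
}

/*
 * Suggested fallback: migration entries are handled like device-private
 * entries, so only "real" swap entries are still skipped.
 */
static bool toy_should_skip_fallback(const struct toy_pte *pte, bool have_details)
{
    assert(pte != NULL);
    if (pte->kind == TOY_DEVICE_PRIVATE || pte->kind == TOY_MIGRATION)
        return false;
    return have_details;
}

/* Zap every entry the policy does not skip; return how many were left behind. */
static size_t toy_zap_range(struct toy_pte *ptes, size_t n, bool have_details,
                            bool (*should_skip)(const struct toy_pte *, bool))
{
    size_t left = 0;

    for (size_t i = 0; i < n; i++) {
        if (!ptes[i].present_in_table)
            continue;
        if (should_skip(&ptes[i], have_details)) {
            left++;          /* entry survives the zap */
            continue;
        }
        ptes[i].present_in_table = false;
    }
    return left;
}
```

In this model, zapping a range holding one entry of each kind with details
supplied leaves the migration entry in place under the first policy and
removes it under the second. The surviving migration entry is the case I was
worried about above, where the mapping might get restored once the migration
entry is removed.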

> I wanted to "ignore" this problem with the "adding a flag to skip swap entry"
> patch, but as you saw it was not welcomed at all, so I have no choice but to
> try to find the fundamental reason for skipping swap entries.  I could not
> really find any good reason, and skipping seems to be even buggy, hence this
> patch.  If this is the right way, the zap pte path can be simplified quite a
> lot after patch 2 of this series.

Yep, I think it's definitely worth trying to figure out. And if it turns out
there is some good reason for skipping, we had better document it in a
comment somewhere so none of this good research is lost. However, I haven't yet
come up with a reason why they need to be skipped either.

 - Alistair
