Message-ID: <YamNRcrLDOPjG9wg@xz-m1.local>
Date:   Fri, 3 Dec 2021 11:21:41 +0800
From:   Peter Xu <peterx@...hat.com>
To:     Alistair Popple <apopple@...dia.com>
Cc:     linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        David Hildenbrand <david@...hat.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Yang Shi <shy828301@...il.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Hugh Dickins <hughd@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Kirill A . Shutemov" <kirill@...temov.name>
Subject: Re: [PATCH RFC v2 1/2] mm: Don't skip swap entry even if zap_details
 specified

On Thu, Dec 02, 2021 at 10:06:46PM +1100, Alistair Popple wrote:
> On Tuesday, 16 November 2021 12:49:50 AM AEDT Peter Xu wrote:
> > This check has existed since the first git commit of the Linux repository, but
> > at that time there was no page migration yet, so I think it was okay then.
> > 
> > With page migration enabled, it should logically be possible to zap some
> > shmem pages during migration.  When that happens, IIUC the old code could get
> > the RSS counter for MM_SHMEMPAGES wrong, because we would zap the ptes
> > without decrementing the counters for the migration entries.  I have no unit
> > test to prove it, though, as I don't know an easy way to trigger this condition.
> > 
> > Besides, IMHO the optimization itself is already confusing in a few ways:
> 
> I've spent a bit of time looking at this and think it would be good to get it
> cleaned up, as I've found it hard to follow in the past. What I haven't been
> able to confirm is whether anything relies on skipping swap entries. From
> your description it sounds like skipping swap entries was done as an
> optimisation rather than for some functional reason; is that correct?

Thanks again for looking into this patch, Alistair.  I appreciate it a lot.

I should say that this is how I understand it and I could be wrong; that's the
main reason I marked this patch as RFC.

As I mentioned, this behavior has existed since the first commit in Linux's git
history.  At that time there were no special swap entries at all: all swap
entries were "real" swap entries for anonymous memory.

That's why I think it was meant as an optimization: previously, when
zap_details (along with zap_details->mapping in the old code) was non-null, the
pages being zapped were definitely not anonymous, so skipping swap entries for
file-backed memory sounded like a good optimization.

However, all kinds of special swap entries have been introduced since then, and
as you spotted, at least migration entries can exist for some file-backed
memory types (shmem).

> 
> >   - The wording "skip swap entries" is confusing, because we're not skipping all
> >     swap entries - we handle device private/exclusive pages before that.
> > 
> >   - The skip behavior is enabled as long as a zap_details pointer is passed in.
> >     That's very hard for a new zap caller to figure out, because it's unclear
> >     why we should skip swap entries when zap_details is specified.
> > 
> >   - On modern systems, especially in performance-critical use cases, swap
> >     entries should be rare, so I doubt the usefulness of this optimization,
> >     since it should be on a slow path anyway.
> > 
> >   - It is not aligned with what we do with huge pmd swap entries, where in
> >     zap_huge_pmd() we'll do the accounting unconditionally.
> > 
> > This patch drops that trick, so that we handle swap ptes consistently.
> > Meanwhile, we should apply the same mapping check to migration entries too.
> 
> I agree, and I'm not convinced the current handling is very good - if we
> skip zapping a migration entry then the page mapping might get restored when
> the migration entry is removed.
> 
> In practice I don't think that is a problem as the migration entry target page
> will be locked, and if I'm understanding things correctly callers of
> unmap_mapping_*() need to have the page(s) locked anyway if they want to be
> sure the page is unmapped. But it seems removing the migration entries better
> matches the intent and I can't think of a reason why they should be skipped.

Exactly, that's how I see it too.

I used to think there was a bug in shmem migration (you may remember I
mentioned it in some of my previous patchset cover letters), but then I found
that migration requires the page lock, so it's probably not a real bug at all.
However, that was never a convincing reason to ignore swap entries.

I wanted to "ignore" this problem with the "add a flag to skip swap entries"
patch, but as you saw it was not well received, so I had no choice but to try
to find the fundamental reason for skipping swap entries.  I couldn't find any
good reason, and skipping even seems to be buggy, hence this patch.  If this is
the right way to go, the zap pte path can be simplified quite a lot after
patch 2 of this series.

-- 
Peter Xu
