Message-ID: <CAD=HUj4yhMLnBNpumxC4urSY2Js5XuekzGP+UOXJmUV=k5nx=A@mail.gmail.com>
Date:   Tue, 7 Feb 2023 10:37:06 +0900
From:   David Stevens <stevensd@...omium.org>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     Peter Xu <peterx@...hat.com>, linux-mm@...ck.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Kirill A . Shutemov" <kirill@...temov.name>,
        Yang Shi <shy828301@...il.com>,
        David Hildenbrand <david@...hat.com>,
        Hugh Dickins <hughd@...gle.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] mm/khugepaged: skip shmem with userfaultfd

On Tue, Feb 7, 2023 at 6:50 AM Matthew Wilcox <willy@...radead.org> wrote:
>
> On Mon, Feb 06, 2023 at 03:52:19PM -0500, Peter Xu wrote:
> > On Mon, Feb 06, 2023 at 07:01:39PM +0000, Matthew Wilcox wrote:
> > > On Mon, Feb 06, 2023 at 08:28:56PM +0900, David Stevens wrote:
> > > > This change first makes sure that the intermediate page cache state
> > > > during collapse is not visible by moving when gaps are filled to after
> > > > the page cache lock is acquired for the final time. This is necessary
> > > > because the synchronization provided by locking hpage is insufficient
> > > > for functions which operate on the page cache without actually locking
> > > > individual pages to examine their content (e.g. shmem_mfill_atomic_pte).
> > >
> > > I've been a little scared of touching khugepaged because, well, look at
> > > that function.  But if we are going to touch it, how about this patch
> > > first?  It does _part_ of what you need by not filling in the holes,
> > > but obviously not the part that looks at uffd.
> > >
> > > It leaves the old pages in-place and frozen.  I think this should be
> > > safe, but I haven't booted it (not entirely sure what test I'd run
> > > to prove that it's not broken)
> >
> > That logic existed since Kirill's original commit to add shmem thp support
> > on khugepaged, so Kirill should be the best to tell.. but so far it seems
> > reasonable to me to have that extra operation.
> >
> > The problem is khugepaged will release pgtable lock during collapsing, so
> > AFAICT there can be a race where some other thread tries to insert pages
> > into page cache in parallel with khugepaged right after khugepaged released
> > the page cache lock.
> >
> > For example, it seems to me new page cache can be inserted when khugepaged
> > is copying small page content to the new hpage.

This particular race can't happen with either patch, since the missing
page cache entries are filled when we create the multi-index entry for
hpage.

> Mmm, yes, we need to have _something_ in the page cache to block new
> pages from being added.  It can be either the new or the old pages,
> but it can't be NULL.  It could even be a RETRY entry, since that'll
> have the same effect as a frozen page.
>
> But both David's patch and mine are wrong.  Not sure what to do for
> David's problem -- maybe it's OK to have the holes temporarily filled
> with frozen / RETRY entries until we get to the point where we check
> for an uffd marker?

My patch re-counts the holes after acquiring the page cache lock for
the final time, right before creating the final hpage multi-index
entry. Since we lock present pages while iterating over the target
range, they can't have been truncated before our re-validation of
nr_none. So if the number of missing pages is still equal to nr_none,
then we know that nothing has come along and filled in a missing page.
Compared to adding some sort of marker for missing pages, this does
add another failure path for collapse, but I don't think there is any
race.
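
To make the re-validation argument concrete, here is a toy userspace model of it. It is only a sketch and not the khugepaged code: `struct slot`, `RANGE_PAGES`, `count_holes()` and `revalidate_and_commit()` are hypothetical names made up for illustration, and the page cache lock is only implied by the comments. It assumes, as described above, that present pages stay locked, so the hole count can only shrink between the first scan and the final check.

```c
#include <stdbool.h>
#include <stddef.h>

#define RANGE_PAGES 8 /* toy size; a real collapse range is much larger */

/* Hypothetical stand-in for one page cache index in the target range. */
struct slot {
	bool present; /* a page occupies this index */
};

/* Count the missing entries (holes) in the range. */
size_t count_holes(const struct slot *range, size_t n)
{
	size_t holes = 0;

	for (size_t i = 0; i < n; i++)
		if (!range[i].present)
			holes++;
	return holes;
}

/*
 * Final step, imagined as running with the page cache lock held for the
 * last time. Present pages are assumed to be locked by the collapse, so
 * they cannot have been truncated; the hole count can only have gone
 * down. If it still equals nr_none, no hole was filled behind our back
 * and the collapse may proceed. Otherwise the collapse fails.
 */
bool revalidate_and_commit(const struct slot *range, size_t n, size_t nr_none)
{
	return count_holes(range, n) == nr_none;
}
```

In this model, a caller records `nr_none = count_holes(range, RANGE_PAGES)` during the first scan and later calls `revalidate_and_commit(range, RANGE_PAGES, nr_none)`. If another thread set `present` on one of the holes in between, the second call returns false, which corresponds to the extra collapse failure path mentioned above.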

-David
