Message-ID: <9f81bfe4-4cc7-4754-b92f-db3a4e549f86@lucifer.local>
Date: Sat, 22 Mar 2025 07:21:49 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: David Hildenbrand <david@...hat.com>
Cc: Jann Horn <jannh@...gle.com>, Andrew Morton <akpm@...ux-foundation.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        "Liam R . Howlett" <Liam.Howlett@...cle.com>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Matthew Wilcox <willy@...radead.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 1/7] mm/mremap: introduce more mergeable mremap via
 MREMAP_RELOCATE_ANON

On Sat, Mar 22, 2025 at 07:17:05AM +0100, David Hildenbrand wrote:
> On 22.03.25 06:33, David Hildenbrand wrote:
> > On 22.03.25 01:14, Jann Horn wrote:
> > > On Fri, Mar 21, 2025 at 10:54 PM Lorenzo Stoakes
> > > <lorenzo.stoakes@...cle.com> wrote:
> > > > diff --git a/mm/mremap.c b/mm/mremap.c
> > > > index 0865387531ed..bb67562a0114 100644
> > > > --- a/mm/mremap.c
> > > > +++ b/mm/mremap.c
> > > [...]
> > > > +/*
> > > > + * If the folio mapped at the specified pte entry can have its index and mapping
> > > > + * relocated, then do so.
> > > > + *
> > > > + * Returns the number of pages we have traversed, or 0 if the operation failed.
> > > > + */
> > > > +static unsigned long relocate_anon(struct pagetable_move_control *pmc,
> > > > +               unsigned long old_addr, unsigned long new_addr, pte_t pte,
> > > > +               bool undo)
> > > > +{
> > > > +       struct page *page;
> > > > +       struct folio *folio;
> > > > +       struct vm_area_struct *old, *new;
> > > > +       pgoff_t new_index;
> > > > +       unsigned long ret = 1;
> > > > +
> > > > +       old = pmc->old;
> > > > +       new = pmc->new;
> > > > +
> > > > +       /* Ensure we have truly got an anon folio. */
> > > > +       page = vm_normal_page(old, old_addr, pte);
> > > > +       if (!page)
> > > > +               return ret;
> > > > +       folio = page_folio(page);
> > > > +       folio_lock(folio);
> > > > +
> > > > +       /* no-op. */
> > > > +       if (!folio_test_anon(folio) || folio_test_ksm(folio))
> > > > +               goto out;
> > > > +
> > > > +       /*
> > > > +        * This should not happen as we explicitly disallow this, but check
> > > > +        * anyway.
> > > > +        */
> > > > +       if (folio_test_large(folio)) {
> > > > +               ret = 0;
> > > > +               goto out;
> > > > +       }
> > >
> > > Do I understand correctly that you assume here that the page is
> > > exclusively mapped? Maybe we could at least
> > > WARN_ON(folio_mapcount(folio) != 1) or something like that?
> > >
> > > (I was also wondering if the PageAnonExclusive bit is somehow
> > > relevant, but we should probably not look at or touch that here,
> > > unless we want to think about cases where we _used to_ have a child
> > > from which the page may have been GUP'd...)
> >
> > UFFDIO_MOVE implements something similar. Right now we keep it simple:
> >
> > 	if (folio_test_large(src_folio) ||
> > 	    folio_maybe_dma_pinned(src_folio) ||
> > 	    !PageAnonExclusive(&src_folio->page)) {
> > 		err = -EBUSY;
> > 		goto out;
> > 	}
> >
> > Whereby we
> >
> > a) Make sure we cover all PTEs (-> small folio, single PTE). Large
> > PTE-mapped folios are split.
> >
> > b) Make sure there are no GUP pins (maybe not required here?)
> >
> > c) The folio is exclusive to this process
>
> On additional note as my memory comes back: if PAE is set, there cannot be
> other (inactive) mappings from the swapcache. So whenever we use folio lock
> + mapcount data, the possibility of the swapcache (having inactive mappings
> from other processes etc.) must be considered.

Ack, do you have a feel for how such a check would work?
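
To check my understanding - would it end up looking something like the below?
(Entirely untested sketch, just borrowing the UFFDIO_MOVE-style conditions and
bolting the swapcache/mapcount consideration on top; folio is already locked
at this point in relocate_anon(), so names are as in the patch:)

	/*
	 * Sketch only: with the folio lock held, bail out if the folio is
	 * large, pinned, not exclusive to us, resident in the swapcache, or
	 * mapped anywhere else.
	 */
	if (folio_test_large(folio) ||
	    folio_maybe_dma_pinned(folio) ||
	    !PageAnonExclusive(&folio->page) ||
	    folio_test_swapcache(folio) ||
	    folio_mapcount(folio) != 1) {
		ret = 0;
		goto out;
	}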

>
> --
> Cheers,
>
> David / dhildenb
>

An aside in general:

I realise all of this is very fiddly; this (albeit early) RFC is already several
revisions deep, and I fully expect considerably more fiddly cases to come up :)

By the way, I also plan to do some deeper testing on real iron, running very
heavy workloads to encourage reclaim and races.

One test I've done in the past is to hack in a forced-on MREMAP_RELOCATE_ANON so
that _all_ mremap() calls that move a mapping use it, which has been a good way
of finding issues. In the tests I add in this series I also intentionally
trigger reclaim via MADV_PAGEOUT - something along the lines of the sketch below.
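
(Minimal userspace sketch of that pattern, error handling elided; note the
MREMAP_RELOCATE_ANON value here is just a placeholder since the flag isn't in
the uapi headers yet:)

	#define _GNU_SOURCE
	#include <sys/mman.h>
	#include <string.h>

	/* Placeholder value - flag not yet in uapi headers. */
	#ifndef MREMAP_RELOCATE_ANON
	#define MREMAP_RELOCATE_ANON 8
	#endif

	int main(void)
	{
		size_t len = 16UL * 4096;

		/* Populate some anon memory so there are folios to relocate. */
		char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		memset(p, 0xab, len);

		/* Encourage reclaim so the move races against it. */
		madvise(p, len, MADV_PAGEOUT);

		/* Move the mapping, requesting anon folio relocation. */
		char *q = mremap(p, len, len,
				 MREMAP_MAYMOVE | MREMAP_RELOCATE_ANON);

		return q == MAP_FAILED;
	}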

So in general - I'm going to proactively try to shake out any and all weird
edge cases where possible before pushing this series forward more seriously.

Thanks again for the early review, it's much appreciated! :) And see you at the
topic session I'm giving on this, and at the pub after... ;)
