Message-ID: <Ygw1Zn+WUuY5WkZy@casper.infradead.org>
Date: Tue, 15 Feb 2022 23:21:10 +0000
From: Matthew Wilcox <willy@...radead.org>
To: Hugh Dickins <hughd@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Michal Hocko <mhocko@...e.com>,
Vlastimil Babka <vbabka@...e.cz>,
"Kirill A. Shutemov" <kirill@...temov.name>,
David Hildenbrand <david@...hat.com>,
Alistair Popple <apopple@...dia.com>,
Johannes Weiner <hannes@...xchg.org>,
Rik van Riel <riel@...riel.com>,
Suren Baghdasaryan <surenb@...gle.com>,
Yu Zhao <yuzhao@...gle.com>, Greg Thelen <gthelen@...gle.com>,
Shakeel Butt <shakeelb@...gle.com>,
Yang Li <yang.lee@...ux.alibaba.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH v2 04/13] mm/munlock: rmap call mlock_vma_page() munlock_vma_page()

On Tue, Feb 15, 2022 at 01:38:20PM -0800, Hugh Dickins wrote:
> On Tue, 15 Feb 2022, Matthew Wilcox wrote:
> > On Mon, Feb 14, 2022 at 06:26:39PM -0800, Hugh Dickins wrote:
> > > Add vma argument to mlock_vma_page() and munlock_vma_page(), make them
> > > inline functions which check (vma->vm_flags & VM_LOCKED) before calling
> > > mlock_page() and munlock_page() in mm/mlock.c.
> > >
> > > Add bool compound to mlock_vma_page() and munlock_vma_page(): this is
> > > because we have understandable difficulty in accounting pte maps of THPs,
> > > and if passed a PageHead page, mlock_page() and munlock_page() cannot
> > > tell whether it's a pmd map to be counted or a pte map to be ignored.
> > >
> > [...]
> > >
> > > Mlock accounting on THPs has been hard to define, differed between anon
> > > and file, involved PageDoubleMap in some places and not others, required
> > > clear_page_mlock() at some points. Keep it simple now: just count the
> > > pmds and ignore the ptes, there is no reason for ptes to undo pmd mlocks.
> >
> > How would you suggest we handle the accounting for folios which are
> > intermediate in size between PMDs and PTEs? eg, an order-4 page?
> > Would it make sense to increment mlock_count by HUGE_PMD_NR for
> > each PMD mapping and by 1 for each PTE mapping?
>
> I think you're asking the wrong question here, but perhaps you've
> already decided there's only one satisfactory answer to the right question.

Or I've gravely misunderstood the situation. Or explained my concern
badly. The possibilities are endless!

My concern is that a filesystem may create an order-4 folio, an
application mmaps the folio and then calls mlock() (either over a portion
or the entirety of the folio). As far as I can tell, we then do not
move the folio onto the unevictable list because it is of order >0 and
is only mapped by PTEs. This presumably then has performance problems
(or we wouldn't need to have an unevictable list in the first place).
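
The concern above can be sketched as a toy userspace model. This is not
kernel code: toy_folio, toy_should_mlock(), toy_map() and TOY_VM_LOCKED are
hypothetical names, and the rule encoded here is only an assumption taken
from the patch description quoted earlier (count pmd maps, ignore pte maps
of compound pages).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_VM_LOCKED 0x1u	/* hypothetical stand-in for VM_LOCKED */

struct toy_folio {
	unsigned int order;	/* folio spans 1 << order base pages */
	bool unevictable;	/* set once the folio is treated as mlocked */
};

/*
 * Assumed rule: mlock only when the vma is locked and the mapping is
 * either a pmd map or a pte map of an order-0 page.
 */
static bool toy_should_mlock(const struct toy_folio *folio,
			     unsigned int vm_flags, bool pmd_mapped)
{
	assert(folio != NULL);
	if (!(vm_flags & TOY_VM_LOCKED))
		return false;
	return pmd_mapped || folio->order == 0;
}

/* Model of mapping one part of the folio into a vma. */
static void toy_map(struct toy_folio *folio, unsigned int vm_flags,
		    bool pmd_mapped)
{
	if (toy_should_mlock(folio, vm_flags, pmd_mapped))
		folio->unevictable = true;
}
```

Under this assumed rule, a pte-mapped order-0 folio in a locked vma is marked
unevictable, while a pte-mapped order-4 folio in the same vma is not, which is
the gap described above.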

> The question I thought you should be asking is about how to count them
> in Mlocked. That's tough; but I take it for granted that you would not
> want per-subpage flags and counts involved (or not unless forced to do
> so by some regression that turns out to matter). And I think the only
> satisfactory answer is to count the whole compound_nr() as Mlocked
> when any part of it (a single pte, a series of ptes, a pmd) is mlocked;
> and (try to) move folio to Unevictable whenever any part of it is mlocked.

I think that makes sense. As with so many other things, we choose to
manage memory in >PAGE_SIZE chunks. If you mlock() a part of a folio,
we lock the whole folio in memory, and it all counts as being locked.
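
A minimal sketch of that whole-folio accounting, again as a userspace model
and not kernel code. The names acct_folio, acct_stats, acct_mlock() and
acct_munlock() are hypothetical, and keeping one mlock_count per folio is an
assumption made for illustration; the real implementation may track this
differently.

```c
#include <assert.h>

struct acct_folio {
	unsigned int order;		/* folio spans 1UL << order base pages */
	unsigned int mlock_count;	/* assumed: mlocked maps of any part */
};

struct acct_stats {
	unsigned long nr_mlocked;	/* base pages reported as Mlocked */
};

static unsigned long acct_nr_pages(const struct acct_folio *folio)
{
	return 1UL << folio->order;
}

/* Any mlocked map (pte or pmd) counts the whole folio, and only once. */
static void acct_mlock(struct acct_folio *folio, struct acct_stats *stats)
{
	assert(folio && stats);
	if (folio->mlock_count++ == 0)
		stats->nr_mlocked += acct_nr_pages(folio);
}

/* Dropping the last mlocked map uncounts the whole folio. */
static void acct_munlock(struct acct_folio *folio, struct acct_stats *stats)
{
	assert(folio && stats && folio->mlock_count > 0);
	if (--folio->mlock_count == 0)
		stats->nr_mlocked -= acct_nr_pages(folio);
}
```

With an order-4 folio, the first acct_mlock() adds 16 pages to nr_mlocked no
matter how small the mlocked range is; further mlocked maps of the same folio
add nothing, and the count drops back only when the last one goes away.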