[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <72e044ba-64af-49c0-8b87-ead508654fb7@lucifer.local>
Date: Thu, 20 Feb 2025 09:47:32 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: David Hildenbrand <david@...hat.com>
Cc: Kalesh Singh <kaleshsingh@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Suren Baghdasaryan <surenb@...gle.com>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
Matthew Wilcox <willy@...radead.org>, Vlastimil Babka <vbabka@...e.cz>,
"Paul E . McKenney" <paulmck@...nel.org>, Jann Horn <jannh@...gle.com>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Shuah Khan <shuah@...nel.org>, linux-kselftest@...r.kernel.org,
linux-api@...r.kernel.org, John Hubbard <jhubbard@...dia.com>,
Juan Yescas <jyescas@...gle.com>
Subject: Re: [PATCH 0/4] mm: permit guard regions for file-backed/shmem
mappings
On Thu, Feb 20, 2025 at 10:23:57AM +0100, David Hildenbrand wrote:
> On 20.02.25 10:04, Lorenzo Stoakes wrote:
> > On Thu, Feb 20, 2025 at 09:57:37AM +0100, David Hildenbrand wrote:
> > > On 20.02.25 09:51, Lorenzo Stoakes wrote:
> > > > On Wed, Feb 19, 2025 at 12:56:31PM -0800, Kalesh Singh wrote:
> > > > > > We also can't change smaps in the way you want, it _has_ to still give
> > > > > > output per VMA information.
> > > > >
> > > > > Sorry I wasn't suggesting to change the entries in smaps, rather
> > > > > agreeing to your marker suggestion. Maybe a set of ranges for each
> > > > > smaps entry that has guards? It doesn't solve the use case, but does
> > > > > make these regions visible to userspace.
> > > >
> > > > No, you are not providing a usecase for this. /proc/$pid/pagemaps does not
> > > > contaminate the smaps output, mess with efforts to make it RCU readable,
> > > > require updating the ioctl interface, etc. so it is clearly the better
> > > > choice.
> > > >
> > > > >
> > > > > >
> > > > > > The proposed change that would be there would be a flag or something
> > > > > > indicating that the VMA has guard regions _SOMEWHERE_ in it.
> > > > > >
> > > > > > Since this doesn't solve your problem, adds complexity, and nobody else
> > > > > > seems to need it, I would suggest this is not worthwhile and I'd rather not
> > > > > > do this.
> > > > > >
> > > > > > Therefore for your needs there are literally only two choices here:
> > > > > >
> > > > > > 1. Add a bit to /proc/$pid/pagemap OR
> > > > > > 2. a new interface.
> > > > > >
> > > > > > I am not in favour of a new interface here, if we can just extend pagemap.
> > > > > >
> > > > > > What you'd have to do is:
> > > > > >
> > > > > > 1. Find virtual ranges via /proc/$pid/maps
> > > > > > 2. iterate through /proc/$pid/pagemaps to retrieve state for all ranges.
> > > > > >
> > > > >
> > > > > Could we also consider an smaps field like:
> > > > >
> > > > > VmGuards: [AAA, BBB), [CCC, DDD), ...
> > > > >
> > > > > or something of that sort?
> > > >
> > > > No, absolutely, categorically not. You realise these could be thousands of
> > > > characters long right?
> > > >
> > > > /proc/$pid/pagemaps resolves this without contaminating this output.
> > > >
> > > > > > Well I'm glad that you guys find it useful for _something_ ;)
> > > > > >
> > > > > > Again this wasn't written only for you (it is broadly a good feature for
> > > > > > upstream), but I did have your use case in mind, so I'm a little
> > > > > > disappointed that it doesn't help, as I like to solve problems.
> > > > > >
> > > > > > But I'm glad it solves at least some for you...
> > > > >
> > > > > I recall Liam had a proposal to store the guard ranges in the maple tree?
> > > > >
> > > > > I wonder if that can be used in combination with this approach to have
> > > > > a better representation of this?
> > > >
> > > > This was an alternative proposal made prior to the feature being
> > > > implemented (and you and others at Google were welcome to comment and many
> > > > were cc'd, etc.).
> > > >
> > > > There is no 'in combination with'. This feature would take weeks/months to
> > > > implement, fundamentally impact the maple tree VMA implementation
> > > > and... not actually achieve anything + immediately be redundant.
> > > >
> > > > Plus it'd likely be slower, have locking implications, would have kernel
> > > > memory allocation implications, a lot more complexity and probably other
> > > > problems besides (we discussed this at length at the time and a number of
> > > > issues came up, I can't recall all of them).
> > > >
> > > > To be crystal clear - we are empathically NOT changing /proc/$pid/maps to
> > > > lie about VMAs regardless of underlying implementation, nor adding
> > > > thousands of characters to /proc/$pid/smaps entries.
> > >
> > > Yes. Calling it a "guard region" might be part of the problem
> > > (/"misunderstanding"), because it reminds people of "virtual memory
> > > regions".
> > >
> > > "Guard markers" or similar might have been clearer that these operate on
> > > individual PTEs, require page table scanning etc ... which makes them a lot
> > > more scalable and fine-grained and provides all these benfits, with the
> > > downside being that we don't end up with that many "virtual memory regions"
> > > that maps/smaps operate on.
> >
> > Honestly David you and the naming... :P
> >
> > I disagree, sorry. Saying 'guard' anything might make people think one>
> thing or another. We can't account for that. I mean don't get me started on
> > 'pinning' or any of the million other overloaded terms we use...
> >
> > I _hugely_ publicly went out of my way to express the limitations, I gave
> a> talk, we had meetings, I mentioned it in the series.
> >
> > Honestly if at that point you still don't realise, that's not a naming
> > problem. It's a 'did not participate with upstream' problem.
> >
> > I like guard regions, as they're not pages as we previously referred to
> > them. People have no idea what a marker is, it doesn't sound like it spans
> > ranges, no don't like it sorry.
> >
> > And sorry but this naming topic is closed :) I already let you change the
> > naming of the MADV_'s, which broke my heart, there will not be a second
> > heart breaking...
>
> Lorenzo, I was not pushing for it to be changed *now*, that ship has sailed,
> and personally, I *don't* find it confusing because I know how it works
> under the hood.
>
> I was trying to find a reason *why* people would thing that it would show up
> in smaps in the first place. For example, just when reading the MAN page
> *today*.
>
> Doesn't really matter now, it is named the way it is, and all we can do is
> try making documentation clearer if it keeps confusing people.
Right, but I disagree with your analysis. In case as you say, it doesn't
really matter at this stage.
>
> Your conclusion is 'did not participate with upstream'; I don't agree with
> that. But maybe you and Kalesh have a history on that that let's you react
> on his questions IMHO more emotionally than it should have been.
This is wholly unfair, I have been very reasonable in response to this
thread. I have offered to find solutions, I have tried to understand the
problem in spite of having gone to great lengths to try to discuss the
limitations of the proposed approach in every venue I possibly could.
I go out of my way to deal professionally and objectively with what is
presented. Nothing here is emotional. So I'd ask that you please abstain
from making commentary like this which has no basis.
My point about not participating with upstream is that Kalesh is now asking
for an -entirely different- approach to guard regions than was implemented,
months after it was merged.
This is after many discussion were had including with people at Google
among other organisations, an RFC which clearly delineated the limitations,
a talk at LPC also.
I feel I communite these things better than many people actually, I go out
of my way to document, add extensive self-documenting tests, etc. etc. and
to participate, engage and take onboard feedback from others.
So I'm suggesting that clearly - something broke down here. There was a
miscommunication, or there was a lack of awareness of a key requirement.
I mean a large motivator for file-backed support here came from the LPC
talk give by Kalesh and Juan re: ELF guard regions. So obviously that this
breakdown in communication with upstream occurred is very unfortunate.
I am not blaming anybody or being 'emotional'. I am simply stating what
seems to me to be a clear fact.
I genuinely don't understand how it could be seen any other way? How can
requesting that an entirely different alternative approach months after the
fact be anything other than some failure to engage with upstream?
I am emphatically _not_ blaming Kalesh by the way, whom I respect and with
whom I have no problem, in any way whatsoever. I apologised when I realised
that he simply was not aware of this limitation at LPC if you look through
the thread, having politely suggested disappointment at this not having
been brought up then.
I am suggesting that within Google clearly there has been some form of
miscommunication or failure for one aspect of things (this limitation) to
be expressed at the right time to engage with upstream.
Sorry if you misinterpreted that as something else, this is the _only_
point I am making here.
And again, I am going out of my way to find practical and helpful ways
forward - though I cannot see how we can possibly fulfil Kalesh's needs
here.
>
>
> >
> > >
> > > [...]
> > >
> > > >
> > > > As I said to you earlier, the _best_ we could do in smaps would be to add a
> > > > flag like 'Grd' or something to indicate some part of the VMA is
> > > > guarded. But I won't do that unless somebody has an -actual use case- for
> > > > it.
> > >
> > > Right, and that would limit where you have to manually scan. Something
> > > similar is being done with uffd-wp markers IIRC.
> >
> > Yeah that's a good point, but honestly if you're reading smaps that reads
> > the page tables, then reading /proc/$pid/pagemaps and reading page tables
> > TWICE that seems inefficient vs. just reading /proc/$pid/maps, then reading
> > /proc/$pid/pagemaps and reading page tables once.
>
> Right; I recently wished that we would have an interface to obtain more VMA
> flags without having to go through smaps
Well maybe that lends itself to the idea of adding a whole new interface in
general...
>
> --
> Cheers,
>
> David / dhildenb
>
Powered by blists - more mailing lists