linux-kernel - Re: [PATCH v2 1/2] mm/vma: use lockdep where we can, reduce duplication

Message-ID: <2e7c8808-99bc-411b-8d54-d84d8b3858a9@lucifer.local>
Date: Wed, 21 Jan 2026 09:07:16 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
        David Hildenbrand <david@...nel.org>,
        "Liam R . Howlett" <Liam.Howlett@...cle.com>,
        Mike Rapoport <rppt@...nel.org>,
        Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
        Shakeel Butt <shakeel.butt@...ux.dev>, Jann Horn <jannh@...gle.com>,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        linux-rt-devel@...ts.linux.dev, Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
        Boqun Feng <boqun.feng@...il.com>, Waiman Long <longman@...hat.com>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Clark Williams <clrkwllms@...nel.org>,
        Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [PATCH v2 1/2] mm/vma: use lockdep where we can, reduce
 duplication

On Tue, Jan 20, 2026 at 10:28:15PM +0100, Vlastimil Babka wrote:
> On 1/20/26 18:49, Lorenzo Stoakes wrote:
> > On Tue, Jan 20, 2026 at 02:53:30PM +0100, Vlastimil Babka wrote:
> >> On 1/19/26 21:59, Lorenzo Stoakes wrote:
> >> > We introduce vma_is_read_locked(), which must deal with the case in which
> >> > VMA write lock sets refcnt to VMA_LOCK_OFFSET or VMA_LOCK_OFFSET +
> >> > 1. Luckily is_vma_writer_only() already exists which we can use to check
> >> > this.
> >>
> >> So I think there's a bit of a caveat in that
> >>
> >> - is_vma_writer_only() may be a false positive if there is a temporary
> >> reader of a detached vma (per comments in vma_mark_detached() and
> >> vma_mark_detached())
> >
> > vma_mark_detached() and vma_mark_attached() I assume you mean.
>
> Right.
>
> > OK so this is all very confusing indeed.
> >
> > This function is _only_ referring to the situation between
> > __vma_enter_locked() and __vma_exit_locked().
> >
> > Despite their names, which suggest they happen on lock and unlock
> > respectively, that's not the case: both are invoked on lock, enter at the
> > start of TAKING the lock and exit on completion of TAKING the lock.
>
> IIUC yes, for the vma "write lock".

Yes.
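
To make the enter/exit naming concrete, here is a rough single-threaded
userspace model of the refcnt arithmetic as described in this thread. The
model_* names and the VMA_LOCK_OFFSET value are assumptions for
illustration, not the kernel code, and the wait for readers to drain is
deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value, for illustration only. */
#define VMA_LOCK_OFFSET 0x40000000

/* Hypothetical stand-in for the VMA: only the reference count is modelled. */
struct model_vma {
	int refcnt;	/* 0 = detached, 1 = attached, +1 per reader */
};

/*
 * Models the start of TAKING the write lock: mark a writer as present.
 * Fails for a detached VMA (refcnt == 0). The real code would then wait
 * for readers to drain; that wait is not modelled here.
 */
static bool model_enter_locked(struct model_vma *vma)
{
	if (vma->refcnt == 0)
		return false;
	vma->refcnt += VMA_LOCK_OFFSET;
	return true;
}

/*
 * Models the completion of TAKING the write lock: drop the writer marker.
 * Returns true if the count reached zero, i.e. the VMA is now detached.
 */
static bool model_exit_locked(struct model_vma *vma)
{
	assert(vma->refcnt >= VMA_LOCK_OFFSET);
	vma->refcnt -= VMA_LOCK_OFFSET;
	return vma->refcnt == 0;
}
```

In this model both calls belong to acquisition: for an attached VMA with no
readers the count goes 1 -> VMA_LOCK_OFFSET + 1 -> 1, and nothing here
corresponds to unlock.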

>
> > It seems __vma_enter_locked() is more about getting into this state with
> > refcnt equal to VMA_LOCK_OFFSET (detaching - note elsewhere we say detached
> > of course) or VMA_LOCK_OFFSET + 1 if attached and waiting on readers who
> > are spuriously increasing the reference count.
>
> Yes.
>
> > So fundamentally is_vma_writer_only() is actually asking 'are we in the
> > midst of a VMA write lock acquisition having finally set the VMA's refcnt to
> > VMA_LOCK_OFFSET or VMA_LOCK_OFFSET+1 but haven't yet completed acquiring
> > the lock' - i.e. having not yet called __vma_exit_locked().
>
> IIUC in the current code it's not used in that "are *we* in the midst..."
> sense but "is there a writer in that phase that we are supposed to wake up
> because we are the last reader", where a rare false positive answer only
> results in an unnecessary wakeup of said wanna-be writer, but nothing worse.

Yeah sorry I phrased that badly, 'is there a writer in the midst of...', not us
of course.

And yes this seems to be the case.
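
The ambiguity Vlastimil describes can be sketched in userspace as below.
This is a model under stated assumptions, not the kernel source:
model_refcnt() is a hypothetical helper, the VMA_LOCK_OFFSET value is
assumed, and the predicate is written from the description in this thread
(writer marker set, refcnt at most VMA_LOCK_OFFSET + 1).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value, for illustration only. */
#define VMA_LOCK_OFFSET 0x40000000

/*
 * Hypothetical helper: compose a refcnt from its parts, following the
 * convention in this thread (1 if attached, +1 per reader, plus
 * VMA_LOCK_OFFSET while a writer is between enter and exit).
 */
static int model_refcnt(bool attached, int readers, bool writer_entering)
{
	assert(readers >= 0);
	return (attached ? 1 : 0) + readers +
	       (writer_entering ? VMA_LOCK_OFFSET : 0);
}

/* Model of the predicate: writer marker set, and at most one other ref. */
static bool model_is_writer_only(int refcnt)
{
	return (refcnt & VMA_LOCK_OFFSET) && refcnt <= VMA_LOCK_OFFSET + 1;
}
```

A detaching writer plus one temporary reader composes to VMA_LOCK_OFFSET + 1,
the same value as an attached VMA with a writer and no readers, so the
predicate cannot tell the two apart. For the wakeup use case that only costs
a spurious wakeup.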

>
> And AFAIU this patch tries to reuse the function to ask "is the vma read
> locked?" (and we presume it's by us).
>
> > With __vma_enter_locked() called from:
> >
> > vma_start_write() / vma_start_write_killable()
> > -> __vma_start_write()
> > -> __vma_enter_locked()
> >
> > vma_mark_detached()
> > -> __vma_enter_locked()
> >
> > OK so in __vma_enter_locked() we add VMA_LOCK_OFFSET but then wait until we
> > get to either VMA_LOCK_OFFSET + 1 (attached) or VMA_LOCK_OFFSET (detached),
> > since presumably refcnt == 0 is detached, refcnt == 1 means write lock
> > finally acquired (but you have to check the sequence number).
> >
> > And _there_ we could have spurious readers.
>
> Yes.
>
> >
> >>
> >> - hence vma_is_read_locked() may be a false negative
> >
> > Yup.
> >
> >>
> >> - hence vma_assert_locked() might assume wrongly that we should not assert
> >> being a reader, so we vma_assert_write_locked() instead, and fail
> >
> > Aside ->
> >
> > 	Every time I come to this code it's like this - having to refresh
> > 	my memory as to how any of it works, getting confused, etc.
> >
> > 	This speaks to this being a broken abstraction similar to anon_vma.
> >
> > 	What I mean by broken abstraction is that you seem to need to
> > 	maintain the _implementation_ context in your head to be able to
> > 	correctly implement anything. We simply are not abstracting details
> > 	here really well at all.
>
> I think this would be true if it was applied to users of the high-level API
> of the code - actual locking and unlocking for read/write. Do they have to
> care about the implementation details? Hopefully not.

No I disagree, a broken abstraction applies to maintenance too. We have to
consider _everything at once_ even when trying to make changes that are
relevant only to one part of the mechanism.

It's not the case in other parts of mm (apart from anon_vma) that I need to
remind myself of -the entirety of an incredibly complicated self-rolled
locking mechanism- to do _anything at all_.

>
> Here we have to think about the implementation because we are trying to
> improve the API (to add assertions) so that's not surprising? If your

Err what?^W^W OK just read the 'intermediate abstraction' bit below - yes
this is what I mean :) not the public API which is relatively OK, I mean
the intermediate levels which are very much not ;)

I'm trying to add a very basic and simple assertion of 'is lock A or lock
B taken'. And _just look_ at how difficult it's been.

This isn't a big change. This isn't a fundamental change. It's an
absolutely minor change, and frankly something that should have been in
place from the start.

I decided to add it to be a good kernel citizen (I have a MILLION things to
do) having (actually it turns out incorrectly) felt that the hard-coded
version of this was incorrect as well as wanting to be able to assert this
fundamental state in those places that we need it (very many).

After the feedback from Peter + Sebastian it seemed obviously sensible to
try to use lockdep as much as possible. And thus the voyage to the lands of
insanity began...

As always, no good deed goes unpunished :)

It's a hallmark of code that is not well abstracted that you have to
disentangle the _whole thing_ to be able to do simple things.

It's also a hallmark of code that I feel could do with being simplified and
more clearly documented.

So in the respin I'll do this.

The point I'm making really is there are _levels_ of abstraction both in
the public API _and_ the internal implementation.

The public API abstraction is generally reasonably OK. The truly broken
abstraction is in the layers below.

> complaint is about an "intermediate" abstraction level like the
> "is_vma_writer_only()" function then yeah it's far from perfect.

Right yes. This haha.

>
> > 	The fact I got this wrong despite staring at this code for ages is
> > 	indicative of that.
> >
> > 	Also the fact an ostensibly simple series has turned into a
> > 	'restore the context' discussion this long and taken this many
> > 	hours is further suggestive.
> >
> > 	I think we can do better. I'd rather not do more 'cleanup' series,
> > 	but I think this badly needs it.
> >
> > 	So maybe I'll convert this series into something that addresses
> > 	some of this stuff.
> >
> > <- Aside
> >
> >
> >>
> >> However the above should mean it could be only us who is the temporary
> >> reader. And we are not going to use vma_assert_locked() during the temporary
> >> reader part (in vma_start_read()).
> >
> > I don't think so? Spurious readers can arise at any time incrementing the
> > refcnt via vma_start_read(), so the temporary readers could be anybody.
> >
> > But they'd not get the read lock, and so shouldn't call anything asserting
> > read lock.
> >
> > Anyway I think my use of is_vma_writer_only() is just broken then.
>
> AFAIU the whole thing (vma_assert_locked() after this patch) would be broken
> in a case where we are really a reader and vma_is_read_locked() returns
> false wrongly, and thus makes us perform vma_assert_write_locked() and fail.
> So the scenarios with spurious readers can't cause that I think.
>
> What I think could cause that is there being a writer (not us), causing
> is_vma_writer_only() to return true even when there's also a reader (us). And I
> concluded that could happen only in case where we would be the spurious
> reader racing with a detaching writer. But when we are in the temporary
> spurious reader situation, we don't perform vma_is_read_locked() there.
>
> > We _do_ need to account for the VMA write lock scenario here, but I think
> > instead we should be good with refcnt > 1 && refcnt < VMA_LOCK_OFFSET no?
>
> That check would tell us there is no writer. But there might be a writer
> (not us) while we're a reader, and thus that check won't work as a signal
> for "we must have the read lock"?

Would it? A writer lock implies refcnt = 1 (or 0 if detached) right?

If another writer/detacher is in the midst of taking the lock then it'd be
>= VMA_LOCK_OFFSET.

We might actually encounter an issue where another thread's
vma_start_read() happens to pass the lockless check, then increments refcnt
before realising the VMA is write locked and decrementing again, and we get
just the wrong timing and observe refcnt > 1 && refcnt < VMA_LOCK_OFFSET,
but that's very unlikely.

Anyway the conclusion is we can't have vma_is_read_locked() at all without
lockdep. Fun! :)
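
For reference, the range check discussed above can be modelled in userspace
as follows. This is only a sketch: model_maybe_read_locked() is a
hypothetical name, the VMA_LOCK_OFFSET value is assumed, and the refcnt
convention (1 if attached, +1 per reader, plus VMA_LOCK_OFFSET while a
writer is entering) is taken from this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value, for illustration only. */
#define VMA_LOCK_OFFSET 0x40000000

/*
 * Candidate check from this thread: at least one reference beyond the
 * attached one, and no writer currently between enter and exit.
 */
static bool model_maybe_read_locked(int refcnt)
{
	assert(refcnt >= 0);
	return refcnt > 1 && refcnt < VMA_LOCK_OFFSET;
}
```

Feeding it the states above shows both failure modes. A reader (us) plus a
writer still entering gives VMA_LOCK_OFFSET + 2, which the check rejects
even though the read lock is held. A write-locked VMA (refcnt 1) hit by
another thread's transient vma_start_read() increment gives 2, which the
check accepts even though nobody holds the read lock.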

I want to assert the right thing so I guess we'll end up with:

	if (lock_is_held(...))
		lockdep_assert(lock_is_held(...));


>
> >> So it's probably fine, but maybe worth some comments to prevent people
> >> getting suspicious and reconstructing this?
> >
> > But yeah we shouldn't be asserting this anywhere during which this should
> > be the case.
> >
> > So hopefully the above resolves the issue?
>
> I don't follow, but perhaps I misunderstood you above and it's late here now.

Well I think we've reached some point of zen now...

>
> > It's still racy (the write lock might have been acquired since we checked) but
> > that's just the nature of it.
> >
> > But if we use lockdep as you mention we can actually do the same 'precise
> > if lockdep, otherwise not so precise' approach in the stabilised check.
> >
> > Let me respin.
> >
> >>
> >> But I think perhaps also vma_assert_locked() could, with lockdep enabled
> >> (similarly to vma_assert_stabilised() in patch 2), use the
> >> "lock_is_held(&vma->vmlock_dep_map)" condition (without immediately
> >> asserting it) for the primary reader vs writer decision, and not rely on
> >> vma_is_read_locked()? Because lockdep has the precise information.
> >>
> >> It would likely make things more ugly, or require more refactoring, but
> >> hopefully worthwhile?
> >
> > Yeah good idea. Will do.
>
> Ack, thanks.
>
> > Maybe I need to make this into a broader refactoring series. Because this
> > so badly needs it.
>
> Yep :/
>
> > Cheers, Lorenzo
>

Back to a respin before I forget how all this works (again)...

Cheers, Lorenzo