Message-ID: <61ca35cd-5c08-4196-89b6-ec3feda69e36@lucifer.local>
Date: Tue, 20 Jan 2026 17:49:19 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
David Hildenbrand <david@...nel.org>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
Mike Rapoport <rppt@...nel.org>,
Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
Shakeel Butt <shakeel.butt@...ux.dev>, Jann Horn <jannh@...gle.com>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
linux-rt-devel@...ts.linux.dev, Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
Boqun Feng <boqun.feng@...il.com>, Waiman Long <longman@...hat.com>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Clark Williams <clrkwllms@...nel.org>,
Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [PATCH v2 1/2] mm/vma: use lockdep where we can, reduce
duplication
On Tue, Jan 20, 2026 at 02:53:30PM +0100, Vlastimil Babka wrote:
> On 1/19/26 21:59, Lorenzo Stoakes wrote:
> > We introduce vma_is_read_locked(), which must deal with the case in which
> > VMA write lock sets refcnt to VMA_LOCK_OFFSET or VMA_LOCK_OFFSET +
> > 1. Luckily is_vma_writer_only() already exists which we can use to check
> > this.
>
> So I think there's a bit of a caveat in that
>
> - is_vma_writer_only() may be a false positive if there is a temporary
> reader of a detached vma (per comments in vma_mark_detached() and
> vma_mark_detached())
vma_mark_detached() and vma_mark_attached(), I assume you mean.
OK so this is all very confusing indeed.
This function is _only_ referring to the situation between
__vma_enter_locked() and __vma_exit_locked().
Despite their names, which suggest they might happen on lock and unlock
respectively, that's not the case: both are invoked when taking the lock,
with enter called at the start of TAKING the lock and exit on completion of
TAKING the lock.
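To make the ordering concrete, here is a minimal single-threaded userspace model (not kernel code) of the write-lock path as I read __vma_start_write(). All model_* names are hypothetical, VMA_LOCK_OFFSET's value is assumed, and the model skips the wait for readers by assuming there are none.

```c
#include <assert.h>

/* Assumed to mirror the kernel's VMA_LOCK_OFFSET; only its role as a
 * large "writer present" bias in refcnt matters for this model. */
#define VMA_LOCK_OFFSET 0x40000000

/* Hypothetical model of a VMA: refcnt is 0 when detached and 1 when
 * attached with no readers. */
struct model_vma {
	int refcnt;
	unsigned int lock_seq;
};

/* Models __vma_enter_locked() on an attached VMA: add the writer bias.
 * The real code then waits for readers; this model assumes none. */
static void model_enter_locked(struct model_vma *vma)
{
	vma->refcnt += VMA_LOCK_OFFSET;
	assert(vma->refcnt == VMA_LOCK_OFFSET + 1); /* attached, no readers */
}

/* Models __vma_exit_locked(): drop the bias again. */
static void model_exit_locked(struct model_vma *vma)
{
	vma->refcnt -= VMA_LOCK_OFFSET;
}

/* Both helpers run while *taking* the write lock; neither runs on
 * unlock. The sequence number is what marks the VMA write-locked. */
static void model_start_write(struct model_vma *vma, unsigned int mm_seq)
{
	model_enter_locked(vma);
	vma->lock_seq = mm_seq;
	model_exit_locked(vma);
}
```

Once model_start_write() returns, refcnt is back to 1 and only lock_seq records that the write lock is held.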
It seems __vma_enter_locked() is more about getting into the state where
refcnt is VMA_LOCK_OFFSET (detaching; note that elsewhere we say detached,
of course) or VMA_LOCK_OFFSET + 1 (attached), waiting on readers who are
spuriously increasing the reference count.
So fundamentally is_vma_writer_only() is actually asking 'are we in the
midst of a VMA write lock acquisition, having finally set the VMA's refcnt
to VMA_LOCK_OFFSET or VMA_LOCK_OFFSET + 1, but not yet having completed
acquiring the lock?', i.e. not yet having called __vma_exit_locked().
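A small userspace model of that predicate may help. It is written from my reading of is_vma_writer_only(), the model_ name is hypothetical, and VMA_LOCK_OFFSET's value is assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the kernel's VMA_LOCK_OFFSET. */
#define VMA_LOCK_OFFSET 0x40000000

/* True when a writer has added VMA_LOCK_OFFSET and nothing beyond the
 * attach reference remains: VMA_LOCK_OFFSET (detaching) or
 * VMA_LOCK_OFFSET + 1 (attached), before __vma_exit_locked(). */
static bool model_is_writer_only(int refcnt)
{
	return (refcnt & VMA_LOCK_OFFSET) && refcnt <= VMA_LOCK_OFFSET + 1;
}
```

Note that VMA_LOCK_OFFSET + 1 is ambiguous. It is what an attached VMA with no readers looks like, but also what a detaching VMA with one temporary reader looks like. That is the false positive mentioned above.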
With __vma_enter_locked() called from:

    vma_start_write() / vma_start_write_killable()
    -> __vma_start_write()
    -> __vma_enter_locked()

    vma_mark_detached()
    -> __vma_enter_locked()
OK, so in __vma_enter_locked() we add VMA_LOCK_OFFSET but then wait until
refcnt gets to either VMA_LOCK_OFFSET + 1 (attached) or VMA_LOCK_OFFSET
(detached), since presumably refcnt == 0 means detached and refcnt == 1
means the write lock was finally acquired (but you have to check the
sequence number).
And _there_ we could have spurious readers.
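The wait condition can be sketched as a userspace predicate. model_enter_wait_done() is a hypothetical helper, VMA_LOCK_OFFSET's value is assumed, and the model assumes (as I understand the refcount limit in vma_start_read()) that readers can no longer take a new reference once the writer bias is set, so a spurious reader's increment has to predate it.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the kernel's VMA_LOCK_OFFSET. */
#define VMA_LOCK_OFFSET 0x40000000

/* Hypothetical: the refcnt value __vma_enter_locked() waits for after
 * adding VMA_LOCK_OFFSET. Any extra count is a reader still present. */
static bool model_enter_wait_done(int refcnt, bool detaching)
{
	int tgt = detaching ? VMA_LOCK_OFFSET : VMA_LOCK_OFFSET + 1;

	return refcnt == tgt;
}
```

For example, in vma_mark_detached() on an already write-locked VMA: a temporary reader bumps refcnt from 1 to 2, the detach drops the attach reference (1), the writer adds the bias (VMA_LOCK_OFFSET + 1), and the wait only completes once the reader backs off and refcnt reaches VMA_LOCK_OFFSET.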
>
> - hence vma_is_read_locked() may be a false negative
Yup.
>
> - hence vma_assert_locked() might assume wrongly that we should not assert
> being a reader, so we vma_assert_write_locked() instead, and fail
Aside ->
Every time I come to this code it's like this - having to refresh
my memory as to how any of it works, getting confused, etc.
This speaks to this being a leaky abstraction, similar to anon_vma.
What I mean by a leaky abstraction is that you seem to need to
keep the _implementation_ context in your head to be able to
implement anything correctly. We simply aren't abstracting the
details well here at all.
The fact I got this wrong despite staring at this code for ages is
indicative of that.
Also, the fact that an ostensibly simple series has turned into a
'restore the context' discussion this long, taking this many
hours, is further evidence of that.
I think we can do better. I'd rather not do more 'cleanup' series,
but I think this badly needs it.
So maybe I'll convert this series into something that addresses
some of this stuff.
<- Aside
>
> However the above should mean it could be only us who is the temporary
> reader. And we are not going to use vma_assert_locked() during the temporary
> reader part (in vma_start_read()).
I don't think so? Spurious readers can arise at any time, incrementing the
refcnt via vma_start_read(), so the temporary readers could be anybody.
But they wouldn't get the read lock, and so shouldn't call anything
asserting the read lock.
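The reader side can be modelled the same way. This is a hypothetical single-threaded sketch, not vma_start_read() itself: the bias check approximates the kernel's limited inc-not-zero, and VMA_LOCK_OFFSET's value is assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the kernel's VMA_LOCK_OFFSET. */
#define VMA_LOCK_OFFSET 0x40000000

struct model_vma {
	int refcnt;
	unsigned int lock_seq;
};

/* Returns true if the read lock was obtained. On failure refcnt is left
 * as it was, so any increment was only ever temporary. */
static bool model_start_read(struct model_vma *vma, unsigned int mm_seq)
{
	/* No new reference if detached or if a writer's bias is present. */
	if (vma->refcnt == 0 || (vma->refcnt & VMA_LOCK_OFFSET))
		return false;
	vma->refcnt++;			/* the temporary reference */

	if (vma->lock_seq == mm_seq) {	/* VMA is write-locked */
		vma->refcnt--;		/* back off without the lock */
		return false;
	}
	return true;
}
```

A reader that fails here never holds the read lock, so it never reaches code that asserts it.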
Anyway, I think my use of is_vma_writer_only() is just broken then.
We _do_ need to account for the VMA write lock scenario here, but I think
we should instead be good with refcnt > 1 && refcnt < VMA_LOCK_OFFSET, no?
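As a userspace sketch of that proposed check (model_is_read_locked() is a hypothetical name, and VMA_LOCK_OFFSET's value is assumed):

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the kernel's VMA_LOCK_OFFSET. */
#define VMA_LOCK_OFFSET 0x40000000

/* At least one reader on top of the attach reference, and no writer
 * bias present. */
static bool model_is_read_locked(int refcnt)
{
	return refcnt > 1 && refcnt < VMA_LOCK_OFFSET;
}
```

One consequence is that VMA_LOCK_OFFSET + 2 (a writer waiting while a genuine reader still holds the lock) is also reported as not read-locked. This is the imprecision that a lockdep-based answer would avoid.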
> So it's probably fine, but maybe worth some comments to prevent people
> getting suspicious and reconstructing this?
But yeah, we shouldn't be asserting this anywhere during the window in
which this would be the case.
So hopefully the above resolves the issue?
It's still racy (the write lock might have been acquired since we
checked), but that's just the nature of it.
But if we use lockdep as you mention, we can actually do the same 'precise
if lockdep, otherwise not so precise' approach in the stabilised check.
Let me respin.
>
> But I think perhaps also vma_assert_locked() could, with lockdep enabled
> (similarly to vma_assert_stabilised() in patch 2), use the
> "lock_is_held(&vma->vmlock_dep_map)" condition (without immediately
> asserting it) for the primary reader vs writer decision, and not rely on
> vma_is_read_locked()? Because lockdep has the precise information.
>
> It would likely make things more ugly, or require more refactoring, but
> hopefully worthwhile?
Yeah good idea. Will do.
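Roughly, the decision would look like the following userspace model. It is hypothetical: the two bool fields stand in for IS_ENABLED(CONFIG_LOCKDEP) and lock_is_held(&vma->vmlock_dep_map), and it assumes that once a write lock is fully acquired lockdep no longer reports vmlock_dep_map as held, so "held" implies a reader. VMA_LOCK_OFFSET's value is assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the kernel's VMA_LOCK_OFFSET. */
#define VMA_LOCK_OFFSET 0x40000000

struct model_vma {
	int refcnt;
	bool lockdep_enabled;	/* stands in for IS_ENABLED(CONFIG_LOCKDEP) */
	bool lockdep_held;	/* stands in for lock_is_held(...) */
};

enum model_assert_kind { MODEL_ASSERT_READ, MODEL_ASSERT_WRITE };

/* Which assertion a vma_assert_locked()-style helper would make:
 * precise with lockdep, best-effort refcnt heuristic without it. */
static enum model_assert_kind
model_assert_locked_kind(const struct model_vma *vma)
{
	if (vma->lockdep_enabled)
		return vma->lockdep_held ? MODEL_ASSERT_READ : MODEL_ASSERT_WRITE;

	if (vma->refcnt > 1 && vma->refcnt < VMA_LOCK_OFFSET)
		return MODEL_ASSERT_READ;
	return MODEL_ASSERT_WRITE;
}
```

With lockdep, a reader that holds the lock while a writer waits (refcnt VMA_LOCK_OFFSET + 2) is still classified as a reader. Without lockdep, the heuristic falls back to asserting the write lock in that case.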
Maybe I need to make this into a broader refactoring series, because this
so badly needs it.
Cheers, Lorenzo