[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJuCfpHAZ4gwh14wi8M3jt8HPwwV_P9W29qzOXwypgUk72VBgA@mail.gmail.com>
Date: Mon, 16 Dec 2024 13:53:06 -0800
From: Suren Baghdasaryan <surenb@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: akpm@...ux-foundation.org, willy@...radead.org, liam.howlett@...cle.com,
lorenzo.stoakes@...cle.com, mhocko@...e.com, vbabka@...e.cz,
hannes@...xchg.org, mjguzik@...il.com, oliver.sang@...el.com,
mgorman@...hsingularity.net, david@...hat.com, peterx@...hat.com,
oleg@...hat.com, dave@...olabs.net, paulmck@...nel.org, brauner@...nel.org,
dhowells@...hat.com, hdanton@...a.com, hughd@...gle.com,
lokeshgidra@...gle.com, minchan@...gle.com, jannh@...gle.com,
shakeel.butt@...ux.dev, souravpanda@...gle.com, pasha.tatashin@...een.com,
klarasmodin@...il.com, corbet@....net, linux-doc@...r.kernel.org,
linux-mm@...ck.org, linux-kernel@...r.kernel.org, kernel-team@...roid.com
Subject: Re: [PATCH v6 10/16] mm: replace vm_lock and detached flag with a
reference count
On Mon, Dec 16, 2024 at 1:15 PM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Mon, Dec 16, 2024 at 11:24:13AM -0800, Suren Baghdasaryan wrote:
>
> FWIW, I find the whole VMA_STATE_{A,DE}TATCHED thing awkward. And
I'm bad with naming things, so any better suggestions are welcome.
Are you suggesting to drop VMA_STATE_{A,DE}TATCHED nomenclature and
use 0/1 values directly?
> perhaps s/VMA_STATE_LOCKED/VMA_LOCK_OFFSET/ ?
Sounds good. I'll change it to VMA_LOCK_OFFSET.
>
> Also, perhaps:
>
> #define VMA_REF_LIMIT (VMA_LOCK_OFFSET - 2)
Ack.
>
> > @@ -699,10 +700,27 @@ static inline void vma_numab_state_free(struct vm_area_struct *vma) {}
> > #ifdef CONFIG_PER_VMA_LOCK
> > static inline void vma_lock_init(struct vm_area_struct *vma)
> > {
> > - init_rwsem(&vma->vm_lock.lock);
> > +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> > + static struct lock_class_key lockdep_key;
> > +
> > + lockdep_init_map(&vma->vmlock_dep_map, "vm_lock", &lockdep_key, 0);
> > +#endif
> > + refcount_set(&vma->vm_refcnt, VMA_STATE_DETACHED);
> > vma->vm_lock_seq = UINT_MAX;
>
> Depending on how you do the actual allocation (GFP_ZERO) you might want
> to avoid that vm_refcount store entirely.
I think we could initialize it to 0 in the slab cache constructor and
when vma is freed we already ensure it's 0. So, even when reused it
will be in the correct 0 state.
>
> Perhaps instead write: VM_WARN_ON(refcount_read(&vma->vm_refcnt));
Yes, with the above approach that should work.
>
> > @@ -813,25 +849,42 @@ static inline void vma_assert_write_locked(struct vm_area_struct *vma)
> >
> > static inline void vma_assert_locked(struct vm_area_struct *vma)
> > {
> > - if (!rwsem_is_locked(&vma->vm_lock.lock))
> > + if (refcount_read(&vma->vm_refcnt) <= VMA_STATE_ATTACHED)
> if (is_vma_detached(vma))
>
> > vma_assert_write_locked(vma);
> > }
> >
> > -static inline void vma_mark_attached(struct vm_area_struct *vma)
> > +/*
> > + * WARNING: to avoid racing with vma_mark_attached(), should be called either
> > + * under mmap_write_lock or when the object has been isolated under
> > + * mmap_write_lock, ensuring no competing writers.
> > + */
> > +static inline bool is_vma_detached(struct vm_area_struct *vma)
> > {
> > - vma->detached = false;
> > + return refcount_read(&vma->vm_refcnt) == VMA_STATE_DETACHED;
> return !refcount_read(&vma->vm_refcnt);
> > }
> >
> > -static inline void vma_mark_detached(struct vm_area_struct *vma)
> > +static inline void vma_mark_attached(struct vm_area_struct *vma)
> > {
> > - /* When detaching vma should be write-locked */
> > vma_assert_write_locked(vma);
> > - vma->detached = true;
> > +
> > + if (is_vma_detached(vma))
> > + refcount_set(&vma->vm_refcnt, VMA_STATE_ATTACHED);
>
> Urgh, so it would be really good to not call this at all them not 0.
> I've not tried to untangle the mess, but this is really awkward. Surely
> you don't add it to the mas multiple times either.
The issue is that when we merge/split/shrink/grow vmas, we skip on
marking them detached while modifying them. Therefore from
vma_mark_attached() POV it will look like we are attaching an already
attached vma. I can try to clean that up if this is really a concern.
>
> Also:
>
> refcount_set(&vma->vm_refcnt, 1);
>
> is so much clearer.
Ok, IIUC you are in favour of dropping VMA_STATE_ATTACHED/VMA_STATE_DETACHED.
>
> That is, should this not live in vma_iter_store*(), right before
> mas_store_gfp() ?
Currently it's done right *after* mas_store_gfp() but I was debating
with myself if it indeed should be *before* insertion into the tree...
>
> Also, ISTR having to set vm_lock_seq right before it?
Yes, vma_mark_attached() requires vma to be write-locked beforehand,
hence the above vma_assert_write_locked(). But oftentimes it's locked
not right before vma_mark_attached() because some other modification
functions also require vma to be write-locked.
>
> > }
> >
> > -static inline bool is_vma_detached(struct vm_area_struct *vma)
> > +static inline void vma_mark_detached(struct vm_area_struct *vma)
> > {
> > - return vma->detached;
> > + vma_assert_write_locked(vma);
> > +
> > + if (is_vma_detached(vma))
> > + return;
>
> Again, this just reads like confusion :/ Surely you don't have the same
> with mas_detach?
I'll double-check if we ever double-mark vma as detached.
Thanks for the review!
>
> > +
> > + /* We are the only writer, so no need to use vma_refcount_put(). */
> > + if (!refcount_dec_and_test(&vma->vm_refcnt)) {
> > + /*
> > + * Reader must have temporarily raised vm_refcnt but it will
> > + * drop it without using the vma since vma is write-locked.
> > + */
> > + }
> > }
Powered by blists - more mailing lists