linux-kernel - Re: [PATCH 1/1] mm: replace vma_start_write() with vma_start_write

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJuCfpGKoy2Aj9f-gfKDmsa5wWvv9=b3mS6hRgaADQGrd8dYEQ@mail.gmail.com>
Date: Tue, 10 Feb 2026 15:41:05 -0800
From: Suren Baghdasaryan <surenb@...gle.com>
To: Jann Horn <jannh@...gle.com>
Cc: akpm@...ux-foundation.org, willy@...radead.org, david@...nel.org, 
	ziy@...dia.com, matthew.brost@...el.com, joshua.hahnjy@...il.com, 
	rakie.kim@...com, byungchul@...com, gourry@...rry.net, 
	ying.huang@...ux.alibaba.com, apopple@...dia.com, lorenzo.stoakes@...cle.com, 
	baolin.wang@...ux.alibaba.com, Liam.Howlett@...cle.com, npache@...hat.com, 
	ryan.roberts@....com, dev.jain@....com, baohua@...nel.org, 
	lance.yang@...ux.dev, vbabka@...e.cz, rppt@...nel.org, mhocko@...e.com, 
	pfalcato@...e.de, kees@...nel.org, maddy@...ux.ibm.com, npiggin@...il.com, 
	mpe@...erman.id.au, chleroy@...nel.org, linux-mm@...ck.org, 
	linuxppc-dev@...ts.ozlabs.org, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/1] mm: replace vma_start_write() with vma_start_write_killable()

On Tue, Feb 10, 2026 at 1:19 PM Jann Horn <jannh@...gle.com> wrote:
>
> On Mon, Feb 9, 2026 at 11:08 PM Suren Baghdasaryan <surenb@...gle.com> wrote:
> > Now that we have vma_start_write_killable() we can replace most of the
> > vma_start_write() calls with it, improving reaction time to the kill
> > signal.
> [...]
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index dbd48502ac24..3de7ab4f4cee 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> [...]
> > @@ -1808,7 +1817,11 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
> >                         break;
> >                 }
> >
> > -               vma_start_write(vma);
> > +               if (vma_start_write_killable(vma)) {
> > +                       err = -EINTR;
>
> Doesn't this need mpol_put(new)? Or less complicated, move the
> vma_start_write_killable() up to somewhere above the mpol_dup() call.

Thanks for the review, Jann!

Yes you are right. I'll move it before mpol_dup().

>
> > +                       break;
> > +               }
> > +
> >                 new->home_node = home_node;
> >                 err = mbind_range(&vmi, vma, &prev, start, end, new);
> >                 mpol_put(new);
> [...]
> > diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> > index a94c401ab2cf..dc9f7a7709c6 100644
> > --- a/mm/pagewalk.c
> > +++ b/mm/pagewalk.c
> > @@ -425,14 +425,13 @@ static inline void process_mm_walk_lock(struct mm_struct *mm,
> >                 mmap_assert_write_locked(mm);
> >  }
> >
> > -static inline void process_vma_walk_lock(struct vm_area_struct *vma,
> > +static inline int process_vma_walk_lock(struct vm_area_struct *vma,
> >                                          enum page_walk_lock walk_lock)
> >  {
> >  #ifdef CONFIG_PER_VMA_LOCK
> >         switch (walk_lock) {
> >         case PGWALK_WRLOCK:
> > -               vma_start_write(vma);
> > -               break;
> > +               return vma_start_write_killable(vma);
>
> There are two users of PGWALK_WRLOCK in arch/s390/mm/gmap.c code that
> don't check pagewalk return values, have you checked that they are not
> negatively affected by this new possible error return?

Uh, even the ones which check for the error like queue_pages_range()
don't seem to handle it well. I'll split this part into a separate
patch as I think it will be sizable and will go over all users to
ensure they handle the new error.

>
> >         case PGWALK_WRLOCK_VERIFY:
> >                 vma_assert_write_locked(vma);
> >                 break;
> [...]
> > diff --git a/mm/vma.c b/mm/vma.c
> > index be64f781a3aa..3cfb81b3b7cf 100644
> > --- a/mm/vma.c
> > +++ b/mm/vma.c
> > @@ -540,8 +540,12 @@ __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
> >         if (new->vm_ops && new->vm_ops->open)
> >                 new->vm_ops->open(new);
> >
> > -       vma_start_write(vma);
> > -       vma_start_write(new);
> > +       err = vma_start_write_killable(vma);
> > +       if (err)
> > +               goto out_fput;
> > +       err = vma_start_write_killable(new);
> > +       if (err)
> > +               goto out_fput;
>
> What about the new->vm_ops->open() call and the anon_vma_clone()
> above? I don't think the error path properly undoes either. These
> calls should probably be moved further up, so that the point of no
> return in this function stays where it was.

Ack.

>
> >         init_vma_prep(&vp, vma);
> >         vp.insert = new;
> [...]
> > @@ -1155,10 +1168,12 @@ int vma_expand(struct vma_merge_struct *vmg)
> >         struct vm_area_struct *next = vmg->next;
> >         bool remove_next = false;
> >         vm_flags_t sticky_flags;
> > -       int ret = 0;
> > +       int ret;
> >
> >         mmap_assert_write_locked(vmg->mm);
> > -       vma_start_write(target);
> > +       ret = vma_start_write_killable(target);
> > +       if (ret)
> > +               return ret;
> >
> >         if (next && target != next && vmg->end == next->vm_end)
> >                 remove_next = true;
> > @@ -1186,17 +1201,19 @@ int vma_expand(struct vma_merge_struct *vmg)
> >          * Note that, by convention, callers ignore OOM for this case, so
> >          * we don't need to account for vmg->give_up_on_mm here.
> >          */
> > -       if (remove_next)
> > +       if (remove_next) {
> > +               ret = vma_start_write_killable(next);
> > +               if (ret)
> > +                       return ret;
> >                 ret = dup_anon_vma(target, next, &anon_dup);
> > +       }
> >         if (!ret && vmg->copied_from)
> >                 ret = dup_anon_vma(target, vmg->copied_from, &anon_dup);
> >         if (ret)
> >                 return ret;
>
> nit: the control flow here is kinda chaotic, with some "if (ret)
> return ret;" mixed with "if (!ret && ...) ret = ...;".

I'll see what I can do about it but probably as a separate patch.

>
> >
> > -       if (remove_next) {
> > -               vma_start_write(next);
> > +       if (remove_next)
> >                 vmg->__remove_next = true;
> > -       }
> >         if (commit_merge(vmg))
> >                 goto nomem;
> >
> [...]
> > @@ -2211,9 +2240,8 @@ int mm_take_all_locks(struct mm_struct *mm)
> >          * is reached.
> >          */
> >         for_each_vma(vmi, vma) {
> > -               if (signal_pending(current))
> > +               if (vma_start_write_killable(vma))
> >                         goto out_unlock;
> > -               vma_start_write(vma);
>
> nit: might want to keep the signal_pending() so that this can sort of
> be interrupted by non-fatal signals, which seems to be the intention

Yes, I will bring back that check.

>
> >         }
> >
> >         vma_iter_init(&vmi, mm, 0);
> > @@ -2549,7 +2577,9 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
> >  #endif
> >
> >         /* Lock the VMA since it is modified after insertion into VMA tree */
> > -       vma_start_write(vma);
> > +       error = vma_start_write_killable(vma);
> > +       if (error)
> > +               goto free_iter_vma;
>
> This seems way past the point of no return, we've already called the
> ->mmap() handler which I think means removing the VMA again would
> require a ->close() call. The VMA should be locked further up if we
> want to do it killably.

Yeah, I realized this big issue after posting the patch. Moving it up
seems possible, so I'll try that.

Thanks,
Suren.

>
> >         vma_iter_store_new(vmi, vma);
> >         map->mm->map_count++;
> >         vma_link_file(vma, map->hold_file_rmap_lock);
> >