[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAG48ez29frEbJG+ySVARX-bO_QWe8eUbZiV8Jq2sqYemfuqP_g@mail.gmail.com>
Date: Tue, 29 Apr 2025 17:55:59 +0200
From: Jann Horn <jannh@...gle.com>
To: Andrii Nakryiko <andrii.nakryiko@...il.com>
Cc: Suren Baghdasaryan <surenb@...gle.com>, akpm@...ux-foundation.org, Liam.Howlett@...cle.com,
lorenzo.stoakes@...cle.com, david@...hat.com, vbabka@...e.cz,
peterx@...hat.com, hannes@...xchg.org, mhocko@...nel.org, paulmck@...nel.org,
shuah@...nel.org, adobriyan@...il.com, brauner@...nel.org,
josef@...icpanda.com, yebin10@...wei.com, linux@...ssschuh.net,
willy@...radead.org, osalvador@...e.de, andrii@...nel.org,
ryan.roberts@....com, christophe.leroy@...roup.eu,
linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-mm@...ck.org, linux-kselftest@...r.kernel.org
Subject: Re: [PATCH v3 8/8] mm/maps: execute PROCMAP_QUERY ioctl under RCU
On Wed, Apr 23, 2025 at 12:54 AM Andrii Nakryiko
<andrii.nakryiko@...il.com> wrote:
> On Fri, Apr 18, 2025 at 10:50 AM Suren Baghdasaryan <surenb@...gle.com> wrote:
> > Utilize speculative vma lookup to find and snapshot a vma without
> > taking mmap_lock during PROCMAP_QUERY ioctl execution. Concurrent
> > address space modifications are detected and the lookup is retried.
> > While we take the mmap_lock for reading during such contention, we
> > do that momentarily only to record new mm_wr_seq counter.
>
> PROCMAP_QUERY is an even more obvious candidate for fully lockless
> speculation, IMO (because it's more obvious that vma's use is
> localized to do_procmap_query(), instead of being spread across
> m_start/m_next and m_show as with seq_file approach). We do
> rcu_read_lock(), mmap_lock_speculate_try_begin(), query for VMA (no
> mmap_read_lock), use that VMA to produce (speculative) output, and
> then validate that VMA or mm_struct didn't change with
> mmap_lock_speculate_retry(). If it did - retry, if not - we are done.
> No need for vma_copy and any gets/puts, no?
I really strongly dislike this "fully lockless" approach because it
means we get data races all over the place, and it gets hard to reason
about what happens especially if we do anything other than reading
plain data from the VMA. When reading the implementation of
do_procmap_query(), at basically every memory read you'd have to think
twice as hard to figure out which fields can be concurrently updated
elsewhere and whether the subsequent sequence count recheck can
recover from the resulting badness.
Just as one example, I think get_vma_name() could (depending on
compiler optimizations) crash with a NULL deref if the VMA's ->vm_ops
pointer is concurrently changed to &vma_dummy_vm_ops by vma_close()
between "if (vma->vm_ops && vma->vm_ops->name)" and
"vma->vm_ops->name(vma)". And I think this illustrates how the "fully
lockless" approach creates more implicit assumptions about the
behavior of core MM code, which could be broken by future changes to
MM code.
Powered by blists - more mailing lists