lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAG48ez29frEbJG+ySVARX-bO_QWe8eUbZiV8Jq2sqYemfuqP_g@mail.gmail.com>
Date: Tue, 29 Apr 2025 17:55:59 +0200
From: Jann Horn <jannh@...gle.com>
To: Andrii Nakryiko <andrii.nakryiko@...il.com>
Cc: Suren Baghdasaryan <surenb@...gle.com>, akpm@...ux-foundation.org, Liam.Howlett@...cle.com, 
	lorenzo.stoakes@...cle.com, david@...hat.com, vbabka@...e.cz, 
	peterx@...hat.com, hannes@...xchg.org, mhocko@...nel.org, paulmck@...nel.org, 
	shuah@...nel.org, adobriyan@...il.com, brauner@...nel.org, 
	josef@...icpanda.com, yebin10@...wei.com, linux@...ssschuh.net, 
	willy@...radead.org, osalvador@...e.de, andrii@...nel.org, 
	ryan.roberts@....com, christophe.leroy@...roup.eu, 
	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org, 
	linux-mm@...ck.org, linux-kselftest@...r.kernel.org
Subject: Re: [PATCH v3 8/8] mm/maps: execute PROCMAP_QUERY ioctl under RCU

On Wed, Apr 23, 2025 at 12:54 AM Andrii Nakryiko
<andrii.nakryiko@...il.com> wrote:
> On Fri, Apr 18, 2025 at 10:50 AM Suren Baghdasaryan <surenb@...gle.com> wrote:
> > Utilize speculative vma lookup to find and snapshot a vma without
> > taking mmap_lock during PROCMAP_QUERY ioctl execution. Concurrent
> > address space modifications are detected and the lookup is retried.
> > While we take the mmap_lock for reading during such contention, we
> > do that momentarily only to record new mm_wr_seq counter.
>
> PROCMAP_QUERY is an even more obvious candidate for fully lockless
> speculation, IMO (because it's more obvious that vma's use is
> localized to do_procmap_query(), instead of being spread across
> m_start/m_next and m_show as with seq_file approach). We do
> rcu_read_lock(), mmap_lock_speculate_try_begin(), query for VMA (no
> mmap_read_lock), use that VMA to produce (speculative) output, and
> then validate that VMA or mm_struct didn't change with
> mmap_lock_speculate_retry(). If it did - retry, if not - we are done.
> No need for vma_copy and any gets/puts, no?

I really strongly dislike this "fully lockless" approach because it
means we get data races all over the place, and it gets hard to reason
about what happens especially if we do anything other than reading
plain data from the VMA. When reading the implementation of
do_procmap_query(), at basically every memory read you'd have to think
twice as hard to figure out which fields can be concurrently updated
elsewhere and whether the subsequent sequence count recheck can
recover from the resulting badness.

Just as one example, I think get_vma_name() could (depending on
compiler optimizations) crash with a NULL deref if the VMA's ->vm_ops
pointer is concurrently changed to &vma_dummy_vm_ops by vma_close()
between "if (vma->vm_ops && vma->vm_ops->name)" and
"vma->vm_ops->name(vma)". And I think this illustrates how the "fully
lockless" approach creates more implicit assumptions about the
behavior of core MM code, which could be broken by future changes to
MM code.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ