lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bulkje7nsdfikukca4g6lqnwda6ll7eu2pcdn5bdhkqeyl7auh@yzzc6xkqqllm>
Date: Wed, 9 Jul 2025 12:11:45 -0400
From: "Liam R. Howlett" <Liam.Howlett@...cle.com>
To: Suren Baghdasaryan <surenb@...gle.com>
Cc: Vlastimil Babka <vbabka@...e.cz>,
        Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
        akpm@...ux-foundation.org, david@...hat.com, peterx@...hat.com,
        jannh@...gle.com, hannes@...xchg.org, mhocko@...nel.org,
        paulmck@...nel.org, shuah@...nel.org, adobriyan@...il.com,
        brauner@...nel.org, josef@...icpanda.com, yebin10@...wei.com,
        linux@...ssschuh.net, willy@...radead.org, osalvador@...e.de,
        andrii@...nel.org, ryan.roberts@....com, christophe.leroy@...roup.eu,
        tjmercier@...gle.com, kaleshsingh@...gle.com, aha310510@...il.com,
        linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-mm@...ck.org, linux-kselftest@...r.kernel.org
Subject: Re: [PATCH v6 7/8] fs/proc/task_mmu: read proc/pid/maps under
 per-vma lock

* Suren Baghdasaryan <surenb@...gle.com> [250709 11:06]:
> On Wed, Jul 9, 2025 at 3:03 PM Vlastimil Babka <vbabka@...e.cz> wrote:
> >
> > On 7/9/25 16:43, Suren Baghdasaryan wrote:
> > > On Wed, Jul 9, 2025 at 1:57 AM Vlastimil Babka <vbabka@...e.cz> wrote:
> > >>
> > >> On 7/8/25 01:10, Suren Baghdasaryan wrote:
> > >> >>> +     rcu_read_unlock();
> > >> >>> +     vma = lock_vma_under_mmap_lock(mm, iter, address);
> > >> >>> +     rcu_read_lock();
> > >> >> OK I guess we hold the RCU lock the whole time as we traverse except when
> > >> >> we lock under mmap lock.
> > >> > Correct.
> > >>
> > >> I wonder if it's really necessary? Can't it be done just inside
> > >> lock_next_vma()? It would also avoid the unlock/lock dance quoted above.
> > >>
> > >> Even if we later manage to extend this approach to smaps and employ rcu
> > >> locking to traverse the page tables, I'd think it's best to separate and
> > >> fine-grain the rcu lock usage for vma iterator and page tables, if only to
> > >> avoid too long time under the lock.
> > >
> > > I thought we would need to be in the same rcu read section while
> > > traversing the maple tree using vma_next() but now looking at it,
> > > maybe we can indeed enter only while finding and locking the next
> > > vma...
> > > Liam, would that work? I see struct ma_state containing a node field.
> > > Can it be freed from under us if we find a vma, exit rcu read section
> > > then re-enter rcu and use the same iterator to find the next vma?
> >
> > If the rcu protection needs to be contigous, and patch 8 avoids the issue by
> > always doing vma_iter_init() after rcu_read_lock() (but does it really avoid
> > the issue or is it why we see the syzbot reports?) then I guess in the code
> > quoted above we also need a vma_iter_init() after the rcu_read_lock(),
> > because although the iterator was used briefly under mmap_lock protection,
> > that was then unlocked and there can be a race before the rcu_read_lock().
> 
> Quite true. So, let's wait for Liam's confirmation and based on his
> answer I'll change the patch by either reducing the rcu read section
> or adding the missing vma_iter_init() after we switch to mmap_lock.

You need to either be under rcu or mmap lock to ensure the node in the
maple state hasn't been freed (and potentially, reallocated).

So in this case, in the higher level, we can hold the rcu read lock for
a series of walks and avoid re-walking the tree then the performance
would be better.

When we return to userspace, then we should drop the rcu read lock and
will need to vma_iter_set()/vma_iter_invalidate() on return.  I thought
this was being done (through vma_iter_init()), but syzbot seems to
indicate a path that was missed?

This is the same thing that needed to be done previously with the mmap
lock, but now under the rcu lock.

I'm not sure how to mitigate the issue with the page table, maybe we
guess on the number of vmas that we were doing for 4k blocks of output
and just drop/reacquire then.  Probably a problem for another day
anyways.

Also, I think you can also change the vma_iter_init() to vma_iter_set(),
which is slightly less code under the hood.  Vlastimil asked about this
and it's probably a better choice.

Thanks,
Liam


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ