Message-ID: <CAJuCfpFKNm6CEcfkuy+0o-Qu8xXppCFbOcYVXUFLeg10ztMFPw@mail.gmail.com>
Date: Wed, 9 Jul 2025 17:47:06 +0000
From: Suren Baghdasaryan <surenb@...gle.com>
To: "Liam R. Howlett" <Liam.Howlett@...cle.com>, Suren Baghdasaryan <surenb@...gle.com>, 
	Vlastimil Babka <vbabka@...e.cz>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, akpm@...ux-foundation.org, 
	david@...hat.com, peterx@...hat.com, jannh@...gle.com, hannes@...xchg.org, 
	mhocko@...nel.org, paulmck@...nel.org, shuah@...nel.org, adobriyan@...il.com, 
	brauner@...nel.org, josef@...icpanda.com, yebin10@...wei.com, 
	linux@...ssschuh.net, willy@...radead.org, osalvador@...e.de, 
	andrii@...nel.org, ryan.roberts@....com, christophe.leroy@...roup.eu, 
	tjmercier@...gle.com, kaleshsingh@...gle.com, aha310510@...il.com, 
	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org, 
	linux-mm@...ck.org, linux-kselftest@...r.kernel.org
Subject: Re: [PATCH v6 7/8] fs/proc/task_mmu: read proc/pid/maps under per-vma lock

On Wed, Jul 9, 2025 at 4:12 PM Liam R. Howlett <Liam.Howlett@...cle.com> wrote:
>
> * Suren Baghdasaryan <surenb@...gle.com> [250709 11:06]:
> > On Wed, Jul 9, 2025 at 3:03 PM Vlastimil Babka <vbabka@...e.cz> wrote:
> > >
> > > On 7/9/25 16:43, Suren Baghdasaryan wrote:
> > > > On Wed, Jul 9, 2025 at 1:57 AM Vlastimil Babka <vbabka@...e.cz> wrote:
> > > >>
> > > >> On 7/8/25 01:10, Suren Baghdasaryan wrote:
> > > >> >>> +     rcu_read_unlock();
> > > >> >>> +     vma = lock_vma_under_mmap_lock(mm, iter, address);
> > > >> >>> +     rcu_read_lock();
> > > >> >> OK I guess we hold the RCU lock the whole time as we traverse except when
> > > >> >> we lock under mmap lock.
> > > >> > Correct.
> > > >>
> > > >> I wonder if it's really necessary? Can't it be done just inside
> > > >> lock_next_vma()? It would also avoid the unlock/lock dance quoted above.
> > > >>
> > > >> Even if we later manage to extend this approach to smaps and employ rcu
> > > >> locking to traverse the page tables, I'd think it's best to separate and
> > > >> fine-grain the rcu lock usage for the vma iterator and the page tables, if
> > > >> only to avoid spending too long under the lock.
> > > >
> > > > I thought we would need to be in the same rcu read section while
> > > > traversing the maple tree using vma_next(), but now, looking at it,
> > > > maybe we can indeed enter it only while finding and locking the next
> > > > vma...
> > > > Liam, would that work? I see struct ma_state containing a node field.
> > > > Can it be freed out from under us if we find a vma, exit the rcu read
> > > > section, then re-enter rcu and use the same iterator to find the next vma?
> > >
> > > If the rcu protection needs to be contiguous, and patch 8 avoids the issue
> > > by always doing vma_iter_init() after rcu_read_lock() (but does it really
> > > avoid the issue, or is it why we see the syzbot reports?), then I guess in
> > > the code quoted above we also need a vma_iter_init() after the
> > > rcu_read_lock(), because although the iterator was used briefly under
> > > mmap_lock protection, that was then unlocked and there can be a race
> > > before the rcu_read_lock().
> >
> > Quite true. So, let's wait for Liam's confirmation and based on his
> > answer I'll change the patch by either reducing the rcu read section
> > or adding the missing vma_iter_init() after we switch to mmap_lock.
>
> You need to be under either rcu or the mmap lock to ensure the node in the
> maple state hasn't been freed (and potentially reallocated).
>
> So in this case, at the higher level, we can hold the rcu read lock for
> a series of walks and avoid re-walking the tree; then the performance
> would be better.

Got it. Thanks for confirming!
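
(For my own notes, a rough sketch of that higher-level shape, holding one
rcu read section across a batch of walks; lock_next_vma() is the helper from
this series, and its exact signature and calling convention here are my
assumption rather than a quote from the patch:)

  static void walk_vmas_batched(struct mm_struct *mm,
                                struct vma_iterator *vmi,
                                unsigned long from)
  {
          struct vm_area_struct *vma;

          /*
           * One rcu read section for the whole batch: the maple node
           * cached in @vmi stays valid, so we don't have to re-walk
           * the tree from the root for every vma.
           */
          rcu_read_lock();
          for (vma = lock_next_vma(mm, vmi, from); vma;
               vma = lock_next_vma(mm, vmi, vma->vm_end)) {
                  /* ... emit one maps line for @vma ... */
                  vma_end_read(vma);      /* drop the per-vma read lock */
          }
          rcu_read_unlock();
  }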

>
> When we return to userspace, we should drop the rcu read lock and
> will need to vma_iter_set()/vma_iter_invalidate() on return.  I thought
> this was being done (through vma_iter_init()), but syzbot seems to
> indicate a path that was missed?

We do that in m_start()/m_stop() by calling
lock_vma_range()/unlock_vma_range(), but I think I have two problems
here:
1. As Vlastimil mentioned, I do not reset the iterator when falling
back to mmap_lock and then exiting and re-entering the rcu read section;
2. I do not reset the iterator after exiting the rcu read section in
m_stop() and re-entering it in m_start(), so the later call to
lock_next_vma() might be using an iterator with a node that was freed
(and possibly reallocated).
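
For problem 1, the reset would go roughly like this (illustrative only;
lock_vma_under_mmap_lock() is the helper from this series and the
surrounding context is assumed, not copied from the patch):

  rcu_read_unlock();
  vma = lock_vma_under_mmap_lock(mm, iter, address);
  rcu_read_lock();
  /*
   * While neither rcu nor mmap_lock was held, the maple node cached
   * in the iterator may have been freed (and possibly reallocated),
   * so re-seed the iterator before the next walk.
   */
  vma_iter_set(iter, address);

Problem 2 needs the same kind of reset when m_start() re-enters the rcu
read section after m_stop() dropped it.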

>
> This is the same thing that needed to be done previously with the mmap
> lock, but now under the rcu lock.
>
> I'm not sure how to mitigate the issue with the page tables; maybe we
> guess at the number of vmas, as we were doing for 4k blocks of output,
> and just drop/reacquire at that point.  Probably a problem for another
> day anyway.
>
> Also, I think you can change the vma_iter_init() to vma_iter_set(),
> which is slightly less code under the hood.  Vlastimil asked about this
> and it's probably a better choice.
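
(Side note for my own reference, paraphrasing the helpers rather than
quoting the headers: vma_iter_init() re-binds the iterator to the mm and
sets its position, while vma_iter_set() only resets the position of an
already-initialized iterator, so when the mm doesn't change between walks
it's the lighter of the two.)

  vma_iter_init(&vmi, mm, address);     /* full re-init: mm + position */
  vma_iter_set(&vmi, address);          /* position-only reset, same mm */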

Ack.
I'll update my series with these fixes and all the comments I've received
so far, run the reproducers to confirm there are no issues, and repost the
series later today.
Thanks,
Suren.

>
> Thanks,
> Liam
>
