lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241210223850.GA2484@noisy.programming.kicks-ass.net>
Date: Tue, 10 Dec 2024 23:38:50 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Suren Baghdasaryan <surenb@...gle.com>
Cc: Matthew Wilcox <willy@...radead.org>, akpm@...ux-foundation.org,
	liam.howlett@...cle.com, lorenzo.stoakes@...cle.com,
	mhocko@...e.com, vbabka@...e.cz, hannes@...xchg.org,
	mjguzik@...il.com, oliver.sang@...el.com,
	mgorman@...hsingularity.net, david@...hat.com, peterx@...hat.com,
	oleg@...hat.com, dave@...olabs.net, paulmck@...nel.org,
	brauner@...nel.org, dhowells@...hat.com, hdanton@...a.com,
	hughd@...gle.com, minchan@...gle.com, jannh@...gle.com,
	shakeel.butt@...ux.dev, souravpanda@...gle.com,
	pasha.tatashin@...een.com, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, kernel-team@...roid.com
Subject: Re: [PATCH 3/4] mm: replace rw_semaphore with atomic_t in vma_lock

On Tue, Nov 12, 2024 at 07:18:45AM -0800, Suren Baghdasaryan wrote:
> On Mon, Nov 11, 2024 at 8:58 PM Matthew Wilcox <willy@...radead.org> wrote:
> >
> > On Mon, Nov 11, 2024 at 12:55:05PM -0800, Suren Baghdasaryan wrote:
> > > When a reader takes read lock, it increments the atomic, unless the
> > > top two bits are set indicating a writer is present.
> > > When writer takes write lock, it sets VMA_LOCK_WR_LOCKED bit if there
> > > are no readers or VMA_LOCK_WR_WAIT bit if readers are holding the lock
> > > and puts itself onto newly introduced mm.vma_writer_wait. Since all
> > > writers take mmap_lock in write mode first, there can be only one writer
> > > at a time. The last reader to release the lock will signal the writer
> > > to wake up.
> >
> > I don't think you need two bits.  You can do it this way:
> >
> > 0x8000'0000 - No readers, no writers
> > 0x1-7fff'ffff - Some number of readers
> > 0x0 - Writer held
> > 0x8000'0001-0xffff'ffff - Reader held, writer waiting
> >
> > A prospective writer subtracts 0x8000'0000.  If the result is 0, it got
> > the lock, otherwise it sleeps until it is 0.
> >
> > A writer unlocks by adding 0x8000'0000 (not by setting the value to
> > 0x8000'0000).
> >
> > A reader unlocks by adding 1.  If the result is 0, it wakes the writer.
> >
> > A prospective reader subtracts 1.  If the result is positive, it got the
> > lock, otherwise it does the unlock above (this might be the one which
> > wakes the writer).
> >
> > And ... that's it.  See how we use the CPU arithmetic flags to tell us
> > everything we need to know without doing arithmetic separately?
> 
> Yes, this is neat! You are using the fact that write-locked == no
> readers to eliminate unnecessary state. I'll give that a try. Thanks!

The reason I got here is that Vlastimil poked me about the whole
TYPESAFE_BY_RCU thing.

So the normal way those things work is with a refcount, if the refcount
is non-zero, the identifying fields should be stable and you can
determine if you have the right object, otherwise tough luck.

And I was thinking that since you abuse this rwsem you have, you might
as well turn that into a refcount with some extra.

So I would propose a slightly different solution.

Replace vm_lock with vm_refcnt. Replace vm_detached with vm_refcnt == 0
-- that is, attach sets refcount to 1 to indicate it is part of the mas,
detached is the final 'put'.

RCU lookup does the inc_not_zero thing, when increment succeeds, compare
mm/addr to validate.

vma_start_write() already relies on mmap_lock being held for writing,
and thus does not have to worry about writer-vs-writer contention, that
is fully resolved by mmap_sem. This means we only need to wait for
readers to drop out.

vma_start_write()
	add(0x8000'0001); // could fetch_add and double check the high
			  // bit wasn't already set.
	wait-until(refcnt == 0x8000'0002); // mas + writer ref
	WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
	sub(0x8000'0000);

vma_end_write()
	put();

vma_start_read() then becomes something like:

	if (vm_lock_seq == mm_lock_seq)
	  return false;

	cnt = fetch_inc(1);
	if (cnt & msb || vm_lock_seq == mm_lock_seq) {
	  put();
	  return false;
	}

	return true;
	
vma_end_read() then becomes:
	put();


and the down_read() from uffffffd requires mmap_read_lock() and thus
does not have to worry about writers, it can simpy be inc() and put(),
no?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ