Message-ID: <YnLzrGlBNCmCPLmS@casper.infradead.org>
Date:   Wed, 4 May 2022 22:44:12 +0100
From:   Matthew Wilcox <willy@...radead.org>
To:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
        Waiman Long <longman@...hat.com>
Cc:     "Paul E. McKenney" <paulmck@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        "Liam R. Howlett" <Liam.Howlett@...cle.com>,
        linux-kernel@...r.kernel.org
Subject: Wait for mutex to become unlocked

Paul, Liam and I were talking about some code we intend to write soon
and realised there's a missing function in the mutex & rwsem API.
We're intending to use it for an rwsem, but I think it applies equally
to mutexes.

The customer has a low-priority task which wants to read /proc/pid/smaps
of a higher-priority task.  Today, everything is awful; smaps acquires
mmap_sem for read, is preempted, then the high-pri task calls mmap()
and its down_write(mmap_sem) blocks behind the low-pri reader.  Then all
the other threads in the high-pri task block on mmap_sem as they take
page faults, because new readers queue behind the waiting writer so that
writers don't starve.

The approach we're looking at is to allow RCU lookup of VMAs, and then
take a per-VMA rwsem for read.  Because we're under RCU protection,
that looks a bit like this:

	rcu_read_lock();
	vma = vma_lookup();
	if (down_read_trylock(&vma->sem)) {
		/* Fast path: got the per-VMA lock under RCU alone. */
		rcu_read_unlock();
	} else {
		/* Slow path: fall back to the mmap read lock, relookup
		 * the VMA (it may have changed), take the per-VMA lock,
		 * then drop mmap_sem. */
		rcu_read_unlock();
		down_read(&mm->mmap_sem);
		vma = vma_lookup();
		down_read(&vma->sem);
		up_read(&mm->mmap_sem);
	}

(for clarity, I've skipped the !vma checks; don't take this too literally)

So this is Good.  For the vast majority of cases, we avoid taking the
mmap read lock and the problem will appear much less often.  But we can
do Better with a new API.  You see, for this case, we don't actually
want to acquire the mmap_sem; we're happy to spin a bit, but there's no
point in spinning waiting for the writer to finish when we can sleep.
I'd like to write this code:

again:
	rcu_read_lock();
	vma = vma_lookup();
	if (down_read_trylock(&vma->sem)) {
		rcu_read_unlock();
	} else {
		rcu_read_unlock();
		/* Sleep until the writer releases mmap_sem, without
		 * ever acquiring it ourselves, then retry. */
		rwsem_wait_read(&mm->mmap_sem);
		goto again;
	}

That is, rwsem_wait_read() puts the thread on the rwsem's wait queue,
and wakes it once the lock becomes available for read, without ever
granting it the lock.  Now this thread can never block any thread that
tries to acquire mmap_sem for write.

Similarly, it may make sense to add rwsem_wait_write() and mutex_wait().
Perhaps also mutex_wait_killable() and mutex_wait_interruptible()
(the combinatorial explosion is a bit messy; I don't know that it makes
sense to do the _nested or _io variants).
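
As a sketch of the surface, the signatures might look like this,
mirroring the existing down_read()/mutex_lock() conventions (all of
these are guesses, not settled API):

	/* Sleep until the lock is (momentarily) available for the
	 * given kind of acquisition, then return without holding it. */
	void rwsem_wait_read(struct rw_semaphore *sem);
	void rwsem_wait_write(struct rw_semaphore *sem);
	void mutex_wait(struct mutex *lock);
	int mutex_wait_killable(struct mutex *lock);
	int mutex_wait_interruptible(struct mutex *lock);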

Does any of this make sense?
