Date:   Wed, 4 May 2022 21:11:49 -0400
From:   Waiman Long <longman@...hat.com>
To:     Matthew Wilcox <willy@...radead.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>
Cc:     "Paul E. McKenney" <paulmck@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        "Liam R. Howlett" <Liam.Howlett@...cle.com>,
        linux-kernel@...r.kernel.org
Subject: Re: Wait for mutex to become unlocked

On 5/4/22 17:44, Matthew Wilcox wrote:
> Paul, Liam and I were talking about some code we intend to write soon
> and realised there's a missing function in the mutex & rwsem API.
> We're intending to use it for an rwsem, but I think it applies equally
> to mutexes.
>
> The customer has a low priority task which wants to read /proc/pid/smaps
> of a higher priority task.  Today, everything is awful; smaps acquires
> mmap_sem read-only, is preempted, then the high-pri task calls mmap()
> and the down_write(mmap_sem) blocks on the low-pri task.  Then all the
> other threads in the high-pri task block on the mmap_sem as they take
> page faults because we don't want writers to starve.
>
> The approach we're looking at is to allow RCU lookup of VMAs, and then
> take a per-VMA rwsem for read.  Because we're under RCU protection,
> that looks a bit like this:
>
> 	rcu_read_lock();
> 	vma = vma_lookup();
> 	if (down_read_trylock(&vma->sem)) {
> 		rcu_read_unlock();
> 	} else {
> 		rcu_read_unlock();
> 		down_read(&mm->mmap_sem);
> 		vma = vma_lookup();
> 		down_read(&vma->sem);
> 		up_read(&mm->mmap_sem);
> 	}
>
> (for clarity, I've skipped the !vma checks; don't take this too literally)
>
> So this is Good.  For the vast majority of cases, we avoid taking the
> mmap read lock and the problem will appear much less often.  But we can
> do Better with a new API.  You see, for this case, we don't actually
> want to acquire the mmap_sem; we're happy to spin a bit, but there's no
> point in spinning waiting for the writer to finish when we can sleep.
> I'd like to write this code:
>
> again:
> 	rcu_read_lock();
> 	vma = vma_lookup();
> 	if (down_read_trylock(&vma->sem)) {
> 		rcu_read_unlock();
> 	} else {
> 		rcu_read_unlock();
> 		rwsem_wait_read(&mm->mmap_sem);
> 		goto again;
> 	}
>
> That is, rwsem_wait_read() puts the thread on the rwsem's wait queue,
> and wakes it up without giving it the lock.  Now this thread will never
> be able to block any thread that tries to acquire mmap_sem for write.

I suppose a writer that needs to take a write lock on vma->sem will 
have to take a write lock on mmap_sem first. In that case it makes 
sense that you wait for all the vma->sem writers to finish by waiting 
on the mmap_sem wait queue. By the time the waiting task is woken up, 
there is no active write lock on vma->sem, and hopefully by the time 
it runs and does a down_read_trylock(), the trylock will succeed. 
However, in the gap during wakeup another writer may come in and take 
the vma->sem write lock. So this improves the chance of a successful 
trylock, but it is not guaranteed. You will need a retry count and 
fall back to a direct down_read() when there are too many retries.

Since the waiting process isn't taking any lock, the name 
rwsem_wait_read() may be somewhat misleading. A better name may be 
rwsem_flush_waiters(). Also, do you want to flush the waiters at the 
point this API is called, or do you want to wait until the wait queue 
is empty?

Cheers,
Longman
