linux-kernel - Re: [PATCH] rwsem: reduce spinlock contention in wakeup code path

Message-ID: <20130928192123.GA8228@gmail.com>
Date:	Sat, 28 Sep 2013 21:21:23 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Waiman Long <Waiman.Long@...com>, Ingo Molnar <mingo@...e.hu>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Rik van Riel <riel@...hat.com>,
	Peter Hurley <peter@...leysoftware.com>,
	Davidlohr Bueso <davidlohr.bueso@...com>,
	Alex Shi <alex.shi@...el.com>,
	Tim Chen <tim.c.chen@...ux.intel.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Matthew R Wilcox <matthew.r.wilcox@...el.com>,
	Dave Hansen <dave.hansen@...el.com>,
	Michel Lespinasse <walken@...gle.com>,
	Andi Kleen <andi@...stfloor.org>,
	"Chandramouleeswaran, Aswin" <aswin@...com>,
	"Norton, Scott J" <scott.norton@...com>
Subject: Re: [PATCH] rwsem: reduce spinlock contention in wakeup code path


* Linus Torvalds <torvalds@...ux-foundation.org> wrote:

> On Sat, Sep 28, 2013 at 12:41 AM, Ingo Molnar <mingo@...nel.org> wrote:
> >
> >
> > Yeah, I fully agree. The reason I'm still very sympathetic to Tim's
> > efforts is that they address a regression caused by a mechanical
> > mutex->rwsem conversion:
> >
> >   5a505085f043 mm/rmap: Convert the struct anon_vma::mutex to an rwsem
> >
> > ... and Tim's patches turn that regression into an actual speedup.
> 
> Btw, I really hate that thing. I think we should turn it back into a 
> spinlock. None of what it protects needs a mutex or an rwsem.
> 
> Because you guys talk about the regression of turning it into a rwsem, 
> but nobody talks about the *original* regression.
> 
> And it *used* to be a spinlock, and it was changed into a mutex back in 
> 2011 by commit 2b575eb64f7a. That commit doesn't even have a reason 
> listed for it, although my dim memory of it is that the reason was 
> preemption latency.

Yeah, I think it was latency.

> And that caused big regressions too.
> 
> Of course, since then, we may well have screwed things up and now we 
> sleep under it, but I still really think it was a mistake to do it in 
> the first place.
> 
> So if the primary reason for this is really just that f*cking anon_vma 
> lock, then I would seriously suggest:
> 
>  - turn it back into a spinlock (or rwlock_t, since we subsequently
>    separated the read and write paths)
> 
>  - fix up any breakage (i.e. new scheduling points) that exposes
> 
>  - look at possible other approaches wrt latency on that thing.
> 
> Hmm?

If we do that then I suspect the next step will be queued rwlocks :-/ The 
current rwlock_t implementation is rather primitive by modern standards. 
(We'd probably have killed rwlock_t long ago if not for the 
tasklist_lock.)
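To make "primitive" concrete, here is a minimal userspace sketch of a simple counter-based spinning rwlock written with C11 atomics. It is an assumption-laden illustration, not the kernel's rwlock_t code (which lives in per-architecture code); the type and function names (simple_rwlock_t, simple_read_lock() and so on) are hypothetical. It shows the property that makes such locks unattractive under contention: there is no queue, so new readers can keep slipping in while a writer spins.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace sketch of a counter-based spinning rwlock.
 * NOT the kernel's rwlock_t implementation; it only shows the shape:
 *   state == 0   lock is free
 *   state  > 0   number of readers holding it
 *   state == -1  held by a single writer
 * There is no queue and no fairness between readers and writers.
 */
typedef struct {
	atomic_int state;
} simple_rwlock_t;

#define SIMPLE_RWLOCK_INIT { 0 }

static inline bool simple_read_trylock(simple_rwlock_t *l)
{
	int old = atomic_load_explicit(&l->state, memory_order_relaxed);

	/* A new reader gets in whenever no writer holds the lock, even while
	 * a writer is spinning, so a steady stream of readers can keep a
	 * writer waiting indefinitely. */
	while (old >= 0) {
		if (atomic_compare_exchange_weak_explicit(&l->state, &old, old + 1,
				memory_order_acquire, memory_order_relaxed))
			return true;
		/* CAS failed: old now holds the fresh value; retry unless a
		 * writer took the lock in the meantime. */
	}
	return false;
}

static inline void simple_read_lock(simple_rwlock_t *l)
{
	while (!simple_read_trylock(l))
		; /* busy-wait: a spinning lock never sleeps */
}

static inline void simple_read_unlock(simple_rwlock_t *l)
{
	int prev = atomic_fetch_sub_explicit(&l->state, 1, memory_order_release);
	assert(prev > 0); /* must have been read-held */
	(void)prev;
}

static inline bool simple_write_trylock(simple_rwlock_t *l)
{
	int expected = 0;

	/* A writer only succeeds when there are no readers and no writer. */
	return atomic_compare_exchange_strong_explicit(&l->state, &expected, -1,
			memory_order_acquire, memory_order_relaxed);
}

static inline void simple_write_lock(simple_rwlock_t *l)
{
	while (!simple_write_trylock(l))
		; /* busy-wait */
}

static inline void simple_write_unlock(simple_rwlock_t *l)
{
	int prev = atomic_exchange_explicit(&l->state, 0, memory_order_release);
	assert(prev == -1); /* must have been write-held */
	(void)prev;
}
```

A queued design would instead make late-arriving readers wait behind an already-waiting writer, trading some reader parallelism for bounded writer latency.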

But yeah, it would work, and conceptually a hard spinlock fits something as 
low-level as the anon-vma lock.

I did a quick review pass and it appears nothing obvious is scheduling 
with the anon-vma lock held. If it did in a non-obvious way it's likely a 
bug anyway. The hugepage code grew a lot of logic running under the 
anon-vma lock, but it all seems atomic.
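A review like this can be backed by a runtime check. The kernel has might_sleep() annotations, enabled by CONFIG_DEBUG_ATOMIC_SLEEP, for catching sleeps in atomic context. The sketch below is only a hypothetical userspace analogue of that idea: a per-thread count of held spinlocks plus a check placed at the top of anything that may block. All names (toy_enter_atomic(), toy_might_sleep() and so on) are invented for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical per-thread count of spinlocks currently held, loosely
 * modeled on the idea behind the kernel's preempt count. Not kernel code.
 */
static _Thread_local int toy_atomic_depth;

static inline void toy_enter_atomic(void)   /* call after taking a spinlock */
{
	toy_atomic_depth++;
}

static inline void toy_exit_atomic(void)    /* call before releasing it */
{
	assert(toy_atomic_depth > 0);            /* catch unbalanced calls */
	toy_atomic_depth--;
}

/*
 * Hypothetical might_sleep()-style check: put it at the top of any function
 * that can block. It logs and returns false when called while a spinlock is
 * held, i.e. exactly the breakage a conversion back to a spinning lock
 * would expose.
 */
static inline bool toy_might_sleep(const char *where)
{
	if (toy_atomic_depth > 0) {
		fprintf(stderr, "toy: possible sleep in atomic context at %s\n",
			where);
		return false;
	}
	return true;
}
```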

So a conversion to rwlock_t could be attempted. (It should be a relatively 
easy patch as well, because the locking operation is now nicely abstracted 
out.)
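Why the abstraction makes the conversion cheap can be sketched in userspace. The helper names below are modeled on the kernel's anon_vma_lock_write()/anon_vma_unlock_write() helpers of that era, which wrapped the root anon_vma's rwsem; everything else (toy_spinlock_t, the cut-down struct, protected_count) is a hypothetical stand-in. Because callers only go through the helpers, swapping the primitive (here to a plain test-and-set spinlock, Linus's first option) touches nothing else.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for spinlock_t: a test-and-set lock. */
typedef struct {
	atomic_flag held;
} toy_spinlock_t;

static inline void toy_spin_init(toy_spinlock_t *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_relaxed);
}

static inline void toy_spin_lock(toy_spinlock_t *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		; /* busy-wait: a spinlock never sleeps */
}

static inline void toy_spin_unlock(toy_spinlock_t *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Cut-down, hypothetical stand-in for struct anon_vma. */
struct anon_vma {
	struct anon_vma *root;  /* the root anon_vma points at itself */
	toy_spinlock_t lock;    /* only the root's lock is ever taken */
	int protected_count;    /* hypothetical state guarded by the root lock */
};

/* Callers use only these helpers, so changing the primitive stays local. */
static inline void anon_vma_lock_write(struct anon_vma *av)
{
	assert(av->root != NULL);
	toy_spin_lock(&av->root->lock);
}

static inline void anon_vma_unlock_write(struct anon_vma *av)
{
	toy_spin_unlock(&av->root->lock);
}

/* With a plain spinlock the read side is exclusive too; an rwlock_t
 * would let readers share the lock again. */
static inline void anon_vma_lock_read(struct anon_vma *av)
{
	assert(av->root != NULL);
	toy_spin_lock(&av->root->lock);
}

static inline void anon_vma_unlock_read(struct anon_vma *av)
{
	toy_spin_unlock(&av->root->lock);
}
```

Moving to rwlock_t instead would change only the four helper bodies (and the lock field), restoring shared readers.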

Thanks,

	Ingo