Message-ID: <BANLkTikt88KnxTy8TuGGVrBVnXvsnL7nMQ@mail.gmail.com>
Date: Wed, 15 Jun 2011 12:11:19 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...e.hu>,
Paul McKenney <paulmck@...ux.vnet.ibm.com>
Cc: Tim Chen <tim.c.chen@...ux.intel.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Hugh Dickins <hughd@...gle.com>,
KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
David Miller <davem@...emloft.net>,
Martin Schwidefsky <schwidefsky@...ibm.com>,
Russell King <rmk@....linux.org.uk>,
Paul Mundt <lethal@...ux-sh.org>,
Jeff Dike <jdike@...toit.com>,
Richard Weinberger <richard@....at>,
Tony Luck <tony.luck@...el.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
Mel Gorman <mel@....ul.ie>, Nick Piggin <npiggin@...nel.dk>,
Namhyung Kim <namhyung@...il.com>, ak@...ux.intel.com,
shaohua.li@...el.com, alex.shi@...el.com,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
"Rafael J. Wysocki" <rjw@...k.pl>
Subject: Re: REGRESSION: Performance regressions from switching anon_vma->lock
to mutex
On Wed, Jun 15, 2011 at 3:58 AM, Peter Zijlstra <peterz@...radead.org> wrote:
>
> The first thing that stood out when running it was:
>
> 31694 root 20 0 26660 1460 1212 S 17.5 0.0 0:01.97 exim
> 7 root -2 19 0 0 0 S 12.7 0.0 0:06.14 rcuc0
...
>
> Which is an impressive amount of RCU usage..
Gaah. Can we just revert that crazy "use threads for RCU" thing already?
It's wrong. It's clearly expensive. It's using threads FOR NO GOOD
REASON, since the only reason for using them is config options that
nobody even uses, for chrissake!
And it results in real problems. For example, if you use "perf record"
to see what the hell is up, the use of kernel threads for RCU
callbacks means that the RCU cost is never even seen. I don't know how
Tim did his profiling to figure out the costs, and I don't know how he
decided that the spinlock to semaphore conversion was the culprit, but
it is entirely possible that Tim didn't actually bisect the problem,
but instead used "perf record" on the exim task, saw that the
semaphore costs had gone up, and decided that it must be the
conversion.
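The profiling approach described above can be sketched as follows; this is a hedged illustration, not Tim's actual workflow. The PID is taken from Peter's top output earlier in the thread, and the 10-second sampling window is an assumption. The key point it demonstrates: sampling a single task with `perf record -p` attributes softirq work run in that task's context to it, but never sees work done in separate kernel threads like rcuc0.

```shell
# Sketch of per-task profiling with perf; the PID and duration are
# illustrative.  Work offloaded to the rcuc/N kernel threads will not
# appear in this profile, because only the exim task is being sampled.
PID=31694                                  # exim PID from Peter's top output
if command -v perf >/dev/null 2>&1 && kill -0 "$PID" 2>/dev/null; then
    perf record -g -p "$PID" -- sleep 10   # sample call graphs for 10s
    perf report --sort comm,symbol         # see where the cycles went
else
    echo "perf or target process unavailable; nothing sampled"
fi
```

Profiling system-wide instead (`perf record -g -a`) would catch the rcuc threads' cost, but would no longer tie it back to the workload that triggered it.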
And sure, maybe 50% of it was the conversion, and maybe 50% of it was
the RCU changes - and "perf record" just never showed the RCU component.
We already know that it causes huge slowdowns on some other loads. We
just don't know.
So using anonymous kernel threads is actually a real downside. It
makes it much less obvious what is going on. We saw that exact same
thing with the generic worker thread conversions: things that used to
have clear performance issues ("oh, the iwl-phy0 thread is using 3% of
CPU time because it is polling for IO, and I can see that in 'top'")
turned into much-harder-to-see issues ("oh, kwork0 is using 3% CPU
time according to 'top' - I have no idea why").
Now, with RCU using softirqs, clearly the costs of RCU can sometimes
be mis-attributed because it turns out that the softirq is run from
some other thread. But statistically, if you end up having a heavy
softirq load, it _usually_ ends up being triggered in the context of
whoever causes that load. Not always, and not reliably, but I suspect
it ends up being easier to see.
And quite frankly, just look at commit a26ac2455ffc: it sure as hell
isn't making anything simpler. It adds several hundred lines of code,
and it's already been implicated in one major performance regression,
and is a possible reason for this one.
So Ingo, Paul: can we *please* just revert it, and agree that if you
want to re-instate it, the code should be
(a) only done for the case where it matters (ie for the RCUBOOST case)
(b) tested better for performance issues (and maybe shared with the
tinyrcu case that also uses threads?)
Please? It's more than a revert of that one commit - there are tons of
commits on top of that to actually do the boosting etc (and fixing
some of the fallout). But really, by now I'd prefer to just revert it
all rather than see if it can be fixed up. According to Peter,
Shaohua Li's patch that largely fixes the performance issue for the
other load (by moving *some* of the RCU stuff back to softirq context)
helps, but still leaves the rcu threads with a lot of CPU time.
Considering that most of the RCU callbacks are not very CPU intensive,
I bet that there's a *ton* of them, and that the context switch
overhead is quite noticeable. And quite frankly, even if Shaohua Li's
patch largely fixes the performance issue, it does so by making the
RCU situation EVEN MORE COMPLEX, with RCU now using *both* threads and
softirq.
That's just crazy. If you really want to do both the threads and
softirq thing for the magical RCU_BOOST case, go ahead, but please
don't do crazy things for the sane configurations.
Linus