linux-kernel - Re: [PATCH 0/4] Fix ebizzy performance regression due to X86 TLB range flush v2


Message-ID: <20131217175441.GI11295@suse.de>
Date:	Tue, 17 Dec 2013 17:54:41 +0000
From:	Mel Gorman <mgorman@...e.de>
To:	Ingo Molnar <mingo@...nel.org>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Alex Shi <alex.shi@...aro.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Fengguang Wu <fengguang.wu@...el.com>,
	H Peter Anvin <hpa@...or.com>, Linux-X86 <x86@...nel.org>,
	Linux-MM <linux-mm@...ck.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [PATCH 0/4] Fix ebizzy performance regression due to X86 TLB
 range flush v2

On Tue, Dec 17, 2013 at 03:42:14PM +0100, Ingo Molnar wrote:
> 
> * Mel Gorman <mgorman@...e.de> wrote:
> 
> > [...]
> >
> > At that point it'll be time to look at profiles and see where we are 
> > actually spending time because the possibilities of finding things 
> > to fix through bisection will be exhausted.
> 
> Yeah.
> 
> One (heavy handed but effective) trick that can be used in such a 
> situation is to just revert everything that is causing problems, and 
> continue reverting until we get back to a v3.4 baseline performance.
> 

Very tempted, but the potential timeframe here is very large and the number
of patches could be considerable. Some patches cause a lot of noise. For
example, one patch enabled ACPI cpufreq driver loading, which looks like
a regression during that window, but it's a side-effect that gets fixed
later. It'll take time to identify all the patches that potentially cause
problems.

> Once such a 'clean' tree (or queue of patches) is achieved, that can be 
> used as a measurement base and the individual features can be 
> re-applied again, one by one, with measurement and analysis becoming a 
> lot easier.
> 

Ordinarily I would agree with you, but I would prefer a shorter window for
that type of strategy.

> > > Also it appears the Ebizzy numbers ought to be stable enough now 
> > > to make the range-TLB-flush measurements more precise?
> > 
> > Right now, the tlbflush microbenchmark figures look awful on the 
> > 8-core machine when the tlbflush shift patch and the schedule domain 
> > fix are both applied.
> 
> I think that further strengthens the case for the 'clean base' approach 
> I outlined above - but it's your call obviously ...
> 

I'll keep it as plan B if it cannot be fixed with a direct approach.

> Thanks again for going through all this. Tracking multi-commit 
> performance regressions across 1.5 years' worth of commits is generally 
> very hard. Does your testing effort come from enterprise Linux QA 
> testing, or did you run into this problem accidentally?
> 

It does not come from enterprise Linux QA testing but it's motivated by
it. I want to catch as many "obvious" performance bugs as possible before
they do, as it saves time and stress in the long run. To assist that, I set
up continual performance regression testing, and ebizzy was included in the
first report I opened. It makes me worry about what the rest of the reports
contain.

-- 
Mel Gorman
SUSE Labs