linux-kernel - Re: [PATCH 0/3] TLB flush multiple pages per IPI v5

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20150611152559.GA15509@gmail.com>
Date:	Thu, 11 Jun 2015 17:26:00 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	Mel Gorman <mgorman@...e.de>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Rik van Riel <riel@...hat.com>,
	Hugh Dickins <hughd@...gle.com>,
	Minchan Kim <minchan@...nel.org>,
	Dave Hansen <dave.hansen@...el.com>,
	Andi Kleen <andi@...stfloor.org>,
	H Peter Anvin <hpa@...or.com>, Linux-MM <linux-mm@...ck.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH 0/3] TLB flush multiple pages per IPI v5


* Mel Gorman <mgorman@...e.de> wrote:

> > > I made a really clear and unambiguous chain of arguments:
> > > 
> > >  - I'm unconvinced about the benefits of INVLPG in general, and your patches adds
> > >    a whole new bunch of them. [...]
> > 
> > ... and note that your claim that 'we were doing them before, this is just an 
> > equivalent transformation' is utter bullsh*t technically: what we were doing 
> > previously was a hideously expensive IPI combined with an INVLPG.
> 
> And replacing it with an INVLPG without excessive IPI transmission is changing 
> one major variable. Going straight to a full TLB flush is changing two major 
> variables. [...]

But this argument employs the fallacy that the transition to 'batching with PFN 
tracking' is not a major variable in itself. In reality it is a major variable: it 
adds extra complexity, such as the cross CPU data flow (the pfn tracking), and it 
also changes the distribution of the flushes and related caching patterns.

> > [...]
> > 
> > The batching limit (which you set to 32) should then be tuned by comparing it 
> > to a working full-flushing batching logic, not by comparing it to the previous 
> > single IPI per single flush approach!
> 
> We can decrease it easily but increasing it means we also have to change 
> SWAP_CLUSTER_MAX because otherwise enough pages are not unmapped for flushes and 
> it is a requirement that we flush before freeing the pages. That changes another 
> complex variable because at the very least, it alters LRU lock hold times.

('should then be' implied 'as a separate patch/series', obviously.)

I.e. all I wanted to observe is that I think the series did not explore the 
performance impact of the batching limit, because it was too focused on the INVLPG 
approach which has an internal API limit of 33.

Now that the TLB flushing side is essentially limit-less, a future enhancement 
would be to further increase batching.

My suspicion is that say doubling SWAP_CLUSTER_MAX would possibly further reduce 
the IPI rate, at a negligible cost.

But this observation does not impact the current series in any case.

> > ... and if the benefits of a complex algorithm are not measurable and if there 
> > are doubts about the cost/benefit tradeoff then frankly it should not exist in 
> > the kernel in the first place. It's not like the Linux TLB flushing code is 
> > too boring due to overwhelming simplicity.
> > 
> > and yes, it's my job as a maintainer to request measurements justifying 
> > complexity and your ad hominem attacks against me are disgusting - you should 
> > know better.
> 
> It was not intended as an ad hominem attack and my apologies for that. I wanted 
> to express my frustration that a series that adjusted one variable with known 
> benefit will be rejected for a series that adjusts two major variables instead 
> with the second variable being very sensitive to workload and CPU.

... but that's not what it did: it adjusted multiple complex variables already, 
with a questionable rationale for more complexity.

And my argument was and continues to be: start with the simplest variant and 
iterate from there. Which you seem to have adapted in your latest series, so my 
concerns are addressed:

Acked-by: Ingo Molnar <mingo@...nel.org>

Thanks,

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/