linux-kernel - Re: [PATCH UPDATED] percpu: use dynamic percpu allocator as the default percpu allocator

Message-ID: <20090401190113.GA734@elte.hu>
Date:	Wed, 1 Apr 2009 21:01:13 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Christoph Lameter <cl@...ux.com>
Cc:	Tejun Heo <tj@...nel.org>,
	Martin Schwidefsky <schwidefsky@...ibm.com>,
	rusty@...tcorp.com.au, tglx@...utronix.de, x86@...nel.org,
	linux-kernel@...r.kernel.org, hpa@...or.com,
	Paul Mundt <lethal@...ux-sh.org>, rmk@....linux.org.uk,
	starvik@...s.com, ralf@...ux-mips.org, davem@...emloft.net,
	cooloney@...nel.org, kyle@...artin.ca, matthew@....cx,
	grundler@...isc-linux.org, takata@...ux-m32r.org,
	benh@...nel.crashing.org, rth@...ddle.net,
	ink@...assic.park.msu.ru, heiko.carstens@...ibm.com,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Nick Piggin <npiggin@...e.de>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [PATCH UPDATED] percpu: use dynamic percpu allocator as the
	default percpu allocator

* Christoph Lameter <cl@...ux.com> wrote:

> __read_mostly should be packed as tightly as possible to increase 
> the chance that one cacheline includes several of the critical 
> variables for the hot code paths. Too much __read_mostly defeats 
> its purpose.

That stance is commonly held, but it is quite wrong and harmful IMHO.

It stifles the proper identification of read-mostly variables _AND_ 
it hurts the proper identification of critical write-often variables 
as well. Not good.

The solution for critical write-often variables is what we have always 
used: to identify them explicitly and to place them consciously into 
separate cachelines. (Or to per-cpu-ify or object-ify them where 
possible/sensible.)
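
A minimal userspace sketch of the two techniques just named, assuming 64-byte cachelines and a hypothetical fixed CPU count. DEMO_CACHELINE, DEMO_NR_CPUS and the demo_* names are illustrative only, not kernel APIs; in-kernel code would use ____cacheline_aligned_in_smp and DEFINE_PER_CPU instead of hand-rolled alignment.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* offsetof */

/* Assumptions for this sketch, not real kernel values. */
#define DEMO_CACHELINE 64
#define DEMO_NR_CPUS   4

/* Two write-often counters, each consciously placed on its own cacheline. */
struct demo_hot_stats {
	_Alignas(DEMO_CACHELINE) unsigned long rx_packets;
	_Alignas(DEMO_CACHELINE) unsigned long tx_packets;
};

/* The separation is written down, so the compiler keeps checking it. */
static_assert(offsetof(struct demo_hot_stats, tx_packets) -
	      offsetof(struct demo_hot_stats, rx_packets) >= DEMO_CACHELINE,
	      "rx_packets and tx_packets must not share a cacheline");

/* Per-cpu-ified variant: one cacheline-sized slot per CPU. */
struct demo_pcpu_slot {
	_Alignas(DEMO_CACHELINE) unsigned long count;
};

static struct demo_pcpu_slot demo_counter[DEMO_NR_CPUS];

/*
 * Each CPU writes only its own slot, so writers never share a line.
 * Assumption: the caller passes the id of the CPU it is running on and
 * nothing else writes that slot concurrently.
 */
static void demo_counter_inc(int cpu)
{
	demo_counter[cpu].count++;
}

/* Readers pay the cost instead: they sum every slot. */
static unsigned long demo_counter_sum(void)
{
	unsigned long sum = 0;

	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		sum += demo_counter[cpu].count;
	return sum;
}
```

The trade-off is deliberate: writes stay CPU-local and cheap, while the rare read walks all slots.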

Then annotate everything that is read-mostly and frequently accessed 
with the __read_mostly attribute.

The rest (unannotated variables) should be assumed to be "access-rarely" 
or "we-dont-care" by default. This is actually 95% of the global 
variables.
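
The classification above might look roughly like this in a userspace sketch. DEMO_READ_MOSTLY is a hypothetical stand-in for __read_mostly, which on architectures that implement it is a section attribute that collects such variables together. The section name demo_read_mostly is an assumption; the __start_/__stop_ boundary symbols are the ones GNU ld and LLD generate for sections whose names are valid C identifiers, so this is ELF-specific (GCC or Clang on Linux). Write-often variables, the third class, would be separated explicitly as described earlier.

```c
#include <assert.h>   /* assert */
#include <stdint.h>   /* uintptr_t */

/* Hypothetical __read_mostly stand-in: group annotated variables in one section. */
#define DEMO_READ_MOSTLY __attribute__((__section__("demo_read_mostly")))

/* Boundary symbols provided by the linker for the section above. */
extern char __start_demo_read_mostly[];
extern char __stop_demo_read_mostly[];

/* Read-mostly and frequently accessed: annotated. */
int demo_feature_enabled DEMO_READ_MOSTLY = 1;
unsigned int demo_hash_shift DEMO_READ_MOSTLY = 10;

/* Rarely accessed or "we-dont-care": left unannotated, the default. */
int demo_boot_option = 0;

/* Returns nonzero if p lies inside the read-mostly section. */
static int demo_in_read_mostly(const void *p)
{
	uintptr_t a = (uintptr_t)p;

	return a >= (uintptr_t)__start_demo_read_mostly &&
	       a <  (uintptr_t)__stop_demo_read_mostly;
}

/* Rare write path: read-mostly does not mean read-only. */
static void demo_set_hash_shift(unsigned int shift)
{
	assert(shift < 32);
	demo_hash_shift = shift;
}
```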

Yes, a growing number of annotations puts increasing pressure on 
the places that are frequently accessed but not properly annotated - 
but we should be happy about that: it creates the dynamics and 
pressure for them to be properly annotated.

On the other hand, depending on the "put enough data bloat between 
critical variables anyway, no need to care about alignment" scheme is a 
sloppy, fragile concept that does not lead to a reliable and 
dependable end result.

It has two problems:

 - Thinking that this solves false cacheline sharing reliably is 
   wrong: there's nothing that guarantees and enforces that slapping 
   a few variables between two critical variables puts them on 
   separate cachelines:

      - Ongoing changes in code can bit-rot the 
        thought-to-be-large-enough distance between two critical 
        variables - and there's no mechanism in place to notice it. 
        Explicitly cacheline aligning them will preserve the 
        information long-term.

      - There are architectures with larger cacheline sizes than 
        the one you are developing on.

      - .config variations can move variables closer or farther 
        apart from each other, hiding/triggering the false cacheline 
        sharing problem.

   It is not a maintainable concept IMHO and we should not pretend 
   it is.

 - It actually prevents true read-mostly variables from being
   annotated properly. (In such a case a true read-mostly variable
   that shares a cacheline with a frequently-written variable bounces
   around with that cacheline, which is almost as bad in terms of MESI
   latencies and costs as false cacheline sharing between two
   write-mostly variables.)
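
The first problem above (bit-rot, larger cachelines, .config variations) can be made concrete with two layouts of the same pair of hot fields. This is a sketch under stated assumptions: CONFIG_DEMO_DEBUG is a hypothetical .config option, 64 bytes is taken as the line size on the developer's machine, and 128 bytes as the largest line size the layout must tolerate. In-kernel code would get the same effect from ____cacheline_aligned_in_smp plus BUILD_BUG_ON().

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>   /* uint64_t */

#define DEMO_DEV_CACHELINE 64    /* assumed line size on the developer's box */
#define DEMO_MAX_CACHELINE 128   /* assumed largest line size to tolerate */

/* Fragile: separation depends on whatever happens to sit in between. */
struct demo_fragile {
	uint64_t hot_a;              /* written often by one CPU */
	char filler[40];             /* "surely that's far enough" */
#ifdef CONFIG_DEMO_DEBUG         /* hypothetical .config option */
	char debug_info[32];
#endif
	uint64_t hot_b;              /* written often by another CPU */
};

/* Robust: separation is explicit, independent of the fields in between. */
struct demo_robust {
	_Alignas(DEMO_MAX_CACHELINE) uint64_t hot_a;
	char filler[40];
#ifdef CONFIG_DEMO_DEBUG
	char debug_info[32];
#endif
	_Alignas(DEMO_MAX_CACHELINE) uint64_t hot_b;
};

/* Build-time check: future edits that break the separation fail to compile. */
static_assert(offsetof(struct demo_robust, hot_b) -
	      offsetof(struct demo_robust, hot_a) >= DEMO_MAX_CACHELINE,
	      "hot_a and hot_b must live on different cachelines");

/*
 * Index of the cacheline a field offset falls into, assuming the struct
 * itself starts on a line boundary. That holds for demo_robust by
 * construction; nothing guarantees it for demo_fragile.
 */
static size_t demo_line_of(size_t offset, size_t line_size)
{
	return offset / line_size;
}
```

With CONFIG_DEMO_DEBUG undefined, demo_fragile puts hot_b at offset 48, inside the same 64-byte line as hot_a; defining it moves hot_b to offset 80 and hides the problem. demo_robust keeps hot_b at offset 128 in both configurations.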

Architecting the layout of variables in a knowingly random and 
.config-sensitive way is simply not good design and we should not 
pretend it is.

We might not be able to solve the problem if not enough people care 
about their variables, but we should at least not be proud of a 
non-solution ;-)

	Ingo