Message-ID: <CAK-9PRCb1jJdejvim48LP7GN6jHKDdkKVx-UY1Gpsbx1ROyQvQ@mail.gmail.com>
Date:	Mon, 2 Jul 2012 21:30:04 +0530
From:	Chinmay V S <cvs268@...il.com>
To:	Vlad Zolotarov <vlad@...lemp.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>, viro@...iv.linux.org.uk,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	Shai@...lemp.com
Subject: Re: [PATCH RESEND] fs: Move bh_cachep to the __read_mostly section

>On Sunday 01 July 2012 17:24:34 Chinmay V S wrote:
>> Played around with __read_mostly some time back in an attempt to optimise
>> cache usage.
>> thecodeartist.blogspot.com/2011/12/why-readmostly-does-not-work-as-it.html
>
>Nice article!

Thank you! :)

>U see, if we assume that __read_mostly the way it's currently implemented is a
>bad thing (due to the fact that it implicitly causes the bunching of the
>write-mostly variables) then this means that "const" the way it's currently
>implemented is a bad thing too due to the same reasons. ;)

A single instance of __read_mostly may NOT degrade performance.
What *will* degrade performance is "excessive" use of __read_mostly.
There is an interesting discussion along similar lines here[2].

>This is an interesting idea however there is (at least) one weakness in it -
>it assumes that linker's heuristics (those that will pack const and non-const
>variables together in a single cache line) will do a better job than a person
>that writes a specific code section and knows the exact nature and the
>relationship between variables in his/her C code.

True.

>First of all, let me note that saying that C code performance may benefit from
>NOT using the __read_mostly variables is a bit misleading because here u rely
>on something that is not deterministic: a linker decision to pack one (often
>written) variable together with another (read mostly) variable in a single
>cache line boundaries (thus improving the performance); and this decision may
>change due to a minor code change and u will not know about it.

I totally agree that avoiding use of __read_mostly does NOT guarantee any
performance boost. The point I am trying to make is this:

1. Consider code with NO instances of __read_mostly.

2. Now we go ahead and add __read_mostly to an object.
Note that we are NOT guaranteed that this object is "hot" i.e. accessed
frequently. All that __read_mostly signifies is that the object is rarely
written to i.e. most of the time it is accessed, it is a read operation.

Cost-Benefit analysis:
Currently each CPU keeps its own copy of the __read_mostly variable in its
per-CPU L1 cache (any benefits on non-SMP systems?). As the variable is rarely
written to, we rarely need to sync it across multiple L1 caches, i.e.
cacheline-bouncing is very rare.
So the cost is very low.

As the variable stays in L1 cache, rather than being shared across
multiple CPUs in L2 or L3 cache, access can be about an order of magnitude
faster. Hence the benefit is very high.
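
For readers who have not looked at how the annotation works: it is only a
section attribute. The following is a minimal user-space sketch, not the kernel
header itself. The macro name my_read_mostly, the variables and the helper
functions are hypothetical; the section name is the one mentioned in this mail,
and GCC or Clang on an ELF target is assumed.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel annotation: ask the compiler to
 * place the object in a dedicated section instead of .data/.bss. */
#define my_read_mostly __attribute__((__section__(".data.read_mostly")))

/* Set rarely (e.g. at start-up), read on every lookup. */
static int max_buffers my_read_mostly = 64;

/* Written on every lookup; stays in the default .bss section. */
static unsigned long lookup_count;

/* Rare write path for the read-mostly object. */
void set_max_buffers(int n)
{
    assert(n > 0);
    max_buffers = n;
}

/* Hot path: one write to an ordinary object, one read of the grouped one. */
int lookup_limit(void)
{
    lookup_count++;
    return max_buffers;
}

unsigned long lookups_so_far(void)
{
    return lookup_count;
}
```

Running "objdump -t" on the resulting object file should list max_buffers under
.data.read_mostly and lookup_count under .bss, which is the regrouping that
step 3 below is about.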

3. We continue adding __read_mostly to other genuine read-mostly objects.

As we continue to increase the number of __read_mostly objects, they get moved
from bss to the .data.read_mostly section. This, IMHO, increases the chances
(as compared to earlier without __read_mostly) that 2 objects in the bss
compete for the same cache-line. But this is NOT directly evident, as modern
cpu-caches are N-way associative, i.e. each object has a choice of N different
cache-slots. This tends to initially hide the effect of __read_mostly.
(This is the point I make in my article[1]).

After a few more iterations of adding __read_mostly, the cache contention may
increase to more than N objects competing for the same cache-slot. False
cache-line sharing then occurs, i.e. 2 or more objects keep replacing
one another in the cache-slot alternately, i.e. cache-thrashing begins.

Note that false cache-line sharing is NOT a one time cost. Cache thrashing will
continue to happen until the context changes sufficiently for one of the
cache-slots to free-up. Hence this scenario must be avoided at all costs.
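
The "more than N objects in an N-way set" case can be illustrated with a toy
model. This is only a sketch under stated assumptions: a single cache set with
strict LRU replacement (real CPUs often use pseudo-LRU or other policies), and
objects accessed round-robin. All names below are hypothetical. What it models
is eviction due to set conflicts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WAYS 16

/* One cache set with LRU replacement; tag[0] is the most recently used. */
struct cache_set {
    uintptr_t tag[MAX_WAYS];
    int used;   /* number of valid ways */
    int ways;   /* associativity N */
};

/* Set an address maps to: (addr / line_size) % num_sets. */
size_t set_index(uintptr_t addr, size_t line_size, size_t num_sets)
{
    return (addr / line_size) % num_sets;
}

/* Access one line; returns 1 on a miss, 0 on a hit. */
int access_line(struct cache_set *s, uintptr_t tag)
{
    int i, pos = -1;

    assert(s->ways > 0 && s->ways <= MAX_WAYS);
    for (i = 0; i < s->used; i++) {
        if (s->tag[i] == tag) {
            pos = i;
            break;
        }
    }
    int miss = (pos < 0);
    if (miss) {
        if (s->used < s->ways)
            s->used++;
        pos = s->used - 1;  /* free way, or the LRU entry to evict */
    }
    /* Move the accessed line to the front (most recently used). */
    for (i = pos; i > 0; i--)
        s->tag[i] = s->tag[i - 1];
    s->tag[0] = tag;
    return miss;
}

/* Cycle over `lines` distinct lines that all map to the same set. */
int count_misses(int ways, int lines, int rounds)
{
    struct cache_set s = { .used = 0, .ways = ways };
    int misses = 0;

    for (int r = 0; r < rounds; r++)
        for (int l = 0; l < lines; l++)
            misses += access_line(&s, (uintptr_t)(l + 1));
    return misses;
}
```

With 2 ways, count_misses(2, 2, 10) sees only the 2 initial misses, while
count_misses(2, 3, 10) misses on all 30 accesses: one extra object in the set
turns a one-time cost into a recurring one.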

>I agree that there might be a few places in kernel that suffer from the
>weakness described above. That's why it's important to add __read_mostly in
>small portions (even one by one) in order to ease the bisect if and when the
>performance regression occurs.

Exactly! So we can conclude that "excessive" use of __read_mostly must be
avoided. "Excessive" varies from system to system based on:
- Degree of SMP (no. of cores).
- Levels of cache (and penalties associated between successive levels).
- Associativity of caches.

Without proper understanding of these params, __read_mostly will be a "mostly"
hit-n-miss affair. In fact a quick grep shows ~1300 __read_mostly scattered
around the kernel code (3.4-rc1), which on certain systems is already
detrimental. Certain architectures (e.g. ARM) completely disable __read_mostly,
as it is evident from their 2-way associative cache that cache-thrashing will
occur so quickly that it voids any potential performance gains.
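
To get a feel for the cache parameters listed above on a given machine, here
is a small user-space sketch. It assumes glibc, whose sysconf() accepts the
_SC_LEVEL1_DCACHE_* extensions; on other C libraries, or when the value is not
reported, the helper returns -1. The function names and the example cache
sizes in the note below are hypothetical.

```c
#include <assert.h>
#include <unistd.h>

/* Number of sets = cache size / (associativity * line size). */
long cache_num_sets(long size_bytes, long ways, long line_bytes)
{
    assert(size_bytes > 0 && ways > 0 && line_bytes > 0);
    return size_bytes / (ways * line_bytes);
}

/* L1 data-cache associativity as reported by glibc, or -1 if unknown. */
long l1d_associativity(void)
{
#ifdef _SC_LEVEL1_DCACHE_ASSOC
    long v = sysconf(_SC_LEVEL1_DCACHE_ASSOC);
    return v > 0 ? v : -1;
#else
    return -1;
#endif
}
```

For example, a 32 KiB, 8-way cache with 64-byte lines has
cache_num_sets(32768, 8, 64) = 64 sets, while a 16 KiB, 2-way cache with
32-byte lines has 256 sets but tolerates only 2 hot lines per set.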

[1] thecodeartist.blogspot.com/2011/12/why-readmostly-does-not-work-as-it.html

[2] fixunix.com/kernel/262711-rfc-remove-__read_mostly.html

regards
ChinmayVS