Date:   Fri, 2 Feb 2018 11:23:26 +0530
From:   Chintan Pandya <cpandya@...eaurora.org>
To:     Frank Rowand <frowand.list@...il.com>,
        Rob Herring <robh+dt@...nel.org>
Cc:     "open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS" 
        <devicetree@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] of: cache phandle nodes to decrease cost of
 of_find_node_by_phandle()

On 2/2/2018 2:39 AM, Frank Rowand wrote:
> On 02/01/18 06:24, Rob Herring wrote:
>> And so
>> far, no one has explained why a bigger cache got slower.
> 
> Yes, I still find that surprising.

I thought a bit about this and realized that increasing the cache size
should improve performance only if the smaller cache has too many
misses. So I went back to the logs from my earlier experiments and
looked at the access pattern. It seems there is *not_too_much* juggling
between phandles during lookup: the same phandle tends to be looked up
several times in a row.

See the access pattern here: 
https://drive.google.com/file/d/1qfAD8OsswNJABgAwjJf6Gr_JZMeK7rLV/view?usp=sharing

A sample of the log is pasted below; the last number on each line is the
phandle value.
	Line 8853: [   37.425405] OF: want to search this 262
	Line 8854: [   37.425453] OF: want to search this 262
	Line 8855: [   37.425499] OF: want to search this 262
	Line 8856: [   37.425549] OF: want to search this 15
	Line 8857: [   37.425599] OF: want to search this 5
	Line 8858: [   37.429989] OF: want to search this 253
	Line 8859: [   37.430058] OF: want to search this 253
	Line 8860: [   37.430217] OF: want to search this 253
	Line 8861: [   37.430278] OF: want to search this 253
	Line 8862: [   37.430337] OF: want to search this 253
	Line 8863: [   37.430399] OF: want to search this 254
	Line 8864: [   37.430597] OF: want to search this 254
	Line 8865: [   37.430656] OF: want to search this 254

The above explains why cache sizes 64 and 128 give almost the same
results. With cache size 256, however, performance degrades. I don't
have a good theory for this, but my assumption is that by making the SW
cache large, we lose the benefit of the real HW cache, which is
typically smaller than our array. Also, in my setup I've set max_cpu=1
to reduce the variance. That, again, should affect how the HW cache
holds data and so affect the perf numbers.
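
To make the locality argument concrete, here is a minimal user-space
sketch that replays the 13 phandle values from the sample log through a
hypothetical direct-mapped cache and counts the misses. The indexing
scheme (phandle % cache_size), the helper name count_misses() and the
array name sample_phandles are assumptions made for illustration only;
this is not the code from the patch under discussion.

```c
#include <stddef.h>
#include <stdlib.h>

/* phandle values from the sample log, in lookup order */
static const unsigned int sample_phandles[] = {
	262, 262, 262, 15, 5, 253, 253, 253, 253, 253, 254, 254, 254,
};

#define SAMPLE_COUNT (sizeof(sample_phandles) / sizeof(sample_phandles[0]))

/*
 * Replay a lookup sequence through a hypothetical direct-mapped cache
 * with cache_size slots, indexed by phandle % cache_size (an assumed
 * scheme, chosen only for illustration).
 *
 * Returns the number of misses, or (size_t)-1 on invalid size or
 * allocation failure.
 */
static size_t count_misses(const unsigned int *seq, size_t n,
			   size_t cache_size)
{
	unsigned int *slots;
	size_t i, misses = 0;

	if (cache_size == 0)
		return (size_t)-1;

	/* 0 marks an empty slot; 0 is not a valid phandle value */
	slots = calloc(cache_size, sizeof(*slots));
	if (!slots)
		return (size_t)-1;

	for (i = 0; i < n; i++) {
		size_t idx = seq[i] % cache_size;

		if (slots[idx] != seq[i]) {
			/* miss: a real lookup would walk the tree here */
			misses++;
			slots[idx] = seq[i];
		}
	}

	free(slots);
	return misses;
}
```

For this short excerpt, count_misses(sample_phandles, SAMPLE_COUNT, size)
returns 5 for size 64, 128 and 256 alike: only the first lookup of each
distinct phandle misses, and the other 8 lookups hit. That holds only
for these 13 lines and says nothing about timing, but it shows why a
larger cache has little room to help when lookups repeat like this.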

Chintan
-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center,
Inc. is a member of the Code Aurora Forum, a Linux Foundation
Collaborative Project