Date: Fri, 2 Feb 2018 11:23:26 +0530
From: Chintan Pandya <cpandya@...eaurora.org>
To: Frank Rowand <frowand.list@...il.com>,
Rob Herring <robh+dt@...nel.org>
Cc: "open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS"
<devicetree@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] of: cache phandle nodes to decrease cost of
of_find_node_by_phandle()
On 2/2/2018 2:39 AM, Frank Rowand wrote:
> On 02/01/18 06:24, Rob Herring wrote:
>> And so
>> far, no one has explained why a bigger cache got slower.
>
> Yes, I still find that surprising.
I thought a bit about this and realized that increasing the cache size
should improve performance only if the smaller cache has too many
misses. So I went back to the logs from my earlier experiments and
looked at the access pattern. It seems there is *not_too_much* juggling
between phandles during lookup.
See the access pattern here:
https://drive.google.com/file/d/1qfAD8OsswNJABgAwjJf6Gr_JZMeK7rLV/view?usp=sharing
A sample of the log is pasted below; the last number on each line is the
phandle value.
Line 8853: [ 37.425405] OF: want to search this 262
Line 8854: [ 37.425453] OF: want to search this 262
Line 8855: [ 37.425499] OF: want to search this 262
Line 8856: [ 37.425549] OF: want to search this 15
Line 8857: [ 37.425599] OF: want to search this 5
Line 8858: [ 37.429989] OF: want to search this 253
Line 8859: [ 37.430058] OF: want to search this 253
Line 8860: [ 37.430217] OF: want to search this 253
Line 8861: [ 37.430278] OF: want to search this 253
Line 8862: [ 37.430337] OF: want to search this 253
Line 8863: [ 37.430399] OF: want to search this 254
Line 8864: [ 37.430597] OF: want to search this 254
Line 8865: [ 37.430656] OF: want to search this 254
The sample above explains why cache sizes 64 and 128 give almost the
same results: there are few misses to begin with, even with the smaller
cache. For cache size 256, however, performance degrades. I don't have a
good theory here, but I'm assuming that by making the SW cache large, we
lose the benefit of the real HW cache, which is typically smaller than
our array. Also, in my setup, I've set max_cpu=1 to reduce the variance.
That, again, should affect how the HW cache holds data and so affect the
perf numbers.
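
To make the point about misses concrete, here is a rough sketch that
replays the sampled lookups through a software cache of each size. It
assumes a direct-mapped array indexed by phandle modulo the cache size,
which is only a model for illustration and may not match how the patch
indexes its cache. The names count_hits and SAMPLE_LOOKUPS are made up
for this example.

```python
# Lookups copied from the sample log above (last field of each line).
SAMPLE_LOOKUPS = [262, 262, 262, 15, 5, 253, 253, 253, 253, 253, 254, 254, 254]


def count_hits(lookups, cache_size):
    """Replay lookups through a direct-mapped cache; return (hits, misses).

    Assumption: slot = phandle % cache_size, one phandle per slot. This is a
    hypothetical model, not necessarily the indexing used by the patch.
    """
    slots = [None] * cache_size
    hits = misses = 0
    for phandle in lookups:
        slot = phandle % cache_size
        if slots[slot] == phandle:
            hits += 1
        else:
            misses += 1
            slots[slot] = phandle  # replace whatever occupied the slot
    return hits, misses


if __name__ == "__main__":
    for size in (64, 128, 256):
        hits, misses = count_hits(SAMPLE_LOOKUPS, size)
        print(f"cache size {size:3d}: {hits} hits, {misses} misses")
```

Under this model, the 13 sampled lookups give 8 hits and 5 misses for
all three sizes, and each miss is the first lookup of a phandle, so a
larger array has nothing left to win here. It does not say anything
about why 256 gets slower.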
Chintan
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center,
Inc. is a member of the Code Aurora Forum, a Linux Foundation
Collaborative Project