linux-kernel - Re: [PATCH] interconnect: Skip call into provider if initial bw is zero

Message-ID: <6dd7b0b0-f6fb-9de4-c365-d6cbfe04f2c0@quicinc.com>
Date:   Mon, 23 Jan 2023 12:37:05 -0800
From:   Mike Tipton <quic_mdtipton@...cinc.com>
To:     Bryan O'Donoghue <bryan.odonoghue@...aro.org>,
        Vivek Aknurwar <quic_viveka@...cinc.com>, <djakov@...nel.org>
CC:     <quic_okukatla@...cinc.com>, <linux-pm@...r.kernel.org>,
        <linux-arm-msm@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] interconnect: Skip call into provider if initial bw is
 zero

On 1/19/2023 3:56 PM, Bryan O'Donoghue wrote:
> On 19/01/2023 22:18, Vivek Aknurwar wrote:
>> Hi Bryan,
>> Thanks for taking time to review the patch.
>>
>> On 1/13/2023 5:40 PM, Bryan O'Donoghue wrote:
>>> On 14/01/2023 01:24, Bryan O'Donoghue wrote:
>>>> On 13/01/2023 22:07, Vivek Aknurwar wrote:
>>>>> Currently framework sets bw even when init bw requirements are zero 
>>>>> during
>>>>> provider registration, thus resulting bulk of set bw to hw.
>>>>> Avoid this behaviour by skipping provider set bw calls if init bw 
>>>>> is zero.
>>>>>
>>>>> Signed-off-by: Vivek Aknurwar <quic_viveka@...cinc.com>
>>>>> ---
>>>>>   drivers/interconnect/core.c | 17 ++++++++++-------
>>>>>   1 file changed, 10 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
>>>>> index 25debde..43ed595 100644
>>>>> --- a/drivers/interconnect/core.c
>>>>> +++ b/drivers/interconnect/core.c
>>>>> @@ -977,14 +977,17 @@ void icc_node_add(struct icc_node *node, 
>>>>> struct icc_provider *provider)
>>>>>       node->avg_bw = node->init_avg;
>>>>>       node->peak_bw = node->init_peak;
>>>>> -    if (provider->pre_aggregate)
>>>>> -        provider->pre_aggregate(node);
>>>>> -
>>>>> -    if (provider->aggregate)
>>>>> -        provider->aggregate(node, 0, node->init_avg, node->init_peak,
>>>>> -                    &node->avg_bw, &node->peak_bw);
>>>>> +    if (node->avg_bw || node->peak_bw) {
>>>>> +        if (provider->pre_aggregate)
>>>>> +            provider->pre_aggregate(node);
>>>>> +
>>>>> +        if (provider->aggregate)
>>>>> +            provider->aggregate(node, 0, node->init_avg, 
>>>>> node->init_peak,
>>>>> +                        &node->avg_bw, &node->peak_bw);
>>>>> +        if (provider->set)
>>>>> +            provider->set(node, node);
>>>>> +    }
>>>>> -    provider->set(node, node);
>>>>>       node->avg_bw = 0;
>>>>>       node->peak_bw = 0;
>>>>
>>>> I have the same comment/question for this patch that I had for the 
>>>> qcom arch specific version of it. This patch seems to be doing at a 
>>>> higher level what the patch below was doing at a lower level.
>>>>
>>>> https://lore.kernel.org/lkml/1039a507-c4cd-e92f-dc29-1e2169ce5078@linaro.org/T/#m0c90588d0d1e2ab88c39be8f5f3a8f0b61396349
>>>>
>>>> what happens to earlier silicon - qcom silicon which previously made 
>>>> explicit zero requests ?
>>
>> This patch is to optimize and avoid all those bw 0 requests on each 
>> node addition during probe (which results in rpmh remote calls) for 
>> upcoming targets.
> 
> So why not change it just for rpmh ?
> 
> You are changing it for rpm here, as well as for Samsung and NXP 
> interconnects.
> 

This isn't actually changing it for all providers. Only for those that 
define the get_bw() callback. Right now that's only qcom/msm8974 and 
imx/imx. If get_bw() isn't defined, then icc_node_add() defaults to 
INT_MAX. So, the logical behavior in that case is unchanged. Which means 
this isn't even changing the behavior for rpmh yet, either.
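
To make that concrete, here is a rough user-space model of the node-add path with the patch applied. It is only a sketch and not the code in core.c: the model_* names and the set_calls counter are made up for illustration, and pre_aggregate/aggregate are left out for brevity.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the framework structures (hypothetical names). */
struct model_node {
	uint32_t init_avg, init_peak;
	uint32_t avg_bw, peak_bw;
	unsigned int set_calls; /* counts calls that would reach the HW */
};

struct model_provider {
	/* Optional: reports the BW the node already has. */
	int (*get_bw)(struct model_node *node, uint32_t *avg, uint32_t *peak);
	int (*set)(struct model_node *src, struct model_node *dst);
};

/* Counts the call instead of programming anything. */
int model_set(struct model_node *src, struct model_node *dst)
{
	(void)src;
	dst->set_calls++;
	return 0;
}

/* A get_bw() that reports the node as currently off. */
int model_get_bw_zero(struct model_node *node, uint32_t *avg, uint32_t *peak)
{
	(void)node;
	*avg = 0;
	*peak = 0;
	return 0;
}

void model_node_add(struct model_node *node, struct model_provider *provider)
{
	assert(node != NULL && provider != NULL);

	/* Without get_bw() the initial BW defaults to INT_MAX. */
	if (provider->get_bw) {
		provider->get_bw(node, &node->init_avg, &node->init_peak);
	} else {
		node->init_avg = INT_MAX;
		node->init_peak = INT_MAX;
	}
	node->avg_bw = node->init_avg;
	node->peak_bw = node->init_peak;

	/* The patch: only call into the provider for non-zero initial BW. */
	if (node->avg_bw || node->peak_bw) {
		if (provider->set)
			provider->set(node, node);
	}

	node->avg_bw = 0;
	node->peak_bw = 0;
}
```

In this model a provider without get_bw() still gets exactly one set() call per node, same as before the patch. Only a provider whose get_bw() reports zero sees the call skipped.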

We're also working on changes to align our downstream, qcom-specific, 
rpmh-specific sync-state approach with the common upstream approach. 
Part of that includes adding a get_bw() callback for rpmh that only 
returns non-zero BW for nodes that are already enabled by the bootloader 
or are otherwise marked as critical for HLOS operation (i.e. keepalive). 
Currently, the upstream rpmh driver doesn't define get_bw(), which means 
the framework votes INT_MAX for everything even if most of the nodes 
aren't needed yet.
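
As a sketch of what such a callback might look like (this is not the actual rpmh change; the sketch_node struct and its boot_enabled and keepalive fields are hypothetical, and reporting INT_MAX for "on, exact BW unknown" is an assumption):

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-node state; the field names are assumptions. */
struct sketch_node {
	bool boot_enabled; /* left enabled by the bootloader */
	bool keepalive;    /* critical for HLOS operation */
};

/*
 * Report non-zero BW only for nodes that need an initial proxy vote.
 * Everything else reports zero, so no initial vote is needed for it.
 */
int sketch_get_bw(const struct sketch_node *node, uint32_t *avg, uint32_t *peak)
{
	assert(node != NULL && avg != NULL && peak != NULL);

	if (node->boot_enabled || node->keepalive) {
		/* Exact BW unknown in this sketch, so report the maximum. */
		*avg = INT_MAX;
		*peak = INT_MAX;
	} else {
		*avg = 0;
		*peak = 0;
	}
	return 0;
}
```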

Currently, with the upstream rpmh-based drivers this is just a 
performance/power optimization issue. It doesn't cause any functional 
failures. However, downstream we have additional nodes that use BCM 
voters other than just the "apps" voter. These secondary voters aren't 
accessible when the providers probe, since they require additional 
regulator dependencies to be met first. We rely on the client voting for 
the required regulators before voting to the interconnect for these nodes. 
So, we need to prevent the framework from calling our set() callbacks 
when adding these secondary nodes, otherwise it'll cause bus errors and 
crash the kernel. It's not always safe to assume that every node is 
immediately capable of being voted for when it's added.

We currently work around this by "stubbing" our pre_aggregate, 
aggregate, and set() callbacks when adding the nodes and only set them 
to the real callbacks after we've finished adding everything. But that 
stops being a valid workaround when we move to the upstream sync-state 
approach, since we're relying on the set() callback from icc_node_add() 
for placing the initial proxy votes for "keepalive" and other nodes 
already enabled from boot.
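
Roughly, the stubbing workaround looks like the model below. It is a user-space sketch with made-up stub_* names, not our downstream code, and it only models set(), with the framework side reduced to "call set() once per added node".

```c
#include <assert.h>
#include <stddef.h>

struct stub_node {
	unsigned int hw_writes; /* counts calls that would reach the HW */
};

struct stub_provider {
	int (*set)(struct stub_node *src, struct stub_node *dst);
};

/* The real callback: would program the hardware. */
int stub_real_set(struct stub_node *src, struct stub_node *dst)
{
	(void)src;
	dst->hw_writes++;
	return 0;
}

/* The stub: accepts the request and does nothing. */
int stub_noop_set(struct stub_node *src, struct stub_node *dst)
{
	(void)src;
	(void)dst;
	return 0;
}

/* Stand-in for the framework's node-add path calling set(). */
void stub_framework_add(struct stub_provider *p, struct stub_node *n)
{
	assert(p != NULL && p->set != NULL && n != NULL);
	p->set(n, n);
}

/* Probe: add every node with the stub, then install the real callback. */
void stub_probe(struct stub_provider *p, struct stub_node *nodes, size_t count)
{
	p->set = stub_noop_set;
	for (size_t i = 0; i < count; i++)
		stub_framework_add(p, &nodes[i]);
	p->set = stub_real_set;
}
```

The catch is visible in the model: nothing reaches the hardware while nodes are being added, so no initial proxy vote can be placed through that path either.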

I'm sure the secondary voters will make their way upstream some day, but 
it's not clear when yet. There are no upstream drivers in a state ready 
to use them yet anyway. But the other changes we're working on to add 
get_bw() to icc-rpmh providers to reduce the number of unnecessary calls 
during probe could go in sooner as an optimization.

It's not easy to implement this purely on the provider side, since we 
can't just always ignore zero votes. We need to honor zero votes that 
are made post-init so that things actually turn off. Thus, any logic 
that short-circuits the zero requests would need to be done only for the 
very first request. Each node would have to track if it's been called 
once already. And we'd have to spread that logic across pre_aggregate, 
aggregate, and set. There isn't just one simple place to implement 
this on the provider side. This is much more easily handled on the 
framework side.
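
For comparison, a provider-side version would need something like the per-node flag below. This is only a sketch with hypothetical fc_* names, and it covers set() alone; the same first-call tracking would also be needed in pre_aggregate and aggregate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct fc_node {
	bool first_done;        /* has set() been called for this node yet? */
	uint32_t avg_bw, peak_bw;
	unsigned int hw_writes; /* counts requests that would reach the HW */
};

/*
 * Skip a zero request only if it is the very first one for the node.
 * Later zero requests must go through so that things actually turn off.
 */
int fc_set(struct fc_node *src, struct fc_node *dst)
{
	bool first;

	(void)src;
	assert(dst != NULL);

	first = !dst->first_done;
	dst->first_done = true;

	if (first && dst->avg_bw == 0 && dst->peak_bw == 0)
		return 0;

	dst->hw_writes++;
	return 0;
}
```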


> Taking rpm as an example, for certain generations of silicon we make an 
> explicit zero call.
> 
> https://git.codelinaro.org/clo/la/kernel/msm-3.18/-/blob/LA.BR.1.2.9-00810-8x09.0/drivers/platform/msm/msm_bus/msm_bus_bimc.c#L1367
> 
> Here's the original RPM commit that sets a zero
> 
> https://git.codelinaro.org/clo/la/kernel/msm-3.18/-/commit/d91d108656a7a44a6dfcfb318a25d39c5418e54b
> 
>>>> https://lore.kernel.org/lkml/1039a507-c4cd-e92f-dc29-1e2169ce5078@linaro.org/T/#m589e8280de470e038249bb362634221771d845dd
>>>>
>>>> https://lkml.org/lkml/2023/1/3/1232
>>>>
>>>> Isn't it a better idea to let lower layer drivers differentiate what 
>>>> they do ?
>>
>> AFAIU lower layer driver can/should not differentiate between normal 
>> flow calls vs made as a result from probe/initialization of driver. 
>> Hence even bw 0 request is honored as like client in general wish to 
>> vote 0 as in an normal use case.
> 
> But surely if I vote zero, then I mean to vote zero ?
> 
> Do we know that for every architecture and for every different supported 
> that ignoring a zero vote is the right thing to do ?
> 
> I don't think we do know that.
> 

Relying on the existing behavior of icc_node_add() calling set() when 
the node's BW is already zero should be generally unnecessary. If the 
node is already physically disabled in HW, then disabling again should 
be a don't-care. And if the node is already physically enabled in HW, 
then get_bw() should logically return something non-zero for it. 
get_bw() is supposed to return the *current* BW. It's not always 
possible to know exactly what the BW is, so often the distinction may 
just be between zero and INT_MAX. But ultimately it would ideally return 
the actual current BW vote, such that the initial votes placed by 
icc_node_add() match the preexisting votes from boot and don't 
unnecessarily enable or dramatically increase BW of many nodes 
irrelevant for early kernel boot.

If the provider simply has no idea, then it can choose not to define the 
get_bw() callback and the framework will assume INT_MAX for everything. 
But if the provider wants to optimize the initial BW voting, it can 
define the get_bw() callback to inform the framework which nodes are 
already enabled and require proxy voting.

And relying on icc_node_add() calling set() for zero BW should also be 
unnecessary for cleaning up nodes enabled from boot that are no longer 
necessary. In either case, whether get_bw() returns non-zero or 
isn't defined at all, the framework has non-zero initial 
BW for them. And if no consumers explicitly vote for them, then they'll 
be disabled in icc_sync_state(). Sync-state is the proper place to 
disable resources no longer needed from boot.
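
A simplified model of that cleanup, assuming the initial BW acts as a floor until sync-state and is then dropped. The ss_* names are hypothetical and this is not the code in icc_sync_state() itself, just the behavior described above.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Floor assumed when the provider has no get_bw(). */
#define SS_DEFAULT_INIT_BW ((uint32_t)INT_MAX)

struct ss_node {
	uint32_t init_bw;     /* initial (boot) floor, cleared at sync-state */
	uint32_t consumer_bw; /* aggregate of explicit consumer votes */
	uint32_t hw_bw;       /* value the model "programmed" last */
};

/* Program the larger of the boot floor and the consumer votes. */
void ss_apply(struct ss_node *n)
{
	assert(n != NULL);
	n->hw_bw = n->consumer_bw > n->init_bw ? n->consumer_bw : n->init_bw;
}

/* Sync-state: drop the boot floor and re-apply whatever is left. */
void ss_sync_state(struct ss_node *nodes, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		if (!nodes[i].init_bw)
			continue;
		nodes[i].init_bw = 0;
		ss_apply(&nodes[i]);
	}
}
```

A node that nobody voted for ends up at zero after sync-state, while a node with a consumer vote drops from the floor to that vote.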

> https://lore.kernel.org/linux-arm-msm/20230116132152.405535-1-konrad.dybcio@linaro.org/
> 
> I think for older rpm this is a departure from long existing logic.
> 
> Maybe its entirely benign but, IMO you should be proposing this change 
> at the rpmh level only, not at the top level across multiple different 
> interconnect arches.
> 
> ---
> bod