Message-ID: <e4f68ae5-5cf7-bac4-e7f2-c074327ea659@codeaurora.org>
Date:   Mon, 27 Jul 2020 23:18:25 -0700
From:   Mike Tipton <mdtipton@...eaurora.org>
To:     Saravana Kannan <saravanak@...gle.com>,
        Georgi Djakov <georgi.djakov@...aro.org>
Cc:     Linux PM <linux-pm@...r.kernel.org>, okukatla@...eaurora.org,
        Bjorn Andersson <bjorn.andersson@...aro.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        linux-arm-msm <linux-arm-msm@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 1/2] interconnect: Add sync state support

On 7/22/2020 10:07 AM, Saravana Kannan wrote:
> On Wed, Jul 22, 2020 at 4:01 AM Georgi Djakov <georgi.djakov@...aro.org> wrote:
>>
>> Bootloaders often do some initial configuration of the interconnects
>> in the system, and we want to keep this configuration until all consumers
>> have probed and expressed their bandwidth needs. This is because we don't
>> want to start disabling unused paths and changing that configuration before
>> every user has had a chance to request the amount of bandwidth it needs.
>>
>> To accomplish this we will implement an interconnect specific sync_state
>> callback which will synchronize (aggregate and set) the current bandwidth
>> settings when all consumers have been probed.
>>
>> Signed-off-by: Georgi Djakov <georgi.djakov@...aro.org>
>> ---
>>   drivers/interconnect/core.c           | 61 +++++++++++++++++++++++++++
>>   include/linux/interconnect-provider.h |  5 +++
>>   2 files changed, 66 insertions(+)
>>
>> diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
>> index e5f998744501..0c4e38d9f1fa 100644
>> --- a/drivers/interconnect/core.c
>> +++ b/drivers/interconnect/core.c
>> @@ -26,6 +26,8 @@
>>
>>   static DEFINE_IDR(icc_idr);
>>   static LIST_HEAD(icc_providers);
>> +static int providers_count;
>> +static bool synced_state;
>>   static DEFINE_MUTEX(icc_lock);
>>   static struct dentry *icc_debugfs_dir;
>>
>> @@ -255,6 +257,12 @@ static int aggregate_requests(struct icc_node *node)
>>                          continue;
>>                  p->aggregate(node, r->tag, r->avg_bw, r->peak_bw,
>>                               &node->avg_bw, &node->peak_bw);
>> +
>> +               /* during boot use the initial bandwidth as a floor value */
>> +               if (!synced_state) {
>> +                       node->avg_bw = max(node->avg_bw, node->init_avg);
>> +                       node->peak_bw = max(node->peak_bw, node->init_peak);
>> +               }
> 
> Sorry I didn't reply earlier.
> 
> I liked your previous approach with the get_bw ops. The v2 approach
> forces every interconnect provider driver to set up these values even
> if they are okay with just maxing out the bandwidth. Also, if they can
> actually query their hardware, this adds additional steps for them.

The problem with using something like get_bw() is that while we can 
dynamically query the HW, the HW is far coarser-grained than the 
framework: we vote at BCM-level granularity, but each BCM can have many 
nodes. For example, the sdm845 CN0 BCM has 47 nodes. 
If we implement get_bw() generically, then it would return the BW for 
each node, which would be the queried BCM vote scaled to account for 
differences in BCM/node widths. While this could be useful in general as 
an informational callback, we wouldn't want to use this as a proxy for 
our initial BW vote requirements. For CN0, we wouldn't want or need to 
vote 47 times for the same CN0 BCM. Each of the 47 node requests would 
result in the same BCM request.
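
Just to illustrate the granularity mismatch, here's a rough standalone 
sketch (not driver code; the structures and the scaling math are made up):

#include <stdint.h>
#include <stdio.h>

/* Made-up, simplified view of one BCM-level vote and the nodes behind it. */
struct fake_bcm {
	uint64_t hw_vote;	/* the single value we can query from HW */
	uint32_t width;		/* BCM bus width */
	uint32_t num_nodes;	/* e.g. 47 for the sdm845 CN0 BCM */
};

/*
 * Placeholder for the BCM-to-node conversion; the real math depends on the
 * BCM's units and the node's bus width.
 */
uint64_t scale_bcm_to_node(const struct fake_bcm *bcm, uint32_t node_width)
{
	return bcm->hw_vote * node_width / bcm->width;
}

int main(void)
{
	struct fake_bcm cn0 = { .hw_vote = 76800, .width = 4, .num_nodes = 47 };

	/*
	 * A generic get_bw() would hand back this same derived value for
	 * every one of the 47 nodes, and each of those identical requests
	 * would just re-aggregate into the very same CN0 vote.
	 */
	for (uint32_t i = 0; i < cn0.num_nodes; i++)
		printf("node %u: %llu\n", (unsigned)i,
		       (unsigned long long)scale_bcm_to_node(&cn0, 4));

	return 0;
}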

All we'd really need is a single node per BCM to serve as the proxy 
node. We'd query the HW, scale the queried value for the chosen proxy 
node, and set init_avg/init_peak appropriately. This would save a lot of 
unnecessary votes. Based on the current implementation, the set() call 
in icc_node_add() for initial BW wouldn't trigger any actual HW requests 
since we only queue BCMs that require updating in the aggregate() 
callback. However, the set() call in icc_sync_state() would, since we 
re-aggregate each node that has a non-zero init_avg/init_peak.
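
Something like this is what I have in mind for the proxy-node approach 
(again just a sketch with made-up types, not the actual driver code):

#include <stddef.h>
#include <stdint.h>

/* Made-up stand-ins for the provider's internal bookkeeping. */
struct fake_node {
	uint32_t init_avg;	/* floor used by aggregate() until sync_state */
	uint32_t init_peak;
};

struct fake_bcm {
	struct fake_node **nodes;
	size_t num_nodes;
	uint32_t hw_vote;	/* value read back from the hardware */
};

/* Placeholder for the same width/unit scaling as above. */
uint32_t scale_for_node(const struct fake_bcm *bcm)
{
	return bcm->hw_vote;
}

/*
 * Give only one node per BCM a non-zero initial bandwidth, so the boot-time
 * floor costs a single request per BCM instead of one per node (47 for CN0).
 */
void init_proxy_votes(struct fake_bcm *bcms, size_t num_bcms)
{
	for (size_t i = 0; i < num_bcms; i++) {
		struct fake_node *proxy = bcms[i].nodes[0];

		proxy->init_avg = scale_for_node(&bcms[i]);
		proxy->init_peak = proxy->init_avg;
	}
}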

There's nothing stopping us from implementing get_bw() as if it were 
get_initial_bw(), but that only works until the framework decides to use 
get_bw() for more things than just the initial vote. I suppose we could 
also just have a "get_initial_bw" callback, but since it would only be 
called once, the information doesn't really need a callback at all, as 
opposed to additional init_avg/init_peak members in the icc_node struct.
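
That is, roughly the difference between these two shapes (names other 
than init_avg/init_peak are made up):

#include <stdint.h>

/* Option A: a one-shot callback the framework would call per node. */
struct fake_provider_ops {
	void (*get_initial_bw)(uint32_t *avg, uint32_t *peak);	/* hypothetical */
};

/*
 * Option B: the provider fills the values in when it adds the node and the
 * framework just reads them, which is what v2 does with init_avg/init_peak.
 */
struct fake_node_init {
	uint32_t init_avg;
	uint32_t init_peak;
};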

> 
> I think the default should be:
> 1. Query the current bandwidth at boot and use that.
> 2. If that's not available, max out the bandwidth.
> 
> The interconnect providers that don't like maxing out and don't have
> real get_bw() capability can just cache and return the last set_bw()
> values. And they start off with those cached values matching whatever
> init_bw they need.
> 
> That way, the default case (can get bw or don't care about maxing out)
> would be easy and the extra work would be limited to drivers that want
> neither.
> 
> -Saravana
> 
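
For reference, the cache-the-last-set() fallback suggested above could 
look roughly like this (standalone sketch, made-up names, single node 
for brevity; not the real icc provider API):

#include <stdint.h>

/* Made-up per-node cache a provider could keep. */
struct cached_bw {
	uint32_t avg;
	uint32_t peak;
};

/* Start off with whatever initial bandwidth the provider needs at boot. */
static struct cached_bw cache = { .avg = 1000000, .peak = 1000000 };

/* Set path: program the hardware (omitted) and remember the vote. */
void provider_set(uint32_t avg, uint32_t peak)
{
	cache.avg = avg;
	cache.peak = peak;
}

/* "get_bw" without real HW query support: just report the cached vote. */
void provider_get_bw(uint32_t *avg, uint32_t *peak)
{
	*avg = cache.avg;
	*peak = cache.peak;
}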
