Message-ID: <2ef44d81-fb6f-5753-de62-e36778be0537@linaro.org>
Date: Fri, 4 Feb 2022 12:50:24 +0100
From: Daniel Lezcano <daniel.lezcano@...aro.org>
To: Chetan Mistry <Chetan.Mistry@....com>,
Lukasz Luba <Lukasz.Luba@....com>
Cc: "rafael@...nel.org" <rafael@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"amitk@...nel.org" <amitk@...nel.org>,
"rui.zhang@...el.com" <rui.zhang@...el.com>,
"linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>
Subject: Re: [PATCH v2][RFC 1/2] Implement Ziegler-Nichols Heuristic
Thanks!
On 04/02/2022 12:48, Chetan Mistry wrote:
> Hi Daniel,
>
> Here is a link to the Compiled Jupyter Notebook with the performance
> analysis of the proposed system vs what is currently there:
>
> https://nbviewer.org/gist/chemis01/0e86ad81508860659a57338dae8274f9
>
> Kind Regards,
>
> Chetan
>
> *From: *Lukasz Luba <lukasz.luba@....com>
> *Date: *Friday, 7 January 2022 at 13:39
> *To: *Daniel Lezcano <daniel.lezcano@...aro.org>
> *Cc: *rafael@...nel.org <rafael@...nel.org>,
> linux-kernel@...r.kernel.org <linux-kernel@...r.kernel.org>,
> amitk@...nel.org <amitk@...nel.org>, Chetan Mistry
> <Chetan.Mistry@....com>, rui.zhang@...el.com <rui.zhang@...el.com>,
> linux-pm@...r.kernel.org <linux-pm@...r.kernel.org>
> *Subject: *Re: [PATCH v2][RFC 1/2] Implement Ziegler-Nichols Heuristic
>
>
>
> On 1/7/22 11:52 AM, Daniel Lezcano wrote:
>> On 06/01/2022 14:16, Lukasz Luba wrote:
>>>
>>> Thank you for fast response!
>>>
>>> On 1/6/22 12:16 PM, Daniel Lezcano wrote:
>>>>
>>>> Hi Lukasz,
>>>>
>>>> On 06/01/2022 12:54, Lukasz Luba wrote:
>>>>> Hi Daniel,
>>>>>
>>>>> Could you have a look at this, please?
>>>>
>>>> Yes, I had a quick look at the code and went to the algorithm
>>>> description.
>>>>
>>>> Still digesting ...
>>>>
>>>>> On 12/17/21 6:49 PM, Chetankumar Mistry wrote:
>>>>>> Implement the Ziegler-Nichols Heuristic algorithm to better
>>>>>> estimate the PID Coefficients for a running platform.
>>>>>> The values are tuned to minimise the amount of overshoot in
>>>>>> the temperature of the platform and subsequently minimise
>>>>>> the number of switches for cdev states.
>>>>>>
>>>>>> Signed-off-by: Chetankumar Mistry <chetan.mistry@....com>
>>>>>
>>>>>
>>>>> This is the continuation of the previous idea to have
>>>>> better k_* values. You might remember this conversation [1].
>>>>>
>>>>> I've spent some time researching papers on what can be done in this
>>>>> field and how, and whether it could be plumbed into the kernel.
>>>>> We had internal discussions (~2017) about one method, fuzzy logic, that
>>>>> I found back then, but it died at the beginning because it did not fit
>>>>> into this IPA kernel-specific environment and user-space. Your
>>>>> suggestion of observing undershooting and overshooting results sparked
>>>>> a better idea. I thought it was worth investing in, but I didn't have
>>>>> time. We are lucky: Chetan was designated to help me and
>>>>> experiment/implement/test these ideas, and here is the patch set.
>>>>>
>>>>> He's chosen the Ziegler-Nichols method, which shows really
>>>>> good results in benchmarks (Geekbench and GFXbench on hikey960 Android).
>>>>> The improved performance in Geekbench is ~10% (vs. old IPA).
>>>>
>>>> +10% perf improvements sounds great. What about the temperature
>>>> mitigation (temp avg + stddev) ?
>>>
>>> Chetan will respond about that with the link to the .html file.
>>> We just have to create an official public server space for it.
>>> I hope we will have something by Monday evening.
>>>
>>>>
>>>>> The main question from our side is the sysfs interface
>>>>> which could be used to trigger this algorithm for
>>>>> better coefficient estimation.
>>>>> We ask the user to echo to some sysfs files in the thermal zone
>>>>> and start his/her workload. This new IPA 'learns' the system
>>>>> utilization and the reaction in temperature. After a few rounds,
>>>>> we get better fitted coefficients.
>>>>> If you need more background about the code or mechanisms, or tests,
>>>>> I'm sure Chetan is happy to provide you those.
>>>>
>>>> I'm worried about the complexity of the algorithm and the overhead
>>>> implied.
>>>>
>>>> The k_* factors are tied with the system and the thermal setup (fan,
>>>> heatsink, processor, opp, ...). So IIUC when the factors are found, they
>>>> should not change and could be part of the system setup.
>>>
>>> True, they are found and will be fixed for that board.
>>>
>>>>
>>>> Would the algorithm fit better in a separate userspace kernel tooling?
>>>> So we can run it once and find the k_* for a board.
>>>
>>> We wanted it to be part of IPA in the kernel because:
>>> - the logic needs access to internals of IPA
>>> - it would be easily accessible for all distros out of the box
>>> - no additional maintenance or keeping two code bases in sync,
>>> especially those in some packages for user-space
>>
>> Sorry, I'm not convinced :/
>>
>> AFAICT, the temperature and the sampling rate should be enough
>> information to find out the k_*
>
> We allow the temperature to overshoot by not capping the
> power actors, but we have a safety net so it does not overshoot too much.
> That is an internal decision inside IPA. Userspace would have to
> re-implement the whole IPA logic and take control over cooling
> device states, which would contradict the decisions of the IPA
> controlling the same thermal zone. The coefficients are found
> by testing many values while running. Post-processing
> the data (temp., power requests, frequency, etc.) won't tell us
> the limits. We have to check them. That means the user-space tool
> would have to re-implement a major part of IPA, but also somehow
> get the CPUs' utilization, then use the obsolete user-space
> governor API to experiment with them.
>
>>
>> IMO, a userspace tool in ./tools/thermal/ipa is the right place
>>
>> So if you give the tooling to the SoC vendor via the thermal ones, with
>> a file containing the temp + timestamp, they should be able to find the
>> k_* and setup their boards.
>
> This feature is aimed at every user of the device, even without
> expert knowledge of thermal/power. If the vendor or OEM didn't
> support the board properly, users could do this and they don't
> have to be restricted (IMO).
> Apart from that, I see vendors are rather interested in investing
> in their proprietary solutions and unwilling to share know-how
> in open source power/thermal mechanisms. The user is then restricted
> (IMO) in using the board. This might block, e.g., the research of some
> PhD students who have good ideas, but the restrictions of the platform
> prevent them or cause behavior of the board that they cannot
> control. I would like to enable everyone to fully use the potential
> of the HW/SW - even the end-user.
>
>>
>> Actually my opinion is the kernel should not handle everything and the
>> SoC vendor should at least do some work to setup their system. If they
>> are able to find out the sustainable power, they can do the same for the
>> right coefficients.
>
> Ideally, yes, I would also like to see that. As you said a few months
> ago during review of my former patches, a lot of these 'sustainable
> power' entries in DT are Linaro contributions. I wish vendors would
> also contribute, but c'est la vie.
>
>>
>>>> Additionally, the values can be stored in the Documentation for
>>>> different boards, along with documentation on how to use the tool.
>>>>
>>>> Then up to the SoC vendor to setup the k_* in sysfs, so no need to
>>>> change any interface.
>>>
>>> It wouldn't be for SoC vendor, but up to the OEM or board designer,
>>> because the same SoC might have different thermal headroom thanks
>>> to better cooling or bigger PCB, etc.
>>
>> Right, s/SoC vendor/SoC platform/
>>
>>> I agree that these optimized k_* values might be shared in the kernel.
>>> Ideally I would see them in the board's DT file, in the thermal zone,
>>> but I'm afraid they are not 'Device description' so don't fit into DT
>>> scope. They are rather optimizations of pure kernel mechanism.
>>>
>>> Where would be a good place for it? Maybe a new IPA Documentation/
>>> sub-directory?
>>
>> You can improve the documentation in:
>>
>> Documentation/driver-api/thermal/power_allocator.rst
>>
>> And if we agree on a tools/thermal/ipa, the documentation with examples
>> and some SoC reference can be put there also
>>
>>>>> If you are interested in those analyses we can find a way to share a
>>>>> .html file with the results from LISA notebook.
>>>>
>>>> Yes,
>>>
>>> Sure thing, let me arrange that.
>>
>>
>
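For reference, the classic Ziegler-Nichols tuning rule named in the patch
subject can be sketched as below. This is only an illustration in Python,
not the code from the patch: the function name and the inputs `ku`
(ultimate gain at which the loop oscillates steadily) and `tu` (period of
that oscillation, in seconds) are assumptions, and the thread does not show
how the patch maps the result onto IPA's k_* values.

```python
def ziegler_nichols_pid(ku, tu):
    """Return (kp, ki, kd) from the classic Ziegler-Nichols PID rule.

    ku: ultimate gain (hypothetical input, found by experiment)
    tu: oscillation period at that gain, in seconds (hypothetical input)
    """
    if ku <= 0 or tu <= 0:
        raise ValueError("ku and tu must be positive")
    kp = 0.6 * ku            # proportional gain
    ki = 1.2 * ku / tu       # integral gain, kp / (tu / 2)
    kd = 0.075 * ku * tu     # derivative gain, kp * (tu / 8)
    return kp, ki, kd


if __name__ == "__main__":
    # Made-up example values, not measurements from any board.
    print(ziegler_nichols_pid(10.0, 2.0))
```

The classic rule is known to overshoot noticeably; variants with smaller
gains exist for cases where overshoot must be limited, which seems closer
to the goal stated in the patch description.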
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog