Date:   Fri, 19 Oct 2018 10:02:18 +0200
From:   Ingo Molnar <mingo@...nel.org>
To:     Thara Gopinath <thara.gopinath@...aro.org>
Cc:     linux-kernel@...r.kernel.org, mingo@...hat.com,
        peterz@...radead.org, rui.zhang@...el.com,
        gregkh@...uxfoundation.org, rafael@...nel.org,
        amit.kachhap@...il.com, viresh.kumar@...aro.org,
        javi.merino@...nel.org, edubezval@...il.com,
        daniel.lezcano@...aro.org, linux-pm@...r.kernel.org,
        quentin.perret@....com, ionela.voinescu@....com,
        vincent.guittot@...aro.org
Subject: Re: [RFC PATCH 0/7] Introduce thermal pressure


* Thara Gopinath <thara.gopinath@...aro.org> wrote:

> > Yeah, so I'd definitely suggest not integrating this averaging into
> > pelt.c in the fashion presented, because:
> > 
> >  - This couples your thermal throttling averaging to the PELT decay
> >    half-life AFAICS, which would break the other user every time the
> >    decay is changed/tuned.
>
> Let me pose the question in this manner. Today, RT utilization, DL 
> utilization, etc. are tracked via PELT. The inherent idea is that a 
> CPU has some of its capacity stolen by, say, an RT task, so we 
> subtract the capacity utilized by the RT task from the CPU when 
> calculating the remaining capacity for a CFS task. Now, the idea 
> behind thermal pressure is that the maximum available capacity of a 
> CPU is limited due to a thermal event, so take that out of the 
> remaining capacity of the CPU for a CFS task (at least to start 
> with). If the utilization of RT tasks, DL tasks, etc. is calculated 
> via PELT while the capacity constraint due to a thermal event is 
> calculated by another averaging algorithm, there can be some 
> mismatch in the "capacity stolen" calculations, right? Isn't it 
> better to track all the events that can limit the capacity of a CPU 
> via one algorithm?

So what unifies RT and DL utilization is that these are direct task 
loads, independent of external factors.

Thermal load is more of a complex physical property of the combination 
of various internal and external factors: the whole workload running 
on the system (not just that single task), the thermal topology of the 
hardware, external temperatures, the hardware's and the governor's 
policies regarding thermal loads, and so on.

So while all of these will obviously be subtracted from the maximum 
capacity of the CPU when its effective capacity is calculated, I think 
the thermal load metric and its averaging are probably dissimilar 
enough that they should not be tied to the PELT half-life, for 
example.
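
To illustrate the arithmetic only - the names, the fixed-point scale 
and the numbers below are made up for this sketch, they are not taken 
from the patch set:

  #include <stdio.h>

  #define SCHED_CAPACITY_SCALE 1024	/* fixed-point "100%" */

  /* Hypothetical per-CPU inputs, each produced by some averaging scheme. */
  struct cpu_caps {
  	unsigned long max_cap;	/* nominal capacity of the CPU */
  	unsigned long rt_util;	/* average capacity eaten by RT tasks */
  	unsigned long dl_util;	/* average capacity eaten by DL tasks */
  	unsigned long therm;	/* average capacity lost to throttling */
  };

  /* Capacity left for CFS once all the "stolen" parts are subtracted. */
  static unsigned long cfs_capacity(const struct cpu_caps *c)
  {
  	unsigned long stolen = c->rt_util + c->dl_util + c->therm;

  	return stolen >= c->max_cap ? 0 : c->max_cap - stolen;
  }

  int main(void)
  {
  	struct cpu_caps c = {
  		.max_cap = SCHED_CAPACITY_SCALE,
  		.rt_util = 102,	/* ~10% eaten by RT */
  		.dl_util = 51,	/* ~5% eaten by DL */
  		.therm   = 256,	/* ~25% capped by a thermal event */
  	};

  	printf("capacity left for CFS: %lu/%d\n",
  	       cfs_capacity(&c), SCHED_CAPACITY_SCALE);
  	return 0;
  }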

For example, a reasonable future property would be to match the speed 
of decay in the averaging to the observed speed of decay via the 
temperature sensors. Most temperature sensors do a certain amount of 
averaging themselves as well - and some platforms might not expose 
temperatures at all, only 'got thermally throttled' / 'running at full 
speed' kinds of feedback.
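
As a toy model of that decoupling - everything here is hypothetical: 
the 64ms half-life, the 4ms sampling period, the binary throttle 
signal - the decay factor becomes a free parameter that could be 
fitted to whatever the sensor is observed to do:

  /* build: cc ema.c -lm */
  #include <stdio.h>
  #include <math.h>

  /* Per-sample decay factor for a given half-life (same time unit). */
  static double decay_factor(double half_life_ms, double period_ms)
  {
  	return pow(0.5, period_ms / half_life_ms);
  }

  int main(void)
  {
  	/* Say the sensor's own smoothing looks like a ~64ms half-life. */
  	double d = decay_factor(64.0, 4.0);	/* one sample every 4ms */
  	double pressure = 0.0;			/* running average, 0..1 */

  	/* Binary feedback only: 1 = "got throttled", 0 = "full speed". */
  	int throttled[] = { 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 };
  	unsigned int i;

  	for (i = 0; i < sizeof(throttled) / sizeof(throttled[0]); i++) {
  		pressure = pressure * d + (double)throttled[i] * (1.0 - d);
  		printf("t=%3ums pressure=%.4f\n", (i + 1) * 4, pressure);
  	}
  	return 0;
  }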

Anyway, this doesn't really impact the concept, it's an implementation 
detail, and much of this could be resolved if the averaging code in 
pelt.c were librarized a bit - and that's really what you did there, 
in a fashion; I just think it should probably be abstracted out more 
clearly. (I have no clear implementation suggestions right now, other 
than 'try and see how it works out - it might be a bad idea'.)
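
One hypothetical shape such a library could take - this is not 
pelt.c's actual interface, just a sketch of the idea that each signal 
carries its own decay parameter:

  #include <stdio.h>

  /*
   * Hypothetical "librarized" decaying average: each user instantiates
   * it with its own decay shift, so the half-life is per-signal rather
   * than one global PELT constant.
   */
  struct decay_avg {
  	unsigned long val;	/* current average, 0..1024 */
  	unsigned int shift;	/* decay strength: larger = slower */
  };

  /* Fold one period's sample (0..1024) into the running average. */
  static void decay_avg_update(struct decay_avg *a, unsigned long sample)
  {
  	a->val = a->val - (a->val >> a->shift) + (sample >> a->shift);
  }

  int main(void)
  {
  	struct decay_avg therm = { .val = 0, .shift = 3 };	/* fast */
  	struct decay_avg load  = { .val = 0, .shift = 5 };	/* slow */
  	int i;

  	/* Saturate both with ten periods of a full-scale input. */
  	for (i = 0; i < 10; i++) {
  		decay_avg_update(&therm, 1024);
  		decay_avg_update(&load, 1024);
  	}
  	printf("therm=%lu load=%lu (of 1024)\n", therm.val, load.val);
  	return 0;
  }

With something along these lines, thermal pressure and the PELT-style 
signals could each pick a decay rate that matches their physical 
behavior, instead of sharing one constant.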

Thanks,

	Ingo
