netdev - Re: [PATCH net 0/2] aqc111: Thermal throttling feature

Date:   Wed, 12 Dec 2018 12:28:39 -0800
From:   Florian Fainelli <f.fainelli@...il.com>
To:     Igor Russkikh <Igor.Russkikh@...antia.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Cc:     Dmitry Bezrukov <Dmitry.Bezrukov@...antia.com>,
        "andrew@...n.ch" <andrew@...n.ch>
Subject: Re: [PATCH net 0/2] aqc111: Thermal throttling feature

On 12/12/18 12:08 PM, Igor Russkikh wrote:
> 
>>
>> The idea of having the PHY/network device as a cooling agent is
>> something valuable, but as Andrew pointed out, you need to expose this
>> as a standard HWMON device, and you need to let user-space implement the
>> appropriate thermal policy, not do that in the network driver underneath
>> the user's feet with no feedback other than the link dropping and
>> being re-negotiated at a different speed. How would one be able to
>> differentiate those events from a faulty link partner for instance?
> 
>>
>> None of what you are doing here is specific to your device driver and
>> the policy of downgrading the link speed to lower the thermal budget is
>> something that is nearly universally applicable to all network
>> equipment because higher speeds just require higher power.
>>
> 
> Hi Florian,
> Partially agreed with you, but as far as I know there isn't much
> ready-to-use infrastructure for this right now?

If you use programs like thermald, I am quite positive you could script
an action which involves re-negotiating the link at a lower speed, and
that would be applicable to a variety of network devices.
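For illustration, a minimal sketch of such a user-space policy, written as
a standalone Python loop rather than a thermald configuration. It assumes
(none of this is in the thread) that the device exposes a standard hwmon
tempN_input file in millidegrees Celsius and that ethtool is used to
re-negotiate. The sysfs path, thresholds, speeds and function names are
hypothetical; the speeds were picked to resemble a 5GbE part.

```python
"""Hypothetical user-space thermal policy: cap link speed when hot.

Assumptions: the NIC temperature is available through a standard hwmon
tempN_input file (millidegrees Celsius), and the link is re-negotiated by
running ethtool. Path, thresholds and speeds are illustrative only.
"""
import subprocess
import time
from pathlib import Path

# Hypothetical path; the real hwmon index depends on the system.
HWMON_TEMP = Path("/sys/class/hwmon/hwmon0/temp1_input")

# (threshold in millidegrees C, max speed in Mb/s), hottest first.
STEPS = [(105_000, 100), (95_000, 1000), (85_000, 2500)]
FULL_SPEED = 5000
HYSTERESIS = 5_000  # must be 5 C below a threshold before stepping back up


def read_temp_millicelsius(path: Path = HWMON_TEMP) -> int:
    """Read a hwmon tempN_input file (integer millidegrees Celsius)."""
    return int(path.read_text().strip())


def speed_cap(temp_mc: int) -> int:
    """Highest link speed allowed at temp_mc, ignoring hysteresis."""
    for threshold, speed in STEPS:
        if temp_mc >= threshold:
            return speed
    return FULL_SPEED


def choose_speed(temp_mc: int, current_speed: int) -> int:
    """Step down immediately when hot; step up only after cooling further."""
    cap = speed_cap(temp_mc)
    if cap <= current_speed:
        return cap
    # Pretend we are HYSTERESIS hotter, so the link does not flap
    # back and forth around a single threshold.
    return max(current_speed, speed_cap(temp_mc + HYSTERESIS))


def ethtool_command(iface: str, speed_mbps: int) -> list[str]:
    """Build an ethtool call that re-negotiates at speed_mbps.

    Assumes an ethtool version that, with autoneg on, advertises only
    the link modes matching the requested speed and duplex.
    """
    return ["ethtool", "-s", iface, "speed", str(speed_mbps),
            "duplex", "full", "autoneg", "on"]


def main(iface: str = "eth0", interval_s: float = 5.0) -> None:
    """Poll the temperature and re-negotiate when the chosen speed changes."""
    current = FULL_SPEED  # assumption: the link starts at full speed
    while True:
        new = choose_speed(read_temp_millicelsius(), current)
        if new != current:
            subprocess.run(ethtool_command(iface, new), check=True)
            current = new
        time.sleep(interval_s)
```

Calling main("eth0") as root would run the loop. The point of keeping the
policy in user space is that the thresholds and hysteresis become
configuration, while the kernel only provides the mechanisms (a hwmon
temperature and the ability to re-negotiate the link).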

> 
> IMHO that could be a two-pronged solution, where a short-term driver
> patch protects against hardware burn-out right now, and a long-term
> hwmon-based infrastructure could handle it at the user-space level.

The short-term and most effective solution would be to have the firmware
running on the device do the thermal throttling, so that if the host
CPU has crashed or is unresponsive, you can still take corrective actions.
Your response to Andrew seems to suggest this is not possible, so if we
are reaching the critical junction temperature of your chip and that in
turn causes the enclosure to melt down, then clearly the runaway
solution is not good.

> 
> A whole separate concern is how much user space should be involved here.
> The logic for controlling a device's temperature could be very
> device-specific (and therefore driver-specific).

My problem with your approach is people doing the same thing in each and
every one of their drivers and building policy, as opposed to mechanisms,
in the kernel. If the argument is "user space may not be running a
thermal solution", then clearly you need a hardware-driven (or
firmware-driven) approach which works across all possible use cases,
including those where the appropriate SW is not there.

If you look at how your desktop PC likely manages the fans in the
chassis, they can be SW-controlled or ACPI-controlled, for the same
reasons.
-- 
Florian