Message-ID: <b921eefe-3220-4b38-8b41-be6ddd98f913@gmail.com>
Date: Tue, 15 Jul 2025 16:59:43 +0300
From: Tariq Toukan <ttoukan.linux@...il.com>
To: Jakub Kicinski <kuba@...nel.org>, Dragos Tatulea <dtatulea@...dia.com>
Cc: Tariq Toukan <tariqt@...dia.com>, Eric Dumazet <edumazet@...gle.com>,
Paolo Abeni <pabeni@...hat.com>, Andrew Lunn <andrew+netdev@...n.ch>,
"David S. Miller" <davem@...emloft.net>, Saeed Mahameed <saeed@...nel.org>,
Gal Pressman <gal@...dia.com>, Leon Romanovsky <leon@...nel.org>,
Saeed Mahameed <saeedm@...dia.com>, Mark Bloch <mbloch@...dia.com>,
Jonathan Corbet <corbet@....net>, netdev@...r.kernel.org,
linux-rdma@...r.kernel.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next V2 2/3] net/mlx5e: Add device PCIe congestion
ethtool stats
On 14/07/2025 18:26, Jakub Kicinski wrote:
> On Sat, 12 Jul 2025 07:55:27 +0000 Dragos Tatulea wrote:
>>> The metrics make sense, but utilization has to be averaged over some
>>> period of time to be meaningful. Can you shed any light on what the
>>> measurement period or algorithm is?
>>
>> The measurement period in FW is 200 ms.
>
> SG, please include in the doc.
>
>>>> + changes = cong_event->state ^ new_cong_state;
>>>> + if (!changes)
>>>> + return;
>>>
>>> no risk of the high / low events coming so quickly we'll miss both?
>> Yes, it is possible, and that is fine because short bursts are not
>> counted. The counters are for sustained high PCI BW usage.
>>
>>> Should there be a counter for "mis-firing" of that sort?
>>> You'd be surprised how long the scheduling latency for a kernel worker
>>> can be on a busy server :(
>>>
>> The event is just a notification to read the state from FW. If the
>> read is issued later and the state has not changed in the meantime,
>> it will not be counted as a transition.
>
> 200ms is within the range of normal scheduler latency on a busy server.
> It's not a deal breaker, but I'd personally add a counter for wakeups
> which did not result in any state change. Likely recent experience
> with constant EEVDF regressions and sched_ext is coloring my judgment.
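>
> The change-detection and spurious-wakeup counter being discussed can be
> sketched roughly as below. This is an illustrative userspace sketch, not
> the driver's actual code: the struct layout, field names, and the
> cong_event_update() helper are all hypothetical; only the XOR test
> mirrors the snippet quoted above.
>
> ```c
> #include <stdint.h>
> #include <stdio.h>
>
> /* Hypothetical event context (names are illustrative, not mlx5e's). */
> struct cong_event {
> 	uint32_t state;         /* cached congestion state bitmask */
> 	unsigned long spurious; /* wakeups that saw no state change */
> };
>
> /*
>  * Compare the cached state against the state freshly read from FW.
>  * Returns the bitmask of changed bits; 0 means the event worker woke
>  * up but the state had already reverted, which is what the proposed
>  * counter would track.
>  */
> static uint32_t cong_event_update(struct cong_event *ev, uint32_t new_state)
> {
> 	uint32_t changes = ev->state ^ new_state;
>
> 	if (!changes) {
> 		ev->spurious++; /* event mis-fired: no visible transition */
> 		return 0;
> 	}
> 	ev->state = new_state;
> 	return changes;
> }
>
> int main(void)
> {
> 	struct cong_event ev = { .state = 0, .spurious = 0 };
>
> 	cong_event_update(&ev, 0x1); /* low -> high: real transition */
> 	cong_event_update(&ev, 0x1); /* late read, same state: spurious */
> 	cong_event_update(&ev, 0x0); /* high -> low: real transition */
>
> 	printf("state=%u spurious=%lu\n", ev.state, ev.spurious);
> 	return 0;
> }
> ```
>
> The point of the extra counter is visibility: with 200 ms FW windows and
> comparable worker scheduling latency, a high/low pair can cancel out
> before the worker reads the state, and the counter makes that rate
> observable.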
>
NP with that.
We'll add it as a followup patch, after it's implemented and properly
tested.
The same applies to the requested devlink config (replacing the sysfs).
For now, I'll respin without the configuration part and the extra counter.