Message-ID: <b921eefe-3220-4b38-8b41-be6ddd98f913@gmail.com>
Date: Tue, 15 Jul 2025 16:59:43 +0300
From: Tariq Toukan <ttoukan.linux@...il.com>
To: Jakub Kicinski <kuba@...nel.org>, Dragos Tatulea <dtatulea@...dia.com>
Cc: Tariq Toukan <tariqt@...dia.com>, Eric Dumazet <edumazet@...gle.com>,
Paolo Abeni <pabeni@...hat.com>, Andrew Lunn <andrew+netdev@...n.ch>,
"David S. Miller" <davem@...emloft.net>, Saeed Mahameed <saeed@...nel.org>,
Gal Pressman <gal@...dia.com>, Leon Romanovsky <leon@...nel.org>,
Saeed Mahameed <saeedm@...dia.com>, Mark Bloch <mbloch@...dia.com>,
Jonathan Corbet <corbet@....net>, netdev@...r.kernel.org,
linux-rdma@...r.kernel.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next V2 2/3] net/mlx5e: Add device PCIe congestion
ethtool stats
On 14/07/2025 18:26, Jakub Kicinski wrote:
> On Sat, 12 Jul 2025 07:55:27 +0000 Dragos Tatulea wrote:
>>> The metrics make sense, but utilization has to be averaged over some
>>> period of time to be meaningful. Can you shed any light on what the
>>> measurement period or algorithm is?
>>
>> The measurement period in FW is 200 ms.
>
> SG, please include in the doc.
>
>>>> + changes = cong_event->state ^ new_cong_state;
>>>> + if (!changes)
>>>> + return;
>>>
>>> no risk of the high / low events coming so quickly we'll miss both?
>> Yes, it is possible, and that is fine because short bursts are not
>> counted. The counters are for sustained high PCI BW usage.
>>
>>> Should there be a counter for "mis-firing" of that sort?
>>> You'd be surprised how long the scheduling latency for a kernel worker
>>> can be on a busy server :(
>>>
>> The event is just a notification to read the state from FW. If the
>> read is issued later and the state has not changed in the meantime,
>> it will not be counted as a transition.
>
> 200ms is within the range of normal scheduler latency on a busy server.
> It's not a deal breaker, but I'd personally add a counter for wakeups
> which did not result in any state change. Likely recent experience
> with constant EEVDF regressions and sched_ext is coloring my judgment.
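>
> The change-detection and spurious-wakeup counter being discussed can be
> sketched roughly as below. This is an illustrative userspace sketch, not
> the driver's actual code: the struct layout, field names, and the
> cong_event_update() helper are all hypothetical; only the XOR test
> mirrors the snippet quoted above.
>
> ```c
> #include <stdint.h>
> #include <stdio.h>
>
> /* Hypothetical event context (names are illustrative, not mlx5e's). */
> struct cong_event {
> 	uint32_t state;         /* cached congestion state bitmask */
> 	unsigned long spurious; /* wakeups that saw no state change */
> };
>
> /*
>  * Compare the cached state against the state freshly read from FW.
>  * Returns the bitmask of changed bits; 0 means the event worker woke
>  * up but the state had already reverted, which is what the proposed
>  * counter would track.
>  */
> static uint32_t cong_event_update(struct cong_event *ev, uint32_t new_state)
> {
> 	uint32_t changes = ev->state ^ new_state;
>
> 	if (!changes) {
> 		ev->spurious++; /* event mis-fired: no visible transition */
> 		return 0;
> 	}
> 	ev->state = new_state;
> 	return changes;
> }
>
> int main(void)
> {
> 	struct cong_event ev = { .state = 0, .spurious = 0 };
>
> 	cong_event_update(&ev, 0x1); /* low -> high: real transition */
> 	cong_event_update(&ev, 0x1); /* late read, same state: spurious */
> 	cong_event_update(&ev, 0x0); /* high -> low: real transition */
>
> 	printf("state=%u spurious=%lu\n", ev.state, ev.spurious);
> 	return 0;
> }
> ```
>
> The point of the extra counter is visibility: with 200 ms FW windows and
> comparable worker scheduling latency, a high/low pair can cancel out
> before the worker reads the state, and the counter makes that rate
> observable.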
>
NP with that.
We'll add it as a followup patch, after it's implemented and properly
tested.
The same applies to the requested devlink config (replacing the sysfs).
For now, I'll respin without the configuration part and the extra counter.