Message-ID: <20250711162504.2c0b365d@kernel.org>
Date: Fri, 11 Jul 2025 16:25:04 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Tariq Toukan <tariqt@...dia.com>
Cc: Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
 Andrew Lunn <andrew+netdev@...n.ch>, "David S. Miller"
 <davem@...emloft.net>, Saeed Mahameed <saeed@...nel.org>, Gal Pressman
 <gal@...dia.com>, "Leon Romanovsky" <leon@...nel.org>, Saeed Mahameed
 <saeedm@...dia.com>, Mark Bloch <mbloch@...dia.com>, Jonathan Corbet
 <corbet@....net>, <netdev@...r.kernel.org>, <linux-rdma@...r.kernel.org>,
 <linux-doc@...r.kernel.org>, <linux-kernel@...r.kernel.org>, Dragos Tatulea
 <dtatulea@...dia.com>
Subject: Re: [PATCH net-next V2 2/3] net/mlx5e: Add device PCIe congestion
 ethtool stats

On Thu, 10 Jul 2025 09:51:31 +0300 Tariq Toukan wrote:
> +   * - `pci_bw_inbound_high`
> +     - The number of times the device crossed the high inbound PCIe bandwidth
> +       threshold. To be compared with pci_bw_inbound_low to check whether the
> +       device is in a congested state.
> +       If pci_bw_inbound_high == pci_bw_inbound_low then the device is not congested.
> +       If pci_bw_inbound_high > pci_bw_inbound_low then the device is congested.
> +     - Informative

The metrics make sense, but utilization has to be averaged over some
period of time to be meaningful. Can you shed any light on what the
measurement period or algorithm is?
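
For context, a common way to get a meaningful utilization figure is an
exponentially weighted moving average over fixed sampling intervals. The
sketch below is purely illustrative; none of the names come from the patch
or the device, and the actual window and algorithm are exactly what is
being asked about:

	/*
	 * Illustrative only: smooth per-interval byte counts with an EWMA
	 * before comparing against a bandwidth threshold. Every identifier
	 * here is hypothetical.
	 */
	#include <stdint.h>

	#define EWMA_WEIGHT_PCT	25	/* weight given to the newest sample */

	struct bw_ewma {
		uint64_t avg_bytes;	/* smoothed bytes per interval */
	};

	static void bw_ewma_add_sample(struct bw_ewma *e, uint64_t bytes)
	{
		/* avg = ((100 - w) * avg + w * sample) / 100 */
		e->avg_bytes = (e->avg_bytes * (100 - EWMA_WEIGHT_PCT) +
				bytes * EWMA_WEIGHT_PCT) / 100;
	}

	static int bw_above_threshold(const struct bw_ewma *e, uint64_t thresh)
	{
		return e->avg_bytes > thresh;
	}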

> +	changes = cong_event->state ^ new_cong_state;
> +	if (!changes)
> +		return;

No risk of the high / low events coming so quickly that we'll miss both?
Should there be a counter for "mis-firing" of that sort?
You'd be surprised how long the scheduling latency for a kernel worker
can be on a busy server :(
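
As a strawman for such a counter, assuming the firmware exposed a
cumulative edge count alongside the state bitmask (it does not in this
patch; every identifier below is hypothetical), the driver could account
for collapsed high/low pairs like this:

	#include <stdint.h>

	struct cong_stats {
		uint64_t missed_transitions;	/* edges that collapsed */
	};

	static void count_missed_edges(struct cong_stats *stats,
				       uint64_t fw_edge_cnt,
				       uint64_t *last_fw_edge_cnt,
				       int state_changed)
	{
		uint64_t fw_edges = fw_edge_cnt - *last_fw_edge_cnt;
		uint64_t seen = state_changed ? 1 : 0;

		*last_fw_edge_cnt = fw_edge_cnt;

		/*
		 * If the firmware saw more edges than the single transition
		 * the driver can infer by XOR-ing old and new state, pairs
		 * of high/low events collapsed before the worker ran.
		 */
		if (fw_edges > seen)
			stats->missed_transitions += fw_edges - seen;
	}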

> +	cong_event->state = new_cong_state;
> +
> +	if (changes & MLX5E_INBOUND_CONG) {
> +		if (new_cong_state & MLX5E_INBOUND_CONG)
> +			cong_event->stats.pci_bw_inbound_high++;
> +		else
> +			cong_event->stats.pci_bw_inbound_low++;
> +	}
> +
> +	if (changes & MLX5E_OUTBOUND_CONG) {
> +		if (new_cong_state & MLX5E_OUTBOUND_CONG)
> +			cong_event->stats.pci_bw_outbound_high++;
> +		else
> +			cong_event->stats.pci_bw_outbound_low++;
> +	}
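
For completeness, the interpretation documented in the table above boils
down to comparing the two edge counters. A minimal sketch (the helper name
is mine, not from the patch):

	#include <stdbool.h>
	#include <stdint.h>

	static bool pci_bw_congested(uint64_t bw_high, uint64_t bw_low)
	{
		/* high == low: not congested; high > low: congested */
		return bw_high > bw_low;
	}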
