linux-kernel - Re: [PATCH RFC net-next v1 1/6] ethtool: add interface to read Tx hardware timestamping statistics

Message-ID: <875xyex10q.fsf@nvidia.com>
Date: Fri, 23 Feb 2024 14:21:12 -0800
From: Rahul Rameshbabu <rrameshbabu@...dia.com>
To: Jacob Keller <jacob.e.keller@...el.com>
Cc: Saeed Mahameed <saeed@...nel.org>, Leon Romanovsky <leon@...nel.org>,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet
 <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni
 <pabeni@...hat.com>, Jonathan Corbet <corbet@....net>, Richard Cochran
 <richardcochran@...il.com>, Tariq Toukan <tariqt@...dia.com>, Gal Pressman
 <gal@...dia.com>, Vadim Fedorenko <vadim.fedorenko@...ux.dev>, Andrew Lunn
 <andrew@...n.ch>, Heiner Kallweit <hkallweit1@...il.com>, Przemek Kitszel
 <przemyslaw.kitszel@...el.com>, Ahmed  Zaki <ahmed.zaki@...el.com>,
 Alexander Lobakin <aleksander.lobakin@...el.com>, Hangbin Liu
 <liuhangbin@...il.com>, Paul  Greenwalt <paul.greenwalt@...el.com>, Justin
 Stitt <justinstitt@...gle.com>, Randy Dunlap <rdunlap@...radead.org>,
 Maxime Chevallier <maxime.chevallier@...tlin.com>, Kory Maincent
 <kory.maincent@...tlin.com>, Wojciech Drewek <wojciech.drewek@...el.com>,
 Vladimir Oltean <vladimir.oltean@....com>, Jiri Pirko <jiri@...nulli.us>,
 Alexandre Torgue <alexandre.torgue@...s.st.com>, Jose Abreu
 <joabreu@...opsys.com>, Dragos  Tatulea <dtatulea@...dia.com>,
 netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-doc@...r.kernel.org
Subject: Re: [PATCH RFC net-next v1 1/6] ethtool: add interface to read Tx
 hardware timestamping statistics

On Fri, 23 Feb, 2024 13:07:08 -0800 Jacob Keller <jacob.e.keller@...el.com> wrote:
> On 2/23/2024 11:24 AM, Rahul Rameshbabu wrote:
>> +/**
>> + * struct ethtool_ts_stats - HW timestamping statistics
>> + * @layer: input field denoting whether stats should be queried from the DMA or
>> + *        PHY timestamping layer. Defaults to the active layer for packet
>> + *        timestamping.
>> + * @tx_stats: struct group for TX HW timestamping
>> + *	@pkts: Number of packets successfully timestamped by the queried
>> + *	      layer.
>> + *	@lost: Number of packet timestamps that failed to get applied on a
>> + *	      packet by the queried layer.
>> + *	@late: Number of packet timestamps that were delivered by the
>> + *	      hardware but were lost due to arriving too late.
>> + *	@err: Number of timestamping errors that occurred on the queried
>> + *	     layer.
>> + */
>> +struct ethtool_ts_stats {
>> +	enum ethtool_ts_stats_layer layer;
>> +	struct_group(tx_stats,
>> +		u64 pkts;
>> +		u64 lost;
>> +		u64 late;
>> +		u64 err;
>> +	);
>> +};
>
> The Intel ice driver has the following Tx timestamp statistics:
>
> tx_hwtstamp_skipped - indicates when we get a Tx timestamp request but
> are unable to fulfill it.
> tx_hwtstamp_timeouts - indicates we had a Tx timestamp skb waiting for a
> timestamp from hardware but it didn't get received within some internal
> time limit.

This is interesting. In mlx5 land, the only case where we are unable to
fulfill a hwtstamp is when the timestamp information is lost or late.

For us, lost means that the timestamp did not arrive within some
internal time limit, after which our device supposedly will never be
able to deliver the timestamp information.

For us, late means that hardware timestamp information was delivered
after that internal time limit. We are able to track this by using
identifiers in our completion events, and we only release references to
these identifiers upon delivery (never delivering leaks the references;
enough buildup leads to a recovery flow). In theory, timestamp
information should not arrive after that period of time. In practice it
does happen, and we want our driver implementation to be resilient to
this case rather than trusting the time interval.
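
As a rough illustration of the lost/late distinction described above,
here is a minimal userspace sketch. It is not the mlx5 implementation:
the names (ts_tracker, ts_send, ts_expire, ts_complete), the slot count,
and the tick-based time limit are all hypothetical. It also assumes that
a timestamp which arrives after the limit stays counted as lost and is
additionally counted as late.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TS_SLOTS   8	/* hypothetical number of in-flight identifiers */
#define TS_TIMEOUT 100	/* hypothetical internal time limit, in ticks */

struct ts_slot {
	bool in_use;		/* reference held until hardware delivers */
	bool timed_out;		/* already counted as lost */
	uint64_t sent_at;	/* tick at which the packet was sent */
};

struct ts_tracker {
	struct ts_slot slot[TS_SLOTS];
	uint64_t pkts, lost, late;
};

/* Take a reference on an identifier when a timestamped packet is sent. */
static bool ts_send(struct ts_tracker *t, unsigned int id, uint64_t now)
{
	assert(id < TS_SLOTS);
	if (t->slot[id].in_use)
		return false;	/* still referenced, cannot be reused yet */
	t->slot[id] = (struct ts_slot){ .in_use = true, .sent_at = now };
	return true;
}

/* Periodic check: count overdue timestamps as lost, keep the reference. */
static void ts_expire(struct ts_tracker *t, uint64_t now)
{
	for (unsigned int id = 0; id < TS_SLOTS; id++) {
		struct ts_slot *s = &t->slot[id];

		if (s->in_use && !s->timed_out &&
		    now - s->sent_at > TS_TIMEOUT) {
			s->timed_out = true;
			t->lost++;
		}
	}
}

/* Completion event carrying the identifier: release the reference. */
static void ts_complete(struct ts_tracker *t, unsigned int id)
{
	assert(id < TS_SLOTS);
	struct ts_slot *s = &t->slot[id];

	if (!s->in_use)
		return;		/* spurious completion, ignored here */
	if (s->timed_out)
		t->late++;	/* delivered, but after the time limit */
	else
		t->pkts++;	/* delivered in time */
	s->in_use = false;
}
```

Because the reference is only dropped in ts_complete(), an identifier
whose timestamp never arrives stays unusable. In this sketch that is the
"leak" which, with enough buildup, would have to trigger a recovery
flow.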

Do you have an example of a case where timestamp information is skipped
for a reason other than lack of delivery over time? I am wondering
whether this case is more like a hardware error, or more along the lines
of the device being busy, where line rate would be impacted if timestamp
information must be recorded.

> tx_hwtstamp_flushed - indicates that we flushed an outstanding timestamp
> before it completed, such as if the link resets or similar.
> tx_hwtstamp_discarded - indicates that we obtained a timestamp from
> hardware but were unable to complete it due to invalid cached data used
> for timestamp extension.
>
> I think these could be translated roughly to one of the lost, late, or
> err stats. I am a bit confused as to how drivers could distinguish
> between lost and late, but I guess that depends on the specific hardware
> design.
>
> In theory we could keep some of these more detailed stats but I don't
> think we strictly need to be as detailed as the ice driver is.

We also converged a statistic in the mlx5 driver to the simple error
counter here. I think what makes sense is that design-specific counters
should be exposed as driver-specific counters, and more common counters
should be converged into the ethtool_ts_stats struct.

>
> The only major addition I think is the skipped stat, which I would
> prefer to have. Perhaps that could be tracked in the netdev layer by
> checking the skb flags to see whether or not the driver actually set
> the appropriate flag?

I guess the problem is how the core stack would know at which layer this
was skipped (I think Kory's patch series can be used to help with this
since it's adding a common interface in ethtool to select the
timestamping layer). As of today, mlx5 is the only driver I know of that
supports selecting between the DMA and PHY layers for timestamp
information.

>
> I think I can otherwise translate the flushed status to the lost
> category, the timeout to the late category, and everything else to the
> error category. I can easily add a counter to track completed timestamps
> as well.

Sounds good.
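
For reference, the translation proposed above could look roughly like
the following sketch. The struct and function names are hypothetical,
the "completed" counter is the one Jacob mentions he could add, and the
skipped counter is left out of the common stats on the assumption that
it stays driver specific for now.

```c
#include <stdint.h>

/* Hypothetical driver-private counters, named after the ice statistics. */
struct ice_like_tx_hwtstamp {
	uint64_t completed;	/* timestamps delivered successfully */
	uint64_t skipped;	/* requests the driver could not take on */
	uint64_t timeouts;	/* no timestamp within the time limit */
	uint64_t flushed;	/* dropped on link reset or similar */
	uint64_t discarded;	/* invalid cached data for extension */
};

/* Mirrors the tx_stats group of the proposed struct ethtool_ts_stats. */
struct ts_tx_stats {
	uint64_t pkts;
	uint64_t lost;
	uint64_t late;
	uint64_t err;
};

/* Fold the driver-private counters into the common categories. */
static struct ts_tx_stats
ts_stats_from_ice_like(const struct ice_like_tx_hwtstamp *s)
{
	return (struct ts_tx_stats){
		.pkts = s->completed,
		.lost = s->flushed,
		.late = s->timeouts,
		.err  = s->discarded,
		/* s->skipped is intentionally not mapped in this sketch */
	};
}
```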

--
Thanks,

Rahul Rameshbabu