netdev - Re: [PATCH net-next v2 1/2] net: extend ndo_get

Message-ID: <CAPREpbZpjwpUkY11Qz_xbVbCm44DZw=3Z2GYu=K93z+L-_4EhA@mail.gmail.com>
Date: Tue, 27 Jan 2026 18:13:41 -0500
From: Kevin Yang <yyd@...gle.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: Gerhard Engleder <gerhard@...leder-embedded.com>, Jakub Kicinski <kuba@...nel.org>, 
	netdev@...r.kernel.org, Willem de Bruijn <willemb@...gle.com>, 
	Harshitha Ramamurthy <hramamurthy@...gle.com>, Andrew Lunn <andrew+netdev@...n.ch>, 
	David Miller <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, 
	Paolo Abeni <pabeni@...hat.com>, Joshua Washington <joshwash@...gle.com>, 
	Richard Cochran <richardcochran@...il.com>
Subject: Re: [PATCH net-next v2 1/2] net: extend ndo_get_tstamp for other
 timestamp types

Just to clarify, the PTP vclocks approach does not contradict this patch.

I think converting the timestamp should be a net_device ndo function, since
-  a TCP socket may receive packets from multiple net_devices
-  only the driver is aware of the device clock's details and whether
   a conversion is actually required

That conversion is implemented per device:
-  Some devices can identify the correct vclock and call
   ptp_convert_timestamp(&hwtstamp, vclock). This is a valid approach,
   but it requires the system admin to run phc2sys to sync the vclock to
   REALTIME (or MONOTONIC), alongside the necessary lookup infrastructure.
-  Some devices may already sync their clock to REALTIME natively.
   In this case, hwtstamp is returned as-is without conversion.
-  Some devices may handle conversion internally without using PTP.
   This is the case for our current GVE patch.
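
To make the per-device split above concrete, here is a minimal userspace
sketch. It is not the patch and not kernel API: struct sketch_dev, the
to_realtime callback, conv_identity, conv_offset and rx_delay_ns are all
hypothetical names, and the "internal conversion" case is modelled as a
plain fixed offset purely for illustration. A vclock-based device would
just plug in a third callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
typedef int64_t ns_t;

struct sketch_dev;
typedef ns_t (*to_realtime_fn)(const struct sketch_dev *dev, ns_t hwtstamp);

struct sketch_dev {
	to_realtime_fn to_realtime; /* stands in for the per-device hook */
	ns_t offset_ns;             /* assumed device-to-REALTIME offset */
};

/* Device clock already follows CLOCK_REALTIME: return the value as-is. */
ns_t conv_identity(const struct sketch_dev *dev, ns_t hwtstamp)
{
	(void)dev;
	return hwtstamp;
}

/* Device converts internally; modelled here as adding a fixed offset. */
ns_t conv_offset(const struct sketch_dev *dev, ns_t hwtstamp)
{
	return hwtstamp + dev->offset_ns;
}

/*
 * Stack-side helper: delay between hardware arrival and a later software
 * event. The caller never needs to know which strategy the device uses.
 */
ns_t rx_delay_ns(const struct sketch_dev *dev, ns_t hwtstamp, ns_t now_realtime)
{
	assert(dev != NULL && dev->to_realtime != NULL);
	return now_realtime - dev->to_realtime(dev, hwtstamp);
}
```

The point of the indirection is that the caller (the TCP stack in this
discussion) only ever does one subtraction in a single clock domain.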

With that, I think this patch (extending ndo_get_tstamp) has value
regardless of the specific driver implementation.

As for the second patch and the question of why GVE does not use vclocks:
The current GVE patch is simple and self-contained. Switching to vclocks
would likely require adding PTP infrastructure to look up a vclock without
user interaction, which complicates development compared to the current
approach. Also, ptp_convert_timestamp takes a mutex, which might raise
performance concerns since it would be called on the TCP RX fast path.
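
For reference, the lookup-plus-cache idea discussed in this thread could
look roughly like the sketch below. This is only an illustration under
my own assumptions: no such infrastructure exists today, and every name
here (sketch_sock_cache, sketch_map_entry, sketch_lookup_vclock,
sketch_sock_vclock) is hypothetical. The ifindex -> vclock table is
assumed to be populated by some mechanism that is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not existing kernel code. */
struct sketch_map_entry {
	int ifindex;
	int vclock_index;
};

struct sketch_sock_cache {
	int ifindex;                /* iface the entry was learned from, 0 = empty */
	int vclock_index;           /* cached result, -1 = no vclock known */
	unsigned long slow_lookups; /* counts slow-path lookups, for illustration */
};

/* Slow path: linear search of an ifindex -> vclock table; -1 if unknown. */
int sketch_lookup_vclock(const struct sketch_map_entry *map, size_t n, int ifindex)
{
	for (size_t i = 0; i < n; i++)
		if (map[i].ifindex == ifindex)
			return map[i].vclock_index;
	return -1;
}

/* Fast path: reuse the cached vclock while packets stay on the same iface. */
int sketch_sock_vclock(struct sketch_sock_cache *c,
		       const struct sketch_map_entry *map, size_t n, int ifindex)
{
	assert(c != NULL);
	if (c->ifindex == ifindex)
		return c->vclock_index;

	/* Path change (or first packet): relearn and refresh the cache. */
	c->slow_lookups++;
	c->vclock_index = sketch_lookup_vclock(map, n, ifindex);
	c->ifindex = ifindex;
	return c->vclock_index;
}
```

Even with the cache, the conversion itself still runs per packet, so the
locking concern above is not addressed by this sketch.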

On Sun, Jan 25, 2026 at 4:41 PM Willem de Bruijn
<willemdebruijn.kernel@...il.com> wrote:
>
> Gerhard Engleder wrote:
> > On 22.01.26 23:28, Willem de Bruijn wrote:
> > > Gerhard Engleder wrote:
> > >> On 21.01.26 17:04, Kevin Yang wrote:
> > >>> Network device hardware timestamps (hwtstamps) and the system's
> > >>> clock (ktime) often originate from different clock domains.
> > >>> This makes it hard to directly calculate the duration between
> > >>> a hardware-timestamped event and a system-time event by simple
> > >>> subtraction.
> > >>>
> > > >>> This patch extends ndo_get_tstamp to allow a netdev to provide
> > > >>> a hwtstamp converted into the system's CLOCK_REALTIME domain. This
> > > >>> allows a driver to either perform a conversion by estimating or, if
> > > >>> the clocks are kept synchronized, return the original timestamp
> > > >>> directly. Other clock domains, e.g. CLOCK_MONOTONIC_RAW, can also be
> > > >>> added when a use surfaces.
> > >>>
> > >>> This is useful for features that need to measure the delay between
> > >>> a packet's hardware arrival/departure and a later software event.
> > >>> For example, the TCP stack can use this to measure precise
> > >>> packet receive delays, which is a requirement for the upcoming
> > >>> TCP Swift [1] congestion control algorithm.
> > >>>
> > >>> [1] Kumar, Gautam, et al. "Swift: Delay is simple and effective
> > >>> for congestion control in the datacenter." Proceedings of the
> > >>> Annual conference of the ACM Special Interest Group on Data
> > >>> Communication on the applications, technologies, architectures,
> > >>> and protocols for computer communication. 2020.
> > >>>
> > >>> Signed-off-by: Kevin Yang <yyd@...gle.com>
> > >>> Reviewed-by: Willem de Bruijn <willemb@...gle.com>
> > >>
> > > >> As for Jakub in his reply
> > > >> https://lore.kernel.org/netdev/20260119115710.6fdde8c0@kernel.org/
> > > >> the question of why this is a driver implementation also came to my
> > > >> mind.
> > >>
> > >> With vclocks it is already possible to get timestamps for arbitrary
> > >> clock domains in parallel. So it is already possible to synchronize
> > >> the hwtstamp to CLOCK_REALTIME, CLOCK_MONOTONIC, ... in parallel.
> > >> Therefore, user space synchronisation is needed, but e.g. ptp4l does
> > >> a much better synchronisation job than your solution.
> > >>
> > > >> Maybe CLOCK_REALTIME is not supported by ptp4l, because this clock
> > > >> jumps due to daylight saving. IMO these jumps will also be a problem
> > > >> for your solution, as they will lead to wrong delays twice a year.
> > >> So usually CLOCK_TAI or CLOCK_MONOTONIC would be a better choice.
> > >>
> > >> To sum up: IMO you suggest a driver specific in-kernel solution where
> > >> already a driver independent user space solution with higher accuracy
> > >> exists.
> > >
> > > Definitely a promising alternative.
> > >
> > > > With multiple netdevices, a TCP listener socket may receive packets
> > > > from all devices. This would need new infrastructure to look up the
> > > > correct vclock for a given net_device; it cannot hardcode a choice
> > > > with SOF_TIMESTAMPING_BIND_PHC.
> > >
> > > And this needs to happen for every packet, so with minimal overhead.
> > >
> > > Though for established connections the expectation will be that
> > > packets generally arrive on the same netdevice. Bar infrequent path
> > > changes such as from sk_rethink_txhash on the peer. So there this
> > > value can perhaps be cached.
> > >
> > > It would still have to be learned by the kernel, no explicit
> > > setsockopt.
> >
> > Maybe it would also be an option, that the kernel learns with which
> > clock domain the timestamps of the PHC and vclocks correlate. Then
> > the TCP stack could calculate the delay if it finds a valid e.g.
> > CLOCK_MONOTONIC timestamp in the packet. This would make the
> > TCP listener socket independent from the devices. Just an idea, without
> > thinking about implementation details.
>
> I think we're on the same page.
>
> - use the existing vclocks
> - look up the right vclock based on the original incoming iface
> - cache this known clock with an established socket
>
> But I also have not looked at how/whether the lookup infra can be
> implemented to find a vclock automatically, i.e., without userspace
> admin.
>
> In some cases shinfo hwtstamp raw format may actually be the
> CLOCK_REALTIME that TCP requires. But if the raw clock is not
> realtime, we'll have to adjust based on timecounter/cyclecounter.
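
The timecounter/cyclecounter adjustment mentioned at the end of the quote
is essentially a linear mapping from raw counter cycles to nanoseconds
around a known sync point. Below is a simplified userspace sketch of that
arithmetic. It is not the kernel's struct timecounter or
timecounter_cyc2time(): the struct and function names are hypothetical,
and counter wraparound masking and fractional-nanosecond carry are left
out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of a cycle-counter to ns conversion. */
struct sketch_timecounter {
	uint64_t cycle_last; /* raw counter value at the last sync point */
	uint64_t nsec;       /* target-clock time in ns at that sync point */
	uint32_t mult;       /* cycles-to-ns multiplier */
	uint32_t shift;      /* right shift applied after multiplying */
};

/*
 * ns = nsec + ((cycles - cycle_last) * mult) >> shift, mirrored for raw
 * timestamps taken before the sync point. Assumes the delta is small
 * enough that delta * mult does not overflow 64 bits.
 */
uint64_t sketch_cyc2time(const struct sketch_timecounter *tc, uint64_t cycles)
{
	assert(tc->shift < 64);
	if (cycles >= tc->cycle_last)
		return tc->nsec +
		       (((cycles - tc->cycle_last) * tc->mult) >> tc->shift);
	return tc->nsec -
	       (((tc->cycle_last - cycles) * tc->mult) >> tc->shift);
}
```

As a worked case, with mult = 4096 and shift = 10 (4 ns per cycle), a raw
value 250 cycles past the sync point maps to 1000 ns after it.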