netdev - Re: [PATCH net-next v2 1/2] net: extend ndo_get

Message-ID: <willemdebruijn.kernel.976a69fefaaa@gmail.com>
Date: Sun, 25 Jan 2026 16:41:03 -0500
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Gerhard Engleder <gerhard@...leder-embedded.com>, 
 Willem de Bruijn <willemdebruijn.kernel@...il.com>, 
 Kevin Yang <yyd@...gle.com>, 
 Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org, 
 Willem de Bruijn <willemb@...gle.com>, 
 Harshitha Ramamurthy <hramamurthy@...gle.com>, 
 Andrew Lunn <andrew+netdev@...n.ch>, 
 David Miller <davem@...emloft.net>, 
 Eric Dumazet <edumazet@...gle.com>, 
 Paolo Abeni <pabeni@...hat.com>, 
 Joshua Washington <joshwash@...gle.com>, 
 Richard Cochran <richardcochran@...il.com>
Subject: Re: [PATCH net-next v2 1/2] net: extend ndo_get_tstamp for other
 timestamp types

Gerhard Engleder wrote:
> On 22.01.26 23:28, Willem de Bruijn wrote:
> > Gerhard Engleder wrote:
> >> On 21.01.26 17:04, Kevin Yang wrote:
> >>> Network device hardware timestamps (hwtstamps) and the system's
> >>> clock (ktime) often originate from different clock domains.
> >>> This makes it hard to directly calculate the duration between
> >>> a hardware-timestamped event and a system-time event by simple
> >>> subtraction.
> >>>
> >>> This patch extends ndo_get_tstamp to allow a netdev to provide
> >>> a hwtstamp into the system's CLOCK_REALTIME domain. This allows a
> >>> driver to either perform a conversion by estimating or, if the
> >>> clocks are kept synchronized, return the original timestamp directly.
> >>> Other clock domains, e.g. CLOCK_MONOTONIC_RAW can also be added when
> >>> a use surfaces.
> >>>
> >>> This is useful for features that need to measure the delay between
> >>> a packet's hardware arrival/departure and a later software event.
> >>> For example, the TCP stack can use this to measure precise
> >>> packet receive delays, which is a requirement for the upcoming
> >>> TCP Swift [1] congestion control algorithm.
> >>>
> >>> [1] Kumar, Gautam, et al. "Swift: Delay is simple and effective
> >>> for congestion control in the datacenter." Proceedings of the
> >>> Annual conference of the ACM Special Interest Group on Data
> >>> Communication on the applications, technologies, architectures,
> >>> and protocols for computer communication. 2020.
> >>>
> >>> Signed-off-by: Kevin Yang <yyd@...gle.com>
> >>> Reviewed-by: Willem de Bruijn <willemb@...gle.com>
> >>
> >> Like Jakub in his reply
> >> https://lore.kernel.org/netdev/20260119115710.6fdde8c0@kernel.org/
> >> I also wondered why this is implemented in the driver.
> >>
> >> With vclocks it is already possible to get timestamps for arbitrary
> >> clock domains in parallel. So it is already possible to synchronize
> >> the hwtstamp to CLOCK_REALTIME, CLOCK_MONOTONIC, ... in parallel.
> >> Therefore, user space synchronisation is needed, but e.g. ptp4l does
> >> a much better synchronisation job than your solution.
> >>
> >> Maybe CLOCK_REALTIME is not supported by ptp4l, because due to daylight
> >> saving this clock jumps. IMO these jumps will also be a problem for
> >> your solution, as it will lead to wrong delays two times a year.
> >> So usually CLOCK_TAI or CLOCK_MONOTONIC would be a better choice.
> >>
> >> To sum up: IMO you suggest a driver specific in-kernel solution where
> >> already a driver independent user space solution with higher accuracy
> >> exists.
> > 
> > Definitely a promising alternative.
> > 
> > With multiple netdevices, a TCP listener socket may receive packets
> > from all devices. This would need new infrastructure to look up the
> > correct vclock for a given net_device; we cannot hardcode a choice
> > with SOF_TIMESTAMPING_BIND_PHC.
> > 
> > And this needs to happen for every packet, so with minimal overhead.
> > 
> > Though for established connections the expectation will be that
> > packets generally arrive on the same netdevice. Bar infrequent path
> > changes such as from sk_rethink_txhash on the peer. So there this
> > value can perhaps be cached.
> > 
> > It would still have to be learned by the kernel, no explicit
> > setsockopt.
> 
> Maybe it would also be an option, that the kernel learns with which
> clock domain the timestamps of the PHC and vclocks correlate. Then
> the TCP stack could calculate the delay if it finds a valid e.g.
> CLOCK_MONOTONIC timestamp in the packet. This would make the
> TCP listener socket independent from the devices. Just an idea, without
> thinking about implementation details.

I think we're on the same page.

- use the existing vclocks
- look up the right vclock based on the original incoming iface
- cache this known clock with an established socket

But I also have not looked at how/whether the lookup infra can be
implemented to find a vclock automatically, i.e., without userspace
admin.
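
Purely as an illustration of the three steps above, and not taken from
the patch or from any existing kernel interface, the lookup plus
per-socket cache could be modelled in plain user-space C roughly as
follows. Every name here (sketch_vclock_map, sketch_sock,
sketch_sock_clock) is hypothetical, and a flat table keyed by ifindex is
only an assumption about how the lookup might work:

```c
#include <stddef.h>

#define SKETCH_NO_CLOCK (-1)

/* Hypothetical mapping from an incoming interface to a vclock index. */
struct sketch_vclock_map {
	int ifindex;
	int phc_index;
};

/* Hypothetical per-socket cache of the last resolved clock. */
struct sketch_sock {
	int cached_ifindex;
	int cached_phc;
};

/* Slow path: scan the table for the vclock that belongs to ifindex. */
static int sketch_lookup(const struct sketch_vclock_map *tbl, size_t n,
			 int ifindex)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].ifindex == ifindex)
			return tbl[i].phc_index;
	return SKETCH_NO_CLOCK;
}

/*
 * Fast path: reuse the cached clock while packets keep arriving on the
 * same interface, and redo the lookup only when the interface changes.
 */
static int sketch_sock_clock(struct sketch_sock *sk,
			     const struct sketch_vclock_map *tbl, size_t n,
			     int ifindex)
{
	if (sk->cached_ifindex != ifindex) {
		sk->cached_phc = sketch_lookup(tbl, n, ifindex);
		sk->cached_ifindex = ifindex;
	}
	return sk->cached_phc;
}
```

For an established connection the comparison in sketch_sock_clock() is
the only per-packet cost as long as the path does not change.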

In some cases the raw hwtstamp in shinfo may already be in the
CLOCK_REALTIME domain that TCP requires. But if the raw clock is not
realtime, we'll have to convert it based on timecounter/cyclecounter.
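
As a rough sketch of what such an adjustment involves, assuming the
usual mult/shift scaling of a free-running counter: the code below is a
user-space model loosely inspired by the kernel's timecounter and
cyclecounter, not the kernel API itself. The struct and function names
are hypothetical, and it assumes the reference point (cycle_last, nsec)
is refreshed often enough that delta * mult does not overflow 64 bits:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a raw counter with a known reference point. */
struct sketch_timecounter {
	uint64_t cycle_last; /* raw counter value at the reference point */
	uint64_t nsec;       /* target-clock nanoseconds at that point */
	uint32_t mult;       /* nanoseconds per cycle, scaled by 2^shift */
	uint32_t shift;
	uint64_t mask;       /* bit mask matching the counter width */
};

/* Convert a raw counter reading to nanoseconds in the target clock. */
static uint64_t sketch_cyc2ns(const struct sketch_timecounter *tc,
			      uint64_t raw)
{
	uint64_t delta = (raw - tc->cycle_last) & tc->mask;

	assert(tc->shift < 64);

	/* More than half the counter range ahead is read as "in the past". */
	if (delta > tc->mask / 2) {
		delta = (tc->cycle_last - raw) & tc->mask;
		return tc->nsec - ((delta * tc->mult) >> tc->shift);
	}
	return tc->nsec + ((delta * tc->mult) >> tc->shift);
}
```

With mult = 8192 and shift = 10 (8 ns per cycle), a reading 100 cycles
after the reference point maps to the reference time plus 800 ns.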