[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89i+e8w0cA8+uE5QEUFY0fkQuwM7C5=8ULvRNaY2vh0YT4w@mail.gmail.com>
Date: Tue, 14 Oct 2025 23:03:15 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: Harshitha Ramamurthy <hramamurthy@...gle.com>, netdev@...r.kernel.org, joshwash@...gle.com,
andrew+netdev@...n.ch, davem@...emloft.net, kuba@...nel.org,
pabeni@...hat.com, willemb@...gle.com, pkaligineedi@...gle.com,
jfraker@...gle.com, ziweixiao@...gle.com, thostet@...gle.com,
linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH net] gve: Check valid ts bit on RX descriptor before hw timestamping
On Tue, Oct 14, 2025 at 1:19 PM Willem de Bruijn
<willemdebruijn.kernel@...il.com> wrote:
>
> Eric Dumazet wrote:
> > On Mon, Oct 13, 2025 at 5:47 PM Harshitha Ramamurthy
> > <hramamurthy@...gle.com> wrote:
> > >
> > > From: Tim Hostetler <thostet@...gle.com>
> > >
> > > The device returns a valid bit in the LSB of the low timestamp byte in
> > > the completion descriptor that the driver should check before
> > > setting the SKB's hardware timestamp. If the timestamp is not valid, do not
> > > hardware timestamp the SKB.
> > >
> > > Cc: stable@...r.kernel.org
> > > Fixes: b2c7aeb49056 ("gve: Implement ndo_hwtstamp_get/set for RX timestamping")
> > > Reviewed-by: Joshua Washington <joshwash@...gle.com>
> > > Signed-off-by: Tim Hostetler <thostet@...gle.com>
> > > Signed-off-by: Harshitha Ramamurthy <hramamurthy@...gle.com>
> > > ---
> > > drivers/net/ethernet/google/gve/gve.h | 2 ++
> > > drivers/net/ethernet/google/gve/gve_desc_dqo.h | 3 ++-
> > > drivers/net/ethernet/google/gve/gve_rx_dqo.c | 18 ++++++++++++------
> > > 3 files changed, 16 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
> > > index bceaf9b05cb4..4cc6dcbfd367 100644
> > > --- a/drivers/net/ethernet/google/gve/gve.h
> > > +++ b/drivers/net/ethernet/google/gve/gve.h
> > > @@ -100,6 +100,8 @@
> > > */
> > > #define GVE_DQO_QPL_ONDEMAND_ALLOC_THRESHOLD 96
> > >
> > > +#define GVE_DQO_RX_HWTSTAMP_VALID 0x1
> > > +
> > > /* Each slot in the desc ring has a 1:1 mapping to a slot in the data ring */
> > > struct gve_rx_desc_queue {
> > > struct gve_rx_desc *desc_ring; /* the descriptor ring */
> > > diff --git a/drivers/net/ethernet/google/gve/gve_desc_dqo.h b/drivers/net/ethernet/google/gve/gve_desc_dqo.h
> > > index d17da841b5a0..f7786b03c744 100644
> > > --- a/drivers/net/ethernet/google/gve/gve_desc_dqo.h
> > > +++ b/drivers/net/ethernet/google/gve/gve_desc_dqo.h
> > > @@ -236,7 +236,8 @@ struct gve_rx_compl_desc_dqo {
> > >
> > > u8 status_error1;
> > >
> > > - __le16 reserved5;
> > > + u8 reserved5;
> > > + u8 ts_sub_nsecs_low;
> > > __le16 buf_id; /* Buffer ID which was sent on the buffer queue. */
> > >
> > > union {
> > > diff --git a/drivers/net/ethernet/google/gve/gve_rx_dqo.c b/drivers/net/ethernet/google/gve/gve_rx_dqo.c
> > > index 7380c2b7a2d8..02e25be8a50d 100644
> > > --- a/drivers/net/ethernet/google/gve/gve_rx_dqo.c
> > > +++ b/drivers/net/ethernet/google/gve/gve_rx_dqo.c
> > > @@ -456,14 +456,20 @@ static void gve_rx_skb_hash(struct sk_buff *skb,
> > > * Note that this means if the time delta between packet reception and the last
> > > * clock read is greater than ~2 seconds, this will provide invalid results.
> > > */
> > > -static void gve_rx_skb_hwtstamp(struct gve_rx_ring *rx, u32 hwts)
> > > +static void gve_rx_skb_hwtstamp(struct gve_rx_ring *rx,
> > > + const struct gve_rx_compl_desc_dqo *desc)
> > > {
> > > u64 last_read = READ_ONCE(rx->gve->last_sync_nic_counter);
> > > struct sk_buff *skb = rx->ctx.skb_head;
> > > - u32 low = (u32)last_read;
> > > - s32 diff = hwts - low;
> > > -
> > > - skb_hwtstamps(skb)->hwtstamp = ns_to_ktime(last_read + diff);
> > > + u32 ts, low;
> > > + s32 diff;
> > > +
> > > + if (desc->ts_sub_nsecs_low & GVE_DQO_RX_HWTSTAMP_VALID) {
> > > + ts = le32_to_cpu(desc->ts);
> > > + low = (u32)last_read;
> > > + diff = ts - low;
> > > + skb_hwtstamps(skb)->hwtstamp = ns_to_ktime(last_read + diff);
> > > + }
> >
> > If (desc->ts_sub_nsecs_low & GVE_DQO_RX_HWTSTAMP_VALID) can vary among
> > all packets received on this queue,
> > I will suggest you add an
> >
> > else {
> > skb_hwtstamps(skb)->hwtstamp = 0;
> > }
> >
> > This is because napi_reuse_skb() does not currently clear this field.
> >
> > So if a prior skb had hwtstamp set to a timestamp, then merged in GRO,
> > and recycled, we have the risk of reusing an old timestamp
> > if GVE_DQO_RX_HWTSTAMP_VALID is unset.
> >
> > We could also handle this generically, at the cost of one extra
> > instruction in the fast path.
>
> That would be safest. This may not be limited to GVE.
Yes, I started counting the bugs yesterday.
But considering this is going to be a patch that is separate and could
be missed in various old kernels,
I think a belt and suspender mode is safer :
Make sure each individual driver does not assume the field is zero
(or add a check about this assertion, which is going to be more
expensive anyway)
>
> NICs supporting line rate timestamping is not universal. Older devices
> predominantly aim to support low rate PTP messages AFAIK.
>
> On the Tx path there are known rate limits to the number of packets
> that some can timestamp, e.g., because of using PHY registers.
>
> On the Rx path packets are selected by filters such as
> HWTSTAMP_FILTER_PTP_V2_L2_SYNC. But its not guaranteed that these can
> be matched and timestamped at any rate? Essentially trusting that no
> more PTP packets arrive than the device can timestamp.
>
> e1000e_rx_hwtstamp is a good example. It has a descriptor bit whether
> a packet was timestamped, similar to GVE. And only supports a single
> outstanding request:
>
> /* The Rx time stamp registers contain the time stamp. No other
> * received packet will be time stamped until the Rx time stamp
> * registers are read. Because only one packet can be time stamped
> * at a time, the register values must belong to this packet and
> * therefore none of the other additional attributes need to be
> * compared.
> */
>
> Perhaps not the best example as it does not use napi_reuse_skb. I
> thought of too late ;) But there are quite likely more.
>
I took a look at various uses in RX path in drivers/net/ethernet, I
found many bugs
drivers/net/ethernet/amd/xgbe/xgbe-drv.c:2374
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c:1098
drivers/net/ethernet/broadcom/tg3.c:6898
drivers/net/ethernet/cavium/liquidio/lio_core.c
drivers/net/ethernet/freescale/enetc/enetc.c
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
drivers/net/ethernet/meta/fbnic/fbnic_txrx.c...
Then I stopped chasing.
I will send a fix in napi_reuse_skb()
diff --git a/net/core/gro.c b/net/core/gro.c
index 5ba4504cfd28ec26f487bfb96645e25c4845d720..76f9c3712422109ad00f15f6804abf6a8b00db43
100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -639,6 +639,8 @@ EXPORT_SYMBOL(gro_receive_skb);
static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb)
{
+ struct skb_shared_info *shinfo;
+
if (unlikely(skb->pfmemalloc)) {
consume_skb(skb);
return;
@@ -655,8 +657,12 @@ static void napi_reuse_skb(struct napi_struct
*napi, struct sk_buff *skb)
skb->encapsulation = 0;
skb->ip_summed = CHECKSUM_NONE;
- skb_shinfo(skb)->gso_type = 0;
- skb_shinfo(skb)->gso_size = 0;
+
+ shinfo = skb_shinfo(skb);
+ shinfo->gso_type = 0;
+ shinfo->gso_size = 0;
+ shinfo->hwtstamps.hwtstamp = 0;
+
if (unlikely(skb->slow_gro)) {
skb_orphan(skb);
skb_ext_reset(skb);
Powered by blists - more mailing lists