lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89i+e8w0cA8+uE5QEUFY0fkQuwM7C5=8ULvRNaY2vh0YT4w@mail.gmail.com>
Date: Tue, 14 Oct 2025 23:03:15 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: Harshitha Ramamurthy <hramamurthy@...gle.com>, netdev@...r.kernel.org, joshwash@...gle.com, 
	andrew+netdev@...n.ch, davem@...emloft.net, kuba@...nel.org, 
	pabeni@...hat.com, willemb@...gle.com, pkaligineedi@...gle.com, 
	jfraker@...gle.com, ziweixiao@...gle.com, thostet@...gle.com, 
	linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH net] gve: Check valid ts bit on RX descriptor before hw timestamping

On Tue, Oct 14, 2025 at 1:19 PM Willem de Bruijn
<willemdebruijn.kernel@...il.com> wrote:
>
> Eric Dumazet wrote:
> > On Mon, Oct 13, 2025 at 5:47 PM Harshitha Ramamurthy
> > <hramamurthy@...gle.com> wrote:
> > >
> > > From: Tim Hostetler <thostet@...gle.com>
> > >
> > > The device returns a valid bit in the LSB of the low timestamp byte in
> > > the completion descriptor that the driver should check before
> > > setting the SKB's hardware timestamp. If the timestamp is not valid, do not
> > > hardware timestamp the SKB.
> > >
> > > Cc: stable@...r.kernel.org
> > > Fixes: b2c7aeb49056 ("gve: Implement ndo_hwtstamp_get/set for RX timestamping")
> > > Reviewed-by: Joshua Washington <joshwash@...gle.com>
> > > Signed-off-by: Tim Hostetler <thostet@...gle.com>
> > > Signed-off-by: Harshitha Ramamurthy <hramamurthy@...gle.com>
> > > ---
> > >  drivers/net/ethernet/google/gve/gve.h          |  2 ++
> > >  drivers/net/ethernet/google/gve/gve_desc_dqo.h |  3 ++-
> > >  drivers/net/ethernet/google/gve/gve_rx_dqo.c   | 18 ++++++++++++------
> > >  3 files changed, 16 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
> > > index bceaf9b05cb4..4cc6dcbfd367 100644
> > > --- a/drivers/net/ethernet/google/gve/gve.h
> > > +++ b/drivers/net/ethernet/google/gve/gve.h
> > > @@ -100,6 +100,8 @@
> > >   */
> > >  #define GVE_DQO_QPL_ONDEMAND_ALLOC_THRESHOLD 96
> > >
> > > +#define GVE_DQO_RX_HWTSTAMP_VALID 0x1
> > > +
> > >  /* Each slot in the desc ring has a 1:1 mapping to a slot in the data ring */
> > >  struct gve_rx_desc_queue {
> > >         struct gve_rx_desc *desc_ring; /* the descriptor ring */
> > > diff --git a/drivers/net/ethernet/google/gve/gve_desc_dqo.h b/drivers/net/ethernet/google/gve/gve_desc_dqo.h
> > > index d17da841b5a0..f7786b03c744 100644
> > > --- a/drivers/net/ethernet/google/gve/gve_desc_dqo.h
> > > +++ b/drivers/net/ethernet/google/gve/gve_desc_dqo.h
> > > @@ -236,7 +236,8 @@ struct gve_rx_compl_desc_dqo {
> > >
> > >         u8 status_error1;
> > >
> > > -       __le16 reserved5;
> > > +       u8 reserved5;
> > > +       u8 ts_sub_nsecs_low;
> > >         __le16 buf_id; /* Buffer ID which was sent on the buffer queue. */
> > >
> > >         union {
> > > diff --git a/drivers/net/ethernet/google/gve/gve_rx_dqo.c b/drivers/net/ethernet/google/gve/gve_rx_dqo.c
> > > index 7380c2b7a2d8..02e25be8a50d 100644
> > > --- a/drivers/net/ethernet/google/gve/gve_rx_dqo.c
> > > +++ b/drivers/net/ethernet/google/gve/gve_rx_dqo.c
> > > @@ -456,14 +456,20 @@ static void gve_rx_skb_hash(struct sk_buff *skb,
> > >   * Note that this means if the time delta between packet reception and the last
> > >   * clock read is greater than ~2 seconds, this will provide invalid results.
> > >   */
> > > -static void gve_rx_skb_hwtstamp(struct gve_rx_ring *rx, u32 hwts)
> > > +static void gve_rx_skb_hwtstamp(struct gve_rx_ring *rx,
> > > +                               const struct gve_rx_compl_desc_dqo *desc)
> > >  {
> > >         u64 last_read = READ_ONCE(rx->gve->last_sync_nic_counter);
> > >         struct sk_buff *skb = rx->ctx.skb_head;
> > > -       u32 low = (u32)last_read;
> > > -       s32 diff = hwts - low;
> > > -
> > > -       skb_hwtstamps(skb)->hwtstamp = ns_to_ktime(last_read + diff);
> > > +       u32 ts, low;
> > > +       s32 diff;
> > > +
> > > +       if (desc->ts_sub_nsecs_low & GVE_DQO_RX_HWTSTAMP_VALID) {
> > > +               ts = le32_to_cpu(desc->ts);
> > > +               low = (u32)last_read;
> > > +               diff = ts - low;
> > > +               skb_hwtstamps(skb)->hwtstamp = ns_to_ktime(last_read + diff);
> > > +       }
> >
> > If (desc->ts_sub_nsecs_low & GVE_DQO_RX_HWTSTAMP_VALID) can vary among
> > all packets received on this queue,
> > I will suggest you add an
> >
> >         else {
> >                  skb_hwtstamps(skb)->hwtstamp = 0;
> >         }
> >
> > This is because napi_reuse_skb() does not currently clear this field.
> >
> > So if a prior skb had hwtstamp set to a timestamp, then merged in GRO,
> > and recycled, we have the risk of reusing an old timestamp
> > if GVE_DQO_RX_HWTSTAMP_VALID is unset.
> >
> > We could also handle this generically, at the cost of one extra
> > instruction in the fast path.
>
> That would be safest. This may not be limited to GVE.

Yes, I started counting the bugs yesterday.

But considering this is going to be a patch that is separate and could
be missed in various old kernels,
I think a belt and suspender mode is safer :

Make sure each individual driver does not assume the field is zero
(or add a check about this assertion, which is going to be more
expensive anyway)

>
> NICs supporting line rate timestamping is not universal. Older devices
> predominantly aim to support low rate PTP messages AFAIK.
>
> On the Tx path there are known rate limits to the number of packets
> that some can timestamp, e.g., because of using PHY registers.
>
> On the Rx path packets are selected by filters such as
> HWTSTAMP_FILTER_PTP_V2_L2_SYNC. But its not guaranteed that these can
> be matched and timestamped at any rate? Essentially trusting that no
> more PTP packets arrive than the device can timestamp.
>
> e1000e_rx_hwtstamp is a good example. It has a descriptor bit whether
> a packet was timestamped, similar to GVE. And only supports a single
> outstanding request:
>
>         /* The Rx time stamp registers contain the time stamp.  No other
>          * received packet will be time stamped until the Rx time stamp
>          * registers are read.  Because only one packet can be time stamped
>          * at a time, the register values must belong to this packet and
>          * therefore none of the other additional attributes need to be
>          * compared.
>          */
>
> Perhaps not the best example as it does not use napi_reuse_skb. I
> thought of too late ;) But there are quite likely more.
>

I took a look at various uses in RX path in drivers/net/ethernet, I
found many bugs

drivers/net/ethernet/amd/xgbe/xgbe-drv.c:2374
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c:1098
drivers/net/ethernet/broadcom/tg3.c:6898
drivers/net/ethernet/cavium/liquidio/lio_core.c
drivers/net/ethernet/freescale/enetc/enetc.c
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
drivers/net/ethernet/meta/fbnic/fbnic_txrx.c...

Then I stopped chasing.

I will send a fix in napi_reuse_skb()

diff --git a/net/core/gro.c b/net/core/gro.c
index 5ba4504cfd28ec26f487bfb96645e25c4845d720..76f9c3712422109ad00f15f6804abf6a8b00db43
100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -639,6 +639,8 @@ EXPORT_SYMBOL(gro_receive_skb);

 static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb)
 {
+       struct skb_shared_info *shinfo;
+
        if (unlikely(skb->pfmemalloc)) {
                consume_skb(skb);
                return;
@@ -655,8 +657,12 @@ static void napi_reuse_skb(struct napi_struct
*napi, struct sk_buff *skb)

        skb->encapsulation = 0;
        skb->ip_summed = CHECKSUM_NONE;
-       skb_shinfo(skb)->gso_type = 0;
-       skb_shinfo(skb)->gso_size = 0;
+
+       shinfo = skb_shinfo(skb);
+       shinfo->gso_type = 0;
+       shinfo->gso_size = 0;
+       shinfo->hwtstamps.hwtstamp = 0;
+
        if (unlikely(skb->slow_gro)) {
                skb_orphan(skb);
                skb_ext_reset(skb);

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ