Message-ID: <abf16cc2-c350-430d-a2fd-2a8bedef9f34@linux.dev>
Date: Tue, 20 May 2025 20:53:32 +0100
From: Vadim Fedorenko <vadim.fedorenko@...ux.dev>
To: Ziwei Xiao <ziweixiao@...gle.com>
Cc: Harshitha Ramamurthy <hramamurthy@...gle.com>, netdev@...r.kernel.org,
 davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
 pabeni@...hat.com, jeroendb@...gle.com, andrew+netdev@...n.ch,
 willemb@...gle.com, pkaligineedi@...gle.com, yyd@...gle.com,
 joshwash@...gle.com, shailend@...gle.com, linux@...blig.org,
 thostet@...gle.com, jfraker@...gle.com, richardcochran@...il.com,
 jdamato@...tly.com, horms@...nel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next v2 6/8] gve: Add rx hardware timestamp expansion

On 19.05.2025 19:45, Ziwei Xiao wrote:
> .
> 
> 
> On Sun, May 18, 2025 at 2:45 PM Vadim Fedorenko
> <vadim.fedorenko@...ux.dev> wrote:
>>
>> On 17.05.2025 01:11, Harshitha Ramamurthy wrote:
>>> From: John Fraker <jfraker@...gle.com>
>>>
>>> Allow the rx path to recover the high 32 bits of the full 64 bit rx
>>> timestamp.
>>>
>>> Use the low 32 bits of the last synced nic time and the 32 bits of the
>>> timestamp provided in the rx descriptor to generate a difference, which
>>> is then applied to the last synced nic time to reconstruct the complete
>>> 64-bit timestamp.
>>>
>>> This scheme remains accurate as long as no more than ~2 seconds have
>>> passed between the last read of the nic clock and the timestamping
>>> application of the received packet.
>>>
>>> Signed-off-by: John Fraker <jfraker@...gle.com>
>>> Signed-off-by: Ziwei Xiao <ziweixiao@...gle.com>
>>> Reviewed-by: Willem de Bruijn <willemb@...gle.com>
>>> Signed-off-by: Harshitha Ramamurthy <hramamurthy@...gle.com>
>>> ---
>>>    Changes in v2:
>>>    - Add the missing READ_ONCE (Joe Damato)
>>> ---
>>>    drivers/net/ethernet/google/gve/gve_rx_dqo.c | 23 ++++++++++++++++++++
>>>    1 file changed, 23 insertions(+)
>>>
>>> diff --git a/drivers/net/ethernet/google/gve/gve_rx_dqo.c b/drivers/net/ethernet/google/gve/gve_rx_dqo.c
>>> index dcb0545baa50..c03c3741e0d4 100644
>>> --- a/drivers/net/ethernet/google/gve/gve_rx_dqo.c
>>> +++ b/drivers/net/ethernet/google/gve/gve_rx_dqo.c
>>> @@ -437,6 +437,29 @@ static void gve_rx_skb_hash(struct sk_buff *skb,
>>>        skb_set_hash(skb, le32_to_cpu(compl_desc->hash), hash_type);
>>>    }
>>>
>>> +/* Expand the hardware timestamp to the full 64 bits of width, and add it to the
>>> + * skb.
>>> + *
>>> + * This algorithm works by using the passed hardware timestamp to generate a
>>> + * diff relative to the last read of the nic clock. This diff can be positive or
>>> + * negative, as it is possible that we have read the clock more recently than
>>> + * the hardware has received this packet. To detect this, we use the high bit of
>>> + * the diff, and assume that the read is more recent if the high bit is set. In
>>> + * this case we invert the process.
>>> + *
>>> + * Note that this means if the time delta between packet reception and the last
>>> + * clock read is greater than ~2 seconds, this will provide invalid results.
>>> + */
>>> +static void __maybe_unused gve_rx_skb_hwtstamp(struct gve_rx_ring *rx, u32 hwts)
>>> +{
>>> +     s64 last_read = READ_ONCE(rx->gve->last_sync_nic_counter);
>>
>> I believe last_read should be u64 as last_sync_nic_counter is u64 and
>> ns_to_ktime expects u64.
>>
> Thanks for the suggestion. The reason for choosing s64 for `last_read`
> here is to make the signed addition explicit with `last_read +
> (s32)diff`. This allows diff (which can be positive or negative,
> depending on whether hwts is ahead of or behind low(last_read)) to
> directly adjust last_read without a conditional branch, which makes
> the intent clear IMO. The s64 nanosecond value is not at risk of
> overflow, and the positive s64 result is then safely converted to u64
> for ns_to_ktime.
> 
> I'm happy to change last_read to u64 if that's preferred for type
> consistency, or I can add a comment to clarify the rationale for the
> current s64 approach. Please let me know what you think. Thanks!

I didn't get where the conditional branch is expected. AFAIU, you can do
a direct u64 + s32 addition without any branches. The assembly is also
pretty clean in this case (I used a simplified piece of code):

         movl    -12(%rbp), %eax    # load the 32-bit diff
         movslq  %eax, %rdx         # sign-extend it to 64 bits
         movq    -8(%rbp), %rax     # load the 64-bit last_read
         addq    %rax, %rdx         # rdx = last_read + (s64)diff

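Something as simple as this (illustrative names, not the actual driver
code) is enough to get that codegen:

        #include <stdint.h>

        /* The s32 diff is sign-extended and added straight to the u64
         * counter; no conditional branch is involved. */
        uint64_t add_diff(uint64_t last_read, int32_t diff)
        {
                return last_read + diff;
        }
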

> 
>>> +     struct sk_buff *skb = rx->ctx.skb_head;
>>> +     u32 low = (u32)last_read;
>>> +     s32 diff = hwts - low;
>>> +
>>> +     skb_hwtstamps(skb)->hwtstamp = ns_to_ktime(last_read + diff);
>>> +}
>>> +
>>>    static void gve_rx_free_skb(struct napi_struct *napi, struct gve_rx_ring *rx)
>>>    {
>>>        if (!rx->ctx.skb_head)
>>
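
Just to illustrate the expansion scheme itself, here is a standalone
sketch (illustrative names and values, not the driver code) of the same
arithmetic, including the case where the low 32 bits wrap between the
clock read and the packet:

        #include <stdint.h>
        #include <stdio.h>

        /* diff is computed in 32-bit space and sign-extended, so it
         * covers both a packet older/newer than the last clock read
         * and a 32-bit wrap in between. */
        static uint64_t expand_hwts(uint64_t last_read, uint32_t hwts)
        {
                int32_t diff = hwts - (uint32_t)last_read;

                return last_read + diff;
        }

        int main(void)
        {
                /* Packet timestamped 272 ns after the last clock read,
                 * across a 32-bit wrap: low(last_read) = 0xffffff00,
                 * hwts = 0x00000010. */
                uint64_t last_read = 0x00000005ffffff00ULL;

                printf("%llu\n",
                       (unsigned long long)expand_hwts(last_read, 0x10));
                /* 25769803792 == 0x0000000600000010 == last_read + 272 */
                return 0;
        }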

