netdev - Re: [PATCH net-next v3 3/5] eth: fbnic: add RX packets timestamping support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <fbf9e403-d21c-4652-a48a-427d9942b82e@linux.dev>
Date: Tue, 8 Oct 2024 18:13:06 +0100
From: Vadim Fedorenko <vadim.fedorenko@...ux.dev>
To: Jacob Keller <jacob.e.keller@...el.com>, Jakub Kicinski
 <kuba@...nel.org>, David Ahern <dsahern@...nel.org>,
 Paolo Abeni <pabeni@...hat.com>, "David S. Miller" <davem@...emloft.net>,
 Alexander Duyck <alexanderduyck@...com>
Cc: netdev@...r.kernel.org, Richard Cochran <richardcochran@...il.com>
Subject: Re: [PATCH net-next v3 3/5] eth: fbnic: add RX packets timestamping
 support

On 08/10/2024 18:01, Jacob Keller wrote:
> 
> 
> On 10/8/2024 9:47 AM, Vadim Fedorenko wrote:
>> On 05/10/2024 00:14, Jacob Keller wrote:
>>>
>>>
>>> On 10/3/2024 5:39 AM, Vadim Fedorenko wrote:
>>>> Add callbacks to support timestamping configuration via ethtool.
>>>> Add processing of RX timestamps.
>>>>
>>>> Signed-off-by: Vadim Fedorenko <vadfed@...a.com>
>>>>    
>>>> +/**
>>>> + * fbnic_ts40_to_ns() - convert descriptor timestamp to PHC time
>>>> + * @fbn: netdev priv of the FB NIC
>>>> + * @ts40: timestamp read from a descriptor
>>>> + *
>>>> + * Return: u64 value of PHC time in nanoseconds
>>>> + *
>>>> + * Convert truncated 40 bit device timestamp as read from a descriptor
>>>> + * to the full PHC time in nanoseconds.
>>>> + */
>>>> +static __maybe_unused u64 fbnic_ts40_to_ns(struct fbnic_net *fbn, u64 ts40)
>>>> +{
>>>> +	unsigned int s;
>>>> +	u64 time_ns;
>>>> +	s64 offset;
>>>> +	u8 ts_top;
>>>> +	u32 high;
>>>> +
>>>> +	do {
>>>> +		s = u64_stats_fetch_begin(&fbn->time_seq);
>>>> +		offset = READ_ONCE(fbn->time_offset);
>>>> +	} while (u64_stats_fetch_retry(&fbn->time_seq, s));
>>>> +
>>>> +	high = READ_ONCE(fbn->time_high);
>>>> +
>>>> +	/* Bits 63..40 from periodic clock reads, 39..0 from ts40 */
>>>> +	time_ns = (u64)(high >> 8) << 40 | ts40;
>>>> +
>>>> +	/* Compare bits 32-39 between periodic reads and ts40,
>>>> +	 * see if HW clock may have wrapped since last read
>>>> +	 */
>>>> +	ts_top = ts40 >> 32;
>>>> +	if (ts_top < (u8)high && (u8)high - ts_top > U8_MAX / 2)
>>>> +		time_ns += 1ULL << 40;
>>>> +
>>>> +	return time_ns + offset;
>>>> +}
>>>> +
>>>
>>> This logic doesn't seem to match the logic used by the cyclecounter
>>> code, and Its not clear to me if this safe against a race between
>>> time_high updating and the packet timestamp arriving.
>>>
>>> the timestamp could arrive either before or after the time_high update,
>>> and the logic needs to ensure the appropriate high bits are applied in
>>> both cases.
>>
>> To avoid this race condition we decided to make sure that incoming
>> timestamps are always later then cached high bits. That will make the
>> logic above correct.
>>
> 
> How do you do that? Timestamps come in asynchronously. The value is
> captured by hardware. How do you guarantee that it was captured only
> after an update to the cached high bits?
> 
> I guess if it arrives before the roll-over, you handle that by the range
> check to see if the clock wrapped around.
> 
> Hmm.
> 
> But what happens if an Rx timestamp races with an update to the high
> value and comes in just before the 40 bit time would have overflowed,
> but the cached time_high value is captured just after it overflowed.
> 
> Do you have some mechanism to ensure that this is impossible? i.e.
> either ensuring that the conversion uses the old time_high value, or
> ensuring that Rx timestamps can't come in during an update?
> 
> Otherwise, I think the logic here could accidentally combine a time
> value from an Rx timestamp that is just prior to the time_high update
> and just prior to a rollover, then it would see a huge gap between the
> values and trigger the addition of another 1 << 40, which would cycle it
> even farther out of what the real value should have been.

Yes, you are absolutely correct, we have missed the situation when the
logic can bring additional (1 << 40) value on top of wrongly calculated
higher bits. This can only happen in case of overflow of lower 8 bits of
high cached value. But if we keep high cached value a bit below the real
value, this will never happen. If we subtract 16 from high value it will
translate into ~1 minute of oldness of cached value. If for any reasons
the packet processing will be delayed by 1 minute, user-space app will
definitely give up on waiting for the packet/timestamp and will ignore
wrong timestamp. In all other cases the logic in fbnic_ts40_to_ns() will
work perfectly fine.

>>> Again, I think your use case makes sense to just implement with a
>>> timecounter and cyclecounter, since you're not modifying the hardware
>>> cycle counter and are leaving it as free-running.
>>
>> After discussion with Jakub we decided to keep simple logic without
>> timecounter + cyclecounter, as it's pretty straight forward.
> 
> Fair enough.