netdev - Re: [Intel-wired-lan] [PATCH iwl-net] idpf: extend tx watchdog timeout

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ef8057dc-28f9-483a-9885-35879ad84b56@intel.com>
Date: Wed, 12 Jun 2024 11:34:43 +0200
From: Alexander Lobakin <aleksander.lobakin@...el.com>
To: Josh Hay <joshua.a.hay@...el.com>
CC: David Decotigny <ddecotig@...il.com>, <intel-wired-lan@...ts.osuosl.org>,
	<netdev@...r.kernel.org>, Sridhar Samudrala <sridhar.samudrala@...el.com>
Subject: Re: [Intel-wired-lan] [PATCH iwl-net] idpf: extend tx watchdog
 timeout

From: Josh Hay <joshua.a.hay@...el.com>
Date: Tue, 11 Jun 2024 11:13:53 -0700

> 
> 
> On 6/11/2024 3:44 AM, Alexander Lobakin wrote:
>> From: David Decotigny <ddecotig@...il.com>
>> Date: Tue, 4 Jun 2024 16:34:48 -0700

[...]

>> Note that there are several patches fixing Tx (incl. timeouts) in my
>> tree, including yours (Joshua's) which you somehow didn't send yet ._.
>> Maybe start from them first?
> 
> I believe it was you who specifically asked our team to defer pushing
> any upstream patches while you were working on the XDP series "to avoid
> having to rebase", which was a reasonable request at the time. We also

It was only related to the virtchnl refactoring and later I cancelled
that when I realized it will go earlier than our series.

> had no reason to believe the existing upstream idpf implementation was
> experiencing timeouts (it is being tested by numerous validation teams).
> So there was no urgency to get those patches upstream. Which patches in
> your tree do you believe fix specific timeout situations? It appears you

[0][1][2]

> pulled in some of the changes from the out-of-tree driver, but those
> were all enhancements. It wasn't until the workload that David mentioned

No, there are all fixes.

[0] is your from the OOT, extended.
[1] is mine and never was in the OOT.
[2] is your from the OOT, extended by Michał.

They really do help.

Note that there's one more Tx timeout patch from you in the OOT, but it
actually broke Tx xD

> was run on the current driver that we had any indication there were
> timeout issues.
> 
>>
>> I don't buy 30 seconds, at least for now. Maybe I'm missing something.
>>
>> Nacked-by: Alexander Lobakin <aleksander.lobakin@...el.com>
> 
> 
> In the process of debugging the newly discovered timeout, our
> architecture team made it clear that 5 seconds is too low for this type
> of device, with a non deterministic pipeline where packets can take a
> number of exception/slow paths. Admittedly, we don't know the exact

Slowpath which takes 30 seconds to complete, seriously?

> number, so the solution for the time being was to bump it up with a
> comfortable buffer. As we tune things and debug with various workloads,
> we can bring it back down. As David mentioned, there is precedent for an
> extended timeout for smartnics. Why is it suddenly unacceptable for
> Intel's device?

I don't know where this "suddenly" comes from.
Because even 5 seconds is too much.
HW usually send packets in microseconds if not faster. Extending the
timeout will hide real issues and make debugging more difficult.

I suspect this all is for OOO packets with an explicit sending timestamp
passed from the userspace, but as I said, you need to teach the kernel
watchdog to account them.
Otherwise, I can ask the driver to send a packet in 31 seconds, what
then, there will be a timeout and you will send a patch to extend it to
60 seconds?

> 
>>
>> Thanks,
>> Olek
> 
> Thanks,
> Josh

[0] https://github.com/alobakin/linux/commit/aad547037598
[1] https://github.com/alobakin/linux/commit/50f4c27ba13e
[2] https://github.com/alobakin/linux/commit/4a9b6c5d0ee8

Thanks,
Olek