netdev - Re: [Intel-wired-lan] [PATCH iwl-net] idpf: extend tx watchdog timeout

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <b6ed60cb-03a8-48fd-bffa-5a8efb0cd143@intel.com>
Date: Thu, 13 Jun 2024 12:16:40 +0200
From: Alexander Lobakin <aleksander.lobakin@...el.com>
To: Josh Hay <joshua.a.hay@...el.com>
CC: David Decotigny <ddecotig@...il.com>, <intel-wired-lan@...ts.osuosl.org>,
	<netdev@...r.kernel.org>, Sridhar Samudrala <sridhar.samudrala@...el.com>
Subject: Re: [Intel-wired-lan] [PATCH iwl-net] idpf: extend tx watchdog
 timeout

From: Josh Hay <joshua.a.hay@...el.com>
Date: Wed, 12 Jun 2024 23:36:29 -0700

> 
> 
> On 6/12/2024 2:34 AM, Alexander Lobakin wrote:
>> From: Josh Hay <joshua.a.hay@...el.com>
>> Date: Tue, 11 Jun 2024 11:13:53 -0700
>>
>>>
>>>
>>> On 6/11/2024 3:44 AM, Alexander Lobakin wrote:
>>>> From: David Decotigny <ddecotig@...il.com>
>>>> Date: Tue, 4 Jun 2024 16:34:48 -0700
>>
>> [...]
>>
>>>> Note that there are several patches fixing Tx (incl. timeouts) in my
>>>> tree, including yours (Joshua's) which you somehow didn't send yet ._.
>>>> Maybe start from them first?
>>>
>>> I believe it was you who specifically asked our team to defer pushing
>>> any upstream patches while you were working on the XDP series "to avoid
>>> having to rebase", which was a reasonable request at the time. We also
>>
>> It was only related to the virtchnl refactoring and later I cancelled
>> that when I realized it will go earlier than our series.
>>
>>> had no reason to believe the existing upstream idpf implementation was
>>> experiencing timeouts (it is being tested by numerous validation teams).
>>> So there was no urgency to get those patches upstream. Which patches in
>>> your tree do you believe fix specific timeout situations? It appears you
>>
>> [0][1][2]
>>
>>> pulled in some of the changes from the out-of-tree driver, but those
>>> were all enhancements. It wasn't until the workload that David mentioned
>>
>> No, there are all fixes.
>>
>> [0] is your from the OOT, extended. > [1] is mine and never was in the
>> OOT.
>> [2] is your from the OOT, extended by Michał.
> 
> My main point was since no other tx timeouts have been reported on the
> upstream driver, none of these seem like critical fixes. This particular

Lots of bugs have never been reported, this doesn't mean they don't exist.

> timeout signature did not seem to match any of these in general. E.g. it
> would have been obvious if the timeout was because of what [0] fixes.
> It's also possible these timeouts did not manifest on the upstream
> driver because it is missing other OOT changes.

What I'm saying is that why not try to reproduce the issues that this
patch tries to hide on my tree with [0][1][2] first and see whether
they're still here? If they fix the issue, then why extend the timeout?

> 
>>
>> They really do help.
> 
> No disagreement there. I would've loved to push these changes sooner,
> but we already covered why that didn't happen.

Because "they're not critical"? Intel's been having Tx timeouts on idpf
for years. I'd say that at least [1] is critical since this is obvious
bug. Without [2], XDP just doesn't work on idpf.
Why isn't [0] critical since without this patch, you stash partial
packets? And in some cases, ntc/ntu can go beyond the ring size? Is it okay?

> 
>>
>> Note that there's one more Tx timeout patch from you in the OOT, but it
>> actually broke Tx xD
> 
> If you are implying that our team would commit code that is knowingly
> broken, that is absolutely not true. I believe what you're referring to

I didn't say that. I just said that I tried to pull your latest yet
another Tx timeout fix and with it, I have more timeouts than without.

> is a change that introduced a tx timeout that also took a very specific
> workload to trigger it. That issue was fixed and not applicable to the
> current upstream implementation, so I do not see how that is relevant to
> this conversation.

It was just a side note, don't focus too much on such instead of really
important stuff.

> 
>>
>>> was run on the current driver that we had any indication there were
>>> timeout issues.
>>>
>>>>
>>>> I don't buy 30 seconds, at least for now. Maybe I'm missing something.
>>>>
>>>> Nacked-by: Alexander Lobakin <aleksander.lobakin@...el.com>
>>>
>>>
>>> In the process of debugging the newly discovered timeout, our
>>> architecture team made it clear that 5 seconds is too low for this type
>>> of device, with a non deterministic pipeline where packets can take a
>>> number of exception/slow paths. Admittedly, we don't know the exact
>>
>> Slowpath which takes 30 seconds to complete, seriously?
> 
> The architecture team said 5s is too low. 30s was chosen to give ample
> cushion to avoid changing the timeout should this situation come up again.

Just "said", without any details why? Now I can say that 5 is too much
and we need to have 1 second there. Why believe them, not me, w/o any
arguments/explanation?

(I at least have an argument that usually packets get sent in a couple
 us or faster and even 1 second between triggering the sending and
 receiving a completion is too much and if that happens, there were
 some HW issues)

> 
>>
>>> number, so the solution for the time being was to bump it up with a
>>> comfortable buffer. As we tune things and debug with various workloads,
>>> we can bring it back down. As David mentioned, there is precedent for an
>>> extended timeout for smartnics. Why is it suddenly unacceptable for
>>> Intel's device?
>>
>> I don't know where this "suddenly" comes from.
>> Because even 5 seconds is too much.
>> HW usually send packets in microseconds if not faster. Extending the
>> timeout will hide real issues and make debugging more difficult.
> 
> Can you please elaborate on exactly why it would be more difficult? If
> something is actually wrong in HW, it seems unlikely extra time would
> correct it. If something is functionally wrong in the driver, why does
> it matter if it's 5s, 15s, or 30s? It will timeout just the same.

HW can hang and perform a reset and if you have 30 seconds for Tx
timeout, you can even not notice it. Having 5 seconds or lower for the
timeout will most likely spot it.

> 
>>
>> I suspect this all is for OOO packets with an explicit sending timestamp
>> passed from the userspace, but as I said, you need to teach the kernel
>> watchdog to account them.
> 
> Out of order completions can happen for numerous reasons, some of which
> the driver will know nothing about, i.e. the userspace timestamps are
> not the only things that trigger OOO completions. We can detect that
> we're processing completion B before A, but it's only at that time that
> we can tell the stack to _maybe_ expect a delayed completion. I'm open
> to discussing this further, but it does not seem like a simple solution
> that can be implemented in the immediate future.

I still didn't get an explanation why a packet can receive an OOO
completion in whole 30 seconds instead of a couple microseconds.
I also don't understand why we can receive an OOO completion for packets
that were send without an explicit timestamp from the userspace. Without
the timestamp (or some sort of priority etc.), they should be sent in
the same order as they were passed to the driver, right? If so, why a
packet can get an OOO completion, which means it was sent not in the
same order as the driver got them?
Does HW do some prioritizing even if the kernel didn't ask for it? But
even if so, HW shouldn't in general delay sending for 30 seconds?

> 
>> Otherwise, I can ask the driver to send a packet in 31 seconds, what
>> then, there will be a timeout and you will send a patch to extend it to
>> 60 seconds?
> 
> I hope there are checks in the stack itself that would not allow the
> packet to be scheduled beyond the timeout window :)

I don't think so, but since I haven't read the actual code, I won't say
anything particular. Anyway, it was just a side note, my main concerns
are above.

Thanks,
Olek