Message-ID: <641b439b-2bc0-4f2b-9871-b522e1141cd1@intel.com>
Date: Tue, 11 Jun 2024 11:13:53 -0700
From: Josh Hay <joshua.a.hay@...el.com>
To: Alexander Lobakin <aleksander.lobakin@...el.com>, David Decotigny
<ddecotig@...il.com>
CC: <intel-wired-lan@...ts.osuosl.org>, <netdev@...r.kernel.org>, "Sridhar
Samudrala" <sridhar.samudrala@...el.com>
Subject: Re: [Intel-wired-lan] [PATCH iwl-net] idpf: extend tx watchdog
timeout
On 6/11/2024 3:44 AM, Alexander Lobakin wrote:
> From: David Decotigny <ddecotig@...il.com>
> Date: Tue, 4 Jun 2024 16:34:48 -0700
>
>>
>>
>> On 6/3/2024 11:47 AM, Joshua Hay wrote:
>>>
>>> There are several reasons for a Tx completion to take longer than usual
>>> to be written back by HW. For example, the completion for a packet that
>>> misses a rule will have increased latency. The side effect of these
>>> variable latencies for any given packet is out of order completions.
>>> Say the stack sends packets X and Y. If packet X takes longer because
>>> of the rule miss in the example above, but packet Y hits, packet Y can
>>> go on the wire immediately, which also means it can be completed first.
>>> The driver will then receive a completion for packet Y before packet X.
>>> The driver will stash the buffers for packet X in a hash table to allow
>>> the Tx send queue descriptors for both packets X and Y to be reused.
>>> The driver will receive the completion for packet X sometime later and
>>> have to search the hash table for the associated packet.
>>>
>>> The driver cleans packets directly on the ring first, i.e. in-order
>>> completions, since out of order completions are to some extent
>>> considered the "slow(er) path". However, certain workloads can increase
>>> the frequency of out of order completions, thus introducing even more
>>> latency into the cleaning path. Bump up the timeout value to account
>>> for these workloads.
>>>
>>> Fixes: 0fe45467a104 ("idpf: add create vport and netdev configuration")
>>> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@...el.com>
>>> Signed-off-by: Joshua Hay <joshua.a.hay@...el.com>
>>> ---
>>> drivers/net/ethernet/intel/idpf/idpf_lib.c | 4 ++--
>>> 1 file changed, 2 insertions(+), 2 deletions(-)
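For anyone less familiar with the splitq completion model, here is a
rough sketch of the stash flow the commit message describes. The
structure and function names below are illustrative only, not the
actual idpf code:

#include <linux/hashtable.h>
#include <linux/slab.h>
#include <linux/types.h>

/* Illustrative only: an out of order completion means the ring was
 * already cleaned past this packet, so its buffers are parked in a hash
 * table keyed by the completion tag until its completion shows up.
 */
struct stashed_pkt {
	struct hlist_node node;
	u16 tag;		/* completion tag from the descriptor */
	/* DMA/buffer bookkeeping for the still-outstanding packet */
};

static DEFINE_HASHTABLE(stash, 8);

/* Packet Y completed first: stash packet X so the descriptors for both
 * X and Y can be reused immediately.
 */
static void stash_pkt(struct stashed_pkt *pkt, u16 tag)
{
	pkt->tag = tag;
	hash_add(stash, &pkt->node, tag);
}

/* Packet X's completion finally arrives: find it by tag and release it. */
static void clean_stashed_pkt(u16 tag)
{
	struct stashed_pkt *pkt;

	hash_for_each_possible(stash, pkt, node, tag) {
		if (pkt->tag != tag)
			continue;
		hash_del(&pkt->node);
		/* unmap and free the stashed buffers here */
		kfree(pkt);
		break;
	}
}

Every out of order completion takes this lookup path instead of the fast
in-order clean, which is why a workload that generates a burst of them
adds latency to the cleaning path.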
>>
>>
>> We tested this patch with our intensive high-performance workloads that
>> have been able to reproduce the issue, and it was sufficient to avoid Tx
>> timeouts. We also noticed that these longer timeouts are not unusual in
>> the smartnic space: we see 100 s or 50 s timeouts for a few NICs, and
>
> Example?
>
>> other NICs receive this timeout as a hint from the fw.
>>
>> Reviewed-by: David Decotigny <ddecotig@...gle.com>
>
> We either need to teach the watchdog core to account for OOOs or there's
> something really wrong in the driver. For example, how can we then
> distinguish whether a > 5 sec delay happened just due to an OOO
> completion or because something went wrong with the HW?
>
> Note that there are several patches fixing Tx (incl. timeouts) in my
> tree, including yours (Joshua's) which you somehow haven't sent yet ._.
> Maybe start from them first?
I believe it was you who specifically asked our team to defer pushing
any upstream patches while you were working on the XDP series "to avoid
having to rebase", which was a reasonable request at the time. We also
had no reason to believe the existing upstream idpf implementation was
experiencing timeouts (it is being tested by numerous validation teams).
So there was no urgency to get those patches upstream. Which patches in
your tree do you believe fix specific timeout situations? It appears you
pulled in some of the changes from the out-of-tree driver, but those
were all enhancements. It wasn't until the workload that David mentioned
was run on the current driver that we had any indication there were
timeout issues.
>
> I don't buy 30 seconds, at least for now. Maybe I'm missing something.
>
> Nacked-by: Alexander Lobakin <aleksander.lobakin@...el.com>
In the process of debugging the newly discovered timeout, our
architecture team made it clear that 5 seconds is too low for this type
of device, with a non-deterministic pipeline where packets can take a
number of exception/slow paths. Admittedly, we don't know the exact
worst-case latency, so the solution for the time being was to bump the
timeout up with a comfortable buffer. As we tune things and debug with
various workloads, we can bring it back down. As David mentioned, there
is precedent for an extended timeout for smartnics. Why is it suddenly
unacceptable for Intel's device?
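To be clear about the scope of the change: the only knob involved is the
standard netdev Tx watchdog timeout that the driver sets at netdev
configuration time. A minimal sketch of that kind of change follows; the
function name and values are illustrative, not the literal idpf diff:

#include <linux/netdevice.h>

/* Illustrative only. The stack's Tx watchdog calls the driver's
 * ndo_tx_timeout() when a stopped Tx queue has made no progress for
 * watchdog_timeo jiffies; raising the value gives exception/slow-path
 * completions more headroom before a timeout is declared.
 */
static void example_cfg_netdev(struct net_device *netdev)
{
	/* e.g. raise from the 5 second default to 30 seconds */
	netdev->watchdog_timeo = 30 * HZ;
}

Whether 30 seconds is the right long-term value is exactly what we plan
to tune down as we characterize these workloads.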
>
> Thanks,
> Olek
Thanks,
Josh