Date: Wed, 12 Jun 2024 11:01:46 -0700
From: David Decotigny <ddecotig@...gle.com>
To: Josh Hay <joshua.a.hay@...el.com>
Cc: Alexander Lobakin <aleksander.lobakin@...el.com>, intel-wired-lan@...ts.osuosl.org, 
	netdev@...r.kernel.org, Sridhar Samudrala <sridhar.samudrala@...el.com>
Subject: Re: [Intel-wired-lan] [PATCH iwl-net] idpf: extend tx watchdog timeout

On Tue, Jun 11, 2024 at 11:13 AM Josh Hay <joshua.a.hay@...el.com> wrote:
>
>
>
> On 6/11/2024 3:44 AM, Alexander Lobakin wrote:
> > From: David Decotigny <ddecotig@...il.com>
> > Date: Tue, 4 Jun 2024 16:34:48 -0700
> >
> >>
> >>
> >> On 6/3/2024 11:47 AM, Joshua Hay wrote:
> >>>
> >>> There are several reasons for a TX completion to take longer than usual
> >>> to be written back by HW. For example, the completion for a packet that
> >>> misses a rule will have increased latency. The side effect of these
> >>> variable latencies for any given packet is out-of-order completions.
> >>> Say the stack sends packets X and Y. If packet X takes longer because
> >>> of the rule miss above while packet Y hits, packet Y can go on the wire
> >>> immediately, which also means it can be completed first. The driver
> >>> will then receive a completion for packet Y before packet X. The driver
> >>> will stash the buffers for packet X in a hash table so that the TX send
> >>> queue descriptors for both packets X and Y can be reused. The driver
> >>> will receive the completion for packet X sometime later and will have
> >>> to search the hash table for the associated packet.
> >>>
> >>> The driver first cleans packets directly on the ring, i.e. in-order
> >>> completions, since out-of-order completions are to some extent
> >>> considered a "slow(er) path". However, certain workloads can increase
> >>> the frequency of out-of-order completions, introducing even more
> >>> latency into the cleaning path. Bump up the timeout value to account
> >>> for these workloads.
> >>>
> >>> Fixes: 0fe45467a104 ("idpf: add create vport and netdev configuration")
> >>> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@...el.com>
> >>> Signed-off-by: Joshua Hay <joshua.a.hay@...el.com>
> >>> ---
> >>>    drivers/net/ethernet/intel/idpf/idpf_lib.c | 4 ++--
> >>>    1 file changed, 2 insertions(+), 2 deletions(-)
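For illustration, a minimal sketch of the stashing scheme the commit
message describes, built on the kernel's generic hashtable
(<linux/hashtable.h>); every name here (tx_stash, compl_tag, the table
size) is hypothetical rather than idpf's actual identifiers:

#include <linux/hashtable.h>
#include <linux/skbuff.h>
#include <linux/slab.h>

#define TX_STASH_HASH_BITS	8

struct tx_stash {
	struct hlist_node node;
	u16 compl_tag;		/* tag carried by the completion descriptor */
	struct sk_buff *skb;	/* buffers parked until the completion lands */
};

static DEFINE_HASHTABLE(tx_stash_tbl, TX_STASH_HASH_BITS);

/* Packet X completed out of order: park its buffers so the send queue
 * descriptors for both X and Y can be reused immediately. */
static int tx_stash_add(u16 compl_tag, struct sk_buff *skb)
{
	struct tx_stash *stash = kzalloc(sizeof(*stash), GFP_ATOMIC);

	if (!stash)
		return -ENOMEM;

	stash->compl_tag = compl_tag;
	stash->skb = skb;
	hash_add(tx_stash_tbl, &stash->node, compl_tag);

	return 0;
}

/* The late completion for packet X finally arrives: search the hash
 * table for the associated packet, as the last paragraph above says. */
static struct sk_buff *tx_stash_reap(u16 compl_tag)
{
	struct tx_stash *stash;
	struct sk_buff *skb;

	hash_for_each_possible(tx_stash_tbl, stash, node, compl_tag) {
		if (stash->compl_tag != compl_tag)
			continue;

		skb = stash->skb;
		hash_del(&stash->node);
		kfree(stash);
		return skb;
	}

	return NULL;	/* in-order completion, nothing was stashed */
}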
> >>
> >>
> >> We tested this patch with our intensive high-performance workloads that
> >> have been able to reproduce the issue, and it was sufficient to avoid tx
> >> timeouts. We also noticed that these longer timeouts are not unusual in
> >> the smartnic space: we see 100-second or 50-second timeouts for a few
> >> NICs, and
> >
> > Example?

a sample:

drivers/net/ethernet/cavium/thunder/nic.h:#define NICVF_TX_TIMEOUT (50 * HZ)
drivers/net/ethernet/cavium/thunder/nicvf_main.c: netdev->watchdog_timeo = NICVF_TX_TIMEOUT;
drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h:#define OTX2_TX_TIMEOUT (100 * HZ)
drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c: netdev->watchdog_timeo = OTX2_TX_TIMEOUT;
drivers/net/ethernet/marvell/octeontx2/nic/otx2_vf.c: netdev->watchdog_timeo = OTX2_TX_TIMEOUT;
drivers/net/ethernet/amazon/ena/ena_netdev.c: netdev->watchdog_timeo = msecs_to_jiffies(hints->netdev_wd_timeout);
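The last case (ENA) is the "hint from the fw" pattern mentioned below; a
minimal sketch of that pattern, where struct fw_hints and its
netdev_wd_timeout field are assumptions modeled on the ena_netdev.c line
above, not a generic kernel API:

#include <linux/jiffies.h>
#include <linux/netdevice.h>

struct fw_hints {
	u32 netdev_wd_timeout;	/* hint in milliseconds, 0 = no hint */
};

/* Hypothetical: apply a firmware-provided watchdog timeout hint,
 * falling back to a fixed value when the fw supplies none. */
static void set_tx_watchdog_timeout(struct net_device *netdev,
				    const struct fw_hints *hints)
{
	if (hints && hints->netdev_wd_timeout)
		netdev->watchdog_timeo =
			msecs_to_jiffies(hints->netdev_wd_timeout);
	else
		netdev->watchdog_timeo = 50 * HZ;	/* fixed default */
}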

> >
> >> other NICs receive this timeout as a hint from the fw.
> >>
> >> Reviewed-by: David Decotigny <ddecotig@...gle.com>
> >
> > We either need to teach the watchdog core to account for OOO
> > completions or there's something really wrong in the driver. For
> > example, how can we then distinguish whether a > 5 sec delay happened
> > just due to an OOO completion or because something went bad with the HW?
> >
> > Note that there are several patches fixing Tx (incl. timeouts) in my
> > tree, including yours (Joshua's), which you somehow haven't sent yet ._.
> > Maybe start with those first?
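One way a driver could make that distinction today, sketched under the
assumption that it keeps a per-queue completion counter; every name
below (example_tx_queue, its fields, and the two helpers) is
hypothetical, not an existing idpf or watchdog-core API:

#include <linux/netdevice.h>

struct example_tx_queue {
	u64 compl_count;	/* bumped on every cleaned completion */
	u64 last_seen;		/* snapshot from the previous timeout */
};

/* Assumed driver-private helpers, declarations only. */
static struct example_tx_queue *example_get_txq(struct net_device *netdev,
						unsigned int txqueue);
static void example_schedule_reset(struct net_device *netdev);

/* An .ndo_tx_timeout handler that only treats a stall as a HW hang when
 * the queue made no completion progress at all: slow OOO completions
 * still advance the counter, a dead queue does not. */
static void example_tx_timeout(struct net_device *netdev,
			       unsigned int txqueue)
{
	struct example_tx_queue *q = example_get_txq(netdev, txqueue);

	if (q->compl_count != q->last_seen) {
		/* Still making progress, just slowly (e.g. OOO
		 * exception paths): record it and give the queue
		 * another watchdog period. */
		q->last_seen = q->compl_count;
		netdev_warn(netdev, "txq %u slow but alive\n", txqueue);
		return;
	}

	/* No completions since the last check: assume a real hang. */
	example_schedule_reset(netdev);
}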
>
> I believe it was you who specifically asked our team to defer pushing
> any upstream patches while you were working on the XDP series "to avoid
> having to rebase", which was a reasonable request at the time. We also
> had no reason to believe the existing upstream idpf implementation was
> experiencing timeouts (it is being tested by numerous validation teams).
> So there was no urgency to get those patches upstream. Which patches in
> your tree do you believe fix specific timeout situations? It appears you
> pulled in some of the changes from the out-of-tree driver, but those
> were all enhancements. It wasn't until the workload that David mentioned
> was run on the current driver that we had any indication there were
> timeout issues.

Some issues were observed with high CPU/memory/network utilization
workloads, such as a SAP HANA benchmark.

>
> >
> > I don't buy 30 seconds, at least for now. Maybe I'm missing something.
> >
> > Nacked-by: Alexander Lobakin <aleksander.lobakin@...el.com>
>
>
> In the process of debugging the newly discovered timeout, our
> architecture team made it clear that 5 seconds is too low for this type
> of device, with a non-deterministic pipeline where packets can take a
> number of exception/slow paths. Admittedly, we don't know the exact
> number, so the solution for the time being was to bump it up with a
> comfortable buffer. As we tune things and debug with various workloads,
> we can bring it back down. As David mentioned, there is precedent for an
> extended timeout for smartnics. Why is it suddenly unacceptable for
> Intel's device?
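For reference, the change under discussion presumably has this shape in
idpf_lib.c; the 30 * HZ value is inferred from the "30 seconds" remark
above and may not match the patch exactly:

/* Give the device's non-deterministic pipeline headroom before the
 * stack declares a TX hang (up from the stack's usual 5 s figure). */
netdev->watchdog_timeo = 30 * HZ;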
>
> >
> > Thanks,
> > Olek
>
> Thanks,
> Josh
--
David Decotigny
