lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1809a34d-dcf4-4b54-089a-a7be3f4c23e1@intel.com>
Date:   Wed, 12 Apr 2023 15:30:38 -0700
From:   Jacob Keller <jacob.e.keller@...el.com>
To:     Kurt Kanzenbach <kurt@...utronix.de>,
        Jesse Brandeburg <jesse.brandeburg@...el.com>,
        Tony Nguyen <anthony.l.nguyen@...el.com>
CC:     "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Vinicius Costa Gomes <vinicius.gomes@...el.com>,
        "Ong Boon Leong" <boon.leong.ong@...el.com>,
        <intel-wired-lan@...ts.osuosl.org>, <netdev@...r.kernel.org>
Subject: Re: [PATCH net-next] igc: Avoid transmit queue timeout for XDP



On 4/12/2023 12:36 AM, Kurt Kanzenbach wrote:
> High XDP load triggers the netdev watchdog:
> 
> |NETDEV WATCHDOG: enp3s0 (igc): transmit queue 2 timed out
> 
> The reason is the Tx queue transmission start (txq->trans_start) is not updated
> in XDP code path. Therefore, add it for all XDP transmission functions.
> 
> Signed-off-by: Kurt Kanzenbach <kurt@...utronix.de>

For Intel, I only see this being done in igb, as 5337824f4dc4 ("net:
annotate accesses to queue->trans_start"). I see a few other drivers
also calling this.

Is this a gap that other XDP implementations also need to fix?

grepping for txq_trans_cond_update I see:

> apm/xgene/xgene_enet_main.c
> 874:            txq_trans_cond_update(txq);
> 
> engleder/tsnep_main.c
> 623:            txq_trans_cond_update(tx_nq);
> 1660:           txq_trans_cond_update(nq);
> 
> freescale/dpaa/dpaa_eth.c
> 2347:   txq_trans_cond_update(txq);
> 2553:   txq_trans_cond_update(txq);
> 
> ibm/ibmvnic.c
> 2485:   txq_trans_cond_update(txq);
> 
> intel/igb/igb_main.c
> 2980:   txq_trans_cond_update(nq);
> 3014:   txq_trans_cond_update(nq);
> 
> stmicro/stmmac/stmmac_main.c
> 2428:   txq_trans_cond_update(nq);
> 4808:   txq_trans_cond_update(nq);
> 6436:   txq_trans_cond_update(nq);
> 

Is most driver's XDP implementation broken? There's also
netif_trans_update but this is called out as a legacy only function. Far
more drivers call this but I don't see either call or a direct update to
trans_start in many XDP implementations...

Am I missing something or are a bunch of other XDP implementations also
wrong?

The patch seems ok to me, assuming this is the correct way to fix things
and not something in the XDP path.

> ---
>  drivers/net/ethernet/intel/igc/igc_main.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> index ba49728be919..e71e85e3bcc2 100644
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -2384,6 +2384,8 @@ static int igc_xdp_xmit_back(struct igc_adapter *adapter, struct xdp_buff *xdp)
>  	nq = txring_txq(ring);
>  
>  	__netif_tx_lock(nq, cpu);
> +	/* Avoid transmit queue timeout since we share it with the slow path */
> +	txq_trans_cond_update(nq);
>  	res = igc_xdp_init_tx_descriptor(ring, xdpf);
>  	__netif_tx_unlock(nq);
>  	return res;
> @@ -2786,6 +2788,9 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
>  
>  	__netif_tx_lock(nq, cpu);
>  
> +	/* Avoid transmit queue timeout since we share it with the slow path */
> +	txq_trans_cond_update(nq);
> +
>  	budget = igc_desc_unused(ring);
>  
>  	while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) {
> @@ -6311,6 +6316,9 @@ static int igc_xdp_xmit(struct net_device *dev, int num_frames,
>  
>  	__netif_tx_lock(nq, cpu);
>  
> +	/* Avoid transmit queue timeout since we share it with the slow path */
> +	txq_trans_cond_update(nq);
> +
>  	drops = 0;
>  	for (i = 0; i < num_frames; i++) {
>  		int err;

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ