lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 13 Apr 2023 09:20:57 +0200
From:   Kurt Kanzenbach <kurt@...utronix.de>
To:     Jacob Keller <jacob.e.keller@...el.com>,
        Jesse Brandeburg <jesse.brandeburg@...el.com>,
        Tony Nguyen <anthony.l.nguyen@...el.com>
Cc:     "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Vinicius Costa Gomes <vinicius.gomes@...el.com>,
        Ong Boon Leong <boon.leong.ong@...el.com>,
        intel-wired-lan@...ts.osuosl.org, netdev@...r.kernel.org
Subject: Re: [PATCH net-next] igc: Avoid transmit queue timeout for XDP

On Wed Apr 12 2023, Jacob Keller wrote:
> On 4/12/2023 12:36 AM, Kurt Kanzenbach wrote:
>> High XDP load triggers the netdev watchdog:
>> 
>> |NETDEV WATCHDOG: enp3s0 (igc): transmit queue 2 timed out
>> 
>> The reason is the Tx queue transmission start (txq->trans_start) is not updated
>> in XDP code path. Therefore, add it for all XDP transmission functions.
>> 
>> Signed-off-by: Kurt Kanzenbach <kurt@...utronix.de>
>
> For Intel, I only see this being done in igb, as 5337824f4dc4 ("net:
> annotate accesses to queue->trans_start"). I see a few other drivers
> also calling this.
>
> Is this a gap that other XDP implementations also need to fix?
>
> grepping for txq_trans_cond_update I see:
>
>> apm/xgene/xgene_enet_main.c
>> 874:            txq_trans_cond_update(txq);
>> 
>> engleder/tsnep_main.c
>> 623:            txq_trans_cond_update(tx_nq);
>> 1660:           txq_trans_cond_update(nq);
>> 
>> freescale/dpaa/dpaa_eth.c
>> 2347:   txq_trans_cond_update(txq);
>> 2553:   txq_trans_cond_update(txq);
>> 
>> ibm/ibmvnic.c
>> 2485:   txq_trans_cond_update(txq);
>> 
>> intel/igb/igb_main.c
>> 2980:   txq_trans_cond_update(nq);
>> 3014:   txq_trans_cond_update(nq);
>> 
>> stmicro/stmmac/stmmac_main.c
>> 2428:   txq_trans_cond_update(nq);
>> 4808:   txq_trans_cond_update(nq);
>> 6436:   txq_trans_cond_update(nq);
>> 
>
> Is most driver's XDP implementation broken? There's also
> netif_trans_update but this is called out as a legacy only function. Far
> more drivers call this but I don't see either call or a direct update to
> trans_start in many XDP implementations...
>
> Am I missing something or are a bunch of other XDP implementations also
> wrong?
>
> The patch seems ok to me, assuming this is the correct way to fix things
> and not something in the XDP path.

AFAICT the netdev watchdog is only started when the device exposes
ndo_tx_timeout callback (see __netdev_watchdog_up()). For igc this
callback was introduced recently in 9b275176270e ("igc: Add
ndo_tx_timeout support"). My guess, as soon as the net device has
ndo_tx_timeout it needs to maintain trans_start for XDP?

Thanks,
Kurt

Download attachment "signature.asc" of type "application/pgp-signature" (862 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ