netdev - Re: [PATCH net-next v4 09/10] bnxt_en: Extend queue stop/start for TX rings


Message-ID: <CACKFLi=jHfL2iAP-hVm=MmLDBD+wOOHrHsNNM21dCRAjRu7o7A@mail.gmail.com>
Date: Tue, 11 Feb 2025 18:31:21 -0800
From: Michael Chan <michael.chan@...adcom.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: davem@...emloft.net, netdev@...r.kernel.org, edumazet@...gle.com, 
	pabeni@...hat.com, andrew+netdev@...n.ch, pavan.chebbi@...adcom.com, 
	andrew.gospodarek@...adcom.com, michal.swiatkowski@...ux.intel.com, 
	helgaas@...nel.org, horms@...nel.org, 
	Somnath Kotur <somnath.kotur@...adcom.com>, Ajit Khaparde <ajit.khaparde@...adcom.com>, 
	David Wei <dw@...idwei.uk>
Subject: Re: [PATCH net-next v4 09/10] bnxt_en: Extend queue stop/start for TX rings

On Tue, Feb 11, 2025 at 5:44 PM Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Sat,  8 Feb 2025 12:29:15 -0800 Michael Chan wrote:
> > +             rc = bnxt_hwrm_cp_ring_alloc_p5(bp, txr->tx_cpr);
> > +             if (rc)
> > +                     return rc;
> > +
> > +             rc = bnxt_hwrm_tx_ring_alloc(bp, txr, false);
> > +             if (rc)
> > +                     return rc;
>
> Under what circumstances can these alloc calls fail?
> "alloc" sounds concerning in a start call.

The ring has been previously reserved with FW, so it normally should
not fail.  I'll need to ask the FW team for some possible failure
scenarios.

>
> > +             txr->tx_prod = 0;
> > +             txr->tx_cons = 0;
> > +             txr->tx_hw_cons = 0;
>
> >       cpr->sw_stats->rx.rx_resets++;
> >
> > +     if (bp->flags & BNXT_FLAG_SHARED_RINGS) {
> > +             cpr->sw_stats->tx.tx_resets++;
>
> Is there a reason why queue op stop/start cycles are counted as resets?
> IIUC previously only faults (~errors) would be counted as resets.
> ifdown / ifup or ring reconfig (ethtool -L / -G) would not increment
> resets. I think queue reconfig is more like ethtool -L than a fault.
> It'd be more consistent with existing code not to increment these
> counters.

I think David's original code increments the rx_resets counter on
every queue_start.  We're just following that.  Maybe it came from the
original plan to use HWRM_RING_RESET to do the RX
queue_stop/queue_start.  We can remove the reset counter increments
from all queue_stop/queue_start paths if that makes more sense.

>
> > +             rc = bnxt_tx_queue_start(bp, idx);
> > +             if (rc) {
> > +                     netdev_warn(bp->dev,
> > +                                 "tx queue restart failed: rc=%d\n", rc);
> > +                     bnapi->tx_fault = 1;
> > +                     goto err_reset;
> > +             }
> > +     }
> > +
> > +     napi_enable(&bnapi->napi);
>
> Here you first start the queue then enable NAPI...
>
> > +     bnxt_db_nq_arm(bp, &cpr->cp_db, cpr->cp_raw_cons);
> > +
> >       for (i = 0; i <= BNXT_VNIC_NTUPLE; i++) {
> >               vnic = &bp->vnic_info[i];
> >
>
> > @@ -15716,17 +15820,25 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx)
> >       /* Make sure NAPI sees that the VNIC is disabled */
> >       synchronize_net();
> >       rxr = &bp->rx_ring[idx];
> > -     cancel_work_sync(&rxr->bnapi->cp_ring.dim.work);
> > +     bnapi = rxr->bnapi;
> > +     cpr = &bnapi->cp_ring;
> > +     cancel_work_sync(&cpr->dim.work);
> >       bnxt_hwrm_rx_ring_free(bp, rxr, false);
> >       bnxt_hwrm_rx_agg_ring_free(bp, rxr, false);
> >       page_pool_disable_direct_recycling(rxr->page_pool);
> >       if (bnxt_separate_head_pool())
> >               page_pool_disable_direct_recycling(rxr->head_pool);
> >
> > +     if (bp->flags & BNXT_FLAG_SHARED_RINGS)
> > +             bnxt_tx_queue_stop(bp, idx);
> > +
> > +     napi_disable(&bnapi->napi);
>
> ... but here you do the opposite, and require extra synchronization
> in bnxt_tx_queue_stop() to set your magic flag, sync the NAPI etc.
> Why can't the start and stop paths be the mirror image?

The ring free operation requires interrupt/NAPI to be working.  FW
signals the completion of the ring free command on the completion ring
associated with the ring we're freeing.  When we see this completion
during NAPI, it guarantees that this is the last DMA on that ring.
Only ring free FW commands are handled this way, requiring NAPI.
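
To illustrate the ordering constraint described above, here is a toy
model in plain C.  It is not driver code: struct toy_ring and every
toy_* function are hypothetical names invented for this sketch, and the
"completion" is reduced to a flag.  It only shows why the stop path
frees the ring while NAPI is still enabled, whereas the start path
allocates first and enables NAPI afterwards.

```c
#include <stdbool.h>

/* Toy model only: all names here are hypothetical, not bnxt_en symbols. */
struct toy_ring {
	bool napi_enabled;	/* completions can be processed */
	bool hw_ring_allocated;	/* FW currently has a ring for this queue */
	bool free_cmpl_seen;	/* ring-free completion was observed */
};

/*
 * Models the ring free command: FW posts its completion on the
 * completion ring, and the driver only observes it if NAPI is running.
 * Returns true when the completion (the "last DMA" marker) was seen.
 */
static bool toy_ring_free(struct toy_ring *r)
{
	if (!r->hw_ring_allocated)
		return false;
	r->hw_ring_allocated = false;
	r->free_cmpl_seen = r->napi_enabled;
	return r->free_cmpl_seen;
}

/* Start path: allocate the ring, then enable NAPI. */
static int toy_queue_start(struct toy_ring *r)
{
	r->hw_ring_allocated = true;
	r->free_cmpl_seen = false;
	r->napi_enabled = true;
	return 0;
}

/* Stop path as in the patch: free the ring first, then disable NAPI. */
static int toy_queue_stop(struct toy_ring *r)
{
	bool seen = toy_ring_free(r);

	r->napi_enabled = false;
	return seen ? 0 : -1;
}

/*
 * Strict mirror image of the start path: disable NAPI, then free.
 * In this model the free completion is never observed, so the caller
 * has no confirmation that DMA on the ring has ended.
 */
static int toy_queue_stop_mirrored(struct toy_ring *r)
{
	r->napi_enabled = false;
	return toy_ring_free(r) ? 0 : -1;
}
```

In this model, toy_queue_start() followed by toy_queue_stop() returns 0,
while toy_queue_start() followed by toy_queue_stop_mirrored() returns -1
because nothing is left running to see the free completion.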
