netdev - Re: [PATCH net-next v4 09/10] bnxt_en: Extend queue stop/start for TX rings


Message-ID: <20250211184305.2605e4fb@kernel.org>
Date: Tue, 11 Feb 2025 18:43:05 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: Michael Chan <michael.chan@...adcom.com>
Cc: davem@...emloft.net, netdev@...r.kernel.org, edumazet@...gle.com,
 pabeni@...hat.com, andrew+netdev@...n.ch, pavan.chebbi@...adcom.com,
 andrew.gospodarek@...adcom.com, michal.swiatkowski@...ux.intel.com,
 helgaas@...nel.org, horms@...nel.org, Somnath Kotur
 <somnath.kotur@...adcom.com>, Ajit Khaparde <ajit.khaparde@...adcom.com>,
 David Wei <dw@...idwei.uk>
Subject: Re: [PATCH net-next v4 09/10] bnxt_en: Extend queue stop/start for
 TX rings

On Tue, 11 Feb 2025 18:31:21 -0800 Michael Chan wrote:
> On Tue, Feb 11, 2025 at 5:44 PM Jakub Kicinski <kuba@...nel.org> wrote:
> > On Sat,  8 Feb 2025 12:29:15 -0800 Michael Chan wrote:  
> > > +             rc = bnxt_hwrm_cp_ring_alloc_p5(bp, txr->tx_cpr);
> > > +             if (rc)
> > > +                     return rc;
> > > +
> > > +             rc = bnxt_hwrm_tx_ring_alloc(bp, txr, false);
> > > +             if (rc)
> > > +                     return rc;  
> >
> > Under what circumstances can these alloc calls fail?
> > "alloc" sounds concerning in a start call.  
> 
> The ring has been previously reserved with FW, so it normally should
> not fail.  I'll need to ask the FW team for some possible failure
> scenarios.

Thanks, the expectation is that start never fails.
If the FW team comes back with "should never happen if rings
are reserved", please add a comment to that effect here. Since
this is one of very few implementations, people may read it
and incorrectly assume that allocating is okay.
If the FW team comes back with a list of possible but unlikely
scenarios, I'm afraid a rework will be needed.
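
To illustrate the "start never fails" expectation, here is a minimal
userspace sketch, not bnxt code. Every name in it (demo_ring,
demo_ring_prepare, demo_ring_start, demo_ring_stop, demo_ring_restart)
is hypothetical. It assumes that all fallible work can be moved into a
prepare step that runs before the old ring is stopped, so that start
only installs resources that already exist:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for one ring's resources; not a bnxt structure. */
struct demo_ring {
	void *desc;	/* descriptor memory, owned by the ring while live */
	bool running;
};

/* Fallible step: everything that can fail happens here, before stop. */
static int demo_ring_prepare(void **mem, size_t size)
{
	*mem = NULL;
	if (size == 0)
		return -1;	/* invalid request, treated as a failure */
	*mem = calloc(1, size);
	return *mem ? 0 : -1;
}

/* Infallible step: only installs memory that was already obtained. */
static void demo_ring_start(struct demo_ring *ring, void *mem)
{
	assert(mem != NULL);
	ring->desc = mem;
	ring->running = true;
}

/* Detach the current memory and hand it back to the caller. */
static void *demo_ring_stop(struct demo_ring *ring)
{
	void *old = ring->desc;

	ring->desc = NULL;
	ring->running = false;
	return old;
}

/* Restart: if prepare fails, the ring is untouched and keeps running. */
static int demo_ring_restart(struct demo_ring *ring, size_t size)
{
	void *new_mem;
	int rc;

	rc = demo_ring_prepare(&new_mem, size);
	if (rc)
		return rc;

	free(demo_ring_stop(ring));
	demo_ring_start(ring, new_mem);
	return 0;
}
```

With this split, a failed demo_ring_restart() leaves the old ring
running, and the only way start can go wrong is a caller bug (the
assert). Whether the HWRM ring allocation can be hoisted like this
depends on the FW team's answer above.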

> > >       cpr->sw_stats->rx.rx_resets++;
> > >
> > > +     if (bp->flags & BNXT_FLAG_SHARED_RINGS) {
> > > +             cpr->sw_stats->tx.tx_resets++;  
> >
> > Is there a reason why queue op stop/start cycles are counted as resets?
> > IIUC previously only faults (~errors) would be counted as resets.
> > ifdown / ifup or ring reconfig (ethtool -L / -G) would not increment
> > resets. I think queue reconfig is more like ethtool -L than a fault.
> > It'd be more consistent with existing code not to increment these
> > counters.  
> 
> I think David's original code increments the rx_reset counter for
> every queue_start.  We're just following that.  Maybe it came from the
> original plan to use HWRM_RING_RESET to do the RX
> queue_stop/queue_start.  We can remove the reset counters for all
> queue_stop/queue_start if that makes more sense.

I vote remove, just to be crystal clear.
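
As a rough illustration of the counting rule discussed here (count
faults, not reconfigurations), a small standalone sketch follows. The
enum, the stats struct and demo_account_restart() are hypothetical
names, not part of bnxt_en, and the sketch ignores the shared-rings
condition from the patch:

```c
#include <assert.h>

/* Hypothetical reasons a queue can be restarted. */
enum demo_restart_reason {
	DEMO_RESTART_FAULT,	/* error recovery */
	DEMO_RESTART_RECONFIG,	/* queue op stop/start, ethtool -L / -G */
};

/* Hypothetical per-ring software counters. */
struct demo_stats {
	unsigned long rx_resets;
	unsigned long tx_resets;
};

/* Only faults bump the reset counters; reconfig restarts do not. */
static void demo_account_restart(struct demo_stats *stats,
				 enum demo_restart_reason why)
{
	if (why != DEMO_RESTART_FAULT)
		return;

	stats->rx_resets++;
	stats->tx_resets++;
}
```

Under this rule a queue op stop/start cycle leaves both counters
unchanged, which matches how ifdown / ifup and ethtool -L / -G are
described above.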

> > > @@ -15716,17 +15820,25 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx)
> > >       /* Make sure NAPI sees that the VNIC is disabled */
> > >       synchronize_net();
> > >       rxr = &bp->rx_ring[idx];
> > > -     cancel_work_sync(&rxr->bnapi->cp_ring.dim.work);
> > > +     bnapi = rxr->bnapi;
> > > +     cpr = &bnapi->cp_ring;
> > > +     cancel_work_sync(&cpr->dim.work);
> > >       bnxt_hwrm_rx_ring_free(bp, rxr, false);
> > >       bnxt_hwrm_rx_agg_ring_free(bp, rxr, false);
> > >       page_pool_disable_direct_recycling(rxr->page_pool);
> > >       if (bnxt_separate_head_pool())
> > >               page_pool_disable_direct_recycling(rxr->head_pool);
> > >
> > > +     if (bp->flags & BNXT_FLAG_SHARED_RINGS)
> > > +             bnxt_tx_queue_stop(bp, idx);
> > > +
> > > +     napi_disable(&bnapi->napi);  
> >
> > ... but here you do the opposite, and require extra synchronization
> > in bnxt_tx_queue_stop() to set your magic flag, sync the NAPI etc.
> > Why can't the start and stop paths be the mirror image?  
> 
> The ring free operation requires interrupt/NAPI to be working.  FW
> signals the completion of the ring free command on the completion ring
> associated with the ring we're freeing.  When we see this completion
> during NAPI, it guarantees that this is the last DMA on that ring.
> Only ring free FW commands are handled this way, requiring NAPI.

Ugh, I feel like this was explained to me before, sorry.
Again, a comment in the code would go a long way for non-Broadcom
readers.
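
The ordering constraint Michael describes can be modeled with a toy
userspace sketch. This is not driver code: demo_queue and the demo_*
functions are hypothetical, and the "FW" and "NAPI poll" here are plain
function calls. It only assumes what the explanation above states,
namely that the ring-free completion arrives on the completion ring and
is only observed while NAPI is still running:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one queue's state. */
struct demo_queue {
	bool napi_enabled;
	bool free_cmpl_pending;	/* FW posted the ring-free completion */
	bool ring_freed;	/* driver saw it: no more DMA on the ring */
};

/* Stand-in for FW: the completion lands on the completion ring. */
static void demo_fw_ring_free(struct demo_queue *q)
{
	q->free_cmpl_pending = true;
}

/* Stand-in for a NAPI poll: does nothing once NAPI is disabled. */
static void demo_napi_poll(struct demo_queue *q)
{
	if (!q->napi_enabled)
		return;

	if (q->free_cmpl_pending) {
		q->free_cmpl_pending = false;
		q->ring_freed = true;
	}
}

/* Order from the patch: free the ring first, then disable NAPI. */
static bool demo_queue_stop(struct demo_queue *q)
{
	demo_fw_ring_free(q);
	demo_napi_poll(q);
	q->napi_enabled = false;
	return q->ring_freed;
}

/* Mirror image of start: NAPI goes down first, completion is missed. */
static bool demo_queue_stop_mirrored(struct demo_queue *q)
{
	q->napi_enabled = false;
	demo_fw_ring_free(q);
	demo_napi_poll(q);
	return q->ring_freed;
}
```

In this model demo_queue_stop() observes the completion while the
mirrored order never does, which is the asymmetry a code comment in
bnxt_queue_stop() would need to explain.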