Message-ID: <CACKFLimD-bKmJ1tGZOLYRjWzEwxkri-Mw7iFme1x2Dr0twdCeg@mail.gmail.com>
Date: Tue, 11 Jul 2023 17:01:08 -0700
From: Michael Chan <michael.chan@...adcom.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: davem@...emloft.net, netdev@...r.kernel.org, edumazet@...gle.com, pabeni@...hat.com
Subject: Re: [PATCH net-next 3/3] eth: bnxt: handle invalid Tx completions more gracefully

On Tue, Jul 11, 2023 at 1:00 AM Michael Chan <michael.chan@...adcom.com> wrote:
>
> On Mon, Jul 10, 2023 at 1:56 PM Jakub Kicinski <kuba@...nel.org> wrote:
> >
> > Invalid Tx completions should never happen (tm) but when they do
> > they crash the host, because driver blindly trusts that there is
> > a valid skb pointer on the ring.
> >
> > The completions I've seen appear to be some form of FW / HW
> > miscalculation or staleness, they have typical (small) values
> > (<100), but they are most often higher than number of queued
> > descriptors. They usually happen after boot.
> >
> > Instead of crashing print a warning and schedule a reset.
>
> It generally looks good to me. I have a few comments below.
>
> The logic is very similar to the bnapi->in_reset logic to reset due to
> RX errors. We have a counter for the number of times we do the RX
> reset so I think it might be good to add a similar TX reset counter.
Never mind about the counter. Since we are doing a complete reset,
the cpr structure will be freed anyway, so the counter won't persist
across the reset. When we add support for per-TX-ring reset later,
we can add the counter then.
>
> The XDP code path can potentially crash in a similar way if we get a
> bad completion from hardware. I'm not sure if we should add similar
> logic to the XDP code path.
>
> Thanks.
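
For illustration only, the idea in the quoted patch description (do not
trust a completion that claims more packets than were queued, or that
points at a slot with no skb; warn and schedule a reset instead) can be
sketched as a small userspace model. This is not the bnxt code: the ring
layout, TX_RING_SIZE, and every name below (tx_ring, tx_ring_queue,
tx_ring_complete, reset_pending) are hypothetical and chosen only for
the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define TX_RING_SIZE 8u /* hypothetical size; must be a power of two */

struct tx_buf {
	void *skb; /* stands in for the skb pointer stored per slot */
};

struct tx_ring {
	struct tx_buf buf[TX_RING_SIZE];
	uint16_t prod;      /* free-running index of the next slot to fill */
	uint16_t cons;      /* free-running index of the next slot to complete */
	bool reset_pending; /* set once an invalid completion has been seen */
};

/* Number of descriptors queued but not yet completed. */
static unsigned int tx_ring_pending(const struct tx_ring *r)
{
	unsigned int n = (uint16_t)(r->prod - r->cons);

	assert(n <= TX_RING_SIZE); /* invariant of this model */
	return n;
}

/* Queue one packet; refuses when the ring is full or awaiting reset. */
static bool tx_ring_queue(struct tx_ring *r, void *skb)
{
	if (r->reset_pending || tx_ring_pending(r) == TX_RING_SIZE)
		return false;
	r->buf[r->prod & (TX_RING_SIZE - 1)].skb = skb;
	r->prod++;
	return true;
}

/*
 * Process a completion that claims nr_pkts packets are done.
 * Each slot is validated before it is touched. On the first invalid
 * slot we warn, mark the ring for reset, and stop, instead of
 * dereferencing a bogus pointer. Returns the number of packets
 * actually completed.
 */
static unsigned int tx_ring_complete(struct tx_ring *r, unsigned int nr_pkts)
{
	unsigned int done = 0;

	if (r->reset_pending)
		return 0; /* ignore completions until the reset has run */

	while (done < nr_pkts) {
		struct tx_buf *b = &r->buf[r->cons & (TX_RING_SIZE - 1)];

		if (tx_ring_pending(r) == 0 || b->skb == NULL) {
			fprintf(stderr,
				"invalid Tx completion: claimed %u, valid %u\n",
				nr_pkts, done);
			r->reset_pending = true;
			break;
		}
		b->skb = NULL; /* a real driver would unmap and free here */
		r->cons++;
		done++;
	}
	return done;
}
```

With two packets queued, a completion claiming five would complete the
two valid slots, log one warning, and leave reset_pending set; what the
reset itself does is outside this sketch.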