Message-ID: <CANn89iLJi8n7=kmhN4hkF8SDr9uuiX8_4SVzx5Og_Mpn_RYGXA@mail.gmail.com>
Date: Wed, 16 Aug 2023 19:59:23 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Edward Cree <ecree.xilinx@...il.com>
Cc: edward.cree@....com, davem@...emloft.net, kuba@...nel.org,
pabeni@...hat.com, netdev@...r.kernel.org,
Martin Habets <habetsm.xilinx@...il.com>
Subject: Re: [RFC PATCH net] net-gro: restore check for NULL skb in napi_gro_frags
On Wed, Aug 16, 2023 at 7:46 PM Edward Cree <ecree.xilinx@...il.com> wrote:
>
> On 02/08/2023 11:22, Eric Dumazet wrote:
> > On Wed, Aug 2, 2023 at 11:42 AM <edward.cree@....com> wrote:
> >> An sfc customer has encountered this panic in the wild; we're still
> >> investigating exactly how it happened (we have a reproducer) but it
> >> seems wise to have the core handle this check rather than requiring
> >> it in every driver.
> >
> > An ethernet driver feeding non-ethernet packets to the upper stacks
> > seems weird to me,
> ...
> > Not sure why a napi_gro_frags() enabled driver would be allowed to
> > cook arbitrary packets with length < ETH_HLEN
> ...
> > Mixed feelings here
> Fwiw we now have some more understanding of what caused this: the
> device produced an RX descriptor with a zero in its buffer length
> field (there was actually data in the buffer, but a — we think —
> firmware bug caused the length to be stored in the wrong place).
> And the driver, blithely trusting the device, attached the RX page
> to the skb from napi_get_frags, passing the buffer length it had
> straight to skb_fill_page_desc without checking it. (After all,
> the device had told us through RX event flags that the packet was
> TCP, so it had to have seen a complete set of headers.)
> This certainly can be fixed in the driver, by adding a length
> check before calling napi_gro_frags(), but I see two reasons to
> put it in the core instead.
> 1) The same issue is likely to be present in any napi_gro_frags()
> driver; I've just looked at e1000 and mlx4 (as examples) and it
> looks like they could both be susceptible if hw misbehaved in a
> similar way.
Yet it seems they do not misbehave?
How would XDP react to such a bug, btw?
I would vote to add the fix in a driver first.
If the same disease really turns out to be common to more than one vendor,
perhaps we could harden the core?