Message-ID: <9c58567a-f490-2d3e-7262-ade3ddd55785@kapsi.fi>
Date: Mon, 24 May 2021 15:48:54 +0300
From: Mikko Perttunen <cyndis@...si.fi>
To: Jon Hunter <jonathanh@...dia.com>,
Michał Mirosław <mirq-linux@...e.qmqm.pl>
Cc: Giuseppe Cavallaro <peppe.cavallaro@...com>,
Alexandre Torgue <alexandre.torgue@...s.st.com>,
Jose Abreu <joabreu@...opsys.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
linux-tegra <linux-tegra@...r.kernel.org>,
Thierry Reding <treding@...dia.com>
Subject: Re: [BUG] net: stmmac: Panic observed in stmmac_napi_poll_rx()
On 5/17/21 1:39 PM, Jon Hunter wrote:
>
> On 14/05/2021 22:49, Michał Mirosław wrote:
>> On Fri, May 14, 2021 at 03:24:58PM +0100, Jon Hunter wrote:
>>> Hello!
>>>
>>> I have been looking into some random crashes that appear to stem from
>>> the stmmac_napi_poll_rx() function. There are two different panics I
>>> have observed which are ...
>> [...]
>>> The bug being triggered in skbuff.h is the following ...
>>>
>>> void *skb_pull(struct sk_buff *skb, unsigned int len);
>>> static inline void *__skb_pull(struct sk_buff *skb, unsigned int len)
>>> {
>>> 	skb->len -= len;
>>> 	BUG_ON(skb->len < skb->data_len);
>>> 	return skb->data += len;
>>> }
>>>
>>> Looking into the above panic triggered in skbuff.h, when this occurs
>>> I have noticed that the value of skb->data_len is unusually large ...
>>>
>>> __skb_pull: len 1500 (14), data_len 4294967274
>> [...]
>>
>> The big value looks suspiciously similar to (unsigned)-EINVAL.
>
> Yes, it does, and at first I thought it was being set to -EINVAL.
> However, from tracing the length variables I can see that this is not
> the case.
>
>>> I then added some traces to stmmac_napi_poll_rx() and
>>> stmmac_rx_buf2_len() to trace the values of various various variables
>>> and when the problem occurs I see ...
>>>
>>> stmmac_napi_poll_rx: stmmac_rx: count 0, len 1518, buf1 66, buf2 1452
>>> stmmac_napi_poll_rx: stmmac_rx_buf2_len: len 66, plen 1518
>>> stmmac_napi_poll_rx: stmmac_rx: count 1, len 1518, buf1 66, buf2 1452
>>> stmmac_napi_poll_rx: stmmac_rx_buf2_len: len 66, plen 1536
>>> stmmac_napi_poll_rx: stmmac_rx: count 2, len 1602, buf1 66, buf2 1536
>>> stmmac_napi_poll_rx: stmmac_rx_buf2_len: len 1602, plen 1518
>>> stmmac_napi_poll_rx: stmmac_rx: count 2, len 1518, buf1 0, buf2 4294967212
>>> stmmac_napi_poll_rx: stmmac_rx: dma_buf_sz 1536, buf1 0, buf2 4294967212
>>
>> And this one to (unsigned)-EILSEQ.
>
> Yes, but this simply comes from 1518 - 1602 = -84, so the match with
> -EILSEQ is purely a coincidence.
>
> Jon
>
I dug around this a little bit. It looks like the issue occurs when we
get (pardon my terminology, I haven't dealt with networking stuff much)
a split packet.

What happens is that we first process the first frame, growing 'len'.
buf1_len, I think, hits the "First descriptor, get split header length"
case and comes out as 66, while buf2_len hits the rx_not_ls case and
comes out as 1536. In total, 1602.

Then the 'likely(status & rx_not_ls)' condition passes, we jump back to
the 'read_again' label and read the next frame. There we eventually
reach the buf2_len calculation again; stmmac_get_rx_frame_len() returns
1518 for this frame, which sounds reasonable -- that's what we normally
get for non-split frames. So what gets computed is 1518 - 1602, which
underflows.

I can dig around a bit more, but it would be nice if someone with a bit
more knowledge of the hardware could comment on the above.
Thanks,
Mikko