Message-ID: <9c58567a-f490-2d3e-7262-ade3ddd55785@kapsi.fi>
Date: Mon, 24 May 2021 15:48:54 +0300
From: Mikko Perttunen <cyndis@...si.fi>
To: Jon Hunter <jonathanh@...dia.com>,
Michał Mirosław <mirq-linux@...e.qmqm.pl>
Cc: Giuseppe Cavallaro <peppe.cavallaro@...com>,
Alexandre Torgue <alexandre.torgue@...s.st.com>,
Jose Abreu <joabreu@...opsys.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
linux-tegra <linux-tegra@...r.kernel.org>,
Thierry Reding <treding@...dia.com>
Subject: Re: [BUG] net: stmmac: Panic observed in stmmac_napi_poll_rx()
On 5/17/21 1:39 PM, Jon Hunter wrote:
>
> On 14/05/2021 22:49, Michał Mirosław wrote:
>> On Fri, May 14, 2021 at 03:24:58PM +0100, Jon Hunter wrote:
>>> Hello!
>>>
>>> I have been looking into some random crashes that appear to stem from
>>> the stmmac_napi_poll_rx() function. There are two different panics I
>>> have observed which are ...
>> [...]
>>> The bug being triggered in skbuff.h is the following ...
>>>
>>> void *skb_pull(struct sk_buff *skb, unsigned int len);
>>> static inline void *__skb_pull(struct sk_buff *skb, unsigned int len)
>>> {
>>> 	skb->len -= len;
>>> 	BUG_ON(skb->len < skb->data_len);
>>> 	return skb->data += len;
>>> }
>>>
>>> Looking into the above panic triggered in skbuff.h, when this occurs
>>> I have noticed that the value of skb->data_len is unusually large ...
>>>
>>> __skb_pull: len 1500 (14), data_len 4294967274
>> [...]
>>
>> The big value looks suspiciously similar to (unsigned)-EINVAL.
>
> Yes, it does, and at first I thought it was being set to -EINVAL.
> However, from tracing the length variables I can see that this is not
> the case.
>
>>> I then added some traces to stmmac_napi_poll_rx() and
>>> stmmac_rx_buf2_len() to trace the values of various various variables
>>> and when the problem occurs I see ...
>>>
>>> stmmac_napi_poll_rx: stmmac_rx: count 0, len 1518, buf1 66, buf2 1452
>>> stmmac_napi_poll_rx: stmmac_rx_buf2_len: len 66, plen 1518
>>> stmmac_napi_poll_rx: stmmac_rx: count 1, len 1518, buf1 66, buf2 1452
>>> stmmac_napi_poll_rx: stmmac_rx_buf2_len: len 66, plen 1536
>>> stmmac_napi_poll_rx: stmmac_rx: count 2, len 1602, buf1 66, buf2 1536
>>> stmmac_napi_poll_rx: stmmac_rx_buf2_len: len 1602, plen 1518
>>> stmmac_napi_poll_rx: stmmac_rx: count 2, len 1518, buf1 0, buf2 4294967212
>>> stmmac_napi_poll_rx: stmmac_rx: dma_buf_sz 1536, buf1 0, buf2 4294967212
>>
>> And this one to (unsigned)-EILSEQ.
>
> Yes, but this simply comes from 1518 - 1602 = -84, so the match with
> -EILSEQ is purely a coincidence.
>
> Jon
>
I dug around this a little bit. It looks like the issue occurs when we
get (pardon my terminology, I haven't dealt with networking stuff much)
a split packet.

What happens is that we first process the first frame, growing 'len'.
buf1_len, I think, hits the "First descriptor, get split header length"
case and comes out as 66, while buf2_len hits the rx_not_ls case and
comes out as 1536. In total, 1602.

Then the 'likely(status & rx_not_ls)' condition passes, we jump back to
the 'read_again' label and read the next frame. There we eventually
reach the buf2_len calculation again; stmmac_get_rx_frame_len() returns
1518 for this frame, which sounds reasonable -- that's what we normally
get for non-split frames. So what gets computed is 1518 - 1602, which
underflows.

I can dig around a bit more, but it would be nice if someone with a bit
more knowledge of the hardware could comment on the above.
Thanks,
Mikko