[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <0b45c78f-3dcc-9946-a372-25428ce1c046@microchip.com>
Date: Mon, 11 Mar 2019 08:58:12 +0000
From: <Claudiu.Beznea@...rochip.com>
To: <harinik@...inx.com>, <krnl@...h.eu>, <netdev@...r.kernel.org>
CC: <davem@...emloft.net>, <michals@...inx.com>,
<Nicolas.Ferre@...rochip.com>, <harinikatakamlinux@...il.com>
Subject: Re: Cadence/macb ethernet driver bug on nonlinear skb buffers
Hi Klaus,
Thanks for reporting this.
On 09.03.2019 07:50, Harini Katakam wrote:
> Hi Klaus,
>
>> -----Original Message-----
>> From: Klaus Doth [mailto:krnl@...h.eu]
>> Sent: Friday, March 8, 2019 10:19 PM
>> To: netdev@...r.kernel.org
>> Cc: davem@...emloft.net; claudiu.beznea@...rochip.com; Harini Katakam
>> <harinik@...inx.com>; Michal Simek <michals@...inx.com>; Nicolas Ferre
>> <nicolas.ferre@...rochip.com>
>> Subject: Cadence/macb ethernet driver bug on nonlinear skb buffers
>>
>> Hi,
>>
>>
>> I think I found a bug in the cadence / macb ethernet driver.
>>
>> It seems the macb_pad_and_fcs function in macb_main.c does not handle cases
>> of fragmented/paged sk-buffers correctly, as sometimes a memmove and
>> afterwards skb_put_u8 is done on fragmented buffers.
I will also take a look at this. I though I have also tested theses scenario with
my pktgen tests. Sorry for the issue.
>> skb_put_u8 then fails as
>> it checks if the buffer is nonlinear.
>>
>>
>> My setup is a Xilinx ZynqMP using two macb ethernet ports, which are combined
>> in a bridge interface. As long as only those two interfaces are bridged,
>> everything works fine, but as soon as I add a wireless AP interface to it, and then
>> connect to the wireless interface using any WiFi enabled device, the kernel
>> panics with the message appended at the bottom of this email. I am currently
>> running Kernel 5.0.0-rc8, so this issue is in the current mainline kernel, and as far
>> as I can see also in the stable branch.
>>
>>
>> I did some debugging and traced the issue to the macb_pad_and_fcs function,
>> and it only occurs for fragmented sk-buffers.
>>
>> If I understand the code correctly, the buffer should not be moved by using
>> memmove and afterwards the free tailroom be used for FCS if the buffer is
>> fragmented. Instead the buffer should be copied, and thus combining the
>> fragmented buffer into one non-fragmented one, as skb_put_u8 does not work
>> on fragmented buffers. However as I am not too deep into kernel network
>> drivers, there may be a better solution, or I could have missed something
>> important.
>>
>>
>> Currently my system is running, after I changed the first line of static int
>> macb_pad_and_fcs(struct sk_buff **skb, struct net_device *ndev) from
>>
>> bool cloned = skb_cloned(*skb) || skb_header_cloned(*skb);
>>
>> to
>>
>> bool cloned = skb_cloned(*skb) || skb_header_cloned(*skb) ||
>> skb_is_nonlinear(*skb);
>>
>>
>> I.e. handle any nonlinear buffer as if it was cloned. Thus force the function into
>> copying the buffer for increasing its size.
>>
>>
>> Before the change, the kernel panicked after a few seconds of running data over
>> the network bridge, which could be reproduced every time this connection is
>> attempted. After the change it is running for over a day now continuously
>> without any issues, or any visible data loss.
>>
>
> Thanks for the debug.
> Yes, this is a bug - we recently noticed this on ZynqMP and temporarily disabled
> this functionality for fragmented packets before finding a clean solution.
> I noticed this error in one of the functional tests using pktgen.
>
> Regards,
> Harini
>
Powered by blists - more mailing lists