Message-ID: <9b34d60d-8de7-5384-3822-98ec79d53e04@gmail.com>
Date: Fri, 15 Mar 2019 21:26:08 +0100
From: Heiner Kallweit <hkallweit1@...il.com>
To: VDR User <user.vdr@...il.com>,
Alexander Duyck <alexander.h.duyck@...ux.intel.com>
Cc: netdev@...r.kernel.org
Subject: Re: r8169 driver from kernel 5.0 crashing - napi_consume_skb
On 15.03.2019 21:09, VDR User wrote:
>>>>> Thanks for the additional info and for testing 4.20.15.
>>>>> To rule out that the issue is caused by a regression in network or
>>>>> some other subsystem: Can you take the r8169.c from 4.20.15 and test
>>>>> it on top of 5.0?
>>>>> Meanwhile I'll look at the changes in the driver between 4.20 and 5.0.
>>>>
>>>> Sure, no problem! I'll copy the driver & recompile now actually.
>>>> Hopefully there aren't a ton of changes to r8169.c to sift through and
>>>> the cause isn't good at hiding itself!
>>>>
>>> I checked the driver changes that are new in 5.0 and there are very few
>>> functional changes. You could try to revert the following:
>>>
>>> 5317d5c6d47e ("r8169: use napi_consume_skb where possible")
>>
>> Will do, and fwiw, while I haven't been able to do tons of testing
>> today, I haven't been able to trigger the crash after replacing
>> 5.0.0's r8169.c with 4.20.15's r8169.c this morning. I'll restore the
>> file and revert the change you mentioned, and report back my findings.
>
> Heiner,
>
> After going back to vanilla kernel 5.0 and then reverting 5317d5c6d47e
> ("r8169: use napi_consume_skb where possible"), I so far have not had
> any crashes after transferring roughly 30GB back & forth. I'm not
> completely confident yet that the crash is resolved with that revert and
> will continue to do further testing throughout the weekend as well.
> What confidence level do you have that 5317d5c6d47e is the culprit at
> this point?
>
Good, thanks for testing. I simply see no other change since 4.20 that
could cause these symptoms.
Using napi_consume_skb() at this place in r8169.c looks safe to me.
Option 1 is that I'm missing something, option 2 is that there's an issue
in the NAPI subsystem. However, in the latter case I assume at least
the Mellanox and/or Intel guys would have observed the same issue
on their respective CI systems.
Let me add Alexander, maybe he can provide a hint before we go and
revert the change.
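
For reference, here is a rough sketch (not the actual r8169.c code, function
name is made up) of the pattern that 5317d5c6d47e introduces in the TX
completion path. The relevant difference: napi_consume_skb() with a non-zero
budget frees the skb through a per-CPU NAPI cache and is meant to be called
from NAPI poll (softirq) context, while with budget == 0 it falls back to
dev_consume_skb_any(), which is safe from any context.

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Illustrative only -- simplified, not the driver's actual function. */
static void example_tx_complete(struct sk_buff *skb, int budget)
{
	/* Before the change: safe regardless of calling context */
	/* dev_consume_skb_any(skb); */

	/* After the change: with budget != 0 the skb is batched in the
	 * per-CPU NAPI cache, so this should run in NAPI poll context;
	 * with budget == 0 it falls back to dev_consume_skb_any(). */
	napi_consume_skb(skb, budget);
}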
> Thanks,
> Derek
>
Heiner