netdev - Re: [PATCH v1 net] lan743x: fix return value for lan743x_tx_napi

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <27c637a7-a988-1acb-b967-70f5719306a1@gmail.com>
Date:   Tue, 20 Nov 2018 14:11:38 -0800
From:   Florian Fainelli <f.fainelli@...il.com>
To:     Bryan.Whitehead@...rochip.com, andrew@...n.ch
Cc:     davem@...emloft.net, netdev@...r.kernel.org,
        UNGLinuxDriver@...rochip.com
Subject: Re: [PATCH v1 net] lan743x: fix return value for lan743x_tx_napi_poll

On 11/20/18 1:39 PM, Bryan.Whitehead@...rochip.com wrote:
>> -----Original Message-----
>> From: Andrew Lunn <andrew@...n.ch>
>> Sent: Tuesday, November 20, 2018 2:31 PM
>> To: Bryan Whitehead - C21958 <Bryan.Whitehead@...rochip.com>
>> Cc: davem@...emloft.net; netdev@...r.kernel.org; UNGLinuxDriver
>> <UNGLinuxDriver@...rochip.com>
>> Subject: Re: [PATCH v1 net] lan743x: fix return value for
>> lan743x_tx_napi_poll
>>
>> On Tue, Nov 20, 2018 at 01:26:43PM -0500, Bryan Whitehead wrote:
>>> It has been noticed that under stress the lan743x driver will
>>> sometimes hang or cause a kernel panic. It has been noticed that
>>> returning '0' instead of 'weight' fixes this issue.
>>>
>>> fixes: rare kernel panic under heavy traffic load.
>>> Signed-off-by: Bryan Whitehead <Bryan.Whitehead@...rochip.com>
>>
>> Hi Bryan
>>
>> This sounds like a band aid over something which is broken, not a real fix.
>>
>> Can you show us the stack trace from the panic?
>>
>>     Andrew
> 
> Andrew,
> 
> Admittedly, my knowledge of what the kernel is doing behind the scenes is limited.
> 
> But according to documentation found on 
> https://wiki.linuxfoundation.org/networking/napi
> 
> It states the following
> "The poll() function may also process TX completions, in which case if it processes
> the entire TX ring then it should count that work as the rest of the budget.
> Otherwise, TX completions are not counted."
> 
> So based on that, the original driver was returning the full budget. But I was having
> Issues with it. And the above documentation seems to suggest that I could return 0
> As in "not counted" from above.
> 
> I tried it, and my lock up issues disappeared.
> 
> Regarding the kernel panic stack trace. So far its very hard to replicate that on the 
> latest kernel. I've seen it more frequently when back porting to older kernels such
> as 4.14, and 4.9. This same fix caused those kernel panics to disappear.
> Are you interested in seeing a stack dump from older kernels?
> 
> In the latest kernel the issue manifests as a kernel message which states
> "[  945.021101] enp48s0: Budget exhausted after napi rescheduled"
> 
> I'm not sure what that means. But it does not lock up immediately after seeing that
> Message. But it usually locks up with in a minute of seeing that message.
> 
> And the sometimes I get the following warning
> [ 1240.425020] ------------[ cut here ]------------
> [ 1240.426014] NETDEV WATCHDOG: enp0s25 (e1000e): transmit queue 0 timed out
> [ 1240.430027] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:461 dev_watchdog+0x1ef/0x200
> [ 1240.430027] Modules linked in: lan743x
> [ 1240.430027] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G          I       4.19.2 #1
> [ 1240.430027] Hardware name: Hewlett-Packard HP Compaq dc7900 Convertible Minitower/3032h, BIOS 786G1 v01.16 03/05/2009
> [ 1240.430027] RIP: 0010:dev_watchdog+0x1ef/0x200
> [ 1240.430027] Code: 00 48 63 4d e0 eb 93 4c 89 e7 c6 05 68 30 b3 00 01 e8 25 3d fd ff 89 d9 48 89 c2 4c 89 e6 48 c7 c7 98 92 48 ab e8 f1 28 87 ff <0f> 0b eb c0 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 48 c7 47 08 00
> [ 1240.430027] RSP: 0018:ffff98490be03e90 EFLAGS: 00010282
> [ 1240.430027] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [ 1240.497168] RDX: 0000000000040400 RSI: 00000000000000f6 RDI: 0000000000000300
> [ 1240.497168] RBP: ffff984908574440 R08: 0000000000000000 R09: 00000000000003a4
> [ 1240.497168] R10: 0000000000000020 R11: ffffffffabc928ed R12: ffff984908574000
> [ 1240.497168] R13: 0000000000000000 R14: 0000000000000000 R15: ffff98490be195b0
> [ 1240.497168] FS:  0000000000000000(0000) GS:ffff98490be00000(0000) knlGS:0000000000000000
> [ 1240.497168] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1240.497168] CR2: 00007f31cd4c0000 CR3: 0000000109bca000 CR4: 00000000000406f0
> [ 1240.497168] Call Trace:
> [ 1240.497168]  <IRQ>
> [ 1240.497168]  ? qdisc_reset+0xe0/0xe0
> [ 1240.497168]  call_timer_fn+0x26/0x130
> [ 1240.497168]  run_timer_softirq+0x1cd/0x400
> [ 1240.497168]  ? hpet_interrupt_handler+0x10/0x30
> [ 1240.497168]  __do_softirq+0xed/0x2aa
> [ 1240.497168]  irq_exit+0xb7/0xc0
> [ 1240.497168]  do_IRQ+0x45/0xd0
> [ 1240.497168]  common_interrupt+0xf/0xf
> [ 1240.497168]  </IRQ>
> [ 1240.497168] RIP: 0010:cpuidle_enter_state+0xa6/0x330
> [ 1240.497168] Code: 65 8b 3d 1d b0 4d 55 e8 58 6a 95 ff 48 89 c3 66 66 66 66 90 31 ff e8 59 73 95 ff 80 7c 24 0b 00 0f 85 25 02 00 00 fb 4c 29 eb <48> ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7 ea b8 ff
> [ 1240.497168] RSP: 0018:ffffffffab603e60 EFLAGS: 00000216 ORIG_RAX: ffffffffffffffde
> [ 1240.497168] RAX: ffff98490be20a80 RBX: 000000000081035c RCX: 00000120cf178c49
> [ 1240.497168] RDX: 00000120cf178ca0 RSI: 00000120cf178ca0 RDI: 0000000000000000
> [ 1240.497168] RBP: ffff984908fbd000 R08: fffffffb58ea5f9e R09: 000001208e0b48df
> [ 1240.497168] R10: 00000000000018c4 R11: 0000000000002468 R12: 0000000000000002
> [ 1240.497168] R13: 00000120ce968944 R14: ffffffffab6a68a0 R15: ffffffffab611740
> [ 1240.497168]  do_idle+0x1da/0x230
> [ 1240.497168]  cpu_startup_entry+0x6a/0x70
> [ 1240.497168]  start_kernel+0x4a2/0x4c2
> [ 1240.497168]  secondary_startup_64+0xa4/0xb0
> [ 1240.497168] ---[ end trace c6f3be34c214db4e ]---
> 
> Notice the warning is referring to a different adapter. So I suspect that whatever happened it froze
> All network adapters.
> 
> If you have suggestions let me know.

Did you look at the output of "perf top" or something along those lines
to figure out if your lan743x driver is indeed responsible for that by
not being scheduler friendly? What is likely happening is that you do
not reclaim "weight" packets and instead keep looping into NAPI context,
which prevents the system from making further progress.

Calling napi_complete_done() for the TX path is not necessary AFAICT,
what you really want to do is call napi_complete() and make sure you:

- reclaim/free as many TX buffers as possible, without looking at the
NAPI weight which becomes irrelevant
- if you have been able to reclaim enough descriptors, wake-up the
transmit queue

So ignoring the NAPI weight like you do is correct, but calling
napi_complete_done() with a 0 argument does not sound correct to me.
-- 
Florian