lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <93AF473E2DA327428DE3D46B72B1E9FD411F1362@CHN-SV-EXMX02.mchp-main.com>
Date:   Wed, 21 Nov 2018 02:13:30 +0000
From:   <Tristram.Ha@...rochip.com>
To:     <Bryan.Whitehead@...rochip.com>
CC:     <davem@...emloft.net>, <netdev@...r.kernel.org>,
        <UNGLinuxDriver@...rochip.com>, <andrew@...n.ch>,
        <f.fainelli@...il.com>
Subject: RE: [PATCH v1 net] lan743x: fix return value for
 lan743x_tx_napi_poll

Slightly out of topic I am not sure why NAPI is used on the transmit side.
Originally NAPI was designed to fix the receive interrupt happening on each
receive frame problem, so on transmit side it is to avoid the transmit
done interrupt on each transmit frame?  Typically hardware has a way
to trigger transmit done interrupt or not in each transmit frame.

NAPI may have other uses in newer kernels that I am not aware of.

I notice 2 problems in the driver:

1. netif_napi_add is used instead of netif_tx_napi_add.
2. In all other drivers that use netif_tx_napi_add most do not call napi_complete_done.
They all call napi_complete directly and return 0.

freescale/gianfar.c
rocker/rocker_main.c
ti/cpsw.c

virtio_net.c does use napi_complete_done but it also passes 0 as a parameter.


> -----Original Message-----
> From: Florian Fainelli <f.fainelli@...il.com>
> Sent: Tuesday, November 20, 2018 2:12 PM
> To: Bryan Whitehead - C21958 <Bryan.Whitehead@...rochip.com>;
> andrew@...n.ch
> Cc: davem@...emloft.net; netdev@...r.kernel.org; UNGLinuxDriver
> <UNGLinuxDriver@...rochip.com>
> Subject: Re: [PATCH v1 net] lan743x: fix return value for
> lan743x_tx_napi_poll
> 
> On 11/20/18 1:39 PM, Bryan.Whitehead@...rochip.com wrote:
> >> -----Original Message-----
> >> From: Andrew Lunn <andrew@...n.ch>
> >> Sent: Tuesday, November 20, 2018 2:31 PM
> >> To: Bryan Whitehead - C21958 <Bryan.Whitehead@...rochip.com>
> >> Cc: davem@...emloft.net; netdev@...r.kernel.org; UNGLinuxDriver
> >> <UNGLinuxDriver@...rochip.com>
> >> Subject: Re: [PATCH v1 net] lan743x: fix return value for
> >> lan743x_tx_napi_poll
> >>
> >> On Tue, Nov 20, 2018 at 01:26:43PM -0500, Bryan Whitehead wrote:
> >>> It has been noticed that under stress the lan743x driver will
> >>> sometimes hang or cause a kernel panic. It has been noticed that
> >>> returning '0' instead of 'weight' fixes this issue.
> >>>
> >>> fixes: rare kernel panic under heavy traffic load.
> >>> Signed-off-by: Bryan Whitehead <Bryan.Whitehead@...rochip.com>
> >>
> >> Hi Bryan
> >>
> >> This sounds like a band aid over something which is broken, not a real fix.
> >>
> >> Can you show us the stack trace from the panic?
> >>
> >>     Andrew
> >
> > Andrew,
> >
> > Admittedly, my knowledge of what the kernel is doing behind the scenes is
> limited.
> >
> > But according to documentation found on
> > https://wiki.linuxfoundation.org/networking/napi
> >
> > It states the following
> > "The poll() function may also process TX completions, in which case if it
> processes
> > the entire TX ring then it should count that work as the rest of the budget.
> > Otherwise, TX completions are not counted."
> >
> > So based on that, the original driver was returning the full budget. But I was
> having
> > Issues with it. And the above documentation seems to suggest that I could
> return 0
> > As in "not counted" from above.
> >
> > I tried it, and my lock up issues disappeared.
> >
> > Regarding the kernel panic stack trace. So far its very hard to replicate that
> on the
> > latest kernel. I've seen it more frequently when back porting to older
> kernels such
> > as 4.14, and 4.9. This same fix caused those kernel panics to disappear.
> > Are you interested in seeing a stack dump from older kernels?
> >
> > In the latest kernel the issue manifests as a kernel message which states
> > "[  945.021101] enp48s0: Budget exhausted after napi rescheduled"
> >
> > I'm not sure what that means. But it does not lock up immediately after
> seeing that
> > Message. But it usually locks up with in a minute of seeing that message.
> >
> > And the sometimes I get the following warning
> > [ 1240.425020] ------------[ cut here ]------------
> > [ 1240.426014] NETDEV WATCHDOG: enp0s25 (e1000e): transmit queue 0
> timed out
> > [ 1240.430027] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:461
> dev_watchdog+0x1ef/0x200
> > [ 1240.430027] Modules linked in: lan743x
> > [ 1240.430027] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G          I       4.19.2 #1
> > [ 1240.430027] Hardware name: Hewlett-Packard HP Compaq dc7900
> Convertible Minitower/3032h, BIOS 786G1 v01.16 03/05/2009
> > [ 1240.430027] RIP: 0010:dev_watchdog+0x1ef/0x200
> > [ 1240.430027] Code: 00 48 63 4d e0 eb 93 4c 89 e7 c6 05 68 30 b3 00 01 e8 25
> 3d fd ff 89 d9 48 89 c2 4c 89 e6 48 c7 c7 98 92 48 ab e8 f1 28 87 ff <0f> 0b eb c0
> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 48 c7 47 08 00
> > [ 1240.430027] RSP: 0018:ffff98490be03e90 EFLAGS: 00010282
> > [ 1240.430027] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
> 0000000000000000
> > [ 1240.497168] RDX: 0000000000040400 RSI: 00000000000000f6 RDI:
> 0000000000000300
> > [ 1240.497168] RBP: ffff984908574440 R08: 0000000000000000 R09:
> 00000000000003a4
> > [ 1240.497168] R10: 0000000000000020 R11: ffffffffabc928ed R12:
> ffff984908574000
> > [ 1240.497168] R13: 0000000000000000 R14: 0000000000000000 R15:
> ffff98490be195b0
> > [ 1240.497168] FS:  0000000000000000(0000) GS:ffff98490be00000(0000)
> knlGS:0000000000000000
> > [ 1240.497168] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 1240.497168] CR2: 00007f31cd4c0000 CR3: 0000000109bca000 CR4:
> 00000000000406f0
> > [ 1240.497168] Call Trace:
> > [ 1240.497168]  <IRQ>
> > [ 1240.497168]  ? qdisc_reset+0xe0/0xe0
> > [ 1240.497168]  call_timer_fn+0x26/0x130
> > [ 1240.497168]  run_timer_softirq+0x1cd/0x400
> > [ 1240.497168]  ? hpet_interrupt_handler+0x10/0x30
> > [ 1240.497168]  __do_softirq+0xed/0x2aa
> > [ 1240.497168]  irq_exit+0xb7/0xc0
> > [ 1240.497168]  do_IRQ+0x45/0xd0
> > [ 1240.497168]  common_interrupt+0xf/0xf
> > [ 1240.497168]  </IRQ>
> > [ 1240.497168] RIP: 0010:cpuidle_enter_state+0xa6/0x330
> > [ 1240.497168] Code: 65 8b 3d 1d b0 4d 55 e8 58 6a 95 ff 48 89 c3 66 66 66 66
> 90 31 ff e8 59 73 95 ff 80 7c 24 0b 00 0f 85 25 02 00 00 fb 4c 29 eb <48> ba cf f7
> 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7 ea b8 ff
> > [ 1240.497168] RSP: 0018:ffffffffab603e60 EFLAGS: 00000216 ORIG_RAX:
> ffffffffffffffde
> > [ 1240.497168] RAX: ffff98490be20a80 RBX: 000000000081035c RCX:
> 00000120cf178c49
> > [ 1240.497168] RDX: 00000120cf178ca0 RSI: 00000120cf178ca0 RDI:
> 0000000000000000
> > [ 1240.497168] RBP: ffff984908fbd000 R08: fffffffb58ea5f9e R09:
> 000001208e0b48df
> > [ 1240.497168] R10: 00000000000018c4 R11: 0000000000002468 R12:
> 0000000000000002
> > [ 1240.497168] R13: 00000120ce968944 R14: ffffffffab6a68a0 R15:
> ffffffffab611740
> > [ 1240.497168]  do_idle+0x1da/0x230
> > [ 1240.497168]  cpu_startup_entry+0x6a/0x70
> > [ 1240.497168]  start_kernel+0x4a2/0x4c2
> > [ 1240.497168]  secondary_startup_64+0xa4/0xb0
> > [ 1240.497168] ---[ end trace c6f3be34c214db4e ]---
> >
> > Notice the warning is referring to a different adapter. So I suspect that
> whatever happened it froze
> > All network adapters.
> >
> > If you have suggestions let me know.
> 
> Did you look at the output of "perf top" or something along those lines
> to figure out if your lan743x driver is indeed responsible for that by
> not being scheduler friendly? What is likely happening is that you do
> not reclaim "weight" packets and instead keep looping into NAPI context,
> which prevents the system from making further progress.
> 
> Calling napi_complete_done() for the TX path is not necessary AFAICT,
> what you really want to do is call napi_complete() and make sure you:
> 
> - reclaim/free as many TX buffers as possible, without looking at the
> NAPI weight which becomes irrelevant
> - if you have been able to reclaim enough descriptors, wake-up the
> transmit queue
> 
> So ignoring the NAPI weight like you do is correct, but calling
> napi_complete_done() with a 0 argument does not sound correct to me.
> --
> Florian

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ