[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170710203050.54b2d8eb@redhat.com>
Date: Mon, 10 Jul 2017 20:30:50 +0200
From: Jesper Dangaard Brouer <brouer@...hat.com>
To: David Miller <davem@...emloft.net>
Cc: john.fastabend@...il.com, netdev@...r.kernel.org,
andy@...yhouse.net, daniel@...earbox.net, ast@...com,
alexander.duyck@...il.com, bjorn.topel@...el.com,
jakub.kicinski@...ronome.com, ecree@...arflare.com,
sgoutham@...ium.com, Yuval.Mintz@...ium.com, saeedm@...lanox.com,
brouer@...hat.com
Subject: Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants
On Sat, 8 Jul 2017 21:06:17 +0200
Jesper Dangaard Brouer <brouer@...hat.com> wrote:
> On Sat, 08 Jul 2017 10:46:18 +0100 (WEST)
> David Miller <davem@...emloft.net> wrote:
>
> > From: John Fastabend <john.fastabend@...il.com>
> > Date: Fri, 07 Jul 2017 10:48:36 -0700
> >
> > > On 07/07/2017 10:34 AM, John Fastabend wrote:
> > >> This series adds two new XDP helper routines bpf_redirect() and
> > >> bpf_redirect_map(). The first variant bpf_redirect() is meant
> > >> to be used the same way it is currently being used by the cls_bpf
> > >> classifier. An xdp packet will be redirected immediately when this
> > >> is called.
> > >
> > > Also other than the typo in the title there ;) I'm going to CC
> > > the driver maintainers working on XDP (makes for a long CC list but)
> > > because we would want to try and get support in as many as possible in
> > > the next merge window.
> > >
> > > For this rev I just implemented on ixgbe because I wrote the
> > > original XDP support there. I'll volunteer to do virtio as well.
> >
> > I went over this series a few times and it looks great to me.
> > You didn't even give me some coding style issues to pick on :-)
>
> We (Daniel, Andy and I) have been reviewing and improving on this
> patchset the last couple of weeks ;-). We had some stability issues,
> which is why it wasn't published earlier. My plan is to test this
> latest patchset again, Monday and Tuesday. I'll try to assess stability
> and provide some performance numbers.
Damn, I though it was stable, I have been running a lot of performance
tests, and then this just happened :-(
[11357.149486] BUG: unable to handle kernel NULL pointer dereference at 0000000000000210
[11357.157393] IP: __dev_map_flush+0x58/0x90
[11357.161446] PGD 3ff685067
[11357.161446] P4D 3ff685067
[11357.164199] PUD 3ff684067
[11357.166947] PMD 0
[11357.170396]
[11357.173981] Oops: 0000 [#1] PREEMPT SMP
[11357.177859] Modules linked in: coretemp kvm_intel kvm irqbypass intel_cstate intel_uncore intel_rapl_perf mxm_wmi i2c_i801 pcspkr sg i2c_co]
[11357.203021] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.12.0-net-next-xdp_redirect02-rfc+ #135
[11357.211706] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z97 Extreme4, BIOS P2.10 05/12/2015
[11357.221606] task: ffffffff81c0e480 task.stack: ffffffff81c00000
[11357.227568] RIP: 0010:__dev_map_flush+0x58/0x90
[11357.232138] RSP: 0018:ffff88041fa03de0 EFLAGS: 00010286
[11357.237409] RAX: 0000000000000000 RBX: ffff8803fc996600 RCX: 0000000000000003
[11357.244589] RDX: ffff88040d0bf480 RSI: 0000000000000000 RDI: ffffffff81d901d8
[11357.251767] RBP: ffff88041fa03df8 R08: fffffffffffffffc R09: 0000000700000008
[11357.258940] R10: ffffffff813f11d0 R11: 0000000000000040 R12: ffffe8ffffc014a0
[11357.266119] R13: 0000000000000003 R14: 000000000000003c R15: ffff8803fc9c26c0
[11357.273294] FS: 0000000000000000(0000) GS:ffff88041fa00000(0000) knlGS:0000000000000000
[11357.281454] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11357.287244] CR2: 0000000000000210 CR3: 00000003fc41e000 CR4: 00000000001406f0
[11357.294423] Call Trace:
[11357.296912] <IRQ>
[11357.298967] xdp_do_flush_map+0x36/0x40
[11357.302847] ixgbe_poll+0x7ea/0x1370 [ixgbe]
[11357.307160] net_rx_action+0x247/0x3e0
[11357.310957] __do_softirq+0x106/0x2cb
[11357.314664] irq_exit+0xbe/0xd0
[11357.317851] do_IRQ+0x4f/0xd0
[11357.320858] common_interrupt+0x86/0x86
[11357.324733] RIP: 0010:poll_idle+0x2f/0x5a
[11357.328781] RSP: 0018:ffffffff81c03dd0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff8e
[11357.336426] RAX: 0000000000000000 RBX: ffffffff81d689e0 RCX: 0000000000200000
[11357.343605] RDX: 0000000000000000 RSI: ffffffff81c0e480 RDI: ffff88041fa22800
[11357.350783] RBP: ffffffff81c03dd0 R08: 00000000000003c5 R09: 0000000000000018
[11357.357958] R10: 0000000000000327 R11: 0000000000000390 R12: ffff88041fa22800
[11357.365135] R13: ffffffff81d689f8 R14: 0000000000000000 R15: ffffffff81d689e0
[11357.372311] </IRQ>
[11357.374453] cpuidle_enter_state+0xf2/0x2e0
[11357.378678] cpuidle_enter+0x17/0x20
[11357.382299] call_cpuidle+0x23/0x40
[11357.385834] do_idle+0xe8/0x190
[11357.389024] cpu_startup_entry+0x1d/0x20
[11357.392993] rest_init+0xd5/0xe0
[11357.396268] start_kernel+0x3d7/0x3e4
[11357.399979] x86_64_start_reservations+0x2a/0x2c
[11357.404641] x86_64_start_kernel+0x178/0x18b
[11357.408959] secondary_startup_64+0x9f/0x9f
[11357.413186] ? secondary_startup_64+0x9f/0x9f
[11357.417589] Code: 41 89 c5 48 8b 53 60 44 89 e8 48 8d 14 c2 48 8b 12 48 85 d2 74 23 48 8b 3a f0 49 0f b3 04 24 48 85 ff 74 15 48 8b 87 e0 0
[11357.436565] RIP: __dev_map_flush+0x58/0x90 RSP: ffff88041fa03de0
[11357.442613] CR2: 0000000000000210
[11357.445972] ---[ end trace f7ed232095169a98 ]---
[11357.450637] Kernel panic - not syncing: Fatal exception in interrupt
[11357.457038] Kernel Offset: disabled
[11357.460566] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
(gdb) list *(__dev_map_flush)+0x58
0xffffffff811422a8 is in __dev_map_flush (kernel/bpf/devmap.c:257).
252 continue;
253
254 netdev = dev->dev;
255
256 clear_bit(bit, bitmap);
257 if (unlikely(!netdev || !netdev->netdev_ops->ndo_xdp_flush))
258 continue;
259
260 netdev->netdev_ops->ndo_xdp_flush(netdev);
261 }
My analysis it that "netdev->netdev_ops == NULL" as:
NULL pointer dereference at 0000000000000210
$ echo $((0x210))
528
As ndo_xdp_flush is at offset 528 in struct net_device_ops.
$ pahole -C net_device_ops vmlinux
void (*ndo_xdp_flush)(struct net_device *); /* 528 8 */
But I cannot see where/what will change netdev->netdev_ops to be NULL?!?
> I've complained/warned about the danger of redirecting with XDP,
> without providing (1) a way to debug/see XDP redirects, (2) a way
> interfaces opt-in they can be redirected. (1) is solved by patch-07/12
> via a tracepoint. (2) is currently done by only forwarding to
> interfaces with an XDP program loaded itself, this also comes from a
> practical need for NIC drivers to allocate XDP-TX HW queues.
>
> I'm not satisfied with the (UAPI) name for the new map
> "BPF_MAP_TYPE_DEVMAP" and the filename this is placed in
> "kernel/bpf/devmap.c", as we want to take advantage of compiler
> inlining for the next redirect map types. (1) because the name doesn't
> tell the user that this map is connected to the redirect_map call.
> (2) we want to introduce other kinds of redirect maps (like redirect to
> CPUs and sockets), and it would be good if they shared a common "text"
> string.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
Powered by blists - more mailing lists