lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 10 Jul 2017 20:30:50 +0200
From:   Jesper Dangaard Brouer <brouer@...hat.com>
To:     David Miller <davem@...emloft.net>
Cc:     john.fastabend@...il.com, netdev@...r.kernel.org,
        andy@...yhouse.net, daniel@...earbox.net, ast@...com,
        alexander.duyck@...il.com, bjorn.topel@...el.com,
        jakub.kicinski@...ronome.com, ecree@...arflare.com,
        sgoutham@...ium.com, Yuval.Mintz@...ium.com, saeedm@...lanox.com,
        brouer@...hat.com
Subject: Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

On Sat, 8 Jul 2017 21:06:17 +0200
Jesper Dangaard Brouer <brouer@...hat.com> wrote:

> On Sat, 08 Jul 2017 10:46:18 +0100 (WEST)
> David Miller <davem@...emloft.net> wrote:
> 
> > From: John Fastabend <john.fastabend@...il.com>
> > Date: Fri, 07 Jul 2017 10:48:36 -0700
> >   
> > > On 07/07/2017 10:34 AM, John Fastabend wrote:    
> > >> This series adds two new XDP helper routines bpf_redirect() and
> > >> bpf_redirect_map(). The first variant bpf_redirect() is meant
> > >> to be used the same way it is currently being used by the cls_bpf
> > >> classifier. An xdp packet will be redirected immediately when this
> > >> is called.    
> > > 
> > > Also other than the typo in the title there ;) I'm going to CC
> > > the driver maintainers working on XDP (makes for a long CC list but)
> > > because we would want to try and get support in as many as possible in
> > > the next merge window.
> > > 
> > > For this rev I just implemented on ixgbe because I wrote the
> > > original XDP support there. I'll volunteer to do virtio as well.    
> > 
> > I went over this series a few times and it looks great to me.
> > You didn't even give me some coding style issues to pick on :-)  
> 
> We (Daniel, Andy and I) have been reviewing and improving on this
> patchset the last couple of weeks ;-).  We had some stability issues,
> which is why it wasn't published earlier. My plan is to test this
> latest patchset again, Monday and Tuesday. I'll try to assess stability
> and provide some performance numbers.


Damn, I though it was stable, I have been running a lot of performance
tests, and then this just happened :-(

[11357.149486] BUG: unable to handle kernel NULL pointer dereference at 0000000000000210
[11357.157393] IP: __dev_map_flush+0x58/0x90
[11357.161446] PGD 3ff685067 
[11357.161446] P4D 3ff685067 
[11357.164199] PUD 3ff684067 
[11357.166947] PMD 0 
[11357.170396] 
[11357.173981] Oops: 0000 [#1] PREEMPT SMP
[11357.177859] Modules linked in: coretemp kvm_intel kvm irqbypass intel_cstate intel_uncore intel_rapl_perf mxm_wmi i2c_i801 pcspkr sg i2c_co]
[11357.203021] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.12.0-net-next-xdp_redirect02-rfc+ #135
[11357.211706] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z97 Extreme4, BIOS P2.10 05/12/2015
[11357.221606] task: ffffffff81c0e480 task.stack: ffffffff81c00000
[11357.227568] RIP: 0010:__dev_map_flush+0x58/0x90
[11357.232138] RSP: 0018:ffff88041fa03de0 EFLAGS: 00010286
[11357.237409] RAX: 0000000000000000 RBX: ffff8803fc996600 RCX: 0000000000000003
[11357.244589] RDX: ffff88040d0bf480 RSI: 0000000000000000 RDI: ffffffff81d901d8
[11357.251767] RBP: ffff88041fa03df8 R08: fffffffffffffffc R09: 0000000700000008
[11357.258940] R10: ffffffff813f11d0 R11: 0000000000000040 R12: ffffe8ffffc014a0
[11357.266119] R13: 0000000000000003 R14: 000000000000003c R15: ffff8803fc9c26c0
[11357.273294] FS:  0000000000000000(0000) GS:ffff88041fa00000(0000) knlGS:0000000000000000
[11357.281454] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11357.287244] CR2: 0000000000000210 CR3: 00000003fc41e000 CR4: 00000000001406f0
[11357.294423] Call Trace:
[11357.296912]  <IRQ>
[11357.298967]  xdp_do_flush_map+0x36/0x40
[11357.302847]  ixgbe_poll+0x7ea/0x1370 [ixgbe]
[11357.307160]  net_rx_action+0x247/0x3e0
[11357.310957]  __do_softirq+0x106/0x2cb
[11357.314664]  irq_exit+0xbe/0xd0
[11357.317851]  do_IRQ+0x4f/0xd0
[11357.320858]  common_interrupt+0x86/0x86
[11357.324733] RIP: 0010:poll_idle+0x2f/0x5a
[11357.328781] RSP: 0018:ffffffff81c03dd0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff8e
[11357.336426] RAX: 0000000000000000 RBX: ffffffff81d689e0 RCX: 0000000000200000
[11357.343605] RDX: 0000000000000000 RSI: ffffffff81c0e480 RDI: ffff88041fa22800
[11357.350783] RBP: ffffffff81c03dd0 R08: 00000000000003c5 R09: 0000000000000018
[11357.357958] R10: 0000000000000327 R11: 0000000000000390 R12: ffff88041fa22800
[11357.365135] R13: ffffffff81d689f8 R14: 0000000000000000 R15: ffffffff81d689e0
[11357.372311]  </IRQ>
[11357.374453]  cpuidle_enter_state+0xf2/0x2e0
[11357.378678]  cpuidle_enter+0x17/0x20
[11357.382299]  call_cpuidle+0x23/0x40
[11357.385834]  do_idle+0xe8/0x190
[11357.389024]  cpu_startup_entry+0x1d/0x20
[11357.392993]  rest_init+0xd5/0xe0
[11357.396268]  start_kernel+0x3d7/0x3e4
[11357.399979]  x86_64_start_reservations+0x2a/0x2c
[11357.404641]  x86_64_start_kernel+0x178/0x18b
[11357.408959]  secondary_startup_64+0x9f/0x9f
[11357.413186]  ? secondary_startup_64+0x9f/0x9f
[11357.417589] Code: 41 89 c5 48 8b 53 60 44 89 e8 48 8d 14 c2 48 8b 12 48 85 d2 74 23 48 8b 3a f0 49 0f b3 04 24 48 85 ff 74 15 48 8b 87 e0 0 
[11357.436565] RIP: __dev_map_flush+0x58/0x90 RSP: ffff88041fa03de0
[11357.442613] CR2: 0000000000000210
[11357.445972] ---[ end trace f7ed232095169a98 ]---
[11357.450637] Kernel panic - not syncing: Fatal exception in interrupt
[11357.457038] Kernel Offset: disabled
[11357.460566] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

(gdb) list *(__dev_map_flush)+0x58
0xffffffff811422a8 is in __dev_map_flush (kernel/bpf/devmap.c:257).
252				continue;
253	
254			netdev = dev->dev;
255	
256			clear_bit(bit, bitmap);
257			if (unlikely(!netdev || !netdev->netdev_ops->ndo_xdp_flush))
258				continue;
259	
260			netdev->netdev_ops->ndo_xdp_flush(netdev);
261		}

My analysis it that "netdev->netdev_ops == NULL" as:

 NULL pointer dereference at 0000000000000210
 $ echo $((0x210))
 528

As ndo_xdp_flush is at offset 528 in struct net_device_ops.

 $ pahole -C net_device_ops vmlinux
 void (*ndo_xdp_flush)(struct net_device *); /*   528     8 */

But I cannot see where/what will change netdev->netdev_ops to be NULL?!?


> I've complained/warned about the danger of redirecting with XDP,
> without providing (1) a way to debug/see XDP redirects, (2) a way
> interfaces opt-in they can be redirected. (1) is solved by patch-07/12
> via a tracepoint. (2) is currently done by only forwarding to
> interfaces with an XDP program loaded itself, this also comes from a
> practical need for NIC drivers to allocate XDP-TX HW queues.
> 
> I'm not satisfied with the (UAPI) name for the new map
> "BPF_MAP_TYPE_DEVMAP" and the filename this is placed in
> "kernel/bpf/devmap.c", as we want to take advantage of compiler
> inlining for the next redirect map types.  (1) because the name doesn't
> tell the user that this map is connected to the redirect_map call.
> (2) we want to introduce other kinds of redirect maps (like redirect to
> CPUs and sockets), and it would be good if they shared a common "text"
> string.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ