Message-ID: <52d793f86d36baac455630a03d76f09a388e549f.camel@mellanox.com>
Date:   Tue, 26 May 2020 21:23:11 +0000
From:   Saeed Mahameed <saeedm@...lanox.com>
To:     "dsahern@...il.com" <dsahern@...il.com>,
        "daniel@...earbox.net" <daniel@...earbox.net>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "brouer@...hat.com" <brouer@...hat.com>
Subject: Re: bpf-next/net-next: panic using bpf_xdp_adjust_head

On Tue, 2020-05-26 at 13:04 -0600, David Ahern wrote:
> bpf-next and net-next are panicking when a bpf program uses adjust_head -
> e.g., popping a vlan header.
> 
> [ 7269.886684] BUG: kernel NULL pointer dereference, address: 0000000000000004
> [ 7269.893676] #PF: supervisor read access in kernel mode
> [ 7269.898821] #PF: error_code(0x0000) - not-present page
> [ 7269.903970] PGD 0 P4D 0
> [ 7269.906516] Oops: 0000 [#1] SMP PTI
> [ 7269.910021] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Tainted: G          I       5.7.0-rc6+ #221
> [ 7269.919076] Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 1.6.12 11/20/2018
> [ 7269.926661] RIP: 0010:__memmove+0x24/0x1a0
> [ 7269.930766] Code: cc cc cc cc cc cc 48 89 f8 48 39 fe 7d 0f 49 89 f0 49 01 d0 49 39 f8 0f 8f a9 00 00 00 48 83 fa 20 0f 82 f5 00 00 00 48 89 d1 <f3> a4 c3 48 81 fa a8 02 00 00 72 05 40 38 fe 74 3b 48 83 ea 20 48
> [ 7269.949548] RSP: 0018:ffff9c09cca04c68 EFLAGS: 00010282
> [ 7269.954781] RAX: 0000000000000008 RBX: ffff9c09cca04d78 RCX: ffff8bfc475a20fc
> [ 7269.961927] RDX: ffff8bfc475a20fc RSI: 0000000000000004 RDI: 0000000000000008
> [ 7269.969068] RBP: ffff8bfc475a2104 R08: ffff8bfc475a2100 R09: ffff8bfc475a211c
> [ 7269.976229] R10: 0000000000000012 R11: 0000000000000008 R12: 0000000000000004
> [ 7269.983376] R13: ffff9c09cc9f57b8 R14: ffff8bfc475a2100 R15: 0000000000000008
> [ 7269.990518] FS:  0000000000000000(0000) GS:ffff8c011f240000(0000) knlGS:0000000000000000
> [ 7269.998623] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 7270.004381] CR2: 0000000000000004 CR3: 0000001a72a0a004 CR4: 00000000007626e0
> [ 7270.011523] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 7270.018682] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 7270.025824] PKRU: 55555554
> [ 7270.028539] Call Trace:
> [ 7270.030990]  <IRQ>

It looks like xdp->data_meta has an invalid value, and I think its
boundaries should be checked in bpf_xdp_adjust_head() regardless of the
issue you are seeing.
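
As a rough illustration of the check suggested above, here is a user-space model (not kernel code) of adjust_head with an extra sanity check on data_meta. Everything in it is hypothetical: struct xdp_model, model_adjust_head() and MODEL_RESERVED stand in for xdp_buff's data_hard_start/data_meta/data/data_end pointers and for the headroom kept in front of the metadata. Positions are offsets into one buffer so the bounds comparisons are well defined in C.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model, not kernel code: offsets into one buffer stand in for
 * xdp_buff's data_hard_start (offset 0), data_meta, data and data_end. */
struct xdp_model {
	unsigned char *buf;
	size_t meta, data, end;
};

#define MODEL_ETH_HLEN 14 /* Ethernet header length */
#define MODEL_RESERVED 32 /* assumed headroom kept in front of any metadata */

/* adjust_head with an extra data_meta sanity check (hypothetical).
 * Returns 0 on success, -1 if the request or the buffer state is rejected. */
int model_adjust_head(struct xdp_model *x, long offset)
{
	int meta_valid = x->meta <= x->data; /* data_meta > data means "invalid" */
	size_t metalen = meta_valid ? x->data - x->meta : 0;
	long new_data = (long)x->data + offset;

	/* Extra check: a valid data_meta must not point into the reserved area. */
	if (meta_valid && x->meta < MODEL_RESERVED)
		return -1;
	/* Keep the reserved area plus metadata in front of the new data start,
	 * and at least one Ethernet header before data_end. */
	if (new_data < (long)(MODEL_RESERVED + metalen) ||
	    new_data > (long)x->end - MODEL_ETH_HLEN)
		return -1;

	/* Slide the metadata so it stays directly in front of the packet. */
	if (metalen)
		memmove(x->buf + x->meta + offset, x->buf + x->meta, metalen);
	/* Moving meta by the same offset also keeps the invalid marker intact. */
	x->meta = (size_t)((long)x->meta + offset);
	x->data = (size_t)new_data;
	return 0;
}
```

With a valid 8-byte metadata area, popping 4 bytes slides the metadata forward along with the packet start; a data_meta pointing into the reserved headroom is rejected before memmove() runs, and the invalid marker (data_meta one past data) is simply carried along.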

Anyway, I can't figure out the reason for this without more digging:
in mlx5 we call xdp_set_data_meta_invalid() before passing the xdp_buff
to the bpf program, so it is not clear why you would hit the memmove in
bpf_xdp_adjust_head().
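
The reasoning above can be sketched with a small model of the invalid marker. It assumes the marker convention is data_meta = data + 1 and that the metadata length used by bpf_xdp_adjust_head() is zero whenever data_meta lies past data (roughly what the kernel's xdp_get_metalen() helper computes; treat the exact kernel code as an assumption). struct xdp_ptrs and the model_* functions are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical model, not kernel code: the two xdp_buff pointers that matter
 * here, held as integers so comparisons between them are well defined. */
struct xdp_ptrs {
	uintptr_t data_meta;
	uintptr_t data;
};

/* Same idea as xdp_set_data_meta_invalid(): point data_meta one past data. */
void model_set_meta_invalid(struct xdp_ptrs *x)
{
	x->data_meta = x->data + 1;
}

/* Assumed metadata-length rule: data_meta beyond data means "no metadata". */
uintptr_t model_metalen(const struct xdp_ptrs *x)
{
	return x->data_meta > x->data ? 0 : x->data - x->data_meta;
}

/* In this model, adjust_head only calls memmove() for a non-zero length. */
int model_would_memmove(const struct xdp_ptrs *x)
{
	return model_metalen(x) != 0;
}
```

For a buffer marked invalid the length is 0, so the memmove() branch is never reached. If data_meta were instead left at some other value below data (hypothetically 0), the computed length would be the whole distance to data, which memmove() would treat as an enormous copy.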

> [ 7270.033014]  bpf_xdp_adjust_head+0x68/0x80
> [ 7270.037126]  bpf_prog_7d719f00afcf8e6c_xdp_l2fwd_prog+0x198/0xa10
> [ 7270.043284]  mlx5e_xdp_handle+0x55/0x500 [mlx5_core]
> [ 7270.048277]  mlx5e_skb_from_cqe_linear+0xf0/0x1b0 [mlx5_core]
> [ 7270.054053]  mlx5e_handle_rx_cqe+0x64/0x140 [mlx5_core]
> [ 7270.059297]  mlx5e_poll_rx_cq+0x8c8/0xa30 [mlx5_core]
> [ 7270.064373]  mlx5e_napi_poll+0xdc/0x6a0 [mlx5_core]
> [ 7270.069260]  net_rx_action+0x13d/0x3d0
> [ 7270.073020]  __do_softirq+0xdd/0x2d0
> 
> 
> git bisect chased it to
>   13209a8f7304 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
> 

Are you testing a vanilla kernel?

What does the xdp program do with the frame/xdp_buff other than
bpf_xdp_adjust_head()? I mean, which other bpf helpers is it calling?

> but that brings in a LOT of changes. Anyone have ideas on recent
> changes that could be the root cause?
