Message-ID: <52d793f86d36baac455630a03d76f09a388e549f.camel@mellanox.com>
Date: Tue, 26 May 2020 21:23:11 +0000
From: Saeed Mahameed <saeedm@...lanox.com>
To: "dsahern@...il.com" <dsahern@...il.com>,
"daniel@...earbox.net" <daniel@...earbox.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"davem@...emloft.net" <davem@...emloft.net>,
"brouer@...hat.com" <brouer@...hat.com>
Subject: Re: bpf-next/net-next: panic using bpf_xdp_adjust_head
On Tue, 2020-05-26 at 13:04 -0600, David Ahern wrote:
> bpf-next and net-next are panicing when a bpf program uses
> adjust_head -
> e.g., popping a vlan header.
>
> [ 7269.886684] BUG: kernel NULL pointer dereference, address:
> 0000000000000004
> [ 7269.893676] #PF: supervisor read access in kernel mode
> [ 7269.898821] #PF: error_code(0x0000) - not-present page
> [ 7269.903970] PGD 0 P4D 0
> [ 7269.906516] Oops: 0000 [#1] SMP PTI
> [ 7269.910021] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Tainted: G
> I 5.7.0-rc6+ #221
> [ 7269.919076] Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS
> 1.6.12 11/20/2018
> [ 7269.926661] RIP: 0010:__memmove+0x24/0x1a0
> [ 7269.930766] Code: cc cc cc cc cc cc 48 89 f8 48 39 fe 7d 0f 49 89
> f0
> 49 01 d0 49 39 f8 0f 8f a9 00 00 00 48 83 fa 20 0f 82 f5 00 00 00 48
> 89
> d1 <f3> a4 c3 48 81 fa a8 02 00 00 72 05 40 38 fe 74 3b 48 83 ea 20
> 48
> [ 7269.949548] RSP: 0018:ffff9c09cca04c68 EFLAGS: 00010282
> [ 7269.954781] RAX: 0000000000000008 RBX: ffff9c09cca04d78 RCX:
> ffff8bfc475a20fc
> [ 7269.961927] RDX: ffff8bfc475a20fc RSI: 0000000000000004 RDI:
> 0000000000000008
> [ 7269.969068] RBP: ffff8bfc475a2104 R08: ffff8bfc475a2100 R09:
> ffff8bfc475a211c
> [ 7269.976229] R10: 0000000000000012 R11: 0000000000000008 R12:
> 0000000000000004
> [ 7269.983376] R13: ffff9c09cc9f57b8 R14: ffff8bfc475a2100 R15:
> 0000000000000008
> [ 7269.990518] FS: 0000000000000000(0000) GS:ffff8c011f240000(0000)
> knlGS:0000000000000000
> [ 7269.998623] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 7270.004381] CR2: 0000000000000004 CR3: 0000001a72a0a004 CR4:
> 00000000007626e0
> [ 7270.011523] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 7270.018682] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [ 7270.025824] PKRU: 55555554
> [ 7270.028539] Call Trace:
> [ 7270.030990] <IRQ>
It looks like xdp->data_meta holds an invalid value. I think its
boundaries should be checked in bpf_xdp_adjust_head() regardless of the
issue you are seeing.
That said, I can't work out the cause without more digging: in mlx5 we
call xdp_set_data_meta_invalid() before passing the xdp_buff to the bpf
program, so it is not clear why you would hit the memmove() in
bpf_xdp_adjust_head() at all.
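To make the reasoning above concrete, here is a rough userspace sketch of the logic as I understand it, not the kernel source. Every model_* name, struct xdp_buff_model, and MODEL_FRAME_RESERVE (a stand-in for sizeof(struct xdp_frame)) is made up for illustration. The assumption is that the memmove only runs when the metadata length is non-zero, and that marking data_meta invalid (parking it past data) makes that length zero. model_data_meta_in_bounds() is one possible reading of the boundary check suggested above, not an existing kernel check.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MODEL_ETH_HLEN      14  /* Ethernet header length */
#define MODEL_FRAME_RESERVE 32  /* hypothetical stand-in for sizeof(struct xdp_frame) */

/* Hypothetical userspace stand-in for the kernel's struct xdp_buff. */
struct xdp_buff_model {
	unsigned char *data_hard_start;
	unsigned char *data_meta;
	unsigned char *data;
	unsigned char *data_end;
};

/* Models xdp_set_data_meta_invalid(): park data_meta just past data. */
static void model_set_data_meta_invalid(struct xdp_buff_model *xdp)
{
	xdp->data_meta = xdp->data + 1;
}

/* Metadata length is zero whenever data_meta sits past data. */
static size_t model_get_metalen(const struct xdp_buff_model *xdp)
{
	if (xdp->data_meta > xdp->data)
		return 0;
	return (size_t)(xdp->data - xdp->data_meta);
}

/* Assumed extra check: data_meta must not point into the reserved area. */
static int model_data_meta_in_bounds(const struct xdp_buff_model *xdp)
{
	return xdp->data_meta >= xdp->data_hard_start + MODEL_FRAME_RESERVE;
}

/* Models the adjust_head flow: validate, move metadata, shift pointers. */
static int model_adjust_head(struct xdp_buff_model *xdp, int offset)
{
	size_t metalen = model_get_metalen(xdp);
	unsigned char *data_start =
		xdp->data_hard_start + MODEL_FRAME_RESERVE + metalen;
	unsigned char *data = xdp->data + offset;

	if (data < data_start || data > xdp->data_end - MODEL_ETH_HLEN)
		return -EINVAL;

	/* Refuse to touch metadata that starts outside the usable buffer. */
	if (metalen && !model_data_meta_in_bounds(xdp))
		return -EINVAL;

	/* Only reached with metalen != 0, i.e. data_meta was not invalidated. */
	if (metalen)
		memmove(xdp->data_meta + offset, xdp->data_meta, metalen);

	xdp->data_meta += offset;
	xdp->data = data;
	return 0;
}
```

In this model, a buffer prepared with model_set_data_meta_invalid() never reaches the memmove, which is why hitting it in the trace suggests data_meta was not in the state the driver intended when the program ran.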
> [ 7270.033014] bpf_xdp_adjust_head+0x68/0x80
> [ 7270.037126] bpf_prog_7d719f00afcf8e6c_xdp_l2fwd_prog+0x198/0xa10
> [ 7270.043284] mlx5e_xdp_handle+0x55/0x500 [mlx5_core]
> [ 7270.048277] mlx5e_skb_from_cqe_linear+0xf0/0x1b0 [mlx5_core]
> [ 7270.054053] mlx5e_handle_rx_cqe+0x64/0x140 [mlx5_core]
> [ 7270.059297] mlx5e_poll_rx_cq+0x8c8/0xa30 [mlx5_core]
> [ 7270.064373] mlx5e_napi_poll+0xdc/0x6a0 [mlx5_core]
> [ 7270.069260] net_rx_action+0x13d/0x3d0
> [ 7270.073020] __do_softirq+0xdd/0x2d0
>
>
> git bisect chased it to
> 13209a8f7304 ("Merge
> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
>
Are you testing a vanilla kernel?
What does the XDP program do with the frame/xdp_buff other than call
bpf_xdp_adjust_head()? That is, which other bpf helpers does it call?
> but that brings in a LOT of changes. Anyone have ideas on recent
> changes
> that could be the root cause?