Message-ID: <874kq2y2cy.fsf@cloudflare.com>
Date: Mon, 20 Jul 2020 11:14:53 +0200
From: Jakub Sitnicki <jakub@...udflare.com>
To: Lorenzo Bianconi <lorenzo@...nel.org>
Cc: netdev@...r.kernel.org, bpf@...r.kernel.org, davem@...emloft.net,
ast@...nel.org, brouer@...hat.com, daniel@...earbox.net,
lorenzo.bianconi@...hat.com, kuba@...nel.org
Subject: Re: [PATCH bpf-next] bpf: cpumap: fix possible rcpu kthread hung
On Sun, Jul 19, 2020 at 05:52 PM CEST, Lorenzo Bianconi wrote:
> Fix the following cpumap kthread hang. The issue currently occurs
> when __cpu_map_load_bpf_program fails (e.g. if the bpf prog does not
> have BPF_XDP_CPUMAP as expected_attach_type)
>
> $./test_progs -n 101
> 101/1 cpumap_with_progs:OK
> 101 xdp_cpumap_attach:OK
> Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED
> [ 369.996478] INFO: task cpumap/0/map:7:205 blocked for more than 122 seconds.
> [ 369.998463] Not tainted 5.8.0-rc4-01472-ge57892f50a07 #212
> [ 370.000102] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 370.001918] cpumap/0/map:7 D 0 205 2 0x00004000
> [ 370.003228] Call Trace:
> [ 370.003930] __schedule+0x5c7/0xf50
> [ 370.004901] ? io_schedule_timeout+0xb0/0xb0
> [ 370.005934] ? static_obj+0x31/0x80
> [ 370.006788] ? mark_held_locks+0x24/0x90
> [ 370.007752] ? cpu_map_bpf_prog_run_xdp+0x6c0/0x6c0
> [ 370.008930] schedule+0x6f/0x160
> [ 370.009728] schedule_preempt_disabled+0x14/0x20
> [ 370.010829] kthread+0x17b/0x240
> [ 370.011433] ? kthread_create_worker_on_cpu+0xd0/0xd0
> [ 370.011944] ret_from_fork+0x1f/0x30
> [ 370.012348]
> Showing all locks held in the system:
> [ 370.013025] 1 lock held by khungtaskd/33:
> [ 370.013432] #0: ffffffff82b24720 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x28/0x1c3
>
> [ 370.014461] =============================================
>
> Fixes: 9216477449f3 ("bpf: cpumap: Add the possibility to attach an eBPF program to cpumap")
> Reported-by: Jakub Sitnicki <jakub@...udflare.com>
> Signed-off-by: Lorenzo Bianconi <lorenzo@...nel.org>
> ---
> kernel/bpf/cpumap.c | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
> index 4c95d0615ca2..f1c46529929b 100644
> --- a/kernel/bpf/cpumap.c
> +++ b/kernel/bpf/cpumap.c
> @@ -453,24 +453,27 @@ __cpu_map_entry_alloc(struct bpf_cpumap_val *value, u32 cpu, int map_id)
> rcpu->map_id = map_id;
> rcpu->value.qsize = value->qsize;
>
> + if (fd > 0 && __cpu_map_load_bpf_program(rcpu, fd))
> + goto free_ptr_ring;
> +
I realize it's a code move, but fd == 0 is a valid descriptor number.
The check is too strict, IMHO.
> /* Setup kthread */
> rcpu->kthread = kthread_create_on_node(cpu_map_kthread_run, rcpu, numa,
> "cpumap/%d/map:%d", cpu, map_id);
> if (IS_ERR(rcpu->kthread))
> - goto free_ptr_ring;
> + goto free_prog;
>
> get_cpu_map_entry(rcpu); /* 1-refcnt for being in cmap->cpu_map[] */
> get_cpu_map_entry(rcpu); /* 1-refcnt for kthread */
>
> - if (fd > 0 && __cpu_map_load_bpf_program(rcpu, fd))
> - goto free_ptr_ring;
> -
> /* Make sure kthread runs on a single CPU */
> kthread_bind(rcpu->kthread, cpu);
> wake_up_process(rcpu->kthread);
>
> return rcpu;
>
> +free_prog:
> + if (rcpu->prog)
> + bpf_prog_put(rcpu->prog);
> free_ptr_ring:
> ptr_ring_cleanup(rcpu->queue, NULL);
> free_queue:
Hung task splat is gone:
Tested-by: Jakub Sitnicki <jakub@...udflare.com>