Date: Fri, 11 Aug 2023 18:22:19 +0800
From: Hou Tao <houtao@...weicloud.com>
To: Toke Høiland-Jørgensen <toke@...hat.com>,
 bpf@...r.kernel.org
Cc: netdev@...r.kernel.org, "David S . Miller" <davem@...emloft.net>,
 Jakub Kicinski <kuba@...nel.org>, Jesper Dangaard Brouer <hawk@...nel.org>,
 John Fastabend <john.fastabend@...il.com>,
 Björn Töpel <bjorn.topel@...il.com>,
 Martin KaFai Lau <martin.lau@...ux.dev>,
 Alexei Starovoitov <alexei.starovoitov@...il.com>,
 Andrii Nakryiko <andrii@...nel.org>, Song Liu <song@...nel.org>,
 Hao Luo <haoluo@...gle.com>, Yonghong Song <yonghong.song@...ux.dev>,
 Daniel Borkmann <daniel@...earbox.net>, KP Singh <kpsingh@...nel.org>,
 Stanislav Fomichev <sdf@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
 houtao1@...wei.com
Subject: Re: [RFC PATCH bpf-next 1/2] bpf, cpumap: Use queue_rcu_work() to
 remove unnecessary rcu_barrier()

Hi,

On 8/10/2023 6:16 PM, Toke Høiland-Jørgensen wrote:
> Hou Tao <houtao@...weicloud.com> writes:
>
>> From: Hou Tao <houtao1@...wei.com>
>>
>> Currently __cpu_map_entry_replace() uses call_rcu() to wait for the
>> in-flight XDP program and NAPI poll to exit the RCU read-side critical
>> section, and then launches the kworker cpu_map_kthread_stop(), which
>> calls kthread_stop() to handle all pending xdp frames or skbs.
>>
>> But it is unnecessary to use rcu_barrier() in cpu_map_kthread_stop() to
>> wait for the completion of __cpu_map_entry_free(), because rcu_barrier()
>> will wait for all pending RCU callbacks and cpu_map_kthread_stop() only
>> needs to wait for the completion of a specific __cpu_map_entry_free().
>>
>> So use queue_rcu_work() to replace call_rcu(), schedule_work() and
>> rcu_barrier(). queue_rcu_work() will queue a __cpu_map_entry_free()
>> kworker after an RCU grace period. Because __cpu_map_entry_free()
>> runs in kworker context, it is OK to do all of the freeing steps,
>> including kthread_stop(), in it.
>>
>> After the update, there is no need to do reference-counting for
>> bpf_cpu_map_entry, because bpf_cpu_map_entry is freed directly in
>> __cpu_map_entry_free(), so just remove it.
>>
>> Signed-off-by: Hou Tao <houtao1@...wei.com>
> I think your analysis is correct, and this is a nice cleanup of what is
> really a bit of an over-complicated cleanup flow - well done!
>
> I have a few nits below, but with those feel free to resend as non-RFC
> and add my:
>
> Reviewed-by: Toke Høiland-Jørgensen <toke@...hat.com>

Thanks for the review.
>
>> ---
>>  kernel/bpf/cpumap.c | 93 +++++++++++----------------------------------
>>  1 file changed, 23 insertions(+), 70 deletions(-)
>>
SNIP
>> -static void __cpu_map_entry_free(struct rcu_head *rcu)
>> +static void __cpu_map_entry_free(struct work_struct *work)
>>  {
>>  	struct bpf_cpu_map_entry *rcpu;
>>  
>> @@ -503,30 +454,33 @@ static void __cpu_map_entry_free(struct rcu_head *rcu)
>>  	 * new packets and cannot change/set flush_needed that can
>>  	 * find this entry.
>>  	 */
>> -	rcpu = container_of(rcu, struct bpf_cpu_map_entry, rcu);
>> +	rcpu = container_of(to_rcu_work(work), struct bpf_cpu_map_entry, free_work);
>>  
>>  	free_percpu(rcpu->bulkq);
> Let's move this free down to the end along with the others.

Will do in v1.
>
>> -	/* Cannot kthread_stop() here, last put free rcpu resources */
>> -	put_cpu_map_entry(rcpu);
>> +
>> +	/* kthread_stop will wake_up_process and wait for it to complete */
> Suggest adding to this comment: "cpu_map_kthread_run() makes sure the
> pointer ring is empty before exiting."

Will do in v1.
>
>> +	kthread_stop(rcpu->kthread);
>> +
>> +	if (rcpu->prog)
>> +		bpf_prog_put(rcpu->prog);
>> +	/* The queue should be empty at this point */
>> +	__cpu_map_ring_cleanup(rcpu->queue);
>> +	ptr_ring_cleanup(rcpu->queue, NULL);
>> +	kfree(rcpu->queue);
>> +	kfree(rcpu);
>>  }
>>  
>>  /* After xchg pointer to bpf_cpu_map_entry, use the call_rcu() to
>> - * ensure any driver rcu critical sections have completed, but this
>> - * does not guarantee a flush has happened yet. Because driver side
>> - * rcu_read_lock/unlock only protects the running XDP program.  The
>> - * atomic xchg and NULL-ptr check in __cpu_map_flush() makes sure a
>> - * pending flush op doesn't fail.
>> + * ensure both any driver rcu critical sections and xdp_do_flush()
>> + * have completed.
>>   *
>>   * The bpf_cpu_map_entry is still used by the kthread, and there can
>> - * still be pending packets (in queue and percpu bulkq).  A refcnt
>> - * makes sure to last user (kthread_stop vs. call_rcu) free memory
>> - * resources.
>> + * still be pending packets (in queue and percpu bulkq).
>>   *
>> - * The rcu callback __cpu_map_entry_free flush remaining packets in
>> - * percpu bulkq to queue.  Due to caller map_delete_elem() disable
>> - * preemption, cannot call kthread_stop() to make sure queue is empty.
>> - * Instead a work_queue is started for stopping kthread,
>> - * cpu_map_kthread_stop, which waits for an RCU grace period before
>> + * Due to caller map_delete_elem() is in RCU read critical section,
>> + * cannot call kthread_stop() to make sure queue is empty. Instead
>> + * a work_struct is started for stopping kthread,
>> + * __cpu_map_entry_free, which waits for a RCU grace period before
>>   * stopping kthread, emptying the queue.
>>   */
> I think the above comment is a bit too convoluted, still. I'd suggest
> just replacing the whole thing with this:
>
> /* After the xchg of the bpf_cpu_map_entry pointer, we need to make sure the old
>  * entry is no longer in use before freeing. We use queue_rcu_work() to call
>  * __cpu_map_entry_free() in a separate workqueue after waiting for an RCU grace
>  * period. This means that (a) all pending enqueue and flush operations have
>  * completed (because of the RCU callback), and (b) we are in a workqueue
>  * context where we can stop the kthread and wait for it to exit before freeing
>  * everything.
>  */
Much better. Thanks for the rephrasing.  Will update it in v1.
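
To make the ordering in that comment concrete, here is a single-threaded
user-space toy model of "swap the pointer now, free the old entry later".
It is only a sketch: every toy_* name is hypothetical, the "grace period"
is just an explicit function call, and nothing here is the kernel
implementation or part of the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy entry; not the kernel's struct bpf_cpu_map_entry. */
struct toy_entry {
	int queued_frames;      /* frames still waiting for the consumer */
	struct toy_entry *next; /* link in the deferred-free list */
};

static struct toy_entry *toy_slot;     /* stands in for one cpu_map[] slot */
static struct toy_entry *toy_deferred; /* old entries awaiting a grace period */
static int toy_freed;                  /* number of entries freed so far */

static struct toy_entry *toy_entry_alloc(int frames)
{
	struct toy_entry *e = calloc(1, sizeof(*e));

	if (e)
		e->queued_frames = frames;
	return e;
}

/* Models the work callback: drain what is left, then free the entry. */
static void toy_entry_free(struct toy_entry *e)
{
	/* Stands in for stopping the consumer thread, which empties the queue. */
	e->queued_frames = 0;
	free(e);
	toy_freed++;
}

/* Swap in the new entry; the old one is only queued, never freed here. */
static void toy_entry_replace(struct toy_entry *new_entry)
{
	struct toy_entry *old = toy_slot;

	toy_slot = new_entry;
	if (old) {
		old->next = toy_deferred;
		toy_deferred = old;
	}
}

/* Models "grace period elapsed, deferred work now runs". */
static void toy_grace_period_end(void)
{
	while (toy_deferred) {
		struct toy_entry *e = toy_deferred;

		toy_deferred = e->next;
		toy_entry_free(e);
	}
}
```

The point of the model is only that the replaced entry stays valid, with
its pending frames intact, until toy_grace_period_end() runs, and that all
freeing happens in one place afterwards.
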
>>  static void __cpu_map_entry_replace(struct bpf_cpu_map *cmap,
>> @@ -536,9 +490,8 @@ static void __cpu_map_entry_replace(struct bpf_cpu_map *cmap,
>>  
>>  	old_rcpu = unrcu_pointer(xchg(&cmap->cpu_map[key_cpu], RCU_INITIALIZER(rcpu)));
>>  	if (old_rcpu) {
>> -		call_rcu(&old_rcpu->rcu, __cpu_map_entry_free);
>> -		INIT_WORK(&old_rcpu->kthread_stop_wq, cpu_map_kthread_stop);
>> -		schedule_work(&old_rcpu->kthread_stop_wq);
>> +		INIT_RCU_WORK(&old_rcpu->free_work, __cpu_map_entry_free);
>> +		queue_rcu_work(system_wq, &old_rcpu->free_work);
>>  	}
>>  }
>>  
>> -- 
>> 2.29.2
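
For readers less familiar with the to_rcu_work() + container_of() step in
the new __cpu_map_entry_free(), here is a minimal user-space sketch of how
the owning entry is recovered from the work_struct pointer handed to the
callback. The struct layouts are simplified stand-ins, and
demo_cpu_map_entry and entry_from_work() are hypothetical names used only
for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-ins for the kernel types. */
struct work_struct {
	int pending;
};

struct rcu_work {
	struct work_struct work;
	int rcu_placeholder; /* the kernel keeps an rcu_head and wq pointer here */
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Same shape as the kernel helper of this name. */
static inline struct rcu_work *to_rcu_work(struct work_struct *work)
{
	return container_of(work, struct rcu_work, work);
}

/* Hypothetical, trimmed-down entry with only the fields needed here. */
struct demo_cpu_map_entry {
	unsigned int cpu;
	struct rcu_work free_work;
};

/* First step of a work callback: map the work pointer back to its entry. */
static struct demo_cpu_map_entry *entry_from_work(struct work_struct *work)
{
	return container_of(to_rcu_work(work),
			    struct demo_cpu_map_entry, free_work);
}
```

The two-step conversion is needed because the workqueue passes a pointer
to the inner work_struct, while the entry embeds the enclosing rcu_work.
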

