Message-ID: <87il2rdnxs.fsf@toke.dk>
Date: Wed, 14 Feb 2024 17:13:03 +0100
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>, bpf@...r.kernel.org,
 netdev@...r.kernel.org
Cc: Björn Töpel <bjorn@...nel.org>, "David S. Miller"
 <davem@...emloft.net>,
 Alexei Starovoitov <ast@...nel.org>, Andrii Nakryiko <andrii@...nel.org>,
 Daniel Borkmann <daniel@...earbox.net>, Eric Dumazet
 <edumazet@...gle.com>, Hao Luo <haoluo@...gle.com>, Jakub Kicinski
 <kuba@...nel.org>, Jesper Dangaard Brouer <hawk@...nel.org>, Jiri Olsa
 <jolsa@...nel.org>, John Fastabend <john.fastabend@...il.com>, Jonathan
 Lemon <jonathan.lemon@...il.com>, KP Singh <kpsingh@...nel.org>, Maciej
 Fijalkowski <maciej.fijalkowski@...el.com>, Magnus Karlsson
 <magnus.karlsson@...el.com>, Martin KaFai Lau <martin.lau@...ux.dev>,
 Paolo Abeni <pabeni@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
 Song Liu <song@...nel.org>, Stanislav Fomichev <sdf@...gle.com>, Thomas
 Gleixner <tglx@...utronix.de>, Yonghong Song <yonghong.song@...ux.dev>,
 Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Subject: Re: [PATCH RFC net-next 1/2] net: Reference bpf_redirect_info via
 task_struct on PREEMPT_RT.

Sebastian Andrzej Siewior <bigeasy@...utronix.de> writes:

> The XDP redirect process is two staged:
> - bpf_prog_run_xdp() is invoked to run an eBPF program which inspects the
>   packet and makes decisions. While doing that, the per-CPU variable
>   bpf_redirect_info is used.
>
> - Afterwards xdp_do_redirect() is invoked and accesses bpf_redirect_info
>   and it may also access other per-CPU variables like xskmap_flush_list.
>
> At the very end of the NAPI callback, xdp_do_flush() is invoked which
> does not access bpf_redirect_info but will touch the individual per-CPU
> lists.
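A minimal single-threaded C model of the three steps just described. Every name here (`struct redirect_info`, its `tgt_ifindex` field, the fixed-size `struct flush_list`, `prog_run()`, `do_redirect()`, `do_flush()`, `simulate_napi_poll()`) is a hypothetical simplification, not a kernel definition. The two file-scope variables stand in for the per-CPU `bpf_redirect_info` and a per-CPU flush list such as `xskmap_flush_list`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's structures. */
struct redirect_info {
	int tgt_ifindex;          /* written by the program, read by stage 2 */
};

struct flush_list {
	int queued[16];           /* target ifindex of each queued frame */
	size_t len;
};

/* One instance each, standing in for the per-CPU variables. */
static struct redirect_info ri;
static struct flush_list flush;
static size_t frames_sent;

/* Stage 1 (cf. bpf_prog_run_xdp()): decide and record the target in ri. */
static void prog_run(int target)
{
	ri.tgt_ifindex = target;
}

/* Stage 2 (cf. xdp_do_redirect()): read ri and queue the frame. */
static int do_redirect(void)
{
	if (flush.len == sizeof(flush.queued) / sizeof(flush.queued[0]))
		return -1;        /* queue full: drop */
	flush.queued[flush.len++] = ri.tgt_ifindex;
	return 0;
}

/* End of the NAPI callback (cf. xdp_do_flush()): drains the list, ri untouched. */
static void do_flush(void)
{
	frames_sent += flush.len;
	flush.len = 0;
}

/* One simulated NAPI poll handling @n packets, all redirected to @target. */
static void simulate_napi_poll(int n, int target)
{
	for (int i = 0; i < n; i++) {
		prog_run(target);
		int err = do_redirect();
		assert(err == 0);
		(void)err;
	}
	do_flush();
}
```

In this model `ri` is written in stage 1 and only read in stage 2, so anything else that writes it in between changes where the frame goes; that is the window the bottom-half protection has to cover.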
>
> The per-CPU variables are only used in the NAPI callback, hence disabling
> bottom halves is the only protection mechanism. Users from preemptible
> context (like cpu_map_kthread_run()) explicitly disable bottom halves
> for protection reasons.
> Without the locking in local_bh_disable() on PREEMPT_RT, this data
> structure requires explicit locking to avoid corruption if preemption
> occurs.
>
> PREEMPT_RT has forced-threaded interrupts enabled and every NAPI
> callback runs in a thread. If each thread has its own data structure,
> locking can be avoided and data corruption is avoided as well.
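The difference between one shared instance and per-thread instances can be shown without real threads. In this hypothetical model (not kernel code), `struct task` plays the role of `task_struct`, `current` is switched by hand to mimic a preemption landing between the two redirect stages, and each task's storage lives on the stack, as the patch proposes.

```c
#include <assert.h>
#include <stddef.h>

struct redirect_info {
	int tgt_ifindex;
};

/* Hypothetical stand-in for task_struct: only the per-task pointer. */
struct task {
	struct redirect_info *ri;
};

static struct redirect_info shared_ri;   /* models the per-CPU variable */
static struct task *current;             /* models the running task */

static struct redirect_info *get_ri_shared(void)
{
	return &shared_ri;
}

static struct redirect_info *get_ri_per_task(void)
{
	return current->ri;
}

/*
 * Task A runs stage 1, is "preempted" by task B (which also runs stage 1),
 * then resumes and runs stage 2. Returns the target task A ends up using.
 */
static int preempted_redirect(struct redirect_info *(*get_ri)(void))
{
	struct redirect_info a_ri = {0}, b_ri = {0};
	struct task a = { .ri = &a_ri }, b = { .ri = &b_ri };
	int ret;

	current = &a;
	get_ri()->tgt_ifindex = 5;       /* A: stage 1 decides ifindex 5 */

	current = &b;                    /* preemption: B runs in between */
	get_ri()->tgt_ifindex = 9;       /* B: stage 1 decides ifindex 9 */

	current = &a;                    /* A resumes and runs stage 2 */
	ret = get_ri()->tgt_ifindex;
	current = NULL;                  /* a and b go out of scope */
	return ret;
}
```

With the shared instance, task A ends up acting on task B's decision; with per-task storage, each task reads back what it wrote, and no lock is involved.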
>
> Create a struct bpf_xdp_storage which contains struct bpf_redirect_info.
> Define the variable on the stack and use xdp_storage_set() to set a
> pointer to it in the task_struct of the current task. Use the __free()
> annotation to automatically reset the pointer once the function returns.
> Use a pointer which can be used by the __free() annotation to avoid
> invoking the callback if the pointer is NULL. This helps the compiler to
> optimize the code.
> xdp_storage_set() can nest. For instance, local_bh_enable() in
> bpf_test_run_xdp_live() may run NET_RX_SOFTIRQ/net_rx_action(), which
> also uses xdp_storage_set(). Therefore only the first invocation
> updates the per-task pointer.
> Use xdp_storage_get_ri() as a wrapper to retrieve the current struct
> bpf_redirect_info.
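The set/clear/get protocol described above can be modelled in user space with the GCC/Clang `cleanup` variable attribute, which the kernel's `__free()` helper is built on. This is a sketch under assumptions: a `_Thread_local` pointer stands in for the new `task_struct` member, the struct layouts are placeholders, and `xdp_storage_set()` is assumed to return NULL for a nested call so that only the outermost scope's cleanup resets the pointer. The patch's actual helpers may differ in detail. `outer_user()` and `nested_user()` are hypothetical callers, standing for e.g. `bpf_test_run_xdp_live()` and a `net_rx_action()` run from its `local_bh_enable()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder layouts; the real structures carry more state. */
struct bpf_redirect_info {
	unsigned int flags;
};

struct bpf_xdp_storage {
	struct bpf_redirect_info ri;
};

/* Models the per-task pointer added to task_struct (one per thread). */
static _Thread_local struct bpf_xdp_storage *current_xdp_storage;

/* Install @store unless an outer scope already did. Returning NULL for a
 * nested call means the nested scope's cleanup has nothing to undo. */
static struct bpf_xdp_storage *xdp_storage_set(struct bpf_xdp_storage *store)
{
	if (current_xdp_storage)
		return NULL;
	memset(store, 0, sizeof(*store));
	current_xdp_storage = store;
	return store;
}

/* Cleanup handler: called with the address of the annotated variable. */
static void xdp_storage_clear(struct bpf_xdp_storage **store)
{
	if (*store)
		current_xdp_storage = NULL;
}

static struct bpf_redirect_info *xdp_storage_get_ri(void)
{
	return &current_xdp_storage->ri;
}

/* Hypothetical nested user: its own storage is never installed. */
static struct bpf_redirect_info *nested_user(void)
{
	struct bpf_xdp_storage *xdp_store
		__attribute__((cleanup(xdp_storage_clear))) = NULL;
	struct bpf_xdp_storage store;

	xdp_store = xdp_storage_set(&store);
	assert(xdp_store == NULL);        /* the outer storage stays active */
	return xdp_storage_get_ri();
}

/* Hypothetical outer user: returns 1 if nesting left its storage intact. */
static int outer_user(void)
{
	struct bpf_xdp_storage *xdp_store
		__attribute__((cleanup(xdp_storage_clear))) = NULL;
	struct bpf_xdp_storage store;

	xdp_store = xdp_storage_set(&store);
	xdp_storage_get_ri()->flags = 42;

	return xdp_store == &store &&
	       nested_user() == &store.ri &&
	       current_xdp_storage == &store &&
	       xdp_storage_get_ri()->flags == 42;
}
```

Initialising the annotated pointer to NULL before any early return, as the diff does, means a path such as `if (!entry) return skb;` runs the cleanup with NULL and leaves the per-task pointer alone.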
>
> This is only done on PREEMPT_RT. The !PREEMPT_RT builds keep using the
> per-CPU variable instead. This should also work for !PREEMPT_RT but
> isn't needed.
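Keeping the per-CPU variable on !PREEMPT_RT means the accessor picks its backing store at compile time while callers stay unchanged. A minimal sketch, assuming a plain global as the per-CPU stand-in and a `_Thread_local` pointer as the per-task one; the `#ifdef` layout and `mark_redirect()` are illustrative, not copied from the patch.

```c
#include <assert.h>
#include <stddef.h>

struct bpf_redirect_info {
	unsigned int flags;
};

#ifdef CONFIG_PREEMPT_RT
/* RT: each task points at storage set up on its own stack; NULL outside
 * a set/clear section. */
static _Thread_local struct bpf_redirect_info *task_ri;

static struct bpf_redirect_info *xdp_storage_get_ri(void)
{
	assert(task_ri != NULL);   /* only valid inside a set/clear section */
	return task_ri;
}
#else
/* !RT: one instance per CPU, serialized by disabled bottom halves. */
static struct bpf_redirect_info percpu_ri;

static struct bpf_redirect_info *xdp_storage_get_ri(void)
{
	return &percpu_ri;
}
#endif

/* Hypothetical caller: identical source in both configurations. */
static void mark_redirect(unsigned int flags)
{
	xdp_storage_get_ri()->flags = flags;
}
```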
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>

[...]

> diff --git a/net/core/dev.c b/net/core/dev.c
> index de362d5f26559..c3f7d2a6b6134 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3988,11 +3988,15 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
>  		   struct net_device *orig_dev, bool *another)
>  {
>  	struct bpf_mprog_entry *entry = rcu_dereference_bh(skb->dev->tcx_ingress);
> +	struct bpf_xdp_storage *xdp_store __free(xdp_storage_clear) = NULL;
>  	enum skb_drop_reason drop_reason = SKB_DROP_REASON_TC_INGRESS;
> +	struct bpf_xdp_storage __xdp_store;
>  	int sch_ret;
>  
>  	if (!entry)
>  		return skb;
> +
> +	xdp_store = xdp_storage_set(&__xdp_store);
>  	if (*pt_prev) {
>  		*ret = deliver_skb(skb, *pt_prev, orig_dev);
>  		*pt_prev = NULL;
> @@ -4044,12 +4048,16 @@ static __always_inline struct sk_buff *
>  sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
>  {
>  	struct bpf_mprog_entry *entry = rcu_dereference_bh(dev->tcx_egress);
> +	struct bpf_xdp_storage *xdp_store __free(xdp_storage_clear) = NULL;
>  	enum skb_drop_reason drop_reason = SKB_DROP_REASON_TC_EGRESS;
> +	struct bpf_xdp_storage __xdp_store;
>  	int sch_ret;
>  
>  	if (!entry)
>  		return skb;
>  
> +	xdp_store = xdp_storage_set(&__xdp_store);
> +
>  	/* qdisc_skb_cb(skb)->pkt_len & tcx_set_ingress() was
>  	 * already set by the caller.
>  	 */


These, and the LWT code, don't actually have anything to do with XDP,
which indicates that the 'xdp_storage' name is misleading. Maybe
'bpf_net_context' or something along those lines? Or maybe we could just
move the flush lists into bpf_redirect_info itself and just keep that as
the top-level name?
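As a rough illustration of the first option, a top-level per-task object could carry both the redirect state and the flush lists that xdp_do_flush() walks. The `bpf_net_context` layout, its field names and the minimal `list_head` below are hypothetical, written as self-contained C rather than against kernel headers.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

struct bpf_redirect_info {
	unsigned int flags;
};

/* Hypothetical top-level context: nothing XDP-specific in the name, so TC
 * and LWT users can set it up too. Field names are illustrative. */
struct bpf_net_context {
	struct bpf_redirect_info ri;
	struct list_head cpu_map_flush_list;
	struct list_head dev_map_flush_list;
	struct list_head xskmap_flush_list;
};

static void bpf_net_ctx_init(struct bpf_net_context *ctx)
{
	ctx->ri.flags = 0;
	init_list_head(&ctx->cpu_map_flush_list);
	init_list_head(&ctx->dev_map_flush_list);
	init_list_head(&ctx->xskmap_flush_list);
}
```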

-Toke

