Date: Mon, 18 Dec 2023 09:33:39 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>, 
	linux-kernel@...r.kernel.org, netdev@...r.kernel.org
Cc: "David S. Miller" <davem@...emloft.net>, Boqun Feng
 <boqun.feng@...il.com>,  Daniel Borkmann <daniel@...earbox.net>, Eric
 Dumazet <edumazet@...gle.com>, Frederic Weisbecker <frederic@...nel.org>,
 Ingo Molnar <mingo@...hat.com>, Jakub Kicinski <kuba@...nel.org>, Peter
 Zijlstra <peterz@...radead.org>, Thomas Gleixner <tglx@...utronix.de>,
 Waiman Long <longman@...hat.com>, Will Deacon <will@...nel.org>, Alexei
 Starovoitov <ast@...nel.org>, Andrii Nakryiko <andrii@...nel.org>, David
 Ahern <dsahern@...nel.org>, Hao Luo <haoluo@...gle.com>,  Jiri Olsa
 <jolsa@...nel.org>, John Fastabend <john.fastabend@...il.com>, KP Singh
 <kpsingh@...nel.org>,  Martin KaFai Lau <martin.lau@...ux.dev>, Song Liu
 <song@...nel.org>, Stanislav Fomichev <sdf@...gle.com>, Yonghong Song
 <yonghong.song@...ux.dev>, bpf@...r.kernel.org
Subject: Re: [PATCH net-next 12/24] seg6: Use nested-BH locking for
 seg6_bpf_srh_states.

On Fri, 2023-12-15 at 18:07 +0100, Sebastian Andrzej Siewior wrote:
> The access to seg6_bpf_srh_states is protected by disabling preemption.
> Based on the code, the entry point is input_action_end_bpf() and
> every other function (the bpf helper functions bpf_lwt_seg6_*()) that
> accesses seg6_bpf_srh_states should be called from within
> input_action_end_bpf().
> 
> input_action_end_bpf() accesses seg6_bpf_srh_states first at the top of
> the function and then disables preemption. This looks wrong because if
> preemption needs to be disabled as part of the locking mechanism then
> the variable shouldn't be accessed beforehand.
> 
> Looking at how it is used via test_lwt_seg6local.sh,
> input_action_end_bpf() is always invoked from softirq context. If that
> is always the case then the preempt_disable() statement is superfluous.
> If it is not always invoked from softirq context then disabling only
> preemption is not sufficient.
> 
> Replace the preempt_disable() statement with nested-BH locking. This is
> not an equivalent replacement as it assumes that the invocation of
> input_action_end_bpf() always occurs in softirq context and thus the
> preempt_disable() is superfluous.
> Add a local_lock_t to the data structure and use local_lock_nested_bh() in
> guard notation for locking. Add lockdep_assert_held() to ensure the lock
> is held while the per-CPU variable is referenced in the helper functions.
> 
> Cc: Alexei Starovoitov <ast@...nel.org>
> Cc: Andrii Nakryiko <andrii@...nel.org>
> Cc: David Ahern <dsahern@...nel.org>
> Cc: Hao Luo <haoluo@...gle.com>
> Cc: Jiri Olsa <jolsa@...nel.org>
> Cc: John Fastabend <john.fastabend@...il.com>
> Cc: KP Singh <kpsingh@...nel.org>
> Cc: Martin KaFai Lau <martin.lau@...ux.dev>
> Cc: Song Liu <song@...nel.org>
> Cc: Stanislav Fomichev <sdf@...gle.com>
> Cc: Yonghong Song <yonghong.song@...ux.dev>
> Cc: bpf@...r.kernel.org
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
> ---
>  include/net/seg6_local.h |  1 +
>  net/core/filter.c        |  3 ++
>  net/ipv6/seg6_local.c    | 59 ++++++++++++++++++++++------------------
>  3 files changed, 36 insertions(+), 27 deletions(-)
> 
> diff --git a/include/net/seg6_local.h b/include/net/seg6_local.h
> index 3fab9dec2ec45..0f22771359f4c 100644
> --- a/include/net/seg6_local.h
> +++ b/include/net/seg6_local.h
> @@ -20,6 +20,7 @@ extern bool seg6_bpf_has_valid_srh(struct sk_buff *skb);
>  
>  struct seg6_bpf_srh_state {
>  	struct ipv6_sr_hdr *srh;
> +	local_lock_t bh_lock;
>  	u16 hdrlen;
>  	bool valid;
>  };
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 1737884be52f8..c8013f762524b 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -6384,6 +6384,7 @@ BPF_CALL_4(bpf_lwt_seg6_store_bytes, struct sk_buff *, skb, u32, offset,
>  	void *srh_tlvs, *srh_end, *ptr;
>  	int srhoff = 0;
>  
> +	lockdep_assert_held(&srh_state->bh_lock);
>  	if (srh == NULL)
>  		return -EINVAL;
>  
> @@ -6440,6 +6441,7 @@ BPF_CALL_4(bpf_lwt_seg6_action, struct sk_buff *, skb,
>  	int hdroff = 0;
>  	int err;
>  
> +	lockdep_assert_held(&srh_state->bh_lock);
>  	switch (action) {
>  	case SEG6_LOCAL_ACTION_END_X:
>  		if (!seg6_bpf_has_valid_srh(skb))
> @@ -6516,6 +6518,7 @@ BPF_CALL_3(bpf_lwt_seg6_adjust_srh, struct sk_buff *, skb, u32, offset,
>  	int srhoff = 0;
>  	int ret;
>  
> +	lockdep_assert_held(&srh_state->bh_lock);
>  	if (unlikely(srh == NULL))
>  		return -EINVAL;
>  
> diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
> index 24e2b4b494cb0..ed7278af321a2 100644
> --- a/net/ipv6/seg6_local.c
> +++ b/net/ipv6/seg6_local.c
> @@ -1380,7 +1380,9 @@ static int input_action_end_b6_encap(struct sk_buff *skb,
>  	return err;
>  }
>  
> -DEFINE_PER_CPU(struct seg6_bpf_srh_state, seg6_bpf_srh_states);
> +DEFINE_PER_CPU(struct seg6_bpf_srh_state, seg6_bpf_srh_states) = {
> +	.bh_lock	= INIT_LOCAL_LOCK(bh_lock),
> +};
>  
>  bool seg6_bpf_has_valid_srh(struct sk_buff *skb)
>  {
> @@ -1388,6 +1390,7 @@ bool seg6_bpf_has_valid_srh(struct sk_buff *skb)
>  		this_cpu_ptr(&seg6_bpf_srh_states);
>  	struct ipv6_sr_hdr *srh = srh_state->srh;
>  
> +	lockdep_assert_held(&srh_state->bh_lock);
>  	if (unlikely(srh == NULL))
>  		return false;
>  
> @@ -1408,8 +1411,7 @@ bool seg6_bpf_has_valid_srh(struct sk_buff *skb)
>  static int input_action_end_bpf(struct sk_buff *skb,
>  				struct seg6_local_lwt *slwt)
>  {
> -	struct seg6_bpf_srh_state *srh_state =
> -		this_cpu_ptr(&seg6_bpf_srh_states);
> +	struct seg6_bpf_srh_state *srh_state;
>  	struct ipv6_sr_hdr *srh;
>  	int ret;
>  
> @@ -1420,41 +1422,44 @@ static int input_action_end_bpf(struct sk_buff *skb,
>  	}
>  	advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
>  
> -	/* preempt_disable is needed to protect the per-CPU buffer srh_state,
> -	 * which is also accessed by the bpf_lwt_seg6_* helpers
> +	/* The access to the per-CPU buffer srh_state is protected by running
> +	 * always in softirq context (with disabled BH). On PREEMPT_RT the
> +	 * required locking is provided by the following local_lock_nested_bh()
> +	 * statement. It is also accessed by the bpf_lwt_seg6_* helpers via
> +	 * bpf_prog_run_save_cb().
>  	 */
> -	preempt_disable();
> -	srh_state->srh = srh;
> -	srh_state->hdrlen = srh->hdrlen << 3;
> -	srh_state->valid = true;
> +	scoped_guard(local_lock_nested_bh, &seg6_bpf_srh_states.bh_lock) {
> +		srh_state = this_cpu_ptr(&seg6_bpf_srh_states);
> +		srh_state->srh = srh;
> +		srh_state->hdrlen = srh->hdrlen << 3;
> +		srh_state->valid = true;

Here the 'scoped_guard' usage adds a lot of noise to the patch, due to
the added indentation. What about directly using
local_lock_nested_bh()/local_unlock_nested_bh() instead?
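For illustration, that would keep the existing indentation of the hunk,
roughly like this (untested sketch, body of the critical section elided):

	local_lock_nested_bh(&seg6_bpf_srh_states.bh_lock);
	srh_state = this_cpu_ptr(&seg6_bpf_srh_states);
	srh_state->srh = srh;
	srh_state->hdrlen = srh->hdrlen << 3;
	srh_state->valid = true;
	/* ... rest of the former scoped_guard() body, unchanged ... */
	local_unlock_nested_bh(&seg6_bpf_srh_states.bh_lock);

The lockdep_assert_held() annotations in the helpers would be unaffected
either way.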

Cheers,

Paolo

