Message-ID: <20250313115018.7xe77nJ-@linutronix.de>
Date: Thu, 13 Mar 2025 12:50:18 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Ilya Maximets <i.maximets@....org>
Cc: netdev@...r.kernel.org, linux-rt-devel@...ts.linux.dev,
	dev@...nvswitch.org, Eric Dumazet <edumazet@...gle.com>,
	Simon Horman <horms@...nel.org>, Jakub Kicinski <kuba@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Paolo Abeni <pabeni@...hat.com>,
	"David S. Miller" <davem@...emloft.net>
Subject: Re: [ovs-dev] [PATCH net-next 11/18] openvswitch: Use nested-BH
 locking for ovs_actions.

On 2025-03-10 17:56:09 [+0100], Ilya Maximets wrote:
> >>> +		local_lock_nested_bh(&ovs_actions.bh_lock);
> >>
> >> Wouldn't this cause a warning when we're in a syscall/process context?
> > 
> > My understanding is that it is only invoked in softirq context. Did I
> > misunderstand it?
> 
> It can be called from the syscall/process context while processing
> OVS_PACKET_CMD_EXECUTE request.
> 
> > Otherwise that this_cpu_ptr() above should complain
> > that preemption is not disabled and if preemption is indeed not disabled
> > how do you ensure that you don't get preempted after the
> > __this_cpu_inc_return() in several tasks (at the same time) leading to
> > exceeding the OVS_RECURSION_LIMIT?
> 
> We disable BH in this case, so it should be safe (on non-RT).  See the
> ovs_packet_cmd_execute() for more details.

Yes, exactly. So if BH is disabled then local_lock_nested_bh() can
safely acquire a per-CPU spinlock on PREEMPT_RT here. This basically
mimics local_bh_disable()'s guarantee of exclusive access to per-CPU
data structures, just on a smaller scope.

> >>> +		ovs_act->owner = current;
> >>> +	}
> >>> +
> >>>  	level = __this_cpu_inc_return(ovs_actions.exec_level);
> >>>  	if (unlikely(level > OVS_RECURSION_LIMIT)) {
> >>>  		net_crit_ratelimited("ovs: recursion limit reached on datapath %s, probable configuration error\n",
> >>> @@ -1710,5 +1718,10 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
> >>>  
> >>>  out:
> >>>  	__this_cpu_dec(ovs_actions.exec_level);
> >>> +
> >>> +	if (level == 1) {
> >>> +		ovs_act->owner = NULL;
> >>> +		local_unlock_nested_bh(&ovs_actions.bh_lock);
> >>> +	}
> >>
> >> Seems dangerous to lock every time the owner changes but unlock only
> >> once on level 1.  Even if this works fine, it seems unnecessarily
> >> complicated.  Maybe it's better to just lock once before calling
> >> ovs_execute_actions() instead?
> > 
> > My understanding is this can be invoked recursively. That means on first
> > invocation owner == NULL and then you acquire the lock at which point
> > exec_level goes 0->1. On the recursive invocation owner == current and
> > you skip the lock but exec_level goes 1 -> 2.
> > On your return path once level becomes 1, then it means that dec made it
> > go 1 -> 0, you unlock the lock.
> 
> My point is: why locking here with some extra non-obvious logic of owner
> tracking if we can lock (unconditionally?) in ovs_packet_cmd_execute() and
> ovs_dp_process_packet() instead?  We already disable BH in one of those
> and take appropriate RCU locks in both.  So, feels like a better place
> for the extra locking if necessary.  We will also not need to move around
> any code in actions.c if the code there is guaranteed to be safe by holding
> locks outside of it.

I think I considered it but dropped it because it looks like one entry
point can call the other.
ovs_packet_cmd_execute() is a unique entry point to
ovs_execute_actions(); this one could take the lock unconditionally.
Then we have ovs_dp_process_packet() as the second entry point towards
ovs_execute_actions(), and it is the tricky one. One path originates
from netdev_frame_hook(), which is the "normal" packet-receive path.
Then within ovs_execute_actions() there is ovs_vport_send(), which
could use internal_dev_recv() for forwarding. This one throws the
packet into the networking stack, so it could come back via
netdev_frame_hook().
Then there is the internal forwarding via internal_dev_xmit(), which
also ends up in ovs_execute_actions(). Here I don't know whether this
can originate from within the recursion.

After looking at this and seeing internal_dev_recv(), I decided to
move the locking into ovs_execute_actions(), where the recursion check
itself is.

> > The locking part happens only on PREEMPT_RT because !PREEMPT_RT has
> > softirqs disabled which guarantee that there will be no preemption.
> > 
> > tools/testing/selftests/net/openvswitch should cover this?
> 
> It's not a comprehensive test suite, it covers some cases, but it
> doesn't test anything related to preemptions specifically.

From looking at the traces, everything originates from
netdev_frame_hook() and there is sometimes one recursion from within
ovs_execute_actions(). I haven't seen anything else.

> >> Also, the name of the struct ovs_action doesn't make a lot of sense,
> >> I'd suggest to call it pcpu_storage or something like that instead.
> >> I.e. have a more generic name as the fields inside are not directly
> >> related to each other.
> > 
> > Understood. ovs_pcpu_storage maybe?
> 
> It's OK, I guess, but see also a point about locking inside datapath.c
> instead and probably not needing to change anything in actions.c.

If you say that adding a lock to ovs_dp_process_packet() and another to
ovs_packet_cmd_execute() works, then I can certainly update the patch.
However, based on what I wrote above, I am not sure it does.

> >> Best regards, Ilya Maximets.

Sebastian
