netdev - Re: [ovs-dev] [PATCH net-next 11/18] openvswitch: Use nested-BH locking for ovs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f7tzfhpxboe.fsf@redhat.com>
Date: Thu, 13 Mar 2025 10:11:45 -0400
From: Aaron Conole <aconole@...hat.com>
To: Ilya Maximets <i.maximets@....org>
Cc: Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
  netdev@...r.kernel.org,  linux-rt-devel@...ts.linux.dev,
  dev@...nvswitch.org,  Eric Dumazet <edumazet@...gle.com>,  Simon Horman
 <horms@...nel.org>,  Jakub Kicinski <kuba@...nel.org>,  Thomas Gleixner
 <tglx@...utronix.de>,  Paolo Abeni <pabeni@...hat.com>,  "David S. Miller"
 <davem@...emloft.net>,  Eelco Chaudron <echaudro@...hat.com>
Subject: Re: [ovs-dev] [PATCH net-next 11/18] openvswitch: Use nested-BH
 locking for ovs_actions.

Ilya Maximets <i.maximets@....org> writes:

> On 3/13/25 12:50, Sebastian Andrzej Siewior wrote:
>> On 2025-03-10 17:56:09 [+0100], Ilya Maximets wrote:
>>>>>> +		local_lock_nested_bh(&ovs_actions.bh_lock);
>>>>>
>>>>> Wouldn't this cause a warning when we're in a syscall/process context?
>>>>
>>>> My understanding is that is only invoked in softirq context. Did I
>>>> misunderstood it?
>>>
>>> It can be called from the syscall/process context while processing
>>> OVS_PACKET_CMD_EXECUTE request.
>>>
>>>> Otherwise that this_cpu_ptr() above should complain
>>>> that preemption is not disabled and if preemption is indeed not disabled
>>>> how do you ensure that you don't get preempted after the
>>>> __this_cpu_inc_return() in several tasks (at the same time) leading to
>>>> exceeding the OVS_RECURSION_LIMIT?
>>>
>>> We disable BH in this case, so it should be safe (on non-RT).  See the
>>> ovs_packet_cmd_execute() for more details.
>> 
>> Yes, exactly. So if BH is disabled then local_lock_nested_bh() can
>> safely acquire a per-CPU spinlock on PREEMPT_RT here. This basically
>> mimics the local_bh_disable() behaviour in terms of exclusive data
>> structures on a smaller scope.
>
> OK.  I missed that is_softirq() returns true when BH is disabled manually.
>
>> 
>>>>>> +		ovs_act->owner = current;
>>>>>> +	}
>>>>>> +
>>>>>>  	level = __this_cpu_inc_return(ovs_actions.exec_level);
>>>>>>  	if (unlikely(level > OVS_RECURSION_LIMIT)) {
>>>>>>  		net_crit_ratelimited("ovs: recursion limit reached on datapath %s, probable configuration error\n",
>>>>>> @@ -1710,5 +1718,10 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
>>>>>>  
>>>>>>  out:
>>>>>>  	__this_cpu_dec(ovs_actions.exec_level);
>>>>>> +
>>>>>> +	if (level == 1) {
>>>>>> +		ovs_act->owner = NULL;
>>>>>> +		local_unlock_nested_bh(&ovs_actions.bh_lock);
>>>>>> +	}
>>>>>
>>>>> Seems dangerous to lock every time the owner changes but unlock only
>>>>> once on level 1.  Even if this works fine, it seems unnecessarily
>>>>> complicated.  Maybe it's better to just lock once before calling
>>>>> ovs_execute_actions() instead?
>>>>
>>>> My understanding is this can be invoked recursively. That means on first
>>>> invocation owner == NULL and then you acquire the lock at which point
>>>> exec_level goes 0->1. On the recursive invocation owner == current and
>>>> you skip the lock but exec_level goes 1 -> 2.
>>>> On your return path once level becomes 1, then it means that dec made it
>>>> go 1 -> 0, you unlock the lock.
>>>
>>> My point is: why locking here with some extra non-obvious logic of owner
>>> tracking if we can lock (unconditionally?) in ovs_packet_cmd_execute() and
>>> ovs_dp_process_packet() instead?  We already disable BH in one of those
>>> and take appropriate RCU locks in both.  So, feels like a better place
>>> for the extra locking if necessary.  We will also not need to move around
>>> any code in actions.c if the code there is guaranteed to be safe by holding
>>> locks outside of it.
>> 
>> I think I was considering it but dropped it because it looks like one
>> can call the other.
>> ovs_packet_cmd_execute() is an unique entry to ovs_execute_actions().
>> This could the lock unconditionally.
>> Then we have ovs_dp_process_packet() as the second entry point towards
>> ovs_execute_actions() and is the tricky one. One originates from
>> netdev_frame_hook() which the "normal" packet receiving.
>> Then within ovs_execute_actions() there is ovs_vport_send() which could
>> use internal_dev_recv() for forwarding. This one throws the packet into
>> the networking stack so it could come back via netdev_frame_hook().
>> Then there is this internal forwarding via internal_dev_xmit() which
>> also ends up in ovs_execute_actions(). Here I don't know if this can
>> originate from within the recursion.

Just a note that it still needs to go through `ovs_dp_process_packet()`
before it will enter the `ovs_execute_actions()` call by way of the
`ovs_vport_receive()` call.  So keeping the locking in datapath.c should
not be a complex task.

> It's true that ovs_packet_cmd_execute() can not be re-intered, while
> ovs_dp_process_packet() can be re-entered if the packet leaves OVS and
> then comes back from another port.  It's still better to handle all the
> locking within datapath.c and not lock for RT in actions.c and for non-RT
> in datapath.c.

+1

>> 
>> After looking at this and seeing the internal_dev_recv() I decided to
>> move it to within ovs_execute_actions() where the recursion check itself
>> is.
>> 
>>>> The locking part happens only on PREEMPT_RT because !PREEMPT_RT has
>>>> softirqs disabled which guarantee that there will be no preemption.
>>>>
>>>> tools/testing/selftests/net/openvswitch should cover this?
>>>
>>> It's not a comprehensive test suite, it covers some cases, but it
>>> doesn't test anything related to preemptions specifically.

Yes, this would be good to enhance, and the plan is to improve it as we
can.  If during the course of this work you identify a nice test case,
please do feel empowered to add it.

>> From looking at the traces, everything originates from
>> netdev_frame_hook() and there is sometimes one recursion from within
>> ovs_execute_actions(). I haven't seen anything else.
>> 
>>>>> Also, the name of the struct ovs_action doesn't make a lot of sense,
>>>>> I'd suggest to call it pcpu_storage or something like that instead.
>>>>> I.e. have a more generic name as the fields inside are not directly
>>>>> related to each other.
>>>>
>>>> Understood. ovs_pcpu_storage maybe?
>>>
>>> It's OK, I guess, but see also a point about locking inside datapath.c
>>> instead and probably not needing to change anything in actions.c.
>> 
>> If you say that adding a lock to ovs_dp_process_packet() and another to
>> ovs_packet_cmd_execute() then I can certainly update. However based on
>> what I wrote above, I am not sure.
>
> I think, it's better if we keep all the locks in datapath.c and let
> actions.c assume that all the operations are always safe as it was
> originally intended.

Agreed - reading through actions code can be complex enough without need
to remember to do recursions checks there.

> Cc: Aaron and Eelco, in case they have some thoughts on this as well.

Thanks Ilya - I think you covered the major points.

> Best regards, Ilya Maximets.