Message-ID: <EB6A6F4F-4778-41D3-B314-4587DE7A7658@redhat.com>
Date:   Mon, 12 Oct 2020 15:36:14 +0200
From:   "Eelco Chaudron" <echaudro@...hat.com>
To:     "Sebastian Andrzej Siewior" <bigeasy@...utronix.de>
Cc:     "Juri Lelli" <juri.lelli@...hat.com>, tglx@...utronix.de,
        linux-rt-users@...r.kernel.org, linux-kernel@...r.kernel.org,
        bristot@...hat.com, williams@...hat.com, atheurer@...hat.com
Subject: Re: [PATCH 5.9 RT] net: openvswitch: Fix using smp_processor_id() in
 preemptible code



On 12 Oct 2020, at 10:21, Sebastian Andrzej Siewior wrote:

> On 2020-10-12 10:14:42 [+0200], Eelco Chaudron wrote:
>>
>>
>> On 9 Oct 2020, at 17:41, Sebastian Andrzej Siewior wrote:
>>
>>> On 2020-10-09 14:47:59 [+0200], Juri Lelli wrote:
>>>> This happens because openvswitch/flow_table::flow_lookup() accesses
>>>> per-cpu data while being preemptible (and migratable).
>>>>
>>>> Fix it by adding get/put_cpu_light(), so that, even if preempted, the
>>>> task executing this code is not migrated (operation is also guarded by
>>>> ovs_mutex mutex).
>>>
>>> This warning is not limited to PREEMPT_RT; it is also present upstream
>>> since commit
>>>    eac87c413bf97 ("net: openvswitch: reorder masks array based on usage")
>>>
>>> You should be able to reproduce it there, too.
>>> The path ovs_flow_tbl_lookup() -> flow_lookup() is guarded by ovs_lock().
>>> I can't say that this is true for
>>>    ovs_vport_receive() -> ovs_dp_process_packet() ->
>>>    ovs_flow_tbl_lookup_stats() -> flow_lookup()
>>>
>>> (means I don't know but it looks like coming from NAPI).
>>>
>>> Which means u64_stats_update_begin() could have two writers. This must
>>> not happen.
>>> There are two readers which do u64_stats_fetch_begin_irq(). Disabling
>>> interrupts makes no sense since they perform cross-CPU access.
>>>
>>> -> You need to ensure that there is only one writer at a time.
>>>
>>> If mask_array gains a spinlock_t for writer protection then you can
>>> acquire the lock prior to grabbing ->masks_usage_cntr. But as of now there
>>> is one `ma->syncp'.
>>
>> I’m not too familiar with the RT kernel, but in the non-RT kernel, this
>> function is called in run-to-completion parts only, hence does not need a
>> lock. Actually, this was designed in such a way that it does not need a lock
>> at all.
>
> _no_ As explained above, this is not RT specific.
> What guarantees that you don't have two flow_lookup() invocations on the
> same CPU? You are using u64_stats_update_begin() which must not be
> preempted. This means even if preemption is disabled you must not have
> another invocation in BH context. This is due to the
> write_seqcount_begin() in u64_stats_update_begin().
> If preemption / CPU migration is not a problem in the above part, you
> can use an annotation to silence the check that led to the warning. But
> the u64_stats invocation still looks problematic.
>
>> So maybe this needs a get_cpu() instead of the light variant in the RT
>> case?

Hi Sebastian,

I was not reading the splat correctly and thought it was from the NAPI
path, but it looks like it's from the netlink part. I think this could be
fixed with the following patch, so that both paths, NAPI and netlink,
become non-preemptible:

--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -850,9 +850,14 @@ struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *tbl,
         struct mask_array *ma = rcu_dereference_ovsl(tbl->mask_array);
         u32 __always_unused n_mask_hit;
         u32 __always_unused n_cache_hit;
+       struct sw_flow *flow;
         u32 index = 0;

-       return flow_lookup(tbl, ti, ma, key, &n_mask_hit, &n_cache_hit, &index);
+       preempt_disable();
+       flow = flow_lookup(tbl, ti, ma, key, &n_mask_hit, &n_cache_hit, &index);
+       preempt_enable();
+
+       return flow;
  }

  struct sw_flow *ovs_flow_tbl_lookup_exact(struct flow_table *tbl,

Note that doing this in ovs_flow_tbl_lookup() rather than in
flow_lookup() means the fast path through NAPI does not have to call
preempt_disable().
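
To make the single-writer concern concrete, here is a rough userspace model
of a sequence counter. It is only a sketch: every model_* name is made up
for illustration and is not kernel API, and it assumes the seqcount-backed
behaviour that u64_stats has where the counter is actually used (on some
configurations the begin/end helpers compile away). It shows that a reader
is forced to retry when one writer is mid-update, but accepts a torn
snapshot when a second writer nests inside the first on the same counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a seqcount-style sync point. */
struct model_syncp {
	unsigned int seq;	/* odd while exactly one writer is updating */
};

struct model_stats {
	struct model_syncp syncp;
	uint64_t usage;
};

/* Writer side: bump the sequence on entry and on exit. */
static void model_update_begin(struct model_syncp *s) { s->seq++; }
static void model_update_end(struct model_syncp *s)   { s->seq++; }

/* Reader side: sample the sequence before reading the data. */
static unsigned int model_fetch_begin(const struct model_syncp *s)
{
	return s->seq;
}

/* Reader must retry if a writer was active or the sequence moved. */
static bool model_fetch_retry(const struct model_syncp *s, unsigned int start)
{
	return (start & 1) || s->seq != start;
}

/* One writer in flight: the odd sequence makes the reader retry. */
static bool model_single_writer_forces_retry(void)
{
	struct model_stats st = { .syncp = { 0 }, .usage = 0 };

	model_update_begin(&st.syncp);			/* seq == 1 */
	unsigned int start = model_fetch_begin(&st.syncp);
	return model_fetch_retry(&st.syncp, start);
}

/*
 * Two writers nested on the same counter (the second one interrupting
 * the first): the sequence is even again while both are still updating,
 * so the reader accepts the snapshot without retrying.
 */
static bool model_nested_writer_fools_reader(void)
{
	struct model_stats st = { .syncp = { 0 }, .usage = 0 };

	model_update_begin(&st.syncp);			/* writer A: seq == 1 */
	model_update_begin(&st.syncp);			/* writer B: seq == 2 */
	unsigned int start = model_fetch_begin(&st.syncp);
	uint64_t snapshot = st.usage;			/* possibly torn data */
	(void)snapshot;
	return !model_fetch_retry(&st.syncp, start);
}

/* A complete, non-nested update leaves the sequence even and stable. */
static void model_check(void)
{
	struct model_stats st = { .syncp = { 0 }, .usage = 0 };

	model_update_begin(&st.syncp);
	st.usage++;
	model_update_end(&st.syncp);
	assert(st.syncp.seq == 2);
	assert(!model_fetch_retry(&st.syncp, model_fetch_begin(&st.syncp)));
}
```

In this model the nested case is the one that must never happen, which is
why the update section needs exactly one writer at a time.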

Any comments? If not, I can send a proper patch through netdev.

//Eelco