Message-ID: <160faa16-322b-cd65-9c12-a3cfe0f02e11@huawei.com>
Date: Fri, 16 Apr 2021 11:56:11 +0800
From: Yunsheng Lin <linyunsheng@...wei.com>
To: "zhudi (J)" <zhudi21@...wei.com>,
"davem@...emloft.net" <davem@...emloft.net>,
"kuba@...nel.org" <kuba@...nel.org>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"Chenxiang (EulerOS)" <rose.chen@...wei.com>
Subject: Re: [PATCH] net: fix a data race when get vlan device
On 2021/4/16 11:27, zhudi (J) wrote:
>> On 2021/4/15 11:35, zhudi wrote:
>>> From: Di Zhu <zhudi21@...wei.com>
>>>
>>> We encountered a crash: in the packet receiving process, we got an
>>> illegal VLAN device address, but the VLAN device address saved in
>>> vmcore is correct. After checking the code, we found a possible data
>>> race:
>>> CPU 0:                             CPU 1:
>>>   (RCU read lock)                    (RTNL lock)
>>>   vlan_do_receive()                  register_vlan_dev()
>>>     vlan_find_dev()
>>>
>>>       ->__vlan_group_get_device()      ->vlan_group_prealloc_vid()
>>>
>>> In vlan_group_prealloc_vid(), we need to make sure that kzalloc is
>>> executed before assigning a value to vlan devices array, otherwise we
>>
>> As I understand it, there is a dependency between calling kzalloc() and
>> assigning the address (returned from kzalloc()) to vg->vlan_devices_arrays.
>> The CPU and the compiler can see that dependency, so why can't they handle
>> it without the added smp_wmb()?
>>
>> See the CONTROL DEPENDENCIES section in
>> Documentation/memory-barriers.txt:
>>
>> However, stores are not speculated. This means that ordering -is- provided
>> for load-store control dependencies, as in the following example:
>>
>>     q = READ_ONCE(a);
>>     if (q) {
>>         WRITE_ONCE(b, 1);
>>     }
>>
>
> Maybe I didn't make it clear. This memory barrier is there to order the
> memset(object, 0, size) done inside kzalloc() before the subsequent array assignment statement.
>
> kzalloc()
> ->memset(object, 0, size)
>
> smp_wmb()
>
> vg->vlan_devices_arrays[pidx][vidx] = array;
>
> This is because the __vlan_group_get_device() function depends on this order.
Thanks for clarifying; it would be good to mention this in the
commit log too.
Also, __vlan_group_get_device() is used in the data path, so it would
be good to avoid the barrier op there too. Maybe RCU can be used to avoid
the barrier, if __vlan_group_get_device() is already protected by the RCU
read lock.
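
To illustrate the publish/read ordering being discussed, here is a minimal
user-space sketch. It is not the kernel code and not a proposed patch. It
assumes C11 atomics as a stand-in for the kernel primitives: the release
store plays the role that rcu_assign_pointer() would play on the writer
side, and the acquire load stands in for rcu_dereference() on the reader
side (which in the kernel relies on the address dependency and typically
needs no barrier instruction). All names below (struct group, group_init(),
group_prealloc(), group_set(), group_get(), PART_LEN, NUM_PARTS) are
hypothetical.

```c
/* User-space analogue only; every identifier here is hypothetical. */
#include <stdatomic.h>
#include <stdlib.h>

#define PART_LEN  8   /* stand-in for VLAN_GROUP_ARRAY_PART_LEN */
#define NUM_PARTS 4   /* number of lazily allocated parts */

struct dev { int id; };

typedef _Atomic(struct dev *) dev_slot;

struct group {
    /* Each entry points to a lazily allocated array of PART_LEN slots. */
    _Atomic(dev_slot *) parts[NUM_PARTS];
};

void group_init(struct group *g)
{
    for (unsigned int i = 0; i < NUM_PARTS; i++)
        atomic_init(&g->parts[i], NULL);
}

/* Writer side; assumed to be serialized by a lock (RTNL in the thread). */
int group_prealloc(struct group *g, unsigned int id)
{
    unsigned int idx = id / PART_LEN;
    dev_slot *array;

    if (idx >= NUM_PARTS)
        return -1;
    if (atomic_load_explicit(&g->parts[idx], memory_order_relaxed))
        return 0;                       /* already allocated */

    array = malloc(PART_LEN * sizeof(*array));
    if (!array)
        return -1;
    for (unsigned int i = 0; i < PART_LEN; i++)
        atomic_init(&array[i], NULL);   /* the "zero-fill" step */

    /* Release store: the zero-fill above is ordered before the pointer
     * becomes visible to readers. */
    atomic_store_explicit(&g->parts[idx], array, memory_order_release);
    return 0;
}

/* Writer side: install a device in an already allocated part. */
int group_set(struct group *g, unsigned int id, struct dev *d)
{
    dev_slot *array;

    if (id / PART_LEN >= NUM_PARTS)
        return -1;
    array = atomic_load_explicit(&g->parts[id / PART_LEN],
                                 memory_order_relaxed);
    if (!array)
        return -1;
    atomic_store_explicit(&array[id % PART_LEN], d, memory_order_release);
    return 0;
}

/* Reader side (data path): no lock taken. */
struct dev *group_get(struct group *g, unsigned int id)
{
    dev_slot *array;

    if (id / PART_LEN >= NUM_PARTS)
        return NULL;
    /* Acquire load pairs with the release store in group_prealloc(), so a
     * non-NULL pointer implies the slots are seen as initialized. */
    array = atomic_load_explicit(&g->parts[id / PART_LEN],
                                 memory_order_acquire);
    if (!array)
        return NULL;
    return atomic_load_explicit(&array[id % PART_LEN], memory_order_acquire);
}
```

A caller would run group_init() once, then group_prealloc() and group_set()
under the writer lock, while readers call group_get() concurrently. The
sketch never frees the parts; in the kernel, reclaiming them safely is the
part RCU grace periods would take care of.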
>
>>
>>
>>> may get a wrong address from the hardware cache on another cpu.
>>>
>>> So fix it by adding memory barrier instruction to ensure the order of
>>> memory operations.
>>>
>>> Signed-off-by: Di Zhu <zhudi21@...wei.com>
>>> ---
>>> net/8021q/vlan.c | 2 ++
>>> net/8021q/vlan.h | 3 +++
>>> 2 files changed, 5 insertions(+)
>>>
>>> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
>>> index 8b644113715e..4f541e05cd3f 100644
>>> --- a/net/8021q/vlan.c
>>> +++ b/net/8021q/vlan.c
>>> @@ -71,6 +71,8 @@ static int vlan_group_prealloc_vid(struct vlan_group *vg,
>>>      if (array == NULL)
>>>          return -ENOBUFS;
>>>
>>> +    smp_wmb();
>>> +
>>>      vg->vlan_devices_arrays[pidx][vidx] = array;
>>>      return 0;
>>>  }
>>> diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h
>>> index 953405362795..7408fda084d3 100644
>>> --- a/net/8021q/vlan.h
>>> +++ b/net/8021q/vlan.h
>>> @@ -57,6 +57,9 @@ static inline struct net_device *__vlan_group_get_device(struct vlan_group *vg,
>>>
>>>      array = vg->vlan_devices_arrays[pidx]
>>>                                     [vlan_id / VLAN_GROUP_ARRAY_PART_LEN];
>>> +
>>> +    smp_rmb();
>>> +
>>>      return array ? array[vlan_id % VLAN_GROUP_ARRAY_PART_LEN] : NULL;
>>>  }
>>>
>>>
>