[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <1a57a633-d3c7-44b6-b704-044038d6b3e2@huaweicloud.com>
Date: Tue, 12 Nov 2024 20:22:12 +0800
From: Xu Kuohai <xukuohai@...weicloud.com>
To: Martin KaFai Lau <martin.lau@...ux.dev>
Cc: bpf@...r.kernel.org, netdev@...r.kernel.org,
Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>, Eduard Zingerman <eddyz87@...il.com>,
Yonghong Song <yonghong.song@...ux.dev>, Kui-Feng Lee <thinker.li@...il.com>
Subject: Re: [PATCH bpf-next 2/2] selftests/bpf: Add test for struct_ops map
release
On 11/12/2024 5:30 AM, Martin KaFai Lau wrote:
> On 11/9/24 12:40 AM, Xu Kuohai wrote:
>> On 11/9/2024 3:39 AM, Martin KaFai Lau wrote:
>>> On 11/8/24 12:26 AM, Xu Kuohai wrote:
>>>> -static void bpf_testmod_test_2(int a, int b)
>>>> +static void bpf_dummy_unreg(void *kdata, struct bpf_link *link)
>>>> {
>>>> + WRITE_ONCE(__bpf_dummy_ops, &__bpf_testmod_ops);
>>>> }
>>>
>>> [ ... ]
>>>
>>>> +static int run_struct_ops(const char *val, const struct kernel_param *kp)
>>>> +{
>>>> + int ret;
>>>> + unsigned int repeat;
>>>> + struct bpf_testmod_ops *ops;
>>>> +
>>>> + ret = kstrtouint(val, 10, &repeat);
>>>> + if (ret)
>>>> + return ret;
>>>> +
>>>> + if (repeat > 10000)
>>>> + return -ERANGE;
>>>> +
>>>> + while (repeat-- > 0) {
>>>> + ops = READ_ONCE(__bpf_dummy_ops);
>>>
>>> I don't think it is the usual bpf_struct_ops implementation which only uses READ_ONCE and WRITE_ONCE to protect the registered ops. tcp-cc uses a refcnt+rcu. It seems hid uses synchronize_srcu(). sched_ext seems to also use kthread_flush_work() to wait
>>> for all ops calling finished. Meaning I don't think the current bpf_struct_ops unreg implementation will run into this issue for sleepable ops.
>>>
>>
>> Thanks for the explanation.
>>
>> Are you saying that it's not the struct_ops framework's
>> responsibility to ensure the struct_ops map is not
>> released while it may be still in use? And the "bug" in
>> this series should be "fixed" in the test, namely this
>> patch?
>
> Yeah, it is what I was trying to say. I don't think there is thing to fix. Think about extending a subsystem by a kernel module. The subsystem will also do the needed protection itself during the unreg process. There is already a bpf_try_module_get() to
> help the subsystem.
>
Got it
>>> The current synchronize_rcu_mult(call_rcu, call_rcu_tasks) is only needed for the tcp-cc because a tcp-cc's ops (which uses refcnt+rcu) can decrement its own refcnt. Looking back, this was a mistake (mine). A new tcp-cc ops should have been introduced
>>> instead to return a new tcp-cc-ops to be used.
>>
>> Not quite clear, but from the description, it seems that
>> the synchronize_rcu_mult(call_rcu, call_rcu_tasks) could
>
> This synchronize_rcu_mult is only need for the tcp_congestion_ops (bpf_tcp_ca.c). May be it is cleaner to just make a special case for "tcp_congestion_ops" in st_ops->name in map_alloc and only set free_after_mult_rcu_gp to TRUE for this one case, then it
> won't slow down other struct_ops map freeing also.
>
OK, will git it a try, thanks.
> imo, the test in this patch is not needed in its current form also since it is not how the kernel subsystem implements unreg in struct_ops.
>
>> be just removed in some way, no need to do a cleanup to
>> switch it to call_rcu.
>>
>>>
>>>> + if (ops->test_1)
>>>> + ops->test_1();
>>>> + if (ops->test_2)
>>>> + ops->test_2(0, 0);
>>>> + }
>>>> +
>>>> + return 0;
>>>> +}
>>
Powered by blists - more mailing lists