netdev - Re: [PATCH bpf-next 2/2] selftests/bpf: Add test for struct

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <1a57a633-d3c7-44b6-b704-044038d6b3e2@huaweicloud.com>
Date: Tue, 12 Nov 2024 20:22:12 +0800
From: Xu Kuohai <xukuohai@...weicloud.com>
To: Martin KaFai Lau <martin.lau@...ux.dev>
Cc: bpf@...r.kernel.org, netdev@...r.kernel.org,
 Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
 Andrii Nakryiko <andrii@...nel.org>, Eduard Zingerman <eddyz87@...il.com>,
 Yonghong Song <yonghong.song@...ux.dev>, Kui-Feng Lee <thinker.li@...il.com>
Subject: Re: [PATCH bpf-next 2/2] selftests/bpf: Add test for struct_ops map
 release

On 11/12/2024 5:30 AM, Martin KaFai Lau wrote:
> On 11/9/24 12:40 AM, Xu Kuohai wrote:
>> On 11/9/2024 3:39 AM, Martin KaFai Lau wrote:
>>> On 11/8/24 12:26 AM, Xu Kuohai wrote:
>>>> -static void bpf_testmod_test_2(int a, int b)
>>>> +static void bpf_dummy_unreg(void *kdata, struct bpf_link *link)
>>>>   {
>>>> +    WRITE_ONCE(__bpf_dummy_ops, &__bpf_testmod_ops);
>>>>   }
>>>
>>> [ ... ]
>>>
>>>> +static int run_struct_ops(const char *val, const struct kernel_param *kp)
>>>> +{
>>>> +    int ret;
>>>> +    unsigned int repeat;
>>>> +    struct bpf_testmod_ops *ops;
>>>> +
>>>> +    ret = kstrtouint(val, 10, &repeat);
>>>> +    if (ret)
>>>> +        return ret;
>>>> +
>>>> +    if (repeat > 10000)
>>>> +        return -ERANGE;
>>>> +
>>>> +    while (repeat-- > 0) {
>>>> +        ops = READ_ONCE(__bpf_dummy_ops);
>>>
>>> I don't think it is the usual bpf_struct_ops implementation which only uses READ_ONCE and WRITE_ONCE to protect the registered ops. tcp-cc uses a refcnt+rcu. It seems hid uses synchronize_srcu(). sched_ext seems to also use kthread_flush_work() to wait 
>>> for all ops calling finished. Meaning I don't think the current bpf_struct_ops unreg implementation will run into this issue for sleepable ops.
>>>
>>
>> Thanks for the explanation.
>>
>> Are you saying that it's not the struct_ops framework's
>> responsibility to ensure the struct_ops map is not
>> released while it may be still in use? And the "bug" in
>> this series should be "fixed" in the test, namely this
>> patch?
> 
> Yeah, it is what I was trying to say. I don't think there is thing to fix. Think about extending a subsystem by a kernel module. The subsystem will also do the needed protection itself during the unreg process. There is already a bpf_try_module_get() to 
> help the subsystem.
>

Got it

>>> The current synchronize_rcu_mult(call_rcu, call_rcu_tasks) is only needed for the tcp-cc because a tcp-cc's ops (which uses refcnt+rcu) can decrement its own refcnt. Looking back, this was a mistake (mine). A new tcp-cc ops should have been introduced 
>>> instead to return a new tcp-cc-ops to be used.
>>
>> Not quite clear, but from the description, it seems that
>> the synchronize_rcu_mult(call_rcu, call_rcu_tasks) could
> 
> This synchronize_rcu_mult is only need for the tcp_congestion_ops (bpf_tcp_ca.c). May be it is cleaner to just make a special case for "tcp_congestion_ops" in st_ops->name in map_alloc and only set free_after_mult_rcu_gp to TRUE for this one case, then it 
> won't slow down other struct_ops map freeing also.
>

OK, will git it a try, thanks.

> imo, the test in this patch is not needed in its current form also since it is not how the kernel subsystem implements unreg in struct_ops.
> 
>> be just removed in some way, no need to do a cleanup to
>> switch it to call_rcu.
>>
>>>
>>>> +        if (ops->test_1)
>>>> +            ops->test_1();
>>>> +        if (ops->test_2)
>>>> +            ops->test_2(0, 0);
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +}
>>